## Hyperparameter Tuning Activity ##
---
Hyperparameters are like the knobs and switches of a machine learning model. They control how the model learns and makes predictions. Just as a musical instrument needs its strings tuned for the best sound, hyperparameters need to be tuned for the best model performance.

When we train a machine learning model, we're essentially teaching it to make accurate predictions based on data. However, the learning process involves making choices about how quickly or slowly the model adapts to the data, how complex it can become, and other important factors. These choices are set by hyperparameters.
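
As a rough illustration of the distinction, the sketch below contrasts values chosen before training (hyperparameters) with values the model learns from data. The synthetic dataset and the names `X_toy`, `y_toy`, and `toy_clf` are assumptions made for this example only and are not part of the activity's data.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Small synthetic dataset (an assumption for illustration only)
X_toy, y_toy = make_classification(n_samples=100, n_features=5, random_state=0)

# Hyperparameters are chosen by us before training starts
toy_clf = SVC(C=1.0, kernel='rbf')
hyperparams = toy_clf.get_params()
print("Chosen before training:", hyperparams['C'], hyperparams['kernel'])

# Learned parameters (here, the support vectors) only exist after fitting
toy_clf.fit(X_toy, y_toy)
print("Learned from data, support vector array shape:", toy_clf.support_vectors_.shape)
```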

### 1. Setting up the Environment
---

First, set up a Python environment. Uncomment the `pip install` line below if scikit-learn and NumPy are not already installed:

In [1]:
%%capture
#pip install scikit-learn numpy

### 2. Download and Prepare the Dataset
---
For this activity, we'll use the 20 newsgroups dataset, which scikit-learn downloads on first use. For simplicity, we'll use a bag-of-words representation, although students can also try TF-IDF.

In [2]:
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Fetch a subset of the 20 newsgroups dataset
newsgroups_data = fetch_20newsgroups(subset='all', categories=['comp.graphics', 'sci.med', 'rec.autos'], remove=('headers', 'footers', 'quotes'))

# Convert text to bag-of-words representation
vectorizer = CountVectorizer(stop_words='english', max_features=2000)
X = vectorizer.fit_transform(newsgroups_data.data).toarray()

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, newsgroups_data.target, test_size=0.3, random_state=42)
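
For students who want to try TF-IDF instead of raw counts, the following is a minimal sketch using scikit-learn's `TfidfVectorizer`. The three-sentence `toy_docs` corpus is a made-up stand-in so the snippet runs without downloading anything; in the activity, `newsgroups_data.data` would take its place.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Tiny made-up corpus (an assumption; replace with newsgroups_data.data)
toy_docs = [
    "The graphics card renders images quickly",
    "The doctor prescribed medicine for the patient",
    "The car engine needs an oil change",
]

# Same options as the CountVectorizer above, but with TF-IDF weighting
tfidf_vectorizer = TfidfVectorizer(stop_words='english', max_features=2000)
X_tfidf = tfidf_vectorizer.fit_transform(toy_docs)

# By default each document vector is scaled to unit L2 norm
row_norms = np.sqrt(np.asarray(X_tfidf.multiply(X_tfidf).sum(axis=1))).ravel()

print("Matrix shape:", X_tfidf.shape)
print("Vocabulary:", sorted(tfidf_vectorizer.vocabulary_))
print("Row norms:", row_norms)
```

Swapping the vectorizer leaves the rest of the pipeline (train/test split and classifier) unchanged, so accuracy with counts and with TF-IDF can be compared directly.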


### 3. SVM Classifier
---
Set up an SVM classifier and calculate its accuracy on the test set.

In [3]:
# Create an SVC model with default hyperparameters (students can modify these)
clf = SVC()

# Train the model
clf.fit(X_train, y_train)

# Predict on the test set
y_pred = clf.predict(X_test)

# Calculate the accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")

Accuracy: 0.7720090293453724
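
Accuracy is a single number and can hide per-class differences. `classification_report` is imported in the cell above but not used there; the sketch below shows one way it could be applied. It runs on synthetic three-class data (an assumption, so the snippet stays self-contained), with `X_syn`, `y_syn`, and related names invented for this example; in the activity, `y_test` and `y_pred` from the cell above would be passed instead.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic 3-class data standing in for the newsgroups features (assumption)
X_syn, y_syn = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_classes=3, random_state=42
)
X_syn_train, X_syn_test, y_syn_train, y_syn_test = train_test_split(
    X_syn, y_syn, test_size=0.3, random_state=42
)

# Train a default SVC and predict on the held-out split
syn_clf = SVC()
syn_clf.fit(X_syn_train, y_syn_train)
y_syn_pred = syn_clf.predict(X_syn_test)

# Per-class precision, recall, F1-score, and support
report = classification_report(y_syn_test, y_syn_pred)
print(report)
```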


### Challenge: ###
---
Students can experiment with the hyperparameters of the `SVC` class. Some of the parameters to try include:

#### C: Regularization parameter. The strength of regularization is inversely proportional to C; must be strictly positive.

#### kernel: Specifies the kernel type ('linear', 'poly', 'rbf', etc.).

#### degree: Degree for polynomial kernel. Ignored by all other kernels.

#### gamma: Kernel coefficient for 'rbf', 'poly', and 'sigmoid'. Accepts 'scale', 'auto', or a float value.

#### coef0: Independent term in kernel function. Only significant in 'poly' and 'sigmoid'.
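
To make the roles of `degree`, `gamma`, and `coef0` more concrete, the sketch below evaluates each kernel on one pair of points using scikit-learn's pairwise kernel helpers and compares the result with the textbook formula. The two points and the hyperparameter values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import (
    linear_kernel,
    polynomial_kernel,
    rbf_kernel,
    sigmoid_kernel,
)

# Two arbitrary points and arbitrary hyperparameter values (illustration only)
a = np.array([[1.0, 2.0]])
b = np.array([[3.0, 4.0]])
gamma, degree, coef0 = 0.5, 2, 1.0

dot = np.dot(a[0], b[0])             # inner product <a, b>
sq_dist = np.sum((a[0] - b[0]) ** 2)  # squared Euclidean distance

# linear: <a, b>  (no hyperparameters involved)
k_linear = linear_kernel(a, b)[0, 0]
# poly: (gamma * <a, b> + coef0) ** degree
k_poly = polynomial_kernel(a, b, degree=degree, gamma=gamma, coef0=coef0)[0, 0]
# rbf: exp(-gamma * ||a - b||^2)
k_rbf = rbf_kernel(a, b, gamma=gamma)[0, 0]
# sigmoid: tanh(gamma * <a, b> + coef0)
k_sigmoid = sigmoid_kernel(a, b, gamma=gamma, coef0=coef0)[0, 0]

print("linear :", k_linear, "vs", dot)
print("poly   :", k_poly, "vs", (gamma * dot + coef0) ** degree)
print("rbf    :", k_rbf, "vs", np.exp(-gamma * sq_dist))
print("sigmoid:", k_sigmoid, "vs", np.tanh(gamma * dot + coef0))
```

The linear kernel uses none of the three values, which is why they have no effect when `kernel='linear'`.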

In [4]:
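# Sample SVM code that sets a few hyperparameters explicitly
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Setting hyperparameters
# Note: with kernel='linear', the degree, gamma, and coef0 values are ignored.
# degree applies to 'poly'; gamma to 'rbf', 'poly', and 'sigmoid';
# coef0 to 'poly' and 'sigmoid'.
clf = SVC(C=1.0, kernel='linear', degree=1, gamma=5.2, coef0=45.0)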

clf.fit(X_train, y_train)

# Get predictions and accuracy
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

Accuracy: 0.8340857787810384
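
Instead of changing one value at a time by hand, the search can be automated. The sketch below uses scikit-learn's `GridSearchCV` to try every combination in a small grid with 3-fold cross-validation. It runs on synthetic data so that it stays self-contained (an assumption; in the activity, `X_train`, `y_train`, `X_test`, and `y_test` would be used instead), and the grid values are arbitrary starting points rather than recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (assumption; swap in the newsgroups split)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_classes=3, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# Candidate values to try (arbitrary examples, not tuned recommendations)
param_grid = {
    'C': [0.1, 1.0, 10.0],
    'kernel': ['linear', 'rbf'],
    'gamma': ['scale', 'auto'],
}

# Evaluate every combination with 3-fold cross-validation on the training set
search = GridSearchCV(SVC(), param_grid, cv=3, scoring='accuracy')
search.fit(X_tr, y_tr)

print("Best hyperparameters:", search.best_params_)
print("Best cross-validated accuracy:", search.best_score_)

# Score the refitted best model on the held-out test split
test_accuracy = search.score(X_te, y_te)
print("Test accuracy:", test_accuracy)
```

Tuning on cross-validation folds of the training set, and only then checking the test set once, keeps the test accuracy an honest estimate of performance on unseen data.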
