
# **Load the Dataset**

# **Explanation:**

* We import the pandas library, which is used for data manipulation and analysis.
* We load the dataset using pd.read_csv(), specifying the encoding as 'latin-1' to handle special characters.
* We display the first few rows of the dataset using df.head() to understand its structure.

In [None]:
import pandas as pd

# Load the dataset
df = pd.read_csv('spam.csv', encoding='latin-1')

# Display the first few rows of the dataset
df.head()

| | class | message | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 |
|---|---|---|---|---|---|
| 0 | ham | Go until jurong point, crazy.. Available only ... | | | |
| 1 | ham | Ok lar... Joking wif u oni... | | | |
| 2 | spam | Free entry in 2 a wkly comp to win FA Cup fina... | | | |
| 3 | ham | U dun say so early hor... U c already then say... | | | |
| 4 | ham | Nah I don't think he goes to usf, he lives aro... | | | |
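To illustrate why the encoding argument matters, here is a minimal sketch using a made-up two-row CSV held in memory (the bytes and the names `raw_bytes`, `sample`, and `utf8_ok` are assumptions for illustration, not part of spam.csv). It contains one byte that is valid Latin-1 but not valid UTF-8, so the default read should fail while the Latin-1 read succeeds.

```python
import io

import pandas as pd

# Hypothetical CSV bytes: 0xA3 is the pound sign in Latin-1,
# but on its own it is not a valid UTF-8 sequence.
raw_bytes = b"class,message\nham,See you at 5\nspam,Win \xa3100 now\n"

# Attempt the default (UTF-8) read and record whether it succeeds.
try:
    pd.read_csv(io.BytesIO(raw_bytes))
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Read the same bytes again, this time decoding them as Latin-1.
sample = pd.read_csv(io.BytesIO(raw_bytes), encoding="latin-1")

print("UTF-8 read succeeded:", utf8_ok)
print(sample)
```

Latin-1 maps every byte to a character, so it never raises a decoding error; the trade-off is that text stored in another encoding would be read without complaint but could display incorrectly.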


# **Drop the Unnecessary Columns**

# **Explanation:**

* We drop the columns 'Unnamed: 2', 'Unnamed: 3', and 'Unnamed: 4' as they are not needed for our analysis.
* We rename the remaining columns to 'label' and 'message' for better readability.
* We display the first few rows of the cleaned dataset to verify the changes.

In [None]:
# Drop unnecessary columns and rename the remaining ones
df = df.drop(columns=['Unnamed: 2', 'Unnamed: 3', 'Unnamed: 4'])
df.columns = ['label', 'message']

# Display the first few rows of the cleaned dataset
df.head()

| | label | message |
|---|---|---|
| 0 | ham | Go until jurong point, crazy.. Available only ... |
| 1 | ham | Ok lar... Joking wif u oni... |
| 2 | spam | Free entry in 2 a wkly comp to win FA Cup fina... |
| 3 | ham | U dun say so early hor... U c already then say... |
| 4 | ham | Nah I don't think he goes to usf, he lives aro... |
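Before discarding columns, it can be worth checking how much data they actually hold. The sketch below uses a small made-up frame (`toy` and its values are assumptions that only mimic the layout shown above) to count non-missing values, and it renames by column name rather than by position, which is one possible way to guard against a changed column order.

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the layout of the loaded dataset (values are made up).
toy = pd.DataFrame({
    "class": ["ham", "spam", "ham"],
    "message": ["Ok lar...", "Free entry...", "See you soon"],
    "Unnamed: 2": [np.nan, np.nan, "stray text"],
    "Unnamed: 3": [np.nan, np.nan, np.nan],
    "Unnamed: 4": [np.nan, np.nan, np.nan],
})

extra_cols = ["Unnamed: 2", "Unnamed: 3", "Unnamed: 4"]

# Count the non-missing values in each column we intend to drop.
non_missing = toy[extra_cols].notna().sum()
print(non_missing)

# Drop by name, then rename by name instead of assigning a positional list.
cleaned = toy.drop(columns=extra_cols).rename(columns={"class": "label"})
print(list(cleaned.columns))
```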


# **Encode Labels**

# **Explanation:**

* We use the map() function to convert the 'ham' and 'spam' labels to binary values (0 and 1, respectively).
* We display the first few rows of the encoded dataset to verify the changes.

In [None]:
# Encode labels (ham=0, spam=1)
df['label'] = df['label'].map({'ham': 0, 'spam': 1})

# Display the first few rows of the encoded dataset
df.head()

| | label | message |
|---|---|---|
| 0 | 0 | Go until jurong point, crazy.. Available only ... |
| 1 | 0 | Ok lar... Joking wif u oni... |
| 2 | 1 | Free entry in 2 a wkly comp to win FA Cup fina... |
| 3 | 0 | U dun say so early hor... U c already then say... |
| 4 | 0 | Nah I don't think he goes to usf, he lives aro... |
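One detail of map() worth knowing is that any value missing from the dictionary becomes NaN instead of raising an error. The following sketch, on a made-up Series with a deliberately misspelled label (the names `labels`, `encoded`, and `unmapped` are illustrative), shows one way to detect such rows after encoding.

```python
import pandas as pd

# Made-up labels; "Ham" (capitalised) is a deliberate mismatch.
labels = pd.Series(["ham", "spam", "Ham", "ham"])

# Values that are not keys of the dictionary are mapped to NaN.
encoded = labels.map({"ham": 0, "spam": 1})

# Collect the original values that failed to map.
unmapped = labels[encoded.isna()]

print(encoded.tolist())
print("Unmapped labels:", unmapped.tolist())
```

If the list of unmapped labels is empty, every row was encoded as 0 or 1.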


# **Split the Dataset**

# **Explanation:**

* We import train_test_split from sklearn.model_selection.
* We split the messages and labels into training and testing sets, holding out 20% of the data for testing (test_size=0.2) and fixing random_state=42 so that the split is reproducible.
* We display the shapes of the resulting sets to verify the split.

In [None]:
from sklearn.model_selection import train_test_split

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(df['message'], df['label'], test_size=0.2, random_state=42)

# Display the shapes of the training and testing sets
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((4457,), (1115,), (4457,), (1115,))
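The classification reports later in this notebook show 965 ham and 150 spam messages in the test set, so the classes are imbalanced. As an optional variation (not what the cell above does), the stratify argument of train_test_split keeps the class ratio the same in both splits. The sketch below uses made-up data; `messages` and `labels` are hypothetical stand-ins for df['message'] and df['label'].

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up imbalanced data: 80 ham (0) and 20 spam (1).
messages = pd.Series([f"message {i}" for i in range(100)])
labels = pd.Series([0] * 80 + [1] * 20)

# stratify=labels preserves the 80/20 class ratio in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    messages, labels, test_size=0.2, random_state=42, stratify=labels
)

print("Spam share in train:", y_tr.mean())
print("Spam share in test: ", y_te.mean())
```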

# **Vectorize the Text Data**

# **Explanation:**

* We import the TfidfVectorizer from sklearn.feature_extraction.text.
* We initialize the TfidfVectorizer with stop_words='english' to remove common English stop words and max_features=3000 to limit the number of features.
* We fit the vectorizer on the training data and transform both the training and testing data into TF-IDF matrices.
* We display the shapes of the TF-IDF matrices to verify the transformation.

In [None]:
from sklearn.feature_extraction.text import TfidfVectorizer

# Vectorize the text data using TF-IDF
vectorizer = TfidfVectorizer(stop_words='english', max_features=3000)
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

# Display the shape of the TF-IDF matrices
X_train_tfidf.shape, X_test_tfidf.shape

((4457, 3000), (1115, 3000))
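To make the fit/transform distinction concrete, here is a small sketch on a made-up corpus (`train_docs`, `test_docs`, and `vec` are illustrative names). The vocabulary is learned from the training documents only, so a word that appears only in the test documents gets no column, and both matrices end up with the same number of columns.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up documents standing in for the training and testing messages.
train_docs = [
    "free prize waiting for you",
    "are you coming home",
    "free entry to win a prize",
]
test_docs = ["win a free holiday", "home soon"]

vec = TfidfVectorizer(stop_words="english", max_features=3000)

# fit_transform learns the vocabulary and IDF weights from the training docs.
train_matrix = vec.fit_transform(train_docs)

# transform reuses that vocabulary; unseen words are silently ignored.
test_matrix = vec.transform(test_docs)

print(sorted(vec.vocabulary_))
print(train_matrix.shape, test_matrix.shape)
print("'holiday' in vocabulary:", "holiday" in vec.vocabulary_)
```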

# **Train and Evaluate the Model Using Different Kernels**

# **Explanation:**

* We import the SVC class from sklearn.svm and the classification_report and accuracy_score functions from sklearn.metrics.
* We define a list of kernels to evaluate: 'linear', 'rbf', and 'poly'.
* For each kernel, we initialize an SVM model, fit it on the training data, and make predictions on the testing data.
* We calculate the accuracy and generate a classification report for each kernel.
* We store the results in a dictionary and display them.

In [None]:
from sklearn.svm import SVC
from sklearn.metrics import classification_report, accuracy_score

# Train and evaluate SVM models with different kernels
kernels = ['linear', 'rbf', 'poly']
results = {}

for kernel in kernels:
    svm = SVC(kernel=kernel)
    svm.fit(X_train_tfidf, y_train)
    y_pred = svm.predict(X_test_tfidf)

    accuracy = accuracy_score(y_test, y_pred)
    report = classification_report(y_test, y_pred)

    results[kernel] = {
        'accuracy': accuracy,
        'report': report
    }

# Display the results
results

{'linear': {'accuracy': 0.979372197309417,
  'report': '              precision    recall  f1-score   support\n\n           0       0.98      1.00      0.99       965\n           1       0.97      0.87      0.92       150\n\n    accuracy                           0.98      1115\n   macro avg       0.98      0.93      0.95      1115\nweighted avg       0.98      0.98      0.98      1115\n'},
 'rbf': {'accuracy': 0.9766816143497757,
  'report': '              precision    recall  f1-score   support\n\n           0       0.97      1.00      0.99       965\n           1       0.99      0.83      0.91       150\n\n    accuracy                           0.98      1115\n   macro avg       0.98      0.92      0.95      1115\nweighted avg       0.98      0.98      0.98      1115\n'},
 'poly': {'accuracy': 0.9443946188340807,
  'report': '              precision    recall  f1-score   support\n\n           0       0.94      1.00      0.97       965\n           1       0.98      0.60      0.74    
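Because the reports are stored as raw strings, the dictionary output above is hard to compare across kernels. One possible alternative is to request the report as a dictionary (output_dict=True) and collect the key numbers in a small table. The sketch below uses made-up labels and predictions, and the names `model_a` and `model_b` are hypothetical placeholders for the kernels.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report

# Made-up ground truth and predictions (1 = spam, 0 = ham).
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
toy_predictions = {
    "model_a": [0, 0, 0, 0, 1, 1, 0, 0],
    "model_b": [0, 0, 0, 1, 1, 0, 0, 0],
}

rows = []
for name, y_hat in toy_predictions.items():
    # output_dict=True returns nested dicts keyed by the label as a string.
    report = classification_report(y_true, y_hat, output_dict=True, zero_division=0)
    rows.append({
        "model": name,
        "accuracy": accuracy_score(y_true, y_hat),
        "spam_precision": report["1"]["precision"],
        "spam_recall": report["1"]["recall"],
    })

# One row per model makes the metrics easy to compare side by side.
summary = pd.DataFrame(rows).set_index("model")
print(summary.round(2))
```

Focusing on the spam row is a deliberate choice here: with imbalanced classes, overall accuracy can stay high even when spam recall drops.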

# **Hyperparameter Tuning**

# **Explanation:**

* We import the GridSearchCV class from sklearn.model_selection.
* We define a parameter grid for hyperparameter tuning, including different values for C and gamma.
* We perform a grid search with 5-fold cross-validation (the GridSearchCV default) over the 16 combinations of C and gamma to find the best parameters for the RBF kernel.
* We display the best parameters and the best cross-validation score.
* Because refit=True, the best model is retrained on the full training set; we use it to predict on the test set and display the accuracy and classification report.

In [None]:
from sklearn.model_selection import GridSearchCV

# Define the parameter grid for GridSearchCV
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [1, 0.1, 0.01, 0.001]
}

# Perform grid search with cross-validation
grid = GridSearchCV(SVC(kernel='rbf'), param_grid, refit=True, verbose=2)
grid.fit(X_train_tfidf, y_train)

# Display the best parameters and best score
print("Best Parameters: ", grid.best_params_)
print("Best cross-validation score: ", grid.best_score_)

# Evaluate the model on the test set
y_pred = grid.predict(X_test_tfidf)
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")
print(classification_report(y_test, y_pred))

Fitting 5 folds for each of 16 candidates, totalling 80 fits
[CV] END .....................................C=0.1, gamma=1; total time=   1.5s
[CV] END .....................................C=0.1, gamma=1; total time=   1.7s
[CV] END .....................................C=0.1, gamma=1; total time=   0.9s
[CV] END .....................................C=0.1, gamma=1; total time=   0.8s
[CV] END .....................................C=0.1, gamma=1; total time=   0.9s
[CV] END ...................................C=0.1, gamma=0.1; total time=   0.6s
[CV] END ...................................C=0.1, gamma=0.1; total time=   0.6s
[CV] END ...................................C=0.1, gamma=0.1; total time=   0.6s
[CV] END ...................................C=0.1, gamma=0.1; total time=   0.6s
[CV] END ...................................C=0.1, gamma=0.1; total time=   0.6s
[CV] END ..................................C=0.1, gamma=0.01; total time=   0.5s
[CV] END ..................................C=0.1
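In the grid search above, the TF-IDF vectorizer was fitted on the whole training set before cross-validation, so each validation fold had already contributed to the vocabulary and IDF weights. One common way to avoid this, shown here only as a sketch on a made-up corpus, is to wrap the vectorizer and the SVM in a Pipeline so that the vectorizer is refitted inside every fold. The texts, labels, step names (`tfidf`, `svc`), and cv=2 are assumptions chosen to keep the example tiny.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Made-up messages (1 = spam, 0 = ham), just enough for 2-fold CV.
texts = [
    "win a free prize now",
    "are you coming home tonight",
    "free entry claim your cash prize",
    "lunch at noon tomorrow",
    "urgent claim free cash reward",
    "see you at the office meeting",
    "free prize draw win cash",
    "call me after the meeting tonight",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

# The vectorizer is a pipeline step, so it is fitted per training fold.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english", max_features=3000)),
    ("svc", SVC(kernel="rbf")),
])

# Parameters of a step are addressed as <step name>__<parameter>.
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [1, 0.1, 0.01, 0.001],
}

search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(texts, labels)

print("Best Parameters: ", search.best_params_)
print("Best cross-validation score: ", search.best_score_)
```

With this arrangement the search is fitted on raw text, and search.predict() also accepts raw text, because the fitted pipeline applies the vectorizer itself.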