**Title:**

Comparative Analysis of Supervised Learning Algorithms: SVM, NB, DT, KNN, and ANN

**Abstract:**

This lab report examines five widely used supervised learning algorithms: Support Vector Machine (SVM), Naive Bayes (NB), Decision Trees (DT), k-Nearest Neighbors (KNN), and Artificial Neural Networks (ANN). We train each algorithm on a lung cancer survey dataset and compare their performance on a held-out test set, to help practitioners select a suitable model for their own tasks.

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import matplotlib.pyplot as plt

# Load the preprocessed dataset
df = pd.read_csv('/content/survey lung cancer.csv')

# Handle missing values (if any)
df = df.dropna()

# Encode categorical variables
df = pd.get_dummies(df, columns=['GENDER'])

# Split the dataset into features and target variable
X = df.drop(columns=['LUNG_CANCER'])
y = df['LUNG_CANCER']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)



**Support Vector Machine (SVM):**

SVM is an algorithm used for both classification and regression tasks. For classification, it finds the hyperplane that separates the classes with the largest margin. SVM is particularly effective in high-dimensional spaces and is widely used in applications such as image classification, text categorization, and bioinformatics.
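To make the hyperplane idea concrete, here is a minimal sketch of how a linear SVM classifies a point once training is done: it evaluates the signed score w·x + b and reads off the sign. The weights `w`, the offset `b`, and the sample points are hypothetical values chosen for illustration, not parameters fitted to the lung cancer data.

```python
# Minimal sketch of linear SVM prediction after training.
# The weights and offset below are made-up illustrative values, not fitted ones.

def decision_function(x, w, b):
    """Signed score w.x + b; its sign says which side of the hyperplane x lies on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(x, w, b):
    """Return +1 for points on the non-negative side of the hyperplane, else -1."""
    return 1 if decision_function(x, w, b) >= 0 else -1

w = [2.0, -1.0]  # hypothetical normal vector of the hyperplane
b = -1.0         # hypothetical offset

points = [[2.0, 1.0], [0.0, 1.0], [1.0, 1.0]]
scores = [decision_function(p, w, b) for p in points]
labels = [predict(p, w, b) for p in points]
```

The magnitude of each score grows with the point's distance from the hyperplane, which is why a larger margin is usually read as a more confident separation.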



**Naive Bayes (NB):**

Naive Bayes is a probabilistic algorithm based on Bayes' theorem. It assumes that features are conditionally independent given the class, which reduces the joint likelihood to a product of per-feature likelihoods. NB is commonly used in natural language processing tasks like spam detection and sentiment analysis. Its simplicity and efficiency make it suitable for large datasets.
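The independence assumption can be sketched in a few lines: the posterior for each class is proportional to the prior times the product of per-feature likelihoods. The example below uses binary features with hypothetical priors and likelihoods for simplicity; the notebook itself uses `GaussianNB`, which models each numeric feature with a per-class Gaussian instead.

```python
def naive_bayes_posterior(x, priors, likelihoods):
    """Posterior P(class | x) for binary features under the independence assumption.

    likelihoods[c][j] is P(feature j = 1 | class c).
    """
    joint = {}
    for c, prior in priors.items():
        p = prior
        for j, value in enumerate(x):
            p_j = likelihoods[c][j]
            # Multiply in P(feature j = value | class c).
            p *= p_j if value == 1 else (1.0 - p_j)
        joint[c] = p
    total = sum(joint.values())
    # Normalize so the posteriors sum to 1.
    return {c: p / total for c, p in joint.items()}

# Hypothetical numbers, chosen only to illustrate the arithmetic.
priors = {"YES": 0.5, "NO": 0.5}
likelihoods = {"YES": [0.8, 0.6], "NO": [0.2, 0.4]}
posterior = naive_bayes_posterior([1, 1], priors, likelihoods)
```

With these numbers the unnormalized scores are 0.5 × 0.8 × 0.6 = 0.24 for YES and 0.5 × 0.2 × 0.4 = 0.04 for NO, so the YES posterior is 0.24 / 0.28.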

**Decision Trees (DT):**

Decision Trees are versatile and interpretable algorithms that use a tree-like model of decisions. Each internal node tests a single feature, each branch corresponds to an outcome of that test, and each leaf node holds the output label. DTs are used in fields such as finance for credit scoring, in healthcare for diagnosis, and in business for decision-making processes.
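The node-and-leaf structure described above can be sketched as a nested dictionary with a short traversal loop. The tree below is written by hand: the feature names `feature_a` and `feature_b` and both thresholds are hypothetical, and nothing here is learned from the lung cancer data.

```python
# Hypothetical hand-written tree; feature names and thresholds are illustrative only.
tree = {
    "feature": "feature_a", "threshold": 0.5,
    "left": {"label": "NO"},
    "right": {
        "feature": "feature_b", "threshold": 50,
        "left": {"label": "NO"},
        "right": {"label": "YES"},
    },
}

def predict(node, sample):
    """Walk from the root to a leaf, going left when the feature value is <= threshold."""
    while "label" not in node:
        branch = "left" if sample[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]

prediction = predict(tree, {"feature_a": 1, "feature_b": 65})
```

Each prediction follows exactly one root-to-leaf path, which is what makes a tree's decision easy to explain: the path itself is the rule that was applied.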

**k-Nearest Neighbors (KNN):**

KNN is a non-parametric algorithm that classifies a data point by the majority class among its k nearest neighbors in the training set. It is commonly used in pattern recognition and recommendation systems. KNN is suitable for both classification and regression tasks and does not make strong assumptions about the underlying data distribution.
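A minimal sketch of the majority-vote rule, assuming Euclidean distance and k = 3 (the same k the notebook passes to `KNeighborsClassifier`). The training points and labels are made-up toy data.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(query, points, labels, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training indices by distance to the query and keep the k closest.
    order = sorted(range(len(points)), key=lambda i: euclidean(query, points[i]))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters (illustrative only).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = ["NO", "NO", "NO", "YES", "YES", "YES"]

near_origin = knn_predict((0.5, 0.5), points, labels)
near_far_cluster = knn_predict((5.5, 5.5), points, labels)
```

Because the vote depends on raw distances, features on larger numeric scales dominate the result, so scaling the features before fitting is usually worth considering.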

**Artificial Neural Networks (ANN):**

ANN is a family of models inspired by the structure and function of the human brain. It consists of interconnected nodes (neurons) organized into layers, and networks with many hidden layers form the basis of deep learning. ANN is highly effective in capturing complex, non-linear relationships and is widely used in image recognition, natural language processing, and speech recognition.
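The layered structure can be sketched as a forward pass through one hidden layer. The example below assumes ReLU hidden units and a single sigmoid output, with two hidden units and hand-picked hypothetical weights; the notebook's `MLPClassifier` uses one hidden layer of 100 units whose weights are learned during training.

```python
import math

def relu(z):
    """Rectified linear unit: passes positive values, zeroes out negatives."""
    return max(0.0, z)

def sigmoid(z):
    """Squash a real-valued score into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, w2, b2):
    """One hidden ReLU layer followed by a single sigmoid output unit."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)

# Hypothetical weights: the two hidden units compute max(0, x0 - x1) and
# max(0, x1 - x0), so the output depends on |x0 - x1|, a non-linear function.
W1 = [[1.0, -1.0], [-1.0, 1.0]]
b1 = [0.0, 0.0]
w2 = [1.0, 1.0]
b2 = -1.0

p_equal = forward([1.0, 1.0], W1, b1, w2, b2)     # |x0 - x1| = 0
p_boundary = forward([1.0, 0.0], W1, b1, w2, b2)  # |x0 - x1| = 1
p_apart = forward([2.0, 0.0], W1, b1, w2, b2)     # |x0 - x1| = 2
```

No single hyperplane in the input space reproduces this output, which is the sense in which the hidden layer lets the network capture non-linear relationships.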

In [None]:
# Initialize classifiers
svm_classifier = SVC(kernel='linear', random_state=42)
nb_classifier = GaussianNB()
dt_classifier = DecisionTreeClassifier(random_state=42)
knn_classifier = KNeighborsClassifier(n_neighbors=3)
ann_classifier = MLPClassifier(hidden_layer_sizes=(100,), max_iter=500, random_state=42)

classifiers = {
    'SVM': svm_classifier,
    'Naive Bayes': nb_classifier,
    'Decision Tree': dt_classifier,
    'KNN': knn_classifier,
    'ANN': ann_classifier
}

# Train and evaluate each classifier
for clf_name, clf in classifiers.items():
    # Train the model
    clf.fit(X_train, y_train)

    # Make predictions on the test set
    y_pred = clf.predict(X_test)

    # Evaluate the model
    accuracy = accuracy_score(y_test, y_pred)
    conf_matrix = confusion_matrix(y_test, y_pred)
    classification_rep = classification_report(y_test, y_pred)

    print(f"\nResults for {clf_name}:")
    print(f"Accuracy: {accuracy:.2f}")
    print("Confusion Matrix:")
    print(conf_matrix)
    print("Classification Report:")
    print(classification_rep)



Results for SVM:
Accuracy: 0.97
Confusion Matrix:
[[ 1  1]
 [ 1 59]]
Classification Report:
              precision    recall  f1-score   support

          NO       0.50      0.50      0.50         2
         YES       0.98      0.98      0.98        60

    accuracy                           0.97        62
   macro avg       0.74      0.74      0.74        62
weighted avg       0.97      0.97      0.97        62


Results for Naive Bayes:
Accuracy: 0.95
Confusion Matrix:
[[ 1  1]
 [ 2 58]]
Classification Report:
              precision    recall  f1-score   support

          NO       0.33      0.50      0.40         2
         YES       0.98      0.97      0.97        60

    accuracy                           0.95        62
   macro avg       0.66      0.73      0.69        62
weighted avg       0.96      0.95      0.96        62


Results for Decision Tree:
Accuracy: 0.97
Confusion Matrix:
[[ 1  1]
 [ 1 59]]
Classification Report:
              precision    recall  f1-score   support



**Results:**

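The five classifiers reached similar accuracy on the 62-sample test set: SVM 97%, NB 95%, DT 97%, KNN 94%, and ANN 97%. SVM, DT, and ANN therefore tie for the highest accuracy rather than any one of them outperforming the others. Accuracy alone is misleading here because the test set holds only 2 NO cases against 60 YES cases. On the NO class, SVM reaches an F1 score of 0.50 and NB only 0.40, so the per-class precision, recall, and F1 scores give a more informative picture than overall accuracy.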
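The per-class metrics can be recomputed by hand from a confusion matrix, which helps when checking how much the majority class drives the accuracy figure. The sketch below uses the Naive Bayes confusion matrix printed above (rows are true labels, columns are predictions, in the order NO, YES) and also computes the accuracy of a trivial model that always predicts the majority class. The helper name `per_class_metrics` is an illustrative choice, not part of scikit-learn.

```python
def per_class_metrics(conf_matrix, class_index):
    """Precision, recall and F1 for one class; rows are true labels, columns are predictions."""
    tp = conf_matrix[class_index][class_index]
    predicted = sum(row[class_index] for row in conf_matrix)  # column sum
    actual = sum(conf_matrix[class_index])                    # row sum
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Naive Bayes confusion matrix from the output above (class order: NO, YES).
nb_matrix = [[1, 1], [2, 58]]
total = sum(sum(row) for row in nb_matrix)
accuracy = sum(nb_matrix[i][i] for i in range(len(nb_matrix))) / total
no_precision, no_recall, no_f1 = per_class_metrics(nb_matrix, 0)

# Accuracy of a trivial model that always predicts the majority class (YES).
majority_baseline = max(sum(row) for row in nb_matrix) / total
```

On this test split the always-YES baseline is correct on 60 of 62 samples, which rounds to the same 97% reported for the best classifiers, so the minority-class metrics are the more discriminating comparison.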

**Conclusion:**

In conclusion, this study compares five supervised learning algorithms on a lung cancer survey dataset. SVM, DT, and ANN achieved the highest accuracy (97%), but the test set is small and heavily imbalanced, so the differences between models should be read with caution. The most appropriate algorithm depends on the specific characteristics of the data and the goals of the classification task.