
In [None]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris, load_wine
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# Load datasets
iris = load_iris()
wine = load_wine()

# Split data into training and testing sets (70% training, 30% testing)
x_train_iris, x_test_iris, y_train_iris, y_test_iris = train_test_split(iris.data, iris.target, test_size=0.3, random_state=42)
x_train_wine, x_test_wine, y_train_wine, y_test_wine = train_test_split(wine.data, wine.target, test_size=0.3, random_state=42)

# Create classifiers
dt_clf = DecisionTreeClassifier()
svc_clf = SVC()
knn_clf = KNeighborsClassifier()

# Put all classifiers in a list
classifiers = [dt_clf, svc_clf, knn_clf]
classifier_names = ["Decision Tree", "Support Vector Machine", "K-Nearest Neighbors"]

# Function to train and evaluate classifiers
def evaluate_classifiers(x_train, x_test, y_train, y_test, dataset_name):
    for clf, name in zip(classifiers, classifier_names):
        print(f"Evaluating {name} on {dataset_name} dataset")
        clf.fit(x_train, y_train)
        y_pred_train = clf.predict(x_train)
        y_pred_test = clf.predict(x_test)

        # Calculate metrics
        train_accuracy = accuracy_score(y_train, y_pred_train)
        test_accuracy = accuracy_score(y_test, y_pred_test)

        print(f"Training Accuracy: {train_accuracy:.4f}")
        print(f"Testing Accuracy: {test_accuracy:.4f}")
        print("Classification Report on Test Data:")
        print(classification_report(y_test, y_pred_test))
        print("------------")

# Evaluate on Iris dataset
evaluate_classifiers(x_train_iris, x_test_iris, y_train_iris, y_test_iris, "Iris")

# Evaluate on Wine dataset
evaluate_classifiers(x_train_wine, x_test_wine, y_train_wine, y_test_wine, "Wine")



Evaluating Decision Tree on Iris dataset
Training Accuracy: 1.0000
Testing Accuracy: 1.0000
Classification Report on Test Data:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00        13

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45

------------
Evaluating Support Vector Machine on Iris dataset
Training Accuracy: 0.9619
Testing Accuracy: 1.0000
Classification Report on Test Data:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00        13

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg   

**Report:**

**Datasets Used:**

**1. Iris Dataset:** This dataset contains 150 samples of iris flowers, classified into three species: Setosa, Versicolour, and Virginica. The features are sepal length, sepal width, petal length, and petal width.

**2. Wine Dataset:** This dataset contains 178 samples of wine, classified into three classes. The features are chemical properties such as alcohol content, malic acid, ash, alkalinity of ash, and more.

**Classifiers Used:**

1. Decision Tree Classifier

2. Support Vector Machine (SVM)

3. K-Nearest Neighbors (KNN)

**Data Splitting:**

Both datasets were split into training and testing sets with a ratio of 70% training and 30% testing using the train_test_split function. A fixed random seed (42) was used to ensure reproducibility.
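The set sizes implied by this split can be checked with a little arithmetic. The sketch below is illustrative only: `split_sizes` is a hypothetical helper, and it assumes the test share is rounded up, which agrees with the support of 45 shown in the Iris classification report.

```python
import math

def split_sizes(n_samples, test_size=0.3):
    """Return (n_train, n_test), assuming the test share is rounded up."""
    n_test = math.ceil(n_samples * test_size)
    return n_samples - n_test, n_test

iris_sizes = split_sizes(150)  # Iris has 150 samples
wine_sizes = split_sizes(178)  # Wine has 178 samples

print("Iris (train, test):", iris_sizes)
print("Wine (train, test):", wine_sizes)
```

With test sets this small, a single misclassified sample moves the test accuracy by roughly two percentage points, which is worth keeping in mind when comparing the scores below.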

**Evaluation and Results:**

**1. Iris Dataset:**

**Decision Tree Classifier:**

*   Training Accuracy: 1.0000
*   Testing Accuracy: 1.0000
*   Observation: The Decision Tree achieves perfect accuracy on both the training and test sets. There is no drop between the two, so this split shows no evidence of overfitting.



**Support Vector Machine:**


*   Training Accuracy: 0.9619
*   Testing Accuracy: 1.0000
*   Observation: The SVM model has slightly lower accuracy on the training set but achieves perfect accuracy on the test set. This indicates that the SVM generalizes well without overfitting, despite not perfectly fitting the training data.


**K-Nearest Neighbors:**
*   Training Accuracy: 0.9524
*   Testing Accuracy: 1.0000
*   Observation: KNN also achieves perfect accuracy on the test set, while the training accuracy is slightly lower. This suggests that KNN is a good fit for the Iris dataset, balancing between fitting the training data and generalizing to unseen data.
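To put the Iris training accuracies in perspective, they can be converted back into counts of misclassified samples. This is a rough sketch: `misclassified` is a hypothetical helper, and the set sizes (105 training and 45 test samples) are assumed from the 70/30 split of the 150 Iris samples.

```python
# Set sizes assumed from the 70/30 split of the 150 Iris samples.
n_train, n_test = 105, 45

# Training accuracies as reported above.
reported_train_accuracy = {
    "Decision Tree": 1.0000,
    "Support Vector Machine": 0.9619,
    "K-Nearest Neighbors": 0.9524,
}

def misclassified(accuracy, n_samples):
    """Convert an accuracy back into a count of misclassified samples."""
    return n_samples - round(accuracy * n_samples)

train_errors = {
    name: misclassified(acc, n_train)
    for name, acc in reported_train_accuracy.items()
}

# Change in test accuracy caused by a single test sample.
one_sample_step = 1 / n_test

print(train_errors)
print(f"One test sample is worth {one_sample_step:.4f} accuracy")
```

The differences between the three classifiers on the training set come down to a handful of samples, so the ranking on this dataset should be read with caution.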

**2. Wine Dataset:**

**Decision Tree Classifier:**

*   Training Accuracy: 1.0000
*   Testing Accuracy: 0.9630
*   Observation: The Decision Tree classifier perfectly fits the training data but shows a slight decrease in accuracy on the test set. This suggests that the model is overfitting to some extent, as it captures the training data very well but loses some generalization ability.


**Support Vector Machine:**
*   Training Accuracy: 0.6694
*   Testing Accuracy: 0.7593
*   Observation: The SVM shows only moderate performance, with low accuracy on both the training and test sets, which is the signature of underfitting. As configured, the model does not capture the patterns in the Wine dataset. A likely contributor is that the features were not scaled: the default RBF kernel is distance-based, and the Wine features span very different numeric ranges.


**K-Nearest Neighbors:**
*   Training Accuracy: 0.7742
*   Testing Accuracy: 0.7407
*   Observation: KNN scores higher than SVM on the training data but slightly lower on the test data (0.7407 vs. 0.7593). Both accuracies are low, which suggests underfitting: as configured, the model does not capture the structure of the Wine dataset. The small drop from training to test accuracy is secondary by comparison.
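SVM and KNN both rely on distances between samples, so they are sensitive to the scale of the features. One possible way to test whether scale explains their lower Wine scores is to standardize the features before fitting. The sketch below is an assumption rather than part of the original experiment: `StandardScaler` and `make_pipeline` were not used above, and whether scaling closes the gap has to be confirmed by running the code.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

wine = load_wine()

# Same split as in the experiment above (70/30, seed 42).
x_train, x_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=42
)

models = {
    "Support Vector Machine": SVC(),
    "K-Nearest Neighbors": KNeighborsClassifier(),
}

scaled_scores = {}
for name, model in models.items():
    # The scaler is fitted on the training data only, inside the pipeline,
    # so no information from the test set leaks into preprocessing.
    pipeline = make_pipeline(StandardScaler(), model)
    pipeline.fit(x_train, y_train)
    scaled_scores[name] = pipeline.score(x_test, y_test)
    print(f"{name} with scaling: test accuracy {scaled_scores[name]:.4f}")
```

Wrapping the scaler and the classifier in one pipeline keeps the comparison fair, because the test set is transformed using statistics learned from the training set alone.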


**Conclusions:**

**1. Iris Dataset:**

All three classifiers perform exceptionally well, with perfect accuracy on the test set. This suggests that the Iris dataset is relatively simple and that all models capture the relationships between features and classes effectively. Perfect training accuracy is expected from an unconstrained Decision Tree and is not evidence of overfitting on its own; because the test accuracy is also perfect, this split shows no overfitting. The test set contains only 45 samples, however, so the result should be interpreted with some caution.

**2. Wine Dataset:**

The Decision Tree classifier shows mild signs of overfitting, as evidenced by the perfect training accuracy but slightly lower test accuracy. Despite this, it performs better than the other models.
SVM and KNN both show signs of underfitting, with low accuracy on the training and test sets alike, indicating that these models need more tuning or a different approach to handle this data.

**General Insight:**

The results highlight the importance of choosing the right model for a given dataset. While simpler datasets like Iris can be modeled effectively by various classifiers, more complex datasets like Wine may require careful tuning and model selection to avoid overfitting or underfitting. Regularization, pruning, and cross-validation are potential strategies to improve model performance, particularly in cases where overfitting or underfitting is observed.
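As an illustration of two of these strategies, the sketch below combines a depth-limited Decision Tree (a simple form of pre-pruning) with 5-fold cross-validation on the Wine dataset. The values `max_depth=3` and `cv=5` are arbitrary choices made for this example, not settings from the experiment above.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

wine = load_wine()

# max_depth=3 is an arbitrary illustrative limit on tree growth.
tree = DecisionTreeClassifier(max_depth=3, random_state=42)

# With an integer cv and a classifier, the folds are stratified by class.
cv_scores = cross_val_score(tree, wine.data, wine.target, cv=5)

print(f"Fold accuracies: {cv_scores.round(4)}")
print(f"Mean: {cv_scores.mean():.4f}, std: {cv_scores.std():.4f}")
```

Averaging over several folds gives a steadier estimate than a single 70/30 split, and the spread across folds indicates how much a reported accuracy depends on which samples land in the test set.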