## Comparison of Classification Algorithms

In machine learning, classification means training a model to predict which category an observation belongs to.

For this task, first choose a classification problem and identify the classification algorithms that may suit it. Then train a model with each algorithm and compare their performance.

How well each classification algorithm performs depends on the problem you are working on. So let’s start this task by importing the necessary Python libraries, a classification dataset, and some popular classification algorithms:

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import BernoulliNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import classification_report

data = pd.read_csv("https://raw.githubusercontent.com/amankharwal/Website-data/master/social.csv")
print(data.head())

   Age  EstimatedSalary  Purchased
0   19            19000          0
1   35            20000          0
2   26            43000          0
3   27            57000          0
4   19            76000          0
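
The two feature columns printed above sit on very different scales (ages in the tens, salaries in the tens of thousands). Distance-based and linear models are often sensitive to this, so one option is to standardize the features first. The sketch below only illustrates the idea: the `demo` DataFrame is synthetic stand-in data that I made up with the same column names, and I have not tested whether scaling changes the comparison on the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same column names as the dataset (values are made up).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Age": rng.integers(18, 61, size=400),
    "EstimatedSalary": rng.integers(15000, 150001, size=400),
})

# Standardize each column to zero mean and unit variance.
scaled = StandardScaler().fit_transform(demo[["Age", "EstimatedSalary"]])

# The raw spreads differ by orders of magnitude; after scaling they match.
print(demo.std().round(1))
print(scaled.mean(axis=0).round(6), scaled.std(axis=0).round(6))
```

In a real comparison, the scaler would normally be fitted on the training rows only, for example by combining it with a classifier through scikit-learn's `make_pipeline`.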


Let’s move on to comparing the performance of the classification algorithms. You can use a single performance evaluation metric or several; the process stays the same. The code below uses accuracy, which is what the score method of a scikit-learn classifier returns:

In [5]:
x = np.array(data[["Age", "EstimatedSalary"]])
y = np.array(data["Purchased"])  # 1-D label array; a column vector triggers a DataConversionWarning in fit()

xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.10, random_state=42)
decisiontree = DecisionTreeClassifier()
logisticregression = LogisticRegression()
knearestclassifier = KNeighborsClassifier()
bernoulli_naiveBayes = BernoulliNB()  # created here but not fitted or scored below
passiveAggressive = PassiveAggressiveClassifier()


knearestclassifier.fit(xtrain, ytrain)
decisiontree.fit(xtrain, ytrain)
logisticregression.fit(xtrain, ytrain)
passiveAggressive.fit(xtrain, ytrain)

# score() returns mean accuracy; note that x and y are the full dataset, training rows included
data1 = {"Classification Algorithms": ["KNN Classifier", "Decision Tree Classifier",
                                       "Logistic Regression", "Passive Aggressive Classifier"],
         "Score": [knearestclassifier.score(x, y), decisiontree.score(x, y),
                   logisticregression.score(x, y), passiveAggressive.score(x, y)]}
score = pd.DataFrame(data1)
score



       Classification Algorithms   Score
0                 KNN Classifier  0.8750
1       Decision Tree Classifier  0.9800
2            Logistic Regression  0.6425
3  Passive Aggressive Classifier  0.6425


In the above code:

- I first split the data into training and test sets, holding out 10% of the rows for testing;
- Then I created an instance of each scikit-learn classifier and stored it in its own variable;
- Then I used the fit method to train four of the models on the training set (the Bernoulli Naive Bayes model is created but never fitted or scored);
- Finally, I created a DataFrame that stores each model’s accuracy, computed with the score method on the full dataset.
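
The steps above can be sketched more compactly as a loop over a dictionary of models. This is a minimal sketch under assumptions, not the code that produced the table: the data is a synthetic stand-in with a made-up labelling rule, only three of the models are included, and the names `models` and `rows` are my own. It also scores on the held-out test set and reports two metrics (accuracy and F1) instead of one, to show that the process stays the same.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the Age / EstimatedSalary / Purchased data (made up).
rng = np.random.default_rng(0)
n = 400
age = rng.integers(18, 61, size=n)
salary = rng.integers(15000, 150001, size=n)
purchased = ((age > 42) | (salary > 90000)).astype(int)  # hypothetical labelling rule

x = np.column_stack([age, salary])
y = purchased  # 1-D label array

xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.10, random_state=42)

# Each entry maps a display name to an unfitted classifier.
models = {
    "KNN Classifier": KNeighborsClassifier(),
    "Decision Tree Classifier": DecisionTreeClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

rows = []
for name, model in models.items():
    model.fit(xtrain, ytrain)            # train on the training rows only
    predictions = model.predict(xtest)   # evaluate on rows the model has not seen
    rows.append({
        "Classification Algorithms": name,
        "Accuracy": accuracy_score(ytest, predictions),
        "F1": f1_score(ytest, predictions),
    })

score = pd.DataFrame(rows)
print(score)
```

Adding another algorithm to the comparison then only requires one more entry in `models`.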

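According to the above output, the Decision Tree classification algorithm has the highest score on this dataset. Keep in mind that the scores are computed on the full dataset, 90% of which the models were trained on. This favours any model that can memorize its training data, as an unpruned decision tree can, so scoring on the held-out test set (xtest, ytest) gives a fairer comparison.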
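
To see why the evaluation set matters for a decision tree, the sketch below compares its accuracy on the training rows, the test rows, and the full dataset, and adds a 5-fold cross-validation estimate. It runs on synthetic stand-in data with a made-up labelling rule and 15% label noise (my assumptions), so the numbers it prints say nothing about the real dataset.

```python
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (made up): two features and noisy binary labels.
rng = np.random.default_rng(0)
n = 400
x = np.column_stack([
    rng.integers(18, 61, size=n),          # stand-in for Age
    rng.integers(15000, 150001, size=n),   # stand-in for EstimatedSalary
])
clean = ((x[:, 0] > 42) | (x[:, 1] > 90000)).astype(int)  # hypothetical rule
flip = rng.random(n) < 0.15                               # flip about 15% of labels
y = np.where(flip, 1 - clean, clean)

xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.10, random_state=42)
tree = DecisionTreeClassifier(random_state=42).fit(xtrain, ytrain)

train_accuracy = tree.score(xtrain, ytrain)  # rows the tree was fitted on
test_accuracy = tree.score(xtest, ytest)     # rows the tree has never seen
full_accuracy = tree.score(x, y)             # mix of both, dominated by training rows

# Cross-validation refits the tree on 5 different splits and scores each held-out fold.
cv_scores = cross_val_score(DecisionTreeClassifier(random_state=42), x, y, cv=5)

print(f"train: {train_accuracy:.3f}  test: {test_accuracy:.3f}  full: {full_accuracy:.3f}")
print(f"5-fold CV mean: {cv_scores.mean():.3f}")
```

The full-dataset score is a weighted average of the training and test scores, so with a 90/10 split it mostly reflects how well the tree fits rows it has already seen.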

## Summary

So this is how you can compare classification algorithms in machine learning using Python.