

## 4 Classification Algorithms
### We used 4 classification algorithms


* SGD Classifier
* Random Forest Classifier
* XGB Classifier
* KNeighbors Classifier


<img src="https://storage.googleapis.com/kaggle-competitions/kaggle/26479/logos/thumb76_76.png?t=2021-04-09-00-56-24" width="800px">

### Data Description

The dataset used for this competition is synthetic, but it is based on a real dataset and was generated using a CTGAN. The original dataset deals with predicting the category of an eCommerce product given various attributes of the listing. Although the features are anonymized, they have properties relating to real-world features.


### Files
* train.csv - the training data, one product (id) per row, with the associated features (feature_*) and class label (target)
* test.csv - the test data; you must predict the probability the id belongs to each class
* sample_submission.csv - a sample submission file in the correct format
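The submission pairs each test `id` with one probability column per class. A minimal sketch of that layout, assuming the columns are `id` and `Class_1` to `Class_4` (as in `sample_submission.csv`) and using made-up ids and probabilities in place of real model output:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: three test ids and a 3x4 probability matrix
# (rows = products, columns = Class_1..Class_4), shaped like predict_proba output.
ids = np.array([100000, 100001, 100002])
proba = np.array([
    [0.10, 0.60, 0.20, 0.10],
    [0.25, 0.25, 0.25, 0.25],
    [0.05, 0.15, 0.70, 0.10],
])

class_cols = ['Class_1', 'Class_2', 'Class_3', 'Class_4']
submission = pd.DataFrame(proba, columns=class_cols)
submission.insert(0, 'id', ids)  # id goes first, matching the sample file layout

# Each row should be a probability distribution over the four classes.
assert np.allclose(submission[class_cols].sum(axis=1), 1.0)
print(submission)
```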



#### Dataset Link


##### [Here](https://www.kaggle.com/c/tabular-playground-series-may-2021/code)



In [None]:
import numpy as np
import pandas as pd
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split

import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns


In [None]:
data_train = pd.read_csv("/kaggle/input/tabular-playground-series-may-2021/train.csv")
data_test = pd.read_csv("/kaggle/input/tabular-playground-series-may-2021/test.csv")
data_sub = pd.read_csv("/kaggle/input/tabular-playground-series-may-2021/sample_submission.csv")

In [None]:
display(data_train.head())
display(data_test.head())
display(data_sub.head())

In [None]:
# info() prints its summary itself and returns None, so it is not wrapped in display()
data_train.info()
data_test.info()
data_sub.info()

In [None]:
# Drop the id column: it only identifies rows and is not a feature
data_train = data_train.drop(['id'], axis=1)
data_test = data_test.drop(['id'], axis=1)

In [None]:
data_train['target'].value_counts()

In [None]:
plt.figure(figsize=(10,8))
sns.countplot(x='target', data=data_train,
              linewidth=5,
              edgecolor=sns.color_palette("dark", 4), palette="Set3")

In [None]:
data_train['newtarget'] = data_train['target'].map({'Class_1':0,
                                                  'Class_2':1,
                                                  'Class_3':2, 
                                                  'Class_4':3})
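The mapping above is typed by hand. A sketch of deriving it from the class names instead, together with the inverse mapping needed to turn predicted codes back into class names; the sample `target` values are hypothetical:

```python
import pandas as pd

# Hypothetical sample of the target column.
target = pd.Series(['Class_3', 'Class_1', 'Class_4', 'Class_2', 'Class_1'])

# Build the mapping from the sorted class names instead of typing it by hand.
# Note: plain string sorting works for Class_1..Class_4 but would put 'Class_10' before 'Class_2'.
classes = sorted(target.unique())
to_code = {name: code for code, name in enumerate(classes)}
to_name = {code: name for name, code in to_code.items()}  # inverse, for decoding predictions

codes = target.map(to_code)
decoded = codes.map(to_name)
print(to_code)
```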

In [None]:
data_train.head()

In [None]:
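plt.figure(figsize=(18,25))
# Plot the features only: 'target' is a string and 'newtarget' is its encoded copy
sns.boxplot(data=data_train.drop(columns=['target', 'newtarget']), orient="h", palette="Set3");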


In [None]:
plt.figure(figsize=(18,25))
# id was already dropped, so every remaining column is a feature (iloc[:,1:] would skip feature_0)
sns.boxplot(data=data_test, orient="h", palette="Set3");

In [None]:
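# numeric_only=True skips the string 'target' column (pandas >= 2.0 raises an error otherwise)
data_train.corr(numeric_only=True)['newtarget']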
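Pearson correlation with `newtarget` treats the class codes 0-3 as ordered numbers, but the classes are nominal, so the result depends on which code each class happens to receive. A small sketch with hypothetical values shows this, along with per-class feature means as an alternative that does not depend on the encoding:

```python
import pandas as pd

# Hypothetical toy data: one feature and a 4-class target, two rows per class.
df = pd.DataFrame({
    'feature_0': [0, 0, 5, 5, 1, 1, 2, 2],
    'target': ['Class_1', 'Class_1', 'Class_2', 'Class_2',
               'Class_3', 'Class_3', 'Class_4', 'Class_4'],
})

# Two equally valid integer encodings of the same classes.
codes_a = df['target'].map({'Class_1': 0, 'Class_2': 1, 'Class_3': 2, 'Class_4': 3})
codes_b = df['target'].map({'Class_1': 0, 'Class_2': 3, 'Class_3': 2, 'Class_4': 1})

corr_a = df['feature_0'].corr(codes_a)
corr_b = df['feature_0'].corr(codes_b)
print(f'corr with encoding A: {corr_a:.3f}, with encoding B: {corr_b:.3f}')

# Per-class means do not depend on how the classes are numbered.
means = df.groupby('target')['feature_0'].mean()
print(means)
```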

In [None]:
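# Independent variables (features): every column except the two target columns
X = data_train.drop(columns=['target', 'newtarget'])

# Dependent variable (encoded class label)
y = data_train['newtarget']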

In [None]:
# Split the data into training and testing sets with a 70:30 ratio (30% held out for testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.70, random_state=1)
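If the classes are imbalanced (the `value_counts()` output earlier shows how much), a stratified split keeps the class proportions the same in the training and testing parts. A sketch of the same 70:30 split with `stratify=y`, on hypothetical labels with a deliberately uneven class mix:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 60% class 1, 25% class 0, 10% class 2, 5% class 3.
y = pd.Series([1] * 600 + [0] * 250 + [2] * 100 + [3] * 50)
X = pd.DataFrame({'feature_0': np.arange(len(y))})

# stratify=y keeps the class proportions identical (up to rounding) in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.70, random_state=1, stratify=y)

print(y_train.value_counts(normalize=True).sort_index())
print(y_test.value_counts(normalize=True).sort_index())
```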

In [None]:
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, accuracy_score
model_1 = make_pipeline(StandardScaler(), SGDClassifier())

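# Fit on the training split only: X_test is a subset of X, so fitting on X would leak it into training
print(model_1.fit(X_train, y_train))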

print(model_1.score(X_test,y_test))


y_pred = model_1.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
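Fitting on the full `X, y` and then scoring on `X_test` measures accuracy on rows the model has already seen, because `X_test` is a subset of `X`. A sketch on purely random, hypothetical data with a 1-nearest-neighbour classifier shows how far that can inflate the score: the labels carry no signal, yet the leaked score is perfect.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Hypothetical data: 1000 rows of noise with 4 random class labels (no real signal).
X = rng.normal(size=(1000, 5))
y = rng.integers(0, 4, size=1000)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.70, random_state=1)

# Leaky: the model is fit on every row, including the ones in X_test,
# so each test row's nearest neighbour is the row itself.
leaky = KNeighborsClassifier(n_neighbors=1).fit(X, y)
leaked_score = leaky.score(X_test, y_test)

# Honest: the model only sees the training split.
honest = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
honest_score = honest.score(X_test, y_test)

print(f'leaked: {leaked_score:.3f}  honest: {honest_score:.3f}')
```

The honest score should land near chance (about 0.25 for four classes), which is what a model that learned nothing deserves.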

In [None]:
from sklearn.ensemble import RandomForestClassifier
model_2 = make_pipeline(StandardScaler(), RandomForestClassifier())

print(model_2.fit(X_train, y_train))

print(model_2.score(X_test,y_test))

y_pred = model_2.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
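Tree-based models such as Random Forest split on thresholds of individual features, so their splits depend only on the ordering of each feature's values; the `StandardScaler` step is harmless but not needed here. As a built-in alternative to a separate holdout, scikit-learn's `RandomForestClassifier(oob_score=True)` scores each training row using only the trees whose bootstrap sample left that row out. A minimal sketch on hypothetical `make_classification` data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical 4-class data standing in for the anonymized features.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           n_classes=4, random_state=0)

# oob_score=True predicts each training row with only the trees that did not
# see it in their bootstrap sample, giving a held-out accuracy estimate.
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)
print(f'OOB accuracy: {rf.oob_score_:.3f}')

# predict_proba returns one column per class, in the order of rf.classes_.
proba = rf.predict_proba(X[:5])
print(rf.classes_, proba.shape)
```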

In [None]:
from sklearn import neighbors
model_3 = make_pipeline(StandardScaler(), neighbors.KNeighborsClassifier())

print(model_3.fit(X_train, y_train))

print('Model score:', model_3.score(X_test, y_test))

y_pred = model_3.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
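`KNeighborsClassifier()` uses its default of `n_neighbors=5`, and the number of neighbours is usually worth tuning. A sketch using `GridSearchCV` over the same `make_pipeline(StandardScaler(), ...)` structure, on hypothetical data and with an arbitrary candidate grid:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical 4-class data.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_classes=4, random_state=0)

# make_pipeline names each step after its lowercased class name,
# so k is addressed as 'kneighborsclassifier__n_neighbors'.
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
grid = GridSearchCV(pipe, {'kneighborsclassifier__n_neighbors': [5, 15, 45]}, cv=5)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```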

In [None]:
import xgboost as xgb
model_4 = make_pipeline(StandardScaler(),xgb.XGBClassifier())

print(model_4.fit(X_train, y_train))

print('Model score:', model_4.score(X_test, y_test))

y_pred = model_4.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)

In [None]:
models = pd.DataFrame({
    'Model': ['SGDClassifier','Random Forest Classifier',
              'K Neighbors Classifier', 'XGB Classifier'],

    'Score': [model_1.score(X_test,y_test)*100,
              model_2.score(X_test,y_test)*100,
              model_3.score(X_test,y_test)*100, 
              model_4.score(X_test,y_test)*100]})

In [None]:
# Highest accuracy first
models.sort_values(by='Score', ascending=False)
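The comparison table ranks models on a single 70:30 split, and one split can favour a model by chance. A sketch of the same kind of comparison with 5-fold `cross_val_score`, using hypothetical data and two of the four models (XGBoost is left out only to keep the example to scikit-learn):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical 4-class data.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_classes=4, random_state=0)

candidates = {
    'SGDClassifier': make_pipeline(StandardScaler(), SGDClassifier(random_state=0)),
    'Random Forest Classifier': RandomForestClassifier(n_estimators=100, random_state=0),
}

# 5-fold cross-validation: every row is used for testing exactly once.
rows = []
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)  # accuracy by default for classifiers
    rows.append({'Model': name, 'Score': scores.mean() * 100, 'Std': scores.std() * 100})

leaderboard = pd.DataFrame(rows).sort_values(by='Score', ascending=False)
print(leaderboard)
```

The `Std` column gives a rough sense of whether a gap between two models is larger than the fold-to-fold noise.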

In [None]:
%%time

test_pred = model_2.predict(data_test)
print('Prediction for test set:\n{}\nShape = {}'.format(test_pred[:5], test_pred.shape))
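The Files section asks for the probability that each id belongs to each class, whereas `predict` returns only the most likely class. The competition's evaluation metric is, as far as I know, multi-class log loss (worth confirming on the competition's Evaluation page); under that metric a hard label acts like a one-hot probability, and every wrong guess is penalized as a maximally confident mistake. A sketch on hypothetical data comparing the two:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Hypothetical 4-class data and the same 70:30 split as above.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.70, random_state=1)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Probabilities straight from the model (columns follow rf.classes_ = 0..3).
proba_loss = log_loss(y_test, rf.predict_proba(X_test), labels=rf.classes_)

# Hard labels turned into one-hot "probabilities": a wrong guess is maximally confident.
hard = np.eye(len(rf.classes_))[rf.predict(X_test)]
hard_loss = log_loss(y_test, hard, labels=rf.classes_)

print(f'log loss with predict_proba: {proba_loss:.3f}, with one-hot predict: {hard_loss:.3f}')
```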

In [None]:
test_pred

In [None]:
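data_sub[['Class_1', 'Class_2', 'Class_3', 'Class_4']] = model_2.predict_proba(data_test)  # class probabilities; codes 0-3 map to Class_1..Class_4
data_sub.to_csv('submission.csv', index=False)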

In [None]:
data_sub