# Titanic Ship
__________________________________________________________________________________
# Introduction

The RMS Titanic was the biggest moveable man-made object of her day, a colossal presence in the water and the subject of a tragic story that fascinates us to this day. Read on for the key facts about the ship, then explore the site further for more fascinating and moving aspects of her life and loss.

46,328 tons – the internal usable volume of the Titanic (referred to as the ‘gross register tons’ or GRT).


# How long was the Titanic?

The Titanic was 882 feet 9 inches (269.1 metres) long, which made her the world’s largest man-made moving object at the time. By comparison, the source page cites the MS Allure of the Seas, at 1,187 feet (362 metres) long, as the largest passenger vessel afloat.

92 feet – her breadth (28 metres).
175 feet – the height of the Titanic, from the top of the funnels to the keel (53.3 metres).
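
The metric figures quoted above can be double-checked with a small conversion sketch. The helper name `feet_inches_to_metres` is hypothetical and only used for this illustration; the only fact assumed is that one foot is exactly 0.3048 metres.

```python
# Convert the quoted imperial dimensions to metres (1 ft = 0.3048 m exactly).
FEET_TO_METRES = 0.3048

def feet_inches_to_metres(feet, inches=0):
    """Return a length given in feet and inches as metres."""
    return (feet + inches / 12) * FEET_TO_METRES

length_m = feet_inches_to_metres(882, 9)   # overall length
breadth_m = feet_inches_to_metres(92)      # breadth
height_m = feet_inches_to_metres(175)      # funnel top to keel

print(round(length_m, 1), round(breadth_m, 1), round(height_m, 1))
```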

![](https://titanicfacts.net/wp-content/uploads/2018/06/rms-titanic.jpg)

![](https://titanicfacts.net/wp-content/uploads/2018/06/titanic-promenade-deck.jpg)
10,000 – the approximate number of light bulbs on the ship.

3 – the number of engines used to power the ship: 2 outboard reciprocating engines (which could move the ship forward or backwards) and a central steam turbine engine, which ran forward only.

4 – the height, in decks, of the two main engines, which were the largest ever built at the time.

3 – the number of propellers: 2 outboard (3 x 10′ blades) driven by the main reciprocating engines, and a central propeller (4 x 6′ blades) powered by the steam turbine engine.

38 tons – the weight of the two outer propellers, which were made of bronze.

22 tons – the weight of the central propeller, of a moulded construction.

2 – the number of anchors.

15 tons – the weight of each anchor.

15 – the number of transverse water-tight bulkheads.

69 feet – the length of the room in which the reciprocating engines were located.

57 feet – the length of the turbine room.

So thorough are the precautions which have been taken to prevent the ship from sinking in the event of a serious accident that any two compartments may be flooded without endangering the safety of the vessel. -From the Belfast Newsletter report on the launch of Titanic, published Thursday 01 June 1911.

131,428 – the official number of the ship.

1,200 miles – the typical range of the Titanic’s wireless equipment at night.

400 miles – the typical range of the same wireless equipment during daylight hours (due to heavier use of the airwaves by other shipping).

1955 – the year in which Walter Lord (1917 – 2002) published A Night To Remember, his popular non-fiction account of the Titanic tragedy. It was released as a movie in 1958 and is credited with reviving public interest in the story, which had waned over the years.

In the creation of the Titanic myth there were two defining moments: 1912, of course, and 1955. -Steven Biel, cultural historian, on the influence of A Night To Remember

![](https://upload.wikimedia.org/wikipedia/commons/thumb/a/a4/RMS_Titanic_Ad_April_10%2C_1912.jpg/400px-RMS_Titanic_Ad_April_10%2C_1912.jpg)
Reference -https://titanicfacts.net/titanic-ship/

In [None]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler ,LabelEncoder
from sklearn.linear_model import LogisticRegression 
from sklearn import svm 
from sklearn.ensemble import RandomForestClassifier 
from sklearn.neighbors import KNeighborsClassifier 
from sklearn.naive_bayes import GaussianNB 
from sklearn.tree import DecisionTreeClassifier 
from sklearn.model_selection import train_test_split 
from sklearn import metrics 
from sklearn.metrics import confusion_matrix ,classification_report
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline

Reading Data

In [None]:
train=pd.read_csv('../input/titanic/train.csv')
test=pd.read_csv('../input/titanic/test.csv')
train_pass=train.shape[0]

In [None]:
train.head()

# Dataset Information 

| # | Variable | Definition | Key |
|---|---|---|---|
| 1 | Survived | Survival | 0 = No, 1 = Yes |
| 2 | Pclass | Ticket class | 1 = 1st, 2 = 2nd, 3 = 3rd |
| 3 | Age | Age in years | |
| 4 | Sex | Sex | male, female |
| 5 | SibSp | Number of siblings / spouses aboard the Titanic | |
| 6 | Parch | Number of parents / children aboard the Titanic | |
| 7 | Ticket | Ticket number | |
| 8 | Fare | Passenger fare | |
| 9 | Cabin | Cabin number | |
| 10 | Embarked | Port of embarkation | C = Cherbourg, Q = Queenstown, S = Southampton |

Pclass: A proxy for socio-economic status (SES)

1st = Upper

2nd = Middle

3rd = Lower

Age: Age is fractional if less than 1. If the age is estimated, it is in the form xx.5

Sibsp: The dataset defines family relations in this way...

Sibling = brother, sister, stepbrother, stepsister

Spouse = husband, wife (mistresses and fiancés were ignored)

Parch: The dataset defines family relations in this way...

Parent = mother, father

Child = daughter, son, stepdaughter, stepson

Some children travelled only with a nanny, therefore parch=0 for them
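
The Age convention described above (fractional under 1, xx.5 when estimated) can be checked with a short sketch. The ages below are hypothetical values picked for illustration, not rows taken from train.csv, and the flag names `is_infant` and `is_estimated` are assumptions of this example.

```python
import numpy as np
import pandas as pd

# Hypothetical ages chosen for illustration; not rows from train.csv.
ages = pd.Series([0.42, 22.0, 28.5, np.nan, 40.5, 35.0], name='Age')

# Under one year old: the age is stored as a fraction.
is_infant = ages < 1
# One year or older with a .5 fractional part: the age was estimated.
is_estimated = (ages >= 1) & (ages % 1 == 0.5)

summary = pd.DataFrame({'Age': ages, 'infant': is_infant, 'estimated': is_estimated})
print(summary)
```

Missing ages compare as False in both flags, so they are neither counted as infants nor as estimates.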

In [None]:
train.describe()

In [None]:
train.describe(include='O')

In [None]:
train_data=train.copy()

# Initial Questions & Data Investigation
Did Pclass affect the chances of surviving?

How are passengers distributed across Pclass?

Is Pclass related to the passengers' gender?



In [None]:
# Summary table indexed by ticket class (1, 2, 3)
Pclass=pd.DataFrame()
by_class=train_data.groupby('Pclass')
Pclass['Total Passenger']=by_class['Survived'].size()
# Passengers travelling without siblings, spouses, parents or children
solo=train_data.loc[(train_data.SibSp==0)&(train_data.Parch==0),'Pclass'].value_counts()
Pclass['Solo Passenger']=solo  # aligned on the Pclass index
Pclass['With Family']=Pclass['Total Passenger']-Pclass['Solo Passenger']
# Select the column before aggregating so non-numeric columns are never touched
Pclass['Median Age']=by_class['Age'].median()
median_age_by_sex=train_data.groupby(['Sex','Pclass'])['Age'].median()
Pclass['Median Age (female)']=median_age_by_sex['female']
Pclass['Median Age (male)']=median_age_by_sex['male']
Pclass['Mean Fare']=by_class['Fare'].mean()
survived_counts=by_class['Survived'].value_counts().unstack()
Pclass['Survived']=survived_counts[1]
Pclass['Not Survived']=survived_counts[0]
Pclass['Survived percent']=survived_counts[1]/Pclass['Total Passenger']*100
Pclass=Pclass.astype(int)  # truncates decimals for a compact display
display(Pclass.style.background_gradient(cmap='YlGnBu'))

Third class has the most passengers, the youngest passengers and the lowest fares, and only about 24% of its passengers survived. It is by far the largest class on the ship.

Second class has the fewest passengers, sits in the middle for median age and fare, and about 47% of its passengers survived.

First class sits in the middle for the number of passengers, has the highest median age, and about 62% of its passengers survived.
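
Because Survived is coded 0/1, the survival percentage per class is simply the group mean times 100. Here is a minimal sketch of that idea on hypothetical toy rows (the real notebook works on `train_data`, and the numbers below are not the Titanic figures).

```python
import pandas as pd

# Hypothetical toy rows; the real notebook uses train_data.
toy = pd.DataFrame({
    'Pclass':   [1, 1, 1, 2, 2, 3, 3, 3, 3],
    'Survived': [1, 1, 0, 1, 0, 0, 0, 1, 0],
})

# Mean of a 0/1 column is the share of ones, i.e. the survival rate.
rate = toy.groupby('Pclass')['Survived'].mean().mul(100).round(1)
print(rate)
```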

Let's create a new feature to help us understand the data.
![](https://thebite.aisb.ro/wp-content/uploads/From_a_Child_to_an_Adult-e1523822431626.jpg)

In [None]:
# Bin Age into decades; half-open bins [0,10), [10,20), ... also catch fractional ages such as 9.5
bins=[0,10,20,30,40,50,60,70,81]
labels=['0-9','10-19','20-29','30-39','40-49','50-59','60-69','70-80']
# right=False makes each bin include its lower edge; missing ages stay NaN
train_data['age_group']=pd.cut(train_data.Age,bins=bins,labels=labels,right=False).astype(object)

Did age group affect the chances of surviving?

In [None]:
age_group=train_data.groupby(['age_group'])['Survived'].value_counts().unstack(-2)

age_group.rename(index={1: 'Survived',0:'Not-survived'}, inplace=True)
age_group.index.name = 'Survived ?'
age_group.columns.name = 'Age_Group'
display(age_group.style.background_gradient(cmap='YlGnBu'))

Children (ages 0-9) are the only age group with more survivors than non-survivors.
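
A rough sketch of how such a comparison can be read off a count table programmatically, using `pd.crosstab` on hypothetical toy rows. The data and the variable `more_survivors` are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy rows; not the real Titanic counts.
toy = pd.DataFrame({
    'age_group': ['0-9', '0-9', '0-9', '20-29', '20-29', '20-29', '30-39', '30-39'],
    'Survived':  [1, 1, 0, 0, 0, 1, 1, 0],
})

# Rows: Survived (0/1); columns: age group; cells: passenger counts.
counts = pd.crosstab(toy['Survived'], toy['age_group'])

# Age groups where survivors strictly outnumber non-survivors.
more_survivors = counts.columns[counts.loc[1] > counts.loc[0]].tolist()
print(counts)
print(more_survivors)
```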

Now let's see how age group and Pclass together affect the chance of survival.

The table also shows how the age groups are distributed across Pclass.

In [None]:
age_group=train_data.groupby(['age_group','Survived'])['Pclass'].value_counts().unstack(0)
age_group=age_group.fillna(0)
age_group=age_group.astype(int)
display(age_group.style.background_gradient(cmap='YlGnBu'))

Children in third class survived less often than those in first and second class: no children died in second class and only one died in first.

Teenagers follow a similar pattern: in third class more than 65% did not survive, in second class the split is about fifty-fifty, and in first class more than 60% survived. From ages 20 to 59, first-class passengers include more survivors than non-survivors, while in the 60-80 range only 5 survived.


Now let's see how age group and sex together affect the chance of survival.

The table also shows how the age groups are distributed by sex.

In [None]:
age_group=train_data.groupby(['age_group','Survived'])['Sex'].value_counts().unstack(0)
age_group=age_group.fillna(0)
age_group=age_group.astype(int)
display(age_group.style.background_gradient(cmap='YlGnBu'))

Now let's see how Pclass and sex together affect the chance of survival.

The table also shows how Pclass is distributed by sex.

In [None]:
age_group=train_data.groupby(['Survived','Pclass'])['Sex'].value_counts().unstack(-1)
age_group=age_group.fillna(0)
age_group=age_group.astype(int)
display(age_group.style.background_gradient(cmap='YlGnBu'))

Let's take a look at the survival rate within each category.

In [None]:
cols=['Pclass','Sex','SibSp','Parch','age_group']
for i in cols:
    # Survived is 0/1, so the group mean is the survival rate
    Data=train_data.groupby(i)['Survived'].mean()*100
    print('Percentage of passengers who survived\n', Data,'\n------------------------------')

In [None]:
plt.style.use('bmh')
data=train_data['Survived'].value_counts()
labels=data.index
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 10))
fig.subplots_adjust(wspace=0)
ax2.set_title('Survived - UnSurvived Ratio')
ax2.pie(data,startangle=90,explode=[0,0.05],autopct='%1.1f%%',labels=['Not_Survived', 'Survived'])
ax1.bar(labels[0],data[0],color='black')
ax1.bar(labels[1],data[1],color='blue')
ax1.legend(labels=['Not_Survived', 'Survived'])
ax1.set_title('Survived - UnSurvived Count')

plt.show()

In [None]:
data=train_data['Sex'].value_counts()
labels=data.index
data1=pd.DataFrame()
for i in range(labels.size):
    data1[labels[i]]=train_data.loc[(train_data.Sex ==labels[i])]['Survived'].value_counts()

X0=data1.loc[0]
X1=data1.loc[1]
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 10))
fig.subplots_adjust(wspace=0)
ax2.set_title('Sex Ratio')
ax2.pie(data,explode=[0,0.05],autopct='%1.1f%%',labels=data.index)
ax1.bar(labels, X0,color='black')
ax1.bar(labels, X1,bottom=X0,color='blue')
ax1.legend(labels=['Not_Survived', 'Survived'])
ax1.set_title('Sex Count Across Survived')
plt.show()

In [None]:
data=train_data['Pclass'].value_counts()
labels=data.index
data1=pd.DataFrame()
for i in range(labels.size):
    data1[labels[i]]=train_data.loc[(train_data.Pclass ==labels[i])]['Survived'].value_counts()

X0=data1.loc[0]
X1=data1.loc[1]
plt.style.use('bmh')
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 10))
fig.subplots_adjust(wspace=0)
ax2.set_title('Pclass Ratio')
ax2.pie(data,explode=[0,0.05,0.05] ,autopct='%5.01f%%',labels=data.index)
ax1.bar(labels, X0,color='black')
ax1.bar(labels, X1,bottom=X0,color='blue')
ax1.legend(labels=['Not_Survived', 'Survived'])
ax1.set_title('Pclass Count Across Survived')
plt.show()

In [None]:
data=train_data['Embarked'].value_counts()
labels=data.index
data1=pd.DataFrame()
for i in range(labels.size):
    data1[labels[i]]=train_data.loc[(train_data.Embarked ==labels[i])]['Survived'].value_counts()

X0=data1.loc[0]
X1=data1.loc[1]
plt.style.use('bmh')
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 10))
fig.subplots_adjust(wspace=0)
ax2.set_title('Embarked Ratio')
ax2.pie(data, explode=[0,0.1,0.1],autopct='%2.2f%%',labels=data.index)
ax1.bar(labels, X0,color='black')
ax1.bar(labels, X1,bottom=X0,color='blue')
ax1.legend(labels=['Not_Survived', 'Survived'])
ax1.set_title('Embarked Count Across Survived')

plt.show()

In [None]:
data=train_data['age_group'].value_counts()
labels=data.index
data1=pd.DataFrame()
for i in range(labels.size):
    data1[labels[i]]=train_data.loc[(train_data.age_group ==labels[i])]['Survived'].value_counts()

X0=data1.loc[0]
X1=data1.loc[1]
plt.style.use('bmh')
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 10))
fig.subplots_adjust(wspace=0)
ax2.set_title('Age-Group Ratio')
ax2.pie(data, explode=[0,0,0.03,0.03,0.03,0.03,0.03,0.03], autopct='%2.2f%%',labels=data.index)
ax1.bar(labels, X0,color='black')
ax1.bar(labels, X1,bottom=X0,color='blue')
ax1.legend(labels=['Not_Survived', 'Survived'])
ax1.set_title('Age-Group Count Across Survived')

plt.show()

In [None]:
plt.figure(figsize=(12,8))
# fill=True replaces the deprecated shade=True
sns.kdeplot(train_data['Age'][train_data.Pclass == 1], fill=True)
sns.kdeplot(train_data['Age'][train_data.Pclass == 2], fill=True)
sns.kdeplot(train_data['Age'][train_data.Pclass == 3], fill=True)
# Legend order must match the plotting order above
plt.legend(['1st', '2nd','3rd'])
plt.title('Pclass Age Distribution ')
plt.show()

In [None]:
plt.figure(figsize=(12,8))
sns.kdeplot(train_data['Age'][train_data.Sex == 'male'], fill=True)
sns.kdeplot(train_data['Age'][train_data.Sex == 'female'], fill=True)
plt.legend(['Male','Female' ])
plt.title('Sex Age Distribution ')
plt.show()

In [None]:
plt.figure(figsize=(12,8))
sns.kdeplot(train_data['Age'][train_data.Survived == 0], fill=True,color='black')
sns.kdeplot(train_data['Age'][train_data.Survived == 1], fill=True,color='blue')
plt.legend(['Not-Survived','Survived' ])
plt.title('Survived Not-Survived Age Distribution ')
plt.show()

In [None]:
data1 = train_data.groupby('Sex')['Survived'].value_counts()['male']
data2 =train_data.groupby('Sex')['Survived'].value_counts()['female']

plt.figure(figsize=(20,10))

ax1 = plt.subplot(121, aspect='equal')
data1.plot.pie(startangle=90,explode=[0,0.1],autopct='%1.1f%%',colors=['black','blue'], ax=ax1)
ax1.title.set_text('Male')

ax2 = plt.subplot(122, aspect='equal')
data2.plot.pie(startangle=90,explode=[0,0.1],autopct='%1.1f%%', colors=['blue','black'],ax=ax2)
ax2.title.set_text('Female')

plt.show()

In [None]:
f,ax=plt.subplots(1,2,figsize=(20,10))
# Recent seaborn versions require x and y as keyword arguments
sns.violinplot(x="Pclass",y="Age", hue="Survived", data=train_data,split=True,ax=ax[0])
ax[0].set_title('Pclass and Age vs Survived')
ax[0].set_yticks(range(0,110,10))
sns.violinplot(x="Sex",y="Age", hue="Survived", data=train_data,split=True,ax=ax[1])
ax[1].set_title('Sex and Age vs Survived')
ax[1].set_yticks(range(0,110,10))
plt.show()

In [None]:
train_data.loc[(train_data.SibSp== 0)&(train_data.Parch== 0),'status']='Solo'
train_data.loc[(train_data.SibSp!= 0)|(train_data.Parch!= 0),'status']='Family'

In [None]:
data  =train_data['status'].value_counts()
data1 = train_data.groupby('status')['Survived'].value_counts()['Solo']
data2 =train_data.groupby('status')['Survived'].value_counts()['Family']

plt.figure(figsize=(20,10))

ax1 = plt.subplot(121, aspect='equal')
data.plot.pie(startangle=90,explode=[0,0.025],autopct='%1.1f%%', ax=ax1)
ax1.title.set_text('Status Distribution Percent')

ax2 = plt.subplot(222, aspect='equal')
data1.plot.pie(startangle=90,explode=[0,0.05],autopct='%1.1f%%', colors=['black','blue'],ax=ax2)
ax2.title.set_text('Solo Survival Percent')

ax3 = plt.subplot(224, aspect='equal')
data2.plot.pie(startangle=90,explode=[0,0.05],autopct='%1.1f%%', colors=['blue','black'],ax=ax3)
ax3.title.set_text('Family Survival Percent')

plt.show()

In [None]:
data1 =train_data.groupby('age_group')['status'].value_counts().unstack()
data2 =train_data.groupby('Sex')['status'].value_counts().unstack()

plt.figure(figsize=(20,10))

ax1 = plt.subplot(121 )
data1.plot(kind='bar',stacked=True,ax=ax1)
ax1.title.set_text('Status Count Across Age-group')
ax2 = plt.subplot(122 )
data2.plot(kind='bar',stacked=True,ax=ax2)
ax2.title.set_text('Status Count Across Sex')


plt.show()

In [None]:
data1 =train_data.groupby('Pclass')['status'].value_counts().unstack()
data2 =train_data.groupby('Embarked')['status'].value_counts().unstack()

plt.figure(figsize=(20,10))

ax1 = plt.subplot(121 )
data1.plot(kind='bar',stacked=True,ax=ax1)
ax1.title.set_text('Status Count Across Pclass')

ax2 = plt.subplot(122 )
data2.plot(kind='bar',stacked=True,ax=ax2)
ax2.title.set_text('Status Count Across Embarked')


plt.show()

In [None]:
plt.figure(figsize=(12,8))
sns.kdeplot(train_data['Fare'][train_data.Survived == 1], fill=True,color='blue')
sns.kdeplot(train_data['Fare'][train_data.Survived == 0], fill=True,color='black')
plt.legend(['Survived','Not-Survived'])
plt.title('Survived Not-Survived Fare Distribution ')

plt.show()

# Feature Engineering

In [None]:
target=train['Survived']
All_Features=pd.concat([train,test]).reset_index(drop=True)


Let's make the Name column useful by extracting each passenger's title.

In [None]:
# Extract the title: the first word that is followed by a period in each name (e.g. "Mr.")
All_Features['Title']=All_Features.Name.str.extract(r'([A-Za-z]+)\.',expand=False)

In [None]:
All_Features['Title'].value_counts()

Mr is the most common title, which makes sense because men make up about 65% of all passengers.
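
To see what the title pattern captures, here is a small sketch that applies the same regular expression to a few sample strings written in the format of the Name column. The sample list is only illustrative; the pattern takes the first run of letters that is immediately followed by a period.

```python
import pandas as pd

# Sample strings in the "Surname, Title. Given names" format of the Name column.
names = pd.Series([
    'Braund, Mr. Owen Harris',
    'Heikkinen, Miss. Laina',
    'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
    'Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)',
])

# expand=False returns a Series instead of a one-column DataFrame.
titles = names.str.extract(r'([A-Za-z]+)\.', expand=False)
print(titles.tolist())
```

Rare matches such as the last one are the kind of value that later gets grouped under 'other'.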

In [None]:
others=All_Features['Title'].value_counts()[All_Features['Title'].value_counts()<10].index

In [None]:
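# Merge the French and alternate forms first; they are rare enough to be in `others`,
# so replacing `others` first would turn them into 'other' before they could be merged
All_Features['Title'] = All_Features['Title'].replace('Mlle', 'Miss')
All_Features['Title'] = All_Features['Title'].replace('Ms', 'Miss')
All_Features['Title'] = All_Features['Title'].replace('Mme', 'Mrs')
# Group every remaining rare title (fewer than 10 passengers) as 'other'
All_Features['Title']=All_Features['Title'].replace(list(others),'other')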

Now let's look at the survival rate for each title across Pclass.

In [None]:
# catplot(kind='point') is the current name for the removed factorplot
sns.catplot(x='Pclass',y='Survived',col='Title',kind='point',data=All_Features[:train_pass])
plt.show()

# Missing Values
Checking missing values

In [None]:
def missing_values(data):
    """Return the count and percentage of missing values per column, largest first."""
    total=data.isnull().sum()
    percent=total/len(data)*100
    missing_values=(pd.concat([total,percent],axis=1,keys=['Total','Percent'])).sort_values(['Total'],ascending=False)
    # Keep only the columns that actually have missing values
    missing_values=missing_values[missing_values['Total']>0]
    return missing_values

In [None]:
display(missing_values(All_Features).head().style.background_gradient(cmap='Blues'))

About 20% of the Age values are missing.

Sex and Pclass are the features most correlated with Age, so the missing ages are filled with the median age of each Sex/Pclass group.
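
A minimal sketch of group-median imputation, assuming hypothetical toy rows rather than the real data. It uses `groupby(...).transform('median')`, which returns one value per original row so the result lines up with the column being filled; the `Age_filled` column name is an assumption of this example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows with two missing ages.
toy = pd.DataFrame({
    'Sex':    ['male', 'male', 'male', 'female', 'female', 'female'],
    'Pclass': [3, 3, 3, 1, 1, 1],
    'Age':    [20.0, 30.0, np.nan, 40.0, np.nan, 50.0],
})

# Median age of each (Sex, Pclass) group, broadcast back to every row.
group_median = toy.groupby(['Sex', 'Pclass'])['Age'].transform('median')

# Only the missing entries are replaced; known ages are left untouched.
toy['Age_filled'] = toy['Age'].fillna(group_median)
print(toy)
```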

In [None]:
All_Features.groupby(['Sex','Pclass'])['Age'].median()

In [None]:
# transform keeps the original row index, so the filled values align on assignment
All_Features['Age']=All_Features.groupby(['Sex','Pclass'])['Age'].transform(lambda x:x.fillna(x.median()))

Only two values are missing in Embarked.

Fill them with the most frequent value.
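
As a sketch, the most frequent value can be looked up with `Series.mode()` instead of being typed in by hand. The series below is a hypothetical toy example, not the real Embarked column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with one missing port.
embarked = pd.Series(['S', 'C', 'S', np.nan, 'Q', 'S'], name='Embarked')

# mode() ignores missing values and returns the most frequent entries, sorted.
most_common = embarked.mode()[0]
filled = embarked.fillna(most_common)
print(most_common, filled.tolist())
```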

In [None]:
All_Features['Embarked'].value_counts()

In [None]:
All_Features['Embarked']=All_Features['Embarked'].fillna('S')

In [None]:
# Fill the missing Fare with the median fare of the same Sex/Pclass group
All_Features['Fare']=All_Features.groupby(['Sex','Pclass'])['Fare'].transform(lambda x:x.fillna(x.median()))

In [None]:
All_Features=All_Features.drop(['Cabin','Ticket','Name','PassengerId'],axis=1)

In [None]:
missing_values(All_Features).head()

Create a new feature

In [None]:
# Bin Age into decades (0 = 0-9, 1 = 10-19, ..., 7 = 70-80); half-open bins leave no gaps for fractional ages
All_Features['age_group']=pd.cut(All_Features.Age,bins=[0,10,20,30,40,50,60,70,81],labels=False,right=False)

In [None]:
data =All_Features.groupby('Title')['age_group'].value_counts().unstack()
data.plot(kind='bar',stacked=True,figsize=(12,8),title='Title Count Across Age_group')
plt.show()

Create new features: family size and solo passenger
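
A small sketch of the two features on hypothetical toy rows. The column names mirror the ones used in this notebook; note that Family_Size here counts relatives aboard and does not include the passenger.

```python
import pandas as pd

# Hypothetical toy rows; SibSp and Parch follow the dataset definitions.
toy = pd.DataFrame({'SibSp': [1, 0, 0, 3], 'Parch': [0, 0, 2, 1]})

# Relatives aboard = siblings/spouses + parents/children.
toy['Family_Size'] = toy['SibSp'] + toy['Parch']

# 1 when the passenger travels with no relatives, 0 otherwise.
toy['Solo_Passanger'] = (toy['Family_Size'] == 0).astype(int)
print(toy)
```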

In [None]:
All_Features['Family_Size']=All_Features['SibSp']+All_Features['Parch']
All_Features.loc[(All_Features.Family_Size == 0),'Solo_Passanger'] = 1
All_Features.loc[(All_Features.Family_Size > 0),'Solo_Passanger'] = 0

In [None]:
All_Features['Solo_Passanger'].value_counts()

# Encoding Categorical Features
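
For comparison, here is a rough sketch of the two common encodings on a hypothetical toy column. `LabelEncoder` maps each category to an integer in sorted order, which implies an ordering the categories may not have; `pd.get_dummies` creates one indicator column per category instead. Which one works better here is not something this sketch decides.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy column of embarkation ports.
ports = pd.Series(['S', 'C', 'Q', 'S'], name='Embarked')

# Integer codes: classes are sorted, so C -> 0, Q -> 1, S -> 2.
encoder = LabelEncoder()
codes = encoder.fit_transform(ports)

# One indicator column per category, with no implied order.
one_hot = pd.get_dummies(ports, prefix='Embarked')

print(list(encoder.classes_), codes.tolist())
print(one_hot)
```

Once a column has been label-encoded it is numeric, so a later `pd.get_dummies` call leaves it unchanged.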

In [None]:
col=['Sex','Title','Embarked']
for i in col:
    All_Features[i]=LabelEncoder().fit_transform(All_Features[i])

In [None]:
f, ax = plt.subplots(figsize=(18, 10))
corrmat = All_Features.corr()
# Hide the upper triangle; the mask is only valid while rows and columns keep the same order
mask=np.triu(np.ones_like(corrmat, dtype=bool))
sns.heatmap(corrmat, vmax=.3,annot=True,mask=mask,cmap="YlGnBu")
plt.title('Correlation Table')
plt.show()

In [None]:
All_Features=pd.get_dummies(All_Features)
All_Features.head()

In [None]:
All_Features=All_Features.drop(['Survived','Age'],axis=1)
All_Features_SL = StandardScaler().fit_transform(All_Features)
print('All_Features shape: {}'.format(All_Features.shape))

In [None]:
train=All_Features_SL[:train_pass]
test=All_Features_SL[train_pass:]
print('train shape: {}'.format(train.shape))
print('test shape: {}'.format(test.shape))
print('target shape: {}'.format(target.shape))

# Machine Learning

In [None]:
# max_features='sqrt' is what the removed 'auto' option meant for classifiers
model = RandomForestClassifier(criterion='gini',n_estimators=1750,max_depth=7,min_samples_split=6,min_samples_leaf=6,
max_features='sqrt',oob_score=True,random_state=42,n_jobs=-1)
model.fit(All_Features[:target.shape[0]],target)

with plt.style.context('dark_background'):
    plt.figure(figsize=(12, 10))
    features=pd.Series(model.feature_importances_,All_Features.columns).sort_values(ascending=True)
    plt.barh(features.index ,features.values,color='red')
    plt.title('Feature Importances')
plt.show()

Split Data 
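
A minimal sketch of a hold-out split on synthetic data. The `stratify=y` argument is an addition that this notebook does not use; it keeps the class balance roughly the same in both parts, which can matter when one class is rarer than the other.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic features and an imbalanced 60/40 target, for illustration only.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = np.array([0] * 60 + [1] * 40)

# Hold out 33% of the rows; stratify keeps the share of each class similar.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.33, random_state=42, shuffle=True, stratify=y
)
print(X_tr.shape, X_te.shape, y_tr.mean(), y_te.mean())
```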

In [None]:
x_train,x_test,y_train,y_test=train_test_split(train,target,test_size=.33,random_state=42,shuffle=True)

In [None]:
model=LogisticRegression()
model.fit(x_train,y_train)
pred=model.predict(x_test)
print('Accuracy for LogisticRegression is ',metrics.accuracy_score(y_test,pred))
plt.figure(figsize=(10,8))
ax= plt.subplot()
sns.heatmap(confusion_matrix(y_test,pred), annot=True, ax = ax, fmt='g',cmap='YlGnBu') 
ax.set_title('Confusion Matrix')
plt.show()

In [None]:
model=svm.SVC(kernel='rbf',C=1,gamma=0.1)
model.fit(x_train,y_train)
pred=model.predict(x_test)
print('Accuracy for rbf SVM is ',metrics.accuracy_score(y_test,pred))

plt.figure(figsize=(10,8))
ax= plt.subplot()
sns.heatmap(confusion_matrix(y_test,pred), annot=True, ax = ax, fmt='g',cmap='YlGnBu') 
ax.set_title('Confusion Matrix')
plt.show()

In [None]:
# gamma is ignored by the linear kernel, so it is left out
model=svm.SVC(kernel='linear',C=1)
model.fit(x_train,y_train)
pred=model.predict(x_test)
print('Accuracy for linear SVM is ',metrics.accuracy_score(y_test,pred))

plt.figure(figsize=(10,8))
ax= plt.subplot()
sns.heatmap(confusion_matrix(y_test,pred), annot=True, ax = ax, fmt='g',cmap='YlGnBu') 
ax.set_title('Confusion Matrix')
plt.show()

In [None]:
model=DecisionTreeClassifier()
model.fit(x_train,y_train)
pred=model.predict(x_test)
print('Accuracy for DecisionTreeClassifier is ',metrics.accuracy_score(y_test,pred))

plt.figure(figsize=(10,8))
ax= plt.subplot()
sns.heatmap(confusion_matrix(y_test,pred), annot=True, ax = ax, fmt='g',cmap='YlGnBu') 
ax.set_title('Confusion Matrix')
plt.show()

In [None]:
model=KNeighborsClassifier() 
model.fit(x_train,y_train)
pred=model.predict(x_test)
print('Accuracy for KNeighborsClassifier is ',metrics.accuracy_score(y_test,pred))

plt.figure(figsize=(10,8))
ax= plt.subplot()
sns.heatmap(confusion_matrix(y_test,pred), annot=True, ax = ax, fmt='g',cmap='YlGnBu') 
ax.set_title('Confusion Matrix')
plt.show()

In [None]:
model=KNeighborsClassifier(n_neighbors=24)
model.fit(x_train,y_train)
pred=model.predict(x_test)
print('Accuracy for KNeighborsClassifier (n_neighbors=24) is ',metrics.accuracy_score(y_test,pred))

plt.figure(figsize=(10,8))
ax= plt.subplot()
sns.heatmap(confusion_matrix(y_test,pred), annot=True, ax = ax, fmt='g',cmap='YlGnBu') 
ax.set_title('Confusion Matrix')
plt.show()

In [None]:
# max_features='sqrt' is what the removed 'auto' option meant for classifiers
model = RandomForestClassifier(criterion='gini',n_estimators=1750,max_depth=7,min_samples_split=6,min_samples_leaf=6,
max_features='sqrt',oob_score=True,random_state=42,n_jobs=-1)
model.fit(x_train,y_train)
pred=model.predict(x_test)
print('Accuracy for RandomForestClassifier is ',metrics.accuracy_score(y_test,pred))

plt.figure(figsize=(10,7))
ax= plt.subplot()
sns.heatmap(confusion_matrix(y_test,pred), annot=True, ax = ax, fmt='g',cmap='YlGnBu') 
ax.set_title('Confusion Matrix')
plt.show()

In [None]:
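# Alias the yellowbrick helper so it does not shadow sklearn.metrics.classification_report imported earlier
from yellowbrick.classifier import classification_report as yb_classification_report
plt.figure(figsize=(10,7))
yb_classification_report(model, x_train, y_train, x_test, y_test, support=True, cmap='YlGnBu')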
plt.show()

The linear SVM has the highest accuracy of the models tried.
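
A ranking based on a single train/test split can shift with the random seed. As a hedged sketch, the same kind of comparison can be run with 5-fold cross-validation in one loop. The data below is synthetic (from `make_classification`), so the printed scores say nothing about the Titanic results; the `models` dictionary and its settings are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the scaled feature matrix and the target.
X, y = make_classification(n_samples=300, n_features=8, random_state=42)

models = {
    'LogisticRegression': LogisticRegression(max_iter=1000),
    'Linear SVM': SVC(kernel='linear', C=1),
    'RBF SVM': SVC(kernel='rbf', C=1, gamma=0.1),
    'DecisionTree': DecisionTreeClassifier(random_state=42),
    'KNN': KNeighborsClassifier(),
}

# Mean accuracy over 5 folds for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring='accuracy').mean()
    for name, model in models.items()
}

# Print the models from best to worst mean accuracy.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f'{name}: {score:.3f}')
```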

That's it, we are done here.

Thank you for your time.

![](https://memegenerator.net/img/instances/64001897/time-to-say-goodbye.jpg)