![](https://upload.wikimedia.org/wikipedia/commons/thumb/f/fd/RMS_Titanic_3.jpg/500px-RMS_Titanic_3.jpg)

**RMS Titanic** was a British passenger liner that is known for sinking after a collision with an iceberg during its maiden voyage in the North Atlantic. During its time in service, the Titanic was the largest ship afloat.

Built in Belfast as the second of three Olympic-class ocean liners and named after creatures of Greek mythology, the Titanic was designed to be the pinnacle of comfort and luxury. With an on-board gymnasium, swimming pool, libraries, high-class restaurants and opulent cabins, the Titanic was already famous for its extravagance in its own time.

The ten-deck ship's total capacity is estimated at around 3,327 people. For the maiden voyage, approximately 2,200 people boarded the Titanic: 1,317 of them were passengers and 885 were crew members. Despite its luxurious equipment and sheer size, the Titanic carried only 20 lifeboats with a total capacity of 1,178 people. This, together with poor management after the collision with the iceberg, was the reason for the massive loss of life.

In the following, we work through the data on the Titanic disaster and predict the chance of survival based on the given input. Our goal is to find a model that accurately predicts the odds of someone surviving the Titanic disaster.


**Content**

[1. Dataset Preparation](#1)  
[2. Exploring the Data](#2)  
[3. Data Preparation & Visualization](#3)  
&emsp;[3.1 PassengerID](#3.1)  
&emsp;[3.2 Survived](#3.2)  
&emsp;[3.3 Pclass](#3.3)  
&emsp;[3.4 Name](#3.4)  
&emsp;[3.5 Sex](#3.5)  
&emsp;[3.6 Age](#3.6)  
&emsp;[3.7 SibSp & Parch](#3.7)  
&emsp;[3.8 Ticket](#3.8)  
&emsp;[3.9 Fare](#3.9)  
&emsp;[3.10 Cabin](#3.10)  
&emsp;[3.11 Embarked](#3.11)  
[4. Classification and Submission](#4)

#  <a id="1">1. Dataset Preparation</a> 

First off, we need to import several Python libraries for data wrangling and visualization. After this we load the datasets.

In [None]:
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import plotly.graph_objs as go
import matplotlib.pyplot as plt
from plotly import tools
import plotly.figure_factory as ff
import pandas as pd
import numpy as np 
import seaborn as sns
import random 
import warnings
import operator
import copy

from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn import tree
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import MinMaxScaler

warnings.filterwarnings("ignore")
init_notebook_mode(connected=True)
%matplotlib inline
plt.style.use('ggplot')

# Original Data
train = pd.read_csv("../input/train.csv")
test = pd.read_csv("../input/test.csv")

# Copy for preparation
train_prep = copy.deepcopy(train)
test_prep = copy.deepcopy(test)

Next, we take a look at the training data. Since our data is already divided into training and test sets, we ignore the test data until we submit to the Kaggle competition.

In [None]:
train_prep.head()

In [None]:
train_prep.describe(include="all")

In [None]:
# Looking for null-values
train_prep.isnull().sum()

Because the features "Age", "Cabin" and "Embarked" contain null values, we will have to handle them at a later point.
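
To judge how serious the gaps are, it can help to look at the share of missing values per column rather than the raw counts. The following is a minimal sketch on a made-up toy frame (the values are invented; only the column names mirror the Titanic data):

```python
import numpy as np
import pandas as pd

# Toy frame with invented values; only the column names mirror the dataset.
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, np.nan],
    "Cabin": [None, "C85", None, None],
    "Fare": [7.25, 71.28, 8.05, 8.46],
})

# isnull() marks missing cells; the column-wise mean is the missing fraction.
missing_share = toy.isnull().mean().sort_values(ascending=False)
print(missing_share)
```

Applied to `train_prep`, the same call would rank the features by how incomplete they are.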

Until here everything seems fine, so let's start with data analysis.

#  <a id="2">2. Exploring the Data</a> 

Let's take a look at the complete data and analyze the features.

In [None]:
train_prep.info()

**Dataset structure:**

* PassengerId (int64): Used for passenger identification
* Survived (int64): Survived classification (0 = No, 1 = Yes)
* Pclass (int64): Ticket class 1 = 1st, 2 = 2nd, 3 = 3rd (A proxy for socio-economic status (SES))
* Name (string): Name
* Sex (string): Sex (male, female)
* Age (float64): Age in years. Fractional if less than 1
* SibSp (int64): Family relations (siblings, spouses)
* Parch (int64): Family relations (parent, children)
* Ticket (string): Ticket number
* Fare (float64): Ticket price
* Cabin (string): Cabin number
* Embarked (string): Port of Embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)

With this information we can categorize the features for further analysis and classification:
* **Numerical:** PassengerId, Age, SibSp, Parch, Fare
* **Categorical:** Survived, Pclass, Sex, Embarked
* **Text:** Name, Ticket, Cabin

Because "Sex" and "Embarked" are non-numerical, we need to convert these features later.

**Observations:**

Our dataset contains 891 entries, of which 342 (891 × 0.383838) people survived the disaster.

In [None]:
sns.pairplot(pd.get_dummies(train_prep, columns=["Sex"], drop_first=True), hue="Survived")

Furthermore, we can see that the features "Age", "Sex", "Pclass", "SibSp" and "Parch" influence the chance of survival.

1.  The percentage of young people surviving is higher
2.  More females than males survived the disaster
3.  Solo travelers survived more often
4.  The higher the class, the larger the share of survivors

**Interpretation**

The reason for the first two observations may be the Birkenhead Drill, famously summarized as "Women and children first". The higher odds of solo travelers are most likely explained by the fact that they frequently backfilled lifeboats and only had to fight for their own survival. As mentioned previously, higher-class passengers had access to luxurious equipment and cabins. Their way to the lifeboats was therefore probably shorter and easier to manage.

So let's dive deeper into the data!

#  <a id="3">3. Data Preparation & Visualization</a>
In the last chapter we got a first insight into our dataset. This time we want to delve deeper into the data, find first clues for building the model and see for ourselves what influenced the odds of survival. To do so, we look at each feature individually and engineer new features where needed.

For example, in the chapter "[Dataset Preparation](#1)" we noticed that the features "Age", "Cabin" and "Embarked" contain missing values. With the information we have gathered about these features, we can now decide on appropriate measures.

## <a id="3.1">3.1 PassengerID</a>
Since "PassengerId" is an artificial feature that is only used to identify passengers, we do not gain any value by using it. For this reason, we remove "PassengerId" from our records.

In [None]:
train_prep.drop(columns=["PassengerId"], inplace=True)
test_prep.drop(columns=["PassengerId"], inplace=True)

## <a id="3.2">3.2 Survived</a>
The feature "Survived" is our target variable. This means our later classification is trained and tested on this feature. This is why we should take a look at the distribution.

In [None]:
fig = go.Figure()

groups = train_prep.groupby(["Survived"]).count().reset_index()

data = go.Pie(
    labels = ["Died", "Survived"],
    values = [groups.Pclass[0], groups.Pclass[1]],
    marker=dict(colors=['#ff7f0e', '#1f77b4'])
)

layout = go.Layout(
    title='Survivors'
)

fig = go.Figure(data=[data], layout=layout)
iplot(fig)

As you can see, only 342 out of 891 people survived. This corresponds to a survival rate of 38% or a mortality rate of 62%. In other words, if our model always predicts death, we could automatically achieve 62% accuracy. 
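
The baseline mentioned above can be checked with a few lines. This is a small sketch that rebuilds the label vector from the counts quoted in the text (342 survivors out of 891, hence 549 deaths) instead of loading the CSV:

```python
import numpy as np

# Labels rebuilt from the counts in the text: 549 deaths (0), 342 survivors (1).
y = np.array([0] * 549 + [1] * 342)

# A "model" that always predicts the most frequent class.
majority_class = np.bincount(y).argmax()
baseline_accuracy = (y == majority_class).mean()

print(majority_class, round(baseline_accuracy, 3))  # 0 0.616
```

Any classifier we train later should clearly beat this number to be worth using.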

## <a id="3.3">3.3 Pclass</a>
Pclass tells us which class the passenger booked. First, second and third class were available, with first class being the most luxurious and third class the least. This information gives us an insight into the socio-economic background of the person.

In [None]:
fig = go.Figure()

groups = train_prep.groupby(["Pclass"]).count().reset_index()

trace1 = go.Bar(
    x = ["1st", "2nd", "3rd"],
    y = [groups.Sex[0], groups.Sex[1], groups.Sex[2]],
)

data = [trace1]
layout = go.Layout(
    title='Booked class',
    xaxis=dict(
        title='Class',
        titlefont=dict(
            family='Courier New, monospace',
            size=18,
            color='#7f7f7f'
        )
    ),
    yaxis=dict(
        title='No. of people',
        titlefont=dict(
            family='Courier New, monospace',
            size=18,
            color='#7f7f7f'
        )
    )
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

About a quarter of all passengers in the training data travelled first class. Second class was slightly smaller, with just over one fifth of the passengers. The remaining 491 people, and thus the majority of all passengers, were accommodated in third class.

In this context, we should look at the impact of accommodation on the chances of survival. 

In [None]:
fig = go.Figure()

groups = train_prep.groupby(["Pclass", "Survived"]).count().reset_index()

trace1 = go.Bar(
    x = ["1st", "2nd", "3rd"],
    y = [groups.Sex[1], groups.Sex[3], groups.Sex[5]],
    name = "Survived"
)

trace2 = go.Bar(
    x = ["1st", "2nd", "3rd"],
    y = [groups.Sex[0], groups.Sex[2], groups.Sex[4]],
    name = "Died"
)

data = [trace1, trace2]
layout = go.Layout(
    barmode='stack',
    title='Survivors/deaths of the different classes booked',
    xaxis=dict(
        title='Class',
        titlefont=dict(
            family='Courier New, monospace',
            size=18,
            color='#7f7f7f'
        )
    ),
    yaxis=dict(
        title='No. of people',
        titlefont=dict(
            family='Courier New, monospace',
            size=18,
            color='#7f7f7f'
        )
    )
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

As previously assumed, the booked class had a massive influence on the chance of survival. In general, the higher the class booked, the higher the chance of surviving the catastrophe. This is illustrated by the fact that 63% of first class passengers and 47% of second class passengers survived, while only 24% of third class passengers survived.
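
Percentages like these do not have to be read off the chart. Because "Survived" is coded as 0/1, the mean per group is the survival rate. A minimal sketch on a made-up toy frame (the rows are invented and do not reproduce the real figures):

```python
import pandas as pd

# Invented rows; only the column names match the dataset.
toy = pd.DataFrame({
    "Pclass":   [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "Survived": [1, 1, 0, 1, 0, 0, 0, 1, 0],
})

# Mean of a 0/1 column per group equals the share of survivors in that group.
survival_rate = toy.groupby("Pclass")["Survived"].mean()
print(survival_rate)
```

Note that in this notebook the call would have to run on `train_prep` before "Pclass" is one-hot encoded.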

Next, we use one-hot encoding to turn the class labels into separate indicator columns. This makes the values machine-readable without implying a numeric distance between the classes.

In [None]:
train_prep = pd.get_dummies(train_prep, columns=["Pclass"])
test_prep = pd.get_dummies(test_prep, columns=["Pclass"])
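
To make the effect of the encoding visible, here is a small sketch on a toy column (values made up). It also shows the `drop_first=True` variant, which drops one indicator column because it is implied by the others:

```python
import pandas as pd

# Toy column with invented values.
toy = pd.DataFrame({"Pclass": [1, 3, 2, 3]})

# One indicator column per class; exactly one of them is set in each row.
encoded = pd.get_dummies(toy, columns=["Pclass"])
print(encoded.columns.tolist())  # ['Pclass_1', 'Pclass_2', 'Pclass_3']

# With drop_first=True the first category becomes the implicit reference.
reduced = pd.get_dummies(toy, columns=["Pclass"], drop_first=True)
print(reduced.columns.tolist())  # ['Pclass_2', 'Pclass_3']
```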

## <a id="3.4">3.4 Name</a>

If you take a closer look at the feature "Name", you will notice that in addition to the traveler's name it also contains their title. From the title we can derive further information about age, status and family.

In [None]:
train_prep["Name"].head(10)

For this reason, we will extract the title from the Name field and introduce a new feature called "Title".
Since we don't need the passenger's name for further analysis and classification, we will remove this feature from our records.

Let's take a look at our new feature.

In [None]:
# Raw strings keep the backslash in "\." from being treated as a string escape
train_prep["Title"] = train_prep["Name"].str.extract(r', ([A-Za-z]+)\.', expand=False)
test_prep["Title"] = test_prep["Name"].str.extract(r', ([A-Za-z]+)\.', expand=False)

train_prep.drop(columns=["Name"], inplace=True)
test_prep.drop(columns=["Name"], inplace=True)
groups = train_prep.groupby(["Sex", "Title"], as_index=False)["Survived"].count()
groups

We need to ensure the quality of our new features. This is best done by checking that there are no empty fields.

In [None]:
train_prep[train_prep["Title"].isnull()]

In [None]:
test_prep[test_prep["Title"].isnull()]

Since there is only one missing value, belonging to a female passenger, we manually set it to "Ms".
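
A likely reason for such a gap is that the pattern `', ([A-Za-z]+)\.'` only matches a title made of a single word directly followed by a period. The sketch below uses invented names (not taken from the dataset) to show how a multi-word title falls through:

```python
import pandas as pd

# Invented example names in the "Surname, Title. Given names" format.
names = pd.Series([
    "Doe, Mr. John",
    "Roe, Miss. Jane",
    "Example, the Countess. of (Ann)",  # two-word title
])

# Same pattern as in the notebook: one word between ", " and ".".
titles = names.str.extract(r", ([A-Za-z]+)\.", expand=False)
print(titles.tolist())  # ['Mr', 'Miss', nan]
```

Allowing spaces inside the capture group would be one way to catch such titles, at the cost of a new rare category.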

Because some of the titles found have the same meaning, we group them and assign the same value. An example of this is "Miss" and "Ms". Here we will assign the value "Ms".

In [None]:
train_prep["Title"] = train_prep["Title"].replace(["Miss", "Mlle"], "Ms")
test_prep["Title"] = test_prep["Title"].replace(["Miss", "Mlle"], "Ms")

train_prep["Title"] = train_prep["Title"].replace(["Mme"], "Mrs")
test_prep["Title"] = test_prep["Title"].replace(["Mme"], "Mrs")

train_prep["Title"] = train_prep["Title"].fillna("Ms")

groups = train_prep.groupby(["Sex", "Title"], as_index=False)["Survived"].count()
groups

Next, we group the titles and convert the alphanumerical values into numerical ones. This makes the values machine-readable.

In [None]:
#Group 1
train_prep["Title"] = train_prep["Title"].replace(["Ms", "Mrs", "Mr", "Sir", "Jonkheer", "Lady", "Don", "Dona"], "1")
#Group 2
train_prep["Title"] = train_prep["Title"].replace(["Dr", "Master"], "2")
#Group 3
train_prep["Title"] = train_prep["Title"].replace(["Major","Col", "Capt", "Rev"], "3")

#Group 1
test_prep["Title"] = test_prep["Title"].replace(["Ms", "Mrs", "Mr", "Sir", "Jonkheer", "Lady", "Don", "Dona"], "1")
#Group 2
test_prep["Title"] = test_prep["Title"].replace(["Dr", "Master"], "2")
#Group 3
test_prep["Title"] = test_prep["Title"].replace(["Major","Col", "Capt", "Rev"], "3")


## <a id="3.5">3.5 Sex</a>
We have already made the assumption that sex had a great influence on the chance of survival during the Titanic disaster. We will check this assumption below and use One-Hot-Encoding.

In [None]:
fig = go.Figure()

groups = train_prep.groupby(["Survived", "Sex"]).count().reset_index()

# Survived
trace1 = go.Bar(
    x = ["female", "male"],
    y = groups[(groups["Survived"] == 1)].Embarked,
    name = "Survived"
)

# Died
trace2 = go.Bar(
    x = ["female", "male"],
    y = groups[(groups["Survived"] == 0)].Embarked,
    name = "Died"
)

data = [trace1, trace2]
layout = go.Layout(
    barmode='stack',
    title='Survivors',
    xaxis=dict(
        title='Sex',
        titlefont=dict(
            family='Courier New, monospace',
            size=18,
            color='#7f7f7f'
        )
    ),
    yaxis=dict(
        title='No. of people',
        titlefont=dict(
            family='Courier New, monospace',
            size=18,
            color='#7f7f7f'
        )
    )
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

train_prep = pd.get_dummies(train_prep, columns=["Sex"], drop_first=True)
test_prep = pd.get_dummies(test_prep, columns=["Sex"], drop_first=True)

Apparently, a lot more women survived the sinking of the Titanic. As already suspected this could be due to the Birkenhead Drill.

## <a id="3.6">3.6 Age</a>
In the following we will analyze the age of the passengers. Since the age of some records is missing, we have to consider these values separately. There are several ways to do this:

* Delete entries
* Calculation of the average age
* Calculation of the average age per title
* Calculation of the average age per sex

For simplicity's sake, we use the average age across all records. 

In [None]:
train_prep_age_mean = train_prep["Age"].mean()
test_prep_age_mean = test_prep["Age"].mean()

train_prep["Age"] = train_prep["Age"].fillna(train_prep_age_mean)
test_prep["Age"] = test_prep["Age"].fillna(test_prep_age_mean)

bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, np.inf]
labels = ['0', '1', '2', '3', '4', '5', '6', '7', '8']
train_prep['AgeGroup'] = pd.cut(train_prep["Age"], bins, labels=labels)
test_prep['AgeGroup'] = pd.cut(test_prep["Age"], bins, labels=labels)

data = [go.Histogram(x=train_prep["AgeGroup"], histnorm="probability")]
iplot(data)
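
As an alternative to the global mean used above, the "average age per title" option from the list could be sketched as follows. The toy frame is made up and uses raw titles; this is an illustration under those assumptions, not the approach taken in this notebook:

```python
import numpy as np
import pandas as pd

# Invented rows; "Title" holds raw titles here, not the numeric groups.
toy = pd.DataFrame({
    "Title": ["Mr", "Mr", "Mr", "Master", "Master", "Ms"],
    "Age":   [30.0, 40.0, np.nan, 4.0, np.nan, 25.0],
})

# transform("mean") returns, for every row, the mean age of that row's title.
group_mean = toy.groupby("Title")["Age"].transform("mean")

# Only the missing ages are replaced; known ages stay untouched.
toy["AgeFilled"] = toy["Age"].fillna(group_mean)
print(toy)
```

Here the missing "Mr" is filled with 35 and the missing "Master" with 4, which is closer to reality than one shared average.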

Since we have already seen that gender has a large influence on the classification, we will consider the distribution of our data sets in combination with gender.

If the crew on the Titanic actually adhered to the Birkenhead Drill, this should have affected the chances of survival of certain age groups.

In [None]:
showLegend = [True,False]

data = []
for i in range(0,len(pd.unique(train_prep['Survived']))):
    male = {
            "type": 'violin',
            "x": train_prep['Survived'][ (train_prep['Sex_male'] == 1) & (train_prep['Survived'] == pd.unique(train_prep['Survived'])[i]) ],
            "y": train_prep['Age'][ (train_prep['Sex_male'] == 1) & (train_prep['Survived'] == pd.unique(train_prep['Survived'])[i]) ],
            "name": 'male',
            "side": 'negative',
            "showlegend": showLegend[i],
            "line": {
                "color": '#1f77b4'
            }
        }
    data.append(male)
    female = {
            "type": 'violin',
            "x": train_prep['Survived'][ (train_prep['Sex_male'] == 0) & (train_prep['Survived'] == pd.unique(train_prep['Survived'])[i]) ],
            "y": train_prep['Age'][ (train_prep['Sex_male'] == 0) & (train_prep['Survived'] == pd.unique(train_prep['Survived'])[i]) ],
            "name": 'female',
            "side": 'positive',
            "showlegend": showLegend[i],
            "line": {
                "color": '#ff7f0e'
            }
        }
    data.append(female)
        

fig = {
    "data": data,
    "layout" : {
        "title": "Age distribution by sex and survival",
        "yaxis": {
            "zeroline": True,
        },
        "violingap": 0,
        "violinmode": "overlay"
    }
}


iplot(fig, validate = False)

train_prep.drop(columns="Age", inplace=True)
test_prep.drop(columns="Age", inplace=True)

As predicted, age has a similar effect on the chance of survival as gender. This is particularly evident in the age group from 0 to 15 years. There are very few deaths here.

## <a id="3.7">3.7 SibSp & Parch</a>
These two features indicate the number of accompanying family members and differentiate between siblings/spouses (SibSp) and parents/children (Parch).

Our assumption regarding these features was that the chance of survival decreases with the number of accompanying relatives. This could be due to the fact that the families often only boarded the lifeboats together.


In [None]:
family = train_prep
family["familymembers"] = train_prep["Parch"] + train_prep["SibSp"]

groups = family.groupby(["familymembers"]).count().reset_index()

fig = go.Figure()

data = go.Pie(
    labels = groups["familymembers"],
    values = groups["Survived"]
)

layout = go.Layout(
    title='Distribution of persons by number of family members'
)

fig = go.Figure(data=[data], layout=layout)
iplot(fig)

We see that most of the travelers boarded the Titanic without family members. Very few people had more than three family members on board.

Now did the number of family members on board the Titanic affect the chance of survival?

In [None]:
groups = family.groupby(["familymembers", "Survived"]).count().reset_index()

barplot = sns.barplot(x="familymembers", y="Fare", hue="Survived", data=groups)
barplot.set_title("Survivor/Dead by number of family members")
barplot.set_xlabel("No. of family members")
barplot.set_ylabel("No. of people")
plt.tight_layout()
train_prep.drop(columns=["familymembers"], inplace=True)

In [None]:
'''
How can I do the previous graphic in plotly without stacking the last two groups?

groups = family.groupby(["familymembers", "Survived"]).count().reset_index()

fig = go.Figure()

trace1 = go.Bar(
    y = groups.iloc[::2, :]["Age"],
    name = "Survived"
)

trace2 = go.Bar(
    x = ["1st", "2nd", "3rd"],
    y = [groups.Sex[0], groups.Sex[2], groups.Sex[4]],
    name = "Died"
)

data = [trace1, trace2]
layout = go.Layout(
    barmode='stack',
    title='Survivors/deaths of the different classes booked',
    xaxis=dict(
        title='Class',
        titlefont=dict(
            family='Courier New, monospace',
            size=18,
            color='#7f7f7f'
        )
    ),
    yaxis=dict(
        title='No. of people',
        titlefont=dict(
            family='Courier New, monospace',
            size=18,
            color='#7f7f7f'
        )
    )
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)
'''

Obviously, yes. We can see this from the fact that families up to three people survived more frequently, while larger families had a lower chance of survival. 

If you take a closer look at the data, you can see that the smaller families often consisted of parents and children. According to the Birkenhead Drill logic, women and children would survive with a very high probability.

But what is the reason why the chance of survival in larger families decreased? A valid assumption for this could be that the coordination of a large family was much more complicated in the circumstances prevailing on the Titanic.

## <a id="3.8">3.8 Ticket</a>
We assume that this feature has no impact on the outcome variable. Thus, it will be excluded from the analysis.

In [None]:
train_prep.drop(columns=["Ticket"], inplace=True)
test_prep.drop(columns=["Ticket"], inplace=True)

## <a id="3.9">3.9 Fare</a>
The "Fare" feature simply describes the price the person has paid for the travel ticket. Since it is a numeric value, we can display the values without further feature engineering.

In [None]:
trace1 = go.Box(
    y=train_prep[train_prep["Survived"] == 1]["Fare"],
    name="Survived"
)

trace2 = go.Box(
    y=train_prep[train_prep["Survived"] == 0]["Fare"],
    name="Died"
)

data=[trace1, trace2]

iplot(data)

You can see that the dataset contains some outliers. Outliers in input data can skew and mislead the training process of machine learning algorithms, resulting in longer training times, less accurate models and ultimately poorer results.

In this case, we do not have to eliminate these records, as the Titanic was a very luxurious ship. It is therefore not surprising that some people paid exceptionally high prices for the voyage. Furthermore, the ticket price influences mortality.
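
If you want to list the outliers instead of only seeing them in the box plot, the common 1.5 × IQR rule can be applied by hand. The fares below are made up for illustration, and the exact quartile method of a plotting library may differ slightly from `np.percentile`:

```python
import numpy as np

# Invented fares; one value is far above the rest.
fares = np.array([7.25, 8.05, 13.0, 26.0, 30.0, 512.33])

# Interquartile range and the usual 1.5 * IQR fences.
q1, q3 = np.percentile(fares, [25, 75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = fares[(fares < lower_fence) | (fares > upper_fence)]
print(outliers)  # [512.33]
```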

## <a id="3.10">3.10 Cabin</a>
This feature is missing most of its values. Nevertheless, we use it for prediction by encoding it as a binary flag: 1 if a cabin is recorded, 0 otherwise.

In [None]:
# 1 if a cabin is recorded, 0 if the value is missing
train_prep["Cabin"] = train_prep["Cabin"].notnull().astype(int)
test_prep["Cabin"] = test_prep["Cabin"].notnull().astype(int)

## <a id="3.11">3.11 Embarked</a>

"Embarked" describes the port where the person embarked on the Titanic. Possibly there were a lot of passengers of the same class living in a certain region. Such accumulations could also be an indication of the person's wealth and survival.

In [None]:
fig = go.Figure()

groups = train_prep.groupby(["Embarked"]).count().reset_index()

data = go.Pie(
    labels = ["Cherbourg", "Queenstown", "Southampton"],
    values = groups.Fare
)

layout = go.Layout(
    title='No. of Passenger embarked per Port'
)

fig = go.Figure(data=[data], layout=layout)
iplot(fig)

In [None]:
fig = go.Figure()
x = ["Cherbourg", "Queenstown", "Southampton"]

groups = train_prep.groupby(["Embarked", "Survived"]).count().reset_index()

trace1 = go.Bar(
    x = x,
    y = [groups.Fare[1], groups.Fare[3], groups.Fare[5]],
    name = "Survived"
)

trace2 = go.Bar(
    x = x,
    y = [groups.Fare[0], groups.Fare[2], groups.Fare[4]],
    name = "Died"
)



data = [trace1, trace2]
layout = go.Layout(
    barmode='stack',
    title='Survivors/deaths by embarked port',
    xaxis=dict(
        title='Port',
        titlefont=dict(
            family='Courier New, monospace',
            size=18,
            color='#7f7f7f'
        )
    ),
    yaxis=dict(
        title='No. of people',
        titlefont=dict(
            family='Courier New, monospace',
            size=18,
            color='#7f7f7f'
        )
    )
)

fig = go.Figure(data=data, layout=layout)
iplot(fig)

Apparently, the "Embarked" feature has a minor impact on the classification. The plot shows that many of the people who boarded the Titanic in Southampton died. An in-depth analysis together with age and ticket price would certainly allow further statements, but this is not part of this analysis.

"Embarked" is a categorical feature, so we convert it with one-hot encoding.

In [None]:
train_prep = pd.get_dummies(train_prep, columns=["Embarked"], drop_first=True)
test_prep = pd.get_dummies(test_prep, columns=["Embarked"], drop_first=True)

# <a id="4">4. Classification and Submission</a> 

In this section we will classify the test data and prepare it for submission. First we will train and select a suitable model based on the training data.

However, before we start with the classification, we will briefly check the quality of the prepared data.

In [None]:
train_prep.info()

In [None]:
test_prep.info()

In [None]:
train_prep.head()

In [None]:
test_prep.head()

The following points still need to be improved:

* Missing values in the test data
* Normalization of training and test data

In [None]:
test_prep["Fare"] = test_prep["Fare"].fillna(test_prep["Fare"].mean())

scaler = MinMaxScaler()

train_prep[["SibSp", "Parch", "Fare", "Title"]] = scaler.fit_transform(train_prep[["SibSp", "Parch", "Fare", "Title"]])
# Reuse the scaling fitted on the training data so both sets share one scale
test_prep[["SibSp", "Parch", "Fare", "Title"]] = scaler.transform(test_prep[["SibSp", "Parch", "Fare", "Title"]])
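
Min-max scaling maps each column to the range seen while fitting. A common pattern is to fit the scaler on the training data only and reuse it for the test data, so that the same raw value gets the same scaled value in both sets. A small sketch with made-up numbers:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Invented single-column data standing in for a feature such as "Fare".
train_values = np.array([[0.0], [10.0], [20.0]])
test_values = np.array([[5.0], [40.0]])

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train_values)  # learns min=0, max=20
test_scaled = scaler.transform(test_values)        # reuses min and max

print(train_scaled.ravel())  # [0.  0.5 1. ]
print(test_scaled.ravel())   # [0.25 2.  ]
```

Test values outside the training range can end up outside [0, 1], as the 40 does here. That is expected and usually harmless.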

Now that we have finished optimizing, we can proceed with the selection of an appropriate model. For this, we try different classifiers on our dataset and see which fits best.

In [None]:
X_train = train_prep.drop(columns=["Survived"])
y_train = train_prep["Survived"]
X_test = test_prep

names = ["Nearest Neighbors", "Linear SVM", "RBF SVM", "Gaussian Process",
         "Decision Tree", "Random Forest", "Neural Net", "AdaBoost",
         "Naive Bayes", "QDA"
]

classifiers = [
    KNeighborsClassifier(),
    SVC(kernel="linear"),
    SVC(kernel="rbf"),
    GaussianProcessClassifier(),
    tree.DecisionTreeClassifier(max_depth=3),
    RandomForestClassifier(max_depth=3),
    MLPClassifier(),
    AdaBoostClassifier(),
    GaussianNB(),
    QuadraticDiscriminantAnalysis()
]

results = {}
for name, clf in zip(names, classifiers):
    scores = cross_val_score(clf, X_train, y_train, cv=5)
    results[name] = scores
    
for name, scores in results.items():
    print("%20s | Accuracy: %0.2f%% (+/- %0.2f%%)" % (name, 100*scores.mean(), 100*scores.std() * 2))
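
To read the printed numbers correctly: with `cv=5`, each classifier is trained five times on four fifths of the data and scored on the remaining fifth, so `scores` holds five accuracies. The sketch below shows this on a synthetic dataset from `make_classification` (not the Titanic data), with one classifier:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 200 samples, 5 features, binary target.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = DecisionTreeClassifier(max_depth=3, random_state=0)

# One accuracy value per fold.
scores = cross_val_score(clf, X, y, cv=5)

# Mean accuracy and a rough spread of two standard deviations.
print("Accuracy: %0.2f%% (+/- %0.2f%%)" % (100 * scores.mean(), 100 * scores.std() * 2))
```

A high mean with a small spread is preferable to a slightly higher mean that varies strongly between folds.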

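In this case, the Neural Net performs really well, so we will use it for our submission. For comparison, the cells below also write submission files for the Gaussian Process, Decision Tree, Random Forest and AdaBoost classifiers.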

In [None]:
nn = classifiers[3]
nn.fit(X_train, y_train)
predictions = nn.predict(X_test)

submission = pd.DataFrame({ 'PassengerId' : test["PassengerId"], 'Survived': predictions })
submission.to_csv('submission_gp.csv', index=False)

In [None]:
nn = classifiers[4]
nn.fit(X_train, y_train)
predictions = nn.predict(X_test)

submission = pd.DataFrame({ 'PassengerId' : test["PassengerId"], 'Survived': predictions })
submission.to_csv('submission_dt.csv', index=False)

In [None]:
nn = classifiers[5]
nn.fit(X_train, y_train)
predictions = nn.predict(X_test)

submission = pd.DataFrame({ 'PassengerId' : test["PassengerId"], 'Survived': predictions })
submission.to_csv('submission_rf.csv', index=False)

In [None]:
nn = classifiers[6]
nn.fit(X_train, y_train)
predictions = nn.predict(X_test)

submission = pd.DataFrame({ 'PassengerId' : test["PassengerId"], 'Survived': predictions })
submission.to_csv('submission_nn.csv', index=False)

In [None]:
nn = classifiers[7]
nn.fit(X_train, y_train)
predictions = nn.predict(X_test)

submission = pd.DataFrame({ 'PassengerId' : test["PassengerId"], 'Survived': predictions })
submission.to_csv('submission_ab.csv', index=False)

If you've come this far thank you for reading! If you have any feedback, I would be very happy to hear from you.