# Predict Titanic Survival
The RMS Titanic set sail on its maiden voyage in 1912, bound from Southampton, England, for New York City. The ship never completed the voyage: it sank to the bottom of the Atlantic Ocean after hitting an iceberg, and 1,502 of the 2,224 passengers and crew on board died.

In this project you will create a Logistic Regression model that predicts which passengers survived the sinking of the Titanic, based on features like age and class.

## Import libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

## Load the passenger data

In [None]:
passengers = pd.read_csv('../input/titanic/titanic_train.csv')
passengers.head()

## Clean the data
Given the saying “women and children first,” `Sex` and `Age` look like useful features for predicting survival. The model needs numbers, so map the text values in the `Sex` column to numeric codes: replace every `female` with `1` and every `male` with `0`.

In [None]:
# Encode Sex numerically: male -> 0, female -> 1
passengers['Sex'] = passengers['Sex'].map({'male': 0, 'female': 1})
passengers.head()
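Whichever way the labels are mapped, it is worth confirming that only the codes `0` and `1` remain. A minimal sketch on a hypothetical toy frame (not the real data), assuming the raw labels are exactly `'male'` and `'female'`; mapping through a dictionary turns any other label into `NaN`, which makes unexpected values easy to spot:

```python
import pandas as pd

# Hypothetical toy data standing in for the real passengers table.
toy = pd.DataFrame({'Sex': ['male', 'female', 'female', 'male']})

# Dictionary mapping: any label missing from the dict becomes NaN.
toy['Sex'] = toy['Sex'].map({'male': 0, 'female': 1})

# Sanity checks: no unmapped labels, and only the two expected codes.
assert toy['Sex'].notna().all()
assert set(toy['Sex'].unique()) <= {0, 1}
print(toy['Sex'].value_counts().sort_index())
```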

### Fill all the empty `Age` values in `passengers` with the mean age

In [None]:
# Mean of the non-missing ages (pandas skips NaN by default)
meanAge = passengers['Age'].mean()
print(meanAge)
# Replace every missing Age with that mean
passengers['Age'] = passengers['Age'].fillna(meanAge)
passengers.head()
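Filling with the mean of the full table lets the test rows help decide the value used during training, a mild form of data leakage. A minimal sketch of the stricter alternative, assuming scikit-learn's `SimpleImputer` and hypothetical toy `train`/`test` frames: learn the mean from the training split only, then apply that same value to the test split. Using this pattern in the notebook would mean moving the imputation after the train/test split.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy ages with missing values (not the real Titanic data).
train = pd.DataFrame({'Age': [22.0, np.nan, 38.0, 26.0]})
test = pd.DataFrame({'Age': [np.nan, 35.0]})

# Learn the mean from the training rows only...
imputer = SimpleImputer(strategy='mean')
train_filled = imputer.fit_transform(train)

# ...and reuse that same mean for the test rows.
test_filled = imputer.transform(test)

print(imputer.statistics_)  # mean of 22, 38 and 26
print(test_filled.ravel())
```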

Given the strict class system on board the Titanic, let’s use the `Pclass` column (the passenger class) as another feature. Create a new column named `FirstClass` that stores `1` for all passengers in first class and `0` for all other passengers.

Create a new column named `SecondClass` that stores `1` for all passengers in second class and `0` for all other passengers.

In [None]:
# 1 for first-class passengers, 0 for everyone else
passengers['FirstClass'] = (passengers['Pclass'] == 1).astype(int)

In [None]:
# 1 for second-class passengers, 0 for everyone else
passengers['SecondClass'] = (passengers['Pclass'] == 2).astype(int)
passengers.head(20)
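The two indicator columns are a one-hot encoding of `Pclass` with third class left out as the baseline: a passenger with `FirstClass = 0` and `SecondClass = 0` is in third class, so a third column would be redundant. A minimal sketch of the same encoding with `pd.get_dummies` on a hypothetical toy column; the `Class_1`/`Class_2` names come from the `prefix` argument and are not used elsewhere in the notebook.

```python
import pandas as pd

# Hypothetical toy passenger classes (1 = first, 2 = second, 3 = third).
toy = pd.DataFrame({'Pclass': [1, 3, 2, 3, 1]})

# One indicator column per class; dtype=int gives 0/1 instead of booleans.
dummies = pd.get_dummies(toy['Pclass'], prefix='Class', dtype=int)

# Keep only first and second class; third class is the implicit baseline.
toy['FirstClass'] = dummies['Class_1']
toy['SecondClass'] = dummies['Class_2']
print(toy)
```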

## Select and split the data

Now that we have cleaned our data, let’s select the columns we want to build our model on. Select columns `Sex`, `Age`, `FirstClass`, and `SecondClass` and store them in a variable named `features`. Select column `Survived` and store it in a variable named `survival`.

In [None]:
features = passengers[['Sex', 'Age', 'FirstClass', 'SecondClass']]
survival = passengers['Survived']

Split the data into training and test sets using scikit-learn’s `train_test_split()` function. The training set is used to fit the model, and the held-out test set is used to evaluate it on passengers the model has not seen.

In [None]:
train_features, test_features, survival_train, survival_test = train_test_split(features, survival)
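Called with no extra arguments, `train_test_split` holds out 25% of the rows and shuffles differently on every run, so the accuracies printed later will vary between runs. A sketch on hypothetical toy data showing `test_size`, `random_state` for a repeatable split, and `stratify` to keep the label ratio the same in both sets; the values `0.2` and `42` are arbitrary choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 20 rows of 4 features and a balanced 0/1 label.
X = np.arange(80, dtype=float).reshape(20, 4)
y = np.array([0, 1] * 10)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,    # hold out 20% of the rows for evaluation
    random_state=42,  # fixed seed: the same split on every run
    stratify=y,       # keep the 0/1 ratio equal in both splits
)

print(X_train.shape, X_test.shape)
print(y_test.mean())
```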

## Standardize the data

In [None]:
scaler = StandardScaler()
train_features = scaler.fit_transform(train_features)
test_features = scaler.transform(test_features)
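`StandardScaler` rescales each column to zero mean and unit variance using statistics learned from the training set only: `z = (x - mean_train) / std_train`. A small check on hypothetical toy numbers that the scaler matches the formula computed by hand (scikit-learn uses the population standard deviation, `ddof=0`, which is also NumPy's default):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy feature matrix, e.g. columns Age and Sex.
train = np.array([[20.0, 0.0], [30.0, 1.0], [40.0, 1.0], [50.0, 0.0]])
test = np.array([[35.0, 1.0]])

scaler = StandardScaler().fit(train)

# The same transformation by hand, using training statistics only.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population std (ddof=0), as StandardScaler uses
manual = (test - mean) / std

assert np.allclose(scaler.transform(test), manual)
print(scaler.mean_, scaler.scale_)
print(manual)
```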

## Create and evaluate the model

In [None]:
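model = LogisticRegression()
model.fit(train_features, survival_train)

# Accuracy (fraction of correct predictions) on each split
print('Train accuracy:', model.score(train_features, survival_train))
print('Test accuracy:', model.score(test_features, survival_test))

# One coefficient per feature, in the order of features.columns
print(dict(zip(features.columns, model.coef_[0])))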
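`model.score` reports accuracy, and `model.coef_` holds one weight per feature in the order `Sex`, `Age`, `FirstClass`, `SecondClass`, on the standardized scale. Logistic regression turns those weights into a survival probability with the sigmoid function, `p = 1 / (1 + exp(-(w·x + b)))`, and `predict` returns `1` when `p > 0.5`. A sketch on hypothetical, already-scaled toy data that recomputes `predict_proba` by hand from `coef_` and `intercept_`:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical, already-scaled toy data with two features.
X = np.array([[-1.0, 0.5], [-0.5, -1.0], [0.5, 1.0], [1.0, -0.5],
              [1.5, 0.0], [-1.5, 0.0]])
y = np.array([0, 0, 1, 1, 1, 0])

model = LogisticRegression().fit(X, y)

# Linear score w . x + b, then the sigmoid squashes it into (0, 1).
z = X @ model.coef_[0] + model.intercept_[0]
p_manual = 1.0 / (1.0 + np.exp(-z))

# Column 1 of predict_proba is the probability of class 1 (survived).
assert np.allclose(p_manual, model.predict_proba(X)[:, 1])

# predict() returns 1 exactly when the score z is positive (p > 0.5).
assert np.array_equal(model.predict(X), (z > 0).astype(int))
print(np.round(p_manual, 3))
```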

## Graph the data

Next, plot the age distributions of the passengers who survived and of those who died, split by sex. Because missing ages were filled with the mean, expect a spike at the mean age.

In [None]:
# Boolean indexing keeps every matching row. where(...).dropna() would also drop any
# row with a missing value in an unrelated column (for example Cabin).
survivors = passengers[passengers['Survived'] == 1]
survivedW = survivors.loc[survivors['Sex'] == 1, 'Age']
survivedM = survivors.loc[survivors['Sex'] == 0, 'Age']

dead = passengers[passengers['Survived'] == 0]
deadW = dead.loc[dead['Sex'] == 1, 'Age']
deadM = dead.loc[dead['Sex'] == 0, 'Age']

fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(15, 5))

# Same colors for women and men in both panels
ax0.hist(survivedW, alpha=0.5, color='#488f31', bins=25, label='Women')
ax0.hist(survivedM, alpha=0.5, color='#de425b', bins=25, label='Men')
ax0.set_title('Distribution of survivors')
ax0.legend()
ax0.set_ylabel('Frequency')
ax0.set_xlabel('Age')

ax1.hist(deadW, alpha=0.5, color='#488f31', bins=25, label='Women')
ax1.hist(deadM, alpha=0.5, color='#de425b', bins=25, label='Men')
ax1.set_title('Distribution of passengers who died')
ax1.legend()
ax1.set_xlabel('Age')
ax1.set_ylabel('Frequency')
plt.show()
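The histograms show counts, but the groups differ in size, so survival rates can be easier to compare. A minimal sketch using `groupby` on a hypothetical toy frame with the notebook's `Sex` (0/1), `Pclass`, and `Survived` columns; on the real data the same calls would run on `passengers`.

```python
import pandas as pd

# Hypothetical toy rows using the notebook's encodings (Sex: 0 = male, 1 = female).
toy = pd.DataFrame({
    'Sex':      [0, 0, 1, 1, 0, 1],
    'Pclass':   [3, 1, 1, 3, 3, 2],
    'Survived': [0, 1, 1, 0, 0, 1],
})

# The mean of a 0/1 column is the fraction of survivors in each group.
rate_by_sex = toy.groupby('Sex')['Survived'].mean()

# Rows: Sex, columns: Pclass; combinations with no passengers show as NaN.
rate_by_sex_class = toy.groupby(['Sex', 'Pclass'])['Survived'].mean().unstack()

print(rate_by_sex)
print(rate_by_sex_class)
```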

## Predict with the model
Let’s use the model to predict the fate of a few passengers. The next cell defines NumPy arrays for third-class passenger `Jack` and first-class passenger `Rose`. Each array stores four feature values, in the following order:

* `Sex`, represented by a `0` for `male` and `1` for `female`
* `Age`, in years
* `FirstClass`, with a `1` indicating the passenger is in first class
* `SecondClass`, with a `1` indicating the passenger is in second class

A third array, `You`, holds the features of a fictitious passenger; replace its values with your own information to see what the model predicts. Enter all values as floats (for example, `27.0`).
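Building these arrays by hand is easy to get wrong, for example by swapping the two class flags. A sketch of a hypothetical helper, `encode_passenger`, that turns readable inputs into the four values in the order listed above; the function name and its arguments are illustrative and not part of the original notebook.

```python
import numpy as np

def encode_passenger(sex, age, pclass):
    """Hypothetical helper: return [Sex, Age, FirstClass, SecondClass] as floats.

    sex: 'male' or 'female'; age: years; pclass: 1, 2, or 3.
    Third class is encoded as FirstClass = SecondClass = 0.
    """
    if sex not in ('male', 'female'):
        raise ValueError(f"unexpected sex: {sex!r}")
    if pclass not in (1, 2, 3):
        raise ValueError(f"unexpected pclass: {pclass!r}")
    return np.array([
        1.0 if sex == 'female' else 0.0,
        float(age),
        1.0 if pclass == 1 else 0.0,
        1.0 if pclass == 2 else 0.0,
    ])

# Reproduces the Jack and Rose arrays used in the notebook.
print(encode_passenger('male', 20, 3))    # Jack
print(encode_passenger('female', 17, 1))  # Rose
```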

In [None]:
Jack = np.array([0.0, 20.0, 0.0, 0.0])
Rose = np.array([1.0, 17.0, 1.0, 0.0])
You = np.array([1.0, 27.0, 1.0, 0.0])

In [None]:
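# The scaler was fitted on a DataFrame; passing the same column names avoids
# scikit-learn's "X does not have valid feature names" warning.
sample_Passengers = pd.DataFrame([Jack, Rose, You], columns=features.columns)
sample_Passengers_Scaled = scaler.transform(sample_Passengers)
print(model.predict(sample_Passengers_Scaled))  # 1 = survives, 0 = does not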