# Logistic Regression

## The goal of this machine learning project is to predict the likelyhood of survival for passengers on the Titanic.  At the end of the Jupyter Notebook, the fictional characters Jack, Rose, and myself are predicted for our chances of survival.

In [2]:
import seaborn
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

### Obtaining and cleaning the data:

In [3]:

# Load the passenger data
passengers = pd.read_csv('passengers.csv', header=0)
#print(passengers.head())

# Update sex column to numerical
passengers['sex-int'] = passengers['Sex'].apply(lambda row: 1 if row == 'female' else 0)

# Fill the nan values in the age column
passengers['Age'].fillna(value=passengers['Age'].mean(), inplace=True)

#print(passengers['Age'].values)

# Create a first class column
passengers['FirstClass'] = passengers['Pclass'].apply(lambda row: 1 if row == 1 else 0)

# Create a second class column
passengers['SecondClass'] = passengers['Pclass'].apply(lambda row: 1 if row == 2 else 0)
print(passengers.head())

# Select the desired features
features = passengers[['sex-int', 'Age', 'FirstClass', 'SecondClass']]
survival = passengers['Survived']


   PassengerId  Survived  Pclass  \
0            1         0       3   
1            2         1       1   
2            3         1       3   
3            4         1       1   
4            5         0       3   

                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1   
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                             Heikkinen, Miss. Laina  female  26.0      0   
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                           Allen, Mr. William Henry    male  35.0      0   

   Parch            Ticket     Fare Cabin Embarked  sex-int  FirstClass  \
0      0         A/5 21171   7.2500   NaN        S        0           0   
1      0          PC 17599  71.2833   C85        C        1           1   
2      0  STON/O2. 3101282   7.9250   NaN        S        1           0   
3      0            

### Training, splitting and regularizing the data:

In [4]:
# Perform train, test, split
X_train, X_test, y_train, y_test = train_test_split(features, survival, test_size=0.3)

# Scale the feature data so it has mean = 0 and standard deviation = 1
scaler = StandardScaler()
scaler.fit_transform(X_train)
scaler.transform(X_test)

# Create and train the model
model = LogisticRegression()
model.fit(X_train, y_train)

# Score the model on the train data
print("The model's accuracy on training data is: ", model.score(X_train, y_train))

# Score the model on the test data
print("The model's accuracy on testing data is: ", model.score(X_test, y_test))

# Analyze the coefficients
print("The coefficients of the model are: ", model.coef_)


The model's accuracy on training data is:  0.7913322632423756
The model's accuracy on testing data is:  0.8097014925373134
The coefficients of the model are:  [[ 2.36345342 -0.02788825  2.23359203  1.13126195]]


### The sample passenger's data is made into an array, regularized, and fit into the model.

In [5]:
# Sample passenger features
Jack = np.array([0.0,20.0,0.0,0.0])
Rose = np.array([1.0,17.0,1.0,0.0])
Justin  = np.array([0.0, 23.0, 1.0, 0.0]) #Me

# Combine passenger arrays
sample_passengers = np.array([Jack, Rose, Justin])

# Scale the sample passenger features
scaler.transform(sample_passengers)
#print(sample_passengers)

# Make survival predictions!
print(model.predict(sample_passengers))
# Jack is the only sample passenger predicted to die with the Titanic

print(model.predict_proba(sample_passengers))

[0 1 1]
[[0.87828037 0.12171963]
 [0.06271043 0.93728957]
 [0.45668841 0.54331159]]
