# Titanic Modeling: Basic Version - EXPLORATION
**By Jason "Scott" Person**<br/>
**Data Analytics @ Newman University**

**Data:** A previously cleaned version of [the Titanic data set from Kaggle](https://www.kaggle.com/c/titanic/overview).

**This Notebook:** A demonstration of a standard machine learning training and testing process.

**Contents:**
1. Read and Review Data
2. Prepare Data Splits
3. Train Models
4. Test Models

In [0]:
# Essential Libraries
import numpy as np
import pandas as pd

# Libraries for Machine Learning Process
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Algorithms
from sklearn.linear_model import LogisticRegression

# 1. Read and Review Data

The data was cleaned in a previous EDA and preparation step.

In [0]:
# Read cleaned version of the data
df = pd.read_csv('titanic_cleaned.csv')
df.head(10)

In [0]:
# Basic DataFrame info: column names, dtypes, and non-null counts
df.info()
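
`LogisticRegression` needs numeric features with no missing values, so it is worth confirming both before splitting. The sketch below shows one way to check; the `toy` frame and its values are hypothetical stand-ins, not rows from `titanic_cleaned.csv`.

```python
import pandas as pd

# Hypothetical toy frame (NOT the real cleaned data) with one gap and one text column
toy = pd.DataFrame({
    'Age': [22.0, 38.0, None],
    'Sex': ['male', 'female', 'female'],
    'Survived': [0, 1, 1],
})

# Count missing values per column
missing_counts = toy.isna().sum()

# List columns that are not numeric and would need encoding before modeling
non_numeric = toy.select_dtypes(exclude='number').columns.tolist()

print(missing_counts)
print(f'Non-numeric columns: {non_numeric}')
```

On a properly cleaned data set, every missing count should be zero and the list of non-numeric columns should be empty.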

# 2. Prepare Data Splits

In [0]:
# features — all columns except target variable
features = df.drop('Survived', axis=1)

# labels — only the target variable column
labels = df['Survived']

In [0]:
features

In [0]:
labels.head(20)

In [0]:
# Create Train and Test Splits
X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2, random_state=42)

# Report Number and Proportion of Train and Test Features and Labels
print(f'Train Split: {X_train.shape[0]} Records, {len(y_train)} Labels = {len(y_train) / len(labels):.2%}')
print(f'Test Split: {X_test.shape[0]} Records, {len(y_test)} Labels = {len(y_test) / len(labels):.2%}')
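
The split above is purely random, so the share of survivors can differ slightly between the train and test sets. One optional variation, which this notebook does not use, is to pass `stratify` to `train_test_split` so that both splits keep the overall class proportions. The sketch below uses hypothetical toy data and hypothetical variable names.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the cleaned Titanic frame (100 rows, 40% survivors)
toy = pd.DataFrame({
    'Pclass': [1, 2, 3, 3, 1, 2, 3, 3, 1, 2] * 10,
    'Fare': [80.0, 20.0, 8.0, 7.5, 90.0, 15.0, 7.0, 9.0, 60.0, 13.0] * 10,
    'Survived': [1, 0, 0, 0, 1, 1, 0, 0, 1, 0] * 10,
})
toy_features = toy.drop('Survived', axis=1)
toy_labels = toy['Survived']

# stratify=toy_labels keeps the survivor share the same in both splits
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    toy_features, toy_labels, test_size=0.2, random_state=42, stratify=toy_labels
)

print(f'Overall survival rate: {toy_labels.mean():.2%}')
print(f'Train survival rate:   {ys_train.mean():.2%}')
print(f'Test survival rate:    {ys_test.mean():.2%}')
```

Stratifying matters most when one class is rare or the data set is small, since a random split can then leave the test set unrepresentative.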

In [0]:
X_train

In [0]:
y_train

In [0]:
y_test

In [0]:
X_test

# 3. Train Models

In [0]:
# Define the model (max_iter raised from the default of 100 to help the solver converge)
model = LogisticRegression(max_iter=1000)

# Train the model using the training features and labels
model.fit(X_train, y_train)

# Report trained model
print(f'Trained and ready: {model}')
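
Once fitted, a `LogisticRegression` model exposes its learned weights through `coef_` and `intercept_`, which gives a rough view of how each feature pushes the prediction. The sketch below is illustrative only: the feature names `Sex_female` and `Pclass` and all values are hypothetical, and the real cleaned data may use different columns.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical toy features and labels (NOT the real cleaned data)
toy_features = pd.DataFrame({
    'Sex_female': [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    'Pclass':     [1, 3, 2, 3, 1, 2, 3, 3, 2, 1, 1, 3],
})
toy_labels = pd.Series([1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0], name='Survived')

toy_model = LogisticRegression(max_iter=1000)
toy_model.fit(toy_features, toy_labels)

# coef_ has shape (1, n_features) for a binary problem; pair each weight with its column
coef_table = pd.Series(toy_model.coef_[0], index=toy_features.columns).sort_values()

print(coef_table)
print(f'Intercept: {toy_model.intercept_[0]:.4f}')
```

A positive coefficient raises the predicted log-odds of survival and a negative one lowers them. Magnitudes are only comparable across features that share a similar scale.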

# 4. Test Models

In [0]:
# Use the model to generate predictions for the Test split, based on its features only
y_pred = model.predict(X_test)

# Compare model's predictive performance to the provided test labels
score = accuracy_score(y_test, y_pred) * 100

# Report the model and its accuracy as a percentage
print(model)
print(f'  Accuracy: {score:.2f}%')
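
Accuracy alone does not show which kinds of errors the model makes. A confusion matrix separates false positives from false negatives, and precision and recall summarize them. The sketch below uses small hypothetical label lists rather than this notebook's actual predictions.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hypothetical true labels and predictions (NOT results from this notebook)
toy_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
toy_pred = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(toy_true, toy_pred)
tn, fp, fn, tp = cm.ravel()

acc = accuracy_score(toy_true, toy_pred)
precision = precision_score(toy_true, toy_pred)
recall = recall_score(toy_true, toy_pred)

print(cm)
print(f'TN={tn}, FP={fp}, FN={fn}, TP={tp}')
print(f'Accuracy: {acc:.2%}, Precision: {precision:.2%}, Recall: {recall:.2%}')
```

To apply the same check here, pass `y_test` and `y_pred` in place of the toy lists.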

In [0]:
round(accuracy_score(y_test, y_pred) * 100, 4)

In [0]:
y_pred

In [0]:
y_test
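
Viewing `y_pred` and `y_test` in separate cells makes mismatches hard to spot, partly because `y_pred` is a plain NumPy array while `y_test` keeps the original row index. One possible approach is to combine them into a single frame. The sketch below uses hypothetical values and index labels.

```python
import numpy as np
import pandas as pd

# Hypothetical test labels (with original row indices) and predictions
toy_y_test = pd.Series([0, 1, 1, 0, 1], index=[709, 439, 840, 720, 39], name='Survived')
toy_y_pred = np.array([0, 1, 0, 0, 1])

# Reuse the test index so each prediction lines up with its original record
comparison = pd.DataFrame(
    {'actual': toy_y_test.values, 'predicted': toy_y_pred},
    index=toy_y_test.index,
)
comparison['correct'] = comparison['actual'] == comparison['predicted']

# Keep only the rows the model got wrong
misses = comparison[~comparison['correct']]

print(comparison)
print(f'Misclassified rows: {list(misses.index)}')
```

Because the frame keeps the original index, the misclassified rows can be looked up in `X_test` with `.loc`.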