<h1 style="color:red">Titanic Passenger Survival Prediction System – Machine Learning Cycle</h1>

# Machine Learning Cycle

### The four phases of a Machine Learning Cycle are:

### Training Phase

    Build the Model using Training Data

### Testing Phase

     Evaluate the performance of Model using Testing Data

### Application Phase

     Deploy the Model in the real world to make predictions on real-time, unseen Data

### Feedback Phase

    Take Feedback from the Users and Domain Experts to improve the Model


<h1 style="color:red">Executing Machine Learning Cycle Using a Single File</h1>

# Step 1: Import Libraries

In [3]:
pip install numpy pandas scikit-learn prettytable astropy


Note: you may need to restart the kernel to use updated packages.


In [1]:
# Import Libraries

import numpy as np
import pandas as pd
import pickle

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn import svm
from sklearn.metrics import accuracy_score

from prettytable import PrettyTable   
from astropy.table import Table, Column

# Step 2: Load Sample Data

In [2]:
# Load Sample Data

sample_data = pd.read_csv("sample-data.csv")

print("\n\nSample Data:")
print("============\n")
pd.set_option("display.max_rows", None, "display.max_columns", None)
print(sample_data)



Sample Data:

    PClass  Gender Sibling     Embarked Survived
0    Third    Male     One  Southampton       No
1   Second  Female    Zero  Southampton      Yes
2    Third    Male    Zero  Southampton       No
3    Third  Female   Three  Southampton      Yes
4    Third    Male    Zero   Queenstown       No
5    First  Female   Three  Southampton      Yes
6    Third    Male    Zero  Southampton       No
7    Third    Male    Zero  Southampton      Yes
8    First    Male    Zero  Southampton       No
9   Second    Male    Zero  Southampton      Yes
10   Third    Male     One   Queenstown       No
11   First    Male    Zero    Cherbourg      Yes
12   First    Male    Zero    Cherbourg       No
13  Second  Female    Zero  Southampton      Yes
14  Second    Male    Zero  Southampton       No
15   Third    Male     One    Cherbourg      Yes
16   Third    Male     Two    Cherbourg       No
17   Third    Male    Zero  Southampton      Yes
18   Third    Male    Zero  Southampton       No
19  

# Step 3: Understand and Pre-process Sample Data

## Step 3.1: Understand Sample Data

In [3]:
# Understand Sample Data

print("\n\nAttributes in Sample Data:")
print("==========================\n")

print(sample_data.columns)

print("\n\nNumber of Instances in Sample Data:",sample_data["PClass"].count())
print("========================================\n")



Attributes in Sample Data:

Index(['PClass', 'Gender', 'Sibling', 'Embarked', 'Survived'], dtype='object')


Number of Instances in Sample Data: 100



## Step 3.2: Pre-process Sample Data
- Sample Data is already preprocessed
- No preprocessing needs to be performed
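
The claim that the data is already preprocessed can be checked rather than assumed. The sketch below is one possible check, not part of the original notebook: the helper `find_preprocessing_problems` and the `EXPECTED_VALUES` mapping are hypothetical names, and the allowed values are taken from the label lists used later in Step 5.1. It reports missing columns, missing values, and values outside the expected set.

```python
import pandas as pd

# Hypothetical helper and mapping: not part of the original notebook.
# Allowed values are copied from the label lists in Step 5.1.
EXPECTED_VALUES = {
    "PClass": {"First", "Second", "Third"},
    "Gender": {"Male", "Female"},
    "Sibling": {"Zero", "One", "Two", "Three"},
    "Embarked": {"Southampton", "Cherbourg", "Queenstown"},
    "Survived": {"Yes", "No"},
}


def find_preprocessing_problems(frame, expected=EXPECTED_VALUES):
    """Return a list of problems; an empty list means the data looks clean."""
    problems = []
    for column, allowed in expected.items():
        if column not in frame.columns:
            problems.append(f"missing column: {column}")
            continue
        missing = int(frame[column].isna().sum())
        if missing:
            problems.append(f"{column}: {missing} missing value(s)")
        unexpected = set(frame[column].dropna().unique()) - allowed
        if unexpected:
            problems.append(f"{column}: unexpected value(s) {sorted(unexpected)}")
    return problems


# Small illustrative frame: the first three rows shown in Step 2.
demo = pd.DataFrame({
    "PClass": ["Third", "Second", "Third"],
    "Gender": ["Male", "Female", "Male"],
    "Sibling": ["One", "Zero", "Zero"],
    "Embarked": ["Southampton", "Southampton", "Southampton"],
    "Survived": ["No", "Yes", "No"],
})
print(find_preprocessing_problems(demo))  # [] when nothing is wrong
```

In the notebook itself, the same helper could be called on `sample_data` right after loading it.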

# Step 4: Feature Extraction
- Features are already extracted
- No Feature Extraction needs to be performed

# Step 5: Label Encoding the Sample Data (Input and Output are converted into Numeric Representation)

## Step 5.1: Train the Label Encoder

In [4]:
# Train the Label Encoder
# Labels

pclass = pd.DataFrame({"PClass":["First","Second","Third"]})
gender = pd.DataFrame({"Gender":["Male","Female"]})
sibling = pd.DataFrame({"Sibling":["Zero","One","Two","Three"]})
embarked = pd.DataFrame({"Embarked":["Southampton","Cherbourg","Queenstown"]})
survived = pd.DataFrame({"Survived":["Yes","No"]})

# Initialize the Label Encoders 

pclass_label_encoder = LabelEncoder()
gender_label_encoder = LabelEncoder()
sibling_label_encoder = LabelEncoder()
embarked_label_encoder = LabelEncoder()
survived_label_encoder = LabelEncoder()

# Train the Label Encoders

pclass_label_encoder.fit(np.ravel(pclass))
gender_label_encoder.fit(np.ravel(gender))
sibling_label_encoder.fit(np.ravel(sibling))
embarked_label_encoder.fit(np.ravel(embarked))
survived_label_encoder.fit(np.ravel(survived))
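
A fitted `LabelEncoder` assigns integer codes in sorted (alphabetical) order of the labels, not in the order they were listed. The short sketch below illustrates this for the Sibling labels; the variable names are only for illustration. It explains why "One" becomes 0 and "Zero" becomes 3 in the encoded data that follows.

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative encoder, fitted on the same labels as sibling_label_encoder.
sibling_encoder = LabelEncoder()
sibling_encoder.fit(["Zero", "One", "Two", "Three"])

# classes_ holds the labels in sorted order; the position is the integer code.
sibling_codes = {str(label): code for code, label in enumerate(sibling_encoder.classes_)}
print(sibling_codes)  # {'One': 0, 'Three': 1, 'Two': 2, 'Zero': 3}

# inverse_transform maps codes back to the original labels.
print(list(sibling_encoder.inverse_transform([3, 0])))
```

Because the codes follow alphabetical order, they do not reflect the natural order Zero < One < Two < Three.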

## Step 5.2: Label Encode the Output

In [5]:
# Label Encoding of the Output

sample_data_encoded_output = sample_data.copy()
original_sample_data = sample_data.copy()

# Transform the Output into Numerical Representation

print("\n\nSurvived Attribute After Label Encoding:")
print("========================================\n")
sample_data["encoded_survived"] = survived_label_encoder.transform(sample_data['Survived'])
print(sample_data[["Survived", "encoded_survived"]])

# Print Original and Encoded Output Sample Data

sample_data_encoded_output[['PClass', 'Gender', 'Sibling', 'Embarked', 'Survived']] = sample_data[['PClass', 'Gender', 'Sibling', 'Embarked', 'encoded_survived']]
pd.set_option("display.max_rows", None, "display.max_columns", None)
print("\n\nOriginal Sample Data:")
print("=====================\n")
pd.set_option("display.max_rows", None, "display.max_columns", None)
print(original_sample_data)
print("\n\nSample Data after Label Encoding of Output:")
print("===========================================\n")
pd.set_option("display.max_rows", None, "display.max_columns", None)
print(sample_data_encoded_output)

# Save the Transformed Features into CSV File 

sample_data_encoded_output.to_csv(r'sample-data-encoded-output.csv', index = False, header = True)



Survived Attribute After Label Encoding:

   Survived  encoded_survived
0        No                 0
1       Yes                 1
2        No                 0
3       Yes                 1
4        No                 0
5       Yes                 1
6        No                 0
7       Yes                 1
8        No                 0
9       Yes                 1
10       No                 0
11      Yes                 1
12       No                 0
13      Yes                 1
14       No                 0
15      Yes                 1
16       No                 0
17      Yes                 1
18       No                 0
19      Yes                 1
20       No                 0
21      Yes                 1
22       No                 0
23      Yes                 1
24       No                 0
25      Yes                 1
26       No                 0
27      Yes                 1
28       No                 0
29      Yes                 1
30       No               

## Step 5.3: Label Encode the Input

In [6]:
# Label Encoding of the Input

sample_data_encoded = sample_data_encoded_output.copy()
sample_data_encoded_output_original = sample_data_encoded_output.copy()

# Transform Input Attributes into Numerical Representation

print("\n\nPClass Attribute After Label Encoding:")
print("======================================\n")
sample_data_encoded_output["encoded_pclass"] = pclass_label_encoder.transform(sample_data_encoded_output['PClass'])
pd.set_option("display.max_rows", None, "display.max_columns", None)
print(sample_data_encoded_output[["PClass", "encoded_pclass"]])

print("\n\nGender Attribute After Label Encoding:")
print("======================================\n")
sample_data_encoded_output["encoded_gender"] = gender_label_encoder.transform(sample_data_encoded_output['Gender'])
pd.set_option("display.max_rows", None, "display.max_columns", None)
print(sample_data_encoded_output[["Gender", "encoded_gender"]])

print("\n\nSibling Attribute After Label Encoding:")
print("=======================================\n")
sample_data_encoded_output["encoded_sibling"] = sibling_label_encoder.transform(sample_data_encoded_output['Sibling'])
pd.set_option("display.max_rows", None, "display.max_columns", None)
print(sample_data_encoded_output[["Sibling", "encoded_sibling"]])

print("\n\nEmbarked Attribute After Label Encoding:")
print("========================================\n")
sample_data_encoded_output["encoded_embarked"] = embarked_label_encoder.transform(sample_data_encoded_output['Embarked'])
pd.set_option("display.max_rows", None, "display.max_columns", None)
print(sample_data_encoded_output[["Embarked", "encoded_embarked"]])

# Print Original and Encoded Sample Data

sample_data_encoded[['PClass', 'Gender', 'Sibling', 'Embarked', 'Survived']] = sample_data_encoded_output[['encoded_pclass', 'encoded_gender', 'encoded_sibling', 'encoded_embarked', 'Survived']]
print("\n\nOriginal Sample Data:")
print("=====================\n")
pd.set_option("display.max_rows", None, "display.max_columns", None)
print(original_sample_data)
print("\n\nSample Data after Label Encoding:")
print("=================================\n")
pd.set_option("display.max_rows", None, "display.max_columns", None)
print(sample_data_encoded)

# Save the Transformed Features into CSV File 

sample_data_encoded.to_csv(r'sample-data-encoded.csv', index = False, header = True)



PClass Attribute After Label Encoding:

    PClass  encoded_pclass
0    Third               2
1   Second               1
2    Third               2
3    Third               2
4    Third               2
5    First               0
6    Third               2
7    Third               2
8    First               0
9   Second               1
10   Third               2
11   First               0
12   First               0
13  Second               1
14  Second               1
15   Third               2
16   Third               2
17   Third               2
18   Third               2
19   Third               2
20   Third               2
21   First               0
22   First               0
23  Second               1
24   Third               2
25   Third               2
26   Third               2
27   Third               2
28   Third               2
29   Third               2
30   Third               2
31   First               0
32   Third               2
33   First               0
34   First   
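
As an alternative to one `LabelEncoder` per column, scikit-learn's `OrdinalEncoder` can encode all input columns at once with an explicitly chosen category order. This is a different technique from the one used in this notebook, shown here only as a sketch: it produces different integer codes (for example Zero = 0, One = 1), so a model would have to be trained on data encoded the same way.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

feature_columns = ["PClass", "Gender", "Sibling", "Embarked"]

# One list per column, in the order the codes 0, 1, 2, ... should be assigned.
category_order = [
    ["First", "Second", "Third"],
    ["Male", "Female"],
    ["Zero", "One", "Two", "Three"],
    ["Southampton", "Cherbourg", "Queenstown"],
]
ordinal_encoder = OrdinalEncoder(categories=category_order, dtype=int)

# Two illustrative passengers.
demo = pd.DataFrame(
    [["Third", "Male", "One", "Southampton"],
     ["First", "Female", "Three", "Southampton"]],
    columns=feature_columns,
)
encoded = ordinal_encoder.fit_transform(demo)
print(encoded.tolist())  # [[2, 0, 1, 0], [0, 1, 3, 0]]
```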

# Step 6: Execute the Training Phase 

## Step 6.1: Splitting Sample Data into Training Data and Testing Data

In [7]:
# Splitting Sample Data into Training Data and Testing Data

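# shuffle=False keeps the original row order: the first 80% of rows become Training Data
# and the last 20% become Testing Data (random_state has no effect without shuffling)
training_data_encoded, testing_data_encoded = train_test_split(sample_data_encoded, test_size=0.2, random_state=0, shuffle=False)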

# Save the Training and Testing Data into CSV File 

training_data_encoded.to_csv(r'training-data-encoded.csv', index = False, header = True)
testing_data_encoded.to_csv(r'testing-data-encoded.csv', index = False, header = True)

# print Training and Testing Data

print("\n\nTraining Data:")
print("==============\n")
pd.set_option("display.max_rows", None, "display.max_columns", None)
print(training_data_encoded)
print("\n\nTesting Data:")
print("==============\n")
pd.set_option("display.max_rows", None, "display.max_columns", None)
print(testing_data_encoded)



Training Data:

    PClass  Gender  Sibling  Embarked  Survived
0        2       1        0         2         0
1        1       0        3         2         1
2        2       1        3         2         0
3        2       0        1         2         1
4        2       1        3         1         0
5        0       0        1         2         1
6        2       1        3         2         0
7        2       1        3         2         1
8        0       1        3         2         0
9        1       1        3         2         1
10       2       1        0         1         0
11       0       1        3         0         1
12       0       1        3         0         0
13       1       0        3         2         1
14       1       1        3         2         0
15       2       1        0         0         1
16       2       1        2         0         0
17       2       1        3         2         1
18       2       1        3         2         0
19       2       0    
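
The effect of `shuffle=False` can be seen on a small stand-in frame. The sketch below uses made-up data (10 rows, alternating labels), not the notebook's Sample Data. It also shows a stratified, shuffled split as one possible alternative when the rows are not already in a suitable order.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Illustrative stand-in for sample_data_encoded: 10 rows with alternating labels.
demo = pd.DataFrame({"feature": range(10), "Survived": [0, 1] * 5})

# shuffle=False: the first 80% of rows are used for training, the last 20% for testing.
train_seq, test_seq = train_test_split(demo, test_size=0.2, shuffle=False)
print(list(train_seq.index), list(test_seq.index))  # [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]

# Alternative: shuffle the rows but keep the class ratio equal in both parts.
train_strat, test_strat = train_test_split(
    demo, test_size=0.2, random_state=0, stratify=demo["Survived"]
)
print(sorted(test_strat["Survived"]))  # one row of each class
```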

## Step 6.2: Splitting Input Vectors and Outputs / Labels of Training Data

In [8]:
# Splitting Input Vectors and Outputs / Labels of Training Data


print("\n\nInput Vectors (Feature Vectors) of Training Data:")
print("=================================================\n")
input_vector_train = training_data_encoded.iloc[:, :-1]
print(input_vector_train)

print("\n\nOutputs/Labels of Training Data:")
print("================================\n")
print("  Survived")
output_label_train = training_data_encoded.iloc[:, -1]
print(output_label_train)



Input Vectors (Feature Vectors) of Training Data:

    PClass  Gender  Sibling  Embarked
0        2       1        0         2
1        1       0        3         2
2        2       1        3         2
3        2       0        1         2
4        2       1        3         1
5        0       0        1         2
6        2       1        3         2
7        2       1        3         2
8        0       1        3         2
9        1       1        3         2
10       2       1        0         1
11       0       1        3         0
12       0       1        3         0
13       1       0        3         2
14       1       1        3         2
15       2       1        0         0
16       2       1        2         0
17       2       1        3         2
18       2       1        3         2
19       2       0        0         0
20       2       1        3         0
21       0       0        3         2
22       0       1        3         0
23       1       0        3       

## Step 6.3: Train the Support Vector Classifier

In [9]:
# Train the Support Vector Classifier

print("\n\nTraining the Support Vector Classifier on Training Data")
print("========================================================\n")
print("\nParameters and their values:")
print("============================\n")
svc_model = svm.SVC(gamma='auto',random_state=0)
svc_model.fit(input_vector_train,np.ravel(output_label_train))
print(svc_model)



Training the Support Vector Classifier on Training Data


Parameters and their values:

SVC(gamma='auto', random_state=0)
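
With no other arguments, `svm.SVC` uses the RBF kernel with `C=1.0`, and `gamma='auto'` sets gamma to 1 / (number of features). The sketch below, using small made-up feature vectors rather than the notebook's Training Data, checks that spelling these values out explicitly gives the same model behaviour.

```python
import numpy as np
from sklearn import svm

# Illustrative encoded feature vectors (4 features each) and labels;
# these are made-up values, not the notebook's Training Data.
X = np.array([[2, 1, 0, 2], [1, 0, 3, 2], [2, 1, 3, 2], [2, 0, 1, 2],
              [0, 0, 1, 2], [0, 1, 3, 0], [1, 1, 3, 2], [2, 0, 0, 1]])
y = np.array([0, 1, 0, 1, 1, 0, 0, 1])

auto_model = svm.SVC(gamma="auto", random_state=0).fit(X, y)

# Same settings written out: RBF kernel, C=1.0, gamma = 1 / n_features = 0.25.
explicit_model = svm.SVC(kernel="rbf", C=1.0, gamma=1.0 / X.shape[1], random_state=0).fit(X, y)

same_scores = np.allclose(auto_model.decision_function(X), explicit_model.decision_function(X))
print(same_scores)
```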


## Step 6.4: Save the Trained Model

In [10]:
# Save the Trained Model

# Save the Model in a Pkl File

# The with-block closes the file even if pickling fails
with open('svc_trained_model.pkl', 'wb') as model_file:
    pickle.dump(svc_model, model_file)
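
The saved Model is only usable together with the Label Encoders that produced its inputs. One option, sketched below under the assumption that a single bundle file is acceptable, is to pickle the Model and the encoders together in one dictionary. The file name `svc_bundle.pkl`, the `bundle` layout, and the toy model are illustrative, not part of the original notebook.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn import svm
from sklearn.preprocessing import LabelEncoder

# Illustrative stand-ins for the notebook's fitted objects.
gender_encoder = LabelEncoder().fit(["Male", "Female"])
toy_model = svm.SVC(gamma="auto", random_state=0).fit([[0], [1], [0], [1]], [1, 0, 1, 0])

bundle = {"model": toy_model, "encoders": {"Gender": gender_encoder}}

# Hypothetical file name, written to a temporary directory for this demo.
bundle_path = Path(tempfile.mkdtemp()) / "svc_bundle.pkl"
with open(bundle_path, "wb") as bundle_file:
    pickle.dump(bundle, bundle_file)

# Later (for example in the Application Phase): load both parts back together.
with open(bundle_path, "rb") as bundle_file:
    restored = pickle.load(bundle_file)

code = restored["encoders"]["Gender"].transform(["Female"])
print(restored["model"].predict([code]))
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.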

# Step 7: Execute the Testing Phase 

## Step 7.1: Splitting Input Vectors and Outputs/Labels of Testing Data

In [11]:
# Splitting Input Vectors and Outputs/Labels of Testing Data
print("\n\nInput Vectors (Feature Vectors) of Testing Data:")
print("================================================\n")
input_vector_test = testing_data_encoded.iloc[:, :-1]
print(input_vector_test)

print("\n\nOutputs/Labels of Testing Data:")
print("===============================\n")
print("  Survived")
output_label_test = testing_data_encoded.iloc[:, -1]
print(output_label_test)



Input Vectors (Feature Vectors) of Testing Data:

    PClass  Gender  Sibling  Embarked
80       2       1        2         2
81       2       0        3         2
82       2       1        3         2
83       0       1        0         2
84       1       1        3         2
85       2       0        0         1
86       1       1        3         2
87       0       0        0         2
88       2       1        3         2
89       1       0        3         2
90       1       1        3         2
91       1       0        3         2
92       2       1        3         0
93       0       1        0         2
94       0       1        0         2
95       2       1        0         0
96       2       1        0         1
97       2       1        0         2
98       1       1        0         2
99       2       1        0         2


Outputs/Labels of Testing Data:

  Survived
80    0
81    1
82    0
83    1
84    0
85    1
86    0
87    1
88    0
89    1
90    0
91    1
92    0

## Step 7.2: Load the Saved Model

In [12]:
# Load the Saved Model

with open('svc_trained_model.pkl', 'rb') as model_file:
    model = pickle.load(model_file)

## Step 7.3: Evaluate the Machine Learning Model
### Step 7.3.1: Make Predictions with the Trained Model on Testing Data

In [13]:
# Evaluate the Machine Learning Model

# Provide Test data to the Trained Model

model_predictions = model.predict(input_vector_test)
# Work on an independent copy so that adding a column does not touch a slice of sample_data_encoded
testing_data_encoded = testing_data_encoded.copy(deep=True)
testing_data_encoded["Predictions"] = model_predictions

# Save the Predictions into CSV File

testing_data_encoded.to_csv(r'model-predictions.csv', index = False, header = True)

model_predictions = testing_data_encoded 
print("\n\nPredictions Returned by svc_trained_model:")
print("==========================================\n")
print(model_predictions)



Predictions Returned by svc_trained_model:

    PClass  Gender  Sibling  Embarked  Survived  Predictions
80       2       1        2         2         0            0
81       2       0        3         2         1            1
82       2       1        3         2         0            0
83       0       1        0         2         1            0
84       1       1        3         2         0            0
85       2       0        0         1         1            1
86       1       1        3         2         0            0
87       0       0        0         2         1            1
88       2       1        3         2         0            0
89       1       0        3         2         1            1
90       1       1        3         2         0            0
91       1       0        3         2         1            1
92       2       1        3         0         0            0
93       0       1        0         2         1            0
94       0       1        0         2  

## Step 7.4: Calculate the Accuracy Score

In [14]:
# Calculate the Accuracy Score (fraction of Testing Data instances predicted correctly)

model_accuracy_score = accuracy_score(model_predictions["Survived"],model_predictions["Predictions"])

print("\n\nAccuracy Score:")
print("===============\n")
print(round(model_accuracy_score,2))



Accuracy Score:

0.8
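
The Accuracy Score is the number of correct predictions divided by the total number of predictions. The sketch below computes it by hand and with `accuracy_score` on made-up labels (these are not the notebook's results), and adds a confusion matrix to show which class the errors fall in.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Illustrative labels and predictions; these are not the notebook's results.
actual = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])
predicted = np.array([0, 1, 0, 0, 0, 1, 0, 1, 0, 0])

# Accuracy by hand: correct predictions / all predictions (8 of 10 here).
manual_accuracy = (actual == predicted).mean()
print(manual_accuracy, accuracy_score(actual, predicted))

# Rows are actual classes (0, 1); columns are predicted classes (0, 1).
matrix = confusion_matrix(actual, predicted)
print(matrix)
```

In this made-up example every error is a passenger who survived (1) but was predicted as not survived (0), which a single accuracy number does not reveal.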


# Step 8: Execute the Application Phase

## Step 8.1: Take Input from User

In [16]:
# Take Input from User

pclass_input = input("\nPlease enter PClass here (First,Second,Third) : ").strip()
gender_input = input("\nPlease enter your Gender here (Male, Female) : ").strip()
sibling_input = input("\nPlease enter your Sibling here (Zero,One,Two,Three) : ").strip()
embarked_input = input("\nPlease enter Embarked here (Cherbourg,Southampton,Queenstown) : ").strip()


Please enter PClass here (First,Second,Third) : Second

Please enter your Gender here (Male, Female) : Male

Please enter your Sibling here (Zero,One,Two,Three) : One

Please enter Embarked here (Cherbourg,Southampton,Queenstown) : Southampton
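
The Label Encoders only accept the exact labels they were trained on, so a reply such as "second" or "2nd" would fail at the encoding step. A minimal sketch of input validation is shown below; the helpers `normalise_choice` and `ask_choice` are hypothetical names, not part of the original notebook. The demo feeds scripted replies instead of reading from the keyboard.

```python
def normalise_choice(raw_text, allowed):
    """Return the allowed value matching raw_text (ignoring case and outer spaces), else None."""
    lookup = {value.lower(): value for value in allowed}
    return lookup.get(raw_text.strip().lower())


def ask_choice(prompt, allowed, read=input):
    """Keep asking until the reply matches one of the allowed values."""
    while True:
        reply = read(f"\n{prompt} ({', '.join(allowed)}) : ")
        choice = normalise_choice(reply, allowed)
        if choice is not None:
            return choice
        print("Please enter one of:", ", ".join(allowed))


# Demo with scripted replies: the first is rejected, the second is accepted.
replies = iter(["2nd", " second "])
pclass_choice = ask_choice(
    "Please enter PClass here", ["First", "Second", "Third"], read=lambda _prompt: next(replies)
)
print(pclass_choice)
```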


## Step 8.2: Convert User Input into Feature Vector (Exactly Same as Feature Vectors of Sample Data)

In [17]:
# Convert User Input into Feature Vector

user_input = pd.DataFrame({ 'PClass': [pclass_input],'Gender': [gender_input],'Sibling': [sibling_input],'Embarked': [embarked_input]})

print("\n\nUser Input Feature Vector:")
print("==========================\n")
print(user_input)



User Input Feature Vector:

   PClass Gender Sibling     Embarked
0  Second   Male     One  Southampton


## Step 8.3: Label Encoding of Feature Vector (Exactly Same as Label Encoded Feature Vectors of Sample Data)

In [18]:
# Label Encoding

# Transform Input (Categorical) Attributes of Unseen Data into Numerical Representation

unseen_data_features = user_input.copy()
unseen_data_features["PClass"] = pclass_label_encoder.transform(user_input['PClass'])
unseen_data_features["Gender"] = gender_label_encoder.transform(user_input['Gender'])
unseen_data_features["Sibling"] = sibling_label_encoder.transform(user_input['Sibling'])
unseen_data_features["Embarked"] = embarked_label_encoder.transform(user_input['Embarked'])

print("\n\nUser Input Feature Vector:")
print("==========================\n")
print(user_input)

print("\n\nUser Input Encoded Feature Vector:")
print("==================================\n")
print(unseen_data_features)



User Input Feature Vector:

   PClass Gender Sibling     Embarked
0  Second   Male     One  Southampton


User Input Encoded Feature Vector:

   PClass  Gender  Sibling  Embarked
0       1       1        0         2
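
The four `transform` calls above can also be written as one loop over a dictionary of encoders, which keeps the column order fixed in a single place. The sketch below assumes encoders fitted on the same labels as in Step 5.1; the `encoders` dictionary and the helper `encode_features` are illustrative names.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Illustrative encoders, fitted on the same labels as in Step 5.1.
# The dictionary order defines the column order of the Feature Vector.
encoders = {
    "PClass": LabelEncoder().fit(["First", "Second", "Third"]),
    "Gender": LabelEncoder().fit(["Male", "Female"]),
    "Sibling": LabelEncoder().fit(["Zero", "One", "Two", "Three"]),
    "Embarked": LabelEncoder().fit(["Southampton", "Cherbourg", "Queenstown"]),
}


def encode_features(raw_values, encoders):
    """Encode one passenger (dict of column -> label) into a one-row DataFrame."""
    return pd.DataFrame({
        column: encoder.transform([raw_values[column]])
        for column, encoder in encoders.items()
    })


encoded_row = encode_features(
    {"PClass": "Second", "Gender": "Male", "Sibling": "One", "Embarked": "Southampton"},
    encoders,
)
print(encoded_row.values.tolist())  # [[1, 1, 0, 2]]
```

A label the encoder has not seen (for example "Fourth") makes `transform` raise a `ValueError`, so invalid input is caught before it reaches the Model.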


## Step 8.4: Load the Saved Model

In [19]:
# Load the Saved Model

with open('svc_trained_model.pkl', 'rb') as model_file:
    model = pickle.load(model_file)

## Step 8.5: Model Prediction
### Step 8.5.1: Apply Model on the Label Encoded Feature Vector of unseen instance and return Prediction to the User

In [20]:
# Prediction of Unseen Instance

# Make a Prediction on Unseen Data

predicted_survival = model.predict(unseen_data_features)

# model.predict returns an array with one value per input row; take the first (only) one
if predicted_survival[0] == 1:
    prediction = "SURVIVED"
else:
    prediction = "NOT SURVIVED"

# Add the Prediction in a Pretty Table

pretty_table = PrettyTable()
pretty_table.add_column("       ** Prediction **       ",[prediction])
print(pretty_table)

+--------------------------------+
|        ** Prediction **        |
+--------------------------------+
|          NOT SURVIVED          |
+--------------------------------+
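
Instead of comparing the prediction with 0 and 1 directly, the numeric code can be decoded with the same Label Encoder that encoded the Survived attribute. The sketch below uses a stand-in array in place of `model.predict(...)`; the `messages` mapping is an illustrative name.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Illustrative encoder, fitted on the same labels as survived_label_encoder (No -> 0, Yes -> 1).
survived_encoder = LabelEncoder().fit(["Yes", "No"])

# Stand-in for model.predict(...), which returns one code per input row.
predicted_codes = np.array([0])

# Decode the numeric code back to the original label, then map it to a message.
label = str(survived_encoder.inverse_transform(predicted_codes)[0])
messages = {"Yes": "SURVIVED", "No": "NOT SURVIVED"}
print(messages[label])  # NOT SURVIVED
```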


# Step 9: Execute the Feedback Phase
## A Two-Step Process
### Step 01: After some time, take Feedback from Domain Experts and Users on the deployed Titanic Passenger Survival Prediction System
### Step 02: Make a List of Possible Improvements based on Feedback received

# Step 10: Improve Model based on Feedback
### There is Always Room for Improvement
### Based on Feedback from Domain Experts and Users, improve your Model