# Titanic Survival

**This program builds a logistic regression model that predicts whether a passenger on the Titanic survived, based on features such as sex, age, and passenger class.**

# Logistic Regression
**Logistic regression is a binary classification algorithm. It models the log odds of a data point belonging to the positive class as a linear combination of the input variables, then applies the logistic (sigmoid) function to map that value to a probability between 0 and 1. The sign and size of each coefficient can be read as the direction and strength of that feature's influence.**
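The relationship between log odds and probability can be sketched in a few lines. This is a minimal illustration, not the model trained below: the `weights`, `bias`, and `x` values are made up for the example.

```python
import numpy as np

def sigmoid(z):
    """Map a log-odds value z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def log_odds(p):
    """Inverse of the sigmoid: recover the log odds from a probability."""
    return np.log(p / (1.0 - p))

# Hypothetical coefficients and one hypothetical data point.
weights = np.array([1.2, -0.5])
bias = -0.3
x = np.array([1.0, 0.4])

z = bias + weights @ x   # linear combination = log odds
p = sigmoid(z)           # probability of the positive class
print("log odds:", z, "probability:", p)
```

A log odds of 0 corresponds to a probability of exactly 0.5, which is why the sign of the linear combination decides the predicted class.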

**Steps:**
* Load the Dataset
* Converting categorical variables to numerical
* Handling missing values
* Creating new feature columns
* Defining independent and dependent variables
* Split the data into training and testing sets
* Normalize the data
* Create a Logistic Regression model and train it
* Calculate its score
* Analyze the coefficients
* Testing it on sample features

# Load the Dataset

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
passengers = pd.read_csv('/kaggle/input/data-science-day1-titanic/DSB_Day1_Titanic_train.csv')

# Converting categorical variables to numerical

In [2]:
passengers['Sex'] = passengers['Sex'].map({'female': 1, 'male': 0})

# Handling missing values

In [3]:
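# Assign the result back instead of calling fillna(inplace=True) on a column selection,
# which recent pandas versions warn about and which may not update the DataFrame.
passengers['Age'] = passengers['Age'].fillna(passengers['Age'].mean())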
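Mean imputation replaces each missing age with the average of the observed ages. The sketch below shows the idea with plain NumPy on made-up values; the `ages` array is hypothetical and not taken from the dataset.

```python
import numpy as np

# Hypothetical ages; np.nan marks a missing value.
ages = np.array([22.0, np.nan, 38.0, 30.0])

# Average over the observed values only (NaN entries are ignored).
mean_age = np.nanmean(ages)

# Replace each missing entry with that average, leave the rest unchanged.
filled = np.where(np.isnan(ages), mean_age, ages)
print(filled)
```

pandas' `Series.mean()` likewise skips missing values by default, so the fill value is computed from the known ages only.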

# Creating new feature columns

In [4]:
passengers['FirstClass'] = passengers['Pclass'].apply(lambda x: 1 if x==1 else 0)
passengers['SecondClass'] = passengers['Pclass'].apply(lambda x: 1 if x==2 else 0)
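Only two indicator columns are needed for three classes: a third-class passenger is the case where both are 0, so third class acts as the baseline. A minimal sketch with a vectorized comparison, assuming a made-up `pclass` array:

```python
import numpy as np

# Hypothetical passenger classes (1 = first, 2 = second, 3 = third).
pclass = np.array([1, 3, 2, 3, 1])

# A boolean comparison cast to int gives the same 0/1 indicator as the lambda.
first_class = (pclass == 1).astype(int)
second_class = (pclass == 2).astype(int)

# Third class is implied: both indicators are 0.
third_class = (first_class == 0) & (second_class == 0)
print(first_class, second_class, third_class)
```

The same comparison-and-cast expression works on a pandas column, which avoids calling a Python function once per row.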

# Defining independent and dependent variables

In [5]:
features = passengers[['Sex', 'Age', 'FirstClass', 'SecondClass']]
survival = passengers['Survived']

# Split the data into training and testing sets

In [6]:
train_features, test_features, train_labels, test_labels = train_test_split(features, survival, test_size=0.2, random_state=1)

# Normalize the data

In [7]:
scaler = StandardScaler()
train_features = scaler.fit_transform(train_features)
test_features = scaler.transform(test_features)
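`StandardScaler` subtracts the mean and divides by the (population) standard deviation of each column. The statistics come from the training data only and are then reused for the test data, which is why the cell above calls `fit_transform` on the training set and plain `transform` on the test set. A minimal NumPy sketch of the same arithmetic on one made-up column:

```python
import numpy as np

# Hypothetical ages, split into a training part and a test part.
train_ages = np.array([20.0, 30.0, 40.0])
test_ages = np.array([25.0, 50.0])

# Statistics are computed on the training data only.
mean = train_ages.mean()
std = train_ages.std()  # population standard deviation (ddof=0)

# The same mean and std are applied to both sets.
train_scaled = (train_ages - mean) / std
test_scaled = (test_ages - mean) / std
print(train_scaled, test_scaled)
```

After scaling, the training column has mean 0 and standard deviation 1; the test column generally does not, because it is expressed in the training set's units.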

# Create a Logistic Regression model and train it

In [8]:
model = LogisticRegression()
model.fit(train_features, train_labels)

# Calculate its score

In [9]:
train_score = model.score(train_features, train_labels)
print("Training score: ", train_score)
test_score = model.score(test_features, test_labels)
print("Test score: ", test_score)

Training score:  0.797752808988764
Test score:  0.8044692737430168
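For a classifier, `score` returns the mean accuracy: the fraction of samples whose predicted label matches the true label. A minimal sketch of that calculation, using made-up label arrays rather than the Titanic data:

```python
import numpy as np

# Hypothetical true labels and predictions for five passengers.
true_labels = np.array([1, 0, 0, 1, 1])
predicted = np.array([1, 0, 1, 1, 0])

# Accuracy is the share of positions where prediction and truth agree.
accuracy = np.mean(predicted == true_labels)
print("Accuracy:", accuracy)
```

Here three of the five predictions are correct, so the accuracy is 0.6.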


# Analyze the coefficients

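**Because the features were standardized, the coefficients are on a comparable scale: the sign of each coefficient shows the direction of that feature's influence on survival, and its magnitude shows the feature's relative importance.**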

In [10]:
print("Features coefficients: ", list(zip(['Sex', 'Age', 'FirstClass', 'SecondClass'], model.coef_[0])))

Features coefficients:  [('Sex', 1.250615605094831), ('Age', -0.4567407696113751), ('FirstClass', 1.0280610392158223), ('SecondClass', 0.5521318339093749)]
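One common way to read these numbers is to exponentiate them, which turns each log-odds coefficient into an odds ratio. The sketch below uses the coefficients printed above. Because the model was trained on standardized features, each ratio describes a one-standard-deviation increase in that feature, not a one-unit change in the raw column.

```python
import numpy as np

# Coefficients copied from the printed output above.
coefficients = {
    'Sex': 1.250615605094831,
    'Age': -0.4567407696113751,
    'FirstClass': 1.0280610392158223,
    'SecondClass': 0.5521318339093749,
}

# exp(coefficient) is the factor by which the odds of survival are multiplied
# when the (standardized) feature increases by one unit.
odds_ratios = {name: float(np.exp(coef)) for name, coef in coefficients.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio {ratio:.2f}")
```

A ratio above 1 (Sex, FirstClass, SecondClass) raises the odds of survival, while a ratio below 1 (Age) lowers them.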


# Testing it on sample features

In [11]:
# Feature order: Sex (1 = female, 0 = male), Age, FirstClass, SecondClass
Jack = np.array([0.0, 20.0, 0.0, 0.0])
Rose = np.array([1.0, 17.0, 1.0, 0.0])
You = np.array([0.0, 25.0, 1.0, 0.0])
# Combine the passenger arrays
sample_passengers = np.array([Jack, Rose, You])
# Scale the sample features with the scaler fitted on the training data
sample_passengers = scaler.transform(sample_passengers)



In [12]:
# Survival predictions (0 = did not survive, 1 = survived)
predictions = model.predict(sample_passengers)
print("Survival predictions: ", predictions)
# Survival probabilities: one row per passenger, columns are [P(0), P(1)]
proba_predictions = model.predict_proba(sample_passengers)
print("Survival probabilities: ", proba_predictions)

Survival predictions:  [0 1 1]
Survival probabilities:  [[0.8947279  0.1052721 ]
 [0.04841785 0.95158215]
 [0.47996417 0.52003583]]
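Each row of the probability output holds the probability of not surviving (class 0) followed by the probability of surviving (class 1). For a binary model, the predicted label is the class with the larger probability, which amounts to a 0.5 threshold on the second column. A short sketch using the probabilities printed above:

```python
import numpy as np

# Probabilities copied from the printed output above (rows: Jack, Rose, You).
proba = np.array([
    [0.8947279, 0.1052721],
    [0.04841785, 0.95158215],
    [0.47996417, 0.52003583],
])

# Each row sums to 1 because the two classes are mutually exclusive.
row_sums = proba.sum(axis=1)

# Predict survival when the class-1 probability exceeds 0.5.
labels = (proba[:, 1] > 0.5).astype(int)
print(labels)  # [0 1 1]
```

The third passenger is a close call: a survival probability of about 0.52 is only just above the threshold, so the label alone hides how uncertain that prediction is.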
