# Title : Titanic - Machine Learning from Disaster

### Description: 

The objective of this notebook is to build a predictive model using **RandomForestClassifier** that answers the question 
“what sorts of people were more likely to survive?” using passenger data (i.e. name, age, gender, socio-economic class, etc.).

### Import libraries and list the input files

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

### Read titanic train dataset and store it in train_data

In [None]:
train_data = pd.read_csv("./titanic/train.csv")
train_data.head()

### Read titanic test dataset and store it in test_data

In [None]:
test_data = pd.read_csv("./titanic/test.csv")
test_data.head()

### Check how useful the 'Sex' feature is for predicting survival

In [None]:
# 'Survived' is 0 or 1, so sum/len gives the fraction of survivors (between 0 and 1)
women = train_data.loc[train_data.Sex == 'female']["Survived"]
rate_women = sum(women)/len(women)

print("Fraction of women who survived:", rate_women)

In [None]:
# Same calculation for male passengers
men = train_data.loc[train_data.Sex == 'male']["Survived"]
rate_men = sum(men)/len(men)

print("Fraction of men who survived:", rate_men)
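
The two cells above compute one survival rate per value of 'Sex'. As a minimal sketch, the same comparison can be written as a single `groupby`. The `toy` DataFrame below is hypothetical data made up for illustration, not real passenger records; with the real data, `train_data` would take its place.

```python
import pandas as pd

# Hypothetical toy data standing in for train_data (not real passenger records).
toy = pd.DataFrame({
    "Sex":      ["female", "female", "female", "female", "male", "male", "male", "male"],
    "Survived": [1, 1, 0, 1, 0, 0, 1, 0],
})

# The mean of a 0/1 column is the fraction of ones, i.e. the survival rate per group.
rates = toy.groupby("Sex")["Survived"].mean()
print(rates)
```

Grouping keeps the calculation in one place and extends to other categorical columns such as "Pclass" without writing a new cell for each value.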

### Steps to follow
    Step 1: Extract the target column ('Survived') from train_data and store it in y
    Step 2: Select the "Pclass", "Sex", "SibSp", and "Parch" features to build the model
    Step 3: Extract these columns from train_data and test_data, one-hot encode them, and store the results in X and X_test
    Step 4: Create a RandomForestClassifier as the Titanic survival model
    Step 5: Fit the model using X and y
    Step 6: Predict the labels for X_test with the fitted model and store them in predictions
    Step 7: Save the predictions to submission.csv
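
The column extraction step relies on `pd.get_dummies` to turn the text column "Sex" into numeric indicator columns. The sketch below shows what that encoding produces on hypothetical toy frames (`toy_train` and `toy_test` are made up for illustration). It also shows one possible safeguard that is not part of this notebook: reindexing the test columns to match the training columns, in case a category is missing from one of the two frames. The `dtype=int` argument is an assumption made here so the indicators print as 0/1.

```python
import pandas as pd

# Hypothetical toy frames; the test frame deliberately contains no 'female' rows.
toy_train = pd.DataFrame({"Pclass": [1, 3, 2], "Sex": ["female", "male", "female"]})
toy_test = pd.DataFrame({"Pclass": [3, 1], "Sex": ["male", "male"]})

# Numeric columns pass through unchanged; 'Sex' becomes one indicator column per value.
X_toy = pd.get_dummies(toy_train, dtype=int)
X_toy_test_raw = pd.get_dummies(toy_test, dtype=int)

# Encoding the frames separately can yield different columns,
# so align the test frame to the training columns and fill the gaps with 0.
X_toy_test = X_toy_test_raw.reindex(columns=X_toy.columns, fill_value=0)

print(list(X_toy.columns))
print(list(X_toy_test_raw.columns))
print(list(X_toy_test.columns))
```

In the Titanic data both sexes appear in the train and test files, so the alignment changes nothing there; it matters only when a category is absent from one side.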

In [None]:
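from sklearn.ensemble import RandomForestClassifier

# Target column: 1 if the passenger survived, 0 otherwise
y = train_data["Survived"]

# One-hot encode the selected features ("Sex" is text, the others are already numeric)
features = ["Pclass", "Sex", "SibSp", "Parch"]
X = pd.get_dummies(train_data[features])
X_test = pd.get_dummies(test_data[features])

# 100 trees of depth at most 5; random_state makes the result reproducible
model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=1)
model.fit(X, y)
predictions = model.predict(X_test)

# Save one predicted label per test passenger
output = pd.DataFrame({'PassengerId': test_data.PassengerId, 'Survived': predictions})
output.to_csv('submission.csv', index=False)
print("Your submission was successfully saved!")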
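
The description asks which kinds of passengers were more likely to survive. One way to get a rough view of this from a fitted forest is its `feature_importances_` attribute. The sketch below is an illustration only: it trains on a hypothetical toy table (`toy`, made up here, not real passenger records) with the same feature names and model settings as above, and `dtype=int` is an assumption added for clarity. With the real data, the fitted `model` and `X` from the previous cell would be used instead.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data standing in for train.csv (not real passenger records).
toy = pd.DataFrame({
    "Pclass":   [1, 3, 2, 3, 1, 3, 2, 1, 3, 2, 1, 3],
    "Sex":      ["female", "male", "female", "male", "female", "male",
                 "male", "female", "male", "female", "male", "female"],
    "SibSp":    [0, 1, 0, 0, 1, 0, 2, 0, 0, 1, 0, 1],
    "Parch":    [0, 0, 1, 0, 0, 2, 0, 0, 0, 1, 0, 0],
    "Survived": [1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1],
})

features = ["Pclass", "Sex", "SibSp", "Parch"]
X_toy = pd.get_dummies(toy[features], dtype=int)

# Same hyperparameters as the model in this notebook.
toy_model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=1)
toy_model.fit(X_toy, toy["Survived"])

# Pair each encoded column with its impurity-based importance, largest first.
importances = pd.Series(toy_model.feature_importances_, index=X_toy.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

These importances describe how much the model relied on each column when splitting, so they are a hint about the model rather than proof of what caused survival.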