# Titanic - Solution - Easiest Way


### Techniques used in this notebook
1. Loading Data using Pandas
2. Checking variable type for each column
3. Checking number of nulls in each column
4. Listing the columns in the dataset
5. Dropping useless columns
6. Handling missing values with Median and Mode
7. Checking occurrence of each category
8. Data Visualization using Seaborn (built on Matplotlib)
9. Checking Outliers
10. Handling Categorical Data using get_dummies()
11. Concatenating the Original Dataset & the One after creating Dummies
12. Splitting X & y
13. Preprocessing Numeric Data using StandardScaler
14. Dropping useless columns after get_dummies()
15. Splitting using train_test_split
16. Using Random Forest as ML model
17. Predicting & Scoring the Trained Model
18. Saving the output in a file

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


### **1. Loading Data using Pandas**

In [None]:
train=pd.read_csv('/kaggle/input/titanic/train.csv')
test=pd.read_csv('/kaggle/input/titanic/test.csv')

# train.head()
test.head()

### 2. Checking variable type for each column

In [None]:
train.info()

### 3. Checking number of nulls in each column

In [None]:
train.isnull().sum()

### 4. Listing the columns in the dataset

In [None]:
train.columns

In [None]:
test.columns

### 5. Drop useless columns

In [None]:
train.drop(columns=['Name', 'Ticket', 'Cabin'], inplace=True)
test.drop(columns=['Name', 'Ticket', 'Cabin'], inplace=True)
# train.head()
test.head()

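### 6. Handling Missing Values with Median and Mode


In [None]:
# Assign the result back instead of calling fillna(..., inplace=True) on a column,
# which newer pandas versions warn about and may not apply to the DataFrame
train['Age'] = train['Age'].fillna(train['Age'].median())
train['Embarked'] = train['Embarked'].fillna(train['Embarked'].mode()[0])

train.isnull().sum()

In [None]:
test['Age'] = test['Age'].fillna(test['Age'].median())
test['Fare'] = test['Fare'].fillna(test['Fare'].mean())  # note: mean, not median, for Fare
test['Embarked'] = test['Embarked'].fillna(test['Embarked'].mode()[0])
test.isnull().sum()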
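
The cells above fill each dataset from its own statistics. A common alternative, sketched here on made-up toy frames (`toy_train`, `toy_test` and `fill_values` are names I invented for illustration), is to learn the fill values from the training data only and reuse them for the test data:

```python
import numpy as np
import pandas as pd

# Toy frames standing in for train/test (values are made up for illustration)
toy_train = pd.DataFrame({
    "Age": [22.0, np.nan, 30.0, 40.0],
    "Embarked": ["S", "C", None, "S"],
})
toy_test = pd.DataFrame({
    "Age": [np.nan, 50.0],
    "Embarked": ["Q", None],
})

# Learn the fill values from the training data only
fill_values = {
    "Age": toy_train["Age"].median(),
    "Embarked": toy_train["Embarked"].mode()[0],
}

# fillna with a dict fills each column with its own value and returns a new frame
toy_train_filled = toy_train.fillna(fill_values)
toy_test_filled = toy_test.fillna(fill_values)

print(fill_values)
print(toy_test_filled)
```

Whether this matters on a dataset as small as Titanic is debatable, but it keeps the test data from influencing the preprocessing.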

### 7. Checking occurrence of each category

In [None]:
train['Survived'].value_counts()

In [None]:
train['Pclass'].value_counts()

In [None]:
train['Sex'].value_counts()

In [None]:
train['SibSp'].value_counts()

In [None]:
train['Parch'].value_counts()

In [None]:
train['Embarked'].value_counts()

### 8. Data Visualization using Seaborn (built on Matplotlib)

In [None]:
sns.countplot(x='Survived', data= train)

In [None]:
sns.countplot(x='Sex', data= train)

In [None]:
sns.countplot(x='Survived', hue='Sex', data= train)

In [None]:
sns.countplot(x='Survived', hue='Pclass', data= train)

In [None]:
sns.boxplot(x='Survived', y= 'Age', hue='Sex', data= train)

In [None]:
sns.boxplot(x='Pclass', y= 'Fare', data= train)
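
The plots above can be backed by numbers. This is a small sketch on a made-up toy frame (the values are not from the Titanic data) showing the survival rate per group and the count table that sits behind a `countplot` with `hue`:

```python
import pandas as pd

# Made-up rows, only to show the mechanics
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Pclass": [3, 1, 3, 1, 2, 3],
    "Survived": [0, 1, 1, 0, 1, 0],
})

# The mean of a 0/1 column is the share of ones, i.e. the survival rate
rate_by_sex = toy.groupby("Sex")["Survived"].mean()

# Counts per (Sex, Survived) pair: the table behind a countplot with hue
counts = pd.crosstab(toy["Sex"], toy["Survived"])

print(rate_by_sex)
print(counts)
```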

### 9. Checking Outliers

In [None]:
train.plot(kind='box')

In [None]:
cols = ['Age', 'SibSp', 'Parch', 'Fare']

# Cap each column at its own 15th and 85th percentiles (values are clipped, rows are not removed)
train[cols] = train[cols].clip(lower=train[cols].quantile(0.15), upper=train[cols].quantile(0.85), axis=1)

train.drop(columns=['Parch'], inplace=True)
train.plot(kind='box', figsize=(10, 8))
# no outliers

In [None]:
test.plot(kind='box', figsize= (10,8))
# there are outliers

In [None]:
# Same capping for the test set, using the test set's own percentiles
test[cols] = test[cols].clip(lower=test[cols].quantile(0.15), upper=test[cols].quantile(0.85), axis=1)

test.drop(columns=['Parch'], inplace=True)
test.plot(kind='box', figsize=(10, 8))
# no outliers
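
The "no outliers" reading of a box plot can also be checked numerically. Below is a minimal sketch, assuming the default box-plot whisker rule of 1.5 times the IQR; `count_iqr_outliers` is a helper I made up for illustration and the fare values are invented:

```python
import pandas as pd

def count_iqr_outliers(s: pd.Series) -> int:
    """Count values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    outside = (s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)
    return int(outside.sum())

# Made-up fares with two extreme values at the top
fare = pd.Series([7.0, 8.0, 8.5, 9.0, 10.0, 12.0, 13.0, 15.0, 26.0, 512.0])

# Same capping as in the notebook: clip to the 15th and 85th percentiles
clipped = fare.clip(lower=fare.quantile(0.15), upper=fare.quantile(0.85))

print(count_iqr_outliers(fare), count_iqr_outliers(clipped))
```

Note that clipping changes the values it caps, so it trades outliers for a pile-up at the two bounds.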

### 10. Handling Categorical Data using get_dummies()
#### We use *'drop_first'* to avoid the **Dummy Variable Trap**

In [None]:
cat_cols = ['Pclass', 'Sex', 'Embarked']

# Encode only the categorical columns, so train1/test1 contain JUST the dummies
train1 = pd.get_dummies(train[cat_cols], columns=cat_cols, drop_first=True)
test1 = pd.get_dummies(test[cat_cols], columns=cat_cols, drop_first=True)

### 11. Concatenating the Original Dataset & the One after creating Dummies *(train1 and test1 contain JUST the dummies, so they must be joined back to the original columns; most people get this wrong)*

In [None]:
train2 = pd.concat([train, train1], axis=1)
test2 = pd.concat([test, test1], axis=1)
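
What `pd.get_dummies` returns depends on what it is given, which decides whether a concat is needed at all. A small sketch on a made-up toy frame compares the two call styles:

```python
import pandas as pd

# Made-up rows, only to show the mechanics
toy = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0],
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", "Q"],
})

# Called on the whole frame with columns=...: untouched columns are kept,
# and the listed columns are replaced by their dummies
full = pd.get_dummies(toy, columns=["Sex", "Embarked"], drop_first=True)

# Called on a categorical-only subset: the result holds only dummy columns
only_dummies = pd.get_dummies(toy[["Sex", "Embarked"]], drop_first=True)

print(list(full.columns))
print(list(only_dummies.columns))
```

If the whole frame is passed with `columns=...`, concatenating the result with the original frame would duplicate columns such as `Age`.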

### 12. Splitting X & y

In [None]:
y_train = train2['Survived']
X_train = train2.drop(['PassengerId', 'Survived'], axis=1)

# test.csv has no 'Survived' column, so there is no y_test; only the features are prepared
X_test = test2.drop(['PassengerId'], axis=1)

### 13. Preprocessing Numeric Data using StandardScaler

In [None]:
from sklearn.preprocessing import StandardScaler
ss= StandardScaler()
features= ['Age', 'SibSp', 'Fare']

# Fit the scaler on the training data only, then reuse its mean and std for the test data
X_train[features] = ss.fit_transform(X_train[features])
X_test[features] = ss.transform(X_test[features])
# X_train.head()
# X_test.head()
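
To see the difference between `fit_transform` and `transform`, here is a tiny sketch with made-up ages (the array names are mine). The scaler learns the mean and standard deviation once, from the training part, and applies the same numbers to the test part:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

train_part = np.array([[20.0], [30.0], [40.0]])  # made-up ages
test_part = np.array([[30.0], [50.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train_part)  # learns mean and std from train_part
test_scaled = scaler.transform(test_part)        # reuses them, no refitting

print(scaler.mean_, scaler.scale_)
print(test_scaled.ravel())
```

An age of 30 maps to 0 in both parts because both use the training mean. Refitting on the test part would give the same age a different scaled value.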

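### 14. Dropping useless columns after get_dummies()

In [None]:
# The dummies now carry this information, and 'Sex'/'Embarked' are strings the model cannot use
X_train = X_train.drop(['Pclass', 'Sex', 'Embarked'], axis=1)
X_test = X_test.drop(['Pclass', 'Sex', 'Embarked'], axis=1)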
# X_train
X_test
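
Because train and test are encoded separately, the model only works if both end up with the same columns in the same order. I believe that holds for the Titanic files, but it is not guaranteed in general. A hedged sketch of a guard, on made-up toy frames, uses `reindex` to give the test frame exactly the training columns:

```python
import pandas as pd

toy_train = pd.DataFrame({"Embarked": ["S", "C", "Q"]})
toy_test = pd.DataFrame({"Embarked": ["S", "S"]})  # 'C' and 'Q' never appear here

train_enc = pd.get_dummies(toy_train, columns=["Embarked"])
test_enc = pd.get_dummies(toy_test, columns=["Embarked"])

# Give the test frame exactly the training columns, in the same order;
# dummy columns missing from the test frame are added and filled with 0
test_aligned = test_enc.reindex(columns=train_enc.columns, fill_value=0)

print(list(test_enc.columns))
print(list(test_aligned.columns))
```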

### 15. Splitting using train_test_split

In [None]:

# Not used here: Kaggle already provides separate train.csv and test.csv files
# from sklearn.model_selection import train_test_split
# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
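
Even with separate files, `train_test_split` is still handy for carving a labelled validation set out of the training data, since Kaggle's test file has no labels to score against. Here is a minimal sketch on synthetic data; the arrays, the names `X_fit`/`X_val` and the settings are my assumptions, not part of the notebook:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))             # synthetic features
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # synthetic 0/1 labels

# Hold out 25% of the labelled rows; stratify keeps the class balance similar in both parts
X_fit, X_val, y_fit, y_val = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_fit, y_fit)

# Accuracy on rows the model never saw during fitting
val_accuracy = model.score(X_val, y_val)
print(X_fit.shape, X_val.shape, round(val_accuracy, 3))
```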

### 16. Using Random Forest as ML model

In [None]:
# n_estimators is set by hand to 500 trees; no hyperparameter search is run
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(n_estimators=500, random_state=42)  # random_state makes runs repeatable
clf.fit(X_train, y_train)

### 17. Predicting & Scoring the Trained Model

In [None]:
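predictions = clf.predict(X_test)
# test.csv has no 'Survived' labels, so accuracy is estimated by 5-fold cross-validation on the training data
from sklearn.model_selection import cross_val_score
cross_val_score(clf, X_train, y_train, cv=5).mean()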

### 18. Saving the output in a file

In [None]:
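# Kaggle expects two columns: PassengerId and Survived
submission = pd.DataFrame({'PassengerId': test['PassengerId'], 'Survived': predictions})
print(submission.head())
filename = 'titanic_prediction1.csv'
submission.to_csv(filename, index=False)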
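
A quick way to check the file layout before submitting is to write it to memory and look at the header. The sketch below uses made-up ids and predictions; the Titanic competition expects a `PassengerId` column and a `Survived` column:

```python
import io
import pandas as pd

passenger_ids = pd.Series([892, 893, 894], name="PassengerId")  # made-up ids
preds = [0, 1, 0]                                                # made-up predictions

toy_submission = pd.DataFrame({"PassengerId": passenger_ids, "Survived": preds})

# Write to an in-memory buffer instead of a file, then inspect the text
buffer = io.StringIO()
toy_submission.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

print(csv_text)
```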

#### If you like this notebook, give an upvote. Any suggestions or comments are appreciated.