# <font color='blue'>Titanic:</font> I got the best results using Kernel SVM, with 84% accuracy
---
 
* **Part 1 - Data Preprocessing**
   1. Importing libraries
   2. Importing the dataset
   3. Dataset information (Pandas Profiling)
   4. Dropping unnecessary columns
      - "Train" set
      - "Test" set
   5. Taking care of missing data
      - "Age" in Train & Test set
      - "Embarked" in Train set
      - "Fare" in Test set
      - Updated info()
   6. Encoding categorical data
      - "Sex" in Train & Test set
      - "Embarked" in Train & Test set
      - Updated head()
   7. Splitting the Train & Test datasets
   8. Feature Scaling   
* **Part 2 - Training the Classification model**
   1. Kernel SVM
   2. Other algorithms
   3. Accuracy score  
* **Part 3 - Creating a submission.csv**

# <font color='blue'>Part 1 - Data Preprocessing </font>

## Importing libraries

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## Importing the dataset

In [None]:
train_df = pd.read_csv('../input/titanic/train.csv')
test_df = pd.read_csv('../input/titanic/test.csv')

## Dataset information (Pandas Profiling)

In [None]:
import pandas_profiling as pp
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline

In [None]:
pp.ProfileReport(train_df, title = 'Pandas Profiling report of "Train" set', html = {'style':{'full_width': True}})

In [None]:
pp.ProfileReport(test_df, title = 'Pandas Profiling report of "Test" set', html = {'style':{'full_width': True}})

## Dropping unnecessary columns

### 'Train' set

In [None]:
# Drop identifier-like and mostly-empty columns in one call
train_df = train_df.drop(["PassengerId", "Name", "Ticket", "Cabin"], axis=1)

In [None]:
train_df.head()

### 'Test' set

In [None]:
# Keep "PassengerId" here: it is needed later for the submission file
test_df = test_df.drop(["Name", "Ticket", "Cabin"], axis=1)

In [None]:
test_df.head()

## Taking care of missing data

### 'Age' in Train & Test set

In [None]:
data = [train_df, test_df]

# Use the train-set statistics for both sets, computed once before any filling
mean = train_df["Age"].mean()
std = train_df["Age"].std()

for dataset in data:
    is_null = dataset["Age"].isnull().sum()
    # draw random integer ages in [mean - std, mean + std), one per missing value
    rand_age = np.random.randint(int(mean - std), int(mean + std), size = is_null)
    # fill NaN values in Age column with random values generated
    age_slice = dataset["Age"].copy()
    age_slice[np.isnan(age_slice)] = rand_age
    # convert each set's own ages to int (not the train ages for both sets)
    dataset["Age"] = age_slice.astype(int)
train_df["Age"].isnull().sum()

### 'Embarked' in Train set

In [None]:
common_value = 'S'
train_df["Embarked"] = train_df["Embarked"].fillna(common_value)
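
The value 'S' is hardcoded above. As a hedged alternative, the most frequent value can be looked up from the column itself. The toy frame below is an assumption for illustration only:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for train_df (values are made up for illustration)
toy = pd.DataFrame({"Embarked": ["S", "C", np.nan, "S", "Q", np.nan]})

# mode() ignores NaN by default and returns the most frequent value(s)
common_value = toy["Embarked"].mode()[0]
toy["Embarked"] = toy["Embarked"].fillna(common_value)

print(common_value)
print(toy["Embarked"].tolist())
```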

### 'Fare' in Test set

In [None]:
# Fill only the "Fare" column, so no other column is touched by accident
test_df["Fare"] = test_df["Fare"].fillna(test_df["Fare"].mean())

### Updated info()

In [None]:
train_df.info()

In [None]:
test_df.info()

## Encoding categorical data 

### 'Sex' in Train & Test set

In [None]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
train_df["Sex"]= le.fit_transform(train_df["Sex"])
print(train_df["Sex"])

In [None]:
# Reuse the encoder fitted on the Train set so both sets share one mapping
test_df["Sex"] = le.transform(test_df["Sex"])
print(test_df["Sex"])

### 'Embarked' in Train & Test set

In [None]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
train_df["Embarked"]= le.fit_transform(train_df["Embarked"])
print(train_df["Embarked"])

In [None]:
# Reuse the encoder fitted on the Train set so both sets share one mapping
test_df["Embarked"] = le.transform(test_df["Embarked"])
print(test_df["Embarked"])
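
Label encoding turns the ports into 0, 1, 2, which implies an order that the ports do not have. A possible alternative is one-hot encoding with `pd.get_dummies`, sketched here on toy frames (the frames and their values are assumptions, not the Titanic data):

```python
import pandas as pd

# Toy frames standing in for train_df / test_df
toy_train = pd.DataFrame({"Embarked": ["S", "C", "Q", "S"]})
toy_test = pd.DataFrame({"Embarked": ["C", "S"]})

# One indicator column per port, so no artificial order (C < Q < S) is implied
train_oh = pd.get_dummies(toy_train, columns=["Embarked"], dtype=int)
test_oh = pd.get_dummies(toy_test, columns=["Embarked"], dtype=int)

# Align the test columns to the train columns; a port that is absent from
# the test set becomes an all-zero column
test_oh = test_oh.reindex(columns=train_oh.columns, fill_value=0)

print(train_oh.columns.tolist())
print(test_oh)
```

Whether this helps the SVM on the real data is not something the notebook measures, so treat it as an option to try rather than a recommendation.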

### Updated head()

In [None]:
train_df.head()

In [None]:
test_df.head()

## Splitting the Train & Test datasets

In [None]:
X_train = train_df.drop("Survived", axis=1)
Y_train = train_df["Survived"]
X_test  = test_df.drop("PassengerId", axis=1).copy()
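''' OR, by column position ("Survived" is the first column of train_df,
"PassengerId" is the first column of test_df)
X_train = train_df.iloc[:, 1:]
Y_train = train_df.iloc[:, 0]
X_test  = test_df.iloc[:, 1:]
'''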

## Feature Scaling

In [None]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
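
The scaler is fitted on the Train set only and then reused for the Test set. A small sketch with made-up numbers shows what that means: the Train columns end up with mean 0 and standard deviation 1, while the Test rows are shifted and scaled with the Train statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy arrays standing in for X_train / X_test (values are made up)
toy_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
toy_test = np.array([[2.0, 40.0]])

sc_demo = StandardScaler()
train_scaled = sc_demo.fit_transform(toy_train)  # learns mean and std from train
test_scaled = sc_demo.transform(toy_test)        # reuses the train statistics

print(train_scaled.mean(axis=0))  # approximately 0 per column
print(train_scaled.std(axis=0))   # approximately 1 per column
print(test_scaled)
```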

In [None]:
print(X_train)

In [None]:
print(Y_train)

# <font color='blue'>Part 2 - Training the Classification model</font>

## Kernel SVM

In [None]:
from sklearn.svm import SVC
classifier = SVC(kernel = 'rbf', random_state = 0)
classifier.fit(X_train, Y_train)
Y_pred = classifier.predict(X_test)

## Other algorithms

In [None]:
''' 
from sklearn.svm import SVC
classifier = SVC(kernel = 'linear', random_state = 0)
classifier.fit(X_train, Y_train)

from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
classifier.fit(X_train, Y_train)

from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train, Y_train)

from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
classifier.fit(X_train, Y_train)

from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
classifier.fit(X_train, Y_train)

from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, Y_train)

from xgboost import XGBClassifier
classifier = XGBClassifier()
classifier.fit(X_train, Y_train)

'''
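
One way to compare several of these algorithms side by side is a loop with cross-validation. The sketch below is an assumption about how that could look, not the procedure used in this notebook. It runs on synthetic data from `make_classification` as a stand-in for the Titanic features, so its scores say nothing about the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the Titanic features (7 columns, binary target)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=7, n_informative=4, random_state=0
)

candidates = {
    "Kernel SVM": SVC(kernel="rbf", random_state=0),
    "Linear SVM": SVC(kernel="linear", random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Logistic Regression": LogisticRegression(random_state=0),
}

scores = {}
for name, model in candidates.items():
    # Scaling lives inside the pipeline, so each fold is scaled using only
    # the rows it was trained on
    pipeline = make_pipeline(StandardScaler(), model)
    fold_scores = cross_val_score(pipeline, X_demo, y_demo, cv=5, scoring="accuracy")
    scores[name] = fold_scores.mean()

# Print the candidates from best to worst mean accuracy
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```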

## Accuracy score

In [None]:
# Accuracy on the Train set (the Test set has no labels to score against).
# Store it under a new name so the fitted classifier is not overwritten.
train_accuracy = round(classifier.score(X_train, Y_train) * 100, 2)
train_accuracy
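
The score above is computed on the same rows the model was trained on, so it tends to be optimistic. A hedged sketch of a hold-out check with `train_test_split` and `accuracy_score` follows. It uses synthetic data as a stand-in, and all variable names in it are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the labelled Titanic rows
X_demo, y_demo = make_classification(
    n_samples=400, n_features=7, n_informative=4, random_state=0
)

# Hold back 20% of the labelled rows; the model never sees them during fit
X_fit, X_val, y_fit, y_val = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

scaler = StandardScaler()
X_fit_scaled = scaler.fit_transform(X_fit)
X_val_scaled = scaler.transform(X_val)

model = SVC(kernel="rbf", random_state=0)
model.fit(X_fit_scaled, y_fit)

fit_acc = accuracy_score(y_fit, model.predict(X_fit_scaled))
val_acc = accuracy_score(y_val, model.predict(X_val_scaled))
print(f"accuracy on fitted rows:   {fit_acc:.3f}")
print(f"accuracy on held-out rows: {val_acc:.3f}")
```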

# <font color='blue'>Part 3 - Creating a submission.csv</font>

In [None]:
submission = pd.DataFrame({
        "PassengerId": test_df["PassengerId"],
        "Survived": Y_pred
    })

In [None]:
submission.to_csv('submission.csv', index=False)
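
Before uploading, it can help to read the file back and check its shape. A minimal sketch, using an in-memory buffer and made-up IDs and predictions in place of `test_df["PassengerId"]` and `Y_pred`:

```python
import io

import numpy as np
import pandas as pd

# Toy stand-ins for test_df["PassengerId"] and Y_pred (values are made up)
passenger_ids = pd.Series([892, 893, 894])
predictions = np.array([0, 1, 0])

submission_demo = pd.DataFrame(
    {"PassengerId": passenger_ids, "Survived": predictions}
)

# Write to an in-memory buffer instead of disk, then read it back
buffer = io.StringIO()
submission_demo.to_csv(buffer, index=False)
buffer.seek(0)
round_trip = pd.read_csv(buffer)

# Two columns, one row per passenger, predictions restricted to 0 and 1
columns_ok = list(round_trip.columns) == ["PassengerId", "Survived"]
rows_ok = len(round_trip) == len(passenger_ids)
values_ok = set(round_trip["Survived"].unique()) <= {0, 1}
print(columns_ok, rows_ok, values_ok)
```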

# If you liked my work, please upvote. Thank you.