**Importing the libraries used in this analysis**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

**Importing the datasets**

In [None]:
dataset = pd.read_csv("../input/titanic/gender_submission.csv")
test_data = pd.read_csv("../input/titanic/test.csv")
train_data = pd.read_csv("../input/titanic/train.csv")

**Taking a first look at the training data**

In [None]:
train_data.head(10)

**Some values appear to be missing, so let's count them per column**

In [None]:
train_data.isnull().sum()  # number of missing values in each column

*Cabin has by far the most missing values, followed by Age; Embarked is missing only two.*

**Visualizing the missing values in the training data with a heatmap**

In [None]:
sns.heatmap(train_data.isnull())

**The heatmap confirms this: Cabin and Age account for almost all of the missing values.**
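
To put numbers on the heatmap, the sketch below reports the count and percentage of missing values per column. It uses a small hypothetical frame named `toy` in place of `train_data`, and `missing_report` is a helper name introduced here for illustration only.

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, most incomplete first."""
    counts = df.isnull().sum()
    percent = df.isnull().mean() * 100
    report = pd.DataFrame({"missing": counts, "percent": percent.round(1)})
    # Keep only columns that actually have gaps, sorted by number of gaps
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


# Hypothetical toy frame standing in for train_data
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Embarked": ["S", "C", np.nan, "S"],
    "Pclass": [3, 1, 3, 3],
})
print(missing_report(toy))
```

On the real data the call would simply be `missing_report(train_data)`.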

**Next, we visualize the data to better understand who survived the Titanic disaster.**

**Age distribution of passengers**

In [None]:
# distplot is deprecated in recent seaborn; histplot draws the same histogram (no KDE by default)
sns.histplot(train_data["Age"].dropna(), bins=80, color="blue")

**Next, we look at other columns, starting with the number of siblings/spouses aboard (SibSp).**

In [None]:
sns.set_palette("Paired")
sns.countplot(x='SibSp', data=train_data)  # recent seaborn requires x= as a keyword

**Survival**

In [None]:
sns.set_palette("Set3", n_colors=1, desat=.75)
sns.countplot(x='Survived', data=train_data)

**Survival vs. Gender**

In [None]:
sns.set_style("whitegrid")
sns.countplot(x="Survived", hue="Sex", data=train_data, palette="rainbow")
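
The countplot can be complemented with a numeric survival rate per group. Because `Survived` is coded 0/1, its mean within each group is the fraction of that group who survived. A minimal sketch on hypothetical toy rows:

```python
import pandas as pd

# Hypothetical toy rows; on the real data use train_data instead
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male", "female"],
    "Survived": [0, 1, 1, 0, 1, 0],
})

# Mean of a 0/1 column within each group = survival rate of that group
survival_rate = toy.groupby("Sex")["Survived"].mean()
print(survival_rate)
```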

**Fare paid by passengers**

In [None]:
plt.hist(train_data['Fare'], bins = 10, color='brown')

**Data cleaning**

**Next, we fill in the missing Age values. Instead of using a single overall average, we use the typical age within each passenger class (Pclass), so let's first look at how age varies by class.**

In [None]:
plt.figure(figsize=(10,6))
sns.boxplot(x=train_data["Pclass"], y=train_data["Age"], palette="autumn")

**The boxplot shows that passengers in higher classes tended to be older. We therefore write a function that fills a missing Age with the median age of the passenger's class: 37 for first class, 29 for second class and 24 for third class.**
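
Rather than reading the fill values off the plot, they can be computed directly with a groupby. This sketch uses a hypothetical toy frame; on the real data the same call would be `train_data.groupby('Pclass')['Age'].median()`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with some missing ages
toy = pd.DataFrame({
    "Pclass": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "Age": [40.0, 36.0, np.nan, 30.0, 28.0, 22.0, 26.0, 24.0, np.nan],
})

# median() skips NaN by default, so missing ages do not distort the result
median_age_by_class = toy.groupby("Pclass")["Age"].median()
print(median_age_by_class)
```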

In [None]:
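def ages(col):
    # col is one row holding the 'Age' and 'Pclass' values
    age = col['Age']
    pclass = col['Pclass']
    if pd.isnull(age):
        # Missing age: use the median age of the passenger's class
        if pclass == 1:
            return 37
        elif pclass == 2:
            return 29
        else:
            return 24
    return age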

**Applying the function to each row**

In [None]:
train_data['Age'] = train_data[['Age', 'Pclass']].apply(ages, axis=1)
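
An alternative to the row-wise `apply` (not the approach used in this notebook) is a vectorized fill with `groupby(...).transform('median')`, which derives the class medians from the data instead of hard-coding them. A sketch on hypothetical toy data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 3, 3, 3],
    "Age": [40.0, np.nan, 30.0, np.nan, 20.0, 28.0, np.nan],
})

# For every row, the median Age of that row's Pclass (same length as toy)
class_median = toy.groupby("Pclass")["Age"].transform("median")
# Fill only the missing entries; existing ages are left untouched
toy["Age"] = toy["Age"].fillna(class_median)
print(toy)
```

On `train_data` the equivalent would be `train_data['Age'] = train_data['Age'].fillna(train_data.groupby('Pclass')['Age'].transform('median'))`. It avoids calling a Python function once per row, which `apply(axis=1)` does.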

**Confirming our results with a heatmap**

In [None]:
sns.heatmap(train_data.isnull(), cmap="viridis", yticklabels=False, cbar=False)

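**Dropping the rows with a missing Embarked value (Cabin is dropped as a whole column later, so it is excluded here; a plain `dropna()` would remove every row without a Cabin entry)**

In [None]:
train_data.dropna(subset=['Embarked'], inplace=True)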
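
Calling `dropna()` without arguments removes any row that has a missing value in any column, which matters while the mostly empty Cabin column is still present. The hypothetical toy frame below shows the difference between that and restricting the check with `subset`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: Cabin is mostly empty, one Embarked value is missing
toy = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Embarked": ["S", "C", np.nan, "S"],
})

all_cols = toy.dropna()                          # a NaN in any column drops the row
embarked_only = toy.dropna(subset=["Embarked"])  # only a missing Embarked drops the row
print(len(toy), len(all_cols), len(embarked_only))
```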

**Categorical columns such as Sex and Embarked need to be converted into numerical dummy (one-hot) columns before modeling.**

In [None]:
pd.get_dummies(train_data['Embarked'], drop_first=True)

In [None]:
sex = pd.get_dummies(train_data['Sex'],drop_first=True)
embark = pd.get_dummies(train_data['Embarked'],drop_first=True)
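
To see what `drop_first=True` does, here is a small sketch on a hypothetical Embarked series. `dtype=int` is an addition made here for readability; recent pandas versions otherwise return boolean dummy columns.

```python
import pandas as pd

# Hypothetical Embarked values
embarked = pd.Series(["S", "C", "Q", "S"], name="Embarked")

# Categories are sorted (C, Q, S); drop_first removes "C" because it is
# redundant: a row with Q=0 and S=0 must have embarked at C.
dummies = pd.get_dummies(embarked, drop_first=True, dtype=int)
print(dummies)
```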

**Dropping the original Sex and Embarked columns (now encoded as dummies) and the text columns Name, Ticket and Cabin, which the model cannot use directly**

In [None]:
train_data.drop(['Name', 'Sex', 'Ticket', 'Embarked', 'Cabin'], axis=1, inplace=True)

In [None]:
# pd.concat returns a new DataFrame, so assign it back to keep the dummy columns
train_data = pd.concat([train_data, embark, sex], axis=1)
train_data
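
A common pitfall: `pd.concat` does not modify its inputs in place, so its result has to be assigned. A minimal sketch with hypothetical frames `base` and `sex`:

```python
import pandas as pd

# Hypothetical frames sharing the same default index 0..2
base = pd.DataFrame({"Pclass": [3, 1, 3], "Age": [22.0, 38.0, 26.0]})
sex = pd.DataFrame({"male": [1, 0, 0]})

combined = pd.concat([base, sex], axis=1)  # returns a new DataFrame
print(base.columns.tolist())      # base itself is unchanged
print(combined.columns.tolist())  # the new frame has the extra column
```

With `axis=1`, rows are aligned by index, so the dummy frames should be built from the same rows (after `dropna`) as the frame they are joined to; otherwise mismatched index labels produce NaN rows.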

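**Building the machine learning model**

**Splitting the data into training and test sets (Kaggle's test.csv has no Survived column, so 20% of the labeled training data is held out for evaluation)**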

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(train_data.drop('Survived', axis=1), train_data['Survived'], 
                                                    test_size=0.2, random_state=0)
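
The split above is purely random. A variant not used in this notebook is to pass `stratify=y`, which keeps the share of survivors the same in both parts. A sketch on hypothetical arrays with 40% positives:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(40).reshape(20, 2)   # 20 hypothetical samples, 2 features
y = np.array([0] * 12 + [1] * 8)   # 40% positive labels

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
# Both parts keep the 40% positive rate
print(len(y_te), y_te.mean(), y_tr.mean())
```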

**Training our model**

In [None]:
from sklearn.linear_model import LogisticRegression
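regressor = LogisticRegression(max_iter=1000)  # a classifier despite the variable name; extra iterations help lbfgs converge on unscaled features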
regressor.fit(X_train, y_train)
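
Logistic regression with the default `lbfgs` solver can stop before converging when features are on very different scales, such as Fare and Age. One option, not used in the original notebook, is to standardize the features inside a pipeline. This sketch uses hypothetical random data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical features on very different scales (a wide Fare-like column and an Age-like column)
X = np.column_stack([rng.normal(30, 200, 200), rng.normal(30, 14, 200)])
y = (X[:, 0] > 30).astype(int)  # label depends only on the first feature

# The scaler is fit on the training data and applied automatically before the classifier
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)
print(model.score(X, y))
```

The pipeline is used like a single estimator, so `model.predict(X_test)` applies the same scaling learned during `fit`.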

**Predictions**

In [None]:
pred = regressor.predict(X_test)

**Importing metrics to evaluate the model**

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

**Accuracy**

In [None]:
accuracy = accuracy_score(y_test, pred)
accuracy
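
Accuracy is simply the fraction of predictions that match the true labels. A sketch with hypothetical label arrays, checked against `accuracy_score`:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical true and predicted labels
y_true = np.array([0, 1, 1, 0, 1, 0, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1, 1, 0, 1])

manual = (y_true == y_pred).mean()  # 6 of 8 labels match
print(manual, accuracy_score(y_true, y_pred))
```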

**Confusion Matrix**

In [None]:
conf = confusion_matrix(y_test, pred)
conf
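
In scikit-learn's `confusion_matrix`, rows are the true classes and columns are the predicted classes, both in sorted label order, so for 0/1 labels the layout is `[[TN, FP], [FN, TP]]`. A sketch on hypothetical labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels
y_true = np.array([0, 1, 1, 0, 1, 0, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1, 1, 0, 1])

cm = confusion_matrix(y_true, y_pred)
# Flatten the 2x2 matrix row by row into its four counts
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```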

**Classification report**

In [None]:
print(classification_report(y_test, pred))
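
The precision, recall and f1-score in the report's row for class 1 come from the confusion-matrix counts. A sketch with hypothetical labels, checked against scikit-learn's `precision_score`, `recall_score` and `f1_score`:

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical labels: 4 actual survivors, 5 predicted survivors
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 1, 0, 0, 0, 0])

tp = int(((y_true == 1) & (y_pred == 1)).sum())
fp = int(((y_true == 0) & (y_pred == 1)).sum())
fn = int(((y_true == 1) & (y_pred == 0)).sum())

precision = tp / (tp + fp)  # of predicted survivors, how many actually survived
recall = tp / (tp + fn)     # of actual survivors, how many were predicted
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(precision, recall, f1)
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred), f1_score(y_true, y_pred))
```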