# Interpolation algorithm

### What is the interpolation algorithm?
Interpolation is a technique for estimating unknown data points that lie between two known data points. It is commonly used to fill missing values in a table or dataset from the values that are already known.
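
As a minimal sketch of the idea, linear interpolation estimates a value on the straight line between two known points. The helper name `linear_interpolate` and the sample points are made up for illustration; `np.interp` is NumPy's built-in piecewise-linear equivalent.

```python
import numpy as np


def linear_interpolate(x, x0, y0, x1, y1):
    """Estimate y at x on the straight line through (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical known points (1, 10) and (3, 30); estimate the value at x = 2.
estimate = linear_interpolate(2, 1, 10, 3, 30)
print(estimate)  # 20.0

# NumPy's np.interp gives the same piecewise-linear estimate.
print(np.interp(2, [1, 3], [10, 30]))  # 20.0
```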

### When to use interpolation?
Interpolation fills a missing value using its neighbors. When imputing missing values with the column average is a poor fit, for example in ordered data where values change gradually, interpolation is a common alternative.
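
The difference from mean imputation can be seen on a small ordered series. This is only an illustrative sketch: the values are invented, and it assumes the rows are in a meaningful order so that neighbors carry information about a gap.

```python
import numpy as np
import pandas as pd

# Hypothetical ordered readings with two gaps (values invented for illustration).
s = pd.Series([2.0, np.nan, 6.0, np.nan, 22.0])

# Mean imputation: every gap receives the same value (the mean of 2, 6 and 22).
mean_filled = s.fillna(s.mean())

# Linear interpolation: each gap is estimated from its two neighbors.
interp_filled = s.interpolate(method="linear")

print(mean_filled.tolist())    # [2.0, 10.0, 6.0, 10.0, 22.0]
print(interp_filled.tolist())  # [2.0, 4.0, 6.0, 14.0, 22.0]
```

The mean-filled series jumps to 10 at both gaps regardless of context, while the interpolated series follows the local trend.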

### Implementation

In [7]:
import numpy as np
import pandas as pd
import re

In [11]:
df = pd.read_csv("train.csv")
df

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S
...,...,...,...,...,...,...,...,...,...,...,...,...
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.4500,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C


In [9]:
df.isnull().sum()

PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64

In [12]:
def detect_missing():
    """Print the percentage of missing values per column of the global df and return those column names."""
    null_series = df.isnull().sum()  # missing-value count per column
    print()
    null_column_list = []
    if null_series.sum():
        print('Following columns contains missing values : ')
        total_samples = df.shape[0]
        for column, n_missing in null_series.items():
            if n_missing:
                print("{} : {:.2f} %".format(column, (n_missing / total_samples) * 100))
                null_column_list.append(column)
    else:
        print("None of the columns contains missing values !")
    return null_column_list

In [13]:
null_column_list = detect_missing()
print(null_column_list)


Following columns contains missing values : 
Age : 19.87 %
Cabin : 77.10 %
Embarked : 0.22 %
['Age', 'Cabin', 'Embarked']


In [14]:

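df_interpolate = df.copy()
# map Embarked labels to numeric codes so the column can be interpolated
embarked_mapping = {"S": 1, "C": 2, "Q": 3}
df_interpolate['Embarked'] = df_interpolate['Embarked'].map(embarked_mapping)
# map Cabin strings to numeric deck codes; "U" (unknown) stands in for missing cabins
deck = {"A": 0, "B": 1, "C": 2, "D": 3, "E": 4, "F": 5, "G": 6, "U": 7}
deck_pattern = re.compile("([a-zA-Z]+)")  # leading deck letter(s) of a cabin string
df_interpolate['Cabin'] = df_interpolate['Cabin'].fillna("U")
df_interpolate['Cabin'] = df_interpolate['Cabin'].map(lambda x: deck_pattern.search(x).group())
df_interpolate['Cabin'] = df_interpolate['Cabin'].map(deck)
# turn the "unknown" code back into NaN (assigned back rather than chained inplace)
df_interpolate['Cabin'] = df_interpolate['Cabin'].replace({7: np.nan})
# interpolate the numeric columns only; text columns such as Name are left untouched
numeric_cols = df_interpolate.select_dtypes(include='number').columns
df_interpolate[numeric_cols] = df_interpolate[numeric_cols].interpolate(method='linear', limit_direction='both')
# reverse mapping the codes to labels; fractional codes produced by
# interpolation have no label in these mappings and become NaN again
embarked_mapping = {1: "S", 2: "C", 3: "Q"}
df_interpolate['Embarked'] = df_interpolate['Embarked'].map(embarked_mapping)
deck_mapping = {0: "A", 1: "B", 2: "C", 3: "D", 4: "E", 5: "F", 6: "G"}
df_interpolate['Cabin'] = df_interpolate['Cabin'].map(deck_mapping)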


In [15]:
df_interpolate.isnull().sum()

PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age              0
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          507
Embarked         2
dtype: int64
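
Cabin and Embarked still show missing values above even though the interpolation was applied to them. A likely explanation is that linear interpolation produces fractional codes (for example 1.5 between "S" = 1 and "C" = 2), which have no entry in the reverse mapping and so come back as NaN. The sketch below reproduces this on a made-up series, not on `train.csv`, and shows one possible workaround: rounding to the nearest code before the reverse mapping. The names `to_code` and `to_label` are hypothetical, and whether a rounded code is a meaningful category is an assumption to judge for your own data.

```python
import pandas as pd

# Hypothetical toy column, not taken from train.csv.
embarked = pd.Series(["S", None, "C", None, "C"])
to_code = {"S": 1, "C": 2, "Q": 3}
to_label = {1: "S", 2: "C", 3: "Q"}

# Encode the labels, then fill the gaps linearly in both directions.
codes = embarked.map(to_code).astype(float)
codes = codes.interpolate(method="linear", limit_direction="both")
print(codes.tolist())  # [1.0, 1.5, 2.0, 2.0, 2.0]

# Reverse mapping directly: 1.5 has no label, so that row is missing again.
print(codes.map(to_label).isnull().sum())  # 1

# Rounding first gives every row a valid code
# (Series.round rounds exact halves to the nearest even integer).
rounded = codes.round().astype(int)
print(rounded.map(to_label).tolist())  # ['S', 'C', 'C', 'C', 'C']
```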

In [16]:
df_interpolate.to_csv("after_interpolation_train.csv")