### Super Simple Primer on Interpolation ###

Real-world data sets often contain gaps, so interpolation (estimating missing values from the known values around them) is an essential tool for data analysis. In finance, interpolation is especially useful when we are dealing with time series data.

To get acquainted with interpolation, let's do a super simple toy example.

In [2]:
# Let's create a simple random time series 
import pandas as pd
import numpy as np

timerange = pd.date_range('1/1/2016', periods=252)
mean = 1.2
std = 0.3

t_series = pd.DataFrame(np.random.normal(mean, std, len(timerange)),
                        index=timerange, columns=['Prices'])

In [3]:
t_series.head(3)

Date,Prices
2016-01-01,1.413515
2016-01-02,1.394857
2016-01-03,1.360298


Let's randomly blank out up to 80 values in the time series.

In [4]:
t_series['Prices_Gapped'] = t_series['Prices']
t_series.loc[np.random.choice(t_series.index, 80), 'Prices_Gapped'] = np.nan

In [6]:
t_series.head(10)

Date,Prices,Prices_Gapped
2016-01-01,1.413515,1.413515
2016-01-02,1.394857,1.394857
2016-01-03,1.360298,1.360298
2016-01-04,1.290587,NaN
2016-01-05,1.75269,NaN
2016-01-06,1.451516,1.451516
2016-01-07,0.492146,0.492146
2016-01-08,1.796194,1.796194
2016-01-09,1.026254,1.026254
2016-01-10,1.106907,1.106907
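
`np.random.choice` samples with replacement by default, so the cell above can pick the same date more than once and produce fewer gaps than requested. Here is a minimal sketch of drawing an exact number of distinct gap positions and then counting them. The names `rng`, `n_gaps` and `gap_positions`, the seed, and the figure of 50 are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Seeded generator so the sketch is reproducible (an assumption, not in the original).
rng = np.random.default_rng(0)

timerange = pd.date_range('1/1/2016', periods=252)
t_series = pd.DataFrame(rng.normal(1.2, 0.3, len(timerange)),
                        index=timerange, columns=['Prices'])

n_gaps = 50  # hypothetical target number of gaps

# replace=False guarantees that every sampled position is distinct.
gap_positions = rng.choice(len(t_series), size=n_gaps, replace=False)

t_series['Prices_Gapped'] = t_series['Prices']
t_series.loc[t_series.index[gap_positions], 'Prices_Gapped'] = np.nan

# Count the missing values to confirm how many gaps were created.
n_missing = int(t_series['Prices_Gapped'].isna().sum())
print(n_missing)
```

Counting with `isna().sum()` is a quick sanity check that works regardless of how the gaps were inserted.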


All it takes is one line to interpolate the data:

In [8]:
t_series['Prices_Filled'] = t_series['Prices_Gapped'].interpolate(method='linear')

In [9]:
t_series.head(10)

Date,Prices,Prices_Gapped,Prices_Filled
2016-01-01,1.413515,1.413515,1.413515
2016-01-02,1.394857,1.394857,1.394857
2016-01-03,1.360298,1.360298,1.360298
2016-01-04,1.290587,NaN,1.390704
2016-01-05,1.75269,NaN,1.42111
2016-01-06,1.451516,1.451516,1.451516
2016-01-07,0.492146,0.492146,0.492146
2016-01-08,1.796194,1.796194,1.796194
2016-01-09,1.026254,1.026254,1.026254
2016-01-10,1.106907,1.106907,1.106907
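
To see where the filled values for 2016-01-04 and 2016-01-05 come from, the sketch below rebuilds that two-day gap from the table and compares pandas' result with the straight-line arithmetic done by hand. The variable names (`s`, `step`, `manual`) are illustrative only.

```python
import numpy as np
import pandas as pd

# The known values on either side of the gap, copied from the table above.
s = pd.Series([1.360298, np.nan, np.nan, 1.451516],
              index=pd.date_range('2016-01-03', periods=4))

filled = s.interpolate(method='linear')

# By hand: split the difference between the two known values into three
# equal steps, because the gap spans three intervals.
step = (1.451516 - 1.360298) / 3
manual = [1.360298 + step, 1.360298 + 2 * step]

print(filled.round(6).tolist())
print([round(v, 6) for v in manual])
```

Note that `method='linear'` treats the observations as equally spaced and ignores the actual index values.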


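Linear interpolation might not give us what we want. It draws a straight line between the nearest known values, so the filled prices for 2016-01-04 and 2016-01-05 (1.390704 and 1.421110) differ from the original ones (1.290587 and 1.752690). Other interpolation methods are available.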

In [24]:
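# method='cubic' relies on SciPy being installed
t_series['Prices_Filled_Cubic'] = t_series['Prices_Gapped'].interpolate(method='cubic')

In [25]:
# method='polynomial' also relies on SciPy and needs an explicit order
t_series['Prices_Filled_Poly'] = t_series['Prices_Gapped'].interpolate(method='polynomial', order=2)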

In [26]:
t_series.head(10)

Date,Prices,Prices_Gapped,Prices_Filled,Prices_Filled_Cubic,Prices_Filled_Poly
2016-01-01,1.413515,1.413515,1.413515,1.413515,1.413515
2016-01-02,1.394857,1.394857,1.394857,1.394857,1.394857
2016-01-03,1.360298,1.360298,1.360298,1.360298,1.360298
2016-01-04,1.290587,NaN,1.390704,1.74553,1.591244
2016-01-05,1.75269,NaN,1.42111,2.029394,1.963615
2016-01-06,1.451516,1.451516,1.451516,1.451516,1.451516
2016-01-07,0.492146,0.492146,0.492146,0.492146,0.492146
2016-01-08,1.796194,1.796194,1.796194,1.796194,1.796194
2016-01-09,1.026254,1.026254,1.026254,1.026254,1.026254
2016-01-10,1.106907,1.106907,1.106907,1.106907,1.106907
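
Rather than eyeballing the table, one way to compare methods is to measure how far the filled values land from the original prices at the gap positions. The helper `fill_error` below is a hypothetical function written for this sketch, not a pandas API. The seed and the 50 gaps are also assumptions.

```python
import numpy as np
import pandas as pd

def fill_error(original, gapped, **interp_kwargs):
    """Mean absolute error of the interpolated values at the former gaps."""
    filled = gapped.interpolate(**interp_kwargs)
    # Only score positions that were missing and have actually been filled.
    was_gap = gapped.isna() & filled.notna()
    return (filled[was_gap] - original[was_gap]).abs().mean()

rng = np.random.default_rng(0)
idx = pd.date_range('1/1/2016', periods=252)
prices = pd.Series(rng.normal(1.2, 0.3, len(idx)), index=idx)

gapped = prices.copy()
# Leave the first and last points intact so every gap has known
# neighbours on both sides.
gap_pos = rng.choice(np.arange(1, len(idx) - 1), size=50, replace=False)
gapped.iloc[gap_pos] = np.nan

mae_linear = fill_error(prices, gapped, method='linear')
print(f"linear MAE: {mae_linear:.3f}")

# With SciPy installed, the same helper accepts the other methods, e.g.
# fill_error(prices, gapped, method='cubic')
# fill_error(prices, gapped, method='polynomial', order=2)
```

Because the toy series is independent random noise, no method can recover the missing values exactly. The error mainly shows how smooth each method assumes the data to be.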


Beyond cubic and polynomial, pandas supports several other interpolation methods. See the documentation: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.interpolate.html
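
As one example of those other methods, `method='time'` weights the fill by the actual distance between timestamps, which matters when the dates are not evenly spaced. The sketch below uses a small made-up series (the values and dates are assumptions chosen to make the arithmetic easy to follow).

```python
import numpy as np
import pandas as pd

# Irregular dates: one day before the gap, two days after it.
s = pd.Series([1.0, np.nan, 4.0],
              index=pd.to_datetime(['2016-01-01', '2016-01-02', '2016-01-04']))

# 'linear' ignores the index and places the fill halfway between the neighbours.
by_position = s.interpolate(method='linear')

# 'time' uses the timestamps: 2016-01-02 is one third of the way
# from 2016-01-01 to 2016-01-04.
by_time = s.interpolate(method='time')

print(by_position.iloc[1], by_time.iloc[1])
```

For a daily series with no missing dates, such as the one in this primer, the two methods give the same result.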