## Impute missing values by prediction when the index is a date-time
* Common techniques include the mean/median for numeric values and the most frequent value for nominal/text data. These statistics can also be computed per group: compute the mean/median/most frequent value within each group and assign it to the missing entries of that group. See the ImputingMissingWithPima.ipynb notebook for more.
* A second approach is to train a predictive model, e.g. a Random Forest, to generate values for the missing entries.
* A third approach, presented in this notebook, comes from signal processing. It requires a date-time index, e.g. a time series in which up-sampling introduces undetermined values.
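A minimal pandas sketch of the group-wise statistics from the first bullet, and of the up-sampling situation from the third. The toy DataFrame, its column names, and the dates are hypothetical; the Random Forest approach is not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: a grouping key, a numeric column and a nominal column with gaps.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "value": [1.0, np.nan, 3.0, 10.0, 20.0, np.nan],
    "color": ["red", "red", None, "blue", None, "blue"],
})

# Numeric column: fill each NaN with the median of its own group.
group_median = df.groupby("group")["value"].transform("median")
df["value"] = df["value"].fillna(group_median)

# Nominal column: fill each gap with the most frequent value of its group
# (assumes every group has at least one non-missing value).
group_mode = df.groupby("group")["color"].transform(lambda s: s.mode().iloc[0])
df["color"] = df["color"].fillna(group_mode)

print(df)

# Third bullet: up-sampling a monthly series to daily frequency leaves
# every newly created day without a value (NaN) until it is imputed.
monthly = pd.Series([1.0, 2.0, 3.0],
                    index=pd.date_range("2020-01-01", periods=3, freq="MS"))
daily = monthly.asfreq("D")
print(daily.isna().sum())  # 58 of the 61 daily rows are undetermined
```

Group a gets the median 2.0 and group b gets 15.0, so the fill depends on the group, as the first bullet describes.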

```python
import numpy as np
from numpy import fft


def fourierExtrapolation(x, n_predict, n_harm=10):
    """Fit a linear trend plus the n_harm lowest-frequency harmonics to the
    NumPy array x, then evaluate that model on n_predict extra time steps."""
    n = x.size
    t = np.arange(0, n)
    p = np.polyfit(t, x, 1)         # find linear trend in x
    x_notrend = x - p[0] * t        # detrended x (the intercept stays in the DC term)
    x_freqdom = fft.fft(x_notrend)  # detrended x in frequency domain
    f = fft.fftfreq(n)              # frequencies
    # sort indexes by frequency, lower -> higher
    # (a Python 3 range cannot be sorted in place, so build a sorted list)
    indexes = sorted(range(n), key=lambda i: np.absolute(f[i]))

    t = np.arange(0, n + n_predict)
    restored_sig = np.zeros(t.size)
    # keep the DC term plus the +f and -f bins of each of the n_harm harmonics
    for i in indexes[:1 + n_harm * 2]:
        ampli = np.absolute(x_freqdom[i]) / n   # amplitude
        phase = np.angle(x_freqdom[i])          # phase
        restored_sig += ampli * np.cos(2 * np.pi * f[i] * t + phase)
    return restored_sig + p[0] * t              # add the linear trend back
```
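A quick self-check of the Fourier idea, written as a compact vectorised variant. The name `fourier_extrapolation` is ours, not the notebook's; it keeps the same steps (linear detrend, FFT, keep the DC term plus the `n_harm` lowest harmonics, evaluate on an extended time axis). The synthetic signal is an assumption chosen so the answer is known exactly: a linear trend plus a cosine that completes 4 cycles in 48 samples and is centred on the window, so `np.polyfit` recovers the trend exactly and the cosine sits on a single FFT bin. On real data the extrapolation is only approximate.

```python
import numpy as np


def fourier_extrapolation(x, n_predict, n_harm=10):
    """Vectorised sketch: linear trend + DC term + n_harm lowest harmonics of x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    t = np.arange(n)
    slope = np.polyfit(t, x, 1)[0]               # linear trend
    x_freqdom = np.fft.fft(x - slope * t)        # spectrum of the detrended series
    f = np.fft.fftfreq(n)                        # frequency of every FFT bin
    # DC bin first, then each harmonic's +f and -f bins, lowest |f| first
    keep = np.argsort(np.abs(f), kind="stable")[: 1 + 2 * n_harm]
    t_ext = np.arange(n + n_predict)             # observed + future time steps
    ampli = np.abs(x_freqdom[keep])[:, None] / n
    phase = np.angle(x_freqdom[keep])[:, None]
    waves = ampli * np.cos(2 * np.pi * f[keep][:, None] * t_ext + phase)
    return waves.sum(axis=0) + slope * t_ext


# Synthetic monthly series: trend + a 12-month cycle (4 full cycles in 48 samples),
# centred on the window so it is orthogonal to the fitted trend.
n_obs, n_future = 48, 12
t_all = np.arange(n_obs + n_future)
truth = 10 + 0.1 * t_all + np.cos(2 * np.pi * 4 * (t_all - (n_obs - 1) / 2) / n_obs)

prediction = fourier_extrapolation(truth[:n_obs], n_predict=n_future)
print(np.max(np.abs(prediction - truth)))  # floating-point noise only
```

Every retained frequency is a multiple of 1/n, so the reconstructed part repeats with period n: the model can only continue seasonality it has already seen in the window, on top of the fitted linear trend.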

## Temperature Predictions HackerRank challenge
This problem demonstrates how to fill missing values with pandas `interpolate`. The original problem is at
https://www.hackerrank.com/challenges/temperature-predictions. You are given a record of the maximum and minimum monthly temperatures at a particular station, covering each month from January 1908 to March 2012; however, some of the temperature values have been blanked out. Estimate and print the missing values.
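A minimal sketch of what `Series.interpolate()` does with its default `method="linear"`: interior gaps are filled on a straight line between the nearest valid neighbours, treating rows as equally spaced, while a leading NaN stays NaN because the default fill direction is forward. The values are the March to June 1908 tmax readings from the sample input below, with a leading gap added for illustration.

```python
import numpy as np
import pandas as pd

# Leading gap (added for illustration), then tmax for March..June 1908
# with April and May blanked out.
s = pd.Series([np.nan, 6.2, np.nan, np.nan, 17.7])

filled = s.interpolate()  # default method="linear": rows treated as equally spaced
print(filled.tolist())
```

Each step adds (17.7 - 6.2) / 3 ≈ 3.83, giving about 10.03 for April and 13.87 for May.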
**Input format:** The first line contains an integer, N, denoting the number of rows of data in the input file.
The second line contains the header for the tab-separated file; this line can be ignored and is only there to make the test case easier to read.
The N subsequent lines each describe the year, month, maximum temperature, and minimum temperature as a row of tab-separated values. In some rows, the minimum or maximum temperature field has been blanked out and replaced by `Missing_1`, `Missing_2`, etc.
**Sample Input**
```
20
yyyy    month     tmax        tmin
1908    January   5.0        -1.4
1908    February  7.3         1.9
1908    March     6.2         0.3
1908    April     Missing_1   2.1
1908    May       Missing_2   7.7
1908    June      17.7        8.7
1908    July      Missing_3  11.0
1908    August    17.5        9.7
1908    September 16.3        8.4
```
**Sample Output**
The four missing values (Missing_1, Missing_2, Missing_3, and Missing_4) are:
```
8.6
15.8
18.9
```


```python
import numpy as np
import pandas as pd

# Containers filled by the parsing loop:
#   max_temp / min_temp hold the readings, with NaN where a value was blanked out;
#   missing_dict maps k (from "Missing_k") to [column name, row position].
max_temp = []
min_temp = []
missing_dict = {}
```


```python
n_rows = int(input())  # N, the number of data rows
input()                # skip the header line
for i in range(n_rows):
    # field order follows the header: yyyy, month, tmax, tmin
    year, month, raw_max, raw_min = input().split()
    if raw_max.startswith("Missing_"):
        missing_dict[int(raw_max.replace("Missing_", ""))] = ['max', i]
        max_temp.append(np.nan)
    else:
        max_temp.append(float(raw_max))
    if raw_min.startswith("Missing_"):
        missing_dict[int(raw_min.replace("Missing_", ""))] = ['min', i]
        min_temp.append(np.nan)
    else:
        min_temp.append(float(raw_min))
```


```
20
yyyy    month     tmax        tmin
1908    January   5.0        -1.4
1908    February  7.3         1.9
1908    March     6.2         0.3
1908    April     Missing_1   2.1
1908    May       Missing_2   7.7
1908    June      17.7        8.7
1908    July      Missing_3  11.0
1908    August    17.5        9.7
1908    September 16.3        8.4
1908    October   14.6        8.0
1908    November   9.6        3.4
1908    December   5.8        Missing_4
1909    January    5.0        0.1
1909    February   5.5       -0.3
1909    March      5.6       -0.3
1909    April     12.2        3.3
1909    May       14.7        4.8
1909    June      15.0        7.5
1909    July      17.3       10.8
1909    August    18.8       10.7
```
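The same parsing logic can be factored into small helpers that are testable without stdin. `parse_field` and `parse_rows` are hypothetical names, not part of the original notebook; they follow the field order of the header line (yyyy, month, tmax, tmin).

```python
import math


def parse_field(field):
    """Hypothetical helper: return (value, None) for a reading, (nan, k) for 'Missing_k'."""
    if field.startswith("Missing_"):
        return math.nan, int(field[len("Missing_"):])
    return float(field), None


def parse_rows(lines):
    """Parse 'yyyy month tmax tmin' rows into two lists plus {k: (column, row)}."""
    tmax, tmin, missing = [], [], {}
    for row, line in enumerate(lines):
        _year, _month, raw_max, raw_min = line.split()
        for name, raw, values in (("tmax", raw_max, tmax), ("tmin", raw_min, tmin)):
            value, k = parse_field(raw)
            values.append(value)
            if k is not None:
                missing[k] = (name, row)
    return tmax, tmin, missing


rows = ["1908 March 6.2 0.3", "1908 April Missing_1 2.1", "1908 December 5.8 Missing_4"]
tmax, tmin, missing = parse_rows(rows)
print(missing)  # {1: ('tmax', 1), 4: ('tmin', 2)}
```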


```python
# Interpolate the data, i.e. fill each missing value linearly between its neighbouring rows
df = pd.DataFrame({'min': pd.Series(min_temp), 'max': pd.Series(max_temp)})
df_processed = df.interpolate()

# Print the filled values in the order Missing_1, Missing_2, ...
for k in sorted(missing_dict):
    column, row = missing_dict[k]
    print(df_processed.loc[row, column])
```


```
10.0333333333
13.8666666667
17.6
1.75
```
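Since this section is about date-time indexes, here is a sketch of the same task that keeps the dates. `pandas.read_csv` parses the rows, `pd.to_numeric(errors="coerce")` turns each `Missing_k` into NaN, and `interpolate` runs both with the default `method="linear"` (rows treated as equally spaced, as in the cell above) and with `method="time"`, which weights by the number of days between readings. Dating each row to the first day of its month is an assumption. With this data the linear results reproduce the printed values above (10.03, 13.87, 17.6, 1.75) and the time-weighted ones shift slightly; none of them match the challenge's sample output (8.6, 15.8, 18.9), so plain interpolation is a baseline rather than a solution.

```python
import io

import numpy as np
import pandas as pd

# The 20-row input from above (first line = N, second line = header).
RAW = """20
yyyy month tmax tmin
1908 January 5.0 -1.4
1908 February 7.3 1.9
1908 March 6.2 0.3
1908 April Missing_1 2.1
1908 May Missing_2 7.7
1908 June 17.7 8.7
1908 July Missing_3 11.0
1908 August 17.5 9.7
1908 September 16.3 8.4
1908 October 14.6 8.0
1908 November 9.6 3.4
1908 December 5.8 Missing_4
1909 January 5.0 0.1
1909 February 5.5 -0.3
1909 March 5.6 -0.3
1909 April 12.2 3.3
1909 May 14.7 4.8
1909 June 15.0 7.5
1909 July 17.3 10.8
1909 August 18.8 10.7
"""

df = pd.read_csv(io.StringIO(RAW), sep=r"\s+", skiprows=1)

# Date-time index built from year + month name (assumed: first day of each month).
df.index = pd.to_datetime(df["yyyy"].astype(str) + "-" + df["month"], format="%Y-%B")

# Remember where each Missing_k placeholder sits: k -> (column, row position).
missing = {}
for col in ("tmax", "tmin"):
    raw = df[col].astype(str)
    for pos in np.flatnonzero(raw.str.startswith("Missing_").to_numpy()):
        missing[int(raw.iloc[pos].split("_")[1])] = (col, pos)

# Placeholders become NaN; every other cell becomes a float.
temps = df[["tmax", "tmin"]].apply(pd.to_numeric, errors="coerce")

linear = temps.interpolate()                 # rows treated as equally spaced
by_time = temps.interpolate(method="time")   # weighted by days between readings

linear_values, time_values = [], []
for k in sorted(missing):
    col, pos = missing[k]
    linear_values.append(linear[col].iloc[pos])
    time_values.append(by_time[col].iloc[pos])
    print(f"Missing_{k}: linear={linear_values[-1]:.4f}  time={time_values[-1]:.4f}")
```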
