## Introduction 

**Time series data** is a set of measurements taken at constant intervals of time. Time acts as the independent variable, and the characteristic whose changes we want to study is the dependent variable.
  
For example:
- Consumption of energy per hour
- Sales on a daily basis
- A company's profits per quarter
- Annual changes in a country's population
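To make the definition concrete, here is a minimal pandas sketch of a daily series. The sales numbers and the series name `sales` are invented for illustration only.

```python
import pandas as pd

# Hypothetical daily sales figures, used only to illustrate the structure
sales = pd.Series(
    [120, 135, 128, 150],
    index=pd.date_range(start="2021-01-01", periods=4, freq="D"),  # constant 1-day interval
    name="sales",
)

# Time (the index) is the independent variable;
# the sales values are the dependent variable.
print(sales)
print(sales.index.freqstr)  # the fixed sampling interval ("D" = daily)
```

Every pair of consecutive timestamps is exactly one day apart, which is the "constant interval" the definition refers to.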

Time series data can be visualized in various ways to uncover hidden patterns in the dataset. Since time is the reference point for the entire procedure, a time series always depicts a relationship between two variables: one is time, and the other is a quantitative variable.

Sometimes dates in a time-series dataset are missing or unavailable. For example, an IoT usage dataset might have no usage records for a particular day because of a technical issue with the device. These missing dates need to be identified during feature engineering, and appropriate action taken, to get more accurate predictions or forecasts.
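Raw IoT readings often arrive at irregular times within a day rather than once per day. One way to spot days with no data in that case is to bucket the readings by calendar day with `resample()` and look for empty buckets. A minimal sketch; the readings, timestamps and the name `usage_kwh` are hypothetical:

```python
import pandas as pd

# Hypothetical IoT usage readings at irregular times;
# the device sent nothing on 2021-01-03.
readings = pd.Series(
    [1.2, 0.8, 1.5, 0.9, 1.1],
    index=pd.to_datetime([
        "2021-01-01 08:00", "2021-01-01 17:30",
        "2021-01-02 09:15",
        "2021-01-04 07:45", "2021-01-04 20:00",
    ]),
    name="usage_kwh",
)

# Bucket readings into calendar days; days with no non-null readings get 0
daily_counts = readings.resample("D").count()

# Days with no readings at all are the missing dates
missing_days = daily_counts[daily_counts == 0].index
print(missing_days)
```

Note that `count()` ignores null readings, so a day whose only records are `NaN` would also be reported as missing.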


### Python code to find missing dates in a time series

In [1]:
import pandas as pd

stk_data = {
    'Datetime': ['2021-01-01','2021-01-02','2021-01-04','2021-01-05','2021-01-06','2021-01-07'],
    # numeric prices (strings would block arithmetic or interpolation later)
    'Price': [54, 43, 65, 53, 55, 55]
}
df_stk = pd.DataFrame(stk_data)
print(df_stk)

     Datetime Price
0  2021-01-01    54
1  2021-01-02    43
2  2021-01-04    65
3  2021-01-05    53
4  2021-01-06    55
5  2021-01-07    55


In [2]:
df_stk = df_stk.set_index('Datetime')

In [3]:
df_stk.index = pd.to_datetime(df_stk.index)

In [4]:
# Every calendar day between the first and last date, minus the dates actually present
missing_dates = pd.date_range(start=df_stk.index.min(), end=df_stk.index.max()).difference(df_stk.index)

In [5]:
missing_dates

DatetimeIndex(['2021-01-03'], dtype='datetime64[ns]', freq=None)

The output shows that the date '2021-01-03' is missing from the original dataset.
Further analysis or feature engineering can then proceed, depending on the problem statement.
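When a series has many gaps, it can also help to know where each gap starts and how long it is. A minimal sketch using `Series.diff()`, assuming a 1-day sampling interval; `expected_step` and `gap_report` are hypothetical names:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime([
    "2021-01-01", "2021-01-02", "2021-01-04",
    "2021-01-05", "2021-01-06", "2021-01-07",
])).sort_values()

expected_step = pd.Timedelta(days=1)  # assumed sampling interval
gaps = dates.diff()                   # time since the previous observation
is_gap = gaps > expected_step         # the first row is NaT, which compares as False

gap_report = pd.DataFrame({
    "last_seen": dates.shift()[is_gap],               # last date before the hole
    "next_seen": dates[is_gap],                       # first date after the hole
    "missing_steps": gaps[is_gap] // expected_step - 1,  # number of absent days
})
print(gap_report)
```

Unlike `date_range(...).difference(...)`, this reports one row per gap rather than one row per missing date, which is convenient when outages span several days.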

### Python code to populate the missing dates in the DataFrame 

In [6]:
import pandas as pd

data = {
    'Datetime': ['2021-01-01','2021-01-02','2021-01-04','2021-01-05','2021-01-06','2021-01-07'],
    # numeric labels so that they can be interpolated or filled later
    'label': [54, 43, 65, 53, 55, 55]
}
df = pd.DataFrame(data)
print(df)

     Datetime label
0  2021-01-01    54
1  2021-01-02    43
2  2021-01-04    65
3  2021-01-05    53
4  2021-01-06    55
5  2021-01-07    55


In [7]:
df['Datetime'] = pd.to_datetime(df['Datetime'])

In [8]:
idx = pd.date_range(df['Datetime'].min(), df['Datetime'].max())

In [9]:
df = df.set_index('Datetime')

In [10]:
df_result = df.reindex(idx)
df_result

            label
2021-01-01   54.0
2021-01-02   43.0
2021-01-03    NaN
2021-01-04   65.0
2021-01-05   53.0
2021-01-06   55.0
2021-01-07   55.0


The code above adds the missing dates to the DataFrame, with null (`NaN`) values in the `label` column.
These null values then need to be filled using EDA and feature-engineering techniques.
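pandas also offers `asfreq()`, which, for a sorted `DatetimeIndex`, conforms the data to a regular grid between the first and last timestamps, giving the same daily rows as `reindex(pd.date_range(...))`. The sketch below rebuilds the example data with integer labels and also adds a hypothetical `was_missing` column, so that later models can still tell imputed rows from observed ones once the nulls are filled:

```python
import pandas as pd

df = pd.DataFrame(
    {"label": [54, 43, 65, 53, 55, 55]},
    index=pd.to_datetime([
        "2021-01-01", "2021-01-02", "2021-01-04",
        "2021-01-05", "2021-01-06", "2021-01-07",
    ]),
)

# Insert every missing calendar day between the first and last index value;
# inserted rows get NaN in "label"
df_daily = df.asfreq("D")

# Flag rows whose label is null (here, only the inserted date)
# before any filling makes them indistinguishable
df_daily["was_missing"] = df_daily["label"].isna()
print(df_daily)
```

If the original data already contained null labels, those rows would be flagged too, which is usually what a model needs to know anyway.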

### Handling Missing Time Series Data

When data is missing from a time series, some form of imputation or interpolation can be used to fill in the missing values.
The following methods can be tried:

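1. Interpolation (pandas `interpolate()`)
2. `fillna()` and its forward/backward-fill variants
3. Impyute (a third-party Python library of imputation algorithms)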
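Here is a minimal sketch of the first two options in plain pandas (Impyute is not shown). It reuses the `label` values from the reindexed example, with 2021-01-03 as `NaN`. `method="time"` weights by the actual distance between timestamps, so on an evenly spaced daily index it matches linear interpolation:

```python
import numpy as np
import pandas as pd

label = pd.Series(
    [54.0, 43.0, np.nan, 65.0, 53.0, 55.0, 55.0],
    index=pd.date_range("2021-01-01", periods=7, freq="D"),
    name="label",
)

# Linear interpolation: treats the points as equally spaced
linear = label.interpolate(method="linear")

# Time-weighted interpolation: uses the real gaps in the DatetimeIndex
by_time = label.interpolate(method="time")

# Backward fill: copies the next valid value into the gap
backward = label.bfill()

print(pd.DataFrame({"linear": linear, "time": by_time, "bfill": backward}))
```

For the 2021-01-03 gap, interpolation gives the midpoint of 43 and 65, while backward fill repeats the next observation.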

In the example below I used **forward fill**, which copies the last valid observation forward into each gap:

In [11]:
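# Forward fill: copy the last valid value into each gap.
# (fillna(method='ffill') is deprecated since pandas 2.1; ffill() is the replacement)
df_result['label'] = df_result['label'].ffill()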
df_result

            label
2021-01-01   54.0
2021-01-02   43.0
2021-01-03   43.0
2021-01-04   65.0
2021-01-05   53.0
2021-01-06   55.0
2021-01-07   55.0
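Forward fill keeps repeating the last value however long the gap is, which can be misleading for long outages such as a device that is offline for several days. A sketch using the `limit` argument of `ffill()`, with invented values and a hypothetical two-day outage:

```python
import numpy as np
import pandas as pd

# Hypothetical series with a two-day outage (2021-01-03 and 2021-01-04)
s = pd.Series(
    [54.0, 43.0, np.nan, np.nan, 53.0],
    index=pd.date_range("2021-01-01", periods=5, freq="D"),
)

# Unlimited forward fill repeats 43.0 across the whole outage
filled_all = s.ffill()

# limit=1 fills at most one consecutive NaN per gap; the rest stays NaN
filled_limited = s.ffill(limit=1)

print(pd.DataFrame({"raw": s, "ffill": filled_all, "ffill_limit_1": filled_limited}))
```

The remaining `NaN` values can then be handled by another method, such as interpolation, or left for the model to treat as missing.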
