In [None]:
# Importing required modules
import pandas as pd          
import numpy as np
import statsmodels.api as sm 
import matplotlib.pyplot as plt
import matplotlib.dates as mdates # For plotting graphs 
import datetime as dt
from datetime import datetime    # To access datetime 
from pandas import Series        # To work on series 
import seaborn as sns
sns.set(rc={'figure.figsize':(11, 4)})
%matplotlib inline 

import warnings                   # To ignore the warnings 
warnings.filterwarnings("ignore")

# Settings for pretty nice plots
plt.style.use('fivethirtyeight')
plt.show()

<div align='center'><font size="5" color='#088a5a'>Getting Started with Time Series analysis using Pandas</font></div>
<div align='center'><font size="4" color="#088a5a">An introductory guide to get started with the Time Series datasets in Python</font></div>
<hr>

![](https://cdn-images-1.medium.com/max/1200/1*SkNISejyo-j4t8C8kpv4mQ.jpeg)

[Source](https://www.freepik.com/free-photos-vectors/business)



<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#What-is-Time-Series-Data" data-toc-modified-id="What-is-Time-Series-Data-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>What is Time Series Data</a></span><ul class="toc-item"><li><span><a href="#Components-of-Time-Series" data-toc-modified-id="Components-of-Time-Series-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Components of Time Series</a></span></li></ul></li><li><span><a href="#Dataset" data-toc-modified-id="Dataset-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Dataset</a></span></li><li><span><a href="#Objective" data-toc-modified-id="Objective-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Objective</a></span></li><li><span><a href="#A-first-look-at-Maruti’s-stock-Prices" data-toc-modified-id="A-first-look-at-Maruti’s-stock-Prices-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>A first look at Maruti’s stock Prices</a></span><ul class="toc-item"><li><span><a href="#Datetime-objects-in-Python" data-toc-modified-id="Datetime-objects-in-Python-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Datetime objects in Python</a></span></li><li><span><a href="#About-the-Stock-Data" data-toc-modified-id="About-the-Stock-Data-4.2"><span class="toc-item-num">4.2&nbsp;&nbsp;</span>About the Stock Data</a></span></li><li><span><a href="#Visualising-the-Time-Series-data" data-toc-modified-id="Visualising-the-Time-Series-data-4.3"><span class="toc-item-num">4.3&nbsp;&nbsp;</span>Visualising the Time Series data</a></span></li></ul></li><li><span><a href="#Manipulating-TimeSeries-dataset" data-toc-modified-id="Manipulating-TimeSeries-dataset-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Manipulating TimeSeries dataset</a></span><ul class="toc-item"><li><span><a href="#Manipulating-Datetime" data-toc-modified-id="Manipulating-Datetime-5.1"><span class="toc-item-num">5.1&nbsp;&nbsp;</span>Manipulating Datetime</a></span></li><li><span><a href="#Subsetting-Data-Using-Pandas-Dataframes" 
data-toc-modified-id="Subsetting-Data-Using-Pandas-Dataframes-5.2"><span class="toc-item-num">5.2&nbsp;&nbsp;</span>Subsetting Data Using Pandas Dataframes</a></span></li></ul></li><li><span><a href="#Visualizing-the-volume-weighted-average-price-(VWAP)" data-toc-modified-id="Visualizing-the-volume-weighted-average-price-(VWAP)-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Visualizing the volume weighted average price (VWAP)</a></span><ul class="toc-item"><li><span><a href="#Visualizing-using-markers" data-toc-modified-id="Visualizing-using-markers-6.1"><span class="toc-item-num">6.1&nbsp;&nbsp;</span>Visualizing using markers</a></span></li><li><span><a href="#Visualising-using-KDEs" data-toc-modified-id="Visualising-using-KDEs-6.2"><span class="toc-item-num">6.2&nbsp;&nbsp;</span>Visualising using KDEs</a></span></li><li><span><a href="#Visualising-using-Lineplots" data-toc-modified-id="Visualising-using-Lineplots-6.3"><span class="toc-item-num">6.3&nbsp;&nbsp;</span>Visualising using Lineplots</a></span></li></ul></li><li><span><a href="#Time-series-seasonal-decomposition" data-toc-modified-id="Time-series-seasonal-decomposition-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>Time-series seasonal decomposition</a></span></li><li><span><a href="#Feature-Extraction" data-toc-modified-id="Feature-Extraction-8"><span class="toc-item-num">8&nbsp;&nbsp;</span>Feature Extraction</a></span></li><li><span><a href="#Time-resampling" data-toc-modified-id="Time-resampling-9"><span class="toc-item-num">9&nbsp;&nbsp;</span>Time resampling</a></span></li><li><span><a href="#Time-Shifting" data-toc-modified-id="Time-Shifting-10"><span class="toc-item-num">10&nbsp;&nbsp;</span>Time Shifting</a></span><ul class="toc-item"><li><span><a href="#Forward-Shifting" data-toc-modified-id="Forward-Shifting-10.1"><span class="toc-item-num">10.1&nbsp;&nbsp;</span>Forward Shifting</a></span></li><li><span><a href="#Backwards-Shifting" 
data-toc-modified-id="Backwards-Shifting-10.2"><span class="toc-item-num">10.2&nbsp;&nbsp;</span>Backwards Shifting</a></span></li><li><span><a href="#Shifting-based-off-time-string-code" data-toc-modified-id="Shifting-based-off-time-string-code-10.3"><span class="toc-item-num">10.3&nbsp;&nbsp;</span>Shifting based off time string code</a></span></li></ul></li><li><span><a href="#Rolling-windows" data-toc-modified-id="Rolling-windows-11"><span class="toc-item-num">11&nbsp;&nbsp;</span>Rolling windows</a></span></li><li><span><a href="#Handling-Missing-Values-in-Time-series-Data" data-toc-modified-id="Handling-Missing-Values-in-Time-series-Data-12"><span class="toc-item-num">12&nbsp;&nbsp;</span>Handling Missing Values in Time-series Data</a></span><ul class="toc-item"><li><span><a href="#Basic-Imputation-Techniques-for-Time-Series" data-toc-modified-id="Basic-Imputation-Techniques-for-Time-Series-12.1"><span class="toc-item-num">12.1&nbsp;&nbsp;</span>Basic Imputation Techniques for Time Series</a></span><ul class="toc-item"><li><span><a href="#Link-to-the-notebook-:-A-guide-to-Handling-missing-values" data-toc-modified-id="Link-to-the-notebook-:-A-guide-to-Handling-missing-values-12.1.1"><span class="toc-item-num">12.1.1&nbsp;&nbsp;</span><a href="https://www.kaggle.com/parulpandey/a-guide-to-handling-missing-values" target="_blank">Link to the notebook : A guide to Handling missing values</a></a></span></li></ul></li><li><span><a href="#Conclusion" data-toc-modified-id="Conclusion-12.2"><span class="toc-item-num">12.2&nbsp;&nbsp;</span>Conclusion</a></span><ul class="toc-item"><li><ul class="toc-item"><li><span><a href="#P.S-:-This-notebook-has-also-been-converted-into-a-blogpost,-incase-you-are-interested.-You-can-read-it-here:-Time-Series-Analysis-with-Pandas" data-toc-modified-id="P.S-:-This-notebook-has-also-been-converted-into-a-blogpost,-incase-you-are-interested.-You-can-read-it-here:-Time-Series-Analysis-with-Pandas-12.2.0.1"><span 
class="toc-item-num">12.2.0.1&nbsp;&nbsp;</span>P.S : This notebook has also been converted into a blogpost, incase you are interested. You can read it here: <a href="https://kite.com/blog/python/pandas-time-series-analysis/" target="_blank">Time Series Analysis with Pandas</a></a></span></li></ul></li></ul></li></ul></li></ul></div>

# What is Time Series Data
Time series data is a sequence of data points in chronological order, which businesses use to analyze past behaviour and make predictions about the future. The data points are observations recorded at specified times, usually at equal intervals, and are typically stored as a datetime index with a corresponding value. Common examples of time series data in our day-to-day lives include:

* Measuring weather temperatures 
* Measuring the number of taxi rides per month
* Predicting a company’s stock prices for the next day


## Components of Time Series

Time series data consist of four components:

1. Trend Component: the variation that moves up or down in a reasonably predictable pattern over a long period.

2. Seasonality Component: the variation that is regular and periodic and repeats itself over a specific period such as a day, week, month, or season.

3. Cyclical Component: the variation that corresponds with business or economic 'boom-bust' cycles, or follows its own peculiar cycles.

4. Random Component: the variation that is erratic or residual and does not fall under any of the above three classifications.

To make this concept clearer, here is a visual interpretation of the various components of a time series. You can view the original diagram, with its context, [here](https://www.atap.gov.au/tools-techniques/travel-demand-modelling/6-forecasting-evaluation).

<img src='https://kite.com/wp-content/uploads/2019/08/variations-of-time-series.jpg'>
<div align="center"><font size="2">[Source](https://www.atap.gov.au/tools-techniques/travel-demand-modelling/6-forecasting-evaluation)</font></div>
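
The components described above can be sketched with synthetic data. The example below is a minimal illustration, assuming an additive model with made-up magnitudes; it builds only the trend, seasonal and random parts (the cyclical component is left out for brevity), and all names in it are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly index; dates and magnitudes are illustrative only
idx = pd.date_range("2015-01-01", periods=48, freq="MS")
t = np.arange(len(idx))

trend = 100 + 0.5 * t                         # slow upward drift
seasonal = 10 * np.sin(2 * np.pi * t / 12)    # pattern repeating every 12 months
rng = np.random.default_rng(0)
noise = rng.normal(scale=2.0, size=len(idx))  # irregular (random) component

toy = pd.DataFrame({"trend": trend, "seasonal": seasonal, "noise": noise}, index=idx)

# Additive model: the observed series is the sum of its components
toy["observed"] = toy["trend"] + toy["seasonal"] + toy["noise"]
print(toy.head())
```

Plotting `toy["observed"]` next to the individual columns gives a picture similar in spirit to the diagram above.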

# Dataset


In this notebook, we’ll use pandas to analyze stock prices of [Maruti](https://en.wikipedia.org/wiki/Maruti_Suzuki) and perform some basic time series operations. Maruti Suzuki India Limited, formerly known as **Maruti Udyog Limited**, is an automobile manufacturer in India. It is a 56.21% owned subsidiary of the Japanese car and motorcycle manufacturer [Suzuki Motor Corporation](https://en.wikipedia.org/wiki/Suzuki). As of July 2018, it had a market share of 53% of the Indian passenger car market.

<img src='https://www.dsij.in/Portals/0/EasyDNNnews/11202/img-MarutiSuzuki.jpg'>
<div align="center"><font size="2">Source - https://www.dsij.in/Portals/0/EasyDNNnews/11202/img-MarutiSuzuki.jpg</font></div>


# Objective

In this notebook, we will introduce some common techniques used in time-series analysis and walk through the iterative steps required to manipulate and visualize time-series data.
<hr>

# A first look at Maruti’s stock Prices

Let’s check what the first 5 lines of our time-series data look like:

In [None]:
df = pd.read_csv("/kaggle/input/nifty50-stock-market-data/MARUTI.csv")
df.head()

In [None]:
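# For the sake of this notebook, I shall limit the number of columns to keep things simple.
# .copy() gives an independent DataFrame, so later column assignments
# do not trigger pandas' SettingWithCopyWarning.

data = df[['Date','Open','High','Low','Close','Volume','VWAP']].copy()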


## Datetime objects in Python

Let us now look at the datatypes of the various components.

In [None]:
data.info()

It appears that the Date column is being treated as a string rather than as dates. To fix this, we’ll use the pandas `to_datetime()` function, which converts its argument to datetime values.

In [None]:
# Convert string to datetime64 (vectorised over the whole column)
data['Date'] = pd.to_datetime(data['Date'])
data.set_index('Date',inplace=True)
data.head()
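
As an alternative to converting after loading, the dates can be parsed while the file is read. The sketch below is a minimal example on a made-up two-row CSV (the values are placeholders, not real Maruti prices); it also shows `errors="coerce"`, which turns unparseable entries into `NaT` instead of raising.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for the real file; values are placeholders
csv_text = "Date,VWAP\n2020-01-01,100.0\n2020-01-02,101.5\n"

# parse_dates converts the column at read time; index_col makes it the index
toy = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")
print(toy.index.dtype)

# errors="coerce" replaces values that cannot be parsed with NaT
parsed = pd.to_datetime(pd.Series(["2020-01-01", "not a date"]), errors="coerce")
print(parsed)
```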

## About the Stock Data

Now that our data has been converted into the desired format, let’s take a look at its various columns for further analysis.      

* **The Open and Close columns** indicate the opening and closing price of the stocks on a particular day.
* **The High and Low columns** provide the highest and the lowest price for the stock on a particular day, respectively.
* **The Volume column** tells us the total volume of stocks traded on a particular day.

The **volume weighted average price (VWAP)** is a trading benchmark used by traders that gives the average price a security has traded at throughout the day, based on both volume and price. It is important because it provides traders with insight into both the trend and value of a security ([source](https://www.investopedia.com/terms/v/vwap.asp)).

## Visualising the Time Series data


In [None]:
data['VWAP'].plot(figsize=(10,6),title='Maruti Stock Prices')
plt.ylabel('VWAP')

There has been a steady increase in Maruti's stock prices, except for a slump around 2019. We shall use pandas to investigate it further in the coming sections.
<hr>

# Manipulating TimeSeries dataset

As pandas was developed in the context of financial modeling, it contains a comprehensive set of tools for working with dates, times, and time-indexed data. Let’s look at the main pandas data structures for working with time series data.

## Manipulating Datetime
Python's basic objects for working with dates and times reside in the built-in `datetime` module. In pandas, a single point in time is represented as a `Timestamp`, and the `pd.to_datetime()` function creates Timestamps from strings in a wide variety of date/time formats. Let's start with plain `datetime` objects, which are built from individual components such as year, month and day.

In [None]:
from datetime import datetime
my_year = 2019
my_month = 4
my_day = 21
my_hour = 10
my_minute = 5
my_second = 30

We can now create a datetime object, and use it freely with pandas given the above attributes.

In [None]:
test_date = datetime(my_year, my_month, my_day)
test_date



For the purposes of analyzing our particular data, we have selected only the day, month and year, but we could also include more details like hour, minute and second if necessary. 

In [None]:
test_date = datetime(my_year, my_month, my_day, my_hour, my_minute, my_second)
print('The day is : ', test_date.day)
print('The hour is : ', test_date.hour)
print('The month is : ', test_date.month)
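
The `datetime` objects above map directly onto pandas `Timestamp`s. The following sketch shows three routes to the same point in time: wrapping a `datetime` object, parsing an ISO-style string, and parsing a day-first string with an explicit `format`. The date is the same illustrative one used above.

```python
from datetime import datetime
import pandas as pd

py_dt = datetime(2019, 4, 21, 10, 5, 30)

ts = pd.Timestamp(py_dt)                         # wrap a datetime object
from_iso = pd.to_datetime("2019-04-21 10:05:30") # parse an ISO-style string
from_fmt = pd.to_datetime("21/04/2019 10:05:30", # parse a day-first string
                          format="%d/%m/%Y %H:%M:%S")

print(ts, ts.day_name())
```

Passing `format` explicitly avoids any ambiguity between day-first and month-first strings.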

For our stock price dataset, the type of the index column is DatetimeIndex. We can use pandas to obtain the minimum and maximum dates in the data.

In [None]:
print(data.index.max())
print(data.index.min())

We can also find the integer positions of the earliest and latest dates in the index as follows:

In [None]:
# Earliest date index location
print('Earliest date index location is: ',data.index.argmin())

# Latest date location
print('Latest date location: ',data.index.argmax())


## Subsetting Data Using Pandas Dataframes

Instead of working with the entire data, it is also possible to slice the Time Series data to highlight the portion of the data we are interested in. Since the volume weighted average price (VWAP) is a trading benchmark, we shall limit our analysis to only that column.

In [None]:
# .copy() avoids modifying a view of df when the Date column is converted
df_vwap = df[['Date','VWAP']].copy()
df_vwap['Date'] = pd.to_datetime(df_vwap['Date'])
df_vwap.set_index("Date", inplace = True)
df_vwap.head()

In [None]:
# Slicing on year (.loc makes the label-based row selection explicit)
vwap_subset = df_vwap.loc['2017':'2020']

# Slicing on month
vwap_subset = df_vwap.loc['2017-01':'2020-12']

# Slicing on day
vwap_subset = df_vwap.loc['2017-01-01':'2020-12-15']
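
To make the slicing rules concrete, here is a small sketch on a synthetic daily index (the frame and its values are hypothetical). It relies on pandas partial string indexing, where a label such as `"2017"` selects the whole year and slice endpoints are inclusive.

```python
import numpy as np
import pandas as pd

# Synthetic daily data covering exactly two calendar years
idx = pd.date_range("2017-01-01", "2018-12-31", freq="D")
toy = pd.DataFrame({"VWAP": np.arange(len(idx), dtype=float)}, index=idx)

year_2017 = toy.loc["2017"]                    # every row in 2017
q1_2018 = toy.loc["2018-01":"2018-03"]         # both endpoints are included
one_week = toy.loc["2018-06-04":"2018-06-10"]  # seven consecutive days

print(len(year_2017), len(q1_2018), len(one_week))
```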

# Visualizing the volume weighted average price (VWAP)

When working with time-series data, a lot can be revealed through visualizing it. 


## Visualizing using markers
It is possible to add markers in the plot to help emphasize the specific observations or specific events in the time series.

In [None]:
ax = vwap_subset.plot(color='blue',fontsize=14)
ax.set_xlabel('Date')
ax.set_ylabel('VWAP')

ax.axvspan('2019-01-01','2019-01-31', color='red', alpha=0.3)
ax.axhspan(6500,7000, color='green',alpha=0.3)

plt.show()

## Visualising using KDEs

Density plots summarize the data and show where most of its mass is located.

In [None]:
# `fill` replaces the deprecated `shade` argument in recent seaborn releases
sns.kdeplot(df_vwap['VWAP'], fill=True)

## Visualising using Lineplots

In [None]:
# Visualising the VWAP
df_vwap['VWAP'].plot(figsize=(16,8), title='Volume Weighted Average Price')

It appears that Maruti had a more or less steady increase in its stock price from 2004 to mid-2018. There appears to be some drop in 2019, though. Let's further analyse the data for the year 2018.

In [None]:
ax = df_vwap.loc['2018', 'VWAP'].plot(figsize=(15,6))
ax.set_title('Month-wise Trend in 2018'); 
ax.set_ylabel('VWAP');
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'));

We see that there was a dip in the stock prices, particularly around the end of October and in November. Let's zoom in on these dates.

In [None]:
ax = df_vwap.loc['2018-10':'2018-11','VWAP'].plot(marker='o', linestyle='-',figsize=(15,6))
ax.set_title('Oct-Nov 2018 trend'); 
ax.set_ylabel('VWAP');
ax.xaxis.set_major_locator(mdates.WeekdayLocator(byweekday=mdates.MONDAY))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %d'));

So there is a dip in stock prices around the last week of October and the first week of November. One could investigate it further by finding out whether some special event occurred around those dates.
<hr>

# Time-series seasonal decomposition

We can decompose a time series into trend, seasonal and remainder components, as mentioned in the earlier section. The series can be decomposed as an additive or multiplicative combination of the base level, trend, seasonal index and the residual.

The `seasonal_decompose` function in statsmodels implements this decomposition.

In [None]:
from statsmodels.tsa.seasonal import seasonal_decompose
from dateutil.parser import parse

plt.rcParams.update({'figure.figsize': (10,10)})
y = df_vwap['VWAP'].to_frame()


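# Multiplicative Decomposition
# `period` is the number of observations per seasonal cycle;
# older statsmodels releases called this argument `freq`.
result_mul = seasonal_decompose(y, model='multiplicative', period=52)

# Additive Decomposition
result_add = seasonal_decompose(y, model='additive', period=52)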

# Plot
plt.rcParams.update({'figure.figsize': (10,10)})
result_mul.plot().suptitle('Multiplicative Decompose', fontsize=22)
result_add.plot().suptitle('Additive Decompose', fontsize=22)
plt.show()


In [None]:
## Extract the Components
# Additive model: Actual Values = Seasonal + Trend + Resid
df_reconstructed = pd.concat([result_add.seasonal, result_add.trend, result_add.resid, result_add.observed], axis=1)
df_reconstructed.columns = ['seas', 'trend', 'resid', 'actual_values']
df_reconstructed.tail()
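
To see roughly what an additive decomposition computes, here is a simplified, pandas-only sketch on synthetic data. It is an illustration of the classical idea (moving-average trend, averaged seasonal pattern, leftover residual) and not the statsmodels implementation; the weekly `period` of 7 and all variable names are assumptions made for the example.

```python
import numpy as np
import pandas as pd

period = 7  # assumed weekly cycle on daily data, chosen for illustration
idx = pd.date_range("2020-01-01", periods=70, freq="D")
t = np.arange(len(idx))
observed = pd.Series(50 + 0.2 * t + 3 * np.sin(2 * np.pi * t / period), index=idx)

# Trend: centred moving average over one full cycle
trend = observed.rolling(window=period, center=True).mean()

# Seasonal: average detrended value at each position in the cycle, centred on zero
detrended = observed - trend
position = pd.Series(t % period, index=idx)
seasonal_means = detrended.groupby(position).mean()
seasonal_means = seasonal_means - seasonal_means.mean()
seasonal = position.map(seasonal_means)

# Residual: whatever trend and seasonality do not explain
resid = observed - trend - seasonal

print(pd.concat({"trend": trend, "seasonal": seasonal, "resid": resid}, axis=1).iloc[3:8])
```

The centred window leaves a few `NaN` values at both ends of the trend, which is also why the decomposition output above has missing values at the edges.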


# Feature Extraction


Let's extract time and date features from the Date column. 

In [None]:
df_vwap.reset_index(inplace=True)
df_vwap['year'] = df_vwap.Date.dt.year
df_vwap['month'] = df_vwap.Date.dt.month
df_vwap['day'] = df_vwap.Date.dt.day
df_vwap['day of week'] = df_vwap.Date.dt.dayofweek
# .dt.weekday_name was removed in pandas 1.0; day_name() is its replacement
df_vwap['Weekday Name'] = df_vwap.Date.dt.day_name()


#Set Date column as the index column.
df_vwap.set_index('Date', inplace=True)
df_vwap.head()
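
The same `.dt` accessor exposes several other calendar attributes. Below is a small sketch on three hand-picked dates (a hypothetical frame, unrelated to the stock data) showing a few of them.

```python
import pandas as pd

toy = pd.DataFrame({"Date": pd.to_datetime(["2019-04-19", "2019-04-21", "2019-04-30"])})

toy["year"] = toy["Date"].dt.year
toy["quarter"] = toy["Date"].dt.quarter
toy["day_of_week"] = toy["Date"].dt.dayofweek       # Monday=0 ... Sunday=6
toy["weekday_name"] = toy["Date"].dt.day_name()
toy["is_month_end"] = toy["Date"].dt.is_month_end   # True only on the last day of a month

print(toy)
```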

# Time resampling

Examining stock price data for every single day isn’t of much use to financial institutions, who are more interested in spotting market trends. To make it easier, we use a process called time resampling to aggregate data into a defined time period, such as by month or by quarter. Institutions can then see an overview of stock prices and make decisions according to these trends.           

The pandas library has a `resample()` function which resamples such time series data. The resample method in pandas is similar to its `groupby` method as it is essentially grouping according to a certain time span. The `resample()` function looks like this:

In [None]:
# Only VWAP is resampled: the text column 'Weekday Name' cannot be averaged
df_vwap[['VWAP']].resample(rule = 'A').mean()[:5]

> **To summarize what happened above:**
>
>  * `df_vwap.resample()` is used to resample the stock data.
>  * The ‘**A**’ stands for year-end frequency, and denotes the offset values by which we want to resample the data.
>  * `mean()` indicates that we want the average stock price during this period.
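
The similarity between `resample()` and `groupby` can be checked on a small synthetic series. This is a minimal sketch with made-up values; it uses the month-start alias `MS` and compares the result with grouping by calendar month, then applies several aggregations at once with `agg()`.

```python
import numpy as np
import pandas as pd

# Synthetic daily series for the first quarter of 2020 (values 0, 1, 2, ...)
idx = pd.date_range("2020-01-01", "2020-03-31", freq="D")
s = pd.Series(np.arange(len(idx), dtype=float), index=idx, name="VWAP")

monthly_mean = s.resample("MS").mean()                   # one bin per month
grouped_mean = s.groupby(s.index.to_period("M")).mean()  # same grouping via groupby

# Several summary statistics per month in one call
monthly_stats = s.resample("MS").agg(["min", "max", "mean"])
print(monthly_stats)
```

Both approaches produce the same monthly averages; `resample()` simply builds the time-based groups for us and labels each bin with a timestamp.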

Below is a complete list of the offset values. The list can also be found in the [pandas documentation](http://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases).

![](https://cdn-images-1.medium.com/max/800/1*piQRFEDprVNqznejGotpcw.png)

We can also use time resampling to plot charts for specific columns.

In [None]:
plt.rcParams['figure.figsize'] = (8, 6)
df_vwap['VWAP'].resample('A').mean().plot(kind='bar')
plt.title('Yearly Mean VWAP for Maruti')


The above bar plot shows the mean VWAP for each calendar year in our data set, with each bar labelled by the year-end date.

Similarly, the yearly mean VWAP labelled by the year-start date can be found below.

In [None]:
df_vwap['VWAP'].resample('AS').mean().plot(kind='bar',figsize = (10,4))
plt.title('Yearly start Mean VWAP for Maruti')


# Time Shifting

Sometimes, we may need to shift or move the data forward or backwards in time. This shifting is done along a time index by the desired number of time-frequency increments. Here is the original dataset before any time shifts.

In [None]:
df_vwap.head()

## Forward Shifting

To shift our data forward, we pass the desired number of periods (or increments) to the `shift()` function; this number needs to be positive in this case. Let's move our data forward by one period or index, which means that all values which earlier corresponded to row N will now belong to row N+1. Here is the output:


In [None]:
df_vwap.shift(1).head()

## Backwards Shifting

To shift our data backwards, the number of periods (or increments) must be negative.

In [None]:
df_vwap.shift(-1).head()

After the backward shift, the value corresponding to **2003-07-09** is now **167**, whereas originally it was **164.90**.
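
The effect of positive and negative periods is easiest to see on a tiny series. The sketch below uses four made-up values and also shows a common use of shifting: subtracting the previous row to get the day-over-day change.

```python
import pandas as pd

idx = pd.date_range("2020-01-01", periods=4, freq="D")
s = pd.Series([10.0, 12.0, 11.0, 15.0], index=idx)

forward = s.shift(1)     # values move down one row; the first row becomes NaN
backward = s.shift(-1)   # values move up one row; the last row becomes NaN
change = s - s.shift(1)  # day-over-day difference, equivalent to s.diff()

print(pd.concat({"original": s, "forward": forward, "backward": backward, "change": change}, axis=1))
```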

## Shifting based off time string code

We can also use an offset from the offset table for time shifting. For that, we will use the pandas `shift()` function and pass in the `periods` and `freq` parameters. The `periods` parameter defines the number of steps to be shifted, while the `freq` parameter denotes the size of those steps.

Let’s say we want to shift the data three months forward:

In [None]:
# tshift() was removed in pandas 2.0; shift() with `freq` moves the index instead of the values
df_vwap.shift(periods=3, freq = 'M').head()
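
The difference between shifting by rows and shifting by a frequency can be sketched on a three-point series (made-up values, with a daily `freq` chosen for simplicity). Without `freq`, the values move and a `NaN` appears; with `freq`, the index labels move and every value is kept.

```python
import pandas as pd

idx = pd.date_range("2020-01-01", periods=3, freq="D")
s = pd.Series([1.0, 2.0, 3.0], index=idx)

by_rows = s.shift(1)                    # same index, values pushed down, NaN introduced
by_time = s.shift(periods=1, freq="D")  # index moved one day later, values untouched

print(by_rows)
print(by_time)
```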


# Rolling windows

Time series data can be noisy due to high fluctuations in the market. As a result, it becomes difficult to gauge a trend or pattern in the data. Here is a visualization of Maruti’s VWAP over the years where we can see such noise:

In [None]:
df_vwap['VWAP'].plot(figsize = (10,6))

As we’re looking at daily data, there’s quite a bit of noise present. It would be nice if we could average this out by a week, which is where a rolling mean comes in. A rolling mean, or moving average, is a transformation method which helps average out noise from data. It works by sliding a fixed-size window over the data and aggregating the values in each window with a function such as `mean()`, `median()`, `count()`, etc. For this example, we’ll use a rolling mean over 7 days.

In [None]:
# Only the numeric VWAP column is averaged; the text column cannot be
df_vwap[['VWAP']].rolling(7).mean().head(10)

The first six values have all become blank as there wasn’t enough data to actually fill them when using a window of seven days.      
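
The leading blanks are controlled by the `min_periods` argument. Here is a minimal sketch on a synthetic ten-day series (values 1 to 10) comparing the default behaviour, a window that accepts partial data, and a time-based window given as an offset string.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2020-01-01", periods=10, freq="D")
s = pd.Series(range(1, 11), dtype=float, index=idx)

strict = s.rolling(window=3).mean()                  # needs 3 observations, so 2 leading NaNs
partial = s.rolling(window=3, min_periods=1).mean()  # averages whatever is available
time_based = s.rolling("3D").mean()                  # window defined by elapsed time, not row count

print(pd.concat({"strict": strict, "partial": partial, "time_based": time_based}, axis=1).head())
```

On gap-free daily data the last two columns agree; they differ once the index has missing days, because a time-based window then covers fewer rows.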

So, what are the key benefits of calculating a moving average or using this rolling mean method? Our data becomes a lot less noisy and more reflective of the trend than the data itself. Let’s actually plot this out. First, we’ll plot the original data followed by the rolling data for 30 days.     

In [None]:
df_vwap['VWAP'].plot()
# Select the VWAP column first so that only numeric data is averaged
df_vwap['VWAP'].rolling(window=30).mean().plot(figsize=(16, 6))

The **blue line** is the original VWAP data. The **red line represents the 30-day rolling window**, and has less noise than the blue line. Something to keep in mind is that once we run this code, the first 29 days aren’t going to have the red line because there wasn’t enough data to actually calculate that rolling mean.

# Handling Missing Values in Time-series Data

Real-world data is messy, and it is not uncommon for time-series data to contain missing values.

In [None]:
#Checking for missing values
df_vwap.isnull().sum()

Our current data doesn't have any missing values, but this doesn't reflect some of the scenarios we might face in real life. I have created an extensive notebook that goes deeper into handling missing values in both time series and non-time-series problems.

![](https://imgur.com/HKYjTqQ.png)

## Basic Imputation Techniques for Time Series
* 'ffill' or 'pad' - Replace NaNs with the last observed value
* 'bfill' or 'backfill' - Replace NaNs with the next observed value
* Linear interpolation method
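
The three techniques listed above can be compared on a short series with a gap. This is a minimal sketch with made-up values, using the `ffill()`, `bfill()` and `interpolate()` methods of a pandas Series.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2020-01-01", periods=5, freq="D")
s = pd.Series([1.0, np.nan, np.nan, 4.0, 5.0], index=idx)

filled_forward = s.ffill()                     # carry the last observed value forward
filled_backward = s.bfill()                    # pull the next observed value backward
interpolated = s.interpolate(method="linear")  # straight line between known neighbours

print(pd.concat({"original": s, "ffill": filled_forward,
                 "bfill": filled_backward, "linear": interpolated}, axis=1))
```

Which method is appropriate depends on the data: forward fill never uses future information, whereas backward fill and interpolation do.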

### [Link to the notebook : A guide to Handling missing values](https://www.kaggle.com/parulpandey/a-guide-to-handling-missing-values)

## Conclusion

Python’s pandas library is a powerful, comprehensive library with a wide variety of inbuilt functions for analyzing time series data. In this article, we saw how pandas can be used for wrangling and visualizing time series data. 

We also performed tasks like time resampling, time shifting and rolling with stock data. These are usually the first steps in analyzing any time series data. Going forward, we could use this data to perform a basic financial analysis by calculating the daily percentage change in stocks to get an idea about the volatility of stock prices. Another way we could use this data would be to predict Maruti’s stock prices for the next few days by employing machine learning techniques. This would be especially helpful from the shareholder’s point of view.
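
As a pointer for the daily percentage change mentioned above, here is a minimal sketch on four made-up prices. It uses `pct_change()`, checks it against the equivalent `shift()` expression, and takes the standard deviation of the returns as one simple (assumed) measure of volatility.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2020-01-01", periods=4, freq="D")
prices = pd.Series([100.0, 110.0, 99.0, 99.0], index=idx)  # placeholder prices

daily_return = prices.pct_change()      # (today - yesterday) / yesterday
manual = prices / prices.shift(1) - 1   # the same calculation written with shift()
volatility = daily_return.std()         # spread of daily returns

print(daily_return)
print("Volatility:", volatility)
```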

#### P.S.: This notebook has also been converted into a blog post, in case you are interested. You can read it here: [Time Series Analysis with Pandas](https://kite.com/blog/python/pandas-time-series-analysis/)