In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input/nifty50-stock-market-data'):
    for filename in filenames:
        print(os.path.join(dirname, filename))



![](https://images.unsplash.com/photo-1535320903710-d993d3d77d29?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=1350&q=80)

*Photo by M. B. M. on Unsplash*

# Time series data

Time series data is a sequence of data points in chronological order that is used by businesses to analyze past data and make future predictions. These data points are a set of observations at specified times and equal intervals, typically with a datetime index and corresponding value. Common examples of time series data in our day-to-day lives include:      

* Measuring weather temperatures 
* Measuring the number of taxi rides per month
* Predicting a company’s stock prices for the next day

## Components of Time Series

Time series data consists of four components:

* Trend Component: the variation that moves up or down in a reasonably predictable pattern over a long period.

* Seasonality Component: the variation that is regular and periodic and repeats itself over a specific period such as a day, week, month, or season.

* Cyclical Component: the variation that corresponds with business or economic 'boom-bust' cycles, or follows its own peculiar cycles.

* Random Component: the variation that is erratic or residual and does not fall under any of the above three classifications.

To make this concept clearer, here is a visual interpretation of the various components of a time series. You can view the original diagram with its context [here](https://www.atap.gov.au/tools-techniques/travel-demand-modelling/6-forecasting-evaluation).

![](https://kite.com/wp-content/uploads/2019/08/variations-of-time-series.jpg)
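
As a rough illustration of how these components combine, the sketch below builds a synthetic monthly series by adding a linear trend, a 12-month seasonal pattern, and random noise. Every name and number here is made up for illustration, the cyclical component is left out for brevity, and an additive model is assumed, which is only one of several ways the components can combine.

```python
import numpy as np
import pandas as pd

# Hypothetical example: 4 years of monthly observations
rng = np.random.default_rng(seed=0)
idx = pd.date_range("2015-01-01", periods=48, freq="MS")  # month-start dates
t = np.arange(len(idx))

trend = 100 + 2.0 * t                               # steady long-term rise
seasonal = 10 * np.sin(2 * np.pi * t / 12)          # pattern repeating every 12 months
noise = rng.normal(loc=0, scale=3, size=len(idx))   # random component

# Additive model (an assumption): value = trend + seasonal + noise
series = pd.Series(trend + seasonal + noise, index=idx, name="value")

# A 12-month rolling mean averages out the seasonal swing,
# leaving a rough estimate of the trend
trend_estimate = series.rolling(window=12, center=True).mean()

print(series.head())
print(trend_estimate.dropna().head())
```

Plotting `series` next to `trend_estimate` should show a wavy line around a much smoother rising one.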

# Importing necessary libraries

Python’s pandas library is frequently used to import, manage, and analyze datasets in a variety of formats. In this article, we’ll use it to analyze stock prices of [Maruti](https://en.wikipedia.org/wiki/Maruti_Suzuki) and perform some basic time series operations.

Maruti Suzuki India Limited, formerly known as **Maruti Udyog Limited**, is an automobile manufacturer in India. It is a 56.21% owned subsidiary of the Japanese car and motorcycle manufacturer [Suzuki Motor Corporation](https://en.wikipedia.org/wiki/Suzuki). As of July 2018, it had a market share of 53% of the Indian passenger car market.[*Wikipedia*]

In [None]:
# Importing required modules
import pandas as pd          
import numpy as np               # For mathematical calculations 
import matplotlib.pyplot as plt  # For plotting graphs 
import datetime as dt
from datetime import datetime    # To access datetime 
from pandas import Series        # To work on series 
%matplotlib inline 

import warnings                   # To ignore the warnings 
warnings.filterwarnings("ignore")


# Settings for pretty nice plots
plt.style.use('fivethirtyeight')
plt.show()



# Table of Contents:
* A first look at Maruti’s stock Prices
* Pandas for time series analysis
 - Manipulating datetime
* Feature Extraction
* Exploratory data Analysis
* Time Resampling
* Time Shifting
 - Forward Shifting
 - Backward Shifting
 - Shifting based off time string code
* Rolling Window 
 


# A first look at Maruti’s stock Prices
Let’s look at the first few rows of the dataset:

In [None]:
df = pd.read_csv("/kaggle/input/nifty50-stock-market-data/MARUTI.csv")
df.head()

In [None]:
# For the sake of this notebook, I shall limit the number of columns to keep things simple. 

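# .copy() gives an independent DataFrame, so later column assignments
# do not trigger SettingWithCopyWarning
data = df[['Date','Open','High','Low','Close','Volume','VWAP']].copy()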


Let us now look at the datatypes of the various columns.

In [None]:
data.info()

It appears that the Date column is being treated as a string rather than as dates. To fix this, we’ll use the pandas `to_datetime()` function, which converts its argument to datetimes.

In [None]:
# Convert string to datetime64
data['Date'] = pd.to_datetime(data['Date'])  # vectorised; converts the whole column at once
data.head()
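
The conversion step can be tried in isolation on a tiny, made-up frame. This is a minimal sketch: the frame `toy`, its values, and the `%Y-%m-%d` format string are assumptions for illustration, not taken from the Maruti file.

```python
import pandas as pd

# Hypothetical mini-dataset with dates stored as strings
toy = pd.DataFrame({
    "Date": ["2003-07-09", "2003-07-10", "2003-07-11"],
    "Close": [1.0, 2.0, 3.0],   # made-up prices
})
print(toy["Date"].dtype)        # strings, not dates

# Convert the whole column at once; an explicit format avoids guessing
toy["Date"] = pd.to_datetime(toy["Date"], format="%Y-%m-%d")
print(toy["Date"].dtype)        # now a datetime64 dtype
```

Passing `format` is optional, but it makes the parsing explicit and is usually faster on large columns.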

Now that our data has been converted into the desired format, let’s take a look at its various columns for further analysis.      

* **The Open and Close columns** indicate the opening and closing price of the stocks on a particular day.
* **The High and Low columns** provide the highest and the lowest price for the stock on a particular day, respectively.
* **The Volume column** tells us the total volume of stocks traded on a particular day.

The **volume weighted average price (VWAP)** is a trading benchmark used by traders that gives the average price a security has traded at throughout the day, based on both volume and price. It is important because it provides traders with insight into both the trend and value of a security ([source](https://www.investopedia.com/terms/v/vwap.asp)).
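
To make the definition concrete, here is a small worked example. VWAP is the sum of price times volume divided by total volume. The trades below are invented for illustration and are not part of the dataset.

```python
import pandas as pd

# Hypothetical trades for one security on one day
trades = pd.DataFrame({
    "price":  [100.0, 102.0, 101.0],
    "volume": [200,   100,   700],
})

# VWAP = sum(price * volume) / sum(volume)
vwap = (trades["price"] * trades["volume"]).sum() / trades["volume"].sum()

# Plain average of the prices, ignoring how much was traded at each price
simple_mean = trades["price"].mean()

print(f"VWAP: {vwap:.2f}")
print(f"Simple mean: {simple_mean:.2f}")
```

The two numbers differ because the VWAP gives more weight to prices at which more shares changed hands.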

# Pandas for time series analysis
As pandas was developed in the context of financial modeling, it contains a comprehensive set of tools for working with dates, times, and time-indexed data. Let’s look at the main pandas data structures for working with time series data.

## Manipulating datetime
Python's basic objects for working with dates and times reside in the built-in `datetime` module. In pandas, a single point in time is represented as a `Timestamp`, and the `to_datetime()` function can create Timestamps from strings in a wide variety of date/time formats. Here we start with the built-in `datetime()` constructor, which builds a date from numeric components.

In [None]:
from datetime import datetime
my_year = 2019
my_month = 4
my_day = 21
my_hour = 10
my_minute = 5
my_second = 30

We can now create a datetime object, and use it freely with pandas given the above attributes.

In [None]:
test_date = datetime(my_year, my_month, my_day)
test_date


For the purposes of analyzing our particular data, we have selected only the day, month and year, but we could also include more details like hour, minute and second if necessary. 

In [None]:
test_date = datetime(my_year, my_month, my_day, my_hour, my_minute, my_second)
print('The day is : ', test_date.day)
print('The hour is : ', test_date.hour)
print('The month is : ', test_date.month)
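
For comparison, the sketch below creates the same moment as a pandas `Timestamp` from two differently formatted strings and from the built-in `datetime` object. The variable names are illustrative only.

```python
from datetime import datetime
import pandas as pd

# Two string spellings of the same moment
ts_iso = pd.to_datetime("2019-04-21 10:05:30")
ts_text = pd.to_datetime("April 21, 2019 10:05:30")

# The built-in datetime constructor with numeric components
py_dt = datetime(2019, 4, 21, 10, 5, 30)
ts_from_py = pd.Timestamp(py_dt)

print(type(ts_iso).__name__)
print(ts_iso == ts_text, ts_iso == ts_from_py)
```

All three routes end up at an equal `Timestamp`, which is why `datetime` objects can be used freely alongside pandas.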

In our stock price dataset the dates live in the `Date` column, since the index is still the default integer range. We can use pandas to obtain the minimum and maximum dates in the data.

In [None]:
print(data['Date'].max())
print(data['Date'].min())

We can also find the positions of the earliest and latest dates as follows:

In [None]:
# Earliest date index location
print('Earliest date index location is: ', data['Date'].argmin())

# Latest date location
print('Latest date location: ', data['Date'].argmax())


## Feature Extraction

Let's extract time and date features from the Date column. Since the **volume weighted average price** (VWAP) is a trading benchmark, we shall limit our analysis to only that column.

In [None]:
df_vwap = df[['Date','VWAP']].copy()  # copy to avoid SettingWithCopyWarning
df_vwap['Date'] = pd.to_datetime(df_vwap['Date'])
df_vwap.head()

Let's extract the year, month, day, day of the week from the Date column.

**Note - 0 is the starting of the week, i.e., 0 is Monday and 6 is Sunday.**


In [None]:
df_vwap['year'] = df_vwap.Date.dt.year
df_vwap['month'] = df_vwap.Date.dt.month
df_vwap['day'] = df_vwap.Date.dt.day
df_vwap['day of week'] = df_vwap.Date.dt.dayofweek

#Set Date column as the index column.
df_vwap.set_index('Date', inplace=True)
df_vwap.head()
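
The day-of-week numbering can be checked on a few known dates. This is a small sketch on made-up dates; `day_name()` is used only to make the numbers readable.

```python
import pandas as pd

# Four consecutive dates around a weekend
dates = pd.Series(pd.to_datetime(
    ["2019-04-19", "2019-04-20", "2019-04-21", "2019-04-22"]
))

features = pd.DataFrame({
    "date": dates,
    "dayofweek": dates.dt.dayofweek,   # Monday=0 ... Sunday=6
    "day_name": dates.dt.day_name(),   # human-readable label
})
print(features)
```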

In [None]:
# Visualising the VWAP 

plt.figure(figsize=(16,8)) 
plt.plot(df_vwap['VWAP'], label='VWAP') 
plt.title('Time Series') 
plt.xlabel("Time(year)") 
plt.ylabel("Volume Weighted Average Price") 
plt.legend(loc='best')

It appears that Maruti had a more or less steady increase in its stock price from 2004 to mid-2018. There appears to be some drop in 2019, though. We’ll now use pandas to analyze and manipulate this data to gain insights.

## Exploratory Data Analysis

Let's explore the data and look at details at the year, month and day level.

In [None]:
# Yearly VWAP of Maruti Stocks

df_vwap.groupby('year')['VWAP'].mean().plot.bar()

There is a continuous increase in the VWAP till 2018 and a dip in 2019. Now, let's analyse the data month-wise.

In [None]:
# Monthly VWAP of Maruti Stocks

df_vwap.groupby('month')['VWAP'].mean().plot.bar()

There is no major difference between the months.

In [None]:
# Daily VWAP of Maruti Stocks

df_vwap.groupby('day')['VWAP'].mean().plot.bar()

Again, all days of the month have somewhat similar outcomes.

In [None]:
# Analysing w.r.t day of the week

df_vwap.groupby('day of week')['VWAP'].mean().plot.bar()

This is somewhat strange. While the VWAP is almost the same from Monday to Friday, Sunday shows a sudden spike. Stock markets are normally closed on weekends, so this bar is presumably based on only a handful of sessions. Because VWAP is a price rather than a traded volume, the spike most likely reflects the price level on those few dates and not unusually heavy trading.
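
One way to judge a bar like this is to look at how many observations sit behind each group mean. The sketch below uses invented numbers, not the Maruti data, to show how a single observation can dominate a group.

```python
import pandas as pd

# Hypothetical data: four sessions for each weekday at a low price level,
# plus a single Sunday session at a much higher price
toy = pd.DataFrame({
    "day of week": [0, 1, 2, 3, 4] * 4 + [6],
    "VWAP": [100.0] * 20 + [900.0],
})

# Mean and number of observations per day of the week
summary = toy.groupby("day of week")["VWAP"].agg(["mean", "count"])
print(summary)
```

Running the same `agg(["mean", "count"])` on `df_vwap` would show whether the weekend bar rests on enough rows to be compared with the weekdays.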

## Time resampling
Examining stock price data for every single day isn’t of much use to financial institutions, who are more interested in spotting market trends. To make it easier, we use a process called time resampling to aggregate data into a defined time period, such as by month or by quarter. Institutions can then see an overview of stock prices and make decisions according to these trends.           

The pandas library has a `resample()` function which resamples such time series data. The resample method in pandas is similar to its `groupby` method as it is essentially grouping according to a certain time span. The `resample()` function looks like this:

In [None]:
df_vwap.resample(rule = 'A').mean()[:5]

> **To summarize what happened above:**
>
>  * `df_vwap.resample()` is used to resample the stock data.
>  * The ‘**A**’ stands for year-end frequency, and denotes the offset alias by which we want to resample the data. (pandas 2.2 and later deprecate ‘**A**’ in favour of ‘**YE**’.)
>  * `mean()` indicates that we want the average stock price during this period.

Below is a complete list of the offset values. The list can also be found in the [pandas documentation](http://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases).

![](https://cdn-images-1.medium.com/max/800/1*piQRFEDprVNqznejGotpcw.png)
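
The similarity between `resample()` and `groupby` can be seen on a small, made-up daily series. This sketch uses the month-start alias `'MS'` and illustrative variable names.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series covering January to March 2020
idx = pd.date_range("2020-01-01", "2020-03-31", freq="D")
daily = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# Resample into month-start ('MS') bins and average each bin
monthly = daily.resample("MS").mean()

# The same aggregation expressed as a groupby on the month number
by_month = daily.groupby(daily.index.month).mean()

print(monthly)
print(by_month)
```

Both give the same means. The difference is that `resample()` keeps a proper datetime index, while the `groupby` result is indexed by plain month numbers.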

We can also use time resampling to plot charts for specific columns.

In [None]:
df_vwap['VWAP'].resample('A').mean().plot(kind='bar',figsize = (10,4))
plt.title('Yearly Mean VWAP for Maruti')


The above bar plot shows Maruti’s mean VWAP for each year in our data set, with each bar labelled by the year-end date.

Similarly, the same yearly means labelled by the year-start date can be found below.

In [None]:
df_vwap['VWAP'].resample('AS').mean().plot(kind='bar',figsize = (10,4))
plt.title('Yearly start Mean VWAP for Maruti')


## Time Shifting

Sometimes, we may need to shift or move the data forward or backwards in time. This shifting is done along a time index by the desired number of time-frequency increments. Here is the original dataset before any time shifts.

In [None]:
df_vwap.head()

## Forward Shifting
To shift our data forward, we pass the desired number of periods (or increments) to the `shift()` function; this number needs to be positive in this case. Let's move our data forward by one period or index, which means that all values which earlier corresponded to row N will now belong to row N+1. Here is the output:


In [None]:
df_vwap.shift(1).head()

## Backwards shifting
To shift our data backwards, the number of periods (or increments) must be negative.

In [None]:
df_vwap.shift(-1).head()

The VWAP value corresponding to **2003-07-09** is now **167**, whereas originally it was **164.90**.
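
Forward and backward shifts are easiest to compare side by side on a tiny series. The values below are made up for illustration.

```python
import pandas as pd

idx = pd.date_range("2021-01-04", periods=4, freq="D")
s = pd.Series([10.0, 11.0, 12.0, 13.0], index=idx)

shifted = pd.DataFrame({
    "original": s,
    "forward": s.shift(1),     # each row gets the previous row's value
    "backward": s.shift(-1),   # each row gets the next row's value
})
print(shifted)

# A common use: change relative to the previous observation
daily_change = s - s.shift(1)
print(daily_change)
```

The index stays the same in both cases. Only the values move, and the vacated position is filled with `NaN`.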

## Shifting based off time string code
We can also use an offset alias from the offset table for time shifting. For that, we will use the pandas `shift()` function, passing in the `periods` and `freq` parameters. The `periods` parameter defines the number of steps to be shifted, while the `freq` parameter denotes the size of those steps.

Let’s say we want to shift the data three months forward:

In [None]:
# tshift() was removed in pandas 2.0; shift() with freq does the same job
df_vwap.shift(periods=3, freq='M').head()
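
The difference between shifting with and without `freq` is easier to see on a tiny series. This sketch uses made-up values and a daily step (`freq="D"`) instead of the monthly one above.

```python
import pandas as pd

idx = pd.date_range("2021-01-04", periods=3, freq="D")
s = pd.Series([10.0, 11.0, 12.0], index=idx)

# Without freq: values move down two rows, the index stays put
plain = s.shift(periods=2)
print(plain)

# With freq: the index moves by two days, values stay with their rows
moved = s.shift(periods=2, freq="D")
print(moved)
```

With `freq`, no `NaN` values are introduced because nothing is pushed out of the series; only the timestamps change.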

## Rolling windows

Time series data can be noisy due to high fluctuations in the market. As a result, it becomes difficult to gauge a trend or pattern in the data. Here is a visualization of Maruti’s VWAP over the years where we can see such noise:

In [None]:
df_vwap['VWAP'].plot(figsize = (10,6))

As we’re looking at daily data, there’s quite a bit of noise present. It would be nice if we could average this out over a week, which is where a rolling mean comes in. A rolling mean, or moving average, is a transformation that helps average out noise from data. It works by sliding a fixed-size window over the data and aggregating each window with a function such as `mean()`, `median()`, or `count()`. For this example, we’ll use a rolling mean over 7 rows, that is, 7 trading days.

In [None]:
df_vwap.rolling(7).mean().head(10)

The first six values have all become blank as there wasn’t enough data to actually fill them when using a window of seven days.      
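
The same effect shows up on a small made-up series with a window of three. The `min_periods` argument, shown here as an optional alternative, lets pandas average whatever values are available instead of returning blanks.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Window of 3: the first two results are NaN (not enough data yet)
strict = s.rolling(window=3).mean()

# min_periods=1: average over however many values the window holds so far
lenient = s.rolling(window=3, min_periods=1).mean()

print(pd.DataFrame({"value": s, "rolling(3)": strict, "min_periods=1": lenient}))
```

Whether to use `min_periods` is a judgment call: it avoids blanks, but the early averages are based on fewer points than the rest.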

So, what are the key benefits of calculating a moving average or using this rolling mean method? Our data becomes a lot less noisy and more reflective of the trend than the data itself. Let’s actually plot this out. First, we’ll plot the original data followed by the rolling data for 30 days.     

In [None]:
df_vwap['VWAP'].plot()
df_vwap.rolling(window=30).mean()['VWAP'].plot(figsize=(16, 6))

The **blue line** is the original VWAP data. The **red line represents the 30-day rolling window**, and has less noise than the blue line. Something to keep in mind is that once we run this code, the first 29 days aren’t going to have the red line because there wasn’t enough data to actually calculate that rolling mean.

# Conclusion
Python’s pandas library is a powerful, comprehensive library with a wide variety of inbuilt functions for analyzing time series data. In this article, we saw how pandas can be used for wrangling and visualizing time series data. 

We also performed tasks like time resampling, time shifting and rolling with stock data. These are usually the first steps in analyzing any time series data. Going forward, we could use this data to perform a basic financial analysis by calculating the daily percentage change in stocks to get an idea about the volatility of stock prices. Another way we could use this data would be to predict Maruti’s stock prices for the next few days by employing machine learning techniques. This would be especially helpful from the shareholder’s point of view.
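
As a starting point for the daily percentage change mentioned above, here is a minimal sketch on made-up prices. Using the standard deviation of daily returns as a volatility measure is one common choice and an assumption here, not something computed in this notebook.

```python
import pandas as pd

# Hypothetical closing prices on four consecutive days
prices = pd.Series(
    [100.0, 110.0, 99.0, 99.0],
    index=pd.date_range("2021-01-04", periods=4, freq="D"),
)

# Fractional change relative to the previous day (first value is NaN)
daily_returns = prices.pct_change()

# Sample standard deviation of the returns as a simple volatility measure
volatility = daily_returns.std()

print(daily_returns)
print(f"Volatility: {volatility:.3f}")
```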

#### P.S.: This notebook has also been converted into a blog post, in case you are interested. You can read it here: [Time Series Analysis with Pandas](https://kite.com/blog/python/pandas-time-series-analysis/)