<h1 style="color:orange;font-family:courier" align="center">Time Series Analysis</h1>
<h2 style="color:orange;font-family:verdana;" align="center">NIFTY-50 Stock Market</h2>
<h3 style="color:orange;" align="center">BAJAJ-AUTO</h3>

<h1 style="color:royalblue;">Introduction</h1>
Time series analysis is a family of statistical techniques for working with data recorded over time, including trend analysis. Time series data are observations taken at successive time periods or intervals. Data are commonly grouped into three types:

* **Time series data:** A set of observations on the values that a variable takes at different times.

* **Cross-sectional data:** Data of one or more variables, collected at the same point in time.

* **Pooled data:** A combination of time series data and cross-sectional data.
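
The three data types can be illustrated with a small pandas sketch. The tickers `AAA` and `BBB` and all prices below are made up for illustration; they are not taken from the NIFTY-50 dataset.

```python
import pandas as pd

# Hypothetical closing prices for two made-up tickers on three days
panel = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"] * 2),
    "ticker": ["AAA"] * 3 + ["BBB"] * 3,
    "close": [100.0, 101.5, 99.8, 50.0, 50.4, 51.1],
})

# Time series data: one variable for one entity, observed at different times
time_series = panel[panel["ticker"] == "AAA"].set_index("date")["close"]

# Cross-sectional data: several entities observed at the same point in time
same_day = panel["date"] == pd.Timestamp("2020-01-02")
cross_section = panel[same_day].set_index("ticker")["close"]

# Pooled data: the full table, combining both dimensions
pooled = panel.set_index(["date", "ticker"])["close"]

print(time_series)
print(cross_section)
print(pooled)
```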

<i style="color:royalblue;">Time Series Analysis is used for many applications such as:</i>
 * Economic Forecasting
 * Sales Forecasting
 * Budgetary Analysis
 * Stock Market Analysis
 * Yield Projections
 * Process and Quality Control
 * Inventory Studies
 * Workload Projections
 * Utility Studies
 * Census Analysis
     
<h2 style="color:royalblue;">Components of Time Series</h2>
Time series data are commonly described as having four components:

 * Trend Component: the variation that moves up or down in a reasonably predictable pattern over a long period.

 * Seasonality Component: the variation that is regular and periodic and repeats over a specific period such as a day, week, month, or season.

 * Cyclical Component: the variation that corresponds to business or economic 'boom-bust' cycles, or that follows its own peculiar cycle.

 * Random Component: the variation that is erratic or residual and does not fall under any of the three classifications above.

To make this concept clearer, here is a visual interpretation of the components of a time series. You can view the original diagram with its context [here](https://www.atap.gov.au/tools-techniques/travel-demand-modelling/6-forecasting-evaluation).

![](https://kite.com/wp-content/uploads/2019/08/variations-of-time-series.jpg )
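
The components can also be separated numerically. The sketch below is a hand-rolled classical decomposition, assuming an additive model (series = trend + seasonality + random). The data is synthetic and every number in it is made up. A short cycle is not isolated here, so any cyclical movement would be absorbed into the trend estimate. statsmodels offers a library version of this idea in `seasonal_decompose`.

```python
import numpy as np
import pandas as pd

# Synthetic daily series (an assumption for illustration, not the stock data):
# linear trend + weekly pattern + random noise
rng = np.random.default_rng(0)
period = 7
n = 140
t = np.arange(n)
index = pd.date_range("2020-01-01", periods=n, freq="D")
weekly_pattern = np.array([3.0, 1.0, 0.0, -1.0, -2.0, -3.0, 2.0])  # sums to zero
y = pd.Series(50 + 0.2 * t + weekly_pattern[t % period] + rng.normal(0, 0.5, n),
              index=index)

# Trend: centered moving average over one full period
trend = y.rolling(window=period, center=True).mean()

# Seasonality: average detrended value at each position in the period,
# re-centered so that the seasonal effects sum to zero
detrended = y - trend
position = pd.Series(t % period, index=index)
seasonal_means = detrended.groupby(position).mean()
seasonal_means = seasonal_means - seasonal_means.mean()
seasonal = position.map(seasonal_means)

# Random component: what is left after removing trend and seasonality
residual = y - trend - seasonal

print(seasonal_means.round(2))
print(residual.dropna().describe().round(2))
```

The estimated seasonal effects should land close to `weekly_pattern`, and the trend is undefined for the first and last three days because the centered window does not fit there.
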
<h3 style="color:royalblue;">Traditional Techniques:</h3> The fitting of time series models can be an ambitious undertaking. There are many methods of model fitting including the following:
 * Box-Jenkins ARIMA models
 * Box-Jenkins Multivariate Models
 * Holt-Winters Exponential Smoothing (single, double, triple)

The choice of technique depends on the application and on the user's preference. Covering all of these methods is beyond the scope of this notebook. The most basic smoothing techniques are:
 * Averaging Methods
 * Exponential Smoothing Techniques.
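
Both families of smoothing are available in pandas. A minimal sketch on a made-up four-point series, with a window of 2 and a smoothing factor `alpha` of 0.5 chosen only to keep the arithmetic easy to follow:

```python
import pandas as pd

prices = pd.Series([10.0, 12.0, 14.0, 16.0])

# Averaging method: simple moving average over the last 2 observations
moving_avg = prices.rolling(window=2).mean()

# Simple exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}
# adjust=False makes pandas apply this recursive form, starting from s_0 = x_0
alpha = 0.5
exp_smooth = prices.ewm(alpha=alpha, adjust=False).mean()

print(moving_avg.tolist())  # [nan, 11.0, 13.0, 15.0]
print(exp_smooth.tolist())  # [10.0, 11.0, 12.5, 14.25]
```

A larger `alpha` tracks recent observations more closely, while a smaller one gives a smoother curve.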
 
<h3 style="color:royalblue;">Modern Technique:</h3>
Tutorials for all of these techniques are given in this [notebook](https://www.kaggle.com/rohanrao/a-modern-time-series-tutorial) by @rohanrao:
* Auto ARIMAX
* Facebook Prophet
* LightGBM
* LSTM 

<h1 style="color:royalblue;">NIFTY 50</h1>
The NIFTY 50 is the National Stock Exchange of India's benchmark broad-based stock market index for the Indian equity market. The full form of NIFTY is National Index Fifty. It represents the weighted average of 50 Indian company stocks in 13 sectors and is one of the two main stock indices used in India, the other being the BSE Sensex.

<p>Nifty is owned and managed by India Index Services and Products (IISL), which is a wholly owned subsidiary of the NSE Strategic Investment Corporation Limited. IISL had a marketing and licensing agreement with Standard and Poor's for co-branding equity indices until 2013. The Nifty 50 was launched 1 April 1996, and is one of the many stock indices of Nifty.</p>
**Source:** https://en.wikipedia.org/wiki/NIFTY_50

<h1 style="color:royalblue;">BAJAJ-AUTO</h1>
<h4 style="color:darkblue;">The Company</h4>
The Bajaj Group is amongst the top 10 business houses in India. Its footprint stretches over a wide range of industries, spanning automobiles (two-wheelers and three-wheelers), home appliances, lighting, iron and steel, insurance, travel and finance. The group's flagship company, Bajaj Auto, is ranked as the world's fourth largest two- and three-wheeler manufacturer, and the Bajaj brand is well known across several countries in Latin America, Africa, the Middle East, and South and South East Asia. Founded in 1926, at the height of India's movement for independence from the British, the group has an illustrious history. The integrity, dedication, resourcefulness and determination to succeed that are characteristic of the group today are often traced back to its birth during those days of relentless devotion to a common cause. Jamnalal Bajaj, founder of the group, was a close confidant and disciple of Mahatma Gandhi. In fact, Gandhiji had adopted him as his son.
<p>In 2007, Bajaj Auto acquired a 14% stake in KTM that has since grown to 48%. This partnership catalysed Bajaj Auto’s endeavour to democratise motorcycle racing in India. Bajaj Auto today exclusively manufactures the Duke range of KTM bikes and exports them worldwide. In FY2018, KTM was the fastest growing motorcycle brand in the country.</p>
**Source:** https://www.bajajauto.com/
<h4 style="color:orange;">Before starting, I would like to thank @parulpandey and @rohanrao for their inspiring work.</h4>
<h4> Kernel Inspiration:</h4>
* https://www.kaggle.com/parulpandey/getting-started-with-time-series-using-pandas 
* https://www.kaggle.com/rohanrao/a-modern-time-series-tutorial

# 1. Importing Packages and Collecting Data

In [None]:
'''Import basic modules'''
import pandas as pd
import numpy as np
import datetime as dt
from datetime import datetime
from pandas import Series
import statsmodels.api as sm

'''Import visualization libraries'''
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("whitegrid")
%matplotlib inline
import altair as alt # visualization

'''Display Markdown-formatted output such as bold or italic text'''
from IPython.display import Markdown, display
def bold(string):
    display(Markdown(string))

In [None]:
"""Let's look at the Bajaj-Auto stock price dataset"""
df = pd.read_csv("../input/nifty50-stock-market-data/BAJAJ-AUTO.csv")
df.head()

In [None]:
"""Let's look at the data info"""
df.info()

Now that the data is loaded, let's take a look at its columns for further analysis.

* **The Open and Close columns** indicate the opening and closing price of the stocks on a particular day.
* **The High and Low columns** provide the highest and the lowest price for the stock on a particular day, respectively.
* **The Volume column** tells us the total volume of stocks traded on a particular day.
* **The VWAP column** holds the volume weighted average price, a trading benchmark that gives the average price a security has traded at throughout the day, based on both volume and price. It is important because it gives traders insight into both the trend and the value of a security ([source](https://www.investopedia.com/terms/v/vwap.asp)).
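
As a formula, VWAP is the sum of price times volume divided by the total volume. The dataset already provides this value per day, so the sketch below only illustrates the arithmetic on three made-up trades:

```python
import numpy as np

# Hypothetical intraday trades (made-up numbers): price and quantity per trade
prices = np.array([100.0, 102.0, 101.0])
volumes = np.array([10, 30, 60])

# VWAP = sum(price * volume) / sum(volume)
vwap = (prices * volumes).sum() / volumes.sum()

print(vwap)           # 101.2
print(prices.mean())  # 101.0, the unweighted average, for comparison
```

The VWAP sits above the plain average here because most of the volume traded at the higher prices.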

# Data Preparation

In [None]:
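df.Date = pd.to_datetime(df.Date, format="%Y-%m-%d")
df["month"] = df.Date.dt.month
# Series.dt.week was removed in pandas 2.0; isocalendar() gives the ISO week number
df["week"] = df.Date.dt.isocalendar().week.astype(int)
df["day"] = df.Date.dt.day
df["day_of_week"] = df.Date.dt.dayofweek  # Monday=0, ..., Sunday=6
# Fill missing values with the column means; numeric_only skips non-numeric columns
df.fillna(df.mean(numeric_only=True), inplace=True)

df.set_index("Date", drop=False, inplace=True)
df.head()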
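
To see what the calendar features look like, here is a small sketch on three hand-picked dates. The dates are chosen for illustration only and are not taken from the dataset.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2019-01-01", "2019-01-06", "2019-01-07"]))

features = pd.DataFrame({
    "date": dates,
    "week": dates.dt.isocalendar().week.astype(int),  # ISO week number
    "day_of_week": dates.dt.dayofweek,                # Monday=0, ..., Sunday=6
    "day_name": dates.dt.day_name(),
})
print(features)
```

Note that pandas numbers the days from Monday (0) to Sunday (6), which matters when reading the day-of-week charts.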

# Exploratory Data Analysis
Let's explore the data at the year, month, and day level.
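
The month and day-of-week views in this section rely on `groupby` over derived columns. A year-level view can be built the same way by grouping on the year of the date index. A minimal sketch on a made-up frame (the values are not from the dataset):

```python
import pandas as pd

# Made-up prices on four dates across two years
toy = pd.DataFrame(
    {"VWAP": [100.0, 110.0, 120.0, 140.0]},
    index=pd.to_datetime(["2017-03-01", "2017-09-01", "2018-03-01", "2018-09-01"]),
)

# Average VWAP per calendar year
yearly = toy.groupby(toy.index.year)["VWAP"].mean()
print(yearly)
```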

## <font color="brown">Volume Weighted Average Price</font>

In [None]:
bars = alt.Chart(df).mark_trail(color='orange').encode(
    x = 'Date:T',
    y = 'VWAP:Q',
).properties(
    title={
    "text":['Volume Weighted Average Price (VWAP)'],
    "subtitle":['There is a continuous increase in the VWAP till 2018 and a dip in 2019'],
    "fontSize":15,
    "fontWeight": 'bold',
    "font":'Courier New',
    }
)

text = bars.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
)   

(bars + text).properties( height=300, width=600)

In [None]:
vwap_df = df[['VWAP']]
start_date = datetime(2017,1,1)
end_date = datetime(2018,12,1)
# Keep only the rows between start_date and end_date (inclusive)
temp = vwap_df[(start_date <= vwap_df.index) & (vwap_df.index <= end_date)].reset_index()
bars = alt.Chart(temp).mark_trail(color='orange').encode(
    x = 'Date:T',
    y = 'VWAP:Q',
).properties(
    title={
    "text":['Trend of VWAP, 2017-01-01 to 2018-12-01'],
    "subtitle":['A closer look at the VWAP inside the window set by start_date and end_date'],
    "fontSize":15,
    "fontWeight": 'bold',
    "font":'Courier New',
    }
)

text = bars.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
)   

(bars + text).properties( height=300, width=600)

## <font color="brown">Open and Close Stock Price</font>

In [None]:
temp = df.groupby('month')['Open'].mean().reset_index()
bars = alt.Chart(temp).mark_bar().encode(
    y = 'month:O',
    x = 'Open:Q',
    color=alt.condition(
        alt.datum.month < 11 ,  
        alt.value('darkblue'),     
        alt.value('orange')   
    )
).properties(
    title={
    "text":['Opening Price of the Stocks (Month Wise)'],
    "subtitle":['No major difference between months, but the highest average opening price is in Nov & Dec'],
    "fontSize":15,
    "fontWeight": 'bold',
    "font":'Courier New',
    }
)

text = bars.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    text=alt.Text('Open:Q', format='.0f')  # label each bar with its mean value
)

(bars + text).properties( height=250, width=600)

In [None]:
temp = df.groupby('day_of_week')['Open'].mean().reset_index()
bars = alt.Chart(temp).mark_bar(color='darkblue').encode(
    y = 'day_of_week:O',
    x = 'Open:Q',
    color=alt.condition(
        alt.datum.day_of_week == 6,  
        alt.value('orange'),     
        alt.value('darkblue')   
    )
).properties(
    title={
    "text":['Opening Price of the Stocks (Day of Week)'],
    "subtitle":['Highest average opening price on Sunday'],
    "fontSize":15,
    "fontWeight": 'bold',
    "font":'Courier New',
    }
)

text = bars.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    text=alt.Text('Open:Q', format='.0f')  # label each bar with its mean value
)

(bars + text).properties( height=250, width=600)

In [None]:
temp = df.groupby('month')['Close'].mean().reset_index()
bars = alt.Chart(temp).mark_bar().encode(
    y = 'month:O',
    x = 'Close:Q',
    color=alt.condition(
        alt.datum.month < 11 ,  
        alt.value('darkblue'),     
        alt.value('orange')   
    )
).properties(
    title={
    "text":['Closing Price of the Stocks (Month Wise)'],
    "subtitle":['No major difference between months, but the highest average closing price is in Nov & Dec'],
    "fontSize":15,
    "fontWeight": 'bold',
    "font":'Courier New',
    }
)

text = bars.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    text=alt.Text('Close:Q', format='.0f')  # label each bar with its mean value
)

(bars + text).properties( height=250, width=600)

In [None]:
temp = df.groupby('day_of_week')['Close'].mean().reset_index()
bars = alt.Chart(temp).mark_bar(color='darkblue').encode(
    y = 'day_of_week:O',
    x = 'Close:Q',
    color=alt.condition(
        alt.datum.day_of_week == 6 ,  
        alt.value('orange'),     
        alt.value('darkblue')   
    )
).properties(
    title={
    "text":['Closing Price of the Stocks (Day of Week)'],
    "subtitle":['Highest average closing price on Sunday'],
    "fontSize":15,
    "fontWeight": 'bold',
    "font":'Courier New',
    }
)

text = bars.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    text=alt.Text('Close:Q', format='.0f')  # label each bar with its mean value
)

(bars + text).properties( height=250, width=600)

# Facebook Prophet

Prophet follows the sklearn model API. We create an instance of the Prophet class and then call its fit and predict methods.

The input to Prophet is always a dataframe with two columns: ds and y. The ds (datestamp) column should be of a format expected by Pandas, ideally YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp. The y column must be numeric, and represents the measurement we wish to forecast.
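
The two-column layout can be prepared with plain pandas before Prophet is involved at all. The helper `to_prophet_frame` below is hypothetical (it is not part of Prophet), and the rows are made up to stand in for the stock data:

```python
import pandas as pd

def to_prophet_frame(frame, date_col, value_col):
    """Return a two-column frame (ds, y) in the layout Prophet expects.

    Hypothetical helper for illustration; it is not part of Prophet.
    """
    out = frame[[date_col, value_col]].rename(columns={date_col: "ds", value_col: "y"})
    out["ds"] = pd.to_datetime(out["ds"])  # datestamp column
    out["y"] = pd.to_numeric(out["y"])     # numeric target
    return out.reset_index(drop=True)

# Made-up rows standing in for the stock data
raw = pd.DataFrame({"Date": ["2019-01-01", "2019-01-02"], "VWAP": ["2700.5", "2712.0"]})
prophet_df = to_prophet_frame(raw, "Date", "VWAP")
print(prophet_df.dtypes)
```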

The data is split into training and validation sets:

* **train:** Data from 26th May, 2008 to 31st December, 2018.
* **valid:** Data from 1st January, 2019 to 31st December, 2019.
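
A time-based split keeps every validation row later than every training row, so the model is never scored on dates it has already seen. A minimal sketch on a made-up series that crosses the 2018/2019 boundary:

```python
import pandas as pd

# Made-up daily series spanning the 2018/2019 boundary
toy = pd.DataFrame({
    "Date": pd.date_range("2018-12-28", periods=8, freq="D"),
    "VWAP": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0],
})

cutoff = pd.Timestamp("2019-01-01")
train = toy[toy["Date"] < cutoff]    # everything before the cutoff
valid = toy[toy["Date"] >= cutoff]   # the cutoff date and everything after

print(len(train), len(valid))  # 4 4
```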

**Note:** The default parameters are used for Prophet. They can be tuned to improve the results.

In [None]:
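df_train = df[df.Date < "2019"]
df_valid = df[df.Date >= "2019"]

# The package was renamed from fbprophet to prophet in version 1.0
try:
    from prophet import Prophet
except ImportError:
    from fbprophet import Prophet

model = Prophet()
# Prophet expects the date column to be named ds and the target column y
model.fit(df_train[["Date", "VWAP"]].rename(columns={"Date": "ds", "VWAP": "y"}))

forecast = model.predict(df_valid[["Date", "VWAP"]].rename(columns={"Date": "ds", "VWAP": "y"}))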

In [None]:
model.plot_components(forecast)

In [None]:
model.plot(forecast)
plt.title('Volume Weighted Average Price (VWAP) With Predicted Values', fontsize=15)
plt.show()
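
The plots give a visual check only. One possible way to put a number on the forecast quality is to compare the predicted values with the held-out actuals using MAE and RMSE. The arrays below are made up; in this notebook they would presumably be `df_valid["VWAP"]` and the `yhat` column of `forecast`.

```python
import numpy as np

# Made-up actual and predicted values standing in for
# df_valid["VWAP"] and forecast["yhat"]
actual = np.array([100.0, 102.0, 104.0, 106.0])
predicted = np.array([101.0, 101.0, 106.0, 103.0])

errors = actual - predicted            # [-1, 1, -2, 3]
mae = np.mean(np.abs(errors))          # mean absolute error
rmse = np.sqrt(np.mean(errors ** 2))   # root mean squared error

print(mae)   # 1.75
print(rmse)
```

RMSE penalises large misses more heavily than MAE, so reporting both gives a sense of whether the error comes from a few big deviations.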

## <font color="teal">Please share your feedback, and if you find this kernel helpful, an UPVOTE would be appreciated.</font>