<a href="https://colab.research.google.com/github/gopaljigupta45/YES_BANK_STOCK_PRICE_PREDICTION/blob/main/Yes_bank_stock_price_prediction_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **PROBLEM STATEMENT :-**

### Yes Bank is a well-known bank in the Indian financial sector. Since 2018, it has been in the news because of the fraud case involving Rana Kapoor. This makes it interesting to see how the case affected the company's stock price, and whether time series models or other predictive models can handle such situations. The dataset contains the bank's monthly stock prices since its inception, including the opening, closing, highest, and lowest price for each month.



### ***Our main objective is to predict the stock's closing price of the month.*** 


---




## **Loading the libraries and the data.**
---


In [53]:
# importing the libraries we'll need.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

import warnings
warnings.filterwarnings('ignore')

In [54]:
# Mounting google drive to load the data.
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [55]:
# Loading our dataset.
df = pd.read_csv('/content/drive/MyDrive/Yes Bank Stock Closing Price Prediction - GOPAL JI GUPTA/data_YesBank_StockPrices.csv')

In [56]:
# Taking a look at the data.
df.head()          # displays first five instances of the dataframe.

Unnamed: 0,Date,Open,High,Low,Close
0,Jul-05,13.0,14.0,11.25,12.46
1,Aug-05,12.58,14.88,12.55,13.42
2,Sep-05,13.48,14.87,12.27,13.3
3,Oct-05,13.2,14.47,12.4,12.99
4,Nov-05,13.35,13.88,12.88,13.41


Explaining the data:-
The dataset contains the monthly stock prices of Yes Bank, as described in the problem statement.

Explaining the features present :-


*  **Date :-** The month and year of the record.
*  **Open :-** The price of the stock at the beginning of the month.
*  **High :-** The peak (maximum) price at which the stock traded during the month.
*  **Low :-** The lowest price at which the stock traded during the month.
*  **Close :-** The trading price at the end of the month.
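
The feature definitions above imply a simple consistency rule: within a month, High should be the largest of the four prices and Low the smallest. A minimal sketch of that check follows. The helper name `ohlc_violations` is hypothetical, and the sample rows are copied from the `df.head()` output above rather than loaded from the CSV.

```python
import pandas as pd

# First three rows copied from the df.head() output above.
sample = pd.DataFrame({
    "Date": ["Jul-05", "Aug-05", "Sep-05"],
    "Open": [13.0, 12.58, 13.48],
    "High": [14.0, 14.88, 14.87],
    "Low": [11.25, 12.55, 12.27],
    "Close": [12.46, 13.42, 13.3],
})

def ohlc_violations(frame):
    """Return rows where High is not the row maximum or Low is not the row minimum."""
    prices = frame[["Open", "High", "Low", "Close"]]
    bad_high = frame["High"] < prices.max(axis=1)
    bad_low = frame["Low"] > prices.min(axis=1)
    return frame[bad_high | bad_low]

violations = ohlc_violations(sample)
print(len(violations))   # number of inconsistent rows in the sample
```

On the full dataset the same helper could be called as `ohlc_violations(df)`; any rows it returns would be worth inspecting before modelling.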

## **Data Cleansing.**

---

In [57]:
# Checking for null values.
df.isna().sum()

Date     0
Open     0
High     0
Low      0
Close    0
dtype: int64

In [58]:
# So there are no null values in our dataset.
# Getting information about our data (datatypes, size, etc.) and printing its shape.
df.info()
print('\n', f'The shape of the dataset is : {df.shape}')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 185 entries, 0 to 184
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Date    185 non-null    object 
 1   Open    185 non-null    float64
 2   High    185 non-null    float64
 3   Low     185 non-null    float64
 4   Close   185 non-null    float64
dtypes: float64(4), object(1)
memory usage: 7.4+ KB

 The shape of the dataset is : (185, 5)


In [59]:
# Getting descriptive statistics of the data.
df.describe(include='all')

Unnamed: 0,Date,Open,High,Low,Close
count,185,185.0,185.0,185.0,185.0
unique,185,,,,
top,Jul-05,,,,
freq,1,,,,
mean,,105.541405,116.104324,94.947838,105.204703
std,,98.87985,106.333497,91.219415,98.583153
min,,10.0,11.24,5.55,9.98
25%,,33.8,36.14,28.51,33.45
50%,,62.98,72.55,58.0,62.54
75%,,153.0,169.19,138.35,153.3


In [60]:
# Let us now preserve the original data before we operate on it.
preserved_stock_data = df.copy()

In [61]:
# Checking for duplicate instances.
df[df.duplicated()]

Unnamed: 0,Date,Open,High,Low,Close
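
`df.duplicated()` only flags rows that match on every column. For monthly data it can also be useful to check whether any month appears more than once, even with different prices. The sketch below illustrates the difference on a hypothetical toy frame (the repeated `Jul-05` row and its price are invented for illustration and are not in the dataset).

```python
import pandas as pd

# Hypothetical toy frame: the second "Jul-05" row differs in price,
# so a full-row duplicate check does not flag it.
toy = pd.DataFrame({
    "Date": ["Jul-05", "Aug-05", "Jul-05"],
    "Close": [12.46, 13.42, 12.50],
})

# Rows identical to an earlier row in every column.
full_row_dupes = toy[toy.duplicated()]

# Every row whose Date occurs more than once (keep=False marks all copies).
date_dupes = toy[toy.duplicated(subset="Date", keep=False)]

print(len(full_row_dupes), len(date_dupes))
```

In this dataset the `describe` output already reports 185 unique dates out of 185 rows, so the date-level check is expected to come back empty as well.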


In [62]:
# So there is no duplicate data in our dataframe.
# checking the datatypes once more.
df.dtypes

Date      object
Open     float64
High     float64
Low      float64
Close    float64
dtype: object

In [63]:
# as we can see, Date column has the object datatype. 
df['Date']

0      Jul-05
1      Aug-05
2      Sep-05
3      Oct-05
4      Nov-05
        ...  
180    Jul-20
181    Aug-20
182    Sep-20
183    Oct-20
184    Nov-20
Name: Date, Length: 185, dtype: object

In [64]:
# We need to modify this before passing it to a model.
# Let's convert the Date column to a proper datetime datatype.
from datetime import datetime
df['Date'] = pd.to_datetime(df['Date'], format='%b-%y')     # parses e.g. 'Jul-05' as 2005-07-01 (datetime64).

In [65]:
df.head()

Unnamed: 0,Date,Open,High,Low,Close
0,2005-07-01,13.0,14.0,11.25,12.46
1,2005-08-01,12.58,14.88,12.55,13.42
2,2005-09-01,13.48,14.87,12.27,13.3
3,2005-10-01,13.2,14.47,12.4,12.99
4,2005-11-01,13.35,13.88,12.88,13.41
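
To make the conversion concrete, here is a small sketch of how the `'%b-%y'` format parses the month-year strings. It assumes English month abbreviations, as in this dataset. Since no day is given, each value lands on the first of the month, and the two-digit year is mapped to a full year (values 00 to 68 become 2000 to 2068). The series name `dates` is just for illustration.

```python
import pandas as pd

# A few date strings in the same 'Mon-yy' form as the Date column.
dates = pd.Series(["Jul-05", "Aug-05", "Nov-20"])

# %b = abbreviated month name, %y = two-digit year.
parsed = pd.to_datetime(dates, format="%b-%y")

print(parsed.dtype)
print(parsed.dt.strftime("%Y-%m-%d").tolist())
```

Passing the format explicitly avoids per-element Python calls and makes the parse fail loudly if a value does not match the expected pattern.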


Since we are tracking how the stock price varies over time, it makes sense to set this column as the index.

In [66]:
df.set_index(df.Date, inplace=True)           # setting Date as the index; passing the Series (not the label) also keeps the Date column.
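
The sketch below shows two things about using a date index, on a toy frame whose values are copied from the `df.head()` output above. First, passing a Series to `set_index` keeps the original column, while passing the column label moves it into the index. Second, a datetime index allows selecting rows by partial date strings. The variable names are illustrative only.

```python
import pandas as pd

# Dates and closing prices copied from the first three rows shown earlier.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["Jul-05", "Aug-05", "Sep-05"], format="%b-%y"),
    "Close": [12.46, 13.42, 13.3],
})

kept = toy.set_index(toy["Date"])   # Series argument: Date stays as a column too
dropped = toy.set_index("Date")     # label argument: Date becomes the index only

# Partial-string slicing on the DatetimeIndex: August through September 2005.
aug_sep = dropped.loc["2005-08":"2005-09", "Close"]

print("Date" in kept.columns, "Date" in dropped.columns)
print(aug_sep.tolist())
```

Keeping `Date` both as index and as a column is harmless for plotting, though some operations that refer to `'Date'` by name can become ambiguous; which form is preferable depends on how later cells use the column.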