<a href="https://colab.research.google.com/github/hmarathe420/Kotak_Mahindra_Bank_Stock_Price_Prediction/blob/main/Kotak_Mahindra_Bank_Stock_Price_Prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project - Kotak Mahindra Bank Stock Price Prediction**
## **Name - Harshal Marathe**
### **Project Type - Regression**
#### **Role - Data Science Intern at Bharat Intern**

## **Project Summary -**


As a Data Science Intern at Bharat Intern, I was assigned to develop a stock price prediction model for Kotak Mahindra Bank. The objective is to apply machine learning techniques to historical stock market data and build a robust model that forecasts the bank's future stock prices as accurately as possible.

Key Project Steps:

1. Data Collection: The first step is to gather historical stock market data for Kotak Mahindra Bank. The dataset contains daily information about the stock's opening price, closing price, high and low prices, and trading volume.

2. Data Preprocessing: Once the data is collected, I clean and prepare it for analysis. This step involves handling missing values, removing duplicates, and dealing with any outliers or noise in the data. Proper preprocessing is essential for building accurate and reliable predictive models.

3. Exploratory Data Analysis (EDA): After preprocessing, I explore the dataset with visualizations and statistical summaries to identify patterns, trends, and relationships between variables. The findings guide feature selection and feature engineering.

4. Feature Engineering: Feature engineering is a crucial step in stock price prediction. I extract or create features from the existing dataset that give the machine learning algorithms meaningful input. Lagged features, technical indicators, and market sentiment features are examples of features that might improve the model's predictive power.

5. Model Selection: The next step is to choose suitable machine learning algorithms for the prediction task. Commonly used models for time series forecasting include Linear Regression, Support Vector Regression, Random Forest Regression, Gradient Boosting Regression, and Long Short-Term Memory (LSTM) networks. I compare these models and select the most appropriate one based on its performance during evaluation.

6. Model Training and Evaluation: The selected model is trained on the preprocessed data and evaluated with metrics such as Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and R-squared. It is tested on a held-out validation set to assess how well it generalizes to unseen data.

7. Hyperparameter Tuning: I fine-tune the hyperparameters of the selected model, that is, the settings fixed before training rather than learned from the data, to improve its accuracy and robustness.

8. Prediction: After training and tuning, the model is used to predict the future stock prices of Kotak Mahindra Bank. The predictions are then interpreted to understand how the model responds to different market conditions.
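Steps 4 and 6 above can be sketched in a few lines. The example below is a minimal illustration, not the method used later in this notebook: it builds lagged closing prices and a moving average as features, then splits the rows in time order so the validation rows are strictly later than the training rows. The function names, the number of lags, the window size, and the synthetic price series are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def make_lag_features(close: pd.Series, n_lags: int = 3, ma_window: int = 5) -> pd.DataFrame:
    """Build a supervised-learning frame from a closing-price series.

    The column names, n_lags and ma_window are illustrative choices.
    """
    frame = pd.DataFrame({"target": close})
    # Lagged closes: the closing price from k trading days earlier.
    for k in range(1, n_lags + 1):
        frame[f"lag_{k}"] = close.shift(k)
    # Moving average over past closes only. Shifting by one day first
    # keeps the current day's close (the target) out of the feature.
    frame["ma"] = close.shift(1).rolling(ma_window).mean()
    # The earliest rows lack a full history, so they are dropped.
    return frame.dropna()


def chronological_split(frame: pd.DataFrame, train_fraction: float = 0.8):
    """Split without shuffling so the second part is strictly later in time."""
    cut = int(len(frame) * train_fraction)
    return frame.iloc[:cut], frame.iloc[cut:]


# Synthetic prices stand in for the real data in this sketch.
dates = pd.bdate_range("2022-01-03", periods=30)
prices = pd.Series(np.linspace(100.0, 129.0, 30), index=dates, name="Close")

features = make_lag_features(prices)
train, test = chronological_split(features)
print(features.head())
print(len(train), len(test))
```

Splitting by position rather than at random matters for time series: a shuffled split would let the model train on days that come after the ones it is evaluated on.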

Completing this project demonstrates core data science skills: data preprocessing, exploratory data analysis, feature engineering, machine learning modeling, and predictive analysis. It also gives insight into how well stock price prediction models can perform, which is relevant to financial decision-making and investment strategies.
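The evaluation metrics named in step 6 (MAE, RMSE, and R-squared) follow directly from their definitions. The sketch below computes them with NumPy only; the helper name `regression_metrics` and the sample values are made up for illustration and are not results from this project.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    """Return MAE, RMSE and R-squared for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    # MAE: average size of the error, in the same unit as the price.
    mae = np.mean(np.abs(errors))
    # RMSE: like MAE, but large errors are penalised more heavily.
    rmse = np.sqrt(np.mean(errors ** 2))
    # R-squared: 1 minus the ratio of residual to total variance.
    # It is undefined when y_true is constant (ss_tot == 0).
    ss_res = np.sum(errors ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": float(mae), "rmse": float(rmse), "r2": float(r2)}


# Made-up values, used only to exercise the function.
actual = [100.0, 102.0, 104.0, 106.0]
predicted = [101.0, 101.0, 105.0, 105.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

In this toy case every prediction is off by exactly 1, so MAE and RMSE are both 1.0 and R-squared is 0.8.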

#### Import Required Libraries

In [3]:
# Importing the required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import yfinance as yf

#### Data Collection and Understanding

In [16]:
# Download daily Kotak Mahindra Bank (NSE ticker KOTAKBANK.NS) price data with yfinance
df = yf.download('KOTAKBANK.NS', start='2019-01-01', end='2022-12-31', progress=False)

In [19]:
# top 5 rows of dataset
df.head()

Date,Open,High,Low,Close,Adj Close,Volume
2019-01-01,1254.0,1256.0,1242.25,1250.449951,1247.377319,1202580
2019-01-02,1247.900024,1248.25,1227.0,1240.599976,1237.551636,1713677
2019-01-03,1240.0,1251.0,1227.650024,1235.25,1232.2146,1825888
2019-01-04,1238.0,1252.0,1230.0,1247.949951,1244.883545,1468796
2019-01-07,1250.949951,1254.0,1241.0,1246.599976,1243.536743,1211456


In [20]:
# last 5 rows of dataset
df.tail()

Date,Open,High,Low,Close,Adj Close,Volume
2022-12-26,1821.949951,1826.099976,1794.199951,1813.550049,1812.057983,4746738
2022-12-27,1822.650024,1827.949951,1807.0,1820.900024,1819.401978,1548286
2022-12-28,1824.400024,1831.400024,1815.25,1820.099976,1818.602539,1904268
2022-12-29,1814.400024,1820.0,1801.0,1818.75,1817.253662,2154490
2022-12-30,1825.099976,1838.0,1822.300049,1827.25,1825.746704,2692688


In [21]:
# Describe the dataset
df.describe()

Statistic,Open,High,Low,Close,Adj Close,Volume
count,990.0,990.0,990.0,990.0,990.0,990.0
mean,1654.308181,1674.052877,1632.254901,1653.633282,1650.796571,3728684.0
std,245.176533,245.843011,245.210868,244.899482,244.922211,4274852.0
min,1095.050049,1155.0,1001.0,1098.25,1096.141968,197609.0
25%,1436.75,1466.050049,1408.250031,1443.237549,1440.467224,2081424.0
50%,1722.950012,1734.200012,1697.724976,1717.450012,1714.578857,2860446.0
75%,1844.5625,1864.0,1820.212494,1844.612518,1841.670746,4257448.0
max,2200.0,2253.0,2176.600098,2210.949951,2207.801758,83859900.0


In [23]:
# dataset info
df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 990 entries, 2019-01-01 to 2022-12-30
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Open       990 non-null    float64
 1   High       990 non-null    float64
 2   Low        990 non-null    float64
 3   Close      990 non-null    float64
 4   Adj Close  990 non-null    float64
 5   Volume     990 non-null    int64  
dtypes: float64(5), int64(1)
memory usage: 54.1 KB


In [25]:
# shape of dataset
df.shape

(990, 6)

In [30]:
# counting the duplicate rows of the dataset
df.duplicated().sum()

0
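One caveat about this check: `df.duplicated()` compares column values only and ignores the index, so two rows with the same date but different prices would not be counted. For a date-indexed series it can be worth checking the index separately. The sketch below uses a small made-up frame (not the Kotak data) to show the difference.

```python
import pandas as pd

# Made-up frame in which the date 2022-01-04 appears twice.
dates = pd.to_datetime(["2022-01-03", "2022-01-04", "2022-01-04"])
demo = pd.DataFrame({"Close": [100.0, 101.0, 101.5]}, index=dates)

# Row-wise check: looks only at column values, not at the index.
row_duplicates = int(demo.duplicated().sum())
# Index check: flags every repeated date after its first occurrence.
date_duplicates = int(demo.index.duplicated().sum())

print(row_duplicates, date_duplicates)
```

Here the row-wise check reports 0 while the index check reports 1, because the repeated date carries a different closing price.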

In [32]:
# counting the missing or null values of the dataset
df.isna().sum()

Open         0
High         0
Low          0
Close        0
Adj Close    0
Volume       0
dtype: int64

From the above analysis, the dataset has 990 rows and 6 columns, with no null values and no duplicated rows. The dataset information also shows that the Volume column has an integer datatype (int64), while the remaining columns are floats (float64).
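The individual checks above can also be bundled into one helper so they are easy to rerun whenever the data is downloaded again. This is only a sketch: the function name `quality_report` and the keys of the returned dictionary are hypothetical, and the demo frame is made up.

```python
import pandas as pd


def quality_report(frame: pd.DataFrame) -> dict:
    """Collect shape, duplicate, missing-value and dtype information."""
    return {
        "rows": frame.shape[0],
        "columns": frame.shape[1],
        # Fully identical rows (the index is not compared).
        "duplicate_rows": int(frame.duplicated().sum()),
        # Total number of missing cells across all columns.
        "missing_values": int(frame.isna().sum().sum()),
        "dtypes": {col: str(dtype) for col, dtype in frame.dtypes.items()},
    }


# Made-up frame with one missing closing price.
demo = pd.DataFrame({"Close": [100.0, 101.0, None], "Volume": [10, 20, 30]})
report = quality_report(demo)
print(report)
```

Calling `quality_report(df)` on the downloaded data should reproduce the figures reported above, provided the download returns the same rows.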