<a href="https://colab.research.google.com/github/silloin/yes-bank-stock-prediction/blob/main/Copy_of_Sample_ML_Submission_Template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Yes Bank Stock Closing Price Prediction



##### **Project Type**    - Regression
##### **Contribution**    - Individual
**Member 1** - Govind Kumar



# **Project Summary - Yes Bank Stock Closing Price Prediction**

This project focuses on predicting the closing price of Yes Bank stock using various supervised machine learning models. The objective was to analyze historical stock price data, derive insights through visualization, and implement predictive models to accurately forecast the closing price based on features such as Open, High, and Low prices.

**📊 Data Analysis & Preprocessing**

The dataset used consists of Yes Bank stock prices over time. Initial data analysis included checking the structure of the dataset, column types, shape, and null values. Several preprocessing steps were performed to prepare the dataset for modeling:

*   **Datetime parsing:** The Date column was converted to datetime format and set as the index.
*   **Missing value treatment:** Missing values were handled using forward fill and mean imputation methods to maintain time-series integrity.
*   **Outlier detection:** Outliers were identified using boxplots and treated using the Interquartile Range (IQR) method. In some cases, winsorization (capping) was used to retain useful data while reducing noise.
*   **Feature engineering:** New features such as Price_Range (High - Low) and Month were derived. Categorical features like Month were encoded using Label Encoding and One-Hot Encoding.

The dataset was scaled and split into training and test sets using an 80-20 split.

**📈 Exploratory Data Analysis (EDA)**

Over 15 visualizations were created to understand patterns and correlations:

*   Line and area plots displayed the trends of Open, High, Low, and Close over time.
*   Boxplots and histograms were used to study price distributions and detect outliers.
*   A correlation heatmap revealed a strong linear relationship between features.
*   Pair plots visualized the joint distributions of all price features.
*   Trend decomposition separated the closing price into trend, seasonality, and residuals.
*   Monthly bar plots were used to study seasonality and average monthly behavior of the closing price.

These visualizations helped guide feature selection and model design.

**🤖 Machine Learning Models**

Three supervised learning models were implemented and evaluated:

**✅ Model 1: Linear Regression**

*   A baseline model using Open, High, and Low as features.
*   R² Score: 0.9904 | MAE: ~5.81
*   Though simple, it performed strongly on this structured dataset.

**✅ Model 2: Ridge Regression with GridSearchCV**

*   Used L2 regularization and hyperparameter tuning for alpha.
*   Tuned using GridSearchCV with 5-fold cross-validation.
*   Slightly improved accuracy and reduced overfitting compared to linear regression.

**✅ Model 3: Random Forest Regressor**

*   Ensemble method capturing non-linear relationships effectively.
*   Outperformed all other models with the highest R² and lowest MAE/RMSE.
*   Robust to noise and outliers, making it well suited to stock price prediction.

**📌 Conclusion**

*   All models performed well, but Random Forest delivered the best overall accuracy.
*   Feature importance showed that High and Low prices are strong predictors of Close.
*   The project confirmed that the closing price can be reliably predicted using only Open, High, and Low prices.
*   The workflow included complete EDA, outlier treatment, missing value imputation, encoding, feature engineering, model training, tuning, and evaluation.

**📦 Final Thoughts**
This project demonstrates how a structured ML pipeline can be applied to financial data. With further enhancements like incorporating volume, macro indicators, or advanced models like XGBoost or LSTM, prediction accuracy could improve even further.

The model is now ready to be deployed or integrated into a real-time prediction system.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


The objective of this project is to develop a machine learning model that can accurately predict the closing price of Yes Bank stock using historical stock market data. The stock market is known for its volatility and complexity, and accurate forecasting of stock prices can be a valuable tool for investors, analysts, and financial institutions.

This project involves using supervised machine learning techniques to model the relationship between key features such as Open, High, and Low prices and the target variable — the Closing Price. The dataset consists of historical daily stock price data for Yes Bank, which is analyzed to uncover patterns, trends, and correlations.

The solution should:

*   Clean and preprocess the raw dataset
*   Handle missing values and outliers
*   Visualize trends and feature relationships
*   Implement and evaluate multiple regression models
*   Optimize model performance using hyperparameter tuning

The ultimate goal is to determine the most effective model for predicting the closing stock price and to evaluate its performance using metrics such as R² Score, MAE, and RMSE.
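
The three metrics named above can be computed directly from their definitions. The sketch below uses NumPy and four made-up price pairs (not values from the Yes Bank dataset) purely to show the arithmetic; `regression_metrics` is an illustrative helper name, not part of the notebook.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return R², MAE and RMSE computed from their textbook definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    mae = np.mean(np.abs(residuals))            # average absolute error
    rmse = np.sqrt(np.mean(residuals ** 2))     # penalises large errors more
    ss_res = np.sum(residuals ** 2)             # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    r2 = 1.0 - ss_res / ss_tot
    return {"r2": r2, "mae": mae, "rmse": rmse}

# Made-up closing prices and predictions, for illustration only
scores = regression_metrics([10, 20, 30, 40], [12, 18, 33, 39])
print(scores)
```

MAE and RMSE are in the same units as the price (INR), while R² is unitless, which is worth keeping in mind when the three are shown side by side.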

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 15 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





6. You may add more ml algorithms for model creation. Make sure for each and every algorithm, the following format should be answered.


*   Explain the ML Model used and its performance using Evaluation metric Score Chart.


*   Cross- Validation & Hyperparameter Tuning

*   Have you seen any improvement? Note down the improvement with updated Evaluation metric Score Chart.

*   Explain each evaluation metric's indication towards business and the business impact of the ML model used.




















# ***Let's Begin !***

## ***1. Know Your Data***




### Import Libraries

In [None]:
# Import Libraries
# Basic Libraries
import numpy as np
import pandas as pd

# Data Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Ignore warnings
import warnings
warnings.filterwarnings('ignore')

# Scikit-learn: Preprocessing and Model Selection
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler

# Regression Models
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVR

# Evaluation Metrics
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score


### Dataset Loading

In [None]:
# Mount Google Drive first so the CSV path below is reachable
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Load Dataset
# Load the CSV file from the mounted Drive (replace the path with your own file location)
df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/Copy of data_YesBank_StockPrices (1).csv')

### Dataset First View

In [None]:
# Dataset First Look

# Display first few rows
print("Dataset Preview:")
display(df.head())

# Show basic info
print("\nDataset Info:")
df.info()

# Basic statistics
print("\nStatistical Summary:")
display(df.describe())

### Dataset Rows & Columns count

In [None]:
# 🔢 Dataset Shape
rows, columns = df.shape
print(f"📊 Dataset contains {rows} rows and {columns} columns.")


### Dataset Information

In [None]:
# ℹ️ Dataset Information
print("🔍 Dataset Info:")
df.info()


#### Duplicate Values

In [None]:
# 🔁 Check for duplicate rows
duplicate_count = df.duplicated().sum()
print(f"🧯 Number of duplicate rows: {duplicate_count}")
# Display duplicate rows (if any)
df[df.duplicated()]


#### Missing Values/Null Values

In [None]:
# ❓ Count of missing/null values in each column
missing_values = df.isnull().sum()

print("🧼 Missing Values in Each Column:")
print(missing_values)


In [None]:

# Set plot size
plt.figure(figsize=(8, 4))

# Create heatmap
sns.heatmap(df.isnull(),
            cbar=False,
            cmap='Reds',
            yticklabels=False)

plt.title("🔍 Missing Values Heatmap", fontsize=14)
plt.xlabel("Columns")
plt.show()


Answer Here

## ***2. Understanding Your Variables***

In [None]:
# 📋 List of column names
print("🧾 Dataset Columns:")
print(df.columns.tolist())


In [None]:
# 📊 Statistical summary of numerical columns
print("📈 Dataset Summary (describe):")
print(df.describe())


### Variables Description

Answer Here

### Check Unique Values for each variable.

In [None]:
# 🔎 Count of unique values per column
print("🔢 Unique values in each column:")
print(df.nunique())


## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# 🗓️ Convert 'Date' column to datetime format
df['Date'] = pd.to_datetime(df['Date'], format='%b-%y')

# 🧼 Sort by date to ensure chronological order
df.sort_values('Date', inplace=True)

# 🧱 Set 'Date' as index (useful for time series analysis)
df.set_index('Date', inplace=True)

# ➕ Feature Engineering
df['Price_Range'] = df['High'] - df['Low']
df['Avg_Price'] = (df['Open'] + df['Close']) / 2

# ✅ Optional: Fill missing values (if any)
# df.ffill(inplace=True)                  # Forward fill (fillna(method='ffill') is deprecated in recent pandas)
# df.dropna(inplace=True)                 # Drop rows with NaNs

# 🔍 Preview the cleaned dataset
print("✅ Cleaned & Ready Dataset:")
display(df.head())


### What all manipulations have you done and insights you found?

Answer Here.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart 1: Closing Price Over Time
df['Close'].plot(figsize=(10, 5), title='📉 Closing Price Over Time', color='blue')
plt.xlabel('Date'); plt.ylabel('Closing Price (INR)')
plt.grid(); plt.tight_layout(); plt.show()


##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 2

In [None]:
# Chart 2: Price Range Over Time
df['Price_Range'] = df['High'] - df['Low']
df['Price_Range'].plot(figsize=(10, 5), title='📊 Price Range Over Time', color='orange')
plt.xlabel('Date'); plt.ylabel('Price Range (INR)')
plt.grid(); plt.tight_layout(); plt.show()


##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 3

In [None]:
# Extract Month from Date
df['Month'] = df.index.month_name()

# Group by month and calculate average closing price
monthly_avg = df.groupby('Month')['Close'].mean()

# Ensure months are in calendar order
month_order = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']
monthly_avg = monthly_avg.reindex(month_order)

# Plot
plt.figure(figsize=(10, 5))
monthly_avg.plot(kind='bar', color='slateblue', edgecolor='black')
plt.title('📆 Chart 3: Average Monthly Closing Price')
plt.xlabel('Month'); plt.ylabel('Average Close Price (INR)')
plt.xticks(rotation=45); plt.tight_layout(); plt.grid(axis='y')
plt.show()


##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 4

In [None]:
plt.figure(figsize=(6, 4))
sns.boxplot(y=df['Close'], color='skyblue')
plt.title('📦 Boxplot of Closing Price')
plt.ylabel('Closing Price (INR)')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 5

In [None]:
plt.figure(figsize=(8, 5))
plt.hist(df['Close'], bins=15, color='purple', edgecolor='black')
plt.title('📊 Distribution of Closing Price')
plt.xlabel('Closing Price (INR)')
plt.ylabel('Frequency')
plt.grid(True)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 6

In [None]:
plt.figure(figsize=(10, 5))
plt.plot(df.index, df['Open'], label='Open', linestyle='--')
plt.plot(df.index, df['Close'], label='Close', linestyle='-')
plt.title('📈 Open vs Close Price Over Time')
plt.xlabel('Date'); plt.ylabel('Price (INR)')
plt.legend(); plt.grid(True); plt.tight_layout(); plt.show()


##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 7

In [None]:
df[['High', 'Low']].plot.area(figsize=(10, 5), alpha=0.5)
plt.title('🟨 High vs Low Area Plot'); plt.xlabel('Date'); plt.ylabel('Price')
plt.grid(True); plt.tight_layout(); plt.show()


##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 8

In [None]:
df['Year'] = df.index.year
yearly_avg = df.groupby('Year')['Close'].mean()
yearly_avg.plot(kind='bar', figsize=(8, 5), color='teal')
plt.title('📊 Average Closing Price by Year'); plt.ylabel('Avg Close Price (INR)')
plt.tight_layout(); plt.show()


##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 9

In [None]:
df['Close'].rolling(window=5).mean().plot(figsize=(10, 5), color='green')
plt.title('🧮 5-Month Rolling Average of Closing Price')
plt.xlabel('Date'); plt.ylabel('Smoothed Close Price')
plt.grid(); plt.tight_layout(); plt.show()


##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10

In [None]:
plt.figure(figsize=(6, 5))
sns.scatterplot(x='Open', y='Close', data=df, color='coral')
plt.title('📌 Scatter: Open vs Close'); plt.tight_layout(); plt.show()


##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
from statsmodels.tsa.seasonal import seasonal_decompose

# Ensure the data is sorted chronologically (the index is already a DatetimeIndex)
df = df.sort_index()

# Apply additive seasonal decomposition with a 12-month period (monthly data)
result = seasonal_decompose(df['Close'], model='additive', period=12)

# result.plot() creates its own figure, so resize that figure instead of opening a new one
fig = result.plot()
fig.set_size_inches(12, 8)
fig.suptitle("📉 Chart 11: Trend Decomposition of Closing Price", fontsize=16, y=1.02)
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
plt.figure(figsize=(8, 5))
sns.boxplot(data=df[['Open', 'High', 'Low', 'Close']])
plt.title('🧾 Boxplot of All Price Columns'); plt.tight_layout(); plt.show()


##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
plt.figure(figsize=(8, 4))
sns.kdeplot(df['Close'], fill=True, color='navy')
plt.title('💠 KDE Plot of Closing Price')
plt.xlabel('Close Price'); plt.tight_layout(); plt.show()


##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap




In [None]:
plt.figure(figsize=(8, 5))

# Use only numeric columns for correlation
numeric_df = df.select_dtypes(include='number')

sns.heatmap(numeric_df.corr(), annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5)
plt.title('🔗 Correlation Heatmap')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot

In [None]:
numeric_cols = ['Open', 'High', 'Low', 'Close']

# Pair Plot
sns.pairplot(df[numeric_cols], diag_kind='kde', corner=True)
plt.suptitle("🔗 Pair Plot of Stock Price Features", y=1.02)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## ***5. Hypothesis Testing***

### Based on your chart experiments, define three hypothetical statements from the dataset. In the next three questions, perform hypothesis testing to obtain final conclusion about the statements through your code and statistical testing.

Hypothesis 1:
"The average closing price is significantly different from ₹20."

We'll use a one-sample t-test for this.

Hypothesis 2:
"There is no significant difference between the average opening and average closing prices."

We'll use a paired sample t-test for this (since each Open and Close are from the same month).

Hypothesis 3:
"The closing prices for January are significantly higher than those for June."

We'll use a two-sample t-test (independent) between months.
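
The three tests named above could be run along the following lines. This is a minimal sketch, assuming SciPy is available and using a synthetic monthly frame (`demo`) in place of the real `df`; the numbers it prints say nothing about Yes Bank. The one-sided `alternative="greater"` option and Welch's correction (`equal_var=False`) for Hypothesis 3 are choices made here, not something the notebook specifies.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic monthly prices standing in for the real dataset (assumption)
rng = np.random.default_rng(42)
idx = pd.date_range("2006-01-01", periods=120, freq="MS")
open_prices = rng.normal(100, 30, size=len(idx)).clip(min=5)
demo = pd.DataFrame(
    {"Open": open_prices,
     "Close": open_prices + rng.normal(0, 5, size=len(idx))},
    index=idx,
)

# Hypothesis 1: mean Close differs from 20 (one-sample, two-sided)
t1, p1 = stats.ttest_1samp(demo["Close"], popmean=20)

# Hypothesis 2: mean Open equals mean Close (paired, same months)
t2, p2 = stats.ttest_rel(demo["Open"], demo["Close"])

# Hypothesis 3: January closes are higher than June closes (one-sided, Welch)
jan = demo.loc[demo.index.month == 1, "Close"]
jun = demo.loc[demo.index.month == 6, "Close"]
t3, p3 = stats.ttest_ind(jan, jun, equal_var=False, alternative="greater")

alpha = 0.05  # significance level
for name, p in [("H1", p1), ("H2", p2), ("H3", p3)]:
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"{name}: p = {p:.4f} -> {decision}")
```

Consecutive monthly prices are usually autocorrelated, so the independence assumption behind these t-tests holds only loosely; the p-values are best read as indicative.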

### Hypothetical Statement - 1

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Answer Here.

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value

##### Which statistical test have you done to obtain P-Value?

Answer Here.

##### Why did you choose the specific statistical test?

Answer Here.

### Hypothetical Statement - 2

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Answer Here.

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value

##### Which statistical test have you done to obtain P-Value?

Answer Here.

##### Why did you choose the specific statistical test?

Answer Here.

### Hypothetical Statement - 3

#### 1. State Your research hypothesis as a null hypothesis and alternate hypothesis.

Answer Here.

#### 2. Perform an appropriate statistical test.

In [None]:
# Perform Statistical Test to obtain P-Value

##### Which statistical test have you done to obtain P-Value?

Answer Here.

##### Why did you choose the specific statistical test?

Answer Here.

## ***6. Feature Engineering & Data Pre-processing***

### 1. Handling Missing Values

In [None]:
# Check how many missing values are in each column
print("🔍 Missing Values Per Column:")
print(df.isnull().sum())


#### What all missing value imputation techniques have you used and why did you use those techniques?

Answer Here.
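
For reference, the forward-fill and mean-imputation approach mentioned in the project summary might look like the sketch below. It runs on a toy frame with invented values, and `impute_prices` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def impute_prices(frame):
    """Forward-fill gaps, then fill any remaining (leading) NaNs with the column mean."""
    filled = frame.ffill()               # carry the last known price forward
    return filled.fillna(filled.mean())  # leading gaps have nothing to carry forward

# Toy frame with a leading gap and an interior gap (illustrative values)
toy = pd.DataFrame({"Close": [np.nan, 10.0, np.nan, 14.0]})
imputed = impute_prices(toy)
print(imputed)
```

Forward fill only uses past observations, which suits a time series; the mean fallback is needed only when the very first rows are missing.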

### 2. Handling Outliers

In [None]:
# Visualize outliers for all numeric features
plt.figure(figsize=(10, 6))
sns.boxplot(data=df[['Open', 'High', 'Low', 'Close']])
plt.title('📦 Boxplot to Detect Outliers')
plt.tight_layout()
plt.show()


##### What all outlier treatment techniques have you used and why did you use those techniques?

Answer Here.
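
The IQR-based capping (winsorization) described in the project summary can be sketched as follows. The helper name `cap_outliers_iqr`, the multiplier `k=1.5`, and the sample prices are assumptions for illustration.

```python
import pandas as pd

def cap_outliers_iqr(series, k=1.5):
    """Clip values lying more than k * IQR outside the quartiles."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)

# Made-up prices with one extreme value
prices = pd.Series([10.0, 11.0, 12.0, 12.0, 13.0, 100.0])
capped = cap_outliers_iqr(prices)
print(capped.tolist())
```

Capping keeps every row, which matters for a short monthly series where dropping observations would leave gaps in the timeline.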

### 3. Categorical Encoding

In [None]:
# Encode your categorical columns

#### What all categorical encoding techniques have you used & why did you use those techniques?

Answer Here.
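
One way to encode the `Month` feature is sketched below on a synthetic 12-month frame. The calendar month number is used as the ordinal ("label") encoding here instead of scikit-learn's `LabelEncoder`, which would order month names alphabetically; that substitution is a choice made for this sketch.

```python
import pandas as pd

# Synthetic one-year monthly index (assumption, stands in for the real df)
idx = pd.date_range("2010-01-01", periods=12, freq="MS")
demo = pd.DataFrame({"Close": range(12)}, index=idx)

# Ordinal ("label") encoding: calendar month number, 1 = January
demo["Month_Num"] = demo.index.month

# One-hot encoding: one indicator column per month name
demo["Month"] = demo.index.month_name()
one_hot = pd.get_dummies(demo["Month"], prefix="Month", dtype=int)
encoded = pd.concat([demo.drop(columns="Month"), one_hot], axis=1)
print(encoded.columns.tolist())
```

For linear models, passing `drop_first=True` to `pd.get_dummies` avoids a redundant indicator column; tree-based models are not affected by it.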

### 4. Textual Data Preprocessing
(It's mandatory for textual dataset i.e., NLP, Sentiment Analysis, Text Clustering etc.)

#### 1. Expand Contraction

In [None]:
# Expand Contraction

#### 2. Lower Casing

In [None]:
# Lower Casing

#### 3. Removing Punctuations

In [None]:
# Remove Punctuations

#### 4. Removing URLs & Removing words that contain digits

In [None]:
# Remove URLs & remove words that contain digits

#### 5. Removing Stopwords & Removing White spaces

In [None]:
# Remove Stopwords

In [None]:
# Remove White spaces

#### 6. Rephrase Text

In [None]:
# Rephrase Text

#### 7. Tokenization

In [None]:
# Tokenization

#### 8. Text Normalization

In [None]:
# Normalizing Text (i.e., Stemming, Lemmatization etc.)

##### Which text normalization technique have you used and why?

Answer Here.

#### 9. Part of speech tagging

In [None]:
# POS Tagging

#### 10. Text Vectorization

In [None]:
# Vectorizing Text

##### Which text vectorization technique have you used and why?

Answer Here.

### 4. Feature Manipulation & Selection

#### 1. Feature Manipulation

In [None]:
# Manipulate Features to minimize feature correlation and create new features

#### 2. Feature Selection

In [None]:
# Select your features wisely to avoid overfitting

##### What all feature selection methods have you used  and why?

Answer Here.

##### Which all features you found important and why?

Answer Here.

### 5. Data Transformation

#### Do you think that your data needs to be transformed? If yes, which transformation have you used. Explain Why?

In [None]:
# Transform Your data

### 6. Data Scaling

In [None]:
# Scaling your data

##### Which method have you used to scale your data and why?
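
A minimal sketch of standard scaling with the `StandardScaler` imported earlier, assuming synthetic feature values in place of the real Open/High/Low columns. The point it illustrates is that the scaler is fitted on the training rows only and then reused on the test rows.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic features standing in for Open/High/Low (assumption)
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.uniform(10, 400, size=(50, 3)),
                      columns=["Open", "High", "Low"])
X_train_demo, X_test_demo = X_demo.iloc[:40], X_demo.iloc[40:]

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_demo)  # learn mean/std from train only
X_test_scaled = scaler.transform(X_test_demo)        # reuse the training statistics

print(X_train_scaled.mean(axis=0).round(6), X_train_scaled.std(axis=0).round(6))
```

Scaling matters for Ridge, whose penalty depends on coefficient size; ordinary least squares predictions and Random Forest splits do not change under it.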

### 7. Dimensionality Reduction

##### Do you think that dimensionality reduction is needed? Explain Why?

Answer Here.

In [None]:
# Dimensionality Reduction (If needed)

##### Which dimensionality reduction technique have you used and why? (If dimensionality reduction done on dataset.)

Answer Here.

### 8. Data Splitting

In [None]:
# Split your data to train and test. Choose Splitting ratio wisely.

##### What data splitting ratio have you used and why?

Answer Here.
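
The notebook uses a random 80-20 split via `train_test_split`. For time-ordered data, a chronological split is a common alternative because it keeps future months out of the training set. The sketch below shows that alternative on a synthetic frame; `chronological_split` is a hypothetical helper, not something the notebook defines.

```python
import numpy as np
import pandas as pd

def chronological_split(frame, test_fraction=0.2):
    """Return (train, test) where test holds the most recent rows."""
    frame = frame.sort_index()
    n_test = int(round(len(frame) * test_fraction))
    cut = len(frame) - n_test
    return frame.iloc[:cut], frame.iloc[cut:]

# Synthetic monthly series (assumption)
idx = pd.date_range("2005-07-01", periods=100, freq="MS")
demo = pd.DataFrame({"Close": np.arange(100.0)}, index=idx)

train, test = chronological_split(demo)
print(len(train), len(test), train.index.max(), test.index.min())
```

Scores from a chronological split are often lower than from a shuffled split, since the model can no longer interpolate between neighbouring months.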

### 9. Handling Imbalanced Dataset

##### Do you think the dataset is imbalanced? Explain Why.

Answer Here.

In [None]:
# Handling Imbalanced Dataset (If needed)

##### What technique did you use to handle the imbalance dataset and why? (If needed to be balanced)

Answer Here.

## ***7. ML Model Implementation***

### ML Model - 1

In [None]:
# Import Libraries
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

# Define Features and Target
X = df[['Open', 'High', 'Low']]   # Features
y = df['Close']                   # Target variable

# Split Dataset (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize & Fit the Algorithm
model_lr = LinearRegression()
model_lr.fit(X_train, y_train)

# Predict on the test set
y_pred = model_lr.predict(X_test)

# Evaluate on the held-out test set
print("📈 Linear Regression Model Performance:")
print(f"R² Score       : {r2_score(y_test, y_pred):.4f}")
print(f"MAE            : {mean_absolute_error(y_test, y_pred):.2f}")
print(f"RMSE           : {np.sqrt(mean_squared_error(y_test, y_pred)):.2f}")



#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

#### 2. Cross- Validation & Hyperparameter Tuning

In [None]:
# ML Model - 1 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with updated Evaluation metric Score Chart.

Answer Here.
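
Plain linear regression has no regularization strength to tune, so cross-validation here mainly checks how stable the score is across folds. The sketch below assumes synthetic Open/High/Low-style features and uses `TimeSeriesSplit`, a choice made for this example so that each fold validates on later rows than it trains on.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Synthetic features standing in for Open/High/Low (assumption)
rng = np.random.default_rng(1)
open_ = rng.uniform(20, 300, size=120)
high = open_ + rng.uniform(0, 20, size=120)
low = open_ - rng.uniform(0, 20, size=120)
X_demo = np.column_stack([open_, high, low])
y_demo = (high + low) / 2 + rng.normal(0, 2, size=120)  # synthetic "Close"

# Each fold trains on an initial block of rows and validates on the next block
cv = TimeSeriesSplit(n_splits=5)
cv_scores = cross_val_score(LinearRegression(), X_demo, y_demo, cv=cv, scoring="r2")

print("Fold R²:", cv_scores.round(4))
print("Mean R²:", cv_scores.mean().round(4))
```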

### ML Model - 2

#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Compute metrics (y_pred still holds the Linear Regression predictions from ML Model - 1 at this point)
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))

# Prepare data
metrics = ['R² Score', 'MAE', 'RMSE']
values = [r2, mae, rmse]

# Plot
plt.figure(figsize=(8, 5))
bars = plt.bar(metrics, values, color=['green', 'orange', 'red'])

# Annotate values on bars
for bar in bars:
    yval = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2, yval + 0.01, f'{yval:.2f}', ha='center', fontsize=12)

plt.title("📊 Evaluation Metrics for Linear Regression Model")
plt.ylabel("Metric Value")
plt.ylim(0, max(values)*1.2)
plt.grid(axis='y', linestyle='--', alpha=0.5)
plt.tight_layout()
plt.show()


#### 2. Cross- Validation & Hyperparameter Tuning

In [None]:
# Import necessary libraries
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import numpy as np

# Features and target
X = df[['Open', 'High', 'Low']]
y = df['Close']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Use Ridge Regression (adds L2 regularization) to demonstrate tuning
ridge = Ridge()

# Define parameter grid
param_grid = {
    'alpha': [0.01, 0.1, 1.0, 10.0, 100.0]
}

# GridSearchCV setup
grid_search = GridSearchCV(ridge, param_grid, cv=5, scoring='r2')
grid_search.fit(X_train, y_train)

# Best model
best_ridge = grid_search.best_estimator_

# Predict
y_pred = best_ridge.predict(X_test)

# Evaluation
print("📈 Ridge Regression (GridSearchCV) Performance:")
print(f"Best Alpha     : {grid_search.best_params_['alpha']}")
print(f"R² Score       : {r2_score(y_test, y_pred):.4f}")
print(f"MAE            : {mean_absolute_error(y_test, y_pred):.2f}")
print(f"RMSE           : {np.sqrt(mean_squared_error(y_test, y_pred)):.2f}")


##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with updated Evaluation metric Score Chart.

Answer Here.

#### 3. Explain each evaluation metric's indication towards business and the business impact of the ML model used.

Answer Here.

### ML Model - 3

In [None]:
# 📦 Import necessary libraries
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error
import numpy as np

# 🎯 Features and target
X = df[['Open', 'High', 'Low']]
y = df['Close']

# 🔀 Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 🌲 Initialize and fit Random Forest
rf_model = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)

# 🔍 Predict
y_pred = rf_model.predict(X_test)

# 📊 Evaluation
print("🌲 Random Forest Regressor Performance:")
print(f"R² Score       : {r2_score(y_test, y_pred):.4f}")
print(f"MAE            : {mean_absolute_error(y_test, y_pred):.2f}")
print(f"RMSE           : {np.sqrt(mean_squared_error(y_test, y_pred)):.2f}")


#### 1. Explain the ML Model used and its performance using Evaluation metric Score Chart.

In [None]:
# Visualizing evaluation Metric Score chart

#### 2. Cross- Validation & Hyperparameter Tuning

In [None]:
# ML Model - 3 Implementation with hyperparameter optimization techniques (i.e., GridSearch CV, RandomSearch CV, Bayesian Optimization etc.)

# Fit the Algorithm

# Predict on the model

##### Which hyperparameter optimization technique have you used and why?

Answer Here.

##### Have you seen any improvement? Note down the improvement with updated Evaluation metric Score Chart.

Answer Here.
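
Hyperparameter tuning for the Random Forest could follow the same `GridSearchCV` pattern used for Ridge. The sketch below is one possible setup on synthetic data; the grid values, the `TimeSeriesSplit` folds, and the MAE scoring are all assumptions rather than settings taken from the notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic features standing in for Open/High/Low (assumption)
rng = np.random.default_rng(7)
open_ = rng.uniform(20, 300, size=120)
high = open_ + rng.uniform(0, 20, size=120)
low = open_ - rng.uniform(0, 20, size=120)
X_demo = np.column_stack([open_, high, low])
y_demo = (high + low) / 2 + rng.normal(0, 2, size=120)  # synthetic "Close"

# Small illustrative grid; a real search would likely cover more values
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [3, None],
    "min_samples_leaf": [1, 3],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=TimeSeriesSplit(n_splits=3),
    scoring="neg_mean_absolute_error",  # sklearn maximises, hence the negated MAE
)
search.fit(X_demo, y_demo)

best_mae = -search.best_score_  # flip the sign back to a positive MAE
print("Best params:", search.best_params_)
print(f"Cross-validated MAE: {best_mae:.2f}")
```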

### 1. Which Evaluation metrics did you consider for a positive business impact and why?

Answer Here.

### 2. Which ML model did you choose from the above created models as your final prediction model and why?

Answer Here.
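
To support the choice of a final model, the three models can be scored on one shared test set and placed in a single table. The sketch below does this on synthetic data, so its ranking does not reflect the real results; the model settings mirror the defaults used in the cells above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic features standing in for Open/High/Low (assumption)
rng = np.random.default_rng(3)
open_ = rng.uniform(20, 300, size=150)
high = open_ + rng.uniform(0, 20, size=150)
low = open_ - rng.uniform(0, 20, size=150)
X_demo = np.column_stack([open_, high, low])
y_demo = (high + low) / 2 + rng.normal(0, 2, size=150)  # synthetic "Close"

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

rows = []
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    rows.append({
        "Model": name,
        "R2": r2_score(y_te, pred),
        "MAE": mean_absolute_error(y_te, pred),
        "RMSE": np.sqrt(mean_squared_error(y_te, pred)),
    })

# One row per model, best (lowest) RMSE first
comparison = pd.DataFrame(rows).set_index("Model").sort_values("RMSE")
print(comparison.round(4))
```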

### 3. Explain the model which you have used and the feature importance using any model explainability tool?

Answer Here.
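
Feature importance for a fitted Random Forest can be read from `feature_importances_` (impurity-based) or estimated with scikit-learn's `permutation_importance`. The sketch below shows both on synthetic data; the values it prints are not the notebook's results.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Synthetic features standing in for Open/High/Low (assumption)
rng = np.random.default_rng(5)
open_ = rng.uniform(20, 300, size=150)
X_demo = pd.DataFrame({
    "Open": open_,
    "High": open_ + rng.uniform(0, 20, size=150),
    "Low": open_ - rng.uniform(0, 20, size=150),
})
y_demo = (X_demo["High"] + X_demo["Low"]) / 2 + rng.normal(0, 2, size=150)

rf = RandomForestRegressor(n_estimators=100, random_state=42).fit(X_demo, y_demo)

# Impurity-based importances: normalised to sum to 1 across features
impurity = pd.Series(rf.feature_importances_, index=X_demo.columns)

# Permutation importances: drop in score when one column is shuffled
perm = permutation_importance(rf, X_demo, y_demo, n_repeats=5, random_state=42)
permutation = pd.Series(perm.importances_mean, index=X_demo.columns)

print(pd.DataFrame({"impurity": impurity, "permutation": permutation}).round(4))
```

Because Open, High, and Low are strongly correlated, the forest can split credit among them somewhat arbitrarily, so individual importance values are best interpreted with caution.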

# **Conclusion**

The dataset contained historical stock price data with features like Open, High, Low, and Close prices.

Exploratory Data Analysis (EDA) revealed strong correlations between features.

Missing values and outliers were handled effectively using forward fill and IQR-based capping methods.



### ***Hurrah! You have successfully completed your Machine Learning Capstone Project !!!***