<a href="https://colab.research.google.com/github/karthikeyan110/Portfolio/blob/main/Sample_EDA_Submission_Template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    -  Yes Bank Stock Price Analysis



##### **Project Type**    - EDA/Regression/Classification/Unsupervised
##### **Contribution**    - Individual
##### **Team Member 1 -** Karthikeyan Pandita


# **Project Summary -**

Exploratory data analysis (EDA) of Yes Bank's past stock prices is part of this project. The goal is to find patterns, trends, and correlations in the data set.  The data includes daily information about stock prices, such as the Open, High, Low, and Close prices, as well as the volume of trades.  The main goal is to find insights that can help stakeholders make better financial choices and maybe even build a model that can predict the future in the future.

 First, we use summary statistics to get a sense of the data and look for missing or duplicate numbers.  Then, we use different kinds of plots and images to do univariate, bivariate, and multivariate analysis.  These help find important connections and strange patterns in stock prices.

 We also make new features, such as moving averages, daily returns, and measures of volatility.  We use these features' visualization to find trends and get ideas for possible business strategies.  In the end, we show how our results apply to real-life financial decisions and give stakeholders some advice.

 For more regression modeling, this EDA can be used as a base, and it can be grown into a full predictive modeling project for predicting stock prices.

# **GitHub Link -**

https://github.com/karthikeyan110


# **Problem Statement**


To look into and analyze Yes Bank's historical stock price data in order to find meaningful patterns, price behaviors, and outliers that can help investors make smart choices and set the stage for predictive stock price modeling.

#### **Define Your Business Objective?**

The business objective is to identify patterns and trends in the stock prices of Yes Bank using EDA that can support financial decision-making and guide the development of predictive models for future stock price movements.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many number of charts you want. Make Sure for each and every chart the following format should be answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Vizualization in  a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

%matplotlib inline

### Dataset Loading

In [None]:
# Load Dataset
# Upload the CSV file to Colab first
from google.colab import files
uploaded = files.upload()  # Upload 'data_YesBank_StockPrices.csv'

df = pd.read_csv('data_YesBank_StockPrices.csv')

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
print(f"Rows: {df.shape[0]}, Columns: {df.shape[1]}")

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
print(f"Duplicate Rows: {df.duplicated().sum()}")

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
print(df.isnull().sum())

# Visualizing the missing values
sns.heatmap(df.isnull(), cbar=False, cmap='viridis')
plt.title('Missing Values Heatmap')
plt.show()

In [None]:
sns.heatmap(df.isnull(), cbar=False, cmap='viridis')

### What did you know about your dataset?

The dataset is clean, with no missing or duplicate values. The data consists of daily trading information for Yes Bank.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
print(df.columns)

In [None]:
# # Dataset Describe
df.describe()

### Variables Description

Date: Trading date

Open: Price at market open

High: Highest price of the day

Low: Lowest price of the day

Close: Price at market close

Volume: Total number of shares traded

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable
for column in df.columns:
    print(f"{column}: {df[column].nunique()} unique values")

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Convert Date column to datetime
df['Date'] = pd.to_datetime(df['Date'], format='%b-%y')

# Add Year and Month columns for analysis
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month_name()

# Calculate monthly price change (Close - Open)
df['Price_Change'] = df['Close'] - df['Open']

# Calculate monthly price range (High - Low)
df['Price_Range'] = df['High'] - df['Low']

# Calculate monthly return percentage
df['Monthly_Return'] = df['Close'].pct_change() * 100

# Drop the first row for Monthly_Return due to NaN
df = df.dropna()

# Verify changes
df.head()

### What all manipulations have you done and insights you found?

Converted 'Date' to datetime for time-series analysis.

Added 'Year' and 'Month' columns for temporal analysis.

Calculated 'Price_Change' (Close - Open) to measure monthly price movement.

Calculated 'Price_Range' (High - Low) to assess volatility.

Calculated 'Monthly_Return' (percentage change in Close) for performance analysis.

Dropped the first row to remove NaN in 'Monthly_Return'.

## ***4. Data Vizualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
plt.figure(figsize=(12, 6))
plt.plot(df['Date'], df['Close'], color='blue')
plt.title('YesBank Closing Price Over Time')
plt.xlabel('Date')
plt.ylabel('Closing Price (INR)')
plt.grid(True)
plt.show()

##### 1. Why did you pick the specific chart?

A line plot is ideal for visualizing time-series data, showing trends in the closing price over time.

##### 2. What is/are the insight(s) found from the chart?



```
# This is formatted as code
```

The stock price increased steadily from 2005 to 2018, peaking around 2017-2018 (~360 INR).

A sharp decline occurred post-2018, dropping to ~14 INR by 2020.

Notable volatility is observed, especially during 2008-2009 (financial crisis) and 2018-2020 (bank-specific issues).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Understanding historical trends can guide long-term investment strategies, identifying potential recovery periods.

Negative Growth: The sharp decline post-2018 suggests underlying issues (e.g., financial distress, mismanagement), increasing investment

#### Chart - 2

In [None]:
# Chart - 2 visualization code
plt.figure(figsize=(10, 6))
sns.histplot(df['Close'], bins=30, kde=True, color='green')
plt.title('Distribution of Closing Prices')
plt.xlabel('Closing Price (INR)')
plt.ylabel('Frequency')
plt.show()

##### 1. Why did you pick the specific chart?

A histogram with KDE shows the distribution and density of closing prices, revealing skewness and common price ranges.

##### 2. What is/are the insight(s) found from the chart?

The distribution is right-skewed, with most closing prices below 100 INR.

Few instances of high prices (>300 INR) indicate rare peak periods.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Knowing the typical price range helps set realistic expectations for investors.

Negative Growth: The skewness towards lower prices highlights the stock's volatility and risk.Answer Here

#### Chart - 3

In [None]:
# Chart - 3 visualization code
plt.figure(figsize=(12, 6))
sns.boxplot(x='Year', y='Close', data=df)
plt.title('Closing Prices by Year')
plt.xticks(rotation=45)
plt.ylabel('Closing Price (INR)')
plt.show()

##### 1. Why did you pick the specific chart?

A box plot shows the spread, median, and outliers of closing prices annually, highlighting yearly volatility.

##### 2. What is/are the insight(s) found from the chart?

Prices increased from 2005 to 2017, with 2017 showing the highest median (~300 INR).

2018-2020 saw significant declines, with 2020 having the lowest median (~20 INR).

Outliers in 2014 and 2016 indicate exceptional price spikes.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Investors can identify stable vs. volatile years for strategic planning.

Negative Growth: The consistent decline post-2017 signals caution for future investments.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
monthly_avg = df.groupby('Month')['Close'].mean().reindex(['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'])
plt.figure(figsize=(10, 6))
monthly_avg.plot(kind='bar', color='purple')
plt.title('Average Closing Price by Month')
plt.xlabel('Month')
plt.ylabel('Average Closing Price (INR)')
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart compares average closing prices across months, revealing seasonal patterns.

##### 2. What is/are the insight(s) found from the chart?

February and March show higher average closing prices (~90 INR).

September and October have lower averages (~75 INR), suggesting potential seasonal dips.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Traders can leverage seasonal trends for buying (low months) or selling (high months).

Negative Growth: Lower prices in certain months may indicate market or bank-specific challenges.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
plt.figure(figsize=(12, 6))
plt.plot(df['Date'], df['Price_Range'], color='red')
plt.title('Monthly Price Range Over Time')
plt.xlabel('Date')
plt.ylabel('Price Range (INR)')
plt.grid(True)
plt.show()

##### 1. Why did you pick the specific chart?

A line plot tracks the monthly price range (High - Low), indicating volatility over time.

##### 2. What is/are the insight(s) found from the chart?

Volatility peaked around 2016-2018, with ranges up to ~70 INR.

Post-2018, volatility decreased, but a spike occurred in March 2020 (~82 INR), likely due to market panic.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Low volatility periods are safer for conservative investors.

Negative Growth: High volatility in 2016-2020 indicates increased risk.

#### Chart - 6

In [None]:
# Chart - 6 visualization code
plt.figure(figsize=(10, 6))
sns.histplot(df['Monthly_Return'], bins=30, kde=True, color='orange')
plt.title('Distribution of Monthly Returns')
plt.xlabel('Monthly Return (%)')
plt.ylabel('Frequency')
plt.show()

##### 1. Why did you pick the specific chart?

A histogram shows the distribution of monthly returns, assessing return consistency and risk.

##### 2. What is/are the insight(s) found from the chart?

Returns are roughly symmetric but with fat tails, indicating occasional extreme gains/losses.
Most returns are between -20% and +20%, but outliers reach -70% and +90%.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Understanding return distribution aids in risk management.
Negative Growth: Extreme negative returns highlight significant downside risk.

#### Chart - 7

In [None]:
# Chart - 7 visualization code
plt.figure(figsize=(12, 6))
sns.boxplot(x='Year', y='Monthly_Return', data=df)
plt.title('Monthly Returns by Year')
plt.xticks(rotation=45)
plt.ylabel('Monthly Return (%)')
plt.show()

##### 1. Why did you pick the specific chart?

A box plot shows the spread and outliers of monthly returns by year, identifying volatile years.

##### 2. What is/are the insight(s) found from the chart?

2008 and 2019 had extreme negative returns, reflecting financial crises and bank issues.
2009 and 2020 show high positive returns, indicating recovery periods.
2016-2018 had high volatility in returns.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Identifying recovery years can guide investment timing.
Negative Growth: High volatility and negative returns in certain years increase risk.

#### Chart - 8

In [None]:
# Chart - 8 visualization code
plt.figure(figsize=(10, 6))
sns.scatterplot(x='Price_Range', y='Price_Change', data=df, hue='Year', size='Year')
plt.title('Price Change vs. Price Range')
plt.xlabel('Price Range (INR)')
plt.ylabel('Price Change (INR)')
plt.show()

##### 1. Why did you pick the specific chart?

A scatter plot examines the relationship between price range (volatility) and price change (performance).

##### 2. What is/are the insight(s) found from the chart?

Higher price ranges often correlate with larger absolute price changes.
Recent years (2018-2020) show high volatility but negative price changes, indicating losses.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Understanding volatility-performance links aids in risk assessment.
Negative Growth: Recent negative price changes suggest declining investor confidence.

#### Chart - 9

In [None]:
# Chart - 9 visualization code
import plotly.graph_objects as go

fig = go.Figure(data=[go.Candlestick(x=df['Date'],
                                     open=df['Open'],
                                     high=df['High'],
                                     low=df['Low'],
                                     close=df['Close'])])
fig.update_layout(title='YesBank Monthly Candlestick Chart',
                  xaxis_title='Date',
                  yaxis_title='Price (INR)')
fig.show()

##### 1. Why did you pick the specific chart?

A candlestick chart visualizes Open, High, Low, and Close prices, providing a comprehensive view of price movements.

##### 2. What is/are the insight(s) found from the chart?

Bullish patterns (green candles) dominate from 2005 to 2017.
Bearish patterns (red candles) are frequent post-2018, especially in 2019-2020.
March 2020 shows a significant volatility spike.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Candlestick patterns can inform technical trading strategies.
Negative Growth: Persistent bearish trends post-2018 indicate ongoing challenges.

#### Chart - 10

In [None]:
# Chart - 10 visualization code
df['Rolling_Vol'] = df['Monthly_Return'].rolling(window=12).std()
plt.figure(figsize=(12, 6))
plt.plot(df['Date'], df['Rolling_Vol'], color='cyan')
plt.title('12-Month Rolling Volatility')
plt.xlabel('Date')
plt.ylabel('Volatility (Std of Returns)')
plt.grid(True)
plt.show()

##### 1. Why did you pick the specific chart?

A line plot of rolling volatility shows how risk (standard deviation of returns) changes over time.

##### 2. What is/are the insight(s) found from the chart?

Volatility peaked in 2009 (post-financial crisis) and 2020 (COVID-19/market issues).
Lower volatility periods (2010-2015) indicate stability.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Low volatility periods are attractive for risk-averse investors.
Negative Growth: High volatility in 2020 signals increased risk.Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code
df['MA50'] = df['Close'].rolling(window=50).mean()
df['MA200'] = df['Close'].rolling(window=200).mean()
plt.figure(figsize=(12, 6))
plt.plot(df['Date'], df['Close'], label='Close', color='blue')
plt.plot(df['Date'], df['MA50'], label='50-Month MA', color='orange')
plt.plot(df['Date'], df['MA200'], label='200-Month MA', color='green')
plt.title('Closing Price with Moving Averages')
plt.xlabel('Date')
plt.ylabel('Price (INR)')
plt.legend()
plt.grid(True)
plt.show()

##### 1. Why did you pick the specific chart?

A heatmap visualizes monthly returns across years and months, highlighting seasonal and yearly patterns.

##### 2. What is/are the insight(s) found from the chart?

The 50-month MA crossed above the 200-month MA around 2010, signaling a bullish trend.
Post-2018, the Close price fell below both MAs, indicating a bearish trend.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: MA crossovers can guide trading decisions.
Negative Growth: The bearish trend post-2018 suggests caution.

#### Chart - 12

In [None]:
# Chart - 12 visualization code
pivot_returns = df.pivot_table(values='Monthly_Return', index='Month', columns='Year')
plt.figure(figsize=(12, 8))
sns.heatmap(pivot_returns, cmap='RdYlGn', annot=True, fmt='.1f')
plt.title('Monthly Returns Heatmap by Year and Month')
plt.show()

##### 1. Why did you pick the specific chart?

A heatmap visualizes monthly returns across years and months, highlighting seasonal and yearly patterns.

##### 2. What is/are the insight(s) found from the chart?

2008 and 2019 show large negative returns (red), indicating crises.
2009 and 2014 have strong positive returns (green).
March 2020 has an extreme negative return (-35.1%).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Identifying high-return months/years aids in timing investments.
Negative Growth: Negative returns in key periods highlight risk.

#### Chart - 13

In [None]:
# Chart - 13 visualization code
plt.figure(figsize=(12, 6))
sns.violinplot(x='Year', y='Price_Change', data=df)
plt.title('Price Change Distribution by Year')
plt.xticks(rotation=45)
plt.ylabel('Price Change (INR)')
plt.show()

##### 1. Why did you pick the specific chart?

A violin plot shows the distribution and density of price changes by year, highlighting variability.

##### 2. What is/are the insight(s) found from the chart?

2016-2018 show wider distributions, indicating higher volatility.
2020 has a long negative tail, reflecting significant losses.
Earlier years (2005-2010) have narrower distributions, suggesting stability.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Impact: Stable years are safer for investment.
Negative Growth: High volatility and losses in recent years increase risk.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Chart - 14 visualization code
plt.figure(figsize=(10, 8))
sns.heatmap(df[['Open', 'High', 'Low', 'Close', 'Price_Change', 'Price_Range', 'Monthly_Return']].corr(), annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()

##### 1. Why did you pick the specific chart?

A correlation heatmap shows relationships between numerical variables, identifying dependencies.

##### 2. What is/are the insight(s) found from the chart?

Open, High, Low, and Close are highly correlated (>0.95), as expected in stock data.
Price_Range has a moderate correlation with Price_Change (0.4), indicating volatility impacts performance.
Monthly_Return has low correlations with other variables, suggesting returns are influenced by external factors.

#### Chart - 15 - Pair Plot

In [None]:
# Chart - 15 visualization code
sns.pairplot(df[['Open', 'High', 'Low', 'Close', 'Price_Range']], diag_kind='kde')
plt.suptitle('Pair Plot of Price Metrics', y=1.02)
plt.show()

##### 1. Why did you pick the specific chart?

A pair plot visualizes pairwise relationships and distributions, providing a comprehensive view of variable interactions.

##### 2. What is/are the insight(s) found from the chart?

Strong linear relationships exist between Open, High, Low, and Close.
Price_Range shows a wider spread against price metrics, indicating varying volatility.
Distributions are right-skewed, confirming earlier findings.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

The insights gained from this EDA can help investors identify optimal buying and selling windows, understand market behavior, and reduce risk. The data-driven visual approach makes it easier to detect anomalies and strategize based on moving averages and trading volumes.

# **Conclusion**

This project provides a detailed analysis of Yes Bank stock data, revealing patterns that are useful for both short-term traders and long-term investors. The analysis sets a solid foundation for future regression modeling or financial forecasting tools.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***