# Analysis of Stocks - Daily Returns

##  Objective 

### The key statistical concepts that will be explored in this notebook are:

- Probability Distribution
- Normal Distribution
- Significance of mean and standard deviation
- Confidence Interval
- Outlier events

### Datasets 

We will use the following real-world data for this notebook.

- Daily stock prices of ICICI Bank and Yes Bank
- For the period of 01 October 2022 to 07 March 2023
- The daily stock price data can be downloaded from the BSE India site

https://www.bseindia.com/markets/equity/EQReports/StockPrcHistori.html?flag=0

#### We will explore some of the basic insights like:

- What is the value at risk (VaR)

Value at risk (VaR) is a measure of the risk of loss for investments. It estimates how much a set of investments might lose (with a given probability), given normal market conditions, in a set time period such as a day. (https://en.wikipedia.org/wiki/Value_at_risk)

- What is the probability of making a certain percentage of profit or loss if invested in a stock for a specified duration of time?

- How does basic statistical analysis help answer the above questions?

## Read the dataset

- Explore the pandas, Matplotlib, and seaborn documentation pages

    - [Pandas Home Page](https://pandas.pydata.org/)
    - [Matplotlib Home Page](https://matplotlib.org/)
    - [Seaborn Home Page](https://seaborn.pydata.org/)


In [None]:
import pandas as pd
import matplotlib as mplot
import matplotlib.pyplot as plt
import seaborn as sn

### Check the library versions

In [None]:
pd.__version__

In [None]:
mplot.__version__

In [None]:
sn.__version__

### Load ICICI Bank Data

- Read different kinds of data formats

https://pandas.pydata.org/docs/user_guide/io.html

In [None]:
icici_df = pd.read_csv( 'ICICI.csv', parse_dates=['Date'] )

In [None]:
type(icici_df)

A DataFrame is a data structure that organizes data into a 2-dimensional table of rows and columns, much like a spreadsheet. DataFrames are one of the most common data structures used in modern data analytics because they are a flexible and intuitive way of storing and working with data.

Source: [What is a DataFrame](https://www.databricks.com/glossary/what-are-dataframes#:~:text=What%20is%20a%20DataFrame%3F,storing%20and%20working%20with%20data.)

#### Show a few records

In [None]:
icici_df.head( 5 )

#### How many rows and columns?

In [None]:
icici_df.shape

In [None]:
icici_df.info()

### Time Series Data

This data is time series based. It makes sense to index the data based on timestamp.

In [None]:
icici_df = icici_df.set_index(['Date'], drop=True)

In [None]:
icici_df.head(5)

#### Sort the data based on ascending order of timestamp

In [None]:
icici_df.sort_index(ascending = True, inplace=True)

### Slicing and indexing

- How to slice, dice, and get subsets of pandas rows and columns.

In [None]:
icici_df[0:5]

In [None]:
icici_df[-5:]
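
Because the index holds timestamps, rows can also be selected by date label with `.loc`, in addition to the positional slices above. The following is a minimal sketch on a made-up frame; `demo_df`, its dates, and its prices are hypothetical stand-ins, not values from the BSE files.

```python
import pandas as pd

# Hypothetical stand-in for icici_df: 10 business days of made-up prices
dates = pd.date_range("2022-10-03", periods=10, freq="B")
demo_df = pd.DataFrame(
    {"Open Price": range(100, 110), "Close Price": range(101, 111)},
    index=dates,
)

# Positional slicing: the first three rows
first_three = demo_df.iloc[0:3]

# Label-based slicing on the date index (both end points are inclusive)
first_week = demo_df.loc["2022-10-03":"2022-10-07"]

# One column for a date range
closes = demo_df.loc["2022-10-10":"2022-10-12", "Close Price"]
print(closes)
```

Unlike positional slices, label-based slices with `.loc` include the end label.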

### Select columns

In [None]:
icici_df = icici_df[['Close Price', 'Open Price']]

In [None]:
icici_df[0:5]

### Load Yes Bank Data

In [None]:
# Read the csv file
yes_df = pd.read_csv( 'Yes.csv', parse_dates=['Date'] )

# Set the time index 
yes_df = yes_df.set_index(['Date'], drop=True)

# Sort the records based on time
yes_df.sort_index(ascending = True, inplace=True)

# Select only Close and Open Price columns for further analysis
yes_df = yes_df[['Close Price', 'Open Price']]

# Print Few Records
yes_df.head( 5 )

### Calculate daily gains


- Calculate daily gain or loss in terms of percentage 

$$ gain = {(Close Price - Open Price) * 100 \over Open Price} $$


In [None]:
icici_df["gain"] = ((icici_df['Close Price'] - icici_df['Open Price']) * 100 /
                    icici_df['Open Price'])

In [None]:
icici_df.head( 5 )

In [None]:
yes_df["gain"] = ((yes_df['Close Price'] - yes_df['Open Price']) * 100 / 
                  yes_df['Open Price'])

In [None]:
yes_df[0:5]

## Plotting Historical Price Trends 

To see the price trends, we will plot the close price against time.
The figure size can be set using `figsize`.

**figsize(width, height)**
    - Width, height in inches.

https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.figure.html

In [None]:
plt.figure(figsize = (12, 6))
plt.plot(icici_df['Close Price'], color = 'r');

In [None]:
plt.figure(figsize = (12, 6))
plt.plot(yes_df['Close Price'], color = 'r');

- From December 2022, the stock prices have fallen and reached the October 2022 levels.

- But our focus here will be on daily gains, i.e. how much is gained or lost by remaining invested for one day.

## Plotting Daily Gains

How much does the stock price change every day? Is there any pattern in the price changes?

In [None]:
plt.figure(figsize = (12, 4))
plt.plot( icici_df.gain, 'r' );

In [None]:
plt.figure(figsize = (12, 4))
plt.plot( yes_df.gain, 'b' );

### Calculating min and max gains

In [None]:
icici_df.gain.min(), icici_df.gain.max()

In [None]:
yes_df.gain.min(), yes_df.gain.max()

### What is stock volatility?
- Volatility is the rate at which the price of a stock increases or decreases over a particular period.

- Higher stock price volatility often means higher risk and helps an investor to estimate the fluctuations that may happen in the future.

Source: [Stock Volatility](https://www.fidelity.com.sg/beginners/your-guide-to-stock-investing/understanding-stock-market-volatility-and-how-it-could-help-you#:~:text=Volatility%20is%20the%20standard%20deviation,said%20to%20have%20high%20volatility.)

### Plotting distribution of daily returns 

- How often (in terms of frequency) do we observe different values of gain or loss? Do we observe very high gains or losses frequently, or are they rare events?

- A **histogram** shows the frequency of data items in successive numerical intervals of equal size, e.g.

    - Frequency of -2% to -1% loss 
    - Frequency of -1% to 0% loss 
    - Frequency of 0% to 1% gain 
    - Frequency of 1% to 2% gain
    - and so on

In [None]:
plt.hist(icici_df.gain);

- The bin intervals are calculated automatically
- We can create our own bins to make the histogram more readable.
- Create bins from -4.0 to +4.0 with a bin size of 1.0 and show the frequency for each bin

In [None]:
# range() excludes its stop value, so stop at 5 to get bin edges from -4 to +4
plt.hist(icici_df.gain, bins = range(-4, 5, 1));

In [None]:
plt.hist(yes_df.gain, bins = range(-9, 12, 1));
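
The counts behind a histogram can also be inspected as numbers with `numpy.histogram`, which takes the same kind of bin edges. This is a small sketch on made-up gains (the values in `demo_gains` are hypothetical, not from the stock data).

```python
import numpy as np

# Made-up daily gains in percent, for illustration only
demo_gains = np.array([-2.5, -0.4, 0.3, 0.8, 1.2, 1.7, -1.1, 0.1])

# Bin edges -3, -2, ..., 3 give six bins of width 1.0
edges = np.arange(-3, 4, 1)
counts, returned_edges = np.histogram(demo_gains, bins=edges)

# Print each interval with its frequency
for left, right, count in zip(returned_edges[:-1], returned_edges[1:], counts):
    print(f"{left:+d}% to {right:+d}%: {count}")
```

Each bin includes its left edge and excludes its right edge, except the last bin, which includes both.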

### Plotting Distribution Plots

In the probability distribution curve, the x-axis indicates the possible values and the y-axis indicates how likely each value is to occur.

Probability distribution describes all the possible values and likelihoods that a random variable can take within a given range. This range will be bounded between the minimum and maximum possible values.

In [None]:
sn.kdeplot(icici_df.gain);

In [None]:
plt.figure(figsize = (12, 5))
sn.kdeplot(icici_df.gain, label = 'ICICI Bank' );
sn.kdeplot(yes_df.gain, label = 'Yes Bank' );
plt.title("Volatility of Stocks")
plt.legend();

- Yes Bank has higher dispersion than ICICI Bank, which indicates that it is more volatile.
- But can we measure volatility?

## What is Normal Distribution

The normal distribution, also known as the Gaussian distribution, is the most important probability distribution in statistics for independent, random variables. Most people recognize its familiar bell-shaped curve in statistical reports.

- The normal distribution is a continuous probability distribution that is symmetrical around its mean, most of the observations cluster around the central peak, and the probabilities for values further away from the mean taper off equally in both directions. 
- Extreme values in both tails of the distribution are similarly unlikely. While the normal distribution is symmetrical, not all symmetrical distributions are normal.

Source: https://statisticsbyjim.com/basics/normal-distribution/

References:

https://en.wikipedia.org/wiki/Normal_distribution

https://courses.lumenlearning.com/math4libarts/chapter/understanding-normal-distribution/

<img src="normal.png" alt="Normal Distribution" width="500"/>

Source: https://en.wikipedia.org/wiki/Normal_distribution

### Calculate Mean, Standard Deviation of Daily Returns for ICICI Bank

The normal distribution is parameterized by two parameters: the mean of the distribution $\mu$ and the variance $\sigma^2$. 

The sample mean is given by, 

$\bar x = \frac{1}{n}\sum_{i=1}^{n}x_{i}$

Variance is given by, 

$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_{i}-\bar x)^2$. 

The standard deviation is the square root of the variance and is denoted by $\sigma$.

- In investing, standard deviation is used as an indicator of market volatility and thus of risk. The more unpredictable the price action and the wider the range, the greater the risk.
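
As a check on the formulas, the mean and variance can be computed by hand and compared with pandas. This is a minimal sketch on made-up gains (`demo_gains` is hypothetical). One point worth knowing: pandas' `std()` and `var()` divide by $n-1$ by default (`ddof=1`), whereas the formula above divides by $n$ (`ddof=0`).

```python
import numpy as np
import pandas as pd

# Made-up daily gains in percent, for illustration only
demo_gains = pd.Series([1.0, -2.0, 0.5, 3.0, -1.5])

n = len(demo_gains)
mean = demo_gains.sum() / n
squared_dev = (demo_gains - mean) ** 2

var_population = squared_dev.sum() / n        # divides by n, as in the formula
var_sample = squared_dev.sum() / (n - 1)      # divides by n - 1, pandas default

print(mean, var_population, var_sample)
print(np.isclose(var_population, demo_gains.var(ddof=0)))
print(np.isclose(np.sqrt(var_sample), demo_gains.std()))
```

For a few months of daily data the two versions differ only slightly, but the difference is visible on small samples like this one.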

In [None]:
icici_df.gain.mean()

In [None]:
icici_df.gain.std()

### Confidence Interval

- A confidence interval is a range of values, bounded above and below the mean value.

- Here it is used as the range within which a daily gain is expected to fall with a given probability, assuming the gains are normally distributed.

- The most commonly used confidence levels are 90%, 95%, and 99%.
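
Under the normal assumption, a central 90% interval is the mean plus or minus about 1.645 standard deviations, which is what `stats.norm.interval` returns. The sketch below shows the equivalence; the values of `mu` and `sigma` are made up for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical mean and standard deviation of daily gains (percent)
mu, sigma = 0.1, 1.2

# A central 90% interval leaves 5% in each tail, so use the 95th percentile
z = stats.norm.ppf(0.95)
manual_interval = (mu - z * sigma, mu + z * sigma)

scipy_interval = stats.norm.interval(0.90, loc=mu, scale=sigma)

print(z)
print(manual_interval)
print(scipy_interval)
```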

In [None]:
from scipy import stats

In [None]:
icici_ci_90 = stats.norm.interval(0.90,
                                  loc=icici_df.gain.mean(),
                                  scale=icici_df.gain.std())

In [None]:
icici_ci_90

In [None]:
icici_df[icici_df.gain < icici_ci_90[0]]

### VaR - Value At Risk

- Value at Risk (VaR) is a statistic used in risk management to estimate the largest loss that is likely, at a given confidence level, over a specific time frame.

<img src="var_investopedia.png" alt="Normal Distribution" width="500"/>

Source: https://en.wikipedia.org/wiki/Normal_distribution

- What is the value at risk if one lakh rupees is invested?

In [None]:
invest_amt = 100000

In [None]:
invest_amt

In [None]:
icici_ci_90[0]

In [None]:
icici_var = invest_amt * icici_ci_90[0] / 100

In [None]:
icici_var

#### Note: 

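- The one-day value at risk at 95% confidence is 1930.23 rupees, if one lakh rupees is invested for day trading in ICICI Bank. The lower bound of the two-sided 90% interval is the 5th percentile, so a larger loss is expected on only about 5% of days.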

https://www.moneycontrol.com/news/business/markets/december-21-share-market-live-updates-stock-market-today-december-latest-news-bse-nse-sensex-nifty-covid-coronavirus-9739291.html
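
The same calculation can be wrapped in a small helper so that it is easy to repeat for other stocks or confidence levels. This is a sketch under the normal assumption; `parametric_var` is a hypothetical helper name and the mean and standard deviation below are made up, not the ICICI values.

```python
from scipy import stats

def parametric_var(invest_amt, mean_pct, std_pct, confidence=0.95):
    """One-day VaR under a normal model; mean and std are daily gains in percent."""
    # Gain (in percent) that is undercut with probability 1 - confidence
    worst_gain_pct = stats.norm.ppf(1 - confidence, loc=mean_pct, scale=std_pct)
    # Negative result means a loss, matching the sign used in the cells above
    return invest_amt * worst_gain_pct / 100

# Hypothetical parameters, for illustration only
mean_pct, std_pct = 0.05, 1.2
var_95 = parametric_var(100000, mean_pct, std_pct)

# Same value via the lower bound of the two-sided 90% interval
lower_90 = stats.norm.interval(0.90, loc=mean_pct, scale=std_pct)[0]
var_from_interval = 100000 * lower_90 / 100

print(var_95, var_from_interval)
```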

### Outlier Events

- Values that are more than 3 standard deviations above or below the mean are considered outliers.

In [None]:
icici_ci_99_7 = stats.norm.interval(0.997,
                                    loc=icici_df.gain.mean(),
                                    scale=icici_df.gain.std())

In [None]:
icici_ci_99_7

In [None]:
icici_df[icici_df.gain > icici_ci_99_7[1]]
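
An equivalent way to flag outliers in both tails at once is to convert each gain to a z-score, i.e. its distance from the mean in units of standard deviation. This is a minimal sketch on an artificial series; `demo_gains` is made up so that exactly one value stands out.

```python
import pandas as pd

# Artificial gains: 50 ordinary days of +/-0.5% and one extreme day of +8%
demo_gains = pd.Series([0.5, -0.5] * 25 + [8.0])

# z-score: how many standard deviations each gain is from the mean
z_scores = (demo_gains - demo_gains.mean()) / demo_gains.std()

# Keep gains more than 3 standard deviations away on either side
outliers = demo_gains[z_scores.abs() > 3]
print(outliers)
```

Note that an extreme value also inflates the mean and standard deviation it is measured against, so on very short series even a large move may not cross the 3 standard deviation threshold.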

### Yes Bank 

In [None]:
yes_df.gain.mean()

In [None]:
yes_df.gain.std()

In [None]:
yes_ci_90 = stats.norm.interval(0.90,
                                loc=yes_df.gain.mean(),
                                scale=yes_df.gain.std())

In [None]:
yes_ci_90

- Yes Bank carries a higher risk of larger losses, but it also provides an opportunity for higher gains.

- Volatility is not always a bad thing, as it can sometimes provide entry points from which investors can take advantage.

### What is the probability that the stock will make a loss of 3% or more?

In [None]:
plt.figure(figsize = (12, 5))
sn.kdeplot(yes_df.gain, label = 'Yes Bank' );
plt.legend();

#### CDF - Cumulative Distribution Function

- The CDF gives the probability that a random variable X takes a value less than or equal to a given value x.

- For example: The probability that the stock will make a loss of 3% or more is the total probability of losses of 3%, 3.1%, 3.2%, and so on, i.e. the area under the distribution curve to the left of -3%.

$$\int_{-\infty}^{-3} p(x) \; dx  $$

<img src="cdf.png" alt="Cumulative Distribution Function" width="400"/>
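
The link between the integral and the CDF can be checked numerically: integrating the normal density from $-\infty$ to $-3$ should give the same number as `stats.norm.cdf(-3.0, ...)`. The sketch below uses made-up parameters (`mu` and `sigma` are hypothetical, not fitted to either stock).

```python
import numpy as np
from scipy import integrate, stats

# Hypothetical mean and standard deviation of daily gains (percent)
mu, sigma = 0.0, 2.0

# Area under the density curve to the left of -3%
area, abs_error = integrate.quad(
    lambda x: stats.norm.pdf(x, loc=mu, scale=sigma), -np.inf, -3.0
)

# The CDF gives the same probability directly
cdf_value = stats.norm.cdf(-3.0, loc=mu, scale=sigma)

print(area, cdf_value)
```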

#### ICICI Bank - probability of a loss of 3% or more

In [None]:
stats.norm.cdf( -3.0,
               loc=icici_df.gain.mean(),
               scale=icici_df.gain.std())

#### Yes Bank - probability of a loss of 3% or more

In [None]:
stats.norm.cdf( -3.0,
               loc=yes_df.gain.mean(),
               scale=yes_df.gain.std())

#### Note:

- Yes Bank has a higher probability of such a loss than ICICI Bank.

### What is the probability that the stock will make a gain of 3% or more?

#### ICICI Bank - probability of a gain of 3% or more

In [None]:
1 - stats.norm.cdf(3.0,
                   loc=icici_df.gain.mean(),
                   scale=icici_df.gain.std())               

#### Yes Bank - probability of a gain of 3% or more

In [None]:
1 - stats.norm.cdf(3.0,
                   loc=yes_df.gain.mean(),
                   scale=yes_df.gain.std())               

#### Note:
- Yes Bank has a 16% probability of making a gain of 3% or more.
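
The upper-tail probability `1 - cdf(x)` is also available directly as the survival function `stats.norm.sf`. This is a short sketch with made-up parameters (`mu` and `sigma` are hypothetical, not the fitted Yes Bank values).

```python
import numpy as np
from scipy import stats

# Hypothetical mean and standard deviation of daily gains (percent)
mu, sigma = 0.0, 3.0

# Probability of a gain of 3% or more, computed two ways
p_complement = 1 - stats.norm.cdf(3.0, loc=mu, scale=sigma)
p_survival = stats.norm.sf(3.0, loc=mu, scale=sigma)

print(p_complement, p_survival)
```

Both give the same answer here; `sf` is generally preferred far out in the tail, where subtracting a value close to 1 loses precision.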

### Explore a few other distributions

[statdist](https://statdist.com/)

## Ex1: Participant Exercise

1. Download the daily stock price of four or five randomly selected stocks. The stocks can belong to one sector.

2. Find out the value at risk (VaR) of each stock at 95% confidence.

3. Plot the daily gain or loss of all the stocks in one plot.

4. Find out the probability of making a 4% gain in each of the stocks.