**Introduction and Overview of the Stock Market Index Analysis Project**

This project analyzes a year's worth of financial data from a top data analytics website. It covers the performance of three key companies, their categorization as LargeCap or Non-LargeCap, and the metrics of nine important indexes. By examining ratings, prices, and fluctuation rates, and by visualizing each index and company in Python, we aim to extract insights that support strategic planning and informed decision-making in the financial sector.

Three major companies: DJ, NASDAQ, and SNP.

DJ and NASDAQ fall under Non-LargeCap, while SNP belongs to LargeCap.

A total of 9 indexes:

	-> DJ : 'D30', 'DSI', 'IA'
	-> SNP : '400', '500', '300'
	-> NASDAQ: 'SOX', 'NDX', 'NQGI'
    
Key Metrics

	-> Ratings: Measure of company performance.
	-> Prices: Financial value of each index.
	-> Fluctuation Rate: Indicates the volatility of each index.
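
The company, market-type, and index listing above can be captured as small lookup tables. The following is only a sketch: the names `COMPANY_INDEXES`, `COMPANY_TYPE`, `INDEX_TO_COMPANY`, and `describe_index` are hypothetical and are not used elsewhere in the notebook. The type strings follow the spelling used by the classes defined later in the notebook ("LargeCap" and "non-LargeCap").

```python
# Hypothetical lookup tables built from the listing above
COMPANY_INDEXES = {
    "DJ": ["D30", "DSI", "IA"],
    "SNP": ["400", "500", "300"],
    "NASDAQ": ["SOX", "NDX", "NQGI"],
}
COMPANY_TYPE = {"DJ": "non-LargeCap", "NASDAQ": "non-LargeCap", "SNP": "LargeCap"}

# Invert the first table: index name -> owning company
INDEX_TO_COMPANY = {
    index_name: company
    for company, names in COMPANY_INDEXES.items()
    for index_name in names
}

def describe_index(index_name):
    """Return (company, market type) for one of the nine index names."""
    company = INDEX_TO_COMPANY[index_name]
    return company, COMPANY_TYPE[company]

print(describe_index("NDX"))
```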

In [None]:
# Definition of the Index class representing a generic financial index
class Index:
    def __init__(self, date, index_name, company, type, rating, price, fluct_rate):
        self.date = date
        self.index_name = index_name
        self.company = company
        self.type = type
        self.rating = rating
        self.price = price
        self.fluct_rate = fluct_rate
        self.unique_id = id(self)

    def __str__(self):
        return f"{self.unique_id},{self.date},{self.index_name},{self.company},{self.type},{self.rating},{self.price},{self.fluct_rate}"

# Definition of LargeCapIndex as a subclass of Index, inheriting its attributes
class LargeCapIndex(Index):
    def __init__(self, date, index_name, company, rating, price, fluct_rate):
        super().__init__(date, index_name, company, "LargeCap", rating, price, fluct_rate)

# Definition of NonLargeCapIndex as a subclass of Index, inheriting its attributes
class NonLargeCapIndex(Index):
    def __init__(self, date, index_name, company, rating, price, fluct_rate):
        super().__init__(date, index_name, company, "non-LargeCap", rating, price, fluct_rate)

# Definition of SNP as a subclass of LargeCapIndex, inherited from both LargeCapIndex and Index
class SNP(LargeCapIndex):
    def __init__(self, date, index_name, rating, price, fluct_rate):
        super().__init__(date, index_name, "SNP", rating, price, fluct_rate)

# Definition of DJ as a subclass of NonLargeCapIndex, inherited from both NonLargeCapIndex and Index
class DJ(NonLargeCapIndex):
    def __init__(self, date, index_name, rating, price, fluct_rate):
        super().__init__(date, index_name, "DJ", rating, price, fluct_rate)

# Definition of NASDAQ as a subclass of NonLargeCapIndex, inherited from both NonLargeCapIndex and Index
class NASDAQ(NonLargeCapIndex):
    def __init__(self, date, index_name, rating, price, fluct_rate):
        super().__init__(date, index_name, "NASDAQ", rating, price, fluct_rate)

# Testing the DJ class
DJIndex = DJ("2022-11-15", "index_nameA", 2, 641, 86.06)
print(str(DJIndex))


In [None]:
# Importing required libraries and extracting the objects from the .dat binary file and loading into data.csv file

import pickle
import csv

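# The CSV file is opened once, outside the loop, so that a later pickled batch
# cannot overwrite the rows written for an earlier one.
# The Index classes defined above must be available for pickle.load to rebuild the objects.
with open('Indexpkl50209.dat', 'rb') as pickle_file, open('data.csv', 'w') as f:
    f.write("uniqueId,date,index_name,company,type,rating,price,fluct_rate\n")
    try:
        while True:
            # Each load returns one pickled collection of Index objects
            disp = pickle.load(pickle_file)
            for obj in disp:
                f.write(str(obj) + '\n')
    except EOFError:
        # Raised by pickle.load once the end of the file is reached
        pass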

In [None]:
# Importing required libraries and reading the csv file created into stock_data data set variable

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

stock_data = pd.read_csv("data.csv")
stock_data = stock_data[0:10000]
stock_data

In [None]:
# Overview of the Data set 

stock_data.info()

In [None]:
# Checking for any nulls

stock_data.isna().sum()
#stock_data.isnull().sum() -- same output

In [None]:
# Examining statistical measures 

print(stock_data['date'].mode()[0])
print('------------------------------------')
print(round(stock_data.groupby('index_name')['rating'].mean()))
print('------------------------------------')
print(round(stock_data['rating'].mean()))
print('------------------------------------')
print(round(stock_data.groupby('index_name')['price'].mean()))
print('------------------------------------')
print(round(stock_data['price'].mean()))
print('------------------------------------')
print(round(stock_data.groupby('index_name')['fluct_rate'].mean(),2))
print('------------------------------------')
print(round(stock_data['fluct_rate'].mean(),2))

In [None]:
# Cleaning columns with null values with the statistical measure values

# Fill missing dates with the most frequent date and missing ratings with the rounded overall mean
# (the result is assigned back instead of calling fillna(inplace=True) on a column selection)
stock_data['date'] = stock_data['date'].fillna(stock_data['date'].mode()[0])
stock_data['rating'] = stock_data['rating'].fillna(round(stock_data['rating'].mean()))

# Fill missing prices and fluctuation rates with the mean of the same index
stock_data['price'] = stock_data.groupby('index_name')['price'].transform(lambda x: x.fillna(int(round(x.mean()))))
stock_data['fluct_rate'] = stock_data.groupby('index_name')['fluct_rate'].transform(lambda x: x.fillna(round(x.mean(),2)))

In [None]:
# Checking for any nulls

stock_data.isna().sum()

In [None]:
# Cleaned Data set

stock_data

In [None]:
# Pie Chart 1

# Create a figure for the pie chart with a specified size
plt.figure(figsize=(6, 6))

# Generate a pie chart using the 'type' column counts from the 'stock_data' DataFrame
# The slices will be labeled with the unique values in the 'type' column
# The autopct parameter displays the percentage on each slice with one decimal place
# The startangle parameter sets the starting angle for the first slice to 90 degrees (vertical)
plt.pie(stock_data['type'].value_counts(), labels=stock_data['type'].value_counts().index, autopct='%1.1f%%', startangle=90)

# Set the title of the pie chart
plt.title('Pie Chart: Distribution of Market Type')

# Display the pie chart
plt.show()


In [None]:
# Pie Chart 2

# Filter the 'stock_data' DataFrame to include only rows where the 'type' column is 'LargeCap'
large_cap_data = stock_data[(stock_data['type'] == 'LargeCap')]

# Create a figure for the pie chart with a specified size
plt.figure(figsize=(4, 4))

# Generate a pie chart using the counts of 'index_name' values from the filtered 'large_cap_data' DataFrame
# The slices will be labeled with the unique values in the 'index_name' column
# The autopct parameter displays the percentage on each slice with one decimal place
# The startangle parameter sets the starting angle for the first slice to 90 degrees (vertical)
plt.pie(large_cap_data['index_name'].value_counts(), labels=large_cap_data['index_name'].value_counts().index, autopct='%1.1f%%', startangle=90, colors=plt.cm.Paired.colors)

# Set the title of the pie chart
plt.title('Pie Chart: Distribution of SNP Indexes')

# Display the pie chart
plt.show()


**Observations and Insights from the Pie Charts:**

Pie Chart 1: Illustrates the distribution of data across two distinct company/market types and the corresponding market share held by each. Notably, a limited number of companies operate with substantial capital, whereas a more significant presence is observed among mid-sized and smaller companies operating at moderate and lower capital levels.

Pie Chart 2: Depicts the distribution of data within the SNP large-cap company indexes (300, 400, 500). Impressively, the allocation across these indexes within the SNP large-cap company realm appears to be relatively equitable, suggesting a balanced distribution of market representation among the specified indexes.
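
The percentages printed on the slices are shares of rows in the data set. A minimal sketch of how the same numbers can be read off without a chart is shown below. The `toy` DataFrame is invented for illustration and only mimics the `type` and `index_name` columns of `stock_data`.

```python
import pandas as pd

# Toy stand-in for stock_data; the rows are invented for illustration only
toy = pd.DataFrame({
    "type": ["LargeCap", "non-LargeCap", "non-LargeCap",
             "LargeCap", "non-LargeCap", "non-LargeCap"],
    "index_name": ["300", "D30", "SOX", "500", "NDX", "IA"],
})

# Share of rows per market type, in percent (what autopct prints on each slice)
type_share = toy["type"].value_counts(normalize=True).mul(100).round(1)
print(type_share)

# Share of rows per index within the LargeCap subset
large_cap_share = (
    toy.loc[toy["type"] == "LargeCap", "index_name"]
       .value_counts(normalize=True)
       .mul(100)
       .round(1)
)
print(large_cap_share)
```

Running the same two expressions on `stock_data` would give the exact shares behind both pie charts.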

In [None]:
# Box Plot

# Create a figure for the box plot with a specified size
plt.figure(figsize=(10, 6))

# Generate a box plot using Seaborn, where 'company' is on the x-axis and 'rating' is on the y-axis
# The data is drawn from the 'stock_data' DataFrame
sns.boxplot(x='company', y='rating', data=stock_data, palette='Set2')

# Set the title of the box plot
plt.title('Box Plot of Rating by Company')

# Set labels for x and y axes
plt.xlabel('Company')
plt.ylabel('Rating')

# Display the box plot
plt.show()


**Observations and Insights from the Box Plot:**

- All companies share a median rating of 3.0, indicating similar central tendencies.

- Consistent and uniform data distribution within each company.

- Identical interquartile ranges and whiskers suggest equal data variability.

- Absence of outliers implies a lack of extreme values.

- Statistical analysis may not reveal significant variations in central tendency, spread, or distribution shape across the three companies based on rating alone.
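
The quantities a box plot draws (quartiles, interquartile range, and points beyond the whiskers) can also be computed directly, which makes the comparison between companies checkable. The sketch below uses an invented `toy` DataFrame in place of `stock_data` and assumes the usual 1.5 x IQR whisker rule.

```python
import pandas as pd

# Toy stand-in for stock_data; ratings are invented for illustration only
toy = pd.DataFrame({
    "company": ["DJ"] * 5 + ["SNP"] * 5 + ["NASDAQ"] * 5,
    "rating": [1, 2, 3, 4, 5] * 3,
})

# Quartiles per company, one row per company
summary = toy.groupby("company")["rating"].quantile([0.25, 0.5, 0.75]).unstack()
summary.columns = ["q1", "median", "q3"]
summary["iqr"] = summary["q3"] - summary["q1"]

# Whisker limits under the 1.5 x IQR rule, looked up for every row by its company
lower = toy["company"].map(summary["q1"] - 1.5 * summary["iqr"])
upper = toy["company"].map(summary["q3"] + 1.5 * summary["iqr"])

# Count the ratings that fall outside the whiskers for each company
is_outlier = (toy["rating"] < lower) | (toy["rating"] > upper)
summary["outliers"] = is_outlier.groupby(toy["company"]).sum()

print(summary)
```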

In [None]:
# Generic data frame preparation to be used further in the visualizations

# Segregating the data according to each company

#DJ
dj_data = stock_data[stock_data['company'] == 'DJ']

#NASDAQ
nasdaq_data = stock_data[stock_data['company'] == 'NASDAQ']

#SNP
snp_data = stock_data[stock_data['company'] == 'SNP']

# Break the date column into month, year, and quarter for use in later visualizations
# 'stock' is a second name for the same 'stock_data' DataFrame (not a copy), so new columns appear in both
stock = stock_data

# Converting 'date' to datetime
stock['date'] = pd.to_datetime(stock['date'])

# Extract month from the 'date' column
stock['month'] = stock['date'].dt.month

# Extract year from the 'date' column
stock['year'] = stock['date'].dt.year

# Extract quarter from the 'date' column
stock['quarter'] = stock['date'].dt.to_period("Q")


In [None]:
# Group by 'index_name', 'quarter', and 'year', then calculate the mean of 'fluct_rate'
result = stock.groupby(['index_name', 'quarter', 'year'])['fluct_rate'].mean().reset_index()

# Reshape into a quarter-by-index table of average fluctuation rates for the heatmap.
# The column labels are taken from the data itself, so labels and values cannot get out of step.

index_order = ['D30', 'DSI', 'IA', '400', '500', '300', 'SOX', 'NDX', 'NQGI']
quarters_2022 = pd.period_range('2022Q1', '2022Q4', freq='Q')

# One row per quarter and one column per index, restricted to the four quarters of 2022
quarter_index_fluctrate_df = (
    result.pivot(index='quarter', columns='index_name', values='fluct_rate')
          .reindex(index=quarters_2022, columns=index_order)
)
quarter_index_fluctrate_df.index = ['Q1', 'Q2', 'Q3', 'Q4']

# Plain 2-D array of the same values, used by the heatmap
quarter_index_fluctrate_data = quarter_index_fluctrate_df.to_numpy()

# Display the DataFrame
quarter_index_fluctrate_df.head()


In [None]:
# Display a heatmap using Matplotlib's imshow function with specified data and colormap
plt.imshow(quarter_index_fluctrate_data, cmap='YlOrBr')

# Add a color bar to the heatmap for reference
cbar = plt.colorbar()

# Set the title of the heatmap
plt.title("Heatmap of Average Fluctuation Rate for Indexes\nAcross Different Quarters")

# Set labels for x and y axes
plt.xlabel("Index")
plt.ylabel("Quarters")

# Set the tick positions and labels for x-axis (Indexes) with rotation
plt.xticks(range(len(quarter_index_fluctrate_df.columns)),
           quarter_index_fluctrate_df.columns, rotation=90)

# Set the tick positions and labels for y-axis (Quarters)
plt.yticks(range(len(quarter_index_fluctrate_df.index)),
           quarter_index_fluctrate_df.index)

# Display the heatmap
plt.show()


**Observations and Insights from the Heatmap:**

In examining all indexes, a consistent trend emerges with an average fluctuation rate hovering around 15-16. However, notable exceptions include NDX and SOX, where the average fluctuation rate in quarter 1 was notably high, indicating heightened volatility during the initial months of the year. Interestingly, this volatility subsided in the subsequent months. On the flip side, NQGI exhibited the lowest average fluctuation rate among all indexes in quarter 1, and it continued to stabilize around 15 as the year progressed. Similar trends of stabilization were observed in several other indexes, although they remained relatively volatile. 
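
Statements such as "highest in quarter 1" or "lowest in quarter 1" can be checked numerically on the quarter-by-index table rather than read off the colors. The sketch below is illustrative only: the `toy` rows are invented, and `heat` is a hypothetical name for a table built with `pivot_table`.

```python
import pandas as pd

# Toy stand-in for the cleaned data; values are invented for illustration only
toy = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-10", "2022-02-11", "2022-01-12",
                            "2022-05-03", "2022-05-04", "2022-04-20"]),
    "index_name": ["NDX", "NDX", "NQGI", "NDX", "NQGI", "NQGI"],
    "fluct_rate": [18.0, 20.0, 12.0, 15.0, 15.5, 14.5],
})
toy["quarter"] = "Q" + toy["date"].dt.quarter.astype(str)

# Rows are quarters, columns are indexes, cells are mean fluctuation rates
heat = toy.pivot_table(index="quarter", columns="index_name",
                       values="fluct_rate", aggfunc="mean")
print(heat)

# Index with the highest and lowest average fluctuation rate in quarter 1
most_volatile_q1 = heat.loc["Q1"].idxmax()
least_volatile_q1 = heat.loc["Q1"].idxmin()
print(most_volatile_q1, least_volatile_q1)
```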

In [None]:
# Histogram

# Fluctuation Rate Histogram
plt.hist(dj_data['fluct_rate'],color = 'cornflowerblue', histtype = 'bar')
# x-axis label
plt.xlabel('Fluctuation')
# y-axis (frequency) label
plt.ylabel('Frequency')
# plot title
plt.title('DJ Fluctuation Rate histogram')
# function to show the plot
plt.show()


In [None]:
# Fluctuation Rate Histogram
plt.hist(nasdaq_data['fluct_rate'],color = 'orange', histtype = 'bar')
# x-axis label
plt.xlabel('Fluctuation')
# y-axis (frequency) label
plt.ylabel('Frequency')
# plot title
plt.title('NASDAQ Fluctuation Rate histogram')
# function to show the plot
plt.show()


In [None]:
# Fluctuation Rate Histogram
plt.hist(snp_data['fluct_rate'],color = 'green', histtype = 'bar')
# x-axis label
plt.xlabel('Fluctuation')
# y-axis (frequency) label
plt.ylabel('Frequency')
# plot title
plt.title('SNP Fluctuation Rate histogram')
# function to show the plot
plt.show()


**Observations and Insights from the Histograms:**

- Normal Distribution:
  - All histograms exhibit a typical bell-shaped curve centered around a fluctuation rate of 15.

- Higher Frequency in NASDAQ:
  - Noticeable spike in frequency for fluctuation rates between 15 to 17.5.
  - Particularly evident in NASDAQ company indexes ('SOX,' 'NDX,' 'NQGI').

- Insight for Heatmap:
  - Supports the higher average fluctuation rate observed in the Heatmap.
  - Suggests that NASDAQ indexes experience relatively more significant fluctuations compared to others.
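
A bell shape can be checked with simple summary numbers in addition to the visual impression. The sketch below draws a synthetic sample (not project data) and computes skewness and excess kurtosis with pandas, both of which are close to 0 for normally distributed data. It also bins the sample with explicit bin edges; using the same edges for every company is one way to make bar heights comparable across histograms. The bin width of 2.5 is an assumption chosen to match the 15 to 17.5 range mentioned above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one company's fluct_rate column (invented, seeded for repeatability)
rng = np.random.default_rng(0)
sample = pd.Series(rng.normal(loc=15, scale=2, size=5000))

# Both values are near 0 for a normal distribution
skewness = sample.skew()
excess_kurtosis = sample.kurt()
print(round(skewness, 3), round(excess_kurtosis, 3))

# Explicit bin edges: 5, 7.5, ..., 25
bins = np.arange(5, 25.5, 2.5)
counts, edges = np.histogram(sample, bins=bins)

# The bin holding the most observations
top = counts.argmax()
modal_bin = (edges[top], edges[top + 1])
print(modal_bin)
```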

In [None]:
# Scatter Plot

# Group the 'stock' DataFrame by 'date' and calculate the mean of 'price' and 'fluct_rate' for each date
result_scatter = stock.groupby(['date']).agg({'price': 'mean', 'fluct_rate': 'mean'}).reset_index()

# Set the style of the Seaborn plots to white grid
sns.set(style="whitegrid")

# Create a scatter plot using Seaborn with specified parameters
plt.figure(figsize=(10, 6))
scatter = sns.scatterplot(x='price', y='fluct_rate', data=result_scatter, palette=None, marker="o", s=100) 
# Set labels for x and y axes
plt.xlabel('Avg Price')
plt.ylabel('Avg Fluctuation Rate')

# Set the title of the plot
plt.title('Scatter Plot of Avg Fluctuation Rate VS Avg Price for All Indexes in 2022')

# Display the plot
plt.show()

**Observations and Insights from the Scatter Plot:**

- No Clear Linear Relationship: There isn't a straightforward linear correlation between average price and average fluctuation rate across all indexes.

- Cluster Formation: Notably, a tight cluster emerges around a mean price of 250 and a mean fluctuation rate of 15. This close grouping suggests that both daily averages stay within a narrow range, which implies a level of stability.

- Trading Range Identification: Understanding this typical price movement within a certain range could be valuable for traders and investors in their decision-making processes.
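
The strength of a linear relationship and the tightness of a cluster are two separate properties, and both can be measured. The sketch below computes the Pearson correlation coefficient and the standard deviation of each column. The `daily` values are invented and only stand in for the per-date averages used in the scatter plot.

```python
import pandas as pd

# Invented per-date averages standing in for result_scatter
daily = pd.DataFrame({
    "price": [248.0, 251.0, 250.0, 252.0, 249.0, 250.0],
    "fluct_rate": [15.2, 14.9, 15.1, 15.0, 14.8, 15.3],
})

# Pearson correlation: near 0 means no linear relationship, near +/-1 a strong one
pearson_r = daily["price"].corr(daily["fluct_rate"])

# Standard deviation per column: small values mean the points sit in a tight cluster
spread = daily.std()

print(round(pearson_r, 3))
print(spread)
```

A small spread together with a correlation near 0 would describe a tight cluster with no linear relationship.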

In [None]:
# Bar Chart

DJIndexes = ['D30','DSI','IA']
Quarters = ['Q1', 'Q2','Q3', 'Q4']

for idx in DJIndexes:
    # Calculate average prices for each quarter
    average_price_Q1 = stock[(stock['month'].isin([1, 2, 3])) & (stock['year'] == 2022) & (stock['index_name'] == idx)]['price'].mean()
    average_price_Q2 = stock[(stock['month'].isin([4, 5, 6])) & (stock['year'] == 2022) & (stock['index_name'] == idx)]['price'].mean()
    average_price_Q3 = stock[(stock['month'].isin([7, 8, 9])) & (stock['year'] == 2022) & (stock['index_name'] == idx)]['price'].mean()
    average_price_Q4 = stock[(stock['month'].isin([10, 11, 12])) & (stock['year'] == 2022) & (stock['index_name'] == idx)]['price'].mean()
    
    # Collect the four quarterly averages in a list
    Average_Price = [average_price_Q1,average_price_Q2,average_price_Q3,average_price_Q4]

    # Plot the bar chart (one bar per quarter)
    plt.bar(Quarters, Average_Price , width =0.5, color = ['green', 'yellow','orange','red'])

    #Setting the limits for the bar chart
    start_value = 200
    end_value = 270
    step = 5
    plt.ylim(start_value, end_value)
    plt.yticks(range(start_value, end_value + 1, step))

    # Naming the x-axis
    plt.xlabel('Quarters')
    # Naming the y-axis
    plt.ylabel('Average Price')
    # plot title
    plt.title('Average Price for Quarters 1 to 4 in 2022 for DJ ' + idx + ' index')
    # Display the plot
    plt.show()

**Observations and Insights from the Bar Charts:**

Upon detailed analysis of individual indexes within the DJ company (D30, DSI, IA), a discernible trend emerges in the mean price over each quarter.

- D30 and IA Performance Trend: Both D30 and IA exhibit a decreasing trend in mean price over successive quarters. This suggests a strong performance at the beginning of the year followed by a decline in the subsequent quarters. 

- DSI Performance Consistency: In contrast, DSI demonstrates relatively consistent performance across quarters, except for a dip in mean price during quarter 2. 

- Factors Influencing Price Decline: Declines in mean price for underperforming indexes, especially in specific quarters, may result from internal (management changes, operational issues) and external (global events, exchange board changes) factors. Understanding these is crucial for assessing overall performance.
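
The quarterly trend described above can be expressed as a table of quarter-over-quarter changes. The following is a minimal sketch on invented prices: `quarterly`, `change`, and `always_falling` are hypothetical names, and the numbers are not taken from the project data.

```python
import pandas as pd

# Invented prices standing in for the cleaned data (one row per index and quarter)
toy = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-15", "2022-04-15",
                            "2022-07-15", "2022-10-15"] * 2),
    "index_name": ["D30"] * 4 + ["DSI"] * 4,
    "price": [260.0, 250.0, 245.0, 240.0, 250.0, 244.0, 250.0, 251.0],
})
toy["quarter"] = toy["date"].dt.quarter

# Mean price per index (rows) and quarter (columns)
quarterly = toy.groupby(["index_name", "quarter"])["price"].mean().unstack("quarter")

# Change from the previous quarter; the first quarter has no predecessor (NaN)
change = quarterly.diff(axis=1)

# True for an index whose mean price fell in every quarter after the first
always_falling = (change.iloc[:, 1:] < 0).all(axis=1)

print(change)
print(always_falling)
```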

In [None]:
# Line Plot

# Group data by company, month, and year, then calculate the mean price
result_line = stock.groupby(['company', 'month', 'year'])['price'].mean().reset_index()

# Filter data for the year 2022
result_line_2022 = result_line[result_line['year'] == 2022]

# Separate data for each company
result_line_2022_DJ = result_line_2022[result_line_2022['company'] == 'DJ']
result_line_2022_NASDAQ = result_line_2022[result_line_2022['company'] == 'NASDAQ']
result_line_2022_SNP = result_line_2022[result_line_2022['company'] == 'SNP']

# Plotting the data for each company
plt.plot(result_line_2022_SNP['month'], result_line_2022_SNP['price'], label='SNP')
plt.plot(result_line_2022_NASDAQ['month'], result_line_2022_NASDAQ['price'], label='NASDAQ')
plt.plot(result_line_2022_DJ['month'], result_line_2022_DJ['price'], label='DJ')

# Adding labels to the x and y axes
plt.xlabel('Month')  # x-axis label
plt.ylabel('Average Price')  # y-axis label

# Adding a legend to identify each line
plt.legend(loc='best')

# Display the plot
plt.show()


**Observations and Insights from the Line Plot:**

- NASDAQ Fluctuations:
  - Sudden decrease in average price in the second month.
  - Subsequent rapid increase, indicating high variability compared to SNP and DJ.

- General Trend for All Companies:
  - Overall decreasing trend in average price over the financial year.

- Stabilization Around Mean of 250:
  - Companies eventually settle, showing a stable pattern around an average of 250.
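
The claim that one company varies more than another from month to month can be quantified with the average absolute month-over-month change. The sketch below uses invented monthly averages, and the name `variability` is hypothetical.

```python
import pandas as pd

# Invented monthly average prices standing in for result_line_2022
monthly = pd.DataFrame({
    "month": [1, 2, 3, 4] * 2,
    "company": ["NASDAQ"] * 4 + ["DJ"] * 4,
    "price": [255.0, 240.0, 258.0, 250.0, 254.0, 252.0, 251.0, 250.0],
})

# One column per company, one row per month
wide = monthly.pivot(index="month", columns="company", values="price")

# Change from the previous month, then the average size of that change per company
mom_change = wide.diff()
variability = mom_change.abs().mean()

print(variability)
print(variability.idxmax())
```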

**Conclusion**

- Dominance of mid-sized and smaller companies.
- Limited companies operate with substantial capital.
- SNP large-cap indexes (300, 400, 500) show balanced distribution.
- Consistent fluctuation rate trend around 15-16 across all indexes.
- Exceptions in NDX and SOX with heightened volatility in Quarter 1.
- Overall, the indexes of the DJ company are the most stable of the three companies, suggesting that their prices are least likely to collapse or yield a significant.
- Tight cluster around mean price of 250 and mean fluctuation rate of 15 suggests strong correlation and stability at the end of financial year.
- D30 and IA indexes exhibit a declining mean price trend over the financial year. The price decline is probably influenced by internal (management changes, operational issues) and external (global events, exchange board changes) factors. This may help potential investors anticipate the market and make better investment decisions.
