A Data-Driven Market Intelligence System for Stock Risk and Behavior Analysis at the Nairobi Securities Exchange

Problem Statement

Although equity turnover at the Nairobi Securities Exchange (NSE) surged 18% to Ksh 56 billion in the first half of 2025 ([NSE Half-Year Results, 2025](https://www.nse.co.ke/wp-content/uploads/NSE-Plc-Unaudited-Group-results-for-the-6-months-ended-30-June-2025.pdf)), retail participation remains hampered by a critical information gap. Research from the Institute of Certified Investment and Financial Analysts (ICIFA) reports that 77% of Kenyan retail investors rely on "personal research" and social intuition because they lack accessible analytical tools ([ICIFA Annual Report, 2024](https://icifa.co.ke/static/resources/others/annual-report-2024465e3dbed42d.pdf)).

While the market has added over Ksh 1 trillion in capitalization since 2023, most investors exhibit "herding behavior": they follow the crowd rather than basing decisions on technical data ([USIU-Africa Research, 2025](https://erepo.usiu.ac.ke/xmlui/bitstream/handle/11732/8460/MASILA%20BRIAN%20SALU%20MBA%202024.pdf?sequence=1&isAllowed=y)). This project aims to bridge this gap by converting raw daily prices into behavioral risk clusters, helping investors move from intuition to evidence-based decision-making.

Objectives

**Main Objective**
- To develop a data-driven stock market intelligence system for the Nairobi Securities Exchange that analyzes historical stock price behavior and sector characteristics to support informed and risk-aware investment decisions.

**Specific Objectives**
1) Feature Engineering: To derive financial metrics including Rolling Volatility, Daily Returns, and Maximum Drawdowns to quantify stock behavior.

2) Behavioral Segmentation: To apply Unsupervised Machine Learning (K-Means/DBSCAN) to group stocks into risk-based clusters (e.g., Stable, High-Volatility, or Speculative).

3) Sector Risk Analysis: To identify systemic risks and stability patterns across different market sectors.

4) Interactive Deployment: To present these insights through a Streamlit Dashboard that allows users to select stocks, view their risk profile, and compare them against their sectors.
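As a rough illustration of Objective 1, the three metrics can be computed from one stock's daily closing prices. This is a minimal sketch, not the project's final feature pipeline: the helper name `stock_features`, the 20-day window and the 252-trading-day annualization factor are assumptions chosen for illustration, and the input is assumed to be a date-indexed Series of numeric `Day Price` values for a single `Code`. Drawdown is measured as the fall from the running peak, `P_t / max(P_1..P_t) - 1`, and the maximum drawdown is its most negative value.

```python
import numpy as np
import pandas as pd


def stock_features(prices: pd.Series, window: int = 20) -> dict:
    """Summarize one stock's daily closing prices into risk features.

    Hypothetical helper: assumes `prices` is a date-indexed Series of
    numeric closing prices for a single stock.
    """
    prices = prices.sort_index().astype(float)

    # Daily simple return: r_t = P_t / P_{t-1} - 1 (the first day has no return)
    returns = (prices / prices.shift(1) - 1).dropna()

    # Rolling volatility: std of returns over a trailing window,
    # annualized with an assumed 252 trading days per year
    rolling_vol = returns.rolling(window).std() * np.sqrt(252)

    # Drawdown: fall from the running peak; the maximum drawdown is the deepest fall
    drawdown = prices / prices.cummax() - 1
    max_drawdown = drawdown.min()

    return {
        "mean_daily_return": float(returns.mean()),
        "latest_rolling_vol": float(rolling_vol.iloc[-1]),
        "max_drawdown": float(max_drawdown),
    }


# Example on a short, made-up price path (not NSE data)
example = pd.Series([100, 110, 99, 120],
                    index=pd.date_range("2024-01-01", periods=4))
print(stock_features(example, window=2))
```

Computing this dictionary for every `Code` and stacking the results into a DataFrame would give one possible feature matrix for the clustering step in Objective 2.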

In [1]:
import pandas as pd
import numpy as np

In [2]:
from IPython.display import display

# Load one CSV per year of NSE daily trading data
df_2021 = pd.read_csv("../Data/NSE_data_all_stocks_2021_upto_31dec2021.csv")
df_2022 = pd.read_csv("../Data/NSE_data_all_stocks_2022.csv")
df_2023 = pd.read_csv("../Data/NSE_data_all_stocks_2023.csv")
df_2024 = pd.read_csv("../Data/NSE_data_all_stocks_2024.csv")

display(df_2021.head())
print(" " * 2)
display(df_2022.head())
print(" " * 2)
display(df_2023.head())
print(" " * 2)
display(df_2024.head())

,DATE,CODE,NAME,12m Low,12m High,Day Low,Day High,Day Price,Previous,Change,Change%,Volume,Adjust
0,04-Jan-21,EGAD,Eaagads Ltd,8.2,14,12.5,12.5,12.5,12.5,-,-,3200,-
1,04-Jan-21,KUKZ,Kakuzi Plc,300.0,397,365.0,365.0,365.0,365.0,-,-,-,-
2,04-Jan-21,KAPC,Kapchorua Tea Kenya Plc,59.0,90,78.0,78.0,78.0,78.0,-,-,-,-
3,04-Jan-21,LIMT,Limuru Tea Plc,360.0,475,360.0,360.0,360.0,360.0,-,-,100,-
4,04-Jan-21,SASN,Sasini Plc,14.8,20,19.5,19.5,19.5,19.5,-,-,-,-


  


,Date,Code,Name,12m Low,12m High,Day Low,Day High,Day Price,Previous,Change,Change%,Volume,Adjusted Price
0,3-Jan-22,EGAD,Eaagads Ltd,10.0,15.0,13.5,13.8,13.5,13.5,-,-,4000,-
1,3-Jan-22,KUKZ,Kakuzi Plc,355.0,427.0,385.0,385.0,385.0,385.0,-,-,-,-
2,3-Jan-22,KAPC,Kapchorua Tea Kenya Plc,80.0,101.0,99.5,99.5,99.5,95.5,4,4.19%,100,-
3,3-Jan-22,LIMT,Limuru Tea Plc,260.0,360.0,320.0,320.0,320.0,320.0,-,-,-,-
4,3-Jan-22,SASN,Sasini Plc,16.75,22.6,18.7,18.7,18.7,18.7,-,-,-,-


  


,Date,Code,Name,12m Low,12m High,Day Low,Day High,Day Price,Previous,Change,Change%,Volume,Adjusted Price
0,3-Jan-23,EGAD,Eaagads Ltd,10.35,14.5,10.5,10.5,10.5,10.5,-,-,1900.00,-
1,3-Jan-23,KUKZ,Kakuzi Plc,342.0,440.0,385.0,385.0,385.0,385.0,-,-,-,-
2,3-Jan-23,KAPC,Kapchorua Tea Kenya Plc,207.0,280.0,115.75,115.75,115.75,113.25,2.5,2.21%,100,-
3,3-Jan-23,LIMT,Limuru Tea Plc,365.0,380.0,420.0,420.0,420.0,420.0,-,-,-,-
4,3-Jan-23,SASN,Sasini Plc,15.1,22.0,22.0,22.5,22.45,22.45,-,-,6900.00,-


  


,Date,Code,Name,12m Low,12m High,Day Low,Day High,Day Price,Previous,Change,Change%,Volume,Adjusted Price
0,2-Jan-24,EGAD,Eaagads Ltd,10.35,14.5,12.8,12.8,12.8,13.95,-1.15,-8.24%,100,-
1,2-Jan-24,KUKZ,Kakuzi Plc,342.0,440.0,385.0,385.0,385.0,385.0,-,-,-,-
2,2-Jan-24,KAPC,Kapchorua Tea Kenya Plc,207.0,280.0,215.0,215.0,215.0,215.0,-,-,-,-
3,2-Jan-24,LIMT,Limuru Tea Plc,365.0,380.0,380.0,380.0,380.0,380.0,-,-,-,-
4,2-Jan-24,SASN,Sasini Plc,15.1,22.0,20.0,20.0,20.0,20.0,-,-,3300.00,-


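Mapping 2021 column names to match 2022-2024: the 2021 file uses DATE, CODE, NAME and Adjust, while the later files use Date, Code, Name and Adjusted Price.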

In [3]:
# Renaming 2021 columns to match 2022-2024
df_2021 = df_2021.rename(columns={
    'DATE': 'Date',
    'CODE': 'Code',
    'NAME': 'Name',
    'Adjust': 'Adjusted Price'
})

df_2021.head()

,Date,Code,Name,12m Low,12m High,Day Low,Day High,Day Price,Previous,Change,Change%,Volume,Adjusted Price
0,04-Jan-21,EGAD,Eaagads Ltd,8.2,14,12.5,12.5,12.5,12.5,-,-,3200,-
1,04-Jan-21,KUKZ,Kakuzi Plc,300.0,397,365.0,365.0,365.0,365.0,-,-,-,-
2,04-Jan-21,KAPC,Kapchorua Tea Kenya Plc,59.0,90,78.0,78.0,78.0,78.0,-,-,-,-
3,04-Jan-21,LIMT,Limuru Tea Plc,360.0,475,360.0,360.0,360.0,360.0,-,-,100,-
4,04-Jan-21,SASN,Sasini Plc,14.8,20,19.5,19.5,19.5,19.5,-,-,-,-
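With all four files sharing one schema, a natural next step is to stack them into a single table. The sketch below is one way to do it, assuming the frames are passed as a dict keyed by year (`combine_years` is a hypothetical helper, not part of the original notebook). It first checks that every frame has the same column list, then parses `Date` with the format `%d-%b-%y`, which fits the values shown in the previews (`04-Jan-21`, `3-Jan-22`, `2-Jan-24`). Any row written in a different date format would raise an error and need separate handling.

```python
import pandas as pd


def combine_years(frames: dict) -> pd.DataFrame:
    """Stack yearly NSE frames into one long table sorted by stock and date.

    Hypothetical helper: assumes every frame already uses the 2022-2024
    column names (i.e. after renaming the 2021 columns).
    """
    frames_list = list(frames.values())
    reference_cols = list(frames_list[0].columns)
    for year, frame in frames.items():
        # Fail loudly if a year still has differently named columns
        if list(frame.columns) != reference_cols:
            raise ValueError(f"{year}: columns {list(frame.columns)} != {reference_cols}")

    combined = pd.concat(frames_list, ignore_index=True)
    # Dates look like "04-Jan-21" or "3-Jan-22": day, abbreviated month, 2-digit year
    combined["Date"] = pd.to_datetime(combined["Date"], format="%d-%b-%y")
    return combined.sort_values(["Code", "Date"]).reset_index(drop=True)


# Hypothetical usage in this notebook:
# nse = combine_years({2021: df_2021, 2022: df_2022, 2023: df_2023, 2024: df_2024})
```

Sorting by `Code` and then `Date` keeps each stock's history contiguous and in time order, which is what per-stock rolling calculations need.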


In [4]:
df_2021.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17746 entries, 0 to 17745
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Date            17746 non-null  object
 1   Code            17746 non-null  object
 2   Name            17746 non-null  object
 3   12m Low         17746 non-null  object
 4   12m High        17746 non-null  object
 5   Day Low         17746 non-null  object
 6   Day High        17746 non-null  object
 7   Day Price       17746 non-null  object
 8   Previous        17746 non-null  object
 9   Change          17746 non-null  object
 10  Change%         17746 non-null  object
 11  Volume          17746 non-null  object
 12  Adjusted Price  17746 non-null  object
dtypes: object(13)
memory usage: 1.8+ MB
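`info()` shows that every column, including prices and volumes, is stored as `object` (text). The previews above suggest why: `-` appears where there was no value, and `Change%` carries a `%` sign. A minimal sketch of one way to convert these columns to floats follows. The column list is taken from the frame above; treating `-` as missing (for example, no trade or no price change that day) is an assumption about what the placeholder means, and stripping commas is a defensive step in case any values use thousands separators.

```python
import pandas as pd

NUMERIC_COLS = ["12m Low", "12m High", "Day Low", "Day High", "Day Price",
                "Previous", "Change", "Change%", "Volume", "Adjusted Price"]


def to_numeric_columns(df: pd.DataFrame, cols=NUMERIC_COLS) -> pd.DataFrame:
    """Return a copy of df with the given text columns converted to floats.

    Hypothetical helper; "-" and any other non-numeric token become NaN.
    """
    out = df.copy()
    for col in cols:
        cleaned = (
            out[col].astype(str).str.strip()
            .str.replace(",", "", regex=False)   # thousands separators, if any
            .str.replace("%", "", regex=False)   # "4.19%" -> "4.19"
        )
        # errors="coerce" turns the "-" placeholder into NaN
        out[col] = pd.to_numeric(cleaned, errors="coerce")
    return out


# Hypothetical usage: df_2021_num = to_numeric_columns(df_2021)
```

Because `errors="coerce"` silently turns any unparseable value into NaN, it is worth comparing NaN counts before and after conversion to make sure only the expected placeholders were lost.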


In [5]:
df_2021.isna().sum()

Date              0
Code              0
Name              0
12m Low           0
12m High          0
Day Low           0
Day High          0
Day Price         0
Previous          0
Change            0
Change%           0
Volume            0
Adjusted Price    0
dtype: int64

In [6]:
df_2022.isna().sum() 

Date              0
Code              0
Name              0
12m Low           0
12m High          0
Day Low           0
Day High          0
Day Price         0
Previous          0
Change            0
Change%           0
Volume            0
Adjusted Price    0
dtype: int64

In [7]:
df_2023.isna().sum()  

Date              0
Code              0
Name              0
12m Low           0
12m High          0
Day Low           0
Day High          0
Day Price         0
Previous          0
Change            0
Change%           0
Volume            0
Adjusted Price    0
dtype: int64

In [8]:
df_2024.isna().sum()   

Date              0
Code              0
Name              0
12m Low           0
12m High          0
Day Low           0
Day High          0
Day Price         0
Previous          0
Change            0
Change%           0
Volume            0
Adjusted Price    0
dtype: int64
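All four `isna().sum()` outputs report zero missing values, but this does not mean the data is complete. The previews above show `-` in `Change`, `Change%`, `Volume` and `Adjusted Price`, and because `-` is stored as an ordinary string, `isna()` does not count it. A quick sketch for counting these placeholders per column (the helper name `placeholder_counts` is hypothetical):

```python
import pandas as pd


def placeholder_counts(df: pd.DataFrame, token: str = "-") -> pd.Series:
    """Count cells equal to the placeholder token in each column.

    isna() cannot see these cells, because the token is an ordinary string.
    """
    as_text = df.astype(str).apply(lambda col: col.str.strip())
    return (as_text == token).sum()


# Hypothetical usage: one column of counts per year
# pd.DataFrame({year: placeholder_counts(frame) for year, frame in
#               {2021: df_2021, 2022: df_2022, 2023: df_2023, 2024: df_2024}.items()})
```

Only exact matches are counted, so negative values such as `-1.15` are not mistaken for placeholders.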

In [9]:
print(len(df_2021))
print(len(df_2022))
print(len(df_2023))
print(len(df_2024))

17746
16806
17274
18119
