## Conclusion

This project tackled a common problem: most ordinary investors on the Nairobi Securities Exchange (NSE) lack the tools and information needed to make smart investment decisions. To address this, we converted raw historical stock price data into clear, useful risk indicators, including how much a stock's price fluctuates, its daily returns, and how badly it has dropped in the past.
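
As an illustration of the three indicators named above, here is a minimal sketch of how they could be derived from a series of daily closing prices. The function name `risk_indicators`, the dictionary keys, and the sample prices are assumptions made for this example; the project's own feature code may differ.

```python
import pandas as pd


def risk_indicators(close: pd.Series) -> dict:
    """Compute simple risk indicators from daily closing prices."""
    returns = close.pct_change().dropna()   # daily simple returns
    running_peak = close.cummax()           # highest close seen so far
    drawdown = close / running_peak - 1.0   # fractional drop from that peak
    return {
        "mean_daily_return": returns.mean(),
        "volatility": returns.std(),        # sample standard deviation of daily returns
        "max_drawdown": drawdown.min(),     # most negative drawdown in the series
    }


# Made-up prices, for illustration only.
prices = pd.Series([100.0, 110.0, 99.0, 104.5, 88.0, 96.8])
indicators = risk_indicators(prices)
print(indicators)
```

In this toy series the price peaks at 110 and later falls to 88, so the maximum drawdown is -20%.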

Machine learning was then used to group NSE stocks by behavior, separating them into low-risk, moderate-risk, and high-risk categories. This removes guesswork and gives investors an objective way to compare stocks. The analysis also looked at patterns across different sectors, revealing which industries tend to be stable and which are more volatile.

All of this was packaged into an easy-to-use interactive dashboard built with Streamlit, so that investors can explore stocks, compare them, and understand their risk levels without needing a finance degree.

The result is a practical, data-driven tool that helps everyday investors make better decisions, moving away from following the crowd and toward evidence-based investing in Kenya's stock market.

## Key Insights
KMeans and Hierarchical clustering produced identical, high-quality results (silhouette: *0.72*) and are the recommended methods. DBSCAN left 54% of stocks unclassified and should be retired for this dataset.
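
A comparison of this kind could be reproduced along the following lines. This is only a sketch on synthetic data: the feature matrix `X`, the choice of three clusters, and the DBSCAN parameters (`eps=0.5`, `min_samples=5`) are assumptions for illustration, not the settings used in the project.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.metrics import adjusted_rand_score, silhouette_score

# Synthetic stand-in for the scaled feature matrix (three tight groups).
rng = np.random.default_rng(7)
centers = np.array([[0.0, 0.0], [6.0, 6.0], [0.0, 6.0]])
X = np.vstack([c + rng.normal(0.0, 0.4, size=(20, 2)) for c in centers])

kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
hier_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# Adjusted Rand index of 1.0 means the two partitions are identical.
agreement = adjusted_rand_score(kmeans_labels, hier_labels)
kmeans_silhouette = silhouette_score(X, kmeans_labels)
hier_silhouette = silhouette_score(X, hier_labels)

# DBSCAN marks unclassified points with the label -1.
noise_share = float(np.mean(dbscan_labels == -1))

print(f"KMeans vs hierarchical agreement (ARI): {agreement:.2f}")
print(f"Silhouette: KMeans={kmeans_silhouette:.2f}, hierarchical={hier_silhouette:.2f}")
print(f"Share of points DBSCAN leaves unclassified: {noise_share:.0%}")
```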


- *91% of NSE stocks are Low Risk (Cluster 0)*: moderate volatility and muted returns, reflecting a low-liquidity emerging-market bourse.

- *Risk label ≠ return quality.* Several Cluster 0 stocks (SGL, TCL, UCHM) have negative mean returns. Low volatility does not guarantee positive performance.

- *Cluster 1 (IMH, KPLC, KEGN)* shows the strongest 60-day momentum in the universe (+37.7%) but also the deepest historical drawdowns (-55.2%): high-frequency trend plays with tail risk.

- *Cluster 2 (ORCH)* is a statistical outlier. Its 205% 60-day return and 0.89 Sharpe ratio stem from near-zero trading frequency (4.25%), a liquidity mirage rather than a replicable signal.

- *Cluster 3 (KNRE)* is the only stock with a negative Sharpe ratio, negative mean return, and daily trading, making it the clearest "avoid" signal in the universe.

- The banking sector leads on quality within Cluster 0. SCBK (+33.2%), DTK (+40.2%), and ABSA (+28.5%) combine low volatility with the strongest 60-day momentum in the low-risk tier, a rare convergence of capital preservation and near-term price strength on the NSE.


## Recommendations

1. Build a Simple Investor App - Create a website or mobile app where any investor can search for an NSE stock and immediately see how risky it is and how it has been performing. Keep it simple enough that someone with no finance background can use it comfortably.

2. Teach Investors How to Use Risk Profiles - Hold workshops and create short videos and simple graphics that explain what the different risk categories mean and how investors can use them to pick better stocks and manage their money wisely.

3. Best low-risk picks - Cluster 0 stocks with confirmed upward momentum: SCBK, DTK, ABSA, COOP, NCBA.

4. Encourage Data-Based Decisions - Push investors to stop following the crowd and instead look at real numbers. The app should clearly show key indicators that measure actual stock performance and explain what those numbers mean in everyday language.

5. Tactical/growth exposure - Cluster 1 with strict position caps (≤10%): IMH, KPLC, KEGN.



## Model Improvement Recommendations

### 1. Handle Outliers Before Clustering
ORCH and KNRE are forming single-stock clusters, distorting the model. Pre-screen outliers using Isolation Forest or Z-score thresholds, cluster the core 55-stock universe separately, then assign outliers post-hoc.
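
The pre-screening step could look roughly like the sketch below, which uses a plain Z-score threshold. The feature matrix is synthetic, and the function name `zscore_outliers` and the threshold of 3.0 are assumptions chosen for illustration. Post-hoc assignment of the flagged stocks is not shown.

```python
import numpy as np


def zscore_outliers(features: np.ndarray, threshold: float = 3.0) -> np.ndarray:
    """Return a boolean mask marking rows with any |z-score| above the threshold."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    z = (features - mean) / std
    return (np.abs(z) > threshold).any(axis=1)


# Synthetic data: 55 ordinary rows plus 2 extreme rows appended at the end.
rng = np.random.default_rng(0)
core_rows = rng.normal(0.0, 1.0, size=(55, 2))
extreme_rows = np.array([[15.0, 15.0], [-15.0, 12.0]])
X = np.vstack([core_rows, extreme_rows])

is_outlier = zscore_outliers(X)
X_core = X[~is_outlier]      # cluster these rows on their own
X_outliers = X[is_outlier]   # handle these rows after clustering

print(f"{is_outlier.sum()} rows flagged, {len(X_core)} rows kept for clustering")
```

Because the outliers themselves inflate the mean and standard deviation, a median-based score or scikit-learn's `IsolationForest` may flag extreme stocks more reliably on real data.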

### 2. Fix Feature Scaling
Switch from standard to *robust scaling* (median/IQR): the current outliers inflate the standard deviation and skew cluster centroids. Also drop redundant features; `volatility_mean` and `volatility_5d` are likely highly correlated.
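
To make the median/IQR idea concrete, the sketch below scales a small made-up matrix by hand and then checks the correlation between its two columns. The data and the function name `robust_scale` are assumptions for illustration. scikit-learn's `RobustScaler` applies the same centring and scaling with its default settings.

```python
import numpy as np


def robust_scale(features: np.ndarray) -> np.ndarray:
    """Centre each column on its median and divide by its interquartile range."""
    median = np.median(features, axis=0)
    q25, q75 = np.percentile(features, [25, 75], axis=0)
    return (features - median) / (q75 - q25)


# Made-up data: the first column contains one extreme value (100.0).
X = np.array(
    [
        [1.0, 10.0],
        [2.0, 20.0],
        [3.0, 30.0],
        [4.0, 40.0],
        [100.0, 50.0],
    ]
)
X_scaled = robust_scale(X)
print(X_scaled)

# Pairwise correlation between columns, useful for spotting redundant features.
correlation = np.corrcoef(X_scaled, rowvar=False)
print(correlation)
```

The four ordinary rows land on the same scale in both columns (-1 to 0.5), while the extreme value stays visibly extreme instead of compressing the rest of its column.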

### 3. Validate Optimal k
k=4 appears fixed. Confirm with elbow method, silhouette scores across k=2–10, and gap statistic. Two single-stock clusters suggest the model may be slightly over-specified.
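
One part of this check, the silhouette scan over k=2–10, could be sketched as follows. The data is synthetic, with three well-separated groups, so the scan is expected to favour k=3 here; on the real feature matrix the preferred k may differ. The elbow method and gap statistic are not covered by this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the scaled feature matrix.
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [8.0, 8.0], [0.0, 8.0]])
X = np.vstack([c + rng.normal(0.0, 0.5, size=(20, 2)) for c in centers])

# Fit KMeans for each candidate k and record the mean silhouette score.
scores = {}
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print(f"Highest silhouette at k={best_k}")
```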

### 4. Add Fundamental Features
Price-derived metrics only capture market behaviour, not intrinsic value. Add P/E, P/B, dividend yield, and beta (stock vs NSE-20) to distinguish value from momentum within the same volatility tier.
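
Of the features listed, beta is the only one that can be computed from return series alone: it is the covariance of the stock's returns with the index's returns, divided by the variance of the index's returns. The sketch below uses made-up return series, and the function name `beta` is an assumption for this example.

```python
import numpy as np


def beta(stock_returns: np.ndarray, index_returns: np.ndarray) -> float:
    """Beta = cov(stock, index) / var(index), using sample estimates."""
    cov = np.cov(stock_returns, index_returns, ddof=1)
    return float(cov[0, 1] / cov[1, 1])


# Made-up daily returns; the stock is built to move 1.5x the index.
index_r = np.array([0.01, -0.02, 0.015, 0.003, -0.007])
stock_r = 1.5 * index_r + 0.001

beta_value = beta(stock_r, index_r)
print(f"beta = {beta_value:.2f}")
```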

### 5. Test Cluster Stability
Bootstrap resample (100× on 80% subsamples) and measure per-stock assignment consistency. Stocks with <70% consistency are borderline cases and should not be used as strong buy/avoid signals.
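
A stability test of this kind could be sketched as below. Several details are assumptions for illustration: the data is synthetic, subsamples are drawn without replacement, every stock is labelled by its nearest centroid from each resampled model, and cluster labels are matched to a reference run with the Hungarian algorithm, since KMeans label numbers are arbitrary between runs.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans


def assignment_consistency(X, k, n_runs=100, frac=0.8, seed=0):
    """Share of resampled runs in which each row keeps its reference cluster."""
    rng = np.random.default_rng(seed)
    n = len(X)
    reference = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    agree = np.zeros(n)
    for run in range(n_runs):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        model = KMeans(n_clusters=k, n_init=10, random_state=run).fit(X[idx])
        labels = model.predict(X)  # nearest-centroid label for every row
        # Contingency table: rows = resampled labels, columns = reference labels.
        table = np.zeros((k, k), dtype=int)
        np.add.at(table, (labels, reference), 1)
        # Match resampled labels to reference labels by maximising overlap.
        rows, cols = linear_sum_assignment(-table)
        mapping = np.empty(k, dtype=int)
        mapping[rows] = cols
        agree += mapping[labels] == reference
    return agree / n_runs


# Synthetic stand-in for the scaled feature matrix.
rng = np.random.default_rng(3)
centers = np.array([[0.0, 0.0], [8.0, 8.0], [0.0, 8.0]])
X = np.vstack([c + rng.normal(0.0, 0.5, size=(20, 2)) for c in centers])

consistency = assignment_consistency(X, k=3)
borderline = np.flatnonzero(consistency < 0.70)
print(f"Mean consistency: {consistency.mean():.2f}, borderline rows: {len(borderline)}")
```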

### 6. Roll Clusters Over Time
The model is a static snapshot. Implement rolling 6-month clustering and build a transition matrix — stocks migrating from Low Risk → Medium-High Risk quarter-on-quarter are early warning signals.
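
Once risk labels exist for two consecutive windows, the transition matrix itself is a small step. The sketch below assumes the labels have already been produced and made comparable across windows. The tickers, label names, and label values are placeholders invented for this example.

```python
import pandas as pd

# Placeholder labels for two consecutive windows (not real NSE results).
labels = pd.DataFrame(
    {
        "previous": ["Low", "Low", "Low", "Medium", "High", "Low"],
        "current": ["Low", "Medium", "Low", "High", "High", "Low"],
    },
    index=["AAA", "BBB", "CCC", "DDD", "EEE", "FFF"],
)

# Each row shows where stocks from a previous label ended up (rows sum to 1).
transition = pd.crosstab(labels["previous"], labels["current"], normalize="index")
print(transition)

# Stocks that moved to a higher risk tier are the early-warning candidates.
order = {"Low": 0, "Medium": 1, "High": 2}
moved_up = labels[labels["current"].map(order) > labels["previous"].map(order)]
print(moved_up)
```

In this toy example, 75% of previously Low stocks stay Low, and the stocks flagged as moving up a tier are BBB and DDD.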

### 7. Validate with Forward Returns
Confirm the clusters have predictive power: do higher-risk clusters produce higher forward volatility and deeper drawdowns? If not, the separation is statistically clean but of no use for investment decisions.
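
A forward check could be sketched as follows: take returns from a window after the clustering date, compute volatility and maximum drawdown per stock, and average them by cluster. The returns here are simulated, and the tickers, cluster names, and the function name `forward_risk` are placeholders for illustration.

```python
import numpy as np
import pandas as pd


def forward_risk(forward_returns: pd.DataFrame) -> pd.DataFrame:
    """Per-stock volatility and max drawdown over a forward window.

    forward_returns: rows are days, columns are tickers, values are daily simple returns.
    """
    wealth = (1.0 + forward_returns).cumprod()   # growth of 1 unit invested
    drawdown = wealth / wealth.cummax() - 1.0    # drop from the running peak in the window
    return pd.DataFrame(
        {
            "fwd_volatility": forward_returns.std(),
            "fwd_max_drawdown": drawdown.min(),
        }
    )


# Simulated forward returns: two calm stocks and two turbulent ones.
rng = np.random.default_rng(1)
days = 120
forward_returns = pd.DataFrame(
    {
        "LOW_A": rng.normal(0.0, 0.005, days),
        "LOW_B": rng.normal(0.0, 0.005, days),
        "HIGH_A": rng.normal(0.0, 0.03, days),
        "HIGH_B": rng.normal(0.0, 0.03, days),
    }
)
clusters = pd.Series(
    {"LOW_A": "low", "LOW_B": "low", "HIGH_A": "high", "HIGH_B": "high"},
    name="cluster",
)

risk = forward_risk(forward_returns)
by_cluster = risk.join(clusters).groupby("cluster").mean()
print(by_cluster)
```

If the clusters carry predictive power, the higher-risk cluster should show the larger forward volatility and the more negative forward drawdown in a table like this one.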

### 8. Keep the Tool Up to Date
Regularly update the system with fresh market data and new information so that the risk profiles stay accurate and useful over time.