# Indian Stock Analyzer

**AI‑Powered Market Trend & Stock Signal Platform (NSE / India)**

This notebook is the **complete project report + runnable pipeline** for our *Indian Stock Analyzer*.
It is structured exactly as required:

1. **Problem Definition & Objective**  
2. **Data Understanding & Preparation**  
3. **Model / System Design**  
4. **Core Implementation (runnable)**  
5. **Evaluation & Analysis**  
6. **Ethical Considerations & Responsible AI**  
7. **Conclusion & Future Scope**

> **Important:** The project supports **Live → Cache → Demo** fallback. If you don’t have API keys / internet, this notebook still runs using demo data.


## 0) Setup
This section wires the local project code into the notebook runtime.

Project root (from the submitted ZIP): `Indian Stock Analyzer/`


In [None]:
# --- Path + environment setup ---
import sys
from pathlib import Path

PROJECT_ROOT = Path(r"/mnt/data/isa/Indian Stock Analyzer")
assert PROJECT_ROOT.exists(), f"Project root not found: {PROJECT_ROOT}"

# Add project to import path
if str(PROJECT_ROOT) not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT))

# Load .env used by the Streamlit app (same behavior as app.py)
try:
    from dotenv import load_dotenv
    load_dotenv(dotenv_path=PROJECT_ROOT / ".env", override=True)
except Exception as e:
    print("dotenv not available (ok).", e)

print("✅ Project root added:", PROJECT_ROOT)


In [None]:
# --- Basic dependency checks (non-fatal) ---
import importlib

def check(pkg: str):
    try:
        importlib.import_module(pkg)
        return True
    except Exception as e:
        print(f"⚠️ Missing/failed import: {pkg} -> {e}")
        return False

_ = [check(p) for p in ["pandas","numpy","yfinance","plotly","requests","sklearn"]]


---

## 1) Problem Definition & Objective

### Selected Project Track
**Track:** *AI/ML Decision Support System (Hybrid: Time‑Series + News + LLM + Risk/Optimization)*

### Clear Problem Statement
Indian retail investors and early traders typically rely on fragmented tools:
- Separate apps for charts/indicators
- Separate sites for news
- Separate screeners for fundamentals
- No consistent way to combine signals and **explain** “why this stock is ranked higher today”.

**Problem:** Build a unified system that converts raw market + macro + news signals into **transparent, explainable rankings** and **risk-aware suggestions**.

### Real‑World Relevance & Motivation
- Indian markets are highly event-driven (results, RBI policy, geopolitics, commodity moves)
- Retail users often overreact to noise; we need **signal fusion + guardrails**
- APIs can fail / rate-limit. A reliable tool must handle outages (**cache/demo fallback**)

### Primary Objectives
- Compute multi-timeframe technical signals (RSI, MACD, SMA/EMA, volatility, breakouts)
- Integrate **news bundles + sentiment** signals
- Integrate **fundamentals & ownership/macro context** when available
- Combine signals into an **explainable composite score** and verdict
- Provide **forecast-style probability** (rise/fall/hold) with abstain thresholds
- Provide **risk metrics + portfolio guardrails / sizing**
- Keep the system robust using Live → Cache → Demo routing


---

## 2) Data Understanding & Preparation

### Data Sources (as implemented in the codebase)
The project supports multiple providers through adapters and a router:

**Price / OHLCV**
- `analyzer/data_adapters/prices_yf.py` (Yahoo Finance)
- `analyzer/data_adapters/alpha_vantage.py`
- `analyzer/data_adapters/finnhub_adapter.py`
- `analyzer/data_adapters/fmp_adapter.py`
- `analyzer/data_adapters/prices_router.py` (provider routing + resilience)

**Fundamentals**
- `analyzer/data_adapters/fundamentals_yf.py`
- `analyzer/data_adapters/fundamentals_providers.py`
- `analyzer/features/fundamentals.py` + `fundamentals_pro_v2.py`

**News & Sentiment**
- `analyzer/data_adapters/news_providers.py` (news bundle)
- `analyzer/features/news_sentiment.py`, `sentiment.py` (signal)

**Macro & Vol Proxy**
- `analyzer/data_adapters/macro_provider.py`
- `analyzer/data_adapters/vol_proxy.py`

**Universe**
- `analyzer/data/nse_symbols.csv`

### Preparation Strategy
- Standardize symbol mapping to Yahoo tickers (e.g., `RELIANCE.NS`) using `analyzer/symbols`
- Clean OHLCV time series (missing values, non-trading days)
- Feature engineering:
  - Returns, rolling volatility
  - RSI / MACD / SMA/EMA
  - Multi-timeframe alignment and confluence signals
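The feature-engineering steps listed above can be sketched without any dependencies. This is an illustrative version only: the function names and default windows (14-period Wilder RSI, 12/26/9 MACD, 20-bar volatility) are assumptions, and the project's own `compute_all_indicators` may compute these differently.

```python
import math
import statistics


def log_returns(closes):
    """Log return between consecutive closes."""
    return [math.log(b / a) for a, b in zip(closes, closes[1:])]


def rolling_volatility(returns, window=20, periods_per_year=252):
    """Annualised sample standard deviation over each trailing window."""
    return [
        statistics.stdev(returns[i - window:i]) * math.sqrt(periods_per_year)
        for i in range(window, len(returns) + 1)
    ]


def sma(values, window):
    """Simple moving average; the first output covers values[0:window]."""
    return [sum(values[i - window:i]) / window for i in range(window, len(values) + 1)]


def ema(values, span):
    """Exponential moving average with alpha = 2 / (span + 1), seeded with the first value."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(closes, fast=12, slow=26, signal=9):
    """Return the latest (MACD line, signal line, histogram)."""
    line = [f - s for f, s in zip(ema(closes, fast), ema(closes, slow))]
    signal_line = ema(line, signal)
    return line[-1], signal_line[-1], line[-1] - signal_line[-1]


def rsi(closes, period=14):
    """Wilder's RSI for the latest bar; needs at least period + 1 closes."""
    deltas = [b - a for a, b in zip(closes, closes[1:])]
    gains = [max(d, 0.0) for d in deltas]
    losses = [max(-d, 0.0) for d in deltas]
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    # Wilder smoothing: each new bar gets weight 1 / period.
    for g, l in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + g) / period
        avg_loss = (avg_loss * (period - 1) + l) / period
    if avg_loss == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)


closes = [100.0 + i for i in range(40)]  # toy, steadily rising series
print("last SMA(3):", sma(closes, 3)[-1])
print("RSI(14):", rsi(closes))
print("MACD line positive:", macd(closes)[0] > 0)
```

On a series that only rises, the RSI saturates at 100 and the MACD line is positive, which is a quick sanity check before pointing the functions at real OHLCV data.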

### Router & Reliability
Provider routing is configured via:
- `analyzer/config/router-spec.yaml`
- `analyzer/config/runtime.yaml`

The design supports:
- **Live mode** when a provider responds normally
- **Cache mode** when a live call fails or is rate-limited and previously fetched data exists
- **Demo mode** when providers are blocked and no usable cache exists
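The Live → Cache → Demo order can be pictured as a chain of loaders tried in sequence. The sketch below is a simplified illustration, not the project's router: `fetch_with_fallback` and the three stand-in loaders are hypothetical names, and the real behaviour is driven by `router-spec.yaml`.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical loader type: takes a symbol, returns data, or returns None / raises on failure.
Loader = Callable[[str], Optional[list]]


def fetch_with_fallback(symbol: str, loaders: Sequence[Tuple[str, Loader]]):
    """Try each (mode, loader) in order and return (mode, data) from the first that yields data."""
    errors = []
    for mode, loader in loaders:
        try:
            data = loader(symbol)
        except Exception as exc:  # one failing source must not abort the chain
            errors.append(f"{mode}: {exc}")
            continue
        if data:  # None or an empty result counts as a miss
            return mode, data
        errors.append(f"{mode}: empty result")
    raise RuntimeError(f"all sources failed for {symbol}: {errors}")


# Illustrative stand-ins for the real adapters.
def live_loader(symbol):
    raise ConnectionError("rate limited")


_cache = {"TCS.NS": [101.0, 102.5]}


def cache_loader(symbol):
    return _cache.get(symbol)


def demo_loader(symbol):
    return [100.0, 100.5, 101.0]


chain = [("live", live_loader), ("cache", cache_loader), ("demo", demo_loader)]
mode_tcs, data_tcs = fetch_with_fallback("TCS.NS", chain)
mode_sbin, data_sbin = fetch_with_fallback("SBIN.NS", chain)
print(mode_tcs, mode_sbin)
```

Returning the mode together with the data lets the UI label results as live, cached, or demo, so a user never mistakes synthetic prices for market data.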


In [None]:
# --- Inspect the universe (NSE symbols) ---
import pandas as pd

universe_path = PROJECT_ROOT / "analyzer" / "data" / "nse_symbols.csv"
universe = pd.read_csv(universe_path)
print("Universe rows:", len(universe))
universe.head(10)


In [None]:
# --- Inspect router & runtime configuration (YAML) ---
from pathlib import Path
import yaml

router_spec_path = PROJECT_ROOT / "analyzer" / "config" / "router-spec.yaml"
runtime_path = PROJECT_ROOT / "analyzer" / "config" / "runtime.yaml"

router_spec = yaml.safe_load(router_spec_path.read_text(encoding="utf-8"))
runtime_cfg = yaml.safe_load(runtime_path.read_text(encoding="utf-8"))

list(router_spec.keys()), list(runtime_cfg.keys())


---

## 3) Model / System Design

### Technique Used (Hybrid AI System)
This project is **not a single black‑box model**. It is a **hybrid pipeline**:

- **Rule + indicator-based technical engine** (interpretable)
- **ML/Calibration components** (optional, model artifacts can be plugged in)
- **News/Sentiment scoring** (NLP sentiment)
- **Probabilistic forecasting shim** (`predict_prob`) with regime-aware thresholds
- **Risk & portfolio sizing layer** (guardrails, exposure limits, turnover)
- **LLM Copilot (Big Bull 3.0)** for explanation / memo (local Ollama optional)

### High-Level Architecture

```
User (Streamlit UI / Notebook)
        |
        v
Symbol Resolver (NSE → Yahoo)
        |
        v
Data Router (Live → Cache → Demo)
  |      |        |
  v      v        v
Prices  Fundamentals  News/Macro
  |         |           |
  +---- Feature Engineering ----+
               |
               v
TA Indicators + Strategy Engine + Sentiment
               |
               v
Score Fusion (Technical + Fundamental + News)
               |
               v
Forecast Probability + Risk Guardrails
               |
               v
Rankings + Explainable Output (+ optional LLM memo)
```

### Justification of Design Choices
- **Explainability:** indicators and feature-based scores are transparent
- **Robustness:** router + cache + demo mode prevents total failure
- **Safety:** risk guardrails and abstain thresholds limit over-confident calls
- **Scalability:** adapters allow adding/removing providers without rewriting the UI


---

## 4) Core Implementation (Runs Top-to-Bottom)

We now run the core pipeline for a small watchlist.
You can change the `symbols` list to any NSE tickers.
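As a rough picture of what symbol resolution involves, the sketch below normalises a plain NSE symbol to a Yahoo-style ticker. `to_yahoo_ticker` is a hypothetical helper written for illustration; the project's `resolve_to_yahoo` may handle aliases and the symbol universe in ways not shown here.

```python
def to_yahoo_ticker(symbol: str, suffix: str = ".NS") -> str:
    """Normalise an NSE symbol to a Yahoo-style ticker (hypothetical helper)."""
    cleaned = symbol.strip().upper()
    if not cleaned:
        raise ValueError("empty symbol")
    # Leave already-qualified tickers (.NS for NSE, .BO for BSE) and index tickers (^NSEI) alone.
    if cleaned.endswith((".NS", ".BO")) or cleaned.startswith("^"):
        return cleaned
    return cleaned + suffix


print([to_yahoo_ticker(s) for s in [" reliance ", "TCS.NS", "^NSEI"]])
```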


In [None]:
# --- Choose symbols to analyze ---
# Use Yahoo-format ('.NS') OR normal names (we will resolve)
symbols = ["RELIANCE", "TCS", "INFY", "HDFCBANK", "SBIN"]

from analyzer.symbols import resolve_to_yahoo
resolved = [resolve_to_yahoo(s) for s in symbols]
resolved


In [None]:
# --- Fetch price history (resilient) OR generate demo series ---
import numpy as np
import pandas as pd

from analyzer.data_adapters.prices_yf import get_price_history_resilient


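def demo_ohlcv(n=260, start=100.0, seed=7):
    """Synthetic daily OHLCV from a geometric random walk (demo fallback only)."""
    rng = np.random.default_rng(seed)
    rets = rng.normal(0.0005, 0.02, size=n)
    px = start * np.exp(np.cumsum(rets))
    idx = pd.bdate_range(end=pd.Timestamp.today().normalize(), periods=n)
    open_ = px * (1 + rng.normal(0, 0.003, size=n))
    # Keep each bar consistent: High >= max(Open, Close) and Low <= min(Open, Close).
    high = np.maximum(open_, px) * (1 + np.abs(rng.normal(0.01, 0.004, size=n)))
    low = np.minimum(open_, px) * (1 - np.abs(rng.normal(0.01, 0.004, size=n)))
    df = pd.DataFrame({
        "Open": open_,
        "High": high,
        "Low": low,
        "Close": px,
        # Integer bounds: Generator.integers is documented for integer low/high.
        "Volume": rng.integers(2_000_000, 15_000_000, size=n),
    }, index=idx)
    return df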

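import zlib

prices = {}
for sym in resolved:
    try:
        df = get_price_history_resilient(sym, period="1y", interval="1d")
        if df is None or getattr(df, "empty", True):
            raise ValueError("empty")
        prices[sym] = df
    except Exception as e:
        # Built-in hash() of a str changes between interpreter runs, so seed from a
        # CRC32 of the symbol to keep the demo series reproducible.
        print(f"⚠️ {sym}: falling back to demo data ({e})")
        prices[sym] = demo_ohlcv(seed=zlib.crc32(sym.encode("utf-8")))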

{sym: prices[sym].shape for sym in prices}


In [None]:
# --- Compute Technical Indicators (RSI, MACD, SMA/EMA, etc.) ---
from analyzer.ta.indicators import compute_all_indicators

indicators = {sym: compute_all_indicators(df) for sym, df in prices.items()}

# show one
sample_sym = resolved[0]
indicators[sample_sym].tail(5)


In [None]:
# --- Compute key Technical Signals used in the app ---
from analyzer.ta.signals import (
    mtf_alignment_signal,
    hybrid_confluence_signal,
    ema200_macd_signal,
    atr_breakout_signal,
    bollinger_volume_signal,
    macd_hist_contraction_signal,
)

def signal_pack(df_ind):
    return {
        "mtf_alignment": mtf_alignment_signal(df_ind),
        "hybrid_confluence": hybrid_confluence_signal(df_ind),
        "ema200_macd": ema200_macd_signal(df_ind),
        "atr_breakout": atr_breakout_signal(df_ind),
        "bollinger_volume": bollinger_volume_signal(df_ind),
        "macd_hist_contraction": macd_hist_contraction_signal(df_ind),
    }

signals = {sym: signal_pack(indicators[sym]) for sym in resolved}
signals


In [None]:
# --- Strategy Engine (100 strategies) + prediction projection helpers ---
from analyzer.ta.strategy_engine import run_strategies
from analyzer.ta.projections import overall_predicted_pct, tally_strategy_buckets

strategy_rows = {}
for sym in resolved:
    rows = run_strategies(indicators[sym])
    strategy_rows[sym] = rows

# Quick bucket counts per symbol
bucket_counts = {sym: tally_strategy_buckets(strategy_rows[sym]) for sym in resolved}
bucket_counts


### Fundamentals + News/Sentiment (Hybrid Signals)
The Streamlit app uses:
- `get_fundamentals_snapshot` + `compute_fundamentals`
- `get_news_bundle` + `get_sentiment_signal`

All calls are guarded (if blocked, we still proceed).


In [None]:
# --- Fundamentals snapshot + computed fundamentals ---
from analyzer.data_adapters.prices_yf import get_fundamentals_snapshot
from analyzer.features.fundamentals import compute_fundamentals

fund_snap = {}
fund_comp = {}
for sym in resolved:
    try:
        snap = get_fundamentals_snapshot(sym)
    except Exception:
        snap = {}
    fund_snap[sym] = snap

    try:
        fund_comp[sym] = compute_fundamentals(snap)
    except Exception:
        fund_comp[sym] = {}

# show one
sym = resolved[0]
fund_snap[sym], list(fund_comp[sym].keys())[:12]


In [None]:
# --- News bundle + daily sentiment signal ---
from analyzer.data_adapters.news_providers import get_news_bundle
from analyzer.features.sentiment import get_sentiment_signal

news = {}
sent = {}
for sym in resolved:
    try:
        news[sym] = get_news_bundle(sym, max_items=12)
    except Exception:
        news[sym] = {"items": [], "source": "unavailable"}

    try:
        sent[sym] = get_sentiment_signal(sym, news_bundle=news[sym])
    except Exception:
        sent[sym] = {"sentiment": 0.0, "confidence": 0.0, "note": "unavailable"}

# show one
resolved[0], sent[resolved[0]]


### Unified Investability Score (Explainable Fusion)
The app fuses **technical + fundamental + news** to compute a final decision.

Key functions:
- `analyzer/decision/investability.py` → `combine_scores`, `investability`
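One common way to keep such a fusion explainable is a weighted average that also returns each component's contribution. The sketch below assumes component scores in [-1, 1]; the weights, the BUY/SELL cut-offs, and the function names are illustrative assumptions and are not taken from `investability.py`.

```python
def fuse_scores(components, weights):
    """Weighted average of component scores, renormalised over the components that are present.

    Returns (score, contributions) so each component's share of the result stays visible.
    """
    available = {k: v for k, v in components.items() if v is not None and k in weights}
    total_weight = sum(weights[k] for k in available)
    if total_weight == 0:
        return 0.0, {}
    contributions = {k: weights[k] * v / total_weight for k, v in available.items()}
    return sum(contributions.values()), contributions


def verdict_from_score(score, buy_above=0.25, sell_below=-0.25):
    """Map a fused score to a coarse verdict (cut-offs are illustrative)."""
    if score >= buy_above:
        return "BUY"
    if score <= sell_below:
        return "SELL"
    return "HOLD"


weights = {"technical": 0.5, "fundamental": 0.3, "news": 0.2}  # assumed weights
components = {"technical": 0.6, "fundamental": 0.2, "news": None}  # news unavailable
score, contributions = fuse_scores(components, weights)
print(round(score, 3), verdict_from_score(score), contributions)
```

Renormalising over the available components means a missing news feed lowers the evidence base without silently dragging the score toward zero.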


In [None]:
# --- Combine scores + investability verdict ---
from analyzer.decision.investability import combine_scores, investability

combined = {}
for sym in resolved:
    try:
        # combine_scores expects (ta, fa, news) style objects; our pipeline uses indicators + fundamentals + sentiment
        combined[sym] = combine_scores(
            technical=signals[sym],
            fundamentals=fund_comp[sym],
            sentiment=sent[sym],
        )
    except Exception as e:
        combined[sym] = {"error": str(e)}

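verdicts = {}
for sym in resolved:
    c = combined[sym]
    if isinstance(c, dict) and "error" in c:
        # Fusion already failed; carry that error forward instead of masking it.
        verdicts[sym] = c
        continue
    try:
        verdicts[sym] = investability(c)
    except Exception as e:
        verdicts[sym] = {"error": str(e)}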

verdicts


### Forecast Probability (Regime-aware)
The project exposes a stable API `predict_prob` which:
- prefers the modern BigBull pipeline if present
- falls back to ensemble or heuristic
- uses regime/horizon thresholds and can abstain

File: `analyzer/forecast/prob_predictor.py`
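The idea of regime-aware abstention can be illustrated with a small lookup of minimum "edge" (distance of `p_up` from 0.5) per regime and horizon. The regime labels, threshold values, and the `decide` function below are assumptions made for this sketch; the actual thresholds live in the project's forecast module.

```python
# Hypothetical thresholds: minimum |p_up - 0.5| required before taking a side.
MIN_EDGE = {
    ("calm", "5D"): 0.05,
    ("calm", "20D"): 0.04,
    ("volatile", "5D"): 0.10,
    ("volatile", "20D"): 0.08,
}


def decide(p_up, regime, horizon_key, default_edge=0.10):
    """Map a probability to rise/fall, or abstain (hold) when the edge is too small."""
    if not 0.0 <= p_up <= 1.0:
        raise ValueError(f"p_up must be in [0, 1], got {p_up}")
    edge_needed = MIN_EDGE.get((regime, horizon_key), default_edge)
    edge = abs(p_up - 0.5)
    if edge < edge_needed:
        return {"verdict": "hold", "abstain": True, "edge": edge}
    return {"verdict": "rise" if p_up > 0.5 else "fall", "abstain": False, "edge": edge}


calm = decide(0.57, "calm", "20D")
volatile = decide(0.57, "volatile", "20D")
print(calm["verdict"], volatile["verdict"])
```

The same probability leads to a call in a calm regime and an abstention in a volatile one, which is the behaviour the thresholds are meant to produce.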


In [None]:
# --- Forecast-style probability payload ---
from analyzer.forecast.prob_predictor import predict_prob

prob_payload = {}
for sym in resolved:
    try:
        df = prices[sym]
        prob_payload[sym] = predict_prob(
            df=df,
            fundamentals_info=fund_snap.get(sym),
            news_bundle=news.get(sym),
            horizon_key="20D",
            models_dir=str(PROJECT_ROOT / "models")
        )
    except Exception as e:
        prob_payload[sym] = {"error": str(e)}

# show
prob_payload[resolved[0]]


### Ranking (Stocks → Ordered list)
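We rank symbols by `p_up × confidence_pct / 100`, both taken from the probability payload. The investability verdict is shown alongside for context but does not enter the score.
Because each input appears as its own column, the ordering is **explainable**.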


In [None]:
# --- Build an explainable ranking table ---
import pandas as pd

rows = []
for sym in resolved:
    p = prob_payload.get(sym, {})
    inv = verdicts.get(sym, {})
    rows.append({
        "symbol": sym,
        "verdict": p.get("verdict", "NA"),
        "p_up": p.get("p_up", None),
        "confidence_pct": p.get("confidence_pct", None),
        "expected_return": p.get("expected_return", None),
        "abstain": p.get("abstain", None),
        "investability": inv.get("verdict") if isinstance(inv, dict) else str(inv),
        "notes": (p.get("details", {}) or {}).get("source")
    })

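rank_df = pd.DataFrame(rows)
# Coerce to numeric so failed payloads (None values) do not leave object-dtype columns.
p_up = pd.to_numeric(rank_df["p_up"], errors="coerce").fillna(0.5)
conf = pd.to_numeric(rank_df["confidence_pct"], errors="coerce").fillna(0.0)
rank_df["rank_score"] = p_up * (conf / 100.0)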
rank_df = rank_df.sort_values("rank_score", ascending=False)
rank_df


### Risk & Portfolio Optimization
The project includes risk sizing and guardrails. The same logic used in backtesting can
translate signal tables into **portfolio weights**.

Files:
- `analyzer/modeling/position_sizing.py`
- `analyzer/backtest/positioning.py` (weights_from_signals)
- `analyzer/ui/tabs/portfolio_risk.py` (Streamlit tab wrapper)

Below we demonstrate building a tiny weights panel from probability payloads.
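For intuition, the two guardrails used in this notebook (`max_single_weight` and `max_gross_exposure`) can be applied with a clip-then-scale step. The `apply_guardrails` function is a hypothetical simplification, not the logic inside `weights_from_signals`, which may also handle turnover and sizing rules.

```python
def apply_guardrails(raw_weights, max_single_weight=0.20, max_gross_exposure=1.0):
    """Clip each position, then scale the whole book down if gross exposure is too high."""
    clipped = {
        sym: max(-max_single_weight, min(max_single_weight, w))
        for sym, w in raw_weights.items()
    }
    gross = sum(abs(w) for w in clipped.values())
    if gross > max_gross_exposure:
        # Scaling down never breaks the per-position cap, so one pass is enough.
        scale = max_gross_exposure / gross
        clipped = {sym: w * scale for sym, w in clipped.items()}
    return clipped


raw = {"RELIANCE.NS": 0.50, "TCS.NS": 0.30, "INFY.NS": 0.10}
capped = apply_guardrails(raw, max_single_weight=0.20, max_gross_exposure=0.40)
print({sym: round(w, 4) for sym, w in capped.items()})
```

Clipping first and scaling second keeps relative sizes among the uncapped names while guaranteeing both limits hold.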


In [None]:
# --- Build a minimal long-form signals table for weights_from_signals ---
import pandas as pd

from analyzer.modeling.position_sizing import PositionSizingConfig
from analyzer.backtest.positioning import weights_from_signals, PortfolioGuardrails

# Create a one-timestamp signal snapshot.
# If your quantile models exist, use predict_quantiles_for_frame outputs instead.
# Timestamp.utcnow() is deprecated in recent pandas; now(tz="UTC") is the equivalent.
now = pd.Timestamp.now(tz="UTC").normalize()

sig_rows = []
for sym in resolved:
    p = prob_payload.get(sym, {})
    # use conservative interval placeholders if quantiles not available
    med = float(p.get("expected_return") or 0.0)
    # Test for None explicitly: `or 0.5` would also overwrite a legitimate p_up of 0.0.
    p_up = p.get("p_up")
    p_up = 0.5 if p_up is None else float(p_up)
    sig_rows.append({
        "timestamp": now,
        "symbol": sym,
        "median": med,
        "low": med - 0.03,
        "high": med + 0.03,
        "p_up": p_up,
        "atr_pct": 0.02,
    })

signals_long = pd.DataFrame(sig_rows)

cfg = PositionSizingConfig(
    max_single_weight=0.20,
    min_p_up=0.55,
    min_confidence_pct=60.0,
)
guards = PortfolioGuardrails(max_gross_exposure=1.0, max_single_weight=0.20, max_turnover_per_step=None)

w = weights_from_signals(signals_long, cfg, guards)
w


### Backtesting (User Backtest Engine)
The project includes a backtesting engine:
- `analyzer/backtest/engine.py` + `simple_engine.py` + `walkforward.py`

The call below is guarded: if the arguments do not match the installed engine's signature, the error is recorded in `bt_out` instead of stopping the notebook.
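To show how `slippage_bps` and `fee_bps` affect results, here is a minimal long/flat backtest loop. It is a sketch under simple assumptions (costs charged on traded notional, positions decided at the close and held for the next bar); `backtest_long_flat` is a hypothetical function and does not reproduce `run_user_backtest`.

```python
def backtest_long_flat(closes, positions, initial_cash=100_000.0,
                       slippage_bps=5.0, fee_bps=2.0):
    """Equity curve for a long/flat strategy.

    positions[i] in {0, 1} is the exposure held from bar i to bar i + 1.
    """
    cost_rate = (slippage_bps + fee_bps) / 10_000.0  # basis points to fraction
    equity = [initial_cash]
    prev_pos = 0
    for i in range(len(closes) - 1):
        pos = positions[i]
        value = equity[-1]
        if pos != prev_pos:
            # Pay slippage and fees on the notional that changes hands.
            value -= value * cost_rate * abs(pos - prev_pos)
        bar_return = closes[i + 1] / closes[i] - 1.0
        value *= 1.0 + pos * bar_return
        equity.append(value)
        prev_pos = pos
    return equity


closes = [100.0, 110.0, 121.0]
with_costs = backtest_long_flat(closes, positions=[1, 1, 0])
no_costs = backtest_long_flat(closes, positions=[1, 1, 0], slippage_bps=0.0, fee_bps=0.0)
print(round(with_costs[-1], 2), round(no_costs[-1], 2))
```

Comparing the two curves isolates the drag from trading costs, which grows with the number of position changes.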


In [None]:
# --- Run a small backtest (guarded) ---
from analyzer.backtest.engine import run_user_backtest, BacktestError

bt_out = {}
for sym in resolved[:2]:
    try:
        res = run_user_backtest(
            symbol=sym,
            period="1y",
            interval="1d",
            strategy_name="EMA200_MACD",  # one of the strategies used in the app
            initial_cash=100000,
            slippage_bps=5,
            fee_bps=2,
        )
        bt_out[sym] = {
            "final_equity": res.get("final_equity"),
            "cagr": res.get("cagr"),
            "sharpe": res.get("sharpe"),
            "max_drawdown": res.get("max_drawdown"),
            "trades": res.get("trades"),
        }
    except BacktestError as e:
        bt_out[sym] = {"error": f"BacktestError: {e}"}
    except Exception as e:
        bt_out[sym] = {"error": str(e)}

bt_out


---

## 5) Evaluation & Analysis

### Metrics Used
The project reports several evaluation views, depending on the module:

**Technical / Strategy evaluation**
- win rate, hit rate
- Sharpe ratio
- max drawdown
- CAGR / total return
- trade count, average trade return
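The strategy metrics above follow standard definitions, sketched here from an equity curve and per-trade returns. Function names, the 252-period year, and the zero risk-free default are assumptions; the backtest engine may annualise or define these differently.

```python
import math
import statistics


def total_return(equity):
    return equity[-1] / equity[0] - 1.0


def cagr(equity, periods_per_year=252):
    """Compound annual growth rate implied by the first and last equity values."""
    years = (len(equity) - 1) / periods_per_year
    return (equity[-1] / equity[0]) ** (1.0 / years) - 1.0


def max_drawdown(equity):
    """Largest peak-to-trough decline, returned as a negative fraction."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


def sharpe(returns, risk_free_per_period=0.0, periods_per_year=252):
    """Annualised Sharpe ratio from per-period returns (0.0 if there is no variation)."""
    excess = [r - risk_free_per_period for r in returns]
    spread = statistics.stdev(excess)
    if spread == 0:
        return 0.0
    return statistics.mean(excess) / spread * math.sqrt(periods_per_year)


def win_rate(trade_returns):
    """Share of trades with a strictly positive return."""
    return sum(1 for r in trade_returns if r > 0) / len(trade_returns)


equity = [100.0, 110.0, 99.0, 120.0]
print("total return:", round(total_return(equity), 4))
print("max drawdown:", round(max_drawdown(equity), 4))
print("win rate:", win_rate([0.02, -0.01, 0.03, -0.02]))
```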

**Forecast evaluation**
- calibrated confidence (bucket-based)
- abstain rate vs precision target
- horizon-based thresholding
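Bucket-based calibration and the abstain-versus-precision trade-off can be checked with a few lines over past predictions. The record format `(confidence_pct, was_correct)`, the 10-point bucket width, and both function names are assumptions for this sketch, and the sample records are made-up toy values.

```python
from collections import defaultdict


def calibration_table(records, bucket_width=10):
    """Group (confidence_pct, was_correct) records into buckets.

    Returns {bucket_start: (count, hit_rate)}; well-calibrated output has a hit rate
    close to the bucket's confidence range.
    """
    buckets = defaultdict(list)
    for confidence_pct, was_correct in records:
        start = int(confidence_pct // bucket_width) * bucket_width
        start = min(start, 100 - bucket_width)  # put 100% into the top bucket
        buckets[start].append(bool(was_correct))
    return {b: (len(hits), sum(hits) / len(hits)) for b, hits in sorted(buckets.items())}


def abstain_and_precision(records, min_confidence_pct):
    """Abstain rate and precision when acting only at or above a confidence cut-off."""
    acted = [bool(ok) for conf, ok in records if conf >= min_confidence_pct]
    abstain_rate = 1.0 - len(acted) / len(records)
    precision = sum(acted) / len(acted) if acted else None
    return abstain_rate, precision


# Toy history, not real results.
records = [(55, False), (58, True), (62, True), (67, False),
           (71, True), (78, True), (83, True), (91, True)]
table = calibration_table(records)
abstain_rate, precision = abstain_and_precision(records, min_confidence_pct=70)
print(table)
print(abstain_rate, precision)
```

Raising the cut-off increases the abstain rate, and the table shows whether precision actually improves in return.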

**System reliability**
- cache hit rate
- fallback usage (demo mode)
- provider health diagnostics

### Sample Outputs
- Ranking table (above)
- Probability payload with `p_up`, `confidence_pct`, `expected_return`
- Optional LLM memo via Big Bull 3.0 (Ollama)

### Limitations
- Market regime shifts can break indicator relationships
- News sentiment is noisy and may lag actual price moves
- Live APIs can rate-limit, in which case the system falls back to cached (possibly stale) data
- Forecast artifacts may be absent (`models/` empty), in which case fallback heuristics are used


---

## 6) Ethical Considerations & Responsible AI

### Bias & Fairness
- The system may implicitly favor large-cap, liquid stocks due to better data coverage.
- News coverage is uneven: popular stocks get more headlines.

### Dataset Limitations
- Yahoo/API data can contain missing candles or adjusted series changes.
- Fundamentals from free sources may be delayed.

### Responsible Use
- This is a **decision-support** tool, not a guaranteed prediction.
- Outputs should be used with risk management and personal judgement.
- Always disclose that predictions are probabilistic and can be wrong.

### Safety Features Included
- Abstain thresholds (don’t force a trade)
- Confidence calibration buckets (when enabled)
- Portfolio guardrails (gross exposure, max single weight, turnover caps)


---

## 7) Conclusion & Future Scope

### Summary
We built a **unified Indian Stock Analyzer** that:
- Pulls market data via a resilient router
- Computes multi-indicator technical analysis
- Integrates fundamentals + news sentiment
- Produces explainable rankings and probability outputs
- Includes backtesting and risk guardrails
- Remains reliable using cache/demo fallbacks

### Future Enhancements
- Add official NSE/BSE corporate actions + better adjustments
- Improve sentiment with transformer-based finance NLP
- Add sector-neutral ranking and factor exposure control
- Train & bundle horizon-specific models (quantiles + classifier) in `models/`
- Automate scheduled EOD caching (faster UI, fewer API rate-limit hits)


---

## Appendix: LLM Copilot (Big Bull 3.0) — Prompt Engineering

The project includes a local LLM agent (`analyzer/agents/bigbull_agent.py`) that:
- builds a structured context bundle (`context_bus.py`)
- queries Ollama with a persona model tag
- verifies JSON output (`verifier.py`)
- optionally calibrates confidence buckets
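The JSON verification step can be pictured as parsing the model's reply and checking it against a small schema before anything reaches the UI. The schema below is an assumption: `memo` matches the key read later in this notebook, while the `confidence_pct` check and the `verify_llm_json` name are hypothetical and may not match `verifier.py`.

```python
import json

# Hypothetical schema: key -> accepted types.
REQUIRED_KEYS = {"memo": (str,), "confidence_pct": (int, float)}


def verify_llm_json(raw_text):
    """Return (payload, errors); payload is None when the reply is not a JSON object."""
    try:
        payload = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return None, ["top-level value must be an object"]

    errors = []
    for key, types in REQUIRED_KEYS.items():
        if key not in payload:
            errors.append(f"missing key: {key}")
        elif isinstance(payload[key], bool) or not isinstance(payload[key], types):
            # bool is a subclass of int in Python, so reject it explicitly.
            errors.append(f"wrong type for {key}")
    conf = payload.get("confidence_pct")
    if isinstance(conf, (int, float)) and not isinstance(conf, bool) and not 0 <= conf <= 100:
        errors.append("confidence_pct out of range")
    return payload, errors


good, good_errors = verify_llm_json('{"memo": "Range-bound; wait.", "confidence_pct": 62}')
bad, bad_errors = verify_llm_json("Sure! Here is the memo you asked for.")
partial, partial_errors = verify_llm_json('{"confidence_pct": 140}')
print(good_errors, bad_errors, partial_errors)
```

Returning a list of errors instead of raising lets the caller decide whether to retry the prompt or show a degraded answer.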

If Ollama is available locally, you can call it like this:


In [None]:
# --- Optional: Big Bull 3.0 local memo (requires Ollama running) ---
try:
    from analyzer.agents.bigbull_agent import query_bigbull
    # Note: We pass cached/demo prices to avoid live fetch.
    sym = resolved[0]
    out = query_bigbull(
        symbol=sym,
        prices=prices[sym],
        indicators=indicators[sym],
        fundamentals=fund_snap.get(sym),
        news=news.get(sym),
        extra_prompt="Give a concise trading/investing memo with risks and invalidation levels."
    )
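    # A bare expression inside a try block is not auto-displayed, so print explicitly.
    print(list(out.keys()))
    print(str(out.get("memo", ""))[:500])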
except Exception as e:
    print("LLM copilot not available in this environment:", e)
