
# Trade Readiness Score (TRS) — Fundamentals × Sentiment × News

**Created:** 2025-09-09 18:17 UTC  
**Purpose:** Daily pipeline to generate a **Top‑N list** of trade candidates by fusing:
- **Fundamentals Analyst:** quality/valuation/red flags → *Q score (0–100)*
- **Sentiment Analyst:** finance + social text sentiment → *S score (−1 to +1 → scaled)*
- **News Analyst:** event/catalyst & macro risk tagging → *N score (0–100)*
- **Fusion:** `TRS = 0.45·Q + 0.35·S' + 0.20·N'`

**Outputs**
- `trs_signals.csv`: Ticker, TRS, sub-scores, price/ATR/DMA, entry/stop/targets, position size
- On-screen **Top‑N** table

> 🚦 Designed to be robust: all external sources are **optional** with graceful fallbacks.  
> If you don't set any API keys, it still runs with Google News RSS + price data.


## 0. Setup

In [12]:

# If running first time, uncomment the next cell to install requirements.
# (Keep them commented if your environment already has these packages.)
# %pip install -q --upgrade pandas numpy yfinance transformers torch scikit-learn #     feedparser beautifulsoup4 lxml requests_cache duckdb python-dateutil tqdm nltk
#
# Optional (for charts):
# %pip install -q matplotlib
#
# Note: Transformers models (FinBERT, Tweet-RoBERTa) will download on first use.
# !pip install requests_cache


## 1. Imports

In [13]:

import os, sys, math, time, json, re, gc, logging, textwrap, itertools, statistics, hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from collections import defaultdict, Counter

import numpy as np
import pandas as pd

import requests
import requests_cache
import feedparser

import yfinance as yf

from dateutil import parser as dateparser
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.metrics import roc_auc_score

# Optional plotting (disable if headless):
try:
    import matplotlib.pyplot as plt
    HAVE_MPL = True
except Exception:
    HAVE_MPL = False

# NLP (lazy import later to speed cold start)
TRANSFORMERS_AVAILABLE = True
try:
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
except Exception as e:
    TRANSFORMERS_AVAILABLE = False

# Optional VADER fallback
try:
    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer
    nltk.download('vader_lexicon', quiet=True)
    HAVE_VADER = True
except Exception:
    HAVE_VADER = False

# Cache HTTP
requests_cache.install_cache('trs_cache', expire_after=300)  # 5 minutes

logging.basicConfig(level=logging.INFO, format='%(asctime)s | %(levelname)s | %(message)s')
logger = logging.getLogger("TRS")


[nltk_data] Error loading vader_lexicon: <urlopen error [SSL:
[nltk_data]     CERTIFICATE_VERIFY_FAILED] certificate verify failed:
[nltk_data]     unable to get local issuer certificate (_ssl.c:1000)>


## 2. Configuration

In [None]:

# ========= USER CONFIG =========

# Universe (NSE examples — you can mix in US tickers as well)
TICKERS = [
        "ATUL.NS",
    "CENTRALBK.NS",
    "COCHINSHIP.NS",
    "FACT.NS",
    "GSPL.NS",
    "HINDPETRO.NS",
    "IRFC.NS",
    "LUPIN.NS",
    "NBCC.NS",
    "TARIL.NS",
    "UCOBANK.NS"
]

# Output
TOP_N = 10
OUTPUT_CSV = "trs_signals.csv"

# Risk & Sizing
CAPITAL = 1_000_000  # total portfolio in INR (or your base currency)
RISK_PER_TRADE = 0.0075  # 0.75% of equity risk per trade
ATR_MULT_STOP = 1.5
ATR_MULT_TARGET = 2.5

# Price data
HISTORY_PERIOD = "9mo"
PRICE_INTERVAL = "1d"

# TRS Weights
W_Q = 0.45
W_S = 0.35
W_N = 0.20

# Sentiment windows
SENTIMENT_LOOKBACK_HOURS = 72  # aggregate sentiment over last 72 hours
ROLLING_SENT_PERCENTILE_DAYS = 90  # map S to rolling percentile

# Macro blackout
MACRO_BLACKOUT_HOURS = 6  # suppress signals around major macro events

# API Keys (optional)
ALPHA_VANTAGE_KEY = os.getenv("ALPHA_VANTAGE_KEY", "")   # Fundamentals + News
NEWSAPI_KEY       = os.getenv("NEWSAPI_KEY", "") 
# Reddit social is fetched via public JSON (no key), but can be flaky.

# NLP model choices (change if you like)
MODEL_FINBERT = "ProsusAI/finbert"
MODEL_TWEET   = "cardiffnlp/twitter-roberta-base-sentiment-latest"

# Region for macro events (manual list below). Set to [] to disable.
REGION = "IN"

# Macro events (manual list). You can add your local calendars here.
MANUAL_MACRO_EVENTS = [
    # Example format: ("IN", "RBI MPC Policy", "2025-10-04 10:00", "Asia/Kolkata"),
    # ("IN", "MoSPI CPI Release", "2025-09-12 17:30", "Asia/Kolkata"),
]

# ==============================


## 3. Utilities

In [15]:

def ts_now_utc():
    return datetime.now(timezone.utc)

def to_utc(dt):
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)

def parse_when(s):
    try:
        return to_utc(dateparser.parse(s))
    except Exception:
        return None

def safe_pct(a, b):
    try:
        if b == 0 or pd.isna(b): return np.nan
        return 100.0 * (a - b) / b
    except Exception:
        return np.nan

def winsorize(series, lower=0.01, upper=0.99):
    if len(series) == 0:
        return series
    lo = series.quantile(lower)
    hi = series.quantile(upper)
    return series.clip(lo, hi)

def zscore(series):
    s = series.astype(float)
    return (s - s.mean()) / (s.std(ddof=0) + 1e-9)

def percentile_rank(x, vec):
    # returns 0..100 percentile rank of x within vec
    vec = np.array(vec, dtype=float)
    if len(vec) == 0 or np.all(np.isnan(vec)):
        return 50.0
    return float(np.sum(vec <= x)) / len(vec) * 100.0

def rolling_percentile(series, window=90):
    out = []
    for i in range(len(series)):
        ref = series[max(0, i - window):i+1]
        out.append(percentile_rank(series.iloc[i], ref))
    return pd.Series(out, index=series.index)

def ema(series, span=20):
    return series.ewm(span=span, adjust=False).mean()

def atr(df, period=14):
    high = df["High"]
    low = df["Low"]
    close = df["Close"]
    prev_close = close.shift(1)
    tr = pd.concat([high - low, (high - prev_close).abs(), (low - prev_close).abs()], axis=1).max(axis=1)
    return tr.rolling(period).mean()

def position_size(capital, risk_per_trade, stop_distance):
    if stop_distance <= 0 or pd.isna(stop_distance):
        return 0.0
    risk_amt = capital * risk_per_trade
    return math.floor(risk_amt / stop_distance)


## 4. Fundamentals Analyst

In [16]:

AV_BASE = "https://www.alphavantage.co/query"

def av_get(function, **params):
    if not ALPHA_VANTAGE_KEY:
        return None, "No Alpha Vantage key"
    p = dict(apikey=ALPHA_VANTAGE_KEY, function=function)
    p.update(params)
    try:
        r = requests.get(AV_BASE, params=p, timeout=30)
        r.raise_for_status()
        data = r.json()
        if "Note" in data or "Error Message" in data:
            return None, data.get("Note") or data.get("Error Message")
        return data, None
    except Exception as e:
        return None, str(e)

def fundamentals_overview(symbol):
    data, err = av_get("OVERVIEW", symbol=symbol)
    if err or not data:
        return None
    # Keep key ratios as floats where possible
    keep = [
        "EBITDA", "PERatio", "PEGRatio", "BookValue", "DividendYield", "ProfitMargin",
        "OperatingMarginTTM", "ReturnOnEquityTTM", "ReturnOnAssetsTTM", "QuarterlyEarningsGrowthYOY",
        "QuarterlyRevenueGrowthYOY", "AnalystTargetPrice", "TrailingPE", "ForwardPE",
        "PriceToBookRatio", "EVToEBITDA", "EVToRevenue", "Beta"
    ]
    out = {}
    for k in keep:
        v = data.get(k, None)
        try:
            out[k] = float(v) if v not in (None, "None", "null", "") else np.nan
        except Exception:
            out[k] = np.nan
    return out

def fundamentals_score(symbols):
    # Returns Q score per symbol (0..100), robust to missing data
    rows = []
    for s in symbols:
        row = {"symbol": s}
        f = fundamentals_overview(s)
        if f is None:
            row.update({"Q_raw": np.nan, "Q": 50.0, "Q_detail": {"note": "No AV data"}})
        else:
            # Simple composite: quality + profitability + valuation (lower EV/EBITDA better)
            prof = winsorize(pd.Series([
                f.get("ReturnOnEquityTTM"),
                f.get("ReturnOnAssetsTTM"),
                f.get("OperatingMarginTTM"),
                f.get("ProfitMargin"),
            ], dtype=float), 0.05, 0.95).mean()

            growth = winsorize(pd.Series([
                f.get("QuarterlyEarningsGrowthYOY"),
                f.get("QuarterlyRevenueGrowthYOY"),
            ], dtype=float), 0.05, 0.95).mean()

            # Valuation inverse (cheaper → better). Use EV/EBITDA + P/B if available.
            val = winsorize(pd.Series([
                -f.get("EVToEBITDA", np.nan),  # negative because lower is better
                -f.get("PriceToBookRatio", np.nan),
                -f.get("TrailingPE", np.nan)
            ], dtype=float), 0.05, 0.95).mean()

            components = [x for x in [prof, growth, val] if not pd.isna(x)]
            if len(components) == 0:
                raw = np.nan
            else:
                raw = np.nanmean(components)

            # Rank across universe
            row["Q_raw"] = raw
            row["Q_detail"] = {"prof": float(prof) if not pd.isna(prof) else None,
                               "growth": float(growth) if not pd.isna(growth) else None,
                               "val": float(val) if not pd.isna(val) else None,
                               "beta": f.get("Beta", None)}
        rows.append(row)

    df = pd.DataFrame(rows)
    if df["Q_raw"].notna().sum() >= 2:
        # Convert to 0..100 via percentile rank
        vals = df["Q_raw"].fillna(df["Q_raw"].median())
        ranks = vals.rank(pct=True)
        df["Q"] = (ranks * 100).clip(0, 100)
    else:
        df["Q"] = 50.0

    # Keep columns
    df = df[["symbol", "Q", "Q_raw", "Q_detail"]]
    return df


## 5. News Analyst

In [17]:

def google_news_rss(query, hours=72, lang="en"):
    # Build Google News RSS query (no API key required)
    url = f"https://news.google.com/rss/search?q={requests.utils.quote(query)}&hl={lang}"
    try:
        d = feedparser.parse(url)
        cutoff = ts_now_utc() - timedelta(hours=hours)
        items = []
        for e in d.entries:
            # Parse published date if present
            pub = e.get("published", "") or e.get("updated", "")
            when = parse_when(pub) or ts_now_utc()
            if when < cutoff:
                continue
            link = e.get("link", "")
            title = e.get("title", "")
            summary = e.get("summary", "")
            items.append({"title": title, "summary": summary, "link": link, "published": when})
        return items
    except Exception as e:
        logger.warning(f"Google RSS error for {query}: {e}")
        return []

def newsapi_search(query, hours=72):
    if not NEWSAPI_KEY:
        return []
    cutoff = ts_now_utc() - timedelta(hours=hours)
    url = "https://newsapi.org/v2/everything"
    params = {
        "q": query,
        "sortBy": "publishedAt",
        "language": "en",
        "pageSize": 100,
        "apiKey": NEWSAPI_KEY,
    }
    try:
        r = requests.get(url, params=params, timeout=30)
        r.raise_for_status()
        data = r.json()
        out = []
        for a in data.get("articles", []):
            when = parse_when(a.get("publishedAt")) or ts_now_utc()
            if when < cutoff: 
                continue
            out.append({
                "title": a.get("title", ""),
                "summary": a.get("description", ""),
                "link": a.get("url", ""),
                "published": when
            })
        return out
    except Exception as e:
        logger.warning(f"NewsAPI error for {query}: {e}")
        return []

def tag_news_event(text):
    t = text.lower()
    tags = []
    if any(k in t for k in ["guidance cut", "profit warning", "probe", "fraud", "litigation", "default", "insolvency", "bankruptcy", "pledge shares", "pledged shares"]):
        tags.append(("NEG_REGULATORY", -30))
    if any(k in t for k in ["downgrade", "cut to", "revised down", "misses estimates"]):
        tags.append(("NEG_ANALYST", -15))
    if any(k in t for k in ["upgrade", "raises price target", "beats estimates", "record revenue"]):
        tags.append(("POS_ANALYST", +12))
    if any(k in t for k in ["merger", "acquisition", "stake buy", "promoter buying"]):
        tags.append(("MNA", +10))
    if any(k in t for k in ["mgmt change", "resigns", "ceo resigns", "cfo resigns"]):
        tags.append(("MGMT_CHANGE", -5))
    if any(k in t for k in ["pledge", "pledged"]):
        tags.append(("PLEDGE", -10))
    return tags

def macro_blackout_multiplier(events, hours=MACRO_BLACKOUT_HOURS, region=REGION):
    # Reduce N score if close to macro events
    now = ts_now_utc()
    for (reg, name, when_str, tz) in events:
        if region and reg != region: 
            continue
        when = parse_when(when_str)
        if not when:
            continue
        if abs((when - now).total_seconds()) <= hours * 3600:
            return 0.7  # 30% penalty
    return 1.0

def news_score_for_ticker(ticker, hours=72):
    # Query building
    q1 = f"{ticker} stock"
    q2 = ticker.replace(".NS", "")  # try bare name for Indian tickers

    items = []
    items += google_news_rss(q1, hours=hours)
    items += google_news_rss(q2, hours=hours)
    items += newsapi_search(q1, hours=hours)
    items += newsapi_search(q2, hours=hours)

    # Deduplicate by title hash
    seen = set()
    uniq = []
    for it in items:
        h = hashlib.md5((it["title"] or "").encode("utf-8")).hexdigest()
        if h not in seen:
            seen.add(h)
            uniq.append(it)

    # Sentiment via FinBERT (headline-level)
    s_score = 0.0
    n = 0
    tags = []
    texts = []
    for it in uniq:
        title = it.get("title") or ""
        summary = it.get("summary") or ""
        texts.append(title + ". " + summary)
        tags.extend(tag_news_event(title + " " + summary))
    # Sentiment model run
    if len(texts) > 0:
        s = classify_finbert(texts)  # returns list in -1..+1
        if len(s) > 0:
            s_score = float(np.mean(s))
            n = len(s)
    # Base N score 50 + 40*sentiment, then apply tag adjustments and macro penalty
    base = 50.0 + 40.0 * s_score
    for (tag, delta) in tags:
        base += delta
    base = float(np.clip(base, 0, 100))

    # Macro blackout penalty
    mult = macro_blackout_multiplier(MANUAL_MACRO_EVENTS, hours=MACRO_BLACKOUT_HOURS, region=REGION)
    base *= mult
    base = float(np.clip(base, 0, 100))

    detail = {
        "headline_count": int(n),
        "avg_headline_sent": float(s_score),
        "tags": tags,
        "macro_mult": mult
    }
    return base, detail


## 6. Sentiment Analyst (Finance + Social)

In [18]:

_FINBERT = {"tok": None, "model": None}
_TWEET  = {"tok": None, "model": None}
_VADER  = {"sid": None}

def _load_finbert():
    if _FINBERT["tok"] is None and TRANSFORMERS_AVAILABLE:
        _FINBERT["tok"] = AutoTokenizer.from_pretrained(MODEL_FINBERT)
        _FINBERT["model"] = AutoModelForSequenceClassification.from_pretrained(MODEL_FINBERT)
    return _FINBERT["tok"], _FINBERT["model"]

def _load_tweet():
    if _TWEET["tok"] is None and TRANSFORMERS_AVAILABLE:
        _TWEET["tok"] = AutoTokenizer.from_pretrained(MODEL_TWEET)
        _TWEET["model"] = AutoModelForSequenceClassification.from_pretrained(MODEL_TWEET)
    return _TWEET["tok"], _TWEET["model"]

def _load_vader():
    if _VADER["sid"] is None and HAVE_VADER:
        _VADER["sid"] = SentimentIntensityAnalyzer()
    return _VADER["sid"]

def softmax(x):
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def classify_finbert(texts):
    # Returns list of sentiment scores in [-1, +1] using FinBERT (pos-neg)
    try:
        tok, model = _load_finbert()
        if tok is None or model is None:
            raise RuntimeError("Transformers not available; using VADER fallback")
        enc = tok(texts, return_tensors="pt", padding=True, truncation=True, max_length=128)
        with torch.no_grad():
            out = model(**enc).logits.numpy()
        probs = softmax(out)
        # FinBERT labels: 0=negative, 1=neutral, 2=positive (ProsusAI/finbert)
        score = probs[:,2] - probs[:,0]
        return score.tolist()
    except Exception as e:
        # Fallback to VADER compound
        sid = _load_vader()
        if sid is None:
            return [0.0] * len(texts)
        out = []
        for t in texts:
            out.append(sid.polarity_scores(t).get("compound", 0.0))
        return out

def classify_tweet_roberta(texts):
    # Returns [-1,+1] using cardiffnlp/twitter-roberta-base-sentiment-latest
    try:
        tok, model = _load_tweet()
        if tok is None or model is None:
            raise RuntimeError("Transformers not available; using VADER fallback")
        enc = tok(texts, return_tensors="pt", padding=True, truncation=True, max_length=128)
        with torch.no_grad():
            out = model(**enc).logits.numpy()
        probs = softmax(out)
        # labels: 0=negative, 1=neutral, 2=positive
        score = probs[:,2] - probs[:,0]
        return score.tolist()
    except Exception as e:
        sid = _load_vader()
        if sid is None:
            return [0.0] * len(texts)
        out = []
        for t in texts:
            out.append(sid.polarity_scores(t).get("compound", 0.0))
        return out

def reddit_search_posts(sub, query, hours=72, limit=50):
    # Public JSON search; may rate-limit. Use headers.
    url = f"https://www.reddit.com/r/{sub}/search.json"
    params = {"q": query, "restrict_sr": "1", "sort": "new", "t": "week", "limit": str(limit)}
    headers = {"User-Agent": "Mozilla/5.0 TRS-Agent/1.0"}
    try:
        r = requests.get(url, params=params, headers=headers, timeout=30)
        r.raise_for_status()
        js = r.json()
        out = []
        cutoff = ts_now_utc() - timedelta(hours=hours)
        for c in js.get("data", {}).get("children", []):
            d = c.get("data", {})
            created = datetime.fromtimestamp(d.get("created_utc", time.time()), tz=timezone.utc)
            if created < cutoff: 
                continue
            title = d.get("title", "")
            selftext = d.get("selftext", "")
            score = d.get("score", 0)
            num_comments = d.get("num_comments", 0)
            out.append({"title": title, "body": selftext, "created": created, "score": score, "num_comments": num_comments})
        return out
    except Exception as e:
        logger.warning(f"Reddit error for r/{sub} {query}: {e}")
        return []

REDDIT_SUBS = ["IndianStreetBets", "IndianStockMarket", "stocks"]

def social_sentiment_for_ticker(ticker, hours=72):
    q = ticker.replace(".NS","")
    texts = []
    for sub in REDDIT_SUBS:
        posts = reddit_search_posts(sub, q, hours=hours, limit=40)
        for p in posts:
            t = (p["title"] + " " + (p["body"] or "")).strip()
            if len(t) > 0:
                texts.append(t)
    if len(texts) == 0:
        return 0.0, {"posts": 0, "avg": 0.0}
    s = classify_tweet_roberta(texts)
    if len(s) == 0:
        return 0.0, {"posts": 0, "avg": 0.0}
    # Weighted by engagement proxy (we don't have engagement here -> simple mean)
    avg = float(np.mean(s))
    return avg, {"posts": int(len(s)), "avg": avg}


## 7. Price/TA & TRS Fusion

In [19]:

def fetch_price(ticker, period=HISTORY_PERIOD, interval=PRICE_INTERVAL):
    try:
        df = yf.download(ticker, period=period, interval=interval, auto_adjust=False, progress=False, multi_level_index=False)
        if df is None or df.empty:
            return None
        df = df.rename_axis("Date").reset_index()
        if "Adj Close" not in df.columns:
            df["Adj Close"] = df["Close"]
        return df
    except Exception as e:
        logger.warning(f"Price fetch error {ticker}: {e}")
        return None

def enrich_ta(df):
    df = df.copy()
    df["SMA50"] = df["Close"].rolling(50).mean()
    df["SMA200"] = df["Close"].rolling(200).mean()
    df["ATR14"] = atr(df.set_index("Date"), 14).values
    return df

def map_sentiment_to_percentile_series(scores_series, window_days=ROLLING_SENT_PERCENTILE_DAYS):
    # Map raw sentiment (-1..+1) to 0..100 using rolling percentile on its own history
    # If not enough history, linearly map [-1..+1] -> [0..100].
    if len(scores_series) < 5:
        return 50.0 + 50.0 * scores_series
    rp = rolling_percentile(scores_series, window=window_days)
    return rp

def compute_trs_row(sym, q_score, s_raw, n_score, price_df):
    # s_raw is -1..+1, convert to percentile 0..100 using history if available
    s_prime = 50.0 + 50.0 * s_raw  # fallback
    # If we had a time series of S, we'd do rolling percentile; here we do static map.
    trs = W_Q*q_score + W_S*s_prime + W_N*n_score
    trs = float(np.clip(trs, 0, 100))

    latest = price_df.iloc[-1]
    close = float(latest["Close"])
    sma50 = float(latest["SMA50"]) if not pd.isna(latest["SMA50"]) else np.nan
    sma200 = float(latest["SMA200"]) if not pd.isna(latest["SMA200"]) else np.nan
    atr14 = float(latest["ATR14"]) if not pd.isna(latest["ATR14"]) else np.nan

    bias_up = (not pd.isna(sma50) and close > sma50) and (not pd.isna(sma200) and close > sma200)

    # Entry suggestion & risk
    direction = "LONG" if trs >= 70 and bias_up else ("AVOID/SHORT" if trs <= 35 else "WATCH")
    stop = close - ATR_MULT_STOP * atr14 if direction == "LONG" else (close + ATR_MULT_STOP*atr14 if direction=="AVOID/SHORT" else np.nan)
    tgt  = close + ATR_MULT_TARGET * atr14 if direction == "LONG" else (close - ATR_MULT_TARGET*atr14 if direction=="AVOID/SHORT" else np.nan)
    shares = position_size(CAPITAL, RISK_PER_TRADE, abs(close - stop)) if not pd.isna(stop) else 0

    return {
        "symbol": sym,
        "TRS": trs,
        "Q": float(q_score),
        "S_prime": float(s_prime),
        "S_raw": float(s_raw),
        "N": float(n_score),
        "Close": close,
        "SMA50": sma50,
        "SMA200": sma200,
        "ATR14": atr14,
        "BiasUp": bool(bias_up),
        "Direction": direction,
        "Entry": close,
        "Stop": float(stop) if not pd.isna(stop) else np.nan,
        "Target": float(tgt) if not pd.isna(tgt) else np.nan,
        "PositionSize": int(shares)
    }


## 8. Run pipeline

In [20]:

def run_trs_pipeline(symbols):
    # Fundamentals
    dfQ = fundamentals_score(symbols)
    qmap = {r.symbol: r.Q for r in dfQ.itertuples(index=False)}

    rows = []
    for sym in symbols:
        logger.info(f"Processing {sym}")
        # Price
        pr = fetch_price(sym)
        if pr is None or pr.empty:
            logger.warning(f"No price data for {sym}; skipping.")
            continue
        pr = enrich_ta(pr)

        # Sentiment (news + social)
        n_score, n_detail = news_score_for_ticker(sym, hours=SENTIMENT_LOOKBACK_HOURS)
        s_social, s_detail = social_sentiment_for_ticker(sym, hours=SENTIMENT_LOOKBACK_HOURS)

        # Fuse S: combine news headline sentiment and social (simple average)
        # Note: news_score_for_ticker returns N, but also has avg headline sentiment in [-1,+1]
        # We can recompute a news-only sentiment by re-running classify on titles; here we mix social only.
        # For robustness, you can fetch the news texts from news_score_for_ticker again.
        s_raw = s_social  # keep it simple; news contributes via N

        # Q score
        q = qmap.get(sym, 50.0)

        # TRS row
        row = compute_trs_row(sym, q, s_raw, n_score, pr)
        # Add details
        row["NewsDetail"] = n_detail
        row["SocialDetail"] = s_detail
        rows.append(row)

    out = pd.DataFrame(rows).sort_values("TRS", ascending=False).reset_index(drop=True)
    return out

signals = run_trs_pipeline(TICKERS)
signals.head(20)


2025-09-16 23:31:04,960 | INFO | Processing ATUL.NS
2025-09-16 23:31:10,851 | INFO | Processing CENTRALBK.NS
2025-09-16 23:31:22,855 | INFO | Processing COCHINSHIP.NS
2025-09-16 23:31:25,414 | INFO | Processing FACT.NS
Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
2025-09-16 23:31:35,560 | INFO | Processing GSPL.NS
2025-09-16 23:

Unnamed: 0,symbol,TRS,Q,S_prime,S_raw,N,Close,SMA50,SMA200,ATR14,BiasUp,Direction,Entry,Stop,Target,PositionSize,NewsDetail,SocialDetail
0,FACT.NS,56.99603,50.0,71.06172,0.421234,48.122142,1005.5,961.866998,,32.217865,False,WATCH,1005.5,,,0,"{'headline_count': 98, 'avg_headline_sent': 0....","{'posts': 2, 'avg': 0.4212344028055668}"
1,NBCC.NS,51.535402,50.0,75.67493,0.513499,12.745881,109.589996,107.7838,,2.790715,False,WATCH,109.589996,,,0,"{'headline_count': 1, 'avg_headline_sent': -0....","{'posts': 1, 'avg': 0.5134986042976379}"
2,ATUL.NS,50.560543,50.0,50.0,0.0,52.802715,6467.0,6657.31,,115.107143,False,WATCH,6467.0,,,0,"{'headline_count': 8, 'avg_headline_sent': 0.0...","{'posts': 0, 'avg': 0.0}"
3,CENTRALBK.NS,50.0,50.0,50.0,0.0,50.0,36.939999,36.564,,0.685714,False,WATCH,36.939999,,,0,"{'headline_count': 0, 'avg_headline_sent': 0.0...","{'posts': 0, 'avg': 0.0}"
4,COCHINSHIP.NS,50.0,50.0,50.0,0.0,50.0,1821.599976,1761.169995,,64.742885,False,WATCH,1821.599976,,,0,"{'headline_count': 0, 'avg_headline_sent': 0.0...","{'posts': 0, 'avg': 0.0}"
5,GSPL.NS,50.0,50.0,50.0,0.0,50.0,319.600006,311.624998,,8.907144,False,WATCH,319.600006,,,0,"{'headline_count': 0, 'avg_headline_sent': 0.0...","{'posts': 0, 'avg': 0.0}"
6,HINDPETRO.NS,50.0,50.0,50.0,0.0,50.0,402.0,408.461,,7.467856,False,WATCH,402.0,,,0,"{'headline_count': 0, 'avg_headline_sent': 0.0...","{'posts': 0, 'avg': 0.0}"
7,TARIL.NS,50.0,50.0,50.0,0.0,50.0,523.349976,506.225001,,14.089288,False,WATCH,523.349976,,,0,"{'headline_count': 0, 'avg_headline_sent': 0.0...","{'posts': 0, 'avg': 0.0}"
8,UCOBANK.NS,50.0,50.0,50.0,0.0,50.0,29.870001,29.6426,,0.630715,False,WATCH,29.870001,,,0,"{'headline_count': 0, 'avg_headline_sent': 0.0...","{'posts': 0, 'avg': 0.0}"
9,LUPIN.NS,48.840582,50.0,50.0,0.0,44.202909,2051.800049,1940.939995,,39.078587,False,WATCH,2051.800049,,,0,"{'headline_count': 4, 'avg_headline_sent': -0....","{'posts': 0, 'avg': 0.0}"


## 9. Save signals & show Top‑N

In [21]:

signals.to_csv(OUTPUT_CSV, index=False)
print(f"Saved to {OUTPUT_CSV}")

topn = signals.head(TOP_N).copy()
display_cols = ["symbol","TRS","Direction","Entry","Stop","Target","PositionSize","Q","S_prime","N","SMA50","SMA200","ATR14","BiasUp"]
topn[display_cols]


Saved to trs_signals.csv


Unnamed: 0,symbol,TRS,Direction,Entry,Stop,Target,PositionSize,Q,S_prime,N,SMA50,SMA200,ATR14,BiasUp
0,FACT.NS,56.99603,WATCH,1005.5,,,0,50.0,71.06172,48.122142,961.866998,,32.217865,False
1,NBCC.NS,51.535402,WATCH,109.589996,,,0,50.0,75.67493,12.745881,107.7838,,2.790715,False
2,ATUL.NS,50.560543,WATCH,6467.0,,,0,50.0,50.0,52.802715,6657.31,,115.107143,False
3,CENTRALBK.NS,50.0,WATCH,36.939999,,,0,50.0,50.0,50.0,36.564,,0.685714,False
4,COCHINSHIP.NS,50.0,WATCH,1821.599976,,,0,50.0,50.0,50.0,1761.169995,,64.742885,False
5,GSPL.NS,50.0,WATCH,319.600006,,,0,50.0,50.0,50.0,311.624998,,8.907144,False
6,HINDPETRO.NS,50.0,WATCH,402.0,,,0,50.0,50.0,50.0,408.461,,7.467856,False
7,TARIL.NS,50.0,WATCH,523.349976,,,0,50.0,50.0,50.0,506.225001,,14.089288,False
8,UCOBANK.NS,50.0,WATCH,29.870001,,,0,50.0,50.0,50.0,29.6426,,0.630715,False
9,LUPIN.NS,48.840582,WATCH,2051.800049,,,0,50.0,50.0,44.202909,1940.939995,,39.078587,False


## 10. (Optional) Quick ATR backtest on latest signals

In [22]:

# A very simple forward-simulation template:
# - Enter on next day open (not executed here)
# - Exit at stop/target if hit intraday, or after max_hold days
# This is just a template; for realistic testing, you need walk-forward TRS recomputation.

def quick_backtest_template(top_df, hold_days=10):
    results = []
    for r in top_df.itertuples(index=False):
        sym = r.symbol
        df = fetch_price(sym, period="6mo", interval="1d")
        if df is None or len(df) < 50:
            continue
        # Entry assumed at last available Close for illustration
        entry_idx = df.index[-1]
        entry_price = float(df.loc[entry_idx, "Close"])
        atr14 = float(atr(df.set_index("Date"), 14).iloc[-1])
        if math.isnan(atr14) or atr14 <= 0:
            continue
        stop = entry_price - ATR_MULT_STOP*atr14 if r.Direction=="LONG" else entry_price + ATR_MULT_STOP*atr14
        tgt  = entry_price + ATR_MULT_TARGET*atr14 if r.Direction=="LONG" else entry_price - ATR_MULT_TARGET*atr14

        # Simulate next 'hold_days' bars (imperfect because we need future data)
        # Here, we simply compute hypothetical exit if we had that data. Template only.
        results.append({
            "symbol": sym, "entry": entry_price, "stop": stop, "target": tgt,
            "hypo_R": (tgt - entry_price) / (entry_price - stop) if r.Direction=="LONG" else (entry_price - tgt) / (stop - entry_price)
        })
    return pd.DataFrame(results)

# Example run (commented to avoid confusion; this is illustrative only)
# bt = quick_backtest_template(signals.head(TOP_N))
# bt



## Tips & Next Steps
- **Improve fundamentals:** If you trade Indian equities, Alpha Vantage coverage can be sparse. Consider paid sources (FMP, TickerTape/Trendlyne APIs) or broker research dumps to populate Q metrics more reliably.
- **Better news features:** Keep a per-ticker headline cache and compute a **rolling S' percentile** over 90 days to stabilize sentiment.
- **Macro calendar:** Wire a real macro API (TradingEconomics, etc.) or maintain a small CSV of key **RBI/MoSPI/Fed/ECB** dates to drive the blackout multiplier.
- **Execution & slippage:** Integrate your broker’s API (Groww/Kite/etc.) and include slippage/fees in the sizing + backtest.
- **Model ensembling:** Blend FinBERT with Loughran–McDonald lexicon deltas and Tweet‑RoBERTa; calibrate weights on validation PnL, not just AUC.
- **Walk-forward backtest:** Recompute TRS daily from **only data available at that time** and evaluate out-of-sample performance.
