# News-Sentiment DMA Screener - README

A single-file Python tool that:

* Fetches India equity news via **Google News RSS** using **dynamic company names** loaded from **NSE index CSVs** (Nifty 50 / Next 50 / Bank / 500 / Midcap 100 / Smallcap 100).
* Filters to an **allowlist of publishers** (Moneycontrol, Economic Times, Mint/LiveMint, Business Standard, CNBC TV18) with a safe **fallback** if none match.
* Extracts **full article text** (trafilatura → readability-lxml → newspaper3k), with AMP/canonical URL cleanup.
* Runs **FinBERT** sentiment on **headlines** and **articles**.
* Applies **recency weighting** (≤24h = 1.5×, 24–48h = 1.2×).
* Outputs a per-ticker **bias**: `LONG` / `SHORT` / `NEUTRAL`.

---

## 1) What you get

For each ticker, the script prints a table:

| column           | meaning                                                     |
| ---------------- | ----------------------------------------------------------- |
| `symbol`         | NSE ticker with `.NS` suffix (e.g., `LUPIN.NS`)             |
| `headline_avg`   | Recency-weighted sentiment from headlines only (−1..+1)     |
| `article_avg`    | Average sentiment from extracted article bodies / summaries |
| `combined_score` | Final score (prefers article if available; else headline)   |
| `bias`           | `LONG` if > +0.05, `SHORT` if < −0.05, else `NEUTRAL`       |
| `n_headlines`    | Number of headlines considered                              |
| `n_articles`     | Number of articles whose text was extracted / used          |

> Tip: If `n_articles` is 0, the script still uses headline sentiment (and, if enabled, RSS summary fallback).

---

## 2) Requirements

* Python **3.10+**
* Packages:

  * Always: `requests`, `feedparser`, `pandas`, `transformers`
  * Optional (recommended for better extraction):
    `trafilatura`, `readability-lxml`, `lxml`, `newspaper3k`
* Model: `ProsusAI/finbert` (downloaded automatically by 🤗 Transformers)
* macOS/Apple Silicon: MPS is fine (transformers prints `Device set to use mps:0`)

Install:

```bash
pip install requests feedparser pandas transformers
# optional but recommended:
pip install trafilatura readability-lxml lxml newspaper3k
```

> If you use a GPU/Metal, Transformers will auto-choose the device. No config needed.

---

## 3) How it works

1. **Dynamic company names**
   The script warms up an NSE session, downloads multiple index CSVs, and builds `NAME_MAP = {SYMBOL: "Company Name"}`.
   Example: `"LUPIN" → "Lupin"`, `"CENTRALBK" → "Central Bank of India"`.

2. **News fetching (Google News RSS)**
   For each ticker, it queries with:

   * `"Company Name" stock india`
   * `Company Name shares`
   * `SYMBOL stock india`

   It keeps **allowlisted publishers** if present; otherwise it **returns all** to avoid empty results.

3. **Text extraction**
   For each link, it:

   * Canonicalizes/cleans the URL (removes AMP and tracking where safe).
   * Tries `trafilatura` → `readability` → `newspaper3k` in order.
   * Uses article text if ≥120 chars; else falls back to RSS summary (if available).

4. **Sentiment & weighting**

   * Headline sentiment: recency-weighted (≤24h: 1.5×, 24–48h: 1.2×, else 1.0×).
   * Article sentiment: average of chunked body text (512-token budget heuristic).
   * Combined: `0.7 * article_avg + 0.3 * headline_avg` *if* any articles were read; otherwise `headline_avg`.

5. **Bias rule**

   * `combined_score > +0.05` → **LONG**
   * `combined_score < −0.05` → **SHORT**
   * otherwise **NEUTRAL**

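The weighting, blending, and bias rules in steps 4 and 5 can be sketched as below. This is a minimal illustration, not the script's actual code: `recency_weight()` and `SENTIMENT_NEUTRAL_BAND` are names this README mentions, while `weighted_headline_avg`, `combine_scores`, and `bias_from_score` are hypothetical helpers. It also assumes that the headline average is normalized by the total weight and that an age of exactly 48 hours still falls in the 24–48h bucket.

```python
from typing import Optional, Sequence, Tuple

SENTIMENT_NEUTRAL_BAND = 0.05  # default stated in this README


def recency_weight(age_hours: float) -> float:
    """Weight by age: <=24h -> 1.5, 24-48h -> 1.2, older -> 1.0."""
    if age_hours <= 24:
        return 1.5
    if age_hours <= 48:  # assumption: the 48h boundary is inclusive
        return 1.2
    return 1.0


def weighted_headline_avg(scored: Sequence[Tuple[float, float]]) -> float:
    """Recency-weighted mean of (score, age_hours) pairs; 0.0 if empty."""
    if not scored:
        return 0.0
    total_weight = sum(recency_weight(age) for _, age in scored)
    weighted_sum = sum(score * recency_weight(age) for score, age in scored)
    return weighted_sum / total_weight  # assumption: normalize by total weight


def combine_scores(headline_avg: float, article_avg: Optional[float]) -> float:
    """Blend 0.7 * article + 0.3 * headline; headline only if no articles."""
    if article_avg is None:
        return headline_avg
    return 0.7 * article_avg + 0.3 * headline_avg


def bias_from_score(score: float, band: float = SENTIMENT_NEUTRAL_BAND) -> str:
    """Map a combined score to LONG / SHORT / NEUTRAL using the neutral band."""
    if score > band:
        return "LONG"
    if score < -band:
        return "SHORT"
    return "NEUTRAL"


if __name__ == "__main__":
    # Made-up (score, age_hours) pairs, for illustration only.
    headlines = [(0.6, 5.0), (-0.2, 30.0), (0.1, 72.0)]
    h_avg = weighted_headline_avg(headlines)
    print(bias_from_score(combine_scores(h_avg, article_avg=0.3)))
```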
---

## 4) Running it

Edit the `tickers` list at the bottom and run:

```bash
python news_sentiment_dma.py
```

Example:

```python
if __name__ == "__main__":
    tickers = [
        "CENTRALBK.NS",
        "LUPIN.NS",
        "UCOBANK.NS",
    ]
    sentiment_df = build_sentiment_table(tickers)
    print(sentiment_df)
```

> Note: In Python, each item in the list needs a comma. A missing comma will concatenate adjacent strings.

---

## 5) Configuration knobs

* **Allowlist domains**: update `ALLOWLIST` to tighten/loosen publisher filtering.
* **Recency weighting**: tweak `recency_weight()` thresholds/weights.
* **Neutral band**: adjust `SENTIMENT_NEUTRAL_BAND` (default 0.05).
* **Extraction threshold**: change `MIN_ARTICLE_CHARS` (default 200). Note that the final gate uses 120 characters, so the two values currently differ.
* **Max items**: `MAX_HEADLINES`, `MAX_ARTICLES_PER_TICKER`.
* **Token budget**: `MAX_TOKENS_PER_ARTICLE` (rough 4 chars/token heuristic).
* **Dynamic names**: extend/override `DEFAULT_NSE_INDEX_URLS` or add entries to `EXTRA_NAME_MAP`.

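To make the token-budget knob concrete, here is a rough sketch of chunking by the 4 chars/token heuristic. The helper `chunk_text`, its `max_tokens_per_chunk` parameter, and the value 512 (borrowed from the 512-token budget mentioned earlier) are assumptions for illustration. This README does not say whether the script applies the budget per chunk or per article; the sketch assumes per chunk.

```python
from typing import List

CHARS_PER_TOKEN = 4  # rough heuristic named in this README


def chunk_text(text: str, max_tokens_per_chunk: int = 512) -> List[str]:
    """Split text into chunks of at most max_tokens_per_chunk * 4 characters.

    Breaks on whitespace so words stay intact; a single word longer than
    the budget is hard-split.
    """
    max_chars = max_tokens_per_chunk * CHARS_PER_TOKEN
    chunks: List[str] = []
    current = ""
    for word in text.split():
        # Hard-split any word that cannot fit in one chunk on its own.
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) > max_chars:
            # Adding this word would exceed the budget: close the chunk.
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    body = "Quarterly profit rose while margins narrowed. " * 200
    pieces = chunk_text(body)
    print(len(pieces), max(len(p) for p in pieces))
```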
---

## 6) Troubleshooting

* **All zeros / no headlines**

  * Your query might be too strict or network is blocked. Try printing raw rows:

    ```python
    # Inspect the raw rows for one ticker before any sentiment scoring.
    raw_df = fetch_news_for_ticker("LUPIN.NS")
    print(raw_df[["title", "link", "allowlisted"]])
    ```
  * If allowlist filters out everything, the script **falls back** to returning all publishers.

* **`n_articles = 0`**

  * Many finance sites are AMP/JS/paywalled; extraction can fail.
  * Lower thresholds (`MIN_ARTICLE_CHARS`), ensure optional libs are installed, and rely on RSS **summary fallback** (already enabled).

* **NSE CSV errors**

  * NSE can be finicky without cookies. The script warms up a session; re-run if a CSV fails transiently.
  * You can limit to fewer CSV URLs if needed.

* **Model errors**

  * If Transformers downloads stall, try `pip install -U transformers` and ensure internet access.

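The allowlist fallback mentioned under "All zeros / no headlines" can be sketched as follows. The domain list, `is_allowlisted`, and `filter_with_fallback` are assumptions for illustration, not the script's own definitions. The sketch also assumes each row's `link` is the publisher's own URL; Google News RSS links may instead be redirect URLs, in which case the feed's source field would need to be checked.

```python
from typing import Dict, List, Sequence
from urllib.parse import urlparse

# Assumed domains for the publishers named in this README.
ALLOWLIST = (
    "moneycontrol.com",
    "economictimes.indiatimes.com",
    "livemint.com",
    "business-standard.com",
    "cnbctv18.com",
)


def is_allowlisted(url: str, allowlist: Sequence[str] = ALLOWLIST) -> bool:
    """True if the URL's host is an allowlisted domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowlist)


def filter_with_fallback(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep allowlisted rows; if none match, return every row unchanged."""
    kept = [row for row in rows if is_allowlisted(row["link"])]
    return kept if kept else list(rows)


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    rows = [
        {"title": "A", "link": "https://www.moneycontrol.com/news/a"},
        {"title": "B", "link": "https://example.com/b"},
    ]
    print([row["title"] for row in filter_with_fallback(rows)])
```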
---

## 7) Extending it

* **Combine with DMA/RSI screener**
  Use `combined_score`/`bias` as a **news gate**: only consider longs where both **technicals** (DMA/RSI) and **news** are bullish.

* **Add Bing News RSS fallback**
  You can implement a second fetcher to merge Bing RSS results if Google News is sparse.

* **Recency within articles**
  Weight article paragraphs by detected timestamps or TF-IDF to emphasize fresh info.

* **Caching**
  Cache `NAME_MAP` (JSON) and news results to speed up repeated runs.

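As one possible shape for the news gate, the sketch below keeps only symbols that are bullish on both technicals and news. The function `news_gated_longs` and the `tech_bullish` mapping are hypothetical; how the technical signal is computed is left to your DMA/RSI screener. The rows are assumed to look like the output of `sentiment_df.to_dict("records")`.

```python
from typing import Dict, Iterable, List


def news_gated_longs(
    sentiment_rows: Iterable[Dict[str, object]],
    tech_bullish: Dict[str, bool],
) -> List[str]:
    """Return symbols whose news bias is LONG and whose technicals are bullish.

    Symbols missing from tech_bullish are treated as not bullish.
    """
    return [
        str(row["symbol"])
        for row in sentiment_rows
        if row["bias"] == "LONG" and tech_bullish.get(str(row["symbol"]), False)
    ]


if __name__ == "__main__":
    # Made-up values, for illustration only.
    rows = [
        {"symbol": "LUPIN.NS", "combined_score": 0.21, "bias": "LONG"},
        {"symbol": "UCOBANK.NS", "combined_score": 0.09, "bias": "LONG"},
        {"symbol": "CENTRALBK.NS", "combined_score": -0.12, "bias": "SHORT"},
    ]
    technicals = {"LUPIN.NS": True, "UCOBANK.NS": False, "CENTRALBK.NS": True}
    print(news_gated_longs(rows, technicals))
```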
---

## 8) Notes & disclaimers

* This is **for research/education**. It's not investment advice. Backtest before live trading.
* Respect publishers' **robots/terms**; avoid aggressive scraping.
* Sentiment models can misread sarcasm, corporate wording, or headlines that invert sentiment (e.g., "loss narrows"). Use as one input among many.

---

## 9) Quick reference (key functions)

* `load_name_map_from_nse()` → builds `{SYMBOL: Company}` dynamically
* `get_company_name(ticker)` → returns the company name from `NAME_MAP` for a ticker such as `"LUPIN.NS"`
* `fetch_news_for_ticker(ticker)` → DataFrame of news rows for that ticker
* `analyze_ticker_news(df_news, ticker)` → dict with sentiment & bias for 1 ticker
* `build_sentiment_table(tickers)` → final table across tickers

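For orientation, the name-map step behind the first two functions might look roughly like this. It is a sketch under assumptions, not the script's code: `parse_index_csv` and `lookup_company_name` are hypothetical stand-ins, the CSV is assumed to have `Symbol` and `Company Name` columns, and falling back to the bare symbol for unknown tickers is a design choice of this sketch. Downloading the CSV (with the warmed-up NSE session) is left out.

```python
import csv
import io
from typing import Dict


def parse_index_csv(csv_text: str) -> Dict[str, str]:
    """Build {SYMBOL: "Company Name"} from the text of one index CSV."""
    name_map: Dict[str, str] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Column names are assumed; adjust if the CSV header differs.
        symbol = (row.get("Symbol") or "").strip().upper()
        name = (row.get("Company Name") or "").strip()
        if symbol and name:
            name_map[symbol] = name
    return name_map


def lookup_company_name(ticker: str, name_map: Dict[str, str]) -> str:
    """Strip the .NS suffix and look up the name; fall back to the symbol."""
    symbol = ticker.upper().removesuffix(".NS")
    return name_map.get(symbol, symbol)


if __name__ == "__main__":
    # Tiny in-memory CSV, for illustration only.
    sample = (
        "Company Name,Industry,Symbol\n"
        "Lupin,Healthcare,LUPIN\n"
        "Central Bank of India,Financial Services,CENTRALBK\n"
    )
    name_map = parse_index_csv(sample)
    print(lookup_company_name("LUPIN.NS", name_map))
```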
