## finance_news.txt

The London Stock Exchange Group (LSEG) sold its stake in Euroclear for €455 million as part of its strategy to focus on innovation. LSEG shares remained steady and have risen so far this year, reflecting investor confidence in its strategic moves.

US stock index futures showed mixed trading ahead of the Federal Reserve's policy decision, with Dow futures down 0.02% and Nasdaq up 0.79%. Investors expect rates to remain steady, despite pressure from President Trump, while Fed officials advocate a cautious approach amid economic challenges.

Larsen & Toubro announced a 4% drop in its October to December quarter of the fiscal year ending 2025-26. The company's net profit dropped due to the impact of the central government-imposed Labour Code norms. Here's all you need to know about the company results.

Gland Pharma reported a 27.74% increase in Q3 net profit to ₹261.48 crore, up from ₹204.69 crore last year. Revenue rose 22.49% YoY to ₹1695.36 crore, with EBITDA increasing 21% YoY to ₹434.9 crore.

Elcid Investments reported a net profit of ₹47.37 crore for the December quarter, recovering from a loss of ₹7 crore last year. Revenue rose to ₹61.37 crore, up from a loss of ₹5.25 crore, despite a 65% drop from its record high.

Over the past year, fixed deposit rates across banks have fallen by 1.00%-1.50%. Against this backdrop of declining yields, the Floating Rate Savings Bond, 2020 (Taxable) is offering an interest rate of 8.05% per annum—raising the question: can it serve as a viable alternative to fixed deposits?

The Indian stock market has sharply corrected from its September 2024 peak, with Kotak Securities noting a lack of value despite declines. Key factors influencing foreign investor outflows include stretched valuations, weak earnings, and economic concerns.

Shares in India’s Adani Group plunged after U.S. prosecutors unsealed an indictment alleging securities and wire fraud. The group’s stock was hit hard as markets reacted to the legal news.

SBI Life reported a 22% YoY jump in net premium income to ₹30,245 crore. One-time premium receipts grew 24%, while renewal premiums from existing policies increased by nearly 21%, underscoring strong customer retention.

Assets under management (AUM) rose 16%, increasing from ₹4,416.8 billion as of December 31, 2024, to ₹5,117.1 billion as of December 31, 2025, with a debt–equity allocation of 59:41. Nearly 95% of the debt portfolio is invested in AAA-rated and sovereign instruments, the company said.
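The growth figures in these items follow the usual percentage-change formula, (current − previous) / previous × 100. Here is a small sketch that checks two of the quoted figures: the Gland Pharma net profit (₹ crore) and the AUM values (₹ billion). `yoy_change` is a hypothetical helper written for this check.

```python
def yoy_change(previous, current):
    """Return the percentage change from `previous` to `current`."""
    return (current - previous) / previous * 100


# Gland Pharma Q3 net profit, ₹ crore: 204.69 last year, 261.48 this year
print(f"Gland Pharma net profit: {yoy_change(204.69, 261.48):.2f}%")
# AUM, ₹ billion: 4,416.8 (Dec 31, 2024) to 5,117.1 (Dec 31, 2025)
print(f"AUM: {yoy_change(4416.8, 5117.1):.2f}%")
```

This gives 27.74% and 15.86%, which agrees with the reported 27.74% and the rounded 16%.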


In [None]:
import re
import yfinance as yf
from transformers import pipeline

In [None]:
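def get_ticker_symbol(company_name):
    # Skip single characters and the Fed, which has no stock ticker
    if len(company_name) <= 1 or company_name.lower() == 'fed':
        return "N/A"
    try:
        # Ask Yahoo Finance for the single best-matching quote
        search = yf.Search(company_name, max_results=1)
        return search.quotes[0]['symbol'] if search.quotes else "N/A"
    except Exception:
        # Network or parsing errors: treat the name as unresolved
        return "N/A"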

def clean_org_name(name):
    # Remove WordPiece continuation markers left in the entity text
    name = name.replace("##", "")
    # Filter out single-character fragments such as 'E'
    if len(name) <= 1:
        return None
    return name
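`get_ticker_symbol` sends a network request for every name it receives, so a company mentioned on several lines gets searched again each time. Below is a minimal sketch of memoising the lookup with `functools.lru_cache`. `make_cached_lookup` is a hypothetical helper and is not part of yfinance.

```python
from functools import lru_cache


def make_cached_lookup(lookup_fn, maxsize=None):
    """Wrap a name -> ticker function so each distinct name is looked up once.

    `lookup_fn` is any callable taking a company name, e.g. get_ticker_symbol.
    Results (including "N/A") are kept for the life of the returned function.
    """
    @lru_cache(maxsize=maxsize)
    def cached(name):
        return lookup_fn(name)

    return cached
```

Inside `process_finance_news`, you could create `cached_ticker = make_cached_lookup(get_ticker_symbol)` once and call it in place of `get_ticker_symbol(org)`. An "N/A" caused by a transient network error is cached as well. Calling `cached_ticker.cache_clear()` resets the cache.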

In [None]:
ner_pipe = pipeline("ner", model="dbmdz/bert-large-cased-finetuned-conll03-english", aggregation_strategy="simple")

def process_finance_news(file_path):
    final_report = {}

    with open(file_path, 'r', encoding='utf-8') as f:
        for line in f:
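            # Drop a leading "[...] " prefix (e.g. a source tag) if the line contains ']'
            clean_text = ' '.join(line.split('] ')[1:]) if ']' in line else line
            # Skip blank lines instead of sending them to the model
            if not clean_text.strip(): continue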
            entities = ner_pipe(clean_text)

            # Collect cleaned ORG names found in the current line (duplicates are kept)
            orgs = []
            for e in entities:
                if e['entity_group'] == 'ORG':
                    name = clean_org_name(e['word'])
                    if name: orgs.append(name)

            # If no valid orgs found, move to next line
            if not orgs: continue

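            # Capture metrics. Amounts must start with a currency symbol; a figure
            # with thousands separators such as ₹30,245 is cut off at the comma.
            val_pattern = r'([€₹$]\s*\d+\.?\d*\s*(?:crore|million|billion)?|\d+\.?\d*%)'
            metrics = {"Net Profit": "N/A", "EBITDA": "N/A", "Change": "N/A"}

            profit_match = re.search(fr'net profit (?:of|to|increased to)?\s*{val_pattern}', clean_text, re.I)
            if profit_match: metrics["Net Profit"] = profit_match.group(1)

            # Caution: the optional \d*%? can backtrack into a growth rate, so
            # "EBITDA increasing 21% YoY to ₹434.9 crore" yields "1%", not the amount
            ebitda_match = re.search(fr'EBITDA (?:increasing|at)?\s*\d*%?\s*(?:to|at)?\s*{val_pattern}', clean_text, re.I)
            if ebitda_match: metrics["EBITDA"] = ebitda_match.group(1)

            # Only when no net profit was found: take the first amount or percentage on the line
            if metrics["Net Profit"] == "N/A":
                change_match = re.search(val_pattern, clean_text)
                if change_match: metrics["Change"] = change_match.group(1)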

            # Every ORG on the line receives the same metrics. An existing entry
            # for a ticker is overwritten only if it has no net profit yet.
            for org in orgs:
                ticker = get_ticker_symbol(org)
                if ticker != "N/A":
                    if ticker not in final_report or final_report[ticker]['Net Profit'] == "N/A":
                        final_report[ticker] = {"Org": org, **metrics}

    return final_report

Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Device set to use cuda:0


In [None]:
results = process_finance_news('/content/finance_news.txt')

You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset
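The warning above appears because `ner_pipe` runs once per line. Hugging Face pipelines also accept a list of strings along with a `batch_size` argument, so the GPU can process several lines in one forward pass. Below is a sketch that assumes `ner_fn` behaves like `ner_pipe`. `ner_in_batches` is a hypothetical helper, and it omits the `"] "` prefix stripping done in `process_finance_news`.

```python
def ner_in_batches(ner_fn, lines, batch_size=8):
    """Run NER over all non-blank lines in a single batched call.

    `ner_fn` is assumed to behave like a transformers pipeline: given a list
    of strings (and batch_size), it returns one list of entity dicts per string.
    Returns (text, entities) pairs in input order.
    """
    texts = [line.strip() for line in lines if line.strip()]
    if not texts:
        return []
    results = ner_fn(texts, batch_size=batch_size)
    return list(zip(texts, results))
```

Passing `ner_pipe` and the list of lines read from `finance_news.txt` returns `(text, entities)` pairs for the rest of the loop to consume. With a file this small, the speed-up is likely negligible.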


In [None]:
results = process_finance_news('finance_news.txt')
print(f"{'Ticker':<10} | {'Organization':<28} | {'Net Profit':<15} | {'EBITDA':<12} | {'Change'}")
print("-" * 89)
for ticker, data in results.items():
    print(f"{ticker:<10} | {data['Org']:<28} | {data['Net Profit']:<15} | {data['EBITDA']:<12} | {data['Change']}")

Ticker     | Organization                 | Net Profit      | EBITDA       | Change
-------------------------------------------------------------------------------------
LS4C.DE    | London Stock Exchange Group  | N/A             | N/A          | €455 million
LSEG.L     | LSEG                         | N/A             | N/A          | €455 million
EFR-USD    | Federal Reserve              | N/A             | N/A          | 0.02%
LT.NS      | Larsen & Toubro              | N/A             | N/A          | 4%
GLAND.BO   | Gland Pharma                 | ₹261.48 crore   | 1%           | N/A
TDADX      | TDA                          | ₹261.48 crore   | 1%           | N/A
ELCIDIN.BO | Elcid Investments            | ₹47.37 crore    | N/A          | N/A
SBILIFE.BO | SBI Life                     | N/A             | N/A          | 22%


## Observations

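The pipeline picked up most of the organisations, but several that the news file mentions are missing from the report:
- Dow Jones (mentioned only as "Dow futures")
- Nasdaq
- Kotak Securities
- Adani Group
- Euroclear

The notebook does not log whether the NER model missed these names or whether they were dropped because the Yahoo Finance search returned no ticker.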

The news file refers to LSEG by both its full name and its abbreviation. The model tagged both mentions as ORG. The pipeline has no step that merges aliases, so it looked up each string separately, and the Yahoo Finance search returned a different regional listing for each: LSEG.L (London) and LS4C.DE (Frankfurt). As a result, the same company appears twice in the report.
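One way to stop the duplicate is to map every known spelling to one canonical name before the ticker lookup. Below is a minimal sketch that assumes a hand-maintained alias table. `ALIASES`, `canonical_org` and `dedupe_orgs` are hypothetical names, and the table entries are illustrative only.

```python
# Hypothetical alias table: every known spelling of a company maps to one
# canonical name. These entries are illustrative only.
ALIASES = {
    "london stock exchange group": "LSEG",
    "lseg": "LSEG",
}


def canonical_org(name):
    """Return the canonical name for `name`, or the stripped name if unknown."""
    cleaned = name.strip()
    return ALIASES.get(cleaned.lower(), cleaned)


def dedupe_orgs(orgs):
    """Map aliases to canonical names and drop repeats, keeping first-seen order."""
    return list(dict.fromkeys(canonical_org(org) for org in orgs))


print(dedupe_orgs(["London Stock Exchange Group", "LSEG", "Euroclear"]))
# ['LSEG', 'Euroclear']
```

If you apply this to `orgs` before the ticker loop, only one name per company reaches the search, so only one LSEG listing would be reported.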

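"TDA" is probably the `TDA` suffix of `EBITDA`, which the model wrongly tagged as an ORG. Because every ORG found on a line receives that line's metrics, "TDA" inherited Gland Pharma's figures, and the Yahoo Finance search matched it to an unrelated symbol, TDADX.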
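With `aggregation_strategy="simple"`, each entity returned by the pipeline carries `start` and `end` character offsets. That makes a fragment such as `TDA` detectable: the character just before it (`I`) is a letter. Below is a sketch of such a filter. The function names are hypothetical, and the `min_score` threshold of 0.5 is an arbitrary assumption.

```python
def is_whole_word(entity, text):
    """True if the entity's character span starts and ends on word boundaries."""
    start, end = entity["start"], entity["end"]
    starts_clean = start == 0 or not text[start - 1].isalnum()
    ends_clean = end >= len(text) or not text[end].isalnum()
    return starts_clean and ends_clean


def valid_orgs(entities, text, min_score=0.5):
    """Keep ORG entities that are not sub-word fragments and meet min_score."""
    return [
        e["word"]
        for e in entities
        if e["entity_group"] == "ORG"
        and e["score"] >= min_score
        and is_whole_word(e, text)
    ]


# Hand-built entities standing in for real pipeline output (scores are made up)
text = "Gland Pharma reported EBITDA of ₹434.9 crore"
tda_start = text.index("TDA")
fake_entities = [
    {"entity_group": "ORG", "score": 0.99, "word": "Gland Pharma", "start": 0, "end": 12},
    {"entity_group": "ORG", "score": 0.90, "word": "TDA", "start": tda_start, "end": tda_start + 3},
]
print(valid_orgs(fake_entities, text))  # ['Gland Pharma']
```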

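The model tagged the Federal Reserve as an ORG, which is accurate, but the Federal Reserve is the US central bank and has no tradable stock. The `'fed'` check in `get_ticker_symbol` only catches the exact name "Fed", so the Yahoo Finance search still returned a symbol, EFR-USD. Every ORG on a line shares that line's metrics, so the Federal Reserve was assigned 0.02%. That figure is actually the move in Dow futures, the first value the `Change` regex finds on the line. "Dow" itself does not appear in the report.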
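Instead of the single `'fed'` comparison, you could keep an exclusion set of organisations that appear in market news but are not listed companies. Below is a minimal sketch. `NON_LISTED` and `should_look_up` are hypothetical, and the set would need to be maintained by hand.

```python
# Hypothetical exclusion list: organisations that appear in market news but
# are not listed companies, so they should never reach the ticker search.
NON_LISTED = {"fed", "the fed", "federal reserve", "us federal reserve"}


def should_look_up(name):
    """Return False for single characters and known non-listed organisations."""
    key = name.strip().lower()
    return len(key) > 1 and key not in NON_LISTED


print([n for n in ["Federal Reserve", "Fed", "SBI Life"] if should_look_up(n)])
# ['SBI Life']
```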

Gland Pharma's `Change` metric was not extracted, because the code only looks for a change value when no net profit was found on the line.
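The EBITDA pattern has two weaknesses. On the Gland Pharma line, its optional `\d*%?` backtracks into "21%", so only "1%" is captured instead of ₹434.9 crore. Separately, `val_pattern` stops at the first comma, so an amount such as ₹30,245 crore would be read as "₹30". Below is a sketch of an alternative that requires a currency amount within a fixed window after a label and allows thousands separators. `MONEY`, `find_amount` and the 60-character window are assumptions, not part of the original notebook.

```python
import re

# Currency amount: symbol, digits with optional thousands separators
# (1,234,567 or the Indian 1,23,456 style), optional decimals and unit.
MONEY = r"[€₹$]\s*\d+(?:,\d{2,3})*(?:\.\d+)?(?:\s*(?:crore|million|billion))?"


def find_amount(label, text, window=60):
    """Return the first currency amount within `window` characters after `label`.

    Requiring a currency symbol stops a growth rate such as "21%" from being
    read as the EBITDA figure. Returns None if no amount is close enough.
    """
    pattern = rf"{re.escape(label)}\b.{{0,{window}}}?({MONEY})"
    match = re.search(pattern, text, re.IGNORECASE)
    return match.group(1) if match else None


line = "Revenue rose 22.49% YoY to ₹1695.36 crore, with EBITDA increasing 21% YoY to ₹434.9 crore."
print(find_amount("EBITDA", line))  # ₹434.9 crore
```

`find_amount("net profit", clean_text)` and `find_amount("EBITDA", clean_text)` could stand in for the two `re.search` calls in `process_finance_news`. The percentage-based `Change` fallback would still need its own pattern.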