#### Saturday, February 14, 2026

This all runs in one pass.

# 05 - Local LLM Financial Analysis on Your RTX 4090

Your 4090 with 24GB VRAM can run powerful language models **locally** -- no API costs,
no rate limits, no data leaving your machine. This notebook shows how to use them
for financial document analysis.

---

## What You'll Build

1. **Load a Local LLM** - Run Phi-3 or Mistral entirely on your GPU
2. **Earnings Call Analysis** - Summarize and extract key info from transcripts
3. **SEC Filing Parser** - Pull actionable data from 10-K/10-Q filings
4. **Financial Q&A** - Ask natural language questions about company data
5. **News Digest** - Summarize batches of headlines into a morning brief
6. **Risk Factor Extraction** - Identify key risks from filings
7. **Model Comparison** - Benchmark different LLMs for financial tasks
8. **Structured Output** - Get JSON-formatted analysis for pipeline integration
9. **Combined Intelligence** - Merge LLM analysis with sentiment + technicals + Chronos

---

## Models That Fit on Your 4090

| Model | Params | VRAM (fp16) | VRAM (4-bit) | Quality |
|-------|--------|-------------|-------------|--------|
| **microsoft/Phi-3-mini-4k-instruct** | 3.8B | ~7.5 GB | ~2.5 GB | Great for its size |
| **microsoft/Phi-3.5-mini-instruct** | 3.8B | ~7.5 GB | ~2.5 GB | Improved Phi-3 |
| **meta-llama/Llama-3.2-3B-Instruct** | 3B | ~6 GB | ~2 GB | Excellent reasoning |
| **mistralai/Mistral-7B-Instruct-v0.3** | 7B | ~14 GB | ~4.5 GB | Strong all-around |
| **meta-llama/Llama-3.1-8B-Instruct** | 8B | ~16 GB | ~5 GB | Very capable |
| **mistralai/Mistral-Nemo-Instruct-2407** | 12B | N/A (~24 GB, no headroom left) | ~7.5 GB | Powerful 4-bit |

We'll primarily use **Phi-3-mini** for speed and **Mistral-7B** (4-bit quantized) for quality.
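
The VRAM columns above follow roughly from parameter count times bytes per weight. Here is a minimal sketch of that arithmetic, assuming weights only. The helper name `estimate_weight_gb` is illustrative and not part of any library. Real usage also includes quantization metadata, activations, and the KV cache, which is why the table's 4-bit figures sit above the raw estimate.

```python
def estimate_weight_gb(params_billions, bits_per_weight):
    """Weights-only memory estimate in decimal GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Parameter counts taken from the table above.
MODELS = {
    "Phi-3-mini": 3.8,
    "Llama-3.2-3B": 3.0,
    "Mistral-7B": 7.0,
    "Llama-3.1-8B": 8.0,
    "Mistral-Nemo": 12.0,
}

for name, params in MODELS.items():
    fp16 = estimate_weight_gb(params, 16)
    int4 = estimate_weight_gb(params, 4)
    print(f"{name:<14} fp16 ~{fp16:4.1f} GB   4-bit ~{int4:4.1f} GB (before overhead)")
```

Note that `torch.cuda.memory_allocated(0) / 1024**3` reports GiB rather than decimal GB, so a measured figure will read slightly lower than the estimate for the same weights.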

---
## 1. Setup & Model Loading

In [1]:
import torch
import time
import json
import re
import warnings
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    pipeline,
)

warnings.filterwarnings('ignore')
plt.style.use('dark_background')

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Device: {device}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    total_vram = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"VRAM: {total_vram:.1f} GB")

Device: cuda
GPU: NVIDIA GeForce RTX 4090
VRAM: 23.5 GB


In [2]:
# Load Phi-3 Mini -- fast, capable, fits easily on the 4090
# First run downloads ~7.5GB. Cached at ~/.cache/huggingface/ after that.

MODEL_ID = "microsoft/Phi-3-mini-4k-instruct"

print(f"Loading {MODEL_ID}...")
print("(First run downloads the model. Subsequent runs use cache.)\n")

t0 = time.time()
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    dtype=torch.float16,
    device_map="cuda",
)
load_time = time.time() - t0

mem_used = torch.cuda.memory_allocated(0) / 1024**3
print(f"Model loaded in {load_time:.1f}s")
print(f"GPU memory: {mem_used:.1f} GB / {total_vram:.1f} GB ({mem_used/total_vram*100:.0f}%)")
print(f"Remaining VRAM: {total_vram - mem_used:.1f} GB (plenty for inference)")

Loading microsoft/Phi-3-mini-4k-instruct...
(First run downloads the model. Subsequent runs use cache.)



Model loaded in 5.4s
GPU memory: 7.1 GB / 23.5 GB (30%)
Remaining VRAM: 16.4 GB (plenty for inference)


In [3]:
def ask_llm(prompt, system_prompt=None, max_new_tokens=512, temperature=0.3):
    """
    Send a prompt to the local LLM and get a response.
    
    Parameters:
        prompt: The user question/instruction
        system_prompt: Optional system-level instruction
        max_new_tokens: Maximum length of response
        temperature: 0.0 = deterministic, 1.0 = creative
    
    Returns:
        (response, stats): the model's text response and a dict with
        'time' (seconds), 'tokens' (generated count), and 'tok_per_sec'
    """
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": prompt})
    
    # Apply chat template
    input_text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(input_text, return_tensors="pt").to(device)
    
    t0 = time.time()
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=temperature,
            do_sample=temperature > 0,
            top_p=0.9,
            pad_token_id=tokenizer.eos_token_id,
        )
    elapsed = time.time() - t0
    
    # Decode only the new tokens (skip the input)
    new_tokens = outputs[0][inputs['input_ids'].shape[1]:]
    response = tokenizer.decode(new_tokens, skip_special_tokens=True)
    
    tokens_generated = len(new_tokens)
    tokens_per_sec = tokens_generated / elapsed if elapsed > 0 else 0
    
    return response, {'time': elapsed, 'tokens': tokens_generated, 'tok_per_sec': tokens_per_sec}


# Quick test
response, stats = ask_llm("What are the three most important financial ratios for evaluating a stock? Be concise.")
print(f"Response ({stats['tokens']} tokens in {stats['time']:.1f}s, {stats['tok_per_sec']:.0f} tok/s):\n")
print(response)

Response (222 tokens in 4.9s, 45 tok/s):

The three most important financial ratios for evaluating a stock are:

1. Price-to-Earnings (P/E) Ratio: This ratio compares the current market price of a stock to its earnings per share (EPS). A lower P/E ratio may indicate that the stock is undervalued, while a higher P/E ratio may suggest overvaluation.

2. Debt-to-Equity (D/E) Ratio: This ratio measures a company's financial leverage by comparing its total debt to its shareholders' equity. A lower D/E ratio indicates that the company is using less debt to finance its operations, which may be a sign of financial stability.

3. Return on Equity (ROE): This ratio measures a company's profitability by comparing its net income to its shareholders' equity. A higher ROE indicates that the company is generating more profit per dollar of equity, which may be a sign of efficient management.


---
## 2. Earnings Call Analysis

Earnings calls are goldmines of information. The CEO and CFO discuss:
- Revenue, margins, and guidance
- Strategic direction and new products
- Risks and challenges
- Management tone (confident vs cautious)

A local LLM can process these transcripts in seconds.
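
Full transcripts are usually far longer than the excerpt used here and can exceed the 4k-token context window of Phi-3-mini-4k. One option is to split the text into chunks and analyze each one separately. The sketch below packs blank-line-separated paragraphs into chunks under a word budget. The function name `chunk_by_paragraph` and the `max_words=1500` default are assumptions for illustration, and word count is only a rough proxy for token count.

```python
def chunk_by_paragraph(text, max_words=1500):
    """Pack blank-line-separated paragraphs into chunks of at most max_words words.

    A single paragraph longer than max_words becomes its own oversized chunk.
    """
    chunks, current, current_words = [], [], 0
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        n_words = len(para.split())
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and current_words + n_words > max_words:
            chunks.append("\n\n".join(current))
            current, current_words = [], 0
        current.append(para)
        current_words += n_words
    if current:
        chunks.append("\n\n".join(current))
    return chunks


# Ten synthetic paragraphs of 52 words each, packed under a 200-word budget.
sample = "\n\n".join(f"Paragraph {i}: " + "word " * 50 for i in range(10))
parts = chunk_by_paragraph(sample, max_words=200)
print(f"{len(parts)} chunks, word counts: {[len(p.split()) for p in parts]}")
```

Each chunk could then be summarized on its own, with the partial summaries merged in a final prompt.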

In [4]:
# Simulated earnings call excerpt (in practice, use APIs like Financial Modeling Prep,
# Seeking Alpha, or scrape from SEC EDGAR)

EARNINGS_TRANSCRIPT = """
NVIDIA Corporation Q4 FY2024 Earnings Call Excerpt

Jensen Huang, CEO:
Thank you. Q4 was an extraordinary quarter. Revenue was $22.1 billion, up 265% year-over-year 
and up 22% sequentially. Data center revenue was $18.4 billion, up 409% year-over-year. 
The demand for accelerated computing and generative AI has driven a significant step-up in 
investment by cloud service providers, large enterprises, and sovereign AI infrastructure.

Our Hopper architecture continues to see incredible adoption. H100 demand remains extremely 
strong, and we're ramping production of our next-generation Blackwell platform. We expect 
Blackwell to generate significant revenue in the second half of fiscal 2025.

Gaming revenue was $2.9 billion, up 56% year-over-year, driven by GeForce RTX 40 series GPUs. 
Our RTX technology is becoming the standard for PC gaming and content creation.

Colette Kress, CFO:
Gross margin for the quarter was 76%, reflecting strong pricing in data center products. 
We expect gross margins to remain in the mid-70s percent range for the next several quarters. 
Operating expenses were $3.2 billion, up 26% year-over-year as we invest in research and 
development for our AI platform.

For Q1 FY2025, we expect revenue of approximately $24 billion, plus or minus 2%. We continue 
to see strong demand across all our data center products. The pipeline for Blackwell is 
already several billion dollars.

Supply remains tight relative to demand. We are working closely with our manufacturing partners 
to increase production capacity. We expect supply constraints to gradually ease through the 
second half of fiscal 2025.

Q&A Highlights:
- Analyst: What is the competitive landscape for AI accelerators?
  Jensen: We have a significant moat through our CUDA software ecosystem. Over 4 million 
  developers use CUDA. Our competitors would need to replicate not just the hardware, but 
  the entire software stack that has been built over 15 years.

- Analyst: What about China revenue impact from export controls?
  Colette: China data center revenue was significant in prior years. The export controls 
  have reduced our addressable market in China. However, demand from other regions has more 
  than offset this impact. We are developing compliant products for the China market.

- Analyst: How should we think about capital allocation?
  Colette: We returned $2.8 billion to shareholders through buybacks and dividends this 
  quarter. We plan to continue our balanced approach to capital allocation, investing in 
  growth while returning capital to shareholders.
"""

print(f"Transcript length: {len(EARNINGS_TRANSCRIPT)} characters")
print(f"Approximately {len(EARNINGS_TRANSCRIPT.split())} words")

Transcript length: 2591 characters
Approximately 386 words


In [5]:
# Analysis 1: Executive Summary
SYSTEM_PROMPT = """You are a financial analyst assistant. Analyze earnings call transcripts 
and provide clear, actionable insights for day traders. Be concise and focus on information 
that impacts stock price."""

prompt = f"""Analyze this earnings call transcript and provide:
1. A 2-3 sentence executive summary
2. Key numbers (revenue, margins, guidance)
3. Bull case (reasons the stock could go up)
4. Bear case (reasons the stock could go down)
5. Overall sentiment: BULLISH, BEARISH, or NEUTRAL

Transcript:
{EARNINGS_TRANSCRIPT}"""

print("Analyzing earnings call...\n")
response, stats = ask_llm(prompt, system_prompt=SYSTEM_PROMPT, max_new_tokens=600)
print(f"[{stats['time']:.1f}s, {stats['tok_per_sec']:.0f} tok/s]\n")
print(response)

Analyzing earnings call...

[5.6s, 50 tok/s]

Executive Summary:
NVIDIA's Q4 FY2024 earnings show a robust 265% year-over-year revenue increase to $22.1 billion, with data center revenue surging by 409%. The company's Hopper architecture and RTX technology continue to drive strong demand.

Key Numbers:
- Revenue: $22.1 billion (265% YoY increase)
- Data Center Revenue: $18.4 billion (409% YoY increase)
- Gross Margin: 76%
- Operating Expenses: $3.2 billion (26% YoY increase)
- Q1 FY2025 Revenue Guidance: ~$24 billion

Bull Case:
The stock could rise due to the strong demand for NVIDIA's AI and data center products, the expected revenue from the Blackwell platform, and the company's solid gross margins.

Bear Case:
Potential downside could come from supply constraints easing later in fiscal 2025, which may affect future revenue growth.

Overall Sentiment: BULLISH
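
Small models occasionally misstate figures, so it may be worth checking that every number in a summary also appears in the source text. This is a minimal sketch using plain regex matching. `find_unsupported_numbers` is a hypothetical helper, and it will not catch paraphrased figures (for example "mid-70s") or numbers that are correct but attached to the wrong metric.

```python
import re

# Matches integers and decimals such as 265 or 22.1.
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def find_unsupported_numbers(summary, source):
    """Return numbers that appear in summary but nowhere in source."""
    source_numbers = set(NUMBER_RE.findall(source))
    unsupported = []
    for num in NUMBER_RE.findall(summary):
        if num not in source_numbers and num not in unsupported:
            unsupported.append(num)
    return unsupported


source = "Revenue was $22.1 billion, up 265% year-over-year. Gross margin was 76%."
summary = "Revenue: $22.1 billion (265% YoY). Gross margin: 78%."
print(find_unsupported_numbers(summary, source))  # prints ['78']
```

List numbering in a model reply ("1.", "2.") will also be picked up as numbers, so flagged values are candidates for review, not confirmed errors.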




In [6]:
# Analysis 2: Extract key metrics as structured data
prompt_metrics = f"""Extract the following financial metrics from this earnings call transcript.
Return ONLY a JSON object with these fields (use null if not mentioned):

- quarterly_revenue
- revenue_yoy_growth_pct
- gross_margin_pct
- data_center_revenue
- gaming_revenue
- next_quarter_guidance
- operating_expenses
- shareholder_returns
- key_product_mentions (list of strings)
- management_tone (one of: very_confident, confident, cautious, concerned)

Transcript:
{EARNINGS_TRANSCRIPT}"""

print("Extracting structured metrics...\n")
response_json, stats = ask_llm(prompt_metrics, max_new_tokens=400, temperature=0.1)
print(f"[{stats['time']:.1f}s]\n")
print(response_json)

# Try to parse the JSON
try:
    # Find the first JSON object in the response (handles one level of nesting)
    json_match = re.search(r'\{[^{}]*(?:\{[^{}]*\}[^{}]*)*\}', response_json, re.DOTALL)
    if json_match is None:
        raise ValueError("no JSON object found in response")
    metrics = json.loads(json_match.group())
    print("\n--- Parsed Metrics ---")
    for k, v in metrics.items():
        print(f"  {k}: {v}")
except ValueError as e:  # json.JSONDecodeError is a subclass of ValueError
    print(f"\nNote: Could not auto-parse JSON ({e}). This is common with smaller models.")
    print("Larger models (Mistral-7B, Llama-8B) are more reliable at structured output.")

Extracting structured metrics...

[4.4s]

```json
{
  "quarterly_revenue": 22.1,
  "revenue_yoy_growth_pct": 265,
  "gross_margin_pct": 76,
  "data_center_revenue": 18.4,
  "gaming_revenue": 2.9,
  "next_quarter_guidance": 24,
  "operating_expenses": 3.2,
  "shareholder_returns": 2.8,
  "key_product_mentions": [
    "Hopper architecture",
    "H100",
    "Blackwell platform",
    "GeForce RTX 40 series GPUs",
    "CUDA software ecosystem"
  ],
  "management_tone": "very_confident"
}
```

--- Parsed Metrics ---
  quarterly_revenue: 22.1
  revenue_yoy_growth_pct: 265
  gross_margin_pct: 76
  data_center_revenue: 18.4
  gaming_revenue: 2.9
  next_quarter_guidance: 24
  operating_expenses: 3.2
  shareholder_returns: 2.8
  key_product_mentions: ['Hopper architecture', 'H100', 'Blackwell platform', 'GeForce RTX 40 series GPUs', 'CUDA software ecosystem']
  management_tone: very_confident
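
The regex above handles only one level of nesting. An alternative sketch uses `json.JSONDecoder.raw_decode` from the standard library, which parses a JSON value starting at a given index and ignores any trailing text such as a closing Markdown fence. The helper name `extract_first_json` is an assumption for illustration.

```python
import json


def extract_first_json(text):
    """Return the first decodable JSON object found in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass  # Not valid JSON from this brace; try the next one.
        start = text.find("{", start + 1)
    return None


# Simulate a reply wrapped in a Markdown code fence, as in the output above.
fence = "`" * 3
reply = "Here you go:\n" + fence + 'json\n{"revenue": 22.1, "products": {"gpu": ["H100"]}}\n' + fence
print(extract_first_json(reply))
```

Because the decoder does the parsing, this copes with arbitrarily nested objects and lists, and returns `None` when no valid object is present.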


In [7]:
# Analysis 3: Management tone and sentiment cues
prompt_tone = f"""Analyze the management tone in this earnings call. Focus on:

1. Confidence level: Are executives confident or hedging?
2. Forward guidance language: Strong commitments vs vague promises?
3. Risk acknowledgment: Are they transparent about challenges?
4. Key phrases that signal bullish or bearish intent
5. Compare CEO tone vs CFO tone

Quote specific phrases from the transcript to support your analysis.

Transcript:
{EARNINGS_TRANSCRIPT}"""

print("Analyzing management tone...\n")
response_tone, stats = ask_llm(prompt_tone, system_prompt=SYSTEM_PROMPT, max_new_tokens=500)
print(f"[{stats['time']:.1f}s]\n")
print(response_tone)

Analyzing management tone...

[9.9s]

1. Confidence Level:
   - Jensen Huang's tone is highly confident, as evidenced by phrases like "Q4 was an extraordinary quarter" and "demand for accelerated computing and generative AI has driven a significant step-up in investment."
   - Colette Kress also conveys confidence but with a focus on financial details, stating "gross margin for the quarter was 76%" and "we expect gross margins to remain in the mid-70s percent range."

2. Forward Guidance Language:
   - Jensen Huang provides strong commitments with "We expect Blackwell to generate significant revenue in the second half of fiscal 2025."
   - Colette Kress offers a more cautious approach, with "For Q1 FY2025, we expect revenue of approximately $24 billion, plus or minus 2%."

3. Risk Acknowledgment:
   - Colette Kress acknowledges supply constraints, "Supply remains tight relative to demand," and mentions working to increase production capacity.
   - Jensen Huang addresses the competitive

---
## 3. SEC Filing Analysis

SEC filings (10-K annual, 10-Q quarterly, 8-K events) contain critical information
that moves stock prices. They're also extremely long and dense -- perfect for LLM analysis.

In [8]:
# Simulated SEC 10-K Risk Factors excerpt
# In production, use the SEC EDGAR API: https://efts.sec.gov/LATEST/search-index?q=...

SEC_RISK_FACTORS = """
RISK FACTORS (Excerpt from NVIDIA 10-K Filing)

Risks Related to Our Business and Industry

Our operating results have in the past fluctuated and may in the future fluctuate, and if 
our operating results are below the expectations of securities analysts or investors, our 
stock price could decline.

We derive a significant portion of our revenue from a limited number of customers and various 
different end markets. Revenue concentration in a small number of customers or different end 
markets may cause significant fluctuations in our results. Sales to our top customers 
represented approximately 45% of total revenue. Loss of a major customer or a significant 
reduction in purchases by any one of them could materially adversely affect our results.

The semiconductor industry is highly competitive. We face competition from companies such as 
AMD, Intel, and various other chip designers and foundries. Some competitors have greater 
financial resources, more established customer relationships, and broader product portfolios. 
Additionally, major cloud service providers are developing their own AI accelerator chips, 
which could reduce their reliance on our products.

Export controls and trade restrictions, particularly those related to China and other 
countries, have adversely affected and could in the future adversely affect our business. 
The U.S. government has implemented export controls on advanced AI chips to certain countries. 
China represented approximately 20-25% of our data center revenue in prior periods, and these 
restrictions have materially reduced our revenue from China.

Our products are complex and may contain defects or may be subject to security vulnerabilities 
that could harm our reputation and adversely affect our business. Product defects or security 
vulnerabilities could result in significant warranty or other costs, damage our reputation, 
and lead to loss of customers.

We depend on third-party foundries, primarily TSMC, for the manufacture of our products. 
Any disruption at TSMC, whether from natural disaster, geopolitical tension regarding Taiwan, 
or capacity constraints, could significantly impact our ability to meet demand and adversely 
affect our business.

The AI market is rapidly evolving, and our success depends on our ability to anticipate 
customer needs and develop appropriate solutions. If the adoption of AI technologies is 
slower than expected, or if alternative approaches to AI computing emerge that do not favor 
our platform, our growth could be materially impacted.
"""

print(f"SEC filing excerpt: {len(SEC_RISK_FACTORS.split())} words")

SEC filing excerpt: 374 words


In [9]:
# Analyze risk factors
prompt_risks = f"""Analyze these SEC filing risk factors for NVIDIA. For each risk, rate its:
- Severity (High/Medium/Low)
- Likelihood in the next 12 months (High/Medium/Low)
- Potential stock price impact

Then provide:
1. The TOP 3 risks a day trader should monitor
2. What news events would trigger these risks
3. How a trader should position if each risk materializes

Risk Factors:
{SEC_RISK_FACTORS}"""

print("Analyzing SEC risk factors...\n")
response_risks, stats = ask_llm(prompt_risks, system_prompt=SYSTEM_PROMPT, max_new_tokens=700)
print(f"[{stats['time']:.1f}s]\n")
print(response_risks)

Analyzing SEC risk factors...

[9.4s]

Based on the NVIDIA 10-K filing risk factors, here are the top 3 risks a day trader should monitor, potential news events that could trigger these risks, and suggested trading positions if each risk materializes:

1. Risk: Revenue concentration in a small number of customers
   - Severity: High
   - Likelihood in the next 12 months: Medium
   - Potential stock price impact: High

   News events that could trigger this risk: Announcement of a major customer reducing or terminating its contract with NVIDIA.

   Trading position: If this risk materializes, consider shorting NVIDIA stock as the company's revenue and earnings could be significantly impacted.

2. Risk: Competition from other chip designers and foundries
   - Severity: Medium
   - Likelihood in the next 12 months: High
   - Potential stock price impact: Medium

   News events that could trigger this risk: Announcement of a major competitor launching a new product that directly competes w

In [10]:
# Extract competitive landscape insights
prompt_competitive = f"""Based on this SEC filing excerpt, analyze NVIDIA's competitive position:

1. What are their competitive advantages (moats)?
2. Who are the biggest competitive threats?
3. What would cause their competitive position to weaken?
4. Rate their competitive position: STRONG, MODERATE, or WEAK

Keep it concise and actionable for a stock trader.

Filing excerpt:
{SEC_RISK_FACTORS}"""

print("Analyzing competitive landscape...\n")
response_comp, stats = ask_llm(prompt_competitive, system_prompt=SYSTEM_PROMPT, max_new_tokens=400)
print(f"[{stats['time']:.1f}s]\n")
print(response_comp)

Analyzing competitive landscape...

[7.5s]

1. Competitive Advantages (Moats):
   - Strong brand recognition and reputation in the AI and gaming markets.
   - Continuous innovation and development of advanced GPUs and AI chips.
   - Strategic partnerships and collaborations with major cloud service providers.

2. Biggest Competitive Threats:
   - Competitors with greater financial resources and established customer relationships (e.g., AMD, Intel).
   - Emerging competition from major cloud service providers developing their own AI accelerator chips.
   - Export controls and trade restrictions impacting revenue from China.

3. Factors Weakening Competitive Position:
   - Loss of a major customer or significant reduction in purchases by existing customers.
   - Disruption at TSMC, impacting manufacturing capacity and ability to meet demand.
   - Slower adoption of AI technologies or emergence of alternative approaches that do not favor NVIDIA's platform.

4. Competitive Position Rating:

---
## 4. Financial Q&A

Ask natural language questions about any financial document.
This is like having a research analyst on demand.

In [11]:
class FinancialQA:
    """
    Interactive Q&A system over financial documents.
    Each question is answered independently against the full document;
    earlier questions and answers are not carried forward.
    """
    
    def __init__(self, document, document_type="financial document"):
        self.document = document
        self.document_type = document_type
        self.system_prompt = f"""You are a financial analyst. You have been given a 
{document_type} to analyze. Answer questions based ONLY on information in the document. 
If the answer is not in the document, say so. Be concise and precise."""
    
    def ask(self, question, max_new_tokens=300):
        prompt = f"""Document:
{self.document}

Question: {question}"""
        response, stats = ask_llm(
            prompt, 
            system_prompt=self.system_prompt,
            max_new_tokens=max_new_tokens,
            temperature=0.1,
        )
        return response, stats


# Create a Q&A system for the earnings call
qa = FinancialQA(EARNINGS_TRANSCRIPT, "earnings call transcript")

questions = [
    "What was the quarterly revenue and how does it compare to last year?",
    "What is the revenue guidance for next quarter?",
    "What did management say about competition from AMD?",
    "How much did they return to shareholders?",
    "What is the biggest risk mentioned in this call?",
]

print("=== Financial Q&A Session ===\n")
for q in questions:
    print(f"Q: {q}")
    answer, stats = qa.ask(q)
    print(f"A: {answer}")
    print(f"   [{stats['time']:.1f}s]\n")

=== Financial Q&A Session ===

Q: What was the quarterly revenue and how does it compare to last year?
A: The quarterly revenue for NVIDIA Corporation in Q4 FY2024 was $22.1 billion, which is up 265% year-over-year and up 22% sequentially.
   [1.0s]

Q: What is the revenue guidance for next quarter?
A: The document does not provide specific revenue guidance for the next quarter.
   [0.3s]

Q: What did management say about competition from AMD?
A: The document does not provide any specific information about competition from AMD.
   [0.3s]

Q: How much did they return to shareholders?
A: They returned $2.8 billion to shareholders through buybacks and dividends this quarter.
   [0.5s]

Q: What is the biggest risk mentioned in this call?
A: The biggest risk mentioned in this call is the tight supply relative to demand, which the company expects to gradually ease through the second half of fiscal 2025.
   [0.7s]
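
The `ask` method sends each question on its own, so a follow-up such as "how does that compare to guidance?" has nothing to refer back to. One way to support follow-ups is to replay earlier turns in the message list. The sketch below covers only the message construction. `build_messages` is a hypothetical helper, and it assumes the model's chat template accepts alternating user and assistant turns after a system message. Feeding the result to the model would need a variant of `ask_llm` that takes a message list, which this notebook does not define.

```python
def build_messages(system_prompt, document, history, question):
    """Build a chat message list that replays earlier Q&A turns.

    history is a list of (question, answer) tuples from previous turns.
    The document is attached only to the first user message to save tokens.
    """
    messages = [{"role": "system", "content": system_prompt}]
    turns = list(history) + [(question, None)]
    for i, (q, a) in enumerate(turns):
        content = f"Document:\n{document}\n\nQuestion: {q}" if i == 0 else f"Question: {q}"
        messages.append({"role": "user", "content": content})
        if a is not None:
            messages.append({"role": "assistant", "content": a})
    return messages


history = [("What was the quarterly revenue?", "Revenue was $22.1 billion.")]
msgs = build_messages(
    "You are a financial analyst.",
    "<transcript text>",
    history,
    "How does that compare to guidance?",
)
for m in msgs:
    print(f"{m['role']:>9}: {m['content'][:60]!r}")
```

The context grows with every turn, so with a 4k-token model the history would likely need trimming after a few exchanges.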



---
## 5. News Digest -- Morning Brief Generator

Feed the LLM a batch of headlines and get a synthesized morning brief.
This is what a junior analyst does every morning -- now your GPU does it in seconds.

In [12]:
# Simulated morning news headlines
MORNING_HEADLINES = [
    "NVIDIA Q4 revenue of $22.1B crushes estimates of $20.4B",
    "NVIDIA guides Q1 revenue to $24B, above consensus of $22.2B",
    "Fed minutes show officials divided on rate cut timing",
    "China tech stocks rally on stimulus hopes",
    "Tesla Cybertruck production hits 1,000 units per week",
    "Apple reportedly in talks to bring Gemini AI to iPhone",
    "S&P 500 futures up 0.8% on strong NVIDIA earnings",
    "AMD launches MI300X AI chip, claims performance lead over H100",
    "Bitcoin surges past $52,000 amid ETF inflows",
    "US jobless claims fall to 194,000, below 200,000 estimate",
    "Oil prices rise 2% on Middle East supply concerns",
    "Meta announces new AI model that matches GPT-4 performance",
    "10-year Treasury yield falls to 4.25% after weak housing data",
    "Microsoft Copilot adoption grows to 40% of Fortune 500",
    "Retail sales rise 0.6%, beating expectations of 0.3%",
]

# Generate a morning brief
headlines_text = "\n".join(f"- {h}" for h in MORNING_HEADLINES)

prompt_brief = f"""You are a senior market strategist preparing a morning brief for day traders.
Based on these headlines, write a concise morning brief that covers:

1. MARKET OUTLOOK: Overall tone for today's session (bullish/bearish/mixed)
2. KEY CATALYST: The #1 story moving markets today
3. SECTOR FOCUS: Which sectors to watch and why
4. RISK EVENTS: What could go wrong today
5. TRADE IDEAS: 2-3 specific stocks to watch with directional bias

Headlines:
{headlines_text}"""

print("Generating morning brief...\n")
brief, stats = ask_llm(prompt_brief, max_new_tokens=600, temperature=0.3)
print(f"{'='*60}")
print(f"  MORNING BRIEF -- Generated in {stats['time']:.1f}s")
print(f"{'='*60}\n")
print(brief)

Generating morning brief...

  MORNING BRIEF -- Generated in 7.4s

**Morning Brief for Day Traders**

1. **MARKET OUTLOOK**: Bullish
   - The market sentiment appears positive today, with strong earnings from NVIDIA and positive economic data.

2. **KEY CATALYST**: NVIDIA's Q4 revenue and guidance
   - NVIDIA's Q4 revenue of $22.1B and Q1 guidance of $24B are driving the market today, as they surpassed estimates.

3. **SECTOR FOCUS**: Tech and Financial
   - Tech, especially NVIDIA and AMD, are leading due to strong earnings and product launches. Financial sector watchers should monitor the impact of the Fed minutes on rate cut timing.

4. **RISK EVENTS**: Fed rate cut uncertainty and Bitcoin volatility
   - The divided stance on rate cuts among Fed officials could introduce volatility. Bitcoin's surge and ETF inflows also present potential risks.

5. **TRADE IDEAS**:
   - **NVIDIA (NVDA)**: Bullish, given strong earnings and guidance.
   - **AMD (AMD)**: Bullish, due to the launch of 

In [13]:
# Classify headlines by sector and impact
prompt_classify = f"""Classify each headline by:
- Sector: Tech, Finance, Energy, Macro, Crypto, Consumer
- Impact: HIGH, MEDIUM, LOW
- Direction: BULLISH, BEARISH, NEUTRAL
- Relevant tickers (if any)

Return one line per headline in this exact format:
HEADLINE | SECTOR | IMPACT | DIRECTION | TICKERS

Headlines:
{headlines_text}"""

print("Classifying headlines...\n")
classified, stats = ask_llm(prompt_classify, max_new_tokens=600, temperature=0.1)
print(f"[{stats['time']:.1f}s]\n")
print(classified)

Classifying headlines...

[10.1s]

NVIDIA Q4 revenue of $22.1B crushes estimates of $20.4B | Tech | HIGH | BULLISH | NVDA
NVIDIA guides Q1 revenue to $24B, above consensus of $22.2B | Tech | HIGH | BULLISH | NVDA
Fed minutes show officials divided on rate cut timing | Macro | MEDIUM | BEARISH | -
China tech stocks rally on stimulus hopes | Tech | MEDIUM | BULLISH | -
Tesla Cybertruck production hits 1,000 units per week | Consumer | MEDIUM | BULLISH | TSLA
Apple reportedly in talks to bring Gemini AI to iPhone | Tech | MEDIUM | BULLISH | AAPL
S&P 500 futures up 0.8% on strong NVIDIA earnings | Macro | LOW | BULLISH | SPY
AMD launches MI300X AI chip, claims performance lead over H100 | Tech | HIGH | BULLISH | AMD
Bitcoin surges past $52,000 amid ETF inflows | Crypto | HIGH | BULLISH | BTC
US jobless claims fall to 194,000, below 200,000 estimate | Macro | LOW | BULLISH | -
Oil prices rise 2% on Middle East supply concerns | Energy | MEDIUM | BULLISH | XOM
Meta announces new AI model tha
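
To use the classification in code, the pipe-delimited lines can be parsed into records. This is a minimal sketch, assuming the model keeps to the five-field format requested in the prompt. `parse_classified` and `FIELDS` are illustrative names. Lines with the wrong number of fields, such as a reply cut off by `max_new_tokens`, are skipped.

```python
FIELDS = ["headline", "sector", "impact", "direction", "tickers"]


def parse_classified(text):
    """Parse 'HEADLINE | SECTOR | IMPACT | DIRECTION | TICKERS' lines into dicts.

    Lines that do not have exactly five fields are skipped.
    """
    records = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != len(FIELDS):
            continue
        record = dict(zip(FIELDS, parts))
        # Treat "-" as "no ticker"; otherwise split on commas.
        tickers = record["tickers"]
        record["tickers"] = [] if tickers in ("", "-") else [t.strip() for t in tickers.split(",")]
        records.append(record)
    return records


sample = """Fed minutes show officials divided on rate cut timing | Macro | MEDIUM | BEARISH | -
Bitcoin surges past $52,000 amid ETF inflows | Crypto | HIGH | BULLISH | BTC
Meta announces new AI model tha"""
for r in parse_classified(sample):
    print(r["sector"], r["impact"], r["direction"], r["tickers"])
```

The resulting list of dicts can be passed straight to `pd.DataFrame(records)` for filtering by sector or impact.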

---
## 6. Loading Larger Models with 4-bit Quantization

For higher quality analysis, we can load Mistral-7B using 4-bit quantization.
This compresses the model from ~14GB to ~4.5GB, fitting easily on your 4090
with minimal quality loss.

In [14]:
# Free up GPU memory from Phi-3
print(f"Before cleanup: {torch.cuda.memory_allocated(0)/1024**3:.1f} GB used")
del model, tokenizer
torch.cuda.empty_cache()
print(f"After cleanup: {torch.cuda.memory_allocated(0)/1024**3:.1f} GB used")

Before cleanup: 7.1 GB used
After cleanup: 0.0 GB used


In [15]:
# Load Mistral-7B with 4-bit quantization
# This gives us a much more capable model while using ~4.5GB VRAM

LARGE_MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.3"

# 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",          # Normal Float 4 -- best quality
    bnb_4bit_compute_dtype=torch.float16, # Compute in fp16 for speed
    bnb_4bit_use_double_quant=True,       # Further compression
)

print(f"Loading {LARGE_MODEL_ID} (4-bit quantized)...")
print("(First run downloads ~14GB. Cached after that.)\n")

t0 = time.time()
tokenizer_large = AutoTokenizer.from_pretrained(LARGE_MODEL_ID)
model_large = AutoModelForCausalLM.from_pretrained(
    LARGE_MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
)
load_time = time.time() - t0

mem_used = torch.cuda.memory_allocated(0) / 1024**3
print(f"Model loaded in {load_time:.1f}s")
print(f"GPU memory: {mem_used:.1f} GB / {total_vram:.1f} GB ({mem_used/total_vram*100:.0f}%)")
print(f"\n4-bit quantization saves ~10GB of VRAM with minimal quality loss.")

Loading mistralai/Mistral-7B-Instruct-v0.3 (4-bit quantized)...
(First run downloads ~14GB. Cached after that.)



Model loaded in 19.7s
GPU memory: 3.9 GB / 23.5 GB (16%)

4-bit quantization saves ~10GB of VRAM with minimal quality loss.


In [16]:
def ask_llm_large(prompt, system_prompt=None, max_new_tokens=512, temperature=0.3):
    """Send a prompt to the larger Mistral model."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": prompt})
    
    input_text = tokenizer_large.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer_large(input_text, return_tensors="pt").to(device)
    
    t0 = time.time()
    with torch.no_grad():
        outputs = model_large.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=temperature,
            do_sample=temperature > 0,
            top_p=0.9,
            pad_token_id=tokenizer_large.eos_token_id,
        )
    elapsed = time.time() - t0
    
    new_tokens = outputs[0][inputs['input_ids'].shape[1]:]
    response = tokenizer_large.decode(new_tokens, skip_special_tokens=True)
    tokens_generated = len(new_tokens)
    
    return response, {'time': elapsed, 'tokens': tokens_generated,
                       'tok_per_sec': tokens_generated / elapsed if elapsed > 0 else 0}


# Test the larger model
print("Testing Mistral-7B...\n")
response, stats = ask_llm_large(
    "Compare NVIDIA and AMD as investment opportunities for 2024. "
    "Consider revenue growth, margins, AI exposure, and valuation. Be concise.",
    max_new_tokens=400
)
print(f"[{stats['time']:.1f}s, {stats['tok_per_sec']:.0f} tok/s]\n")
print(response)

Testing Mistral-7B...

[16.1s, 25 tok/s]

Comparing NVIDIA (NVDA) and Advanced Micro Devices (AMD) as investment opportunities for 2024, both companies are significant players in the technology sector, with a focus on semiconductors and graphics processing units (GPUs).

1. Revenue Growth: NVIDIA has consistently demonstrated impressive revenue growth, driven by its strong presence in the GPU market and its expanding AI and data center businesses. AMD, while also growing, has been playing catch-up in the GPU market but has made significant strides in CPUs and server processors.

2. Margins: NVIDIA's gross margins are generally higher due to its focus on high-end GPUs, which command premium prices. AMD, on the other hand, has a more balanced product portfolio, including CPUs and GPUs, which results in lower gross margins but a more stable revenue stream.

3. AI Exposure: NVIDIA is a clear leader in AI exposure, with its GPUs powering a significant portion of the world's AI supercomputer

---
## 7. Structured Output for Pipeline Integration

For automated trading pipelines, the LLM must return structured data
(JSON) that code can parse. Larger models tend to follow a fixed output schema more reliably.

In [17]:
def analyze_earnings_structured(transcript, llm_fn=ask_llm_large):
    """
    Analyze an earnings call and return structured JSON output.
    Designed for pipeline integration.
    """
    prompt = f"""Analyze this earnings call transcript and return a JSON object with EXACTLY these fields:

{{
  "ticker": "<stock ticker>",
  "quarter": "<e.g. Q4 FY2024>",
  "revenue_billions": <number>,
  "revenue_yoy_pct": <number>,
  "gross_margin_pct": <number>,
  "guidance_revenue_billions": <number or null>,
  "guidance_vs_consensus": "<beat/miss/inline>",
  "sentiment_score": <-1.0 to 1.0>,
  "management_confidence": <1 to 10>,
  "top_3_positives": ["<string>", "<string>", "<string>"],
  "top_3_risks": ["<string>", "<string>", "<string>"],
  "trading_signal": "<STRONG_BUY/BUY/HOLD/SELL/STRONG_SELL>",
  "signal_reasoning": "<1 sentence>"
}}

Return ONLY the JSON object, no other text.

Transcript:
{transcript}"""

    response, stats = llm_fn(prompt, max_new_tokens=500, temperature=0.1)
    
    # Parse JSON from response
    try:
        json_match = re.search(r'\{[\s\S]*\}', response)
        if json_match:
            data = json.loads(json_match.group())
            return data, stats
    except json.JSONDecodeError:
        pass
    
    return {'raw_response': response, 'parse_error': True}, stats


# Run structured analysis
print("Running structured earnings analysis...\n")
analysis, stats = analyze_earnings_structured(EARNINGS_TRANSCRIPT)
print(f"[{stats['time']:.1f}s]\n")

if 'parse_error' not in analysis:
    print(json.dumps(analysis, indent=2))
    
    print("\n--- Pipeline-Ready Output ---")
    print(f"Ticker:     {analysis.get('ticker', 'N/A')}")
    print(f"Signal:     {analysis.get('trading_signal', 'N/A')}")
    print(f"Sentiment:  {analysis.get('sentiment_score', 'N/A')}")
    print(f"Confidence: {analysis.get('management_confidence', 'N/A')}/10")
    print(f"\nThis structured output can feed directly into your trading pipeline.")
else:
    print("Could not parse structured JSON. Raw response:")
    print(analysis['raw_response'])
    print("\nTip: Larger models or lower temperature improve JSON reliability.")

Running structured earnings analysis...

[10.7s]

{
  "ticker": "NVDA",
  "quarter": "Q4 FY2024",
  "revenue_billions": 22.1,
  "revenue_yoy_pct": 265,
  "gross_margin_pct": 76,
  "guidance_revenue_billions": 24,
  "guidance_vs_consensus": "beat",
  "sentiment_score": 0.9,
  "management_confidence": 9,
  "top_3_positives": [
    "Strong demand for accelerated computing and generative AI",
    "Investment by cloud service providers, large enterprises, and sovereign AI infrastructure",
    "H100 demand remains extremely strong"
  ],
  "top_3_risks": [
    "Supply constraints",
    "Export controls impact in China",
    "Competitive landscape for AI accelerators"
  ],
  "trading_signal": "BUY",
  "signal_reasoning": "Strong revenue growth, positive guidance, and a balanced approach to capital allocation indicate a positive outlook."
}

--- Pipeline-Ready Output ---
Ticker:     NVDA
Signal:     BUY
Sentiment:  0.9
Confidence: 9/10

This structured output can feed directly into your trading pipeline.
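
The regex in `analyze_earnings_structured` captures everything from the first `{` to the last `}`, so it fails when the model adds commentary containing braces after the JSON. A minimal sketch of a stricter approach, using only the standard library: scan for the first position where `json.JSONDecoder.raw_decode` succeeds, then check the fields a pipeline would act on. `extract_first_json` and `validate_analysis` are hypothetical helper names, and the required fields and allowed signals simply mirror the prompt above.

```python
import json

ALLOWED_SIGNALS = {"STRONG_BUY", "BUY", "HOLD", "SELL", "STRONG_SELL"}
REQUIRED_FIELDS = {"ticker", "quarter", "sentiment_score",
                   "management_confidence", "trading_signal"}

def extract_first_json(text):
    """Return the first JSON object embedded in text, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass  # not valid JSON from this brace; try the next one
        start = text.find("{", start + 1)
    return None

def validate_analysis(data):
    """Return a list of problems; an empty list means the record looks usable."""
    problems = [f"missing field: {name}"
                for name in sorted(REQUIRED_FIELDS - set(data))]
    score = data.get("sentiment_score")
    if not isinstance(score, (int, float)) or not -1.0 <= score <= 1.0:
        problems.append("sentiment_score must be a number in [-1, 1]")
    confidence = data.get("management_confidence")
    if not isinstance(confidence, (int, float)) or not 1 <= confidence <= 10:
        problems.append("management_confidence must be a number in [1, 10]")
    if data.get("trading_signal") not in ALLOWED_SIGNALS:
        problems.append("trading_signal is not one of the allowed values")
    return problems

# Stand-in for a model response with chatter (and stray braces) around the JSON
sample = ('Sure! {"ticker": "NVDA", "quarter": "Q4 FY2024", "sentiment_score": 0.9, '
          '"management_confidence": 9, "trading_signal": "BUY"} Let me know {if} unclear.')
record = extract_first_json(sample)
print(record["ticker"], validate_analysis(record))
```

Rejecting a record with a non-empty problem list is safer than acting on a signal the model may have malformed.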

---
## 8. Comparative Analysis -- Multiple Stocks

Use the LLM to compare multiple companies and generate relative trading ideas.

In [18]:
# Compile basic financials for comparison
import yfinance as yf

COMPARE_TICKERS = ['NVDA', 'AMD', 'INTC']

company_summaries = []
for ticker in COMPARE_TICKERS:
    try:
        stock = yf.Ticker(ticker)
        info = stock.info
        
        summary = (
            f"{ticker} ({info.get('shortName', 'N/A')}):\n"
            f"  Market Cap: ${info.get('marketCap', 0)/1e9:.0f}B\n"
            f"  Revenue (TTM): ${info.get('totalRevenue', 0)/1e9:.1f}B\n"
            f"  Gross Margin: {info.get('grossMargins', 0)*100:.1f}%\n"
            f"  P/E Ratio: {info.get('trailingPE', 'N/A')}\n"
            f"  Forward P/E: {info.get('forwardPE', 'N/A')}\n"
            f"  Revenue Growth: {info.get('revenueGrowth', 0)*100:.1f}%\n"
            f"  52-Week Range: ${info.get('fiftyTwoWeekLow', 0):.2f} - ${info.get('fiftyTwoWeekHigh', 0):.2f}\n"
            f"  Analyst Target: ${info.get('targetMeanPrice', 'N/A')}\n"
        )
        company_summaries.append(summary)
        print(summary)
    except Exception as e:
        print(f"Could not fetch {ticker}: {e}")
        company_summaries.append(f"{ticker}: Data unavailable")

NVDA (NVIDIA Corporation):
  Market Cap: $4451B
  Revenue (TTM): $187.1B
  Gross Margin: 70.0%
  P/E Ratio: 45.25
  Forward P/E: 23.631657
  Revenue Growth: 62.5%
  52-Week Range: $86.62 - $212.19
  Analyst Target: $253.88464

AMD (Advanced Micro Devices, Inc.):
  Market Cap: $338B
  Revenue (TTM): $34.6B
  Gross Margin: 52.5%
  P/E Ratio: 79.43295
  Forward P/E: 19.458391
  Revenue Growth: 34.1%
  52-Week Range: $76.48 - $267.08
  Analyst Target: $287.19565

INTC (Intel Corporation):
  Market Cap: $234B
  Revenue (TTM): $52.9B
  Gross Margin: 36.6%
  P/E Ratio: N/A
  Forward P/E: 47.20636
  Revenue Growth: -4.1%
  52-Week Range: $17.67 - $54.60
  Analyst Target: $47.11829
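
Two details in the summary cell are worth noting. `info.get(key, 0)` falls back to `0` only when the key is absent; if a key were present but held `None` (an assumption about what the data source may return), the arithmetic would raise and the whole ticker would be reported as unavailable. The ratios and price targets are also printed with their full float precision. A minimal sketch of a formatting helper that handles both cases, where `fmt_metric` is a hypothetical name and the sample values are copied or rounded from the INTC output above:

```python
import math

def fmt_metric(value, scale=1.0, decimals=1, prefix="", suffix=""):
    """Format a numeric field; return 'N/A' for missing or non-numeric values."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return "N/A"
    if math.isnan(value):
        return "N/A"
    return f"{prefix}{value * scale:.{decimals}f}{suffix}"

# Stand-in for `stock.info`; trailingPE is set to None to mimic a missing value.
info = {"marketCap": 234e9, "grossMargins": 0.366, "trailingPE": None,
        "forwardPE": 47.20636, "targetMeanPrice": 47.11829}

print("Market Cap:    ", fmt_metric(info.get("marketCap"), scale=1e-9, decimals=0, prefix="$", suffix="B"))
print("Gross Margin:  ", fmt_metric(info.get("grossMargins"), scale=100, suffix="%"))
print("P/E Ratio:     ", fmt_metric(info.get("trailingPE"), decimals=2))
print("Forward P/E:   ", fmt_metric(info.get("forwardPE"), decimals=2))
print("Analyst Target:", fmt_metric(info.get("targetMeanPrice"), decimals=2, prefix="$"))
```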



In [19]:
# LLM comparative analysis
all_summaries = "\n".join(company_summaries)

prompt_compare = f"""Compare these three semiconductor companies for a trader considering 
positions in this sector. Analyze:

1. RELATIVE VALUE: Which is cheapest relative to growth?
2. MOMENTUM: Which has the strongest business trajectory?
3. RISK/REWARD: Which offers the best asymmetric setup?
4. PAIR TRADE: If you had to go long one and short another, which pair and why?
5. RANKING: Rank all three from most to least attractive right now.

Company Data:
{all_summaries}"""

print("Generating comparative analysis...\n")
comparison, stats = ask_llm_large(prompt_compare, max_new_tokens=600)
print(f"[{stats['time']:.1f}s]\n")
print(comparison)

Generating comparative analysis...

[18.6s]

1. RELATIVE VALUE: Relative to growth, AMD and NVDA are cheaper than Intel. Both AMD and NVDA have higher revenue growth rates compared to Intel, but their P/E ratios and forward P/E ratios are lower than Intel, indicating a lower price for the same growth. However, when comparing AMD and NVDA, NVDA is more expensive in terms of P/E and forward P/E ratios, but it has a significantly higher revenue growth rate.

2. MOMENTUM: AMD and NVDA have shown stronger business trajectories than Intel. Both AMD and NVDA have higher revenue growth rates and have outperformed Intel in terms of stock price appreciation over the past year. AMD has shown particularly strong momentum, with a 52-week high that is more than triple its 52-week low, compared to NVDA's 52-week high that is more than double its 52-week low.

3. RISK/REWARD: AMD offers a better asymmetric setup due to its lower price and higher growth potential compared to NVDA. While NVDA's higher p
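
Claims like these can be checked directly against the figures the model was given. The sketch below divides forward P/E by revenue growth as a rough "price per point of growth" measure and computes each stock's 52-week high/low ratio. The measure itself is an assumption chosen for illustration (a conventional PEG ratio uses earnings growth, not revenue growth), and `pe_to_growth` is a hypothetical helper.

```python
# Figures copied from the yfinance output above.
companies = {
    "NVDA": {"forward_pe": 23.631657, "rev_growth_pct": 62.5, "low": 86.62, "high": 212.19},
    "AMD":  {"forward_pe": 19.458391, "rev_growth_pct": 34.1, "low": 76.48, "high": 267.08},
    "INTC": {"forward_pe": 47.20636,  "rev_growth_pct": -4.1, "low": 17.67, "high": 54.60},
}

def pe_to_growth(forward_pe, rev_growth_pct):
    """Forward P/E per point of revenue growth; None when growth is not positive."""
    if rev_growth_pct <= 0:
        return None
    return forward_pe / rev_growth_pct

rows = []
for ticker, c in companies.items():
    ratio = pe_to_growth(c["forward_pe"], c["rev_growth_pct"])
    rows.append((ticker, ratio, c["high"] / c["low"]))

# Cheapest per growth point first; names without positive growth go last
rows.sort(key=lambda r: (r[1] is None, r[1] if r[1] is not None else 0.0))
for ticker, ratio, range_ratio in rows:
    ratio_text = "n/a (no positive growth)" if ratio is None else f"{ratio:.2f}"
    print(f"{ticker}: fwd P/E per growth point = {ratio_text}, "
          f"52-week high/low = {range_ratio:.2f}x")
```

On this measure NVDA (about 0.38) screens cheaper than AMD (about 0.57). AMD's trailing P/E of 79.4 is also higher than NVDA's 45.25, so the model's statement that NVDA is more expensive on P/E holds only for the forward multiple.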

---
## 9. Combined Intelligence Dashboard

Merge LLM analysis with results from our other notebooks:
- **Notebook 02**: Sentiment scores
- **Notebook 03**: Technical signals
- **Notebook 04**: Chronos forecasts
- **This notebook**: LLM fundamental analysis

In [20]:
def generate_combined_report(ticker, headlines, llm_fn=ask_llm_large):
    """
    Generate a comprehensive stock report combining all signal sources.
    
    In a full pipeline, you'd pass in real data from notebooks 02-04.
    Here we use the LLM to synthesize a report from available information.
    """
    # Fetch basic data
    stock = yf.Ticker(ticker)
    info = stock.info
    hist = stock.history(period='1mo')
    
    price = info.get('currentPrice', info.get('previousClose', 0))
    change_1m = (hist['Close'].iloc[-1] / hist['Close'].iloc[0] - 1) * 100 if len(hist) > 1 else 0
    
    headlines_text = "\n".join(f"- {h}" for h in headlines[:10])
    
    prompt = f"""Generate a concise trading report for {ticker}.

Current Data:
- Price: ${price:.2f}
- 1-Month Change: {change_1m:+.1f}%
- Market Cap: ${info.get('marketCap', 0)/1e9:.0f}B
- P/E: {info.get('trailingPE', 'N/A')}
- Revenue Growth: {info.get('revenueGrowth', 0)*100:.1f}%

Recent Headlines:
{headlines_text}

Provide:
1. THESIS (1-2 sentences: bullish or bearish case)
2. KEY LEVELS: Support and resistance to watch
3. CATALYSTS: Upcoming events that could move the stock
4. TRADE SETUP: Entry, stop loss, and target if taking a position
5. OVERALL RATING: STRONG BUY / BUY / HOLD / SELL / STRONG SELL

Be specific with numbers."""
    
    response, stats = llm_fn(prompt, max_new_tokens=500, temperature=0.3)
    return response, stats


# Generate reports for key stocks
sample_headlines = {
    'NVDA': [
        "NVIDIA Q4 revenue of $22.1B crushes estimates",
        "NVIDIA guides Q1 to $24B, above consensus",
        "AMD launches MI300X, claims performance gains over H100",
        "Analysts raise NVDA price targets post earnings",
        "NVIDIA Blackwell GPU production ramping ahead of schedule",
    ],
    'TSLA': [
        "Tesla Cybertruck production hits 1,000 per week",
        "Tesla cuts prices in China amid competition",
        "Musk announces robotaxi unveil event",
        "Tesla Q4 deliveries miss estimates slightly",
        "Tesla FSD v12 shows significant improvement in testing",
    ],
}

for ticker, headlines in sample_headlines.items():
    print(f"\n{'='*60}")
    print(f"  {ticker} TRADING REPORT")
    print(f"{'='*60}\n")
    
    report, stats = generate_combined_report(ticker, headlines)
    print(f"[Generated in {stats['time']:.1f}s]\n")
    print(report)


  NVDA TRADING REPORT

[Generated in 13.2s]

1. THESIS: Bullish Case: NVIDIA's strong Q4 earnings, optimistic Q1 guidance, and accelerated production ramp-up of the Blackwell GPU indicate a robust demand for its products, potentially driving continued growth. The competition's new product launch (AMD's MI300X) may stimulate further innovation and market share gains for NVIDIA.

2. KEY LEVELS:
   - Support: $175 (50-day moving average)
   - Resistance: $190 (200-day moving average)

3. CATALYSTS:
   - Earnings Release (Q1 2023)
   - Product Launches (e.g., new GPU series)
   - Competitor Performance Reports
   - Tech Conferences (e.g., CES, GTC)

4. TRADE SETUP:
   - Entry: Upon a pullback to the 50-day moving average ($175)
   - Stop Loss: Below $165 to limit potential losses
   - Target: $205, representing a 15% increase from the entry price, considering the stock's recent high volatility and growth potential

5. OVERALL RATING: BUY

Disclaimer: This report is for informational purpo
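
The numbers in a generated trade setup can be sanity-checked with a few lines of arithmetic before anything acts on them. A minimal sketch for a long position, where `check_trade_setup` is a hypothetical helper:

```python
def check_trade_setup(entry, stop, target):
    """Return upside %, downside %, and reward-to-risk for a long setup."""
    if not stop < entry < target:
        raise ValueError("a long setup needs stop < entry < target")
    upside_pct = (target / entry - 1) * 100
    downside_pct = (1 - stop / entry) * 100
    reward_to_risk = (target - entry) / (entry - stop)
    return upside_pct, downside_pct, reward_to_risk

# Levels quoted in the NVDA report above
upside, downside, rr = check_trade_setup(entry=175, stop=165, target=205)
print(f"Upside {upside:.1f}%, downside {downside:.1f}%, reward/risk {rr:.1f}")
```

A move from $175 to $205 is about 17%, not the 15% stated in the report. The prompt in `generate_combined_report` supplies only the current price, the one-month change, a few fundamentals, and headlines, so the moving-average levels in the report are not derived from any data the model was given.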

---
## 10. Batch Document Processing

Process several sets of headlines in a loop, one LLM call per sector -- useful for
scanning news across multiple sectors in minutes.

In [21]:
# Batch headline analysis -- process many headlines at once

SECTOR_HEADLINES = {
    'Semiconductors': [
        "NVIDIA data center revenue up 409% year-over-year",
        "AMD MI300X wins major hyperscaler design win",
        "Intel foundry business reports $7B annual loss",
        "TSMC raises capex guidance to $32B on AI demand",
        "Broadcom AI revenue doubles to $3.7B",
    ],
    'Cloud/Software': [
        "Microsoft Azure revenue growth accelerates to 30%",
        "Salesforce cuts workforce by 10% to improve margins",
        "Snowflake product revenue grows 32% year-over-year",
        "ServiceNow raises full-year guidance above consensus",
        "Palantir government revenue growth slows to 11%",
    ],
    'Consumer Tech': [
        "Apple iPhone sales decline 3% in China",
        "Meta ad revenue beats by 5%, Reels monetization improving",
        "Amazon Prime membership hits 200 million worldwide",
        "Google Search revenue misses estimates by 2%",
        "Netflix adds 13 million subscribers, above guidance",
    ],
}

print("Generating sector-by-sector analysis...\n")

for sector, headlines in SECTOR_HEADLINES.items():
    headlines_text = "\n".join(f"  - {h}" for h in headlines)
    
    prompt = f"""Analyze these {sector} headlines as a sector analyst.
In 3-4 sentences: What's the overall sector trend? Which company stands out 
(positively or negatively)? What's the trade?

Headlines:
{headlines_text}"""
    
    response, stats = ask_llm_large(prompt, max_new_tokens=200, temperature=0.3)
    
    print(f"--- {sector} [{stats['time']:.1f}s] ---")
    print(response)
    print()

Generating sector-by-sector analysis...

--- Semiconductors [7.8s] ---
As a sector analyst, the overall trend in the semiconductor industry indicates a strong focus on AI and data center applications, as evidenced by the significant growth in NVIDIA's data center revenue, AMD's major hyperscaler design win, and TSMC's increased capex guidance due to AI demand. On the other hand, Intel's foundry business reporting a $7B annual loss suggests challenges in this area for the company. Among these companies, NVIDIA and TSMC stand out positively due to their strong performance and focus on emerging technologies.

In terms of trade, investing in companies that are well-positioned in the AI and data center sectors, such as NVIDIA and TSMC, could be a promising play given the increasing demand for these technologies. However, it's important to consider the overall market conditions and the individual company's financial health before making investment decisions.

--- Cloud/Software [8.0s] ---
In
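
For per-headline tasks such as classification, one way to cut the number of LLM calls is to pack several headlines into a single numbered prompt and map the labels back by number. The sketch below shows only the packing and parsing; the label set, the prompt wording, and the helper names (`batched`, `build_batch_prompt`, `parse_batch_labels`) are assumptions for illustration, and the response string is a stand-in for real model output.

```python
import re

LABELS = ("POSITIVE", "NEGATIVE", "NEUTRAL")
LABEL_PATTERN = re.compile(
    rf"^\s*(\d+)\.\s*({'|'.join(LABELS)})\b", re.MULTILINE | re.IGNORECASE
)

def batched(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def build_batch_prompt(headlines):
    """Number the headlines so each label can be matched back to its input."""
    numbered = "\n".join(f"{i}. {h}" for i, h in enumerate(headlines, 1))
    return (
        "Classify each headline as POSITIVE, NEGATIVE, or NEUTRAL for the stock.\n"
        "Answer with one line per headline in the form '<number>. <LABEL>'.\n\n"
        f"{numbered}"
    )

def parse_batch_labels(response, n_headlines):
    """Return one label per headline; entries the model skipped stay None."""
    labels = [None] * n_headlines
    for number, label in LABEL_PATTERN.findall(response):
        index = int(number) - 1
        if 0 <= index < n_headlines:
            labels[index] = label.upper()
    return labels

headlines = [
    "NVIDIA data center revenue up 409% year-over-year",
    "AMD MI300X wins major hyperscaler design win",
    "Intel foundry business reports $7B annual loss",
]
print(build_batch_prompt(headlines))

fake_response = "1. POSITIVE\n2. positive\n3. NEGATIVE"  # stand-in for model output
print(parse_batch_labels(fake_response, len(headlines)))
```

Entries left as `None` can be retried individually, which keeps one malformed line from discarding the whole batch.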

---
## 11. Performance Benchmarks

How fast is your 4090 at various LLM tasks?

In [22]:
# Benchmark: tokens per second at different output lengths
prompt = "Explain the three most important things a day trader should know about reading earnings reports."

output_lengths = [50, 100, 200, 400]
benchmark_results = []

print("Benchmarking Mistral-7B (4-bit) generation speed...\n")
print(f"{'Max Tokens':>12} {'Time (s)':>10} {'Tokens/sec':>12} {'Output Len':>12}")
print("-" * 50)

for max_tok in output_lengths:
    response, stats = ask_llm_large(prompt, max_new_tokens=max_tok, temperature=0.3)
    benchmark_results.append({
        'max_tokens': max_tok,
        'time': stats['time'],
        'tok_per_sec': stats['tok_per_sec'],
        'actual_tokens': stats['tokens'],
    })
    print(f"{max_tok:>12} {stats['time']:>10.2f} {stats['tok_per_sec']:>12.1f} {stats['tokens']:>12}")

# Summary
avg_speed = np.mean([r['tok_per_sec'] for r in benchmark_results])
print(f"\nAverage: {avg_speed:.0f} tokens/sec")
print(f"\nAt this speed, the model can:")
print(f"  - Analyze an earnings transcript in ~5-10 seconds")
print(f"  - Classify 50 headlines in ~30 seconds")
print(f"  - Generate a full trading report in ~8-15 seconds")
print(f"  - Process a day's worth of news in under 5 minutes")

Benchmarking Mistral-7B (4-bit) generation speed...

  Max Tokens   Time (s)   Tokens/sec   Output Len
--------------------------------------------------
          50       2.07         24.2           50
         100       3.96         25.2          100
         200       8.00         25.0          200
         400      10.74         25.1          270

Average: 25 tokens/sec

At this speed, the model can:
  - Analyze an earnings transcript in ~5-10 seconds
  - Classify 50 headlines in ~30 seconds
  - Generate a full trading report in ~8-15 seconds
  - Process a day's worth of news in under 5 minutes
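
Generation time tracks the number of tokens actually produced, not `max_new_tokens`: the 400-token run stopped at 270 tokens and finished in 10.74 s. A rough estimate for any task is therefore output tokens divided by throughput. The sketch below recomputes throughput from the measured rows and applies that estimate; `estimate_seconds` is a hypothetical helper, and it ignores the time spent processing the prompt, which grows with long inputs such as full transcripts.

```python
def estimate_seconds(n_output_tokens, tok_per_sec=25.0):
    """Estimated generation time, ignoring prompt processing."""
    return n_output_tokens / tok_per_sec

# (tokens generated, seconds) copied from the benchmark output above
measured = [(50, 2.07), (100, 3.96), (200, 8.00), (270, 10.74)]

for tokens, seconds in measured:
    print(f"{tokens:>4} tokens: measured {seconds:5.2f}s "
          f"({tokens / seconds:4.1f} tok/s), estimate {estimate_seconds(tokens):5.2f}s")

print(f"A 500-token response would take about {estimate_seconds(500):.0f}s")
```

At 25 tokens/sec a response that uses a full 500-token budget takes about 20 seconds, so the shorter figures quoted above assume the model stops well before the cap.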


---
## 12. Tips for Getting Better LLM Output

### Prompt Engineering for Financial Analysis

| Technique | Example | Why It Works |
|-----------|---------|-------------|
| **Role assignment** | "You are a senior equity analyst at Goldman Sachs" | Primes the model for domain-specific language |
| **Structured output** | "Return a JSON object with these fields: ..." | Gets parseable data for pipelines |
| **Few-shot examples** | Show 1-2 example inputs and outputs | Teaches the exact format you want |
| **Chain of thought** | "Think step by step about the implications" | Improves reasoning quality |
| **Constraints** | "In exactly 3 bullet points" | Prevents rambling |
| **Low temperature** | `temperature=0.1` for factual extraction | Makes output more deterministic and repeatable |
| **High temperature** | `temperature=0.7` for creative analysis | Produces more varied wording and ideas |
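
Several of these techniques combine naturally in one message list. A minimal sketch, assuming the chat-message format already used by `ask_llm_large` (`role`/`content` dictionaries); `build_messages` is a hypothetical helper, and whether a given model's chat template accepts a `system` message varies by model.

```python
def build_messages(task, examples=(), role=None):
    """Assemble chat messages: optional system role, few-shot pairs, then the task."""
    messages = []
    if role:
        messages.append({"role": "system", "content": role})
    for example_input, example_output in examples:
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    messages.append({"role": "user", "content": task})
    return messages

messages = build_messages(
    task="Headline: Intel foundry business reports $7B annual loss\n"
         "Answer in exactly one word.",                            # constraint
    examples=[("Headline: Broadcom AI revenue doubles to $3.7B\n"
               "Answer in exactly one word.", "POSITIVE")],        # few-shot example
    role="You are a senior equity analyst.",                       # role assignment
)
for message in messages:
    print(f"{message['role']:>9}: {message['content']!r}")
```

The resulting list can be passed to `apply_chat_template` in place of the `messages` list built inside `ask_llm_large`.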

### Model Selection Guide

| Task | Best Model | Why |
|------|-----------|-----|
| Quick headline classification | Phi-3 Mini (fp16) | Fast, simple task |
| Earnings call deep dive | Mistral-7B (4-bit) | Needs reasoning depth |
| JSON structured output | Mistral-7B or Llama-8B | Better instruction following |
| Creative trade thesis | Mistral-7B (higher temp) | Needs creative reasoning |
| Bulk processing (1000+ items) | Phi-3 Mini (fp16) | Speed matters more |

### Common Pitfalls

- **Hallucination**: LLMs can invent financial numbers. Always verify against source data.
- **Recency cutoff**: The model's training data has a cutoff date. It doesn't know about recent events.
- **Overconfidence**: LLMs can present speculation as fact. Treat outputs as analysis, not truth.
- **Context window**: Phi-3-mini-4k-instruct has a 4K-token window shared by the prompt and the response. Larger documents need chunking.
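
Chunking can be sketched as splitting a document into overlapping windows so that a sentence cut at one boundary still appears whole in the next chunk. The version below counts whitespace-separated words as a stand-in for tokens, which is an assumption: real token counts are higher than word counts, so the default of 2,000 words is meant to leave headroom inside a 4K-token window for the instructions and the response. `chunk_words` is a hypothetical helper.

```python
def chunk_words(text, max_words=2000, overlap=100):
    """Split text into chunks of at most max_words words, overlapping by `overlap`."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks

# Tiny demonstration with small limits so the overlap is visible
demo = " ".join(str(i) for i in range(10))
for chunk in chunk_words(demo, max_words=4, overlap=1):
    print(chunk)
```

Each chunk can then be summarized separately, with a final prompt that merges the per-chunk summaries.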

---
## 13. Summary & Architecture

### What We Built

| Component | Purpose |
|-----------|--------|
| **`ask_llm()` / `ask_llm_large()`** | Core inference functions for any prompt |
| **Earnings Analysis** | Summary, tone analysis, structured metric extraction |
| **SEC Filing Parser** | Risk factor analysis, competitive landscape assessment |
| **Financial Q&A** | `FinancialQA` class for interactive document queries |
| **Morning Brief Generator** | Synthesize headlines into actionable trading brief |
| **Structured Output** | JSON-formatted analysis for pipeline integration |
| **Comparative Analysis** | Multi-stock comparison with relative value insights |
| **Combined Report** | Full trading report integrating multiple data sources |
| **Batch Processing** | Sector-level analysis across multiple headline sets |

### Full Pipeline Architecture

```
Data Sources        GPU Processing Pipeline                   Output
┌──────────────┐    ┌────────────────────────────────────┐    ┌──────────────┐
│ yfinance     │    │ Notebook 02: Sentiment             │    │ Per-stock    │
│ RSS feeds    │───>│  FinBERT + RoBERTa + DistilRoBERTa │───>│ sentiment    │
│ News APIs    │    │                                    │    │ scores       │
├──────────────┤    ├────────────────────────────────────┤    ├──────────────┤
│ Price/volume │    │ Notebook 03: Technical             │    │ Buy/sell     │
│ OHLCV bars   │───>│  EMA, RSI, BB, VWAP, ORB           │───>│ signals      │
│              │    │  BacktestEngine validation         │    │              │
├──────────────┤    ├────────────────────────────────────┤    ├──────────────┤
│ Price history│    │ Notebook 04: Chronos               │    │ Probabilistic│
│              │───>│  Zero-shot forecasting             │───>│ forecasts    │
│              │    │  Confidence intervals              │    │              │
├──────────────┤    ├────────────────────────────────────┤    ├──────────────┤
│ Earnings     │    │ Notebook 05: LLM Analysis          │    │ Fundamental  │
│ SEC filings  │───>│  Phi-3 / Mistral-7B                │───>│ insights &   │
│ Transcripts  │    │  Structured JSON output            │    │ trade thesis │
└──────────────┘    └────────────────────────────────────┘    └──────────────┘
                                                                      │
                                                                      v
                                                              ┌──────────────┐
                                                              │ COMBINED     │
                                                              │ SIGNAL       │
                                                              │ + Risk Mgmt  │
                                                              └──────────────┘
```

### Coming Up Next

Potential future notebooks:
- **06_Backtesting_Engine.ipynb** - Full systematic backtesting with combined signals
- **07_Paper_Trading_Bot.ipynb** - Automated paper trading with Alpaca API

In [23]:
# Clean up GPU memory
if torch.cuda.is_available():
    mem_before = torch.cuda.memory_allocated(0) / 1024**3
    try:
        del model_large, tokenizer_large
    except NameError:
        pass
    torch.cuda.empty_cache()
    mem_after = torch.cuda.memory_allocated(0) / 1024**3
    print(f"GPU memory freed: {mem_before:.1f} GB -> {mem_after:.1f} GB")

print("\nNotebook 05 complete.")
print("You now have a full local LLM financial analysis pipeline on your 4090.")
print("No API costs, no rate limits, no data leaving your machine.")

GPU memory freed: 3.9 GB -> 3.4 GB

Notebook 05 complete.
You now have a full local LLM financial analysis pipeline on your 4090.
No API costs, no rate limits, no data leaving your machine.
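
The cleanup output shows allocated memory dropping only from 3.9 GB to 3.4 GB. `torch.cuda.memory_allocated` counts memory held by live tensors, so most of the weights were probably still referenced somewhere after the `del`, and `torch.cuda.empty_cache()` can only return blocks that no tensor is using. A minimal sketch of a more thorough cleanup, which drops the names, forces a garbage-collection pass, and then empties the cache; `release_names` is a hypothetical helper, and it cannot remove references held elsewhere (for example by other objects that wrap the model).

```python
import gc

def release_names(namespace, names):
    """Delete the given names from a namespace dict, then collect garbage."""
    removed = [name for name in names if namespace.pop(name, None) is not None]
    gc.collect()  # break reference cycles that keep tensors alive
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # return unused cached blocks to the driver
    except ImportError:
        pass  # torch not installed; nothing to release on the GPU
    return removed

# Demonstration with a plain dict standing in for the notebook's globals()
namespace = {"model_large": object(), "tokenizer_large": object(), "other": 1}
print(release_names(namespace, ["model_large", "tokenizer_large", "not_defined"]))
print(sorted(namespace))
```

In the notebook itself the call would be `release_names(globals(), ["model_large", "tokenizer_large"])`.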


In [24]:
!nvidia-smi

Sat Feb 14 09:46:17 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.126.09             Driver Version: 580.126.09     CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:04:00.0 Off |                  Off |
| 30%   40C    P2            100W /  450W |    7702MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 2070 ...    Off |   00