# TSLA Earnings Analysis Using RAG and LangChain

## Project Overview: Tesla Earnings Analysis Agent (RAG + LangChain)

This notebook implements a lightweight **financial research agent** for analyzing **Tesla (TSLA) quarterly filings** using **Retrieval-Augmented Generation (RAG)**.

**What it does**
- Loads Tesla (TSLA) earnings PDFs, splits them into semantic chunks, embeds them, and stores them in a vector database (Chroma).
- Uses a retriever to fetch the most relevant filing evidence for a given question.
- Wraps retrieval as an agent tool (`search_tsla_docs`) and pairs it with a Python tool for **reproducible financial calculations** (e.g., revenue growth, margins, free cash flow, CAPEX).
- Produces a structured earnings analysis report along with a JSON output, both with **explicit source and page-level citations**.

**Why it matters**
- Grounds LLM-generated analysis directly in primary financial filings, reducing hallucinations.
- Ensures all numerical claims are auditable and computed programmatically.
- Generates portfolio-ready outputs: a human-readable report and a machine-friendly JSON summary.

![RAG - Indexing](https://mintcdn.com/langchain-5e9cc07a/I6RpA28iE233vhYX/images/rag_indexing.png?w=840&fit=max&auto=format&n=I6RpA28iE233vhYX&q=85&s=1838328a870c7353c42bf1cc2290a779)

Architecture diagram

```text
User Question
  ↓
LLM (Agent Brain)
  ↓ decides which tool(s) to use
Tools
  ├─ search_tsla_docs(query)  → retrieves evidence from TSLA filings
  └─ python(query/code)       → computes financial metrics (audit-ready)
  ↓
Final Answer
  ├─ Structured TSLA report
  └─ JSON summary (machine-readable)
```

> Key idea: Retrieval provides evidence, Python provides verifiable numbers, and the LLM provides reasoning + narrative.


# Packages and Environment

In [21]:
%%capture
!pip install -qU langchain
!pip install -qU langchain-openai
!pip install -qU langchain_community
!pip install -qU langchain_experimental
!pip install -qU "langchain-chroma>=0.1.2"
!pip install -qU chromadb
!pip install -qU pypdf
!pip install -qU python-dotenv

In [29]:
%%capture
!pip install -U \
  "langchain>=0.2.0" \
  "langchain-core>=0.2.0" \
  "langchain-community>=0.2.0" \
  "langchain-openai>=0.2.0" \
  "langchain-text-splitters>=0.2.0" \
  chromadb \
  pydantic \
  pypdf \
  python-dotenv

# Environment & Config

In [32]:
import os

from dotenv import load_dotenv

load_dotenv()  # automatically reads variables from .env
assert os.getenv("OPENAI_API_KEY") is not None, "OPENAI_API_KEY is not set"
# os.environ["OPENAI_API_KEY"]
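
A bare `assert` is skipped when Python runs with `-O`, so a small helper that raises explicitly can be a safer way to check configuration. The sketch below is illustrative only; `require_env` is a hypothetical helper name, not part of `python-dotenv` or LangChain.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in the shell."
        )
    return value
```

Calling `require_env("OPENAI_API_KEY")` right after `load_dotenv()` would then fail early with a readable message if the key is missing.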

# Download TSLA PDF

In [26]:
import requests
from pathlib import Path
# Load the Tesla PDF (10-Q / Shareholder Letter / Earnings PDF)
url = "https://ir.tesla.com/_flysystem/s3/sec/000162828025045968/tsla-20250930-gen.pdf"

pdf_dir = Path("data/tsla")
pdf_dir.mkdir(parents=True, exist_ok=True)

pdf_path = pdf_dir / Path(url).name  # use the last segment of the URL as the filename

# Download (safe to re-run: skipped if the file already exists)
if not pdf_path.exists():
    resp = requests.get(url, timeout=60)
    resp.raise_for_status()
    pdf_path.write_bytes(resp.content)

# Basic sanity check: the file should not be suspiciously small
size_mb = pdf_path.stat().st_size / (1024 * 1024)
print(f"PDF saved: {pdf_path} ({size_mb:.2f} MB)")

PDF saved: data/tsla/tsla-20250930-gen.pdf (0.31 MB)
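
The size printout only catches empty downloads by eye. A minimal sketch of a stricter check follows, assuming it is enough to confirm the PDF magic bytes at the start of the file; `looks_like_pdf` and the `min_bytes` threshold are hypothetical choices, not part of the original notebook.

```python
from pathlib import Path


def looks_like_pdf(path: Path, min_bytes: int = 1024) -> bool:
    """Heuristic: the file exists, is not tiny, and starts with the PDF magic bytes."""
    if not path.is_file() or path.stat().st_size < min_bytes:
        return False
    with path.open("rb") as fh:
        return fh.read(5) == b"%PDF-"
```

This would flag the common failure where a server returns an HTML error page that gets saved under a `.pdf` name.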


# Load PDF into LangChain Documents

In [27]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader(str(pdf_path))
docs = loader.load()  # docs holds one Document per page

# Enrich metadata (key for RAG)
for d in docs:  # this metadata is used later for citations
    d.metadata.update({
        "company": "TSLA",
        "source": url,
        "source_file": pdf_path.name
    })

print(f"Loaded {len(docs)} pages")
print("Example metadata:", docs[0].metadata)


Loaded 42 pages
Example metadata: {'producer': 'Qt 5.15.2', 'creator': 'wkhtmltopdf 0.12.6', 'creationdate': '2025-10-23T10:11:25+00:00', 'title': '', 'source': 'https://ir.tesla.com/_flysystem/s3/sec/000162828025045968/tsla-20250930-gen.pdf', 'total_pages': 42, 'page': 0, 'page_label': '1', 'company': 'TSLA', 'source_file': 'tsla-20250930-gen.pdf'}


# Split Documents into Chunks

In [28]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    add_start_index=True,  # records each chunk's start position in metadata; handy for debugging
)

splits = text_splitter.split_documents(docs)

print(f"Split into {len(splits)} chunks")
print("Example chunk meta:", splits[0].metadata)
print("Example chunk preview:", splits[0].page_content[:200])

Split into 183 chunks
Example chunk meta: {'producer': 'Qt 5.15.2', 'creator': 'wkhtmltopdf 0.12.6', 'creationdate': '2025-10-23T10:11:25+00:00', 'title': '', 'source': 'https://ir.tesla.com/_flysystem/s3/sec/000162828025045968/tsla-20250930-gen.pdf', 'total_pages': 42, 'page': 0, 'page_label': '1', 'company': 'TSLA', 'source_file': 'tsla-20250930-gen.pdf', 'start_index': 0}
Example chunk preview: UNITED	STATES
SECURITIES	AND	EXCHANGE	COMMISSION
Washington,	D.C.	20549
FORM	
10-Q
(Mark	One)
x
QUARTERLY	REPORT	PURSUANT	TO	SECTION	13	OR	15(d)	OF	THE	SECURITIES	EXCHANGE	ACT	OF	1934
For	the	quarterl
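
To make `chunk_size`, `chunk_overlap`, and `start_index` concrete, here is a deliberately simplified sketch that cuts text into fixed-size overlapping windows. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraph and line boundaries); `sliding_window_chunks` is a hypothetical helper used only for illustration.

```python
from typing import Dict, List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[Dict]:
    """Cut `text` into fixed-size windows that overlap by `chunk_overlap` characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append({"start_index": start, "text": text[start:start + chunk_size]})
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


demo = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
for chunk in demo:
    print(chunk)
```

Each window repeats the last two characters of the previous one, which is the same idea that lets a sentence straddling a chunk boundary remain retrievable from either side.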

# Create / Load VectorStore (Chroma) + Embed

In [29]:
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

embedding = OpenAIEmbeddings(model="text-embedding-3-large")

persist_dir = "./tsla_chroma"  # directory where the database is stored
collection_name = "tsla_earnings_pdf"  # a "table / namespace" inside the database

docsearch = Chroma(
    collection_name=collection_name,  # loads the collection if it exists, otherwise creates a new one
    persist_directory=persist_dir,
    embedding_function=embedding,
)

# Avoid duplicate writes: first check whether the collection already holds data
# Chroma has no uniform count API (it varies a lot between versions), so the simplest test is a small query
probe = docsearch.similarity_search("Tesla", k=1)

if len(probe) == 0:
    ids = docsearch.add_documents(splits)
    docsearch.persist()  # some versions require an explicit persist; newer Chroma versions persist automatically
    print(f"Indexed {len(ids)} chunks into Chroma")
else:
    print("Chroma already has vectors, skip indexing.")


Indexed 183 chunks into Chroma




```python
docsearch.add_documents(splits)
```

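This call does four things:

- 1️⃣ Calls OpenAI Embeddings on each `splits[i].page_content`
- 2️⃣ Produces one vector per chunk (3072 dimensions for `text-embedding-3-large`; 1536 for `text-embedding-3-small`)
- 3️⃣ Stores the vectors in the Chroma collection
- 4️⃣ Returns the `document_id` of each stored record (`ids`)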
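
The four steps can be mimicked with a toy in-memory store. This is only a sketch under loud assumptions: `toy_embed` is a bag-of-words stand-in, not an OpenAI embedding, and `ToyVectorStore` is a hypothetical class, not Chroma. It also derives ids from a content hash so that re-adding the same chunks overwrites rather than duplicates, which is one possible alternative to the probe-query check used above.

```python
import hashlib
import math
from collections import Counter
from typing import Dict, List, Tuple


def toy_embed(text: str) -> Counter:
    """Stand-in embedding: a bag-of-words count vector (NOT an OpenAI embedding)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    def __init__(self) -> None:
        self._rows: Dict[str, Tuple[str, Counter]] = {}

    def add_documents(self, texts: List[str]) -> List[str]:
        ids = []
        for text in texts:
            # Same text -> same id -> the existing row is overwritten, not duplicated.
            doc_id = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
            self._rows[doc_id] = (text, toy_embed(text))
            ids.append(doc_id)
        return ids

    def similarity_search(self, query: str, k: int = 1) -> List[str]:
        q = toy_embed(query)
        ranked = sorted(self._rows.values(), key=lambda row: cosine(q, row[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

    def __len__(self) -> int:
        return len(self._rows)


store = ToyVectorStore()
chunks = ["energy storage revenue increased", "legal proceedings and litigation"]
store.add_documents(chunks)
store.add_documents(chunks)  # re-running the cell does not grow the store
print(len(store), store.similarity_search("revenue increased", k=1))
```

A real embedding model captures meaning rather than exact word overlap, so this only illustrates the embed, store, return-ids, and search flow.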

# Similarity Search Sanity Check

In [30]:
query = "revenue growth"
results = docsearch.similarity_search(query, k=3)

print(f"Query: {query}")
print(f"Top results: {len(results)}")

for i, r in enumerate(results, 1):
    meta = r.metadata
    print(f"\n--- Result {i} ---")
    print(f"(source={meta.get('source_file')}, page={meta.get('page')})")
    print(r.page_content[:400])


Query: revenue growth
Top results: 3

--- Result 1 ---
(source=tsla-20250930-gen.pdf, page=31)
volume	and	insurance	business	revenue.
Energy	Generation	and	Storage	Segment
Energy	generation	and	storage	revenue	increased	$1.04	billion,	or	44%,	in	the	three	months	ended	September	30,	2025	as	compared	to	the	three
months	ended	September	30,	2024.	Energy	generation	and	storage	revenue	increased	$1.91	billion,	or	27%,	in	the	nine	months	ended	September	30,	2025	as
compared	to	the	nine	months	end

--- Result 2 ---
(source=tsla-20250930-gen.pdf, page=32)
the	nine	months	ended	September	30,	2024.	The	decreases	were	primarily	due	to	the	changes	in	automotive	sales	revenue	and	cost	of	automotive	sales
revenue,	as	discussed	above,	as	well	as	decreases	in	regulatory	credits	revenue.
33

--- Result 3 ---
(source=tsla-20250930-gen.pdf, page=25)
Revenues
$
3,415
	
$
2,376
	
$
8,934
	
$
7,025
	
Cost	of	revenues	(2)
$
2,342
	
$
1,651
	
$
6,230
	
$
5,157
	
Gross	profit
$
1,073
	
$
725
	
$
2,704
	
$
1,86

# Create Retriever (for RAG/Agent)

In [31]:
retriever = docsearch.as_retriever(search_kwargs={"k": 5})

test_query = "Robotaxi and FSD timeline"
docs_hit = retriever.invoke(test_query)

print(f"Retriever hits: {len(docs_hit)}")
for i, d in enumerate(docs_hit, 1):
    print(f"\n--- Hit {i} ---")
    print(d.page_content[:300])
    print(d.metadata)


Retriever hits: 5

--- Hit 1 ---
including	through	product	offerings	and	features	utilizing	artificial	intelligence	such	as	Autopilot,	FSD	(Supervised),	and	other	software,	and	delivering	new
vehicles	and	vehicle	options.	In	addition,	we	believe	the	launch	of	our	Robotaxi	service	unlocks	the	potential	for	significant	business	growt
{'page': 28, 'creationdate': '2025-10-23T10:11:25+00:00', 'company': 'TSLA', 'source': 'https://ir.tesla.com/_flysystem/s3/sec/000162828025045968/tsla-20250930-gen.pdf', 'total_pages': 42, 'producer': 'Qt 5.15.2', 'creator': 'wkhtmltopdf 0.12.6', 'page_label': '29', 'start_index': 2989, 'source_file': 'tsla-20250930-gen.pdf', 'title': ''}

--- Hit 2 ---
the	Second	Circuit	affirmed	the	lower	court’s	order	and	dismissed	the	case.	On	March	22,	2023,	the	plaintiffs	in	the	Northern	District	of	California
consolidated	action	filed	a	motion	for	a	preliminary	injunction	to	order	Tesla	to	(1)	cease	using	the	term	“Full	Self-Driving	Capability”	(FSD	Capabili
{'s

# Initialize OpenAI LLM

In [None]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-5.2",  # or gpt-4.1
    temperature=0
)

# Tools (search + python)

### 🧠 Role of this code in the agent architecture

| Layer | Status in this section |
|------|------------------|
| VectorStore | Done |
| Retriever | Done |
| Tool | ✅ This section |
| Agent Brain | Later, via `create_agent` |
| Reasoning | Handled by the LLM |


In [None]:
from typing import List
from langchain_core.tools import tool
from langchain_core.documents import Document
from langchain_experimental.tools import PythonREPLTool

python_tool = PythonREPLTool()

@tool("search_tsla_docs")  # registers this function as a tool the LLM can call
def search_tsla_docs(query: str) -> str:
    """
    Search TSLA earnings call transcripts, shareholder letters, and filings.
    Input: a natural language query
    Output: concatenated text with source/page metadata for evidence
    """
    # Run the retriever; the result is a List[Document]
    docs: List[Document] = retriever.invoke(query)  # retriever = docsearch.as_retriever() or ParentDocumentRetriever
    if not docs:
        return "No relevant documents found."

    # Format the results so they can be cited (source/page is the lifeblood of RAG)
    chunks = []
    for i, d in enumerate(docs, 1):  # enumerate yields (index, value) pairs, starting at 1
        meta = d.metadata or {}  # guard against metadata being None
        src = meta.get("source_file") or meta.get("source") or meta.get("source_id") or "unknown_source"  # source identifier
        page = meta.get("page", "NA")  # zero-based page index from the loader, or "NA" if missing
        text = (d.page_content or "").strip().replace("\n", " ")  # remove newlines
        chunks.append(f"[{i}] source={src} page={page}\n{text}")  # chunks is a list of strings

    return "\n\n".join(chunks)  # join chunks with blank lines into one string; tool output is text the LLM reads directly

# Tool set: give the LLM document search plus calculation so it acts as an earnings research agent that can verify and compute.
tools = [search_tsla_docs, python_tool]

The LLM sees the tool's return value in this format:
```text
[1] source=tsla-20250930-gen.pdf page=12
Tesla reported automotive gross margin...

[2] source=...
```
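
The metadata printed earlier shows `page` as a zero-based index (`'page': 28` alongside `'page_label': '29'`), so citations built from `page` are one lower than the PDF's page label. Below is a hedged sketch of a formatter that prefers `page_label`. `Hit` is a hypothetical stand-in for a LangChain `Document`, and switching the citation numbering is a design choice to weigh, not what the tool above does.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Hit:
    """Hypothetical stand-in for a retrieved LangChain Document."""
    page_content: str
    metadata: Dict = field(default_factory=dict)


def format_hits(hits: List[Hit]) -> str:
    """Render hits as numbered, citable blocks, using the page label when available."""
    blocks = []
    for i, hit in enumerate(hits, 1):
        meta = hit.metadata or {}
        src = meta.get("source_file") or meta.get("source") or "unknown_source"
        # Prefer page_label; otherwise convert the zero-based index to one-based.
        if meta.get("page_label"):
            page = str(meta["page_label"])
        elif isinstance(meta.get("page"), int):
            page = str(meta["page"] + 1)
        else:
            page = "NA"
        text = " ".join(hit.page_content.split())  # collapse tabs and newlines to single spaces
        blocks.append(f"[{i}] source={src} page={page}\n{text}")
    return "\n\n".join(blocks)


print(format_hits([
    Hit("Energy\tgeneration\nrevenue", {"source_file": "tsla-20250930-gen.pdf", "page": 28, "page_label": "29"}),
]))
```

Collapsing all whitespace also removes the tab characters that the PDF extraction leaves between words, which keeps the evidence text compact for the model.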

# Agent (create_agent)

In [None]:
from langchain.agents import create_agent

system_prompt = """
You are a financial research agent specialized in Tesla (TSLA).

Rules (MUST follow):
1) ALWAYS call the tool `search_tsla_docs` before making any factual/qualitative claim.
2) ALL numerical metrics (growth, margins, FCF, CAPEX) MUST be computed using the python tool.
3) Cite evidence as: (source=..., page=...)
4) If information is missing, explicitly say what is missing.
5) Output must follow the exact TSLA_TEMPLATE structure.
6) Do NOT invent numbers.

Tool usage:
- Use search_tsla_docs for evidence retrieval.
- Use python for calculations, tables, and any numeric derivations.

When you cite, cite the exact page you used.
"""

agent = create_agent(
    model=llm,            # OpenAI Chat model instance
    tools=tools,
    system_prompt=system_prompt
)

# TSLA_TEMPLATE (define once)

In [22]:
TSLA_TEMPLATE = """
Produce a TSLA earnings analysis using the following structure:

Company: Tesla, Inc. (TSLA)
Period: Latest reported quarter (based on the provided documents)

Sections:
1. One-Line Verdict (Bull / Bear / Mixed)
2. Key Highlights (with citations)
3. Key Risks (with citations)
4. Financial Performance Snapshot (table)
   - Revenue, YoY, QoQ
   - Gross Margin, Operating Margin
   - Free Cash Flow (FCF), CAPEX
5. AI, Autonomy & Product Roadmap (with citations)
6. Forward-Looking Catalysts (with citations if mentioned)
7. Market Focus: what investors should watch next quarter
8. Open Questions (explicitly list missing info)
9. Citations (a clean list)

Also output a JSON object at the end, wrapped in a ```json code block``` with:
- verdict
- highlights
- risks
- financials (revenue growth, margins, FCF, CAPEX)
- ai_autonomy
- citations
"""

# Demo Query (run once)

In [23]:
from langchain_core.messages import HumanMessage

query = f"""
Analyze TSLA's latest earnings based ONLY on the indexed TSLA PDF(s).

{TSLA_TEMPLATE}

Questions to answer:
- What are the main positives and negatives this quarter?
- How are margins and free cash flow trending?
- What did management say about FSD, Robotaxi, and AI investment?
- What metrics will investors focus on next quarter?

Important:
- Use `search_tsla_docs` to retrieve evidence before claims.
- Use python tool for calculations (no mental math).
- Always cite (source=..., page=...) for qualitative statements.
"""

result = agent.invoke({"messages": [HumanMessage(content=query)]})


# Clean Output (Final report + JSON block)

In [24]:
# The final answer is usually the last message
final_text = result["messages"][-1].content

print(final_text)

Company: Tesla, Inc. (TSLA)  
Period: **Q3 2025 (three months ended September 30, 2025)** (source=tsla-20250930-gen.pdf, page=30)

---

## 1. One-Line Verdict (Bull / Bear / Mixed)
**Mixed** - Revenue grew YoY, but **gross/operating margins declined materially** and key cash-flow items needed for quarterly FCF are **not disclosed** in the provided PDF. (source=tsla-20250930-gen.pdf, page=30) (source=tsla-20250930-gen.pdf, page=32)

---

## 2. Key Highlights (with citations)
- **Total revenue increased to $28.10B (+12% YoY)** in Q3 2025. (source=tsla-20250930-gen.pdf, page=30)  
- **Energy generation & storage revenue grew 44% YoY** (to $3.42B), attributed to higher Megapack/Powerwall deployments (with some offset from lower Megapack ASP). (source=tsla-20250930-gen.pdf, page=30) (source=tsla-20250930-gen.pdf, page=31)  
- **Services & other revenue grew 25% YoY** (to $3.48B), driven by used vehicle sales volume, paid Supercharging sessions, maintenance/collision revenue, and insurance
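
Because the system prompt requires citations in the form `(source=..., page=...)`, the report can be checked mechanically. A minimal sketch, assuming citations follow that exact pattern; `extract_citations` is a hypothetical helper.

```python
import re
from typing import List, Tuple

CITATION_RE = re.compile(r"\(source=([^,()]+),\s*page=(\d+)\)")


def extract_citations(report: str) -> List[Tuple[str, int]]:
    """Return the unique (source, page) pairs cited in `report`, in first-seen order."""
    seen: List[Tuple[str, int]] = []
    for src, page in CITATION_RE.findall(report):
        pair = (src.strip(), int(page))
        if pair not in seen:
            seen.append(pair)
    return seen


sample = (
    "Total revenue increased. (source=tsla-20250930-gen.pdf, page=30) "
    "Energy grew. (source=tsla-20250930-gen.pdf, page=30) "
    "(source=tsla-20250930-gen.pdf, page=31)"
)
print(extract_citations(sample))
# → [('tsla-20250930-gen.pdf', 30), ('tsla-20250930-gen.pdf', 31)]
```

The resulting pairs could then be compared against the indexed page range (42 pages here) to catch citations that point outside the document.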

## (Optional) Automatically extract the JSON block for display

In [25]:
import re

final_text = result["messages"][-1].content

# Try to capture the ```json ... ``` block
m = re.search(r"```json\s*(\{.*?\})\s*```", final_text, flags=re.DOTALL)
json_block = m.group(1) if m else None

print("===== FINAL REPORT =====")
print(final_text)

print("\n===== EXTRACTED JSON =====")
if json_block:
    print(json_block)
else:
    print("No JSON block found. (The model may not have produced a json code block.)")


===== FINAL REPORT =====
Company: Tesla, Inc. (TSLA)  
Period: **Q3 2025 (three months ended September 30, 2025)** (source=tsla-20250930-gen.pdf, page=30)

---

## 1. One-Line Verdict (Bull / Bear / Mixed)
**Mixed** - Revenue grew YoY, but **gross/operating margins declined materially** and key cash-flow items needed for quarterly FCF are **not disclosed** in the provided PDF. (source=tsla-20250930-gen.pdf, page=30) (source=tsla-20250930-gen.pdf, page=32)

---

## 2. Key Highlights (with citations)
- **Total revenue increased to $28.10B (+12% YoY)** in Q3 2025. (source=tsla-20250930-gen.pdf, page=30)  
- **Energy generation & storage revenue grew 44% YoY** (to $3.42B), attributed to higher Megapack/Powerwall deployments (with some offset from lower Megapack ASP). (source=tsla-20250930-gen.pdf, page=30) (source=tsla-20250930-gen.pdf, page=31)  
- **Services & other revenue grew 25% YoY** (to $3.48B), driven by used vehicle sales volume, paid Supercharging sessions, maintenance/collisi
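
Extracting the block as a string does not guarantee it is valid JSON or that it contains the fields requested in `TSLA_TEMPLATE`. The sketch below parses the block and reports missing keys. It assumes the model uses the template's field names verbatim as top-level keys, which the prompt does not strictly enforce; `parse_report_json` and `missing_keys` are hypothetical helpers.

```python
import json
import re
from typing import Optional

FENCE = "`" * 3  # built programmatically so this example nests cleanly in Markdown
JSON_BLOCK_RE = re.compile(FENCE + r"json\s*(\{.*?\})\s*" + FENCE, flags=re.DOTALL)
REQUIRED_KEYS = {"verdict", "highlights", "risks", "financials", "ai_autonomy", "citations"}


def parse_report_json(text: str) -> Optional[dict]:
    """Parse the first json-fenced block in `text`; return None if absent or invalid."""
    match = JSON_BLOCK_RE.search(text)
    if not match:
        return None
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None


def missing_keys(data: dict) -> set:
    """Template fields that are not present as top-level keys."""
    return REQUIRED_KEYS - set(data)


sample = "Report text\n" + FENCE + 'json\n{"verdict": "Mixed", "highlights": []}\n' + FENCE
data = parse_report_json(sample)
print(data, sorted(missing_keys(data)))
```

Returning `None` for both the missing and the malformed case keeps the caller's handling simple; a stricter variant might raise instead so the failure reason is not lost.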