# Automotive Equity Research: A Multi-Step Agentic Workflow

<a href="https://colab.research.google.com/github/run-llama/llama_cloud_services-demo/blob/main/examples/extract/automotive_sector_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook demonstrates an end‑to‑end agentic workflow using LlamaExtract and the LlamaIndex event‑driven workflow framework for automotive sector analysis.

In this workflow, we:
1. **Extract** key financial metrics from Q2 2024 earnings reports for Tesla and Ford.
2. **Generate** a preliminary financial model summary for each company using an LLM.
3. **Cross‑reference** Tesla's metrics with Ford's data to produce a final equity research memo.
4. **Output** the memo as structured JSON.

This workflow is designed for equity research analysts and investment professionals.

In [None]:
# Download the Q2 2024 earnings PDFs for Tesla and Ford from the companies' investor-relations sites
!mkdir -p data/automotive_sector_analysis
!wget https://digitalassets.tesla.com/tesla-contents/image/upload/IR/TSLA-Q2-2024-Update.pdf -O data/automotive_sector_analysis/tesla_q2_earnings.pdf
!wget https://s205.q4cdn.com/882619693/files/doc_financials/2024/q2/Q2-2024-Ford-Earnings-Press-Release.pdf -O data/automotive_sector_analysis/ford_q2_earnings_press_release.pdf
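
Because `wget` saves whatever the server returns, a failed download can leave an HTML error page under a `.pdf` name. A minimal sanity check, assuming only that a valid PDF begins with the `%PDF-` signature; the `looks_like_pdf` helper is illustrative and not part of the original notebook:

```python
from pathlib import Path


def looks_like_pdf(path: str) -> bool:
    """Return True if the file exists and starts with the PDF signature."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        return f.read(5) == b"%PDF-"


for pdf_path in [
    "data/automotive_sector_analysis/tesla_q2_earnings.pdf",
    "data/automotive_sector_analysis/ford_q2_earnings_press_release.pdf",
]:
    status = "ok" if looks_like_pdf(pdf_path) else "missing or not a PDF"
    print(f"{pdf_path}: {status}")
```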

## Define the Output Schema

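We define two sets of schemas. `InitialFinancialDataOutput` (with the nested `RawFinancials`) describes what LlamaExtract pulls from each earnings document: company name, ticker, report date, raw financial metrics, and optional narrative. `FinalEquityResearchMemoOutput` represents the final equity research memo: a financial model summary for each company plus a comparative analysis with an overall recommendation.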

In [None]:
from pydantic import BaseModel, Field
from typing import Optional


class RawFinancials(BaseModel):
    revenue: Optional[float] = Field(
        None, description="Extracted revenue (in million USD)"
    )
    operating_income: Optional[float] = Field(
        None, description="Extracted operating income (in million USD)"
    )
    eps: Optional[float] = Field(None, description="Extracted earnings per share")
    # Add more metrics as needed


class InitialFinancialDataOutput(BaseModel):
    company_name: str = Field(
        ..., description="Company name as extracted from the earnings deck"
    )
    ticker: str = Field(..., description="Stock ticker symbol")
    report_date: str = Field(..., description="Date of the earnings deck/report")
    raw_financials: RawFinancials = Field(
        ..., description="Structured raw financial metrics"
    )
    narrative: Optional[str] = Field(
        None, description="Additional narrative content (if any)"
    )
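
The field descriptions ask for values in million USD, while earnings releases often quote headline figures in billions (the run recorded at the end of this notebook shows `revenue=47.8` for Ford's $47.8 billion). One possible guard is to normalize stated amounts before modeling. The sketch below is an assumption rather than part of the notebook: `to_million_usd` and its unit table are hypothetical helpers.

```python
import re
from decimal import Decimal

# Multipliers that convert a stated unit into million USD (illustrative table)
_UNIT_TO_MILLIONS = {
    "thousand": Decimal("0.001"),
    "million": Decimal("1"),
    "billion": Decimal("1000"),
}

_AMOUNT_RE = re.compile(
    r"\$?\s*([0-9][0-9,]*(?:\.[0-9]+)?)\s*(thousand|million|billion)",
    re.IGNORECASE,
)


def to_million_usd(text: str) -> float:
    """Parse an amount such as '$47.8 billion' and return it in million USD."""
    match = _AMOUNT_RE.search(text)
    if match is None:
        raise ValueError(f"No amount with a unit found in: {text!r}")
    number = Decimal(match.group(1).replace(",", ""))
    unit = match.group(2).lower()
    # Decimal arithmetic avoids float rounding before the final conversion
    return float(number * _UNIT_TO_MILLIONS[unit])


print(to_million_usd("revenue of $47.8 billion"))
print(to_million_usd("costs down ~$400 million"))
```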

In [None]:
from pydantic import BaseModel, Field


# Define the structured output schema for each company's financial model
class FinancialModelOutput(BaseModel):
    revenue_projection: float = Field(
        ..., description="Projected revenue for next year (in million USD)"
    )
    operating_income_projection: float = Field(
        ..., description="Projected operating income for next year (in million USD)"
    )
    growth_rate: float = Field(..., description="Expected revenue growth rate (%)")
    discount_rate: float = Field(
        ..., description="Discount rate (%) used for valuation"
    )
    terminal_growth_rate: float = Field(
        ..., description="Terminal growth rate (%) used in the model"
    )
    valuation_estimate: float = Field(
        ..., description="Estimated enterprise value (in million USD)"
    )
    key_assumptions: str = Field(
        ..., description="Key assumptions such as tax rate, CAPEX ratio, etc."
    )
    summary: str = Field(
        ..., description="A brief summary of the preliminary financial model analysis."
    )


class ComparativeAnalysisOutput(BaseModel):
    comparative_analysis: str = Field(
        ..., description="Comparative analysis between Company A and Company B"
    )
    overall_recommendation: str = Field(
        ..., description="Overall investment recommendation with rationale"
    )


# Define the final equity research memo schema, which aggregates the outputs for Company A and B
class FinalEquityResearchMemoOutput(BaseModel):
    company_a_model: FinancialModelOutput = Field(
        ..., description="Financial model summary for Company A"
    )
    company_b_model: FinancialModelOutput = Field(
        ..., description="Financial model summary for Company B"
    )
    comparative_analysis: ComparativeAnalysisOutput = Field(
        ..., description="Comparative analysis between Company A and Company B"
    )


print("FinalEquityResearchMemoOutput schema defined.")

FinalEquityResearchMemoOutput schema defined.
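
The `discount_rate`, `terminal_growth_rate`, and `valuation_estimate` fields are filled in by the LLM, and the notebook does not show how the valuation is computed. For orientation, one common way these quantities relate is the Gordon growth formula for terminal value, `TV = CF * (1 + g) / (r - g)`. The sketch below illustrates that relationship under that assumption; `terminal_value` is a hypothetical helper, and the cash flow input is purely illustrative.

```python
def terminal_value(
    final_year_cash_flow: float,
    discount_rate_pct: float,
    terminal_growth_pct: float,
) -> float:
    """Gordon growth terminal value, with both rates given in percent."""
    r = discount_rate_pct / 100.0
    g = terminal_growth_pct / 100.0
    if r <= g:
        raise ValueError("Discount rate must exceed terminal growth rate")
    # Cash flow one year past the forecast horizon, capitalized at (r - g)
    return final_year_cash_flow * (1.0 + g) / (r - g)


# Illustrative inputs only: 8% discount rate and 2% terminal growth
tv = terminal_value(
    final_year_cash_flow=100.0,
    discount_rate_pct=8.0,
    terminal_growth_pct=2.0,
)
print(round(tv, 2))
```

The guard on `r <= g` matters because the formula diverges or turns negative when the terminal growth rate reaches the discount rate.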


## Initialize the Extraction Agent

We create (or replace) an extraction agent using our automotive sector analysis schema.

In [None]:
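from dotenv import load_dotenv
from llama_cloud_services import LlamaExtract
from llama_cloud.core.api_error import ApiError
from llama_cloud import ExtractConfig

# Load API keys (e.g. LLAMA_CLOUD_API_KEY, OPENAI_API_KEY) from a local .env file, if present
load_dotenv()

# Replace these IDs with the project and organization from your own LlamaCloud account
llama_extract = LlamaExtract(
    project_id="2fef999e-1073-40e6-aeb3-1f3c0e64d99b",
    organization_id="43b88c8f-e488-46f6-9013-698e3d2e374a",
)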

try:
    existing_agent = llama_extract.get_agent(name="automotive-sector-analysis")
    if existing_agent:
        llama_extract.delete_agent(existing_agent.id)
except ApiError as e:
    if e.status_code == 404:
        pass
    else:
        raise

extract_config = ExtractConfig(
    extraction_mode="BALANCED"
    # extraction_mode="MULTIMODAL"
)

agent = llama_extract.create_agent(
    name="automotive-sector-analysis",
    data_schema=InitialFinancialDataOutput,
    config=extract_config,
)
print("Automotive sector analysis extraction agent created.")

Automotive sector analysis extraction agent created.


## Define the Workflow

This workflow analyzes Q2 2024 earnings reports for two major automotive companies:

- **Tesla (TSLA)**: Focus on electric vehicles, energy storage, and regulatory credits
- **Ford (F)**: Traditional automotive manufacturer with growing EV segment

Key metrics extracted and analyzed:
- Revenue and revenue projections
- Operating income
- Growth rates
- Valuation estimates
- Key business segment performance

In this workflow, the steps are:
1. **parse_deck_a / parse_deck_b:** Run the extraction agent on each company's earnings PDF to get structured financial data (`InitialFinancialDataOutput`).
2. **refine_financial_model_company_a / refine_financial_model_company_b:** Combine the extracted data with the modeling assumptions loaded from a text file and have the LLM produce a `FinancialModelOutput` for each company.
3. **cross_reference_models:** Once both models are available, have the LLM compare Company A (Tesla) with Company B (Ford), then assemble the final equity research memo and return it.
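
The workflow defined below runs the two company branches independently and joins them in a final comparison step. A minimal sketch of that fan-out and fan-in shape, using plain `asyncio` rather than the LlamaIndex workflow API; the `parse`, `model`, and `compare` coroutines are hypothetical stand-ins that return placeholder strings:

```python
import asyncio


async def parse(deck_path: str) -> str:
    # Stand-in for extraction: pretend to read the deck
    await asyncio.sleep(0)
    return f"parsed({deck_path})"


async def model(parsed: str) -> str:
    # Stand-in for the LLM call that builds a financial model
    await asyncio.sleep(0)
    return f"model({parsed})"


async def branch(deck_path: str) -> str:
    # One company's branch: parse, then model
    return await model(await parse(deck_path))


async def compare(deck_a: str, deck_b: str) -> dict:
    # Fan out: both branches run concurrently; fan in: wait for both results
    model_a, model_b = await asyncio.gather(branch(deck_a), branch(deck_b))
    return {"company_a": model_a, "company_b": model_b}


memo = asyncio.run(compare("tesla.pdf", "ford.pdf"))
print(memo)
```

Inside a notebook, where an event loop is already running, `await compare(...)` would replace the `asyncio.run` call.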

In [None]:
from llama_index.core.workflow import (
    Event,
    StartEvent,
    StopEvent,
    Context,
    Workflow,
    step,
)
from llama_index.llms.openai import OpenAI
from llama_index.core.llms.llm import LLM
from llama_index.core.prompts import ChatPromptTemplate


# Define custom events for each step
class DeckAParseEvent(Event):
    deck_content: InitialFinancialDataOutput


class DeckBParseEvent(Event):
    deck_content: InitialFinancialDataOutput


class CompanyModelEvent(Event):
    model_output: FinancialModelOutput


class ComparableDataLoadEvent(Event):
    company_a_output: FinancialModelOutput
    company_b_output: FinancialModelOutput


class LogEvent(Event):
    msg: str
    delta: bool = False


class AutomotiveSectorAnalysisWorkflow(Workflow):
    """
    Workflow to generate an equity research memo for automotive sector analysis.
    """

    def __init__(
        self,
        agent: LlamaExtract,
        modeling_path: str,
        llm: Optional[LLM] = None,
        **kwargs
    ):
        super().__init__(**kwargs)
        self.agent = agent
        self.llm = llm or OpenAI(model="o3-mini")
        # Load financial modeling assumptions from file
        with open(modeling_path, "r") as f:
            self.modeling_data = f.read()
        # Comparable (Company B) data is not loaded here; it is extracted from a PDF in parse_deck_b

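    async def _parse_deck(
        self, ctx: Context, deck_path: str
    ) -> InitialFinancialDataOutput:
        # Run the extraction agent on the PDF, then validate the extracted
        # fields against the schema so downstream steps receive a typed model
        extraction_result = await self.agent.aextract(deck_path)
        initial_output = InitialFinancialDataOutput.model_validate(
            extraction_result.data
        )
        ctx.write_event_to_stream(LogEvent(msg="Earnings deck parsed successfully."))
        return initial_output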

    @step
    async def parse_deck_a(self, ctx: Context, ev: StartEvent) -> DeckAParseEvent:
        initial_output = await self._parse_deck(ctx, ev.deck_path_a)
        await ctx.set("initial_output_a", initial_output)
        return DeckAParseEvent(deck_content=initial_output)

    @step
    async def parse_deck_b(self, ctx: Context, ev: StartEvent) -> DeckBParseEvent:
        initial_output = await self._parse_deck(ctx, ev.deck_path_b)
        await ctx.set("initial_output_b", initial_output)
        return DeckBParseEvent(deck_content=initial_output)

    async def _generate_financial_model(
        self, ctx: Context, financial_data: InitialFinancialDataOutput
    ) -> FinancialModelOutput:
        prompt_str = """
    You are an expert financial analyst.
    Using the following raw financial data from an earnings deck and financial modeling assumptions,
    refine the data to produce a financial model summary. Adjust the assumptions based on the company-specific context.
    Please use the most recent quarter's financial data from the earnings deck.

    Raw Financial Data:
    {raw_data}
    Financial Modeling Assumptions:
    {assumptions}

    Return your output as JSON conforming to the FinancialModelOutput schema.
    You MUST make sure all fields are filled in the output JSON.

    """
        prompt = ChatPromptTemplate.from_messages([("user", prompt_str)])
        refined_model = await self.llm.astructured_predict(
            FinancialModelOutput,
            prompt,
            raw_data=financial_data.model_dump_json(),
            assumptions=self.modeling_data,
        )
        return refined_model

    @step
    async def refine_financial_model_company_a(
        self, ctx: Context, ev: DeckAParseEvent
    ) -> CompanyModelEvent:
        print("deck content A", ev.deck_content)
        refined_model = await self._generate_financial_model(ctx, ev.deck_content)
        print("refined_model A", refined_model)
        print(type(refined_model))
        await ctx.set("CompanyAModelEvent", refined_model)
        return CompanyModelEvent(model_output=refined_model)

    @step
    async def refine_financial_model_company_b(
        self, ctx: Context, ev: DeckBParseEvent
    ) -> CompanyModelEvent:
        print("deck content B", ev.deck_content)
        refined_model = await self._generate_financial_model(ctx, ev.deck_content)
        print("refined_model B", refined_model)
        print(type(refined_model))
        await ctx.set("CompanyBModelEvent", refined_model)
        return CompanyModelEvent(model_output=refined_model)

    @step
    async def cross_reference_models(
        self, ctx: Context, ev: CompanyModelEvent
    ) -> Optional[StopEvent]:
        # This step fires once per CompanyModelEvent. Proceed only when both
        # models are in the context; otherwise return None and wait for the other.
        company_a_model = await ctx.get("CompanyAModelEvent", default=None)
        company_b_model = await ctx.get("CompanyBModelEvent", default=None)
        if company_a_model is None or company_b_model is None:
            return None

        prompt_str = """
    You are an expert investment analyst.
    Compare the following refined financial models for Company A and Company B.
    Based on this comparison, provide a specific investment recommendation for Tesla (Company A).
    Focus your analysis on:
    1. Key differences in revenue projections, operating income, and growth rates
    2. Valuation estimates and their implications
    3. Clear recommendation for Tesla with supporting rationale
    Return your output as JSON conforming to the ComparativeAnalysisOutput schema.
    Company A Model:
    {company_a_model}
    Company B Model:
    {company_b_model}
    """
        prompt = ChatPromptTemplate.from_messages([("user", prompt_str)])
        comp_analysis = await self.llm.astructured_predict(
            ComparativeAnalysisOutput,
            prompt,
            company_a_model=company_a_model.model_dump_json(),
            company_b_model=company_b_model.model_dump_json(),
        )
        final_memo = FinalEquityResearchMemoOutput(
            company_a_model=company_a_model,
            company_b_model=company_b_model,
            comparative_analysis=comp_analysis,
        )
        return StopEvent(result={"memo": final_memo})
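
`cross_reference_models` receives a `CompanyModelEvent` from each branch, so it runs twice; a call that finds only one model in the context returns without emitting anything. A toy illustration of that join logic, with a plain dictionary standing in for the workflow context. The `on_model_ready` function and `store` dict are hypothetical, not LlamaIndex APIs, and the toy version merges the `ctx.set` and `ctx.get` calls into one function:

```python
from typing import Optional


def on_model_ready(store: dict, key: str, model: str) -> Optional[dict]:
    """Record one branch's model; return the joined result once both exist."""
    store[key] = model
    a = store.get("company_a")
    b = store.get("company_b")
    if a is None or b is None:
        return None  # the other branch has not finished yet
    return {"company_a": a, "company_b": b}


store: dict = {}
first = on_model_ready(store, "company_b", "ford-model")
second = on_model_ready(store, "company_a", "tesla-model")
print(first)
print(second)
```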




## Running the Workflow

Now we run the workflow on the decks from both companies. The workflow constructor reads the modeling assumptions from `modeling_assumptions.txt`, so that file must exist at the path given below.

In [None]:
import nest_asyncio

nest_asyncio.apply()
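
The workflow constructor reads `modeling_assumptions.txt`, which none of the earlier cells create. If you do not have that file, a placeholder along these lines lets the workflow run. Its contents are hypothetical: the three rates mirror the values mentioned in the recorded output further down, and they are not the original file. The `ensure_assumptions_file` helper is likewise an illustrative name.

```python
from pathlib import Path

# Hypothetical placeholder content; replace with your own assumptions
PLACEHOLDER_ASSUMPTIONS = """\
Revenue growth rate (next year): 10%
Discount rate: 8%
Terminal growth rate: 2%
"""


def ensure_assumptions_file(path: str) -> Path:
    """Create the assumptions file with placeholder text if it is missing."""
    target = Path(path)
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(PLACEHOLDER_ASSUMPTIONS, encoding="utf-8")
    return target
```

Calling `ensure_assumptions_file("./data/automotive_sector_analysis/modeling_assumptions.txt")` before constructing the workflow leaves an existing file untouched and only writes the placeholder when nothing is there.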

In [None]:
modeling_path = "./data/automotive_sector_analysis/modeling_assumptions.txt"
workflow = AutomotiveSectorAnalysisWorkflow(
    agent=agent, modeling_path=modeling_path, verbose=True, timeout=240
)

#### Visualize the Workflow

![](data/automotive_sector_analysis/workflow_img.png)

In [None]:
from llama_index.utils.workflow import draw_all_possible_flows

draw_all_possible_flows(
    AutomotiveSectorAnalysisWorkflow,
    filename="automotive_sector_analysis_workflow.html",
)

<class 'NoneType'>
<class 'llama_index.core.workflow.events.StopEvent'>
<class '__main__.DeckAParseEvent'>
<class '__main__.DeckBParseEvent'>
<class '__main__.CompanyModelEvent'>
<class '__main__.CompanyModelEvent'>
automotive_sector_analysis_workflow.html


In [None]:
result = await workflow.run(
    deck_path_a="./data/automotive_sector_analysis/tesla_q2_earnings.pdf",
    deck_path_b="./data/automotive_sector_analysis/ford_q2_earnings_press_release.pdf",
)
final_memo = result["memo"]
# Print the memo as structured JSON
print("\n********Final Equity Research Memo:********\n")
print(final_memo.model_dump_json(indent=2))

Running step parse_deck_a
Running step parse_deck_b
Step parse_deck_b produced event DeckBParseEvent
Running step refine_financial_model_company_b
deck content B company_name='Ford Motor Company' ticker='F' report_date='July 24, 2024' raw_financials=RawFinancials(revenue=47.8, operating_income=2.8, eps=0.46) narrative='Ford reports second-quarter revenue of $47.8 billion, net income of $1.8 billion and adjusted EBIT of $2.8 billion. Ford Pro posts quarterly EBIT of $2.6 billion – a 15% margin – on 9% revenue gain; customers buying every Super Duty truck and Transit van the company can make. Ford Blue hybrid sales up 34%, represent nearly 9% of company’s global vehicle mix; Ford Model e costs down ~$400 million. Expectations for full-year 2024 adjusted EBIT unchanged at $10 billion to $12 billion; adjusted free cash flow outlook raised $1 billion, to between $7.5 billion and $8.5 billion.'


Step parse_deck_a produced event DeckAParseEvent
Running step refine_financial_model_company_a
deck content A company_name='Tesla' ticker='TSLA' report_date='Q2 2024' raw_financials=RawFinancials(revenue=25.5, operating_income=1.6, eps=0.42) narrative='In Q2, we achieved record quarterly revenues despite a difficult operating environment. The Energy Storage business continues to grow rapidly, setting a record in Q2 with 9.4 GWh of deployments, resulting in record revenues and gross profits for the overall segment. We also saw a sequential rebound in vehicle deliveries in Q2 as overall consumer sentiment improved and we launched attractive financing options to offset the impact of sustained high interest rates. We recognized record regulatory credit revenues in Q2 as other OEMs are still behind on meeting emissions requirements. Global EV penetration returned to growth in Q2 and is taking share from ICE vehicles. We believe that a pure EV is the optimal vehicle design and will ultimatel

In [None]:
final_memo.comparative_analysis

ComparativeAnalysisOutput(comparative_analysis='Comparing the two refined models reveals several key differences. Company A’s (Tesla’s) model is based on a much smaller revenue base (annualized quarterly revenue of approximately $102 million, growing to $112.2 million with a 10% growth assumption) compared to Company B’s (Ford’s) model, which scales from a considerably larger quarterly figure to a projected annual revenue of roughly $210.32 billion. In Tesla’s model, the operating income projection is about $7.03 million, derived from maintaining a consistent operating margin from a strong Q2 performance. In contrast, Ford’s operating income is projected at about $12.32 billion, reflecting its vast operational scale and established footprint. \n\nBoth models assume similar growth rates (10% next year) and employ the same discount (8%) and terminal growth (2%) rates, but the underlying business model assumptions (e.g., CAPEX, working capital, and depreciation percentages) are applied to