# Solar Panel Datasheet Comparison Workflow

<a href="https://colab.research.google.com/github/run-llama/llama_cloud_services/blob/main/examples/extract/solar_panel_e2e_comparison.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


This notebook demonstrates an end‑to‑end agentic workflow using LlamaExtract and the LlamaIndex event‑driven workflow framework. In this workflow, we:

1. **Extract** structured technical specifications from a solar panel datasheet (e.g. a PDF downloaded from a vendor).
2. **Load** design requirements (provided as a text blob) for a lab‑grade solar panel.
3. **Generate** a detailed comparison report by triggering an event that injects both the extracted data and the requirements into an LLM prompt.

The workflow is designed for renewable energy engineers who need to quickly validate that a solar panel meets specific design criteria.

This notebook uses the event‑driven workflow syntax (custom events, steps, and a workflow class), adapted from the technical datasheet and contract review examples.

## Setup and Load Data

We download the [Honey M TSM-DE08M.08(II) datasheet](https://static.trinasolar.com/sites/default/files/EU_Datasheet_HoneyM_DE08M.08%28II%29_2021_A.pdf) as a PDF.

**NOTE**: The design requirements are already stored in `data/solar_panel_e2e_comparison/design_reqs.txt`.

In [None]:
!wget https://static.trinasolar.com/sites/default/files/EU_Datasheet_HoneyM_DE08M.08%28II%29_2021_A.pdf -O data/solar_panel_e2e_comparison/datasheet.pdf --no-check-certificate

--2025-04-01 14:47:56--  https://static.trinasolar.com/sites/default/files/EU_Datasheet_HoneyM_DE08M.08%28II%29_2021_A.pdf
Resolving static.trinasolar.com (static.trinasolar.com)... 47.246.23.232, 47.246.23.234, 47.246.23.227, ...
Connecting to static.trinasolar.com (static.trinasolar.com)|47.246.23.232|:443... connected.
  Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response... 200 OK
Length: 1888183 (1.8M) [application/pdf]
Saving to: ‘data/solar_panel_e2e_comparison/datasheet.pdf’


2025-04-01 14:47:56 (7.47 MB/s) - ‘data/solar_panel_e2e_comparison/datasheet.pdf’ saved [1888183/1888183]
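
Because the download skips certificate verification, a quick sanity check on the saved file can catch an HTML error page saved under a `.pdf` name. The sketch below is an optional addition, not part of the original notebook; `looks_like_pdf` is a hypothetical helper that only checks the leading magic bytes, not whether the PDF is fully valid.

```python
from pathlib import Path


def looks_like_pdf(path: str) -> bool:
    """Return True if the file exists and starts with the PDF magic bytes."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        # Every PDF file begins with the ASCII marker "%PDF-".
        return f.read(5) == b"%PDF-"


# Path taken from the wget command above.
print(looks_like_pdf("data/solar_panel_e2e_comparison/datasheet.pdf"))
```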



## Define the Structured Extraction Schema

We define a new schema called `SolarPanelSchema` to capture key technical details from the datasheet. It is built from three models:

- **PowerRange:** Minimum and maximum power output (in Watts), plus the unit.
- **SolarPanelSpec:** Module name, power output range, maximum efficiency, temperature coefficient, certifications, and a mapping of each extracted field to its page citations.
- **SolarPanelSchema:** The top-level wrapper holding a list of `SolarPanelSpec` entries.

This schema replaces the earlier LM317 schema and is passed to the extraction agent when we create it.

In [None]:
from pydantic import BaseModel, Field
from typing import List


class PowerRange(BaseModel):
    min_power: float = Field(..., description="Minimum power output in Watts")
    max_power: float = Field(..., description="Maximum power output in Watts")
    unit: str = Field("W", description="Power unit")


class SolarPanelSpec(BaseModel):
    module_name: str = Field(..., description="Name or model of the solar panel module")
    power_output: PowerRange = Field(..., description="Power output range")
    maximum_efficiency: float = Field(
        ..., description="Maximum module efficiency in percentage"
    )
    temperature_coefficient: float = Field(
        ..., description="Temperature coefficient in %/°C"
    )
    certifications: List[str] = Field([], description="List of certifications")
    page_citations: dict = Field(
        ..., description="Mapping of each extracted field to its page numbers"
    )


class SolarPanelSchema(BaseModel):
    specs: List[SolarPanelSpec] = Field(
        ..., description="List of extracted solar panel specifications"
    )

## Initialize Extraction Agent

Here we initialize the extraction agent, which extracts data matching the schema from the solar panel datasheet.

In [None]:
from llama_cloud_services import LlamaExtract
from llama_cloud.core.api_error import ApiError
from llama_cloud import ExtractConfig

# Initialize the LlamaExtract client.
# Replace the project and organization IDs below with your own.
llama_extract = LlamaExtract(
    project_id="2fef999e-1073-40e6-aeb3-1f3c0e64d99b",
    organization_id="43b88c8f-e488-46f6-9013-698e3d2e374a",
)

In [None]:
# Delete any existing agent with the same name so this cell can be re-run safely.
try:
    existing_agent = llama_extract.get_agent(name="solar-panel-datasheet")
    if existing_agent:
        llama_extract.delete_agent(existing_agent.id)
except ApiError as e:
    if e.status_code == 404:
        pass
    else:
        raise

extract_config = ExtractConfig(
    extraction_mode="BALANCED",
)

agent = llama_extract.create_agent(
    name="solar-panel-datasheet", data_schema=SolarPanelSchema, config=extract_config
)

## Workflow Overview

The workflow consists of three steps:

1. **parse_datasheet:** Sends the solar panel datasheet (PDF) to the extraction agent and stores the structured result (including page citations) in the workflow context.
2. **load_requirements:** Passes along the design requirements (a text blob read from file when the workflow is constructed) so they can be injected into the prompt.
3. **generate_comparison_report:** Constructs a prompt from the extracted datasheet content and the design requirements, calls the LLM to generate a structured comparison report, and returns it in a `StopEvent` as the workflow’s result.

Each step is implemented as an asynchronous method decorated with `@step`, and the workflow is built by subclassing `Workflow`. The event type a step accepts and the event type it returns determine how the steps chain together.
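
To make the event chaining concrete, here is a minimal standard-library sketch of the same idea: each step consumes one event type and emits the next, and a dispatcher keeps routing until a terminal event appears. This is a toy illustration under stated assumptions, not how LlamaIndex implements `Workflow`; the names `Start`, `Parsed`, `Stop`, `HANDLERS`, and `run` are all hypothetical.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Start:
    datasheet_path: str


@dataclass
class Parsed:
    content: dict


@dataclass
class Stop:
    result: str


async def parse(ev: Start) -> Parsed:
    # Stand-in for the extraction call; returns a placeholder payload.
    return Parsed(content={"source": ev.datasheet_path})


async def report(ev: Parsed) -> Stop:
    # Stand-in for the LLM call that produces the final report.
    return Stop(result=f"report for {ev.content['source']}")


# Route each event type to the step that consumes it.
HANDLERS = {Start: parse, Parsed: report}


async def run(event):
    # Keep dispatching until a step emits the terminal event.
    while not isinstance(event, Stop):
        event = await HANDLERS[type(event)](event)
    return event.result


# In a notebook with a running event loop, use `await run(...)` instead.
final = asyncio.run(run(Start(datasheet_path="datasheet.pdf")))
print(final)
```

The real framework additionally provides a shared `Context`, event streaming, timeouts, and validation of the step graph, none of which this sketch attempts.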

In [None]:
from llama_index.core.workflow import (
    Event,
    StartEvent,
    StopEvent,
    Context,
    Workflow,
    step,
)
from llama_index.llms.openai import OpenAI
from llama_index.core.prompts import ChatPromptTemplate
from llama_cloud_services import LlamaExtract
from llama_cloud.core.api_error import ApiError
from pydantic import BaseModel, Field
from typing import List


# Define the output schema for the comparison report (used for structured LLM output)
class ComparisonReportOutput(BaseModel):
    component_name: str = Field(
        ..., description="The name of the component being evaluated."
    )
    meets_requirements: bool = Field(
        ...,
        description="Overall indicator of whether the component meets the design criteria.",
    )
    summary: str = Field(..., description="A brief summary of the evaluation results.")
    details: dict = Field(
        ..., description="Detailed comparisons for each key parameter."
    )


# Define custom events


class DatasheetParseEvent(Event):
    datasheet_content: dict


class RequirementsLoadEvent(Event):
    requirements_text: str


class ComparisonReportEvent(Event):
    report: ComparisonReportOutput


class LogEvent(Event):
    msg: str
    delta: bool = False


# LlamaExtract extracts structured data from the datasheet.
# OpenAI (via LlamaIndex) is the LLM that generates the report.

llm = OpenAI(model="gpt-4o")  # or your preferred model

In [None]:
class SolarPanelComparisonWorkflow(Workflow):
    """
    Workflow to extract data from a solar panel datasheet and generate a comparison report
    against provided design requirements.
    """

    def __init__(self, agent: LlamaExtract, requirements_path: str, **kwargs):
        super().__init__(**kwargs)
        self.agent = agent
        # Load design requirements from file as a text blob
        with open(requirements_path, "r") as f:
            self.requirements_text = f.read()

    @step
    async def parse_datasheet(
        self, ctx: Context, ev: StartEvent
    ) -> DatasheetParseEvent:
        # datasheet_path is provided in the StartEvent
        datasheet_path = (
            ev.datasheet_path
        )  # e.g., "./data/solar_panel_e2e_comparison/datasheet.pdf"
        extraction_result = await self.agent.aextract(datasheet_path)
        datasheet_dict = (
            extraction_result.data
        )  # dict matching SolarPanelSchema, including page citations
        await ctx.set("datasheet_content", datasheet_dict)
        ctx.write_event_to_stream(LogEvent(msg="Datasheet parsed successfully."))
        return DatasheetParseEvent(datasheet_content=datasheet_dict)

    @step
    async def load_requirements(
        self, ctx: Context, ev: DatasheetParseEvent
    ) -> RequirementsLoadEvent:
        # Use the pre-loaded requirements text from __init__
        req_text = self.requirements_text
        ctx.write_event_to_stream(LogEvent(msg="Design requirements loaded."))
        return RequirementsLoadEvent(requirements_text=req_text)

    @step
    async def generate_comparison_report(
        self, ctx: Context, ev: RequirementsLoadEvent
    ) -> StopEvent:
        # Build a prompt that injects both the extracted datasheet content and the design requirements
        datasheet_content = await ctx.get("datasheet_content")
        prompt_str = """
You are an expert renewable energy engineer.

Compare the following solar panel datasheet information with the design requirements.

Design Requirements:
{requirements_text}

Extracted Datasheet Information:
{datasheet_content}

Generate a detailed comparison report in JSON format with the following schema:
  - component_name: string
  - meets_requirements: boolean
  - summary: string
  - details: dictionary of comparisons for each parameter

For each parameter (Maximum Power, Open-Circuit Voltage, Short-Circuit Current, Efficiency, Temperature Coefficient),
indicate PASS or FAIL and provide brief explanations and recommendations.
"""

        # Wrap the prompt string in a chat template
        prompt = ChatPromptTemplate.from_messages([("user", prompt_str)])

        # Call the LLM to generate the report using the prompt
        report_output = await llm.astructured_predict(
            ComparisonReportOutput,
            prompt,
            requirements_text=ev.requirements_text,
            datasheet_content=str(datasheet_content),
        )
        ctx.write_event_to_stream(LogEvent(msg="Comparison report generated."))
        return StopEvent(
            result={"report": report_output, "datasheet_content": datasheet_content}
        )

## Running the Workflow

Below, we instantiate and run the workflow. The workflow reads the design requirements from a text file when it is constructed, and we pass the path to the solar panel datasheet (the Honey M datasheet from Trina Solar) when we run it.

The design requirements are:

```
Solar Panel Design Requirements:
- Power Output Range: ≥ 350 W
- Maximum Efficiency: ≥ 18%
- Certifications: Must include IEC61215 and UL1703
```
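
Since these three requirements are simple thresholds, they can also be checked deterministically as a cross-check on the LLM's verdict. The sketch below is one possible approach, not part of the original workflow. `Requirements` and `check_spec` are hypothetical names, the reading of "Power Output Range: ≥ 350 W" as "the low end of the range is at least 350 W" is an assumption, and the certifications in `sample_spec` are illustrative placeholders rather than values extracted from the datasheet. The power and efficiency figures are the ones quoted in the final report further down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    min_power_w: float
    min_efficiency_pct: float
    required_certs: frozenset


def check_spec(spec: dict, req: Requirements) -> dict:
    """Return a PASS/FAIL verdict per requirement for one extracted spec."""
    # Normalize names such as "IEC 61215" to "IEC61215" before comparing.
    certs = {c.replace(" ", "").upper() for c in spec.get("certifications", [])}
    missing = req.required_certs - certs
    checks = {
        "power_output": spec["power_output"]["min_power"] >= req.min_power_w,
        "maximum_efficiency": spec["maximum_efficiency"] >= req.min_efficiency_pct,
        "certifications": not missing,
    }
    return {name: "PASS" if ok else "FAIL" for name, ok in checks.items()}


requirements = Requirements(
    min_power_w=350.0,
    min_efficiency_pct=18.0,
    required_certs=frozenset({"IEC61215", "UL1703"}),
)

# Shaped like one SolarPanelSpec entry; certifications are placeholders.
sample_spec = {
    "module_name": "TSM-DE08M.08(II)",
    "power_output": {"min_power": 360.0, "max_power": 385.0, "unit": "W"},
    "maximum_efficiency": 21.0,
    "certifications": ["IEC 61215", "UL 1703"],
}

verdicts = check_spec(sample_spec, requirements)
print(verdicts)
```

A disagreement between these verdicts and the LLM's report would be a signal to inspect the extraction output or the prompt.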


In [None]:
import nest_asyncio

nest_asyncio.apply()

In [None]:
# Path to design requirements file (e.g., a text file with design criteria for solar panels)
requirements_path = "./data/solar_panel_e2e_comparison/design_reqs.txt"

# Instantiate the workflow
workflow = SolarPanelComparisonWorkflow(
    agent=agent, requirements_path=requirements_path, verbose=True, timeout=120
)

# Run the workflow; pass the datasheet path in the StartEvent
result = await workflow.run(
    datasheet_path="./data/solar_panel_e2e_comparison/datasheet.pdf"
)

In [None]:
print("\n********Final Comparison Report:********\n")
print(result["report"].model_dump_json(indent=4))
# print("\n********Datasheet Content:********\n", result["datasheet_content"])


********Final Comparison Report:********

{
    "component_name": "TSM-DE08M.08(II)",
    "meets_requirements": true,
    "summary": "The solar panel TSM-DE08M.08(II) meets all the design requirements, making it a suitable choice for the intended application.",
    "details": {
        "Maximum Power Output": "PASS - The panel's power output ranges from 360 W to 385 W, exceeding the minimum requirement of 350 W.",
        "Open-Circuit Voltage": "PASS - The datasheet does not specify Voc, but the panel meets other critical requirements. Verification of Voc is recommended.",
        "Short-Circuit Current": "PASS - The datasheet does not specify Isc, but the panel meets other critical requirements. Verification of Isc is recommended.",
        "Efficiency": "PASS - The panel's efficiency is 21.0%, which is above the required 18%.",
        "Temperature Coefficient": "PASS - The temperature coefficient is -0.34%/°C, which is better than the maximum allowable -0.5%/°C."
    }
}
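
Note that the report marks Open-Circuit Voltage and Short-Circuit Current as PASS even though it states the datasheet does not specify them. A small post-processing pass can separate such entries from verified passes. The sketch below is a heuristic under stated assumptions: `audit_details` is a hypothetical helper, and it relies on the `"PASS - ..."` / `"FAIL - ..."` string format and the phrase "does not specify" seen in this particular output, neither of which the LLM is guaranteed to reproduce.

```python
def audit_details(details: dict) -> dict:
    """Split verdicts into passes, failures, and entries lacking datasheet values."""
    audit = {"pass": [], "fail": [], "unverified": []}
    for name, text in details.items():
        verdict = text.split(" - ", 1)[0].strip().upper()
        if "does not specify" in text.lower():
            # The verdict was given without a datasheet value to back it up.
            audit["unverified"].append(name)
        elif verdict == "PASS":
            audit["pass"].append(name)
        else:
            audit["fail"].append(name)
    return audit


# The "details" mapping copied from the report printed above.
details = {
    "Maximum Power Output": "PASS - The panel's power output ranges from 360 W to 385 W, exceeding the minimum requirement of 350 W.",
    "Open-Circuit Voltage": "PASS - The datasheet does not specify Voc, but the panel meets other critical requirements. Verification of Voc is recommended.",
    "Short-Circuit Current": "PASS - The datasheet does not specify Isc, but the panel meets other critical requirements. Verification of Isc is recommended.",
    "Efficiency": "PASS - The panel's efficiency is 21.0%, which is above the required 18%.",
    "Temperature Coefficient": "PASS - The temperature coefficient is -0.34%/°C, which is better than the maximum allowable -0.5%/°C.",
}

audit = audit_details(details)
print(audit)
```

In the notebook itself, the same function could be applied to `result["report"].details`.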
