# **SCRAPING USING 🔥 FIRECRAWL** & **RETAB**

This notebook shows how to scrape and structure **the latest quarterly press release (PR) issued by NVIDIA** (Investor Relations page [here](https://investor.nvidia.com/financial-info/financial-reports/)) using:

- **[Firecrawl `Scrape` Endpoint](https://docs.firecrawl.dev/features/scrape)** to get clean Markdown of NVIDIA's latest PR.

- **[Retab](https://www.retab.com/)** to define a `schema` and `prompt` and generate precise structured output from Firecrawl's clean Markdown while limiting LLM hallucinations.

In our example, we focus on the last two PRs issued by NVIDIA:
- Financial Results for First Quarter Fiscal 2026 [here](https://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-first-quarter-fiscal-2026)
- Financial Results for Fourth Quarter and Fiscal 2025 [here](https://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-fourth-quarter-and-fiscal-2025)

The idea behind this workflow is to set up automations that monitor the publications of the companies you follow and process each PR as soon as it is released, so that you can act on the latest news and send accurate, structured data to your databases.

*Retab's platform lets you automatically generate, iterate on, and deploy your schemas and prompts into production. See the [Documentation here](https://docs.retab.com/overview/introduction)*

Built with 🩷 by retab.


### **INITIALIZATION**

Create your **API Keys** on **[Firecrawl](https://www.firecrawl.dev/)** and **[Retab](https://www.retab.com/)** and save them in a `.env` file.

You should have:
```
FIRECRAWL_API_KEY=fc-***
RETAB_API_KEY=sk_retab_***
```
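
Before running the cells below, it can help to confirm that both keys were actually picked up. The following is a minimal sketch using only the standard library. `missing_keys` is a hypothetical helper (not part of either SDK), and it assumes the variables are already in the process environment, for instance after calling `load_dotenv()`.

```python
import os

# Variable names taken from the .env example above
REQUIRED_KEYS = ("FIRECRAWL_API_KEY", "RETAB_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


# Hypothetical usage: report which keys, if any, still need to be configured
missing = missing_keys()
print("Missing API keys:", missing or "none")
```

Passing a plain dictionary as `env` makes the helper easy to check without touching the real environment.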


### **RUN**

In [1]:
# %pip install retab
# %pip install firecrawl-py
# %pip install python-dotenv

In [2]:
# GET THE LATEST PRESS RELEASE MARKDOWN WITH FIRECRAWL
from dotenv import load_dotenv
from firecrawl import FirecrawlApp

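# Load FIRECRAWL_API_KEY and RETAB_API_KEY from the .env file
load_dotenv()

# With no api_key argument, FirecrawlApp reads FIRECRAWL_API_KEY from the environment
app = FirecrawlApp()

# Scrape the press release page and request Markdown output
scrape_result = app.scrape_url(
    'https://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-first-quarter-fiscal-2026',
    formats=['markdown'])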

print(scrape_result)
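
A scrape may occasionally come back without usable Markdown, so a small guard before the extraction step can save a wasted call. The sketch below assumes only what the next cell relies on, namely that the result exposes a `markdown` attribute. `require_markdown` and its `min_chars` threshold are hypothetical, and `SimpleNamespace` stands in for a real scrape result.

```python
from types import SimpleNamespace


def require_markdown(result, min_chars=200):
    """Return result.markdown, or raise if it is missing or suspiciously short."""
    markdown = getattr(result, "markdown", None)
    if not isinstance(markdown, str) or len(markdown.strip()) < min_chars:
        raise ValueError("Scrape returned no usable Markdown; check the URL and API key.")
    return markdown


# Stand-in object for illustration; in the notebook you would pass `scrape_result`
fake_result = SimpleNamespace(markdown="# NVIDIA Announces Financial Results\n" + "x" * 300)
print(len(require_markdown(fake_result)))
```

The threshold is arbitrary; tune it to the shortest press release you expect to process.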



You can use the [Retab platform](https://www.retab.com/dashboard) to quickly generate a `schema` and `prompt` that extract the information with high accuracy.

Your configuration is identified by a unique `project_id`, which is referenced below.

See the [Documentation here](https://docs.retab.com/core-concepts/Projects) for details.

In [3]:
# STRUCTURE THE INFORMATION WITH RETAB
from retab import Retab

client = Retab()

# Save the scraped Markdown to a file so it can be passed to Retab as a document
with open("nvidia_pr_markdown.md", "w", encoding="utf-8") as f:
    f.write(scrape_result.markdown)

completion = client.projects.extract(
    project_id="proj_G-i_K3hBzeRx-8zv787lA",
    iteration_id="eval_iter_njoiCHYYhyLWIjC9QMeWj",
    document="nvidia_pr_markdown.md"
)

print(completion)

RetabParsedChatCompletion(id='IWuoaOXqEIjyvdIP3LqRwA4', choices=[RetabParsedChoice(finish_reason='stop', index=0, logprobs=None, message=ParsedChatCompletionMessage(content='{"company_name": "NVIDIA", "ticker": "NVDA", "exchange": "nasdaq", "fiscal_period": {"period_type": "quarter", "fiscal_year": 2026, "fiscal_quarter": 1, "period_end_date": "2025-04-27"}, "headline": "NVIDIA Announces Financial Results for First Quarter Fiscal 2026", "key_bullets": ["Revenue of $44.1 billion, up 12% from Q4 and up 69% from a year ago", "Data Center revenue of $39.1 billion, up 10% from Q4 and up 73% from a year ago"], "dividend": {"amount_per_share": 0.01, "currency": "USD", "pay_date": "2025-07-03", "record_date": "2025-06-11"}, "gaap_summary": {"revenue_millions": 44062, "gross_margin_pct": 60.5, "operating_expenses_millions": 5030, "operating_income_millions": 21638, "net_income_millions": 18775, "diluted_eps": 0.76, "qoq_change_revenue_pct": 12, "yoy_change_revenue_pct": 69}, "nongaap_summary": 

In [4]:
import json, textwrap

parsed_data = json.loads(completion.choices[0].message.content)
formatted_json = json.dumps(parsed_data, indent=2, ensure_ascii=False)
print(textwrap.indent(formatted_json, "  "))

  {
    "company_name": "NVIDIA",
    "ticker": "NVDA",
    "exchange": "nasdaq",
    "fiscal_period": {
      "period_type": "quarter",
      "fiscal_year": 2026,
      "fiscal_quarter": 1,
      "period_end_date": "2025-04-27"
    },
    "headline": "NVIDIA Announces Financial Results for First Quarter Fiscal 2026",
    "key_bullets": [
      "Revenue of $44.1 billion, up 12% from Q4 and up 69% from a year ago",
      "Data Center revenue of $39.1 billion, up 10% from Q4 and up 73% from a year ago"
    ],
    "dividend": {
      "amount_per_share": 0.01,
      "currency": "USD",
      "pay_date": "2025-07-03",
      "record_date": "2025-06-11"
    },
    "gaap_summary": {
      "revenue_millions": 44062,
      "gross_margin_pct": 60.5,
      "operating_expenses_millions": 5030,
      "operating_income_millions": 21638,
      "net_income_millions": 18775,
      "diluted_eps": 0.76,
      "qoq_change_revenue_pct": 12,
      "yoy_change_revenue_pct": 69
    },
    "nongaap_summary": {
 
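
To send the structured result to a database, as suggested in the introduction, one option is to flatten the nested JSON into a single row. The following is a minimal sketch under stated assumptions: `flatten` is a hypothetical helper, the `press_releases` table name is made up, the sample is a subset of the output shown above, and an in-memory SQLite database stands in for your real store.

```python
import json
import sqlite3


def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts into one level; lists are stored as JSON strings."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        elif isinstance(value, list):
            flat[name] = json.dumps(value, ensure_ascii=False)
        else:
            flat[name] = value
    return flat


# Subset of the output shown above; in the notebook you would pass `parsed_data`
sample = {
    "company_name": "NVIDIA",
    "ticker": "NVDA",
    "fiscal_period": {"fiscal_year": 2026, "fiscal_quarter": 1},
    "gaap_summary": {"revenue_millions": 44062, "diluted_eps": 0.76},
}

row = flatten(sample)

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
columns = ", ".join(f'"{name}"' for name in row)
placeholders = ", ".join("?" for _ in row)
conn.execute(f"CREATE TABLE press_releases ({columns})")
conn.execute(f"INSERT INTO press_releases VALUES ({placeholders})", list(row.values()))
conn.commit()

print(conn.execute(
    "SELECT ticker, fiscal_period_fiscal_year, gaap_summary_revenue_millions "
    "FROM press_releases"
).fetchone())
# ('NVDA', 2026, 44062)
```

Column names are derived from the schema keys here, so this only suits trusted schemas; a production table would declare explicit columns and types.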