# LlamaParse + Cortex Search

This notebook shows how to parse a complex report with LlamaParse, load the parsed pages into Snowflake, and create a Cortex Search service on top of that data.

At the end, we link to a quickstart you can follow to build a complete RAG app on top of the search service.

## Set key and import libraries

In [None]:
import os
from llama_cloud_services import LlamaParse
import nest_asyncio
nest_asyncio.apply()

os.environ["LLAMA_CLOUD_API_KEY"] = "llx-..."

## Parse PDF

In [None]:
parser = LlamaParse(
    num_workers=4,       # when multiple files are passed, split them across `num_workers` API calls
    verbose=True,
    language="en",       # optionally define a language, default=en
)

# sync
result = parser.parse("./sec_snow_annual_report.pdf")

Started parsing the file under job_id f2286ee4-d1e8-4a8c-9fb7-f91ad77e9c88
.

In [20]:
# get the llama-index markdown documents
markdown_documents = result.get_markdown_documents(split_by_page=True)

In [23]:
import pandas as pd

def documents_to_dataframe(documents):
    rows = []
    for doc in documents:
        row = {}
        # Store document ID
        row["ID"] = doc.id_
        # Add all metadata items as separate columns
        for key, value in doc.metadata.items():
            row[key] = value
        # Get text from the text_resource attribute, if available
        row["text"] = getattr(doc.text_resource, "text", None)
        rows.append(row)
    return pd.DataFrame(rows)
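
To make the flattening step easier to follow, here is a minimal sketch that applies the same logic to stand-in objects and returns plain dictionaries instead of a DataFrame. The `documents_to_rows` helper, the `fake_docs` list, and the `page_number` / `file_name` metadata keys are illustrative assumptions; real LlamaParse documents may carry additional metadata.

```python
from types import SimpleNamespace


def documents_to_rows(documents):
    """Flatten document-like objects into plain dicts, one per document."""
    rows = []
    for doc in documents:
        # Start each row with the document ID
        row = {"ID": doc.id_}
        # Each metadata key becomes its own column
        row.update(doc.metadata)
        # text_resource may be missing (None); fall back to None for the text
        row["text"] = getattr(doc.text_resource, "text", None)
        rows.append(row)
    return rows


# Hypothetical stand-ins for two parsed pages (not real LlamaParse output)
fake_docs = [
    SimpleNamespace(
        id_="doc-1",
        metadata={"page_number": 1, "file_name": "report.pdf"},
        text_resource=SimpleNamespace(text="# Page one"),
    ),
    SimpleNamespace(
        id_="doc-2",
        metadata={"page_number": 2, "file_name": "report.pdf"},
        text_resource=None,
    ),
]

rows = documents_to_rows(fake_docs)
print(rows)
```

Each dictionary corresponds to one row of the DataFrame built above: one column for the ID, one per metadata key, and one for the page text.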

In [24]:
df = documents_to_dataframe(markdown_documents)

In [None]:
df.to_csv("snowflake_sec_annual_report.csv", index=False)
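
Before uploading, it can help to confirm that the CSV header contains the columns that the `CREATE CORTEX SEARCH SERVICE` statement later in this notebook selects. The sketch below uses only the standard library; `missing_columns` and `REQUIRED_COLUMNS` are hypothetical names, the comparison is case-insensitive by assumption, and the demonstration runs on a throwaway file rather than the real export.

```python
import csv
import os
import tempfile

# Columns selected by the search service query later in this notebook
REQUIRED_COLUMNS = {"id", "page_number", "file_name", "text"}


def missing_columns(csv_path, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header row."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        header = next(csv.reader(f), [])
    # Compare case-insensitively, ignoring surrounding whitespace
    present = {name.strip().lower() for name in header}
    return sorted(set(required) - present)


# Demonstration on a temporary file that lacks the file_name column
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.csv")
    with open(sample_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows([["ID", "page_number", "text"], ["doc-1", 1, "hello"]])
    sample_missing = missing_columns(sample_path)

print(sample_missing)  # ['file_name']
```

To check the real export, call `missing_columns("snowflake_sec_annual_report.csv")`; an empty list means every expected column is present.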

## Load data to Snowflake

1. Log into Snowsight
2. Data -> Databases -> Create Database 'LLAMAPARSE_DEMO'
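3. Click the new database in the explorer (left pane), and create a new schema 'SEC_FILINGS'
4. Choose to add data to a table and load the CSV you just created. The query in the next section reads from a table named 'SEC_FILINGS' in this schema, so use that table name or adjust the query. Choose 'View Options' if needed to specify that the first line of the CSV contains headers.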

![load page 1](load1.png "Loading data in Snowsight")

![load page 2](load2.png "Loading data in Snowsight")

![load page 3](load_success.png "Successful data load in Snowsight")

## Create a Cortex Search Service

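Open a new SQL worksheet and run the following. Make sure a database and schema are selected for the worksheet, because the service name is not fully qualified and is created in the current schema. `WAREHOUSE = S` refers to a warehouse named `S`; replace it with a warehouse you have access to.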

```sql
CREATE CORTEX SEARCH SERVICE IF NOT EXISTS SNOWFLAKE_ANNUAL_REPORT_SEARCH_SERVICE
  ON TEXT
  ATTRIBUTES ID, PAGE_NUMBER, FILE_NAME
  WAREHOUSE = S
  TARGET_LAG = '1 hour'
    AS (
      SELECT
        ID,
        page_number,
        file_name,
        text
      FROM llamaparse_demo.sec_filings.sec_filings
    );
```
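
Once the service has been created, one way to try it from a worksheet is Snowflake's `SNOWFLAKE.CORTEX.SEARCH_PREVIEW` function, which takes the service name and a JSON string of query parameters. The sketch below only builds that SQL statement as text; it does not connect to Snowflake. The helper names, the sample question, and the assumption that the service lives in `llamaparse_demo.sec_filings` are illustrative, and the column names may need to match the case used in your table.

```python
import json


def sql_string_literal(value):
    """Quote a Python string as a single-quoted SQL string literal."""
    # Double backslashes first, then double single quotes
    escaped = value.replace("\\", "\\\\").replace("'", "''")
    return f"'{escaped}'"


def build_search_preview_sql(service_name, query, columns, limit=3):
    """Build a SEARCH_PREVIEW statement for the given service and question."""
    params = json.dumps({"query": query, "columns": columns, "limit": limit})
    return (
        "SELECT PARSE_JSON(\n"
        "  SNOWFLAKE.CORTEX.SEARCH_PREVIEW(\n"
        f"    {sql_string_literal(service_name)},\n"
        f"    {sql_string_literal(params)}\n"
        "  )\n"
        ")['results'] AS results;"
    )


# Assumed fully qualified service name and a sample question
sql = build_search_preview_sql(
    "llamaparse_demo.sec_filings.SNOWFLAKE_ANNUAL_REPORT_SEARCH_SERVICE",
    "What was Snowflake's total revenue?",
    ["text", "page_number", "file_name"],
)
print(sql)
```

Paste the printed statement into the worksheet to preview the top matching pages. Building the parameters with `json.dumps` and escaping the result keeps apostrophes in the question from breaking the SQL string.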

## Create a RAG app

Follow steps 5 and onward of the quickstart below to build an LLM assistant using Streamlit and Cortex Search on your SEC filings data parsed by LlamaParse!

https://quickstarts.snowflake.com/guide/ask_questions_to_your_own_documents_with_snowflake_cortex_search/#4