# RAG over the Caltrain Weekend Schedule 

<a href="https://colab.research.google.com/github/run-llama/llama_cloud_services/blob/main/examples/parse/caltrain/caltrain_text_mode.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This example uses LlamaParse to build a working query pipeline over the Caltrain weekend schedule, a large timetable that lists every northbound and southbound train and the stations each one stops at.

Status:
| Last Executed | Version | State      |
|---------------|---------|------------|
| Aug-19-2025   | 0.6.61  | Maintained |

## Setup

Download the weekend schedule PDF.

In [None]:
!wget "https://www.caltrain.com/media/31602/download?inline?inline" -O caltrain_schedule_weekend.pdf
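Since the download URL carries query parameters, it can be worth confirming that the saved file really is a PDF and not an HTML error page. The following is a minimal sketch, assuming the file was saved under the name used above; `looks_like_pdf` is a hypothetical helper, and it only checks the leading `%PDF-` header bytes, not whether the file is otherwise valid.

```python
from pathlib import Path


def looks_like_pdf(path):
    """Return True if the file exists and starts with the PDF header bytes."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        # PDF files conventionally begin with "%PDF-" followed by a version.
        return f.read(5) == b"%PDF-"


# Prints False if the download has not been run in the current directory.
print(looks_like_pdf("caltrain_schedule_weekend.pdf"))
```

If this prints `False` after the download, inspecting the first few bytes of the file usually shows what the server returned instead.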

## Initialize LlamaParse

Parse the PDF with `LlamaParse` and retrieve the results as text. LlamaParse represents complex documents, including text, tables, and figures, as cleanly formatted text.

In [None]:
from llama_cloud_services import LlamaParse

# Top-level `await` works in a notebook; in a plain script, run this
# inside an async function via asyncio.run().
result = await LlamaParse(
    parse_mode="parse_page_with_agent",
    model="openai-gpt-4-1-mini",
    high_res_ocr=True,
    adaptive_long_table=True,
    outlined_table_extraction=True,
    output_tables_as_HTML=True,
    api_key="llx-...",
).aparse("./caltrain_schedule_weekend.pdf")

# Split the result into one text document per page.
documents = result.get_text_documents(split_by_page=True)

Started parsing the file under job_id d162724f-dcb9-4bfe-9bd4-337244906fb8
..
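The cells in this notebook pass API keys as string literals. As a minimal sketch of an alternative, the keys could be read from environment variables instead. `require_env` is a hypothetical helper, not part of either library, and the variable names `LLAMA_CLOUD_API_KEY` and `OPENAI_API_KEY` are the conventional ones but are an assumption about your environment.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable first")
    return value


# Demo with a throwaway variable so the sketch runs anywhere.
os.environ["DEMO_API_KEY"] = "llx-demo"
print(require_env("DEMO_API_KEY"))
```

In the parsing cell, `api_key=require_env("LLAMA_CLOUD_API_KEY")` could then replace the literal key.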

Take a look at the text below (zoom out in your browser to get the full effect). The entire table is laid out in aligned columns.

In [None]:
print(documents[0].text)

          Printer Friendly WEEKEND Caltrain Schedule
                                          Morning to Early Afternoon Page 1 of 2
          Northbound – WEEKEND SERVICE to SAN FRANCISCO                                                                                                         6XX Local
          Train No.             601  603           605  607          609  611            613       615     617       619          621  623            625  627           629  631
                 Tamien         6:51a             7:51a             8:51a                9:51a               10:51a              11:51a              12:51p             1:51p
       San Jose Diridon         6:56a    7:26a    7:56a    8:26a    8:56a     9:26a      9:56a     10:26a    10:56a    11:26a    11:56a    12:26p    12:56p    1:26p    1:56p    2:26p
          Santa Clara           7:03a    7:33a    8:03a    8:33a    9:03a     9:33a      10:03a    10:33a    11:03a    11:33a    12:03p    12:33p    1:03p     1:

## Initialize Query Engine

We now initialize a query engine over this data. We use a baseline summary index, which skips vector indexing and chunking and instead passes the entire text to the LLM in the prompt.

In [None]:
from llama_index.core import SummaryIndex
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-5-mini", api_key="sk-...")
index = SummaryIndex.from_documents(documents)
query_engine = index.as_query_engine(llm=llm)
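To make the idea of putting the entire text into the prompt concrete, here is a simplified sketch in plain Python. It is not how `SummaryIndex` is implemented: the real response synthesizer uses its own prompt templates and may split the text across several LLM calls if it does not fit in the context window. `build_stuffed_prompt` and the prompt wording are hypothetical, and the page strings are toy stand-ins rather than real schedule text.

```python
def build_stuffed_prompt(page_texts, question):
    """Join every page into one context block and append the question."""
    context = "\n\n".join(
        f"--- page {i} ---\n{text}"
        for i, text in enumerate(page_texts, start=1)
    )
    return (
        "Context information is below.\n"
        f"{context}\n"
        "Answer the query using only the context above.\n"
        f"Query: {question}\n"
    )


# Toy stand-ins for the parsed pages (not the real schedule text).
pages = ["Train No.  601  603", "Train No.  602  604"]
prompt = build_stuffed_prompt(pages, "Which trains are listed?")
print(prompt)
```

With the notebook's data, `[d.text for d in documents]` would take the place of `pages`. No retrieval step selects a subset of pages, which suits a document this small but would not scale to a large corpus.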

In [None]:
response = query_engine.query(
    "What are the stops (and times) for train no 609 northbound?"
)

In [None]:
print(str(response))

Train No. 609 northbound (stops and times):

- Tamien — 8:51a
- San Jose Diridon — 8:56a
- Santa Clara — 9:03a
- Lawrence — 9:08a
- Sunnyvale — 9:12a
- Mountain View — 9:16a
- San Antonio — 9:19a
- California Ave — 9:22a
- Palo Alto — 9:25a
- Menlo Park — 9:27a
- Redwood City — 9:32a
- San Carlos — 9:35a
- Belmont — 9:38a
- Hillsdale — 9:41a
- Hayward Park — 9:43a
- San Mateo — 9:46a
- Burlingame — 9:48a
- Broadway — 9:51a
- Millbrae — 9:54a
- San Bruno — 9:57a
- S. San Francisco — 10:00a
- Bayshore — 10:05a
- 22nd Street — 10:10a
- San Francisco — 10:15a


In [None]:
response = query_engine.query(
    "What are all the trains (and times) that end at Redwood City going Southbound?"
)

In [None]:
print(str(response))

None. On this weekend schedule no southbound trains terminate at Redwood City — every listed southbound train continues beyond Redwood City to later stations (Menlo Park/Palo Alto and onward).
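An answer like this is only as reliable as the parsed text behind it, so a quick cross-check against the raw page text can help. The sketch below lists every line that mentions a station name. `lines_mentioning` is a hypothetical helper, and the sample pages are made-up placeholders, not real schedule rows.

```python
def lines_mentioning(page_texts, keyword):
    """Yield (page_number, line) for each line containing keyword, ignoring case."""
    needle = keyword.lower()
    for page_no, text in enumerate(page_texts, start=1):
        for line in text.splitlines():
            if needle in line.lower():
                yield page_no, line.strip()


# Made-up placeholder pages; the times are deliberately not real.
sample_pages = [
    "Northbound header\n   Redwood City   x:xx   x:xx",
    "Southbound header\n   Redwood City   y:yy   y:yy\n   Menlo Park   y:yy",
]
hits = list(lines_mentioning(sample_pages, "redwood city"))
for page_no, line in hits:
    print(page_no, line)
```

With the notebook's data, passing `[d.text for d in documents]` shows the Redwood City rows on each page, which can then be compared by eye with the rows for the stations that follow it.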
