# LlamaIndex Platform Demo

## Step 0: Set up environment config for platform

In [1]:
import os

os.environ["PLATFORM_BASE_URL"] = "https://d8cz3jaef0wc2.cloudfront.net/"
os.environ["PLATFORM_APP_URL"] = "https://lpro-git-staging-llama-index.vercel.app/"
os.environ["PLATFORM_API_KEY"] = "<YOUR_PLATFORM_API_KEY>"  # do not commit a real key
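
Setting credentials inline makes it easy to leak them when a notebook is shared. The sketch below shows one way to read and validate the three variables instead. The helper names `load_platform_config` and `mask_secret` are hypothetical and not part of any LlamaIndex API. Stripping trailing slashes is an assumption made here so that later URL joins do not produce `//`.

```python
import os

REQUIRED_VARS = ("PLATFORM_BASE_URL", "PLATFORM_APP_URL", "PLATFORM_API_KEY")


def load_platform_config(env=None):
    """Return platform settings from `env`, failing fast if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        # Assumption: trailing slashes are dropped to avoid "//" in joined URLs.
        "base_url": env["PLATFORM_BASE_URL"].rstrip("/"),
        "app_url": env["PLATFORM_APP_URL"].rstrip("/"),
        "api_key": env["PLATFORM_API_KEY"],
    }


def mask_secret(value, visible=4):
    """Hide all but the first `visible` characters, for safe logging."""
    return value[:visible] + "*" * max(len(value) - visible, 0)
```

With the variables exported in the shell before starting the notebook, `load_platform_config()` can be called with no arguments, and `mask_secret(cfg["api_key"])` gives a value that is safe to print.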

## Step 1: Configure ingestion pipeline (data source, transformations)

In [2]:
from llama_index.ingestion import IngestionPipeline
from llama_index.readers import SimpleDirectoryReader
from llama_index.node_parser import SentenceSplitter
from llama_index.embeddings import OpenAIEmbedding

In [3]:
reader = SimpleDirectoryReader(input_files=['data_pg/source_files/source.txt'])
docs = reader.load_data()

In [4]:
pg_pipeline = IngestionPipeline(
    project_name='paul graham', 
    name='essay',
    documents=docs,
    transformations=[
        SentenceSplitter(),
        OpenAIEmbedding(),
    ]
)
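
Conceptually, the pipeline passes the documents through each transformation in order, with the output of one step feeding the next. The toy sketch below illustrates only that chaining idea. `ToyNode`, `split_sentences`, `embed_stub`, and `run_pipeline` are hypothetical stand-ins, not the LlamaIndex implementation, and the "embedding" is a placeholder rather than a real model call.

```python
import re
from dataclasses import dataclass, field
from functools import partial


@dataclass
class ToyNode:
    text: str
    embedding: list = field(default_factory=list)


def split_sentences(nodes, chunk_size=200):
    """Pack whole sentences into chunks of roughly `chunk_size` characters.

    A single sentence longer than `chunk_size` is kept whole.
    """
    out = []
    for node in nodes:
        sentences = re.split(r"(?<=[.!?])\s+", node.text.strip())
        chunk = ""
        for sentence in sentences:
            if chunk and len(chunk) + 1 + len(sentence) > chunk_size:
                out.append(ToyNode(chunk))
                chunk = sentence
            else:
                chunk = f"{chunk} {sentence}".strip()
        if chunk:
            out.append(ToyNode(chunk))
    return out


def embed_stub(nodes, dims=4):
    """Attach a deterministic placeholder vector (NOT a real embedding)."""
    for node in nodes:
        node.embedding = [float(len(node.text) % (i + 2)) for i in range(dims)]
    return nodes


def run_pipeline(documents, transformations):
    """Apply each transformation in order; each step receives the previous output."""
    nodes = [ToyNode(text) for text in documents]
    for transform in transformations:
        nodes = transform(nodes)
    return nodes


toy_nodes = run_pipeline(
    ["One. Two. Three."],
    [partial(split_sentences, chunk_size=10), embed_stub],
)
```

Because each step only sees the previous step's output, swapping the splitter or changing its parameters changes everything downstream, which is why the transformation list is the main thing to tune.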

## Running ingestion locally

Let's try running this locally first.

In [7]:
nodes = pg_pipeline.run()

This runs quickly because the data volume is small.  
We can inspect the processed nodes, but the notebook interface limits how much we can see.

In [8]:
print(f'Ingested {len(nodes)} nodes')

Ingested 21 nodes


In [9]:
print(nodes[0])

Node ID: 460df4d9-8bd9-454f-9d9e-df3bdf103bb7
Text: What I Worked On  February 2021  Before college the two main
things I worked on, outside of school, were writing and programming. I
didn't write essays. I wrote what beginning writers were supposed to
write then, and probably still are: short stories. My stories were
awful. They had hardly any plot, just characters with strong feelings,
which I ...


In [10]:
print(docs[0])

Doc ID: 7226123c-6d49-4454-b3fc-577b4de64a34
Text: What I Worked On  February 2021  Before college the two main
things I worked on, outside of school, were writing and programming. I
didn't write essays. I wrote what beginning writers were supposed to
write then, and probably still are: short stories. My stories were
awful. They had hardly any plot, just characters with strong feelings,
which I ...
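
Printing individual nodes shows only a truncated preview. A small summary over all node texts gives a slightly better overview inside the notebook. The helper below is a hypothetical sketch that works on plain strings; with real nodes one would first extract the text, for example with something like `[node.get_content() for node in nodes]` (treat that call as an assumption about the installed version).

```python
def summarize_texts(texts, preview_chars=40, max_previews=3):
    """Return basic size statistics and short previews for a list of strings."""
    lengths = [len(text) for text in texts]
    if not lengths:
        return {"count": 0, "min_chars": 0, "mean_chars": 0.0,
                "max_chars": 0, "previews": []}
    return {
        "count": len(lengths),
        "min_chars": min(lengths),
        "mean_chars": round(sum(lengths) / len(lengths), 1),
        "max_chars": max(lengths),
        # Only the first few texts are previewed to keep the output short.
        "previews": [text[:preview_chars] for text in texts[:max_previews]],
    }
```

Comparing these statistics before and after a parameter change is a quick sanity check, although it says nothing about retrieval quality.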


### Key challenges for AI engineers
- hard to visualize intermediate data
- hard to tune parameters and understand impact on downstream performance
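
To make the second challenge concrete, the sketch below sweeps a chunk size over a toy word-window chunker and reports how many chunks each setting produces. `chunk_words` and `sweep_chunk_sizes` are hypothetical helpers, and the word-based splitting is a simplification that does not reproduce `SentenceSplitter`.

```python
def chunk_words(text, chunk_size, overlap=0):
    """Split `text` into windows of `chunk_size` words, sharing `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


def sweep_chunk_sizes(text, sizes, overlap=0):
    """Map each chunk size to the number of chunks it yields."""
    return {size: len(chunk_words(text, size, overlap)) for size in sizes}


sample = " ".join(f"w{i}" for i in range(10))  # ten placeholder words
counts = sweep_chunk_sizes(sample, [2, 5, 10])
```

Chunk count is only a proxy. Judging the effect on downstream answers still requires running evaluation questions against each configuration.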

## Platform Playground

### Register pipeline

We can register the pipeline with Platform Playground, where we can experiment with different parameter configurations and evaluate their effect on downstream performance.

In [11]:
pg_pipeline_id = pg_pipeline.register()

Pipeline available at: https://lpro-git-staging-llama-index.vercel.app//project/7a2f40f6-8836-4d17-a59c-13a5c6a18260/playground/2b7120bb-f820-4c59-a7bb-019e86630e46


### Pull eval dataset from Llama Hub

In [12]:
from llama_index.llama_dataset import download_llama_dataset
from llama_index.evaluation.eval_utils import upload_eval_dataset

In [13]:
rag_dataset, _ = download_llama_dataset('PaulGrahamEssayDataset', download_dir='data_pg')

In [14]:
# Alternatively, load the dataset from the local download directory:
# from llama_index.llama_dataset import LabelledRagDataset
# rag_dataset = LabelledRagDataset.from_json("./data_pg/rag_dataset.json")

# Keep only the first five queries as evaluation questions.
questions = [example.query for example in rag_dataset.examples[:5]]

In [15]:
for ind, question in enumerate(questions):
    print(f"{ind + 1}. {question}")

1. In the essay, the author mentions his early experiences with programming. Describe the first computer he used for programming, the language he used, and the challenges he faced.
2. The author switched his major from philosophy to AI during his college years. What were the two specific influences that led him to develop an interest in AI? Provide a brief description of each.
3. In the essay, the author discusses his initial interest in AI and his eventual disillusionment with it. According to the author, what were the two main influences that initially drew him to AI and what realization led him to believe that the approach to AI during his time was a hoax?
4. The author mentions his shift of interest towards Lisp, a programming language. What reasons does he provide for this shift and how did he further his understanding of Lisp?
5. In the essay, the author mentions his interest in both computer science and art. Discuss how he attempts to reconcile these two interests during his tim

### Upload eval dataset

Let's upload a list of evaluation questions we have curated to help us iterate on the ingestion pipeline.

In [16]:
upload_eval_dataset(
    project_name='paul graham',
    dataset_name='AI generated - 5 questions',
    questions=questions,
    overwrite=True,
)

Uploaded 5 questions to dataset AI generated - 5 questions


'e365ea92-03d8-496d-8c72-9d9e91436c31'

### Download pipeline

After tuning the pipeline against the desired evals in the Playground, we can download its configuration for local development.

In [24]:
new_pg_pipeline = IngestionPipeline.from_pipeline_name(
    project_name='paul graham', 
    name='essay',
)

In [25]:
new_nodes = new_pg_pipeline.run(documents=docs)

In [26]:
len(new_nodes)

36

We can now build a query engine over the new nodes.

In [27]:
from llama_index.indices import VectorStoreIndex

In [28]:
index = VectorStoreIndex(nodes=new_nodes)

In [29]:
query_engine = index.as_query_engine()

In [32]:
response = query_engine.query("""\
In the essay, the author mentions his early experiences with programming. 
Describe the first computer he used for programming, the language he used, and the challenges he faced.
""")

In [33]:
print(response)

The author mentions his early experiences with programming in the essay. He describes using the IBM 1401 computer in 9th grade, which was located in the basement of his junior high school. The language he used was an early version of Fortran. He mentions being puzzled by the 1401 and not knowing what to do with it. Since the only form of input to programs was data stored on punched cards, and he didn't have any data stored on punched cards, he couldn't do much with it. The only other option was to do things that didn't rely on any input, like calculating approximations of pi, but he didn't know enough math to do anything interesting of that type.
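
For intuition about the retrieval step behind such a query, the toy sketch below ranks chunks by cosine similarity to the question and keeps the top `k`. It is an illustration under simplifying assumptions: word counts stand in for the learned embeddings, no LLM synthesizes an answer, and `bag_of_words`, `cosine`, and `top_k` are hypothetical helpers rather than LlamaIndex functions.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts, used here as a stand-in for an embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def top_k(query, chunks, k=2):
    """Return the `k` (score, chunk) pairs most similar to `query`."""
    query_vec = bag_of_words(query)
    scored = [(cosine(query_vec, bag_of_words(chunk)), chunk) for chunk in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


chunks = [
    "the ibm 1401 used punched cards",
    "painting and art school",
    "lisp is a programming language",
]
best = top_k("punched cards on the ibm", chunks, k=1)
```

Chunking matters here because only the retrieved chunks reach the answer-generation step, so a change in how the essay is split changes what the model can see.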
