# LlamaIndex Platform Demo

## Step 0: Set up environment config for the platform

In [1]:
import os

os.environ["LLAMA_CLOUD_API_KEY"] = "llx-"
os.environ["OPENAI_API_KEY"] = "sk-"
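
The two values above are unfilled placeholders. A minimal sketch of a pre-flight check that catches them before any API call is made, assuming real keys begin with the `llx-` and `sk-` prefixes shown above. `missing_keys` is a hypothetical helper, not part of LlamaIndex.

```python
import os

# Expected prefixes, taken from the placeholders in the cell above (an assumption).
EXPECTED_PREFIXES = {
    "LLAMA_CLOUD_API_KEY": "llx-",
    "OPENAI_API_KEY": "sk-",
}


def missing_keys(env=None):
    """Return names of keys that are unset, wrongly prefixed, or bare placeholders."""
    env = os.environ if env is None else env
    problems = []
    for name, prefix in EXPECTED_PREFIXES.items():
        value = env.get(name, "")
        # A value equal to the prefix alone is an unfilled placeholder.
        if not value.startswith(prefix) or value == prefix:
            problems.append(name)
    return problems
```

Calling `missing_keys()` with no argument checks the current process environment; an empty list means both keys look filled in.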

## Step 1: Configure ingestion pipeline (data source, transformations)

In [2]:
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding

In [3]:
reader = SimpleDirectoryReader(input_files=['data_pg/source_files/source.txt'])
docs = reader.load_data()

In [4]:
pg_pipeline = IngestionPipeline(
    project_name='paul graham', 
    name='essay',
    documents=docs,
    transformations=[
        SentenceSplitter(),
        OpenAIEmbedding(),
    ]
)

## Running ingestion locally

Let's try running this locally first.

In [5]:
nodes = pg_pipeline.run()

This runs quickly because the data volume is small.  
We can inspect the processed nodes, but the notebook interface limits how much we can see.

In [6]:
print(f'Ingested {len(nodes)} nodes')

Ingested 21 nodes


In [7]:
print(nodes[0])

Node ID: badb3184-1bac-4674-a28d-b9d60ea1d57c
Text: What I Worked On  February 2021  Before college the two main
things I worked on, outside of school, were writing and programming. I
didn't write essays. I wrote what beginning writers were supposed to
write then, and probably still are: short stories. My stories were
awful. They had hardly any plot, just characters with strong feelings,
which I ...


In [8]:
nodes[0].metadata

{'file_path': 'data_pg/source_files/source.txt',
 'file_name': 'source.txt',
 'file_type': 'text/plain',
 'file_size': 75084,
 'creation_date': '2023-12-15',
 'last_modified_date': '2023-12-15',
 'last_accessed_date': '2024-02-09',
 'ref_doc_id': '620ee62d-3a14-41f6-a9ba-086a7ea5ddcb',
 'document_title': ''}

### Key challenges for AI engineers
- hard to visualize intermediate data
- hard to tune parameters and understand impact on downstream performance
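
To make the second challenge concrete, here is a minimal local sketch of how a single parameter changes the intermediate data. `toy_chunk` is a hypothetical word-based stand-in, not the algorithm `SentenceSplitter` uses, and the sample text is synthetic. It only illustrates that chunk count and length shift with `chunk_size` and `overlap`.

```python
from statistics import mean


def toy_chunk(text, chunk_size, overlap=0):
    """Split text into chunks of at most chunk_size words, sharing `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def summarize(chunks):
    """Report chunk count and word-length statistics."""
    lengths = [len(c.split()) for c in chunks]
    if not lengths:
        return {"count": 0}
    return {
        "count": len(chunks),
        "min": min(lengths),
        "max": max(lengths),
        "mean": mean(lengths),
    }


sample = " ".join(f"w{i}" for i in range(100))  # synthetic 100-word text
for size in (20, 50):
    print(size, summarize(toy_chunk(sample, chunk_size=size, overlap=5)))
```

Even in this toy setting, comparing configurations means rerunning and reading printed summaries, and it says nothing about downstream answer quality.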

## Platform Playground

### Register pipeline

We can register the pipeline with the Platform Playground, where we can easily experiment with different parameter configurations and evaluate downstream performance.

In [9]:
pg_pipeline_id = pg_pipeline.register()

Pipeline available at: https://cloud.llamaindex.ai/project/5e1a294d-a1f9-4c47-96d9-a2c71bd7acbf/playground/5bb3ff13-63df-43d2-a25e-755890092c00


### Pull eval dataset from Llama Hub

In [16]:
from llama_index.core.llama_dataset import download_llama_dataset
from llama_index.core.evaluation.eval_utils import upload_eval_dataset

In [17]:
rag_dataset, _ = download_llama_dataset('PaulGrahamEssayDataset', download_dir='data_pg')

In [18]:
# or, load from downloaded dir
# from llama_index.core.llama_dataset import LabelledRagDataset
# rag_dataset = LabelledRagDataset.from_json("./data_pg/rag_dataset.json")
questions = [example.query for example in rag_dataset.examples[:5]]

In [19]:
for ind, question in enumerate(questions):
    print(f"{ind + 1}. {question}")

1. In the essay, the author mentions his early experiences with programming. Describe the first computer he used for programming, the language he used, and the challenges he faced.
2. The author switched his major from philosophy to AI during his college years. What were the two specific influences that led him to develop an interest in AI? Provide a brief description of each.
3. In the essay, the author discusses his initial interest in AI and his eventual disillusionment with it. According to the author, what were the two main influences that initially drew him to AI and what realization led him to believe that the approach to AI during his time was a hoax?
4. The author mentions his shift of interest towards Lisp, a programming language. What reasons does he provide for this shift and how did he further his understanding of Lisp?
5. In the essay, the author mentions his interest in both computer science and art. Discuss how he attempts to reconcile these two interests during his tim
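
Before uploading, it can help to sanity-check the question list. The sketch below is one possible approach using two hypothetical helpers (neither is part of LlamaIndex): `clean_questions` removes blanks and exact duplicates, and `looks_truncated` is a rough heuristic that flags entries without terminal punctuation, such as a question that appears cut off.

```python
def clean_questions(questions):
    """Strip whitespace, drop empties and exact duplicates, preserving order."""
    seen = set()
    cleaned = []
    for q in questions:
        q = q.strip()
        if q and q not in seen:
            seen.add(q)
            cleaned.append(q)
    return cleaned


def looks_truncated(question):
    """Heuristic: flag questions that do not end with terminal punctuation."""
    return not question.rstrip().endswith((".", "?", "!"))
```

For example, `[q for q in clean_questions(questions) if looks_truncated(q)]` lists the entries worth a manual look. The heuristic can misfire on questions that legitimately end in a quotation mark or parenthesis.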

### Upload eval dataset

Let's upload a list of evaluation questions we have curated to help us iterate on the ingestion pipeline.

In [20]:
upload_eval_dataset(
    project_name='paul graham',
    dataset_name='AI generated - 5 questions',
    questions=questions,
    overwrite=True,
)

Uploaded 5 questions to dataset AI generated - 5 questions


'dbfc40dc-42d3-4338-84ba-08301591fba5'

### Download pipeline

After tuning the pipeline against the desired evals, we can download its configuration for local development.

In [21]:
new_pg_pipeline = IngestionPipeline.from_pipeline_name(
    project_name='paul graham', 
    name='essay',
)

In [22]:
new_nodes = new_pg_pipeline.run(documents=docs)

In [23]:
len(new_nodes)

21
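
Matching counts do not by themselves show that the two runs produced the same chunks. A minimal sketch of a stricter comparison, assuming node IDs may differ between runs so that only chunk text is compared. `text_fingerprints` and `same_chunks` are hypothetical helpers that operate on plain strings.

```python
import hashlib
from collections import Counter


def text_fingerprints(texts):
    """Count SHA-256 digests of chunk texts, ignoring order."""
    return Counter(hashlib.sha256(t.encode("utf-8")).hexdigest() for t in texts)


def same_chunks(texts_a, texts_b):
    """True when both runs produced the same multiset of chunk texts."""
    return text_fingerprints(texts_a) == text_fingerprints(texts_b)
```

With the nodes above, the inputs could be built as `[n.get_content() for n in nodes]` and `[n.get_content() for n in new_nodes]`, assuming each node exposes its text through `get_content()`.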

In [30]:
!pip3 show llama-index-core

Name: llama-index-core
Version: 0.10.14.post1
Summary: Interface between LLMs and your data
Home-page: https://llamaindex.ai
Author: Jerry Liu
Author-email: jerry@llamaindex.ai
License: MIT
Location: /opt/homebrew/lib/python3.11/site-packages
Requires: aiohttp, dataclasses-json, deprecated, dirtyjson, fsspec, httpx, llamaindex-py-client, nest-asyncio, networkx, nltk, numpy, openai, pandas, pillow, PyYAML, requests, SQLAlchemy, tenacity, tiktoken, tqdm, typing-extensions, typing-inspect
Required-by: llama-index, llama-index-agent-openai, llama-index-embeddings-openai, llama-index-llms-openai, llama-index-multi-modal-llms-openai, llama-index-postprocessor-colbert-rerank, llama-index-postprocessor-flag-embedding-reranker, llama-index-program-openai, llama-index-question-gen-openai, llama-index-readers-file, llama-index-retrievers-bm25, llama-parse


We can now build a query engine from the nodes produced by the downloaded pipeline.

In [24]:
from llama_index.core.indices import VectorStoreIndex

In [25]:
index = VectorStoreIndex(nodes=new_nodes)

In [26]:
query_engine = index.as_query_engine()

In [27]:
response = query_engine.query("""\
In the essay, the author mentions his early experiences with programming. 
Describe the first computer he used for programming, the language he used, and the challenges he faced.
""")

In [28]:
print(response)

The author mentions using the IBM 1401 computer for programming during his early experiences. The language he used on this computer was an early version of Fortran. One of the challenges he faced was the limited input options for programs, as the only form of input was data stored on punched cards, which he did not have access to. This limitation led to difficulties in finding meaningful tasks to perform with the computer.
