# LlamaIndex Platform Demo

## Step 0: Set up environment config for the platform

In [59]:
import os

os.environ["PLATFORM_BASE_URL"] = "https://d8cz3jaef0wc2.cloudfront.net/"
os.environ["PLATFORM_APP_URL"] = "https://lpro-git-staging-llama-index.vercel.app/"
os.environ["PLATFORM_API_KEY"] = "llx-gQW14ec1giEhlBNx6ndcEjEIOsBsVIoQoklF3uc9Fer48DN2"
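
Before moving on, it can help to confirm that the settings are actually present. The following is a minimal sketch, assuming the three variable names in the cell above are everything the platform client reads (an assumption); `missing_platform_vars` is a hypothetical helper, not part of LlamaIndex.

```python
import os

# Variable names taken from the cell above; that these three are all the
# platform client needs is an assumption.
REQUIRED_VARS = ("PLATFORM_BASE_URL", "PLATFORM_APP_URL", "PLATFORM_API_KEY")


def missing_platform_vars(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


missing = missing_platform_vars()
if missing:
    print(f"Missing platform settings: {', '.join(missing)}")
```

Passing a plain dict as `env` makes the check easy to exercise without touching the real environment.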

## Step 1: Configure ingestion pipeline (data source, transformations)

In [60]:
from llama_index.ingestion import IngestionPipeline
from llama_index.readers import SimpleDirectoryReader
from llama_index.node_parser import SentenceSplitter
from llama_index.embeddings import OpenAIEmbedding

In [129]:
reader = SimpleDirectoryReader(input_files=['data/paul_graham_essay.txt'])
docs = reader.load_data()

In [68]:
pg_pipeline = IngestionPipeline(
    name='pg essay',
    documents=docs,
    transformations=[
        SentenceSplitter(),
        OpenAIEmbedding(),
    ]
)
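
Conceptually, the pipeline passes the data through each transformation in order, feeding the output of one step into the next. The toy sketch below illustrates that idea with plain strings; it is not the library's implementation, and `split_on_blank_lines`, `tag_with_length`, and `run_transformations` are hypothetical stand-ins for the splitter, the embedding step, and the pipeline runner.

```python
def split_on_blank_lines(chunks):
    """Toy splitter: break each chunk into non-empty paragraphs."""
    return [p.strip() for c in chunks for p in c.split("\n\n") if p.strip()]


def tag_with_length(chunks):
    """Toy stand-in for an embedding step: prefix each chunk with its length."""
    return [f"[{len(c)}] {c}" for c in chunks]


def run_transformations(chunks, transformations):
    """Apply each transformation in order, chaining outputs to inputs."""
    for transform in transformations:
        chunks = transform(chunks)
    return chunks


result = run_transformations(
    ["First paragraph.\n\nSecond one."],
    [split_on_blank_lines, tag_with_length],
)
print(result)
```

Because the steps are chained, the order of the list matters: here the splitter has to run before the step that annotates each chunk.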

## Running ingestion locally

Let's try running this locally first.

In [90]:
nodes = pg_pipeline.run()

This runs quickly because the data volume is small.  
We can inspect the processed nodes, but the notebook interface limits how much we can see.

In [70]:
print(f'Ingested {len(nodes)} nodes')

Ingested 21 nodes


In [71]:
print(nodes[0])

Node ID: 6fc26e62-e99a-41d9-b224-d1b8ba20a1a4
Text: What I Worked On  February 2021  Before college the two main
things I worked on, outside of school, were writing and programming. I
didn't write essays. I wrote what beginning writers were supposed to
write then, and probably still are: short stories. My stories were
awful. They had hardly any plot, just characters with strong feelings,
which I ...


In [72]:
print(docs[0])

Doc ID: f1e2b4af-a6e2-4572-9ea0-01c74fd420c6
Text: What I Worked On  February 2021  Before college the two main
things I worked on, outside of school, were writing and programming. I
didn't write essays. I wrote what beginning writers were supposed to
write then, and probably still are: short stories. My stories were
awful. They had hardly any plot, just characters with strong feelings,
which I ...


### Key challenges for AI engineers
- Intermediate data is hard to visualize.
- It is hard to tune parameters and understand their impact on downstream performance.
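
Within a notebook, one partial remedy for the first point is to print summary statistics for the chunks rather than the chunks themselves. A minimal sketch, operating on plain strings; `summarize_chunks` is a hypothetical helper, and the usage note assumes each node exposes its text through `get_content()`.

```python
from statistics import mean


def summarize_chunks(texts, preview_chars=60):
    """Return basic size statistics and a short preview for each chunk."""
    lengths = [len(t) for t in texts]
    return {
        "count": len(texts),
        "min_chars": min(lengths, default=0),
        "max_chars": max(lengths, default=0),
        "mean_chars": mean(lengths) if lengths else 0,
        # Flatten newlines so each preview fits on one line.
        "previews": [t[:preview_chars].replace("\n", " ") for t in texts],
    }
```

With the nodes from the local run, this could be called as `summarize_chunks([node.get_content() for node in nodes])`. It gives a rough view of chunk sizes but does not show how a parameter change affects downstream results.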

## Platform Playground

We can register the pipeline with the Platform Playground, where we can experiment with different parameter configurations and evaluate downstream performance.

In [143]:
pg_pipeline_id = pg_pipeline.register()

Pipeline available at: https://lpro-git-staging-llama-index.vercel.app//project/196b4c10-936b-4079-a338-c5ab3a22fd81/playground/c879beaf-58fa-472a-b648-a02c63c9ddda


After tuning the pipeline against the desired evals, we can download the config back for local development.

In [150]:
new_pg_pipeline = IngestionPipeline.from_pipeline_name('pg essay')

In [149]:
# new_pg_pipeline.run(documents=docs)

# Currently broken due to two issues:
# 1. The downloaded pipeline comes back with an empty document attached.
# 2. The OpenAI embedding call retries and then fails on the empty document.
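
One defensive measure while these issues are open is to filter out empty documents before handing them to a pipeline. The sketch below assumes each document exposes its content as a `.text` attribute; `drop_empty_documents` is a hypothetical helper, not part of LlamaIndex. It only guards the documents passed in explicitly, so it may not help if the downloaded pipeline still processes its own attached empty document.

```python
from types import SimpleNamespace


def drop_empty_documents(documents):
    """Keep only documents whose text contains non-whitespace characters."""
    return [d for d in documents if (getattr(d, "text", "") or "").strip()]


# Stand-in objects for illustration; real documents would come from the reader.
sample = [
    SimpleNamespace(text="What I Worked On"),
    SimpleNamespace(text=""),
    SimpleNamespace(text=None),
]
kept = drop_empty_documents(sample)
print(len(kept))
```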