# Hands-on 2: Metadata & Indexing

### 🎯 Problem

Create a function that meets these requirements:
- ⚡ Input: the path to paul_graham_essay.txt
- 🔄 Output: an index that can correctly answer these questions:
    - "Who is the author of the book?"
    - "What inspired the author to switch from studying philosophy to studying AI in college?"
    - "What would the author say about art vs. engineering?"
    - "Why did the author have to learn Italian?"
    - "Why was the author in Florence?"

### 🔍 Suggested tasks
- 📥 Ingest the text from the file with an appropriate reader
- 🔎 Filter out irrelevant information
- ✂️ Split the page content with a suitable node parser
- 📋 Extract metadata for each chunk
- 🏗️ Build the index
- ✅ Test the index on provided questions
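
To make the middle steps concrete (filter, split, attach metadata), here is a minimal, library-free sketch. It is not the LlamaIndex solution to the exercise: `Chunk`, `clean`, `split_into_chunks`, and `build_chunks` are hypothetical names, and the word-window splitting and the metadata keys (`source`, `position`, `word_count`) are assumptions chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A piece of text plus the metadata attached to it (hypothetical type)."""
    text: str
    metadata: dict = field(default_factory=dict)


def clean(text: str) -> str:
    """Drop blank lines and collapse runs of whitespace inside each line."""
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)


def split_into_chunks(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of at most max_words words that share `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def build_chunks(text: str, source: str, **split_kwargs) -> list[Chunk]:
    """Clean the text, split it, and attach simple metadata to every chunk."""
    pieces = split_into_chunks(clean(text), **split_kwargs)
    return [
        Chunk(
            text=piece,
            metadata={"source": source, "position": i, "word_count": len(piece.split())},
        )
        for i, piece in enumerate(pieces)
    ]


if __name__ == "__main__":
    demo = "one two three four five six seven eight nine ten"
    for chunk in build_chunks(demo, source="demo.txt", max_words=4, overlap=1):
        print(chunk.metadata, chunk.text)
```

In the exercise itself, the reader, node parser, and metadata extractors of the library take over these roles; the sketch only shows what each step contributes to the chunks that end up in the index.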

## Code

In [None]:
!mkdir -p 'data/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham_essay.txt'

In [None]:
%pip install "llama-index>=0.11.20"

In [None]:
# must be run before any asyncio code is run
import nest_asyncio
nest_asyncio.apply()

In [None]:
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import VectorStoreIndex
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings
from rich import print as rprint
import os
# add your imports here...

In [None]:
# set OPENAI_API_KEY (replace the placeholder with your own key)
os.environ["OPENAI_API_KEY"] = "your OpenAI API key here"

In [None]:
Settings.llm = OpenAI(model="gpt-3.5-turbo")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

In [None]:

def create_index(document_path: str) -> VectorStoreIndex:
    # add your code here...
    # 1. create a pipeline
    # 2. load the documents
    # 3. run the pipeline
    # 4. create the index
    raise NotImplementedError

In [None]:
document_path = "TODO" 

index = create_index(document_path)

engine = index.as_query_engine()

In [None]:
rprint(engine.query("Who is the author of the book?").response)

In [None]:
rprint(engine.query("What inspired the author to switch from studying philosophy to studying AI in college?").response)

In [None]:
rprint(engine.query("What would the author say about art vs. engineering?").response)

In [None]:
rprint(engine.query("Why did the author have to learn Italian?").response)

In [None]:
rprint(engine.query("Why was the author in Florence?").response)
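
When an answer looks wrong, it can help to inspect the chunks that were retrieved for it, together with their metadata. The following is a minimal sketch, assuming the object returned by `engine.query(...)` exposes `source_nodes` whose items carry a `score` and a `node` with `metadata` and `get_content()`. `describe_sources` is a hypothetical helper, and the stand-in object at the bottom is fabricated only so the snippet runs without an API key.

```python
from types import SimpleNamespace


def describe_sources(response, max_chars: int = 80) -> list[str]:
    """Return one summary line per retrieved chunk behind a response."""
    lines = []
    for hit in getattr(response, "source_nodes", []):
        score = "n/a" if hit.score is None else f"{hit.score:.3f}"
        # Collapse whitespace so each snippet fits on one line, then truncate it
        snippet = " ".join(hit.node.get_content().split())[:max_chars]
        lines.append(f"score={score} metadata={hit.node.metadata} text={snippet!r}")
    return lines


# Stand-in for a real query response (made-up data, for demonstration only)
fake_response = SimpleNamespace(
    source_nodes=[
        SimpleNamespace(
            score=0.8123,
            node=SimpleNamespace(
                metadata={"document_title": "demo"},
                get_content=lambda: "I  moved to Florence\nto study art.",
            ),
        )
    ]
)

for line in describe_sources(fake_response):
    print(line)
```

With a real index, the same helper could be called as `describe_sources(engine.query("Why was the author in Florence?"))` to see which chunks and which metadata fields supported the answer.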