# LlamaIndex: RAG over Unstructured Documents

- Question-Answering (RAG)
  - https://docs.llamaindex.ai/en/stable/use_cases/q_and_a/
- RAG over Unstructured Documents<br>
  - **[Semantic search](https://docs.llamaindex.ai/en/stable/understanding/putting_it_all_together/q_and_a/#semantic-search)**
  - **[Summarization](https://docs.llamaindex.ai/en/stable/understanding/putting_it_all_together/q_and_a/#summarization)**

LlamaIndex can ingest unstructured text, PDFs, Notion and Slack documents, and more, and index the data within them.<br>
The simplest queries involve either semantic search or summarization.
- **Semantic search**: a query about specific information in a document that matches the query terms and/or semantic intent.<br>This is typically executed with simple top-k vector retrieval, which returns the k chunks whose embeddings are most similar to the query's embedding.
- **Summarization**: condensing a large amount of data into a short summary relevant to your current question.
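
To make top-k vector retrieval concrete, here is a minimal, library-free sketch: score every chunk embedding against the query embedding with cosine similarity and keep the k best. The chunk texts and 3-dimensional vectors are made up for illustration, and the helper names `cosine_similarity` and `top_k` are hypothetical; this shows the idea, not LlamaIndex's implementation.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return (index, score) pairs for the k chunks most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings"; real embedding models return hundreds or thousands of dimensions.
chunks = ["learned to program", "painted still lifes", "wrote short stories"]
chunk_vecs = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.7, 0.6, 0.1]]
query_vec = [0.8, 0.5, 0.0]  # pretend embedding of "What did the author do growing up?"

for i, score in top_k(query_vec, chunk_vecs, k=2):
    print(f"{score:.3f}  {chunks[i]}")
```

In a real pipeline the vectors come from an embedding model such as `text-embedding-3-small`, and the top-ranked chunks are passed to the LLM as context for the answer.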

## SETUP

In [1]:
import os
from dotenv import load_dotenv

# Load environment variables (for API key)
load_dotenv()

# Set up OpenAI API key
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise ValueError("Please set the OPENAI_API_KEY environment variable or add it to a .env file")

# Define the model to use
MODEL_GPT = "gpt-4o-mini"

# Q&A patterns (Semantic Search, Summarization)

## Semantic Search
The most basic way to use LlamaIndex is semantic search.<br>
LlamaIndex provides a simple in-memory vector store to get you started, but you can also use any of its **vector store integrations**.<br>

**Using Vector Stores**
- https://docs.llamaindex.ai/en/stable/community/integrations/vector_stores/

**Vector Stores**
- https://docs.llamaindex.ai/en/stable/module_guides/storing/vector_stores/
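
Because the in-memory store and the integrations are interchangeable backends, it can help to picture a vector store as a small interface with pluggable implementations. The sketch below is a hypothetical illustration: `VectorStore`, `InMemoryVectorStore` and `retrieve` are not LlamaIndex's vector store API, and the brute-force Euclidean-distance scan is a simplification (many dedicated vector databases use approximate nearest-neighbor indexes instead).

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field
from typing import Protocol


class VectorStore(Protocol):
    """Hypothetical minimal interface (not LlamaIndex's vector store API)."""

    def add(self, node_id: str, text: str, embedding: list[float]) -> None: ...

    def query(self, embedding: list[float], k: int) -> list[tuple[str, str]]: ...


@dataclass
class InMemoryVectorStore:
    """Keeps every vector in a dict and scans all of them on each query."""

    rows: dict[str, tuple[str, list[float]]] = field(default_factory=dict)

    def add(self, node_id: str, text: str, embedding: list[float]) -> None:
        self.rows[node_id] = (text, embedding)

    def query(self, embedding: list[float], k: int) -> list[tuple[str, str]]:
        # Brute-force scan, ranked by Euclidean distance (smaller = more similar).
        ranked = sorted(
            self.rows.items(),
            key=lambda item: math.dist(item[1][1], embedding),
        )
        return [(node_id, text) for node_id, (text, _vec) in ranked[:k]]


def retrieve(store: VectorStore, query_embedding: list[float], k: int = 1) -> list[tuple[str, str]]:
    # Depends only on the interface, so any backend with add/query could be swapped in.
    return store.query(query_embedding, k)


store = InMemoryVectorStore()
store.add("n1", "writing short stories", [0.1, 0.9])
store.add("n2", "programming", [0.9, 0.2])
print(retrieve(store, [0.8, 0.3], k=1))
```

The calling code only depends on `add` and `query`, which is the property that lets you start with an in-memory store and later move to an external one without rewriting the retrieval logic.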

In [2]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Alternatively, load every file in the ./data directory:
# documents = SimpleDirectoryReader("data").load_data()
reader = SimpleDirectoryReader(
    input_files=["./data/paul_graham_essay.txt"]
)
documents = reader.load_data()

In [3]:
print(len(documents))

1


In [4]:
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

In [5]:
response = query_engine.query("What did the author do growing up?")
print(response)

The author focused on writing and programming before college.


### Setting up LlamaIndex with GPT-4o-mini
**OpenAI Pricing**<br>
- https://platform.openai.com/docs/pricing

**OpenAI models**<br>
- https://platform.openai.com/docs/models

**OpenAI Embedding models**<br>
- https://platform.openai.com/docs/guides/embeddings#embedding-models

Pages per dollar for each embedding model, assuming ~800 tokens per page:
- text-embedding-3-small (62,500)
- text-embedding-3-large (9,615)
- text-embedding-ada-002 (12,500)
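
The pages-per-dollar figures follow directly from each model's price per million tokens. The sketch below reproduces that arithmetic, assuming list prices of $0.02, $0.13 and $0.10 per 1M tokens for text-embedding-3-small, text-embedding-3-large and text-embedding-ada-002 (the prices consistent with the figures above; confirm them on the pricing page, since they change). `pages_per_dollar` and `embedding_cost` are hypothetical helpers.

```python
# Assumed prices in USD per 1M tokens; verify against the current OpenAI pricing page.
PRICE_PER_MILLION_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "text-embedding-ada-002": 0.10,
}
TOKENS_PER_PAGE = 800  # same rough assumption as the list above


def pages_per_dollar(price_per_million: float, tokens_per_page: int = TOKENS_PER_PAGE) -> int:
    """How many ~tokens_per_page pages one dollar embeds, rounded to a whole page."""
    tokens_per_dollar = 1_000_000 / price_per_million
    return round(tokens_per_dollar / tokens_per_page)


def embedding_cost(num_pages: int, price_per_million: float, tokens_per_page: int = TOKENS_PER_PAGE) -> float:
    """Estimated USD cost of embedding num_pages once."""
    return num_pages * tokens_per_page * price_per_million / 1_000_000


for model, price in PRICE_PER_MILLION_TOKENS.items():
    print(f"{model}: {pages_per_dollar(price):,} pages/$, "
          f"10,000 pages ~ ${embedding_cost(10_000, price):.2f}")
```

At these prices, embedding 10,000 pages once costs about $0.16 with text-embedding-3-small versus about $1.04 with text-embedding-3-large. Rebuilding an index with a new embedding model re-embeds every document, so that cost is paid again.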

In [6]:
import os
from llama_index.core import Settings, VectorStoreIndex
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Configure the embedding model
embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",  # Using text-embedding-3-small for embeddings
)

# Configure GPT-4o-mini as the LLM
llm = OpenAI(
    model=MODEL_GPT,  # "gpt-4o-mini", defined in the setup cell
    temperature=0.1,
)

# Update the global settings
Settings.llm = llm
Settings.embed_model = embed_model

In [7]:
# Rebuild the index so the documents are re-embedded with the models set in Settings
# (the index from In [4] was built before Settings was configured, so it used the defaults).

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

In [8]:
response = query_engine.query("What did the author do growing up, answer in Czech language in 100 words?")
print(response)

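Autor vyrůstal v prostředí, kde se věnoval malování a umění. Navštěvoval Accademii, kde se učil malovat, a jeho zájem o malbu ho vedl k tomu, že začal malovat still life v noci ve svém pokoji. Maloval na zbytky plátna, protože si nemohl dovolit nové. Během svého studia se také setkal s modelkou, která žila poblíž a pracovala jako modelka a malířka kopií starých obrazů. Jeho zkušenosti z malování mu umožnily lépe vnímat detaily a krásu každodenního života, což ovlivnilo jeho umělecký styl a přístup k malbě.

*English translation:* The author grew up in an environment where he devoted himself to painting and art. He attended the Accademia, where he learned to paint, and his interest in painting led him to start painting still lifes at night in his room. He painted on scraps of canvas because he could not afford new ones. During his studies he also met a model who lived nearby and worked as a model and as a painter of copies of old paintings. His experience with painting allowed him to perceive the details and beauty of everyday life more keenly, which influenced his artistic style and approach to painting.

Unlike the English answer to the same question, this response describes the author's painting studies at the Accademia rather than what he did growing up, so it does not faithfully answer the question.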


## Summarization