In [None]:
"""
This script loads environment variables from a .env file and reads your OpenAI API key
from the environment, so the key never has to appear in your code.

Requirements:
- python-dotenv package
- A .env file in your project root containing: OPENAI_API_KEY=<your_api_key>

Example .env:
    OPENAI_API_KEY="sk-xxxxxxx"

Usage:
    python your_script.py
"""

# Import the built-in 'os' module for interacting with the operating system (e.g., environment variables)
import os

# Import 'load_dotenv' to load variables from a .env file,
# and 'find_dotenv' to locate the .env file automatically.
from dotenv import load_dotenv, find_dotenv

# Locate the .env file and load its contents so that variables are available as environment variables.
_ = load_dotenv(find_dotenv())

# Retrieve the OpenAI API key from the environment variables.
# This avoids hardcoding sensitive keys in your code.
# Note: indexing os.environ raises KeyError if the variable is not set.
openai_api_key = os.environ["OPENAI_API_KEY"]
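
If the key is missing, a bare `KeyError` is not very helpful. Below is a minimal sketch of a friendlier check that uses only the standard library. The helper name `require_env` is hypothetical (it is not part of python-dotenv or the OpenAI SDK), and the error wording is just a suggestion.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message.

    Hypothetical helper for illustration; not part of python-dotenv.
    """
    # Treat an unset variable and an empty/whitespace-only value the same way.
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in your shell."
        )
    return value
```

With this in place, the lookup could be written as `openai_api_key = require_env("OPENAI_API_KEY")`.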


## Load private document

In [None]:
"""
This script uses LlamaIndex (formerly GPT Index) to:

1️⃣ Import key classes for building and loading vector indexes.
2️⃣ Read all documents from a local 'data' folder.

Requirements:
- llama-index==0.12.39
- A 'data/' directory in your project root containing text files or documents to index.

Usage:
    python your_script.py
"""

# Import core LlamaIndex components:
# - VectorStoreIndex: for creating a vector-based index of documents.
# - SimpleDirectoryReader: for reading files from a folder.
# - StorageContext and load_index_from_storage: for saving/loading indexes efficiently.
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
)

# Create a SimpleDirectoryReader instance pointing to the 'data' folder.
# It picks a parser per file extension (e.g., .txt, .md; formats such as .pdf
# need the matching parser package installed).
# load_data() returns a list of Document objects.
documents = SimpleDirectoryReader("data").load_data()


## Create vector database

In [None]:
"""
This cell builds a vector-based index from the loaded documents using LlamaIndex.

What happens here?
📚 -> 🧮 -> ⚡️

1️⃣ Takes your raw documents.
2️⃣ Converts them into numerical embeddings (vectors).
3️⃣ Stores them in a vector index, making them easy for LLMs to search and retrieve relevant chunks.

Usage:
    index = VectorStoreIndex.from_documents(documents)
"""

# Create a new vector index from the list of documents.
# Under the hood:
# - It chunks large documents if needed.
# - Embeds each chunk using the configured embedding model
#   (an OpenAI model by default, which is why the API key is needed).
# - Stores the embeddings in memory by default; they can be persisted to disk later.
index = VectorStoreIndex.from_documents(documents)
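
To make the chunk, embed, store steps concrete, here is a toy sketch using only the standard library. It is not how LlamaIndex is implemented: the names `chunk_text`, `embed`, `cosine`, and `ToyVectorIndex` are hypothetical, and a bag-of-words count stands in for a real embedding model, which would produce dense vectors instead.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 8) -> list[str]:
    """Split text into chunks of at most `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts (an assumption standing in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorIndex:
    """Keeps (chunk, vector) pairs in memory and ranks them against a query."""

    def __init__(self, documents: list[str], chunk_size: int = 8):
        self.entries = [
            (chunk, embed(chunk))
            for doc in documents
            for chunk in chunk_text(doc, chunk_size)
        ]

    def retrieve(self, query: str, top_k: int = 2) -> list[str]:
        query_vec = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(query_vec, e[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:top_k]]


toy_index = ToyVectorIndex([
    "Startups should make something people want.",
    "Craigslist runs like a nonprofit and users trust it.",
])
print(toy_index.retrieve("what do people want", top_k=1))
```

The real index follows the same shape (chunks paired with vectors, ranked by similarity to the query), but relies on a learned embedding model so that semantically related text matches even when no words overlap.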


## Ask questions to private document

In [None]:
"""
This block turns the vector index into an interactive query engine
and sends it a question: "Summarize the article in 100 words."

Steps:
1️⃣ Converts the index into a query engine capable of semantic search + generation.
2️⃣ Runs your prompt against the engine.
3️⃣ Prints the LLM-powered answer.

Usage:
    python your_script.py
"""

# Create a query engine from the index.
# This engine knows how to:
# - Look up relevant text chunks via embeddings.
# - Feed those chunks to your LLM (e.g., OpenAI).
# - Return a coherent answer.
query_engine = index.as_query_engine()

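# Ask the query engine to summarize the article in 100 words.
# Note: the engine answers from the chunks it retrieves, not from the full text,
# so the summary reflects those chunks.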
response = query_engine.query("summarize the article in 100 words")

# Print the response from the LLM.
print(response)


The article discusses the importance of creating something people want and not focusing solely on making money in the early stages of a startup. It explores the idea of startups behaving like nonprofits and how benevolence can benefit startups in various ways, such as improving morale, attracting help from others, and aiding decision-making. Examples like Craigslist and Google are used to illustrate how successful companies have incorporated elements of charity. The article emphasizes the significance of mission-driven work, user care, and the tamagotchi effect in sustaining startups through challenges and fostering investor interest.
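
The "feed those chunks to your LLM" step boils down to building one prompt that contains the retrieved text plus the question. The sketch below shows that idea with the standard library only. The function name `build_prompt` and the template wording are assumptions made for illustration; they are not LlamaIndex's actual prompt.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine retrieved chunks and the user's question into a single LLM prompt.

    The template text is an assumption for illustration only.
    """
    # Number each chunk so the model (and the reader) can tell them apart.
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Context information is below.\n"
        "---------------------\n"
        f"{context}\n"
        "---------------------\n"
        "Using only the context above, answer the question.\n"
        f"Question: {question}\n"
        "Answer:"
    )


print(build_prompt(
    "summarize the article in 100 words",
    ["First retrieved chunk of the article.", "Second retrieved chunk."],
))
```

Because only the retrieved chunks end up in the prompt, the answer quality depends directly on what the retrieval step returns.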


## See under the hood

In [None]:
"""
This (optional) logging setup helps you see detailed debug info from LlamaIndex 
and other libraries in your console.

When enabled, it:
1️⃣ Sends all logs to your console (stdout).
2️⃣ Sets the logging level to DEBUG — so you see everything, even low-level messages.
3️⃣ Adds an explicit StreamHandler on the root logger (if basicConfig already attached one, each message prints twice).

Usage:
    Uncomment this block for debugging or development.
"""

# Import the built-in logging module to handle logs.
# import logging

# Import sys so you can use sys.stdout (the console output stream).
# import sys

# Configure logging:
# - stream=sys.stdout: print logs to your terminal, not to a file.
# - level=logging.DEBUG: show all messages, even the most verbose.
# logging.basicConfig(
#     stream=sys.stdout, 
#     level=logging.DEBUG
# )

# Add an explicit StreamHandler for stdout.
# If basicConfig() above already attached a stdout handler, each message is
# printed twice; keep only one of the two blocks in that case.
# logging.getLogger().addHandler(
#     logging.StreamHandler(stream=sys.stdout)
# )
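
Root-level DEBUG logging also surfaces messages from every other library (HTTP clients, for example). A narrower alternative is sketched below, assuming that LlamaIndex's loggers live under the `llama_index` name, which follows from the usual `logging.getLogger(__name__)` convention but is an assumption here. The helper `enable_debug_logging` and the `_notebook_debug` marker attribute are hypothetical.

```python
import logging
import sys


def enable_debug_logging(logger_name: str = "llama_index") -> logging.Logger:
    """Attach a single stdout handler to one named logger and set it to DEBUG.

    Hypothetical helper; the default logger name is an assumption.
    """
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)

    # Avoid stacking handlers when the notebook cell is re-run.
    if not any(getattr(h, "_notebook_debug", False) for h in logger.handlers):
        handler = logging.StreamHandler(stream=sys.stdout)
        handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
        handler._notebook_debug = True  # marker so we can recognize our own handler
        logger.addHandler(handler)

    # Stop messages from also reaching root handlers, which would duplicate them.
    logger.propagate = False
    return logger
```

Calling `enable_debug_logging()` once before building or querying the index would be enough; calling it again does not add a second handler.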


## Save the vector database

In [None]:
"""
This cell builds the vector index once and reuses it on later runs, using LlamaIndex:

✅ Checks if a local vector index already exists in './storage'.
    - If it DOES NOT exist:
        1️⃣ Reads documents from the 'data' folder.
        2️⃣ Creates a fresh vector index.
        3️⃣ Saves (persists) the index to './storage' for reuse.
    - If it DOES exist:
        1️⃣ Loads the prebuilt index from './storage'.

✅ Then, it turns the index into a query engine.

✅ Finally, it queries: "According to the author, what is good?" 
   and prints the LLM-powered answer.

Usage:
    python your_script.py

Requirements:
    - llama-index==0.12.39
    - data/ folder with documents
    - storage/ folder is created automatically on first run
"""

# Import 'os.path' to check whether the storage folder exists.
import os.path

# Folder where the index is persisted; used for both saving and loading.
PERSIST_DIR = "./storage"

# Check if a persistent index already exists in the storage directory.
if not os.path.exists(PERSIST_DIR):
    # If storage doesn't exist:
    # 1. Read all documents from the 'data' folder.
    documents = SimpleDirectoryReader("data").load_data()

    # 2. Create a new vector index from the documents.
    index = VectorStoreIndex.from_documents(documents)

    # 3. Persist (save) the index and its embeddings to disk for next time.
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    # If storage exists:
    # 1. Create a StorageContext using the saved folder.
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)

    # 2. Load the existing index from disk.
    index = load_index_from_storage(storage_context)

# Now, whether we built it or loaded it, we have an index ready for querying.

# Create a query engine from the index.
query_engine = index.as_query_engine()

# Run a semantic query: "According to the author, what is good?"
response = query_engine.query("According to the author, what is good?")

# Print the LLM's answer.
print(response)


According to the author, being good is not about claiming to be a particularly good person or having sanctimonious intentions. Instead, being good is suggested because it works effectively as a guide to strategy, a design spec for software, and a valuable approach for startups.
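
One caveat of checking only whether the storage folder exists: if the files in `data/` change, the old index is still loaded. A possible way to detect that, sketched with the standard library only, is to store a fingerprint of the data folder next to the index. The helper names (`data_fingerprint`, `index_is_stale`, `mark_index_fresh`) and the stamp-file idea are hypothetical and not part of LlamaIndex.

```python
import hashlib
from pathlib import Path


def data_fingerprint(data_dir: str) -> str:
    """Return a SHA-256 digest over the relative paths and contents of all files."""
    digest = hashlib.sha256()
    root = Path(data_dir)
    # Sort so the result does not depend on filesystem listing order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(path.read_bytes())
    return digest.hexdigest()


def index_is_stale(data_dir: str, stamp_file: str) -> bool:
    """True if no fingerprint was saved yet or the data folder has changed since."""
    stamp = Path(stamp_file)
    return not stamp.exists() or stamp.read_text().strip() != data_fingerprint(data_dir)


def mark_index_fresh(data_dir: str, stamp_file: str) -> None:
    """Save the current fingerprint after (re)building the index."""
    Path(stamp_file).write_text(data_fingerprint(data_dir))
```

The build branch would then run when `index_is_stale(...)` is true and call `mark_index_fresh(...)` after persisting. The stamp file should live outside the data folder so that it does not change the fingerprint itself.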


## Customization options
* Parse documents into smaller chunks
* Use a different vector store
* Retrieve more context per query
* Use a different LLM
* Use a different response mode
* Stream the response back

## Use cases
* QA
* Chatbot
* Agent
* Structured Data Extraction
* Multimodal

## Optimizing
* Advanced Retrieval Strategies
* Evaluation
* Building performant RAG applications for production

## Other
* LlamaPacks and create-llama: LlamaIndex's counterparts to LangChain templates
* Both are very recent and still in beta
* Very interesting: create-llama allows you to create a Vercel app!
* Very interesting: open-source end-to-end project (SEC Insights)
    * LlamaIndex + React/Next.js (Vercel) + FastAPI + Render + AWS
    * environment setup: LocalStack + Docker
    * monitoring: Sentry
    * load testing: loader.io
    * [web](https://www.secinsights.ai/)
    * [code](https://github.com/run-llama/sec-insights)