# First LlamaIndex tutorial

Prepare to import your API key

In [None]:
pip install python-dotenv



Similar to LangChain, LlamaIndex provides a lot of functionality that makes it easy to work with LLMs. Alongside LangChain, it is one of the most popular frameworks.

In [None]:
pip install llama-index

In [None]:
# Skip this cell if you have no OpenAI API key
import getpass
import os
from dotenv import load_dotenv

load_dotenv(os.path.expanduser("~/Projekte/MOOC/OpenCampus/codespace/.env"))

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass(
        prompt="Enter your OpenAI API key (required if using OpenAI): "
    )

Let's ask the model whether it knows about opencampus.sh.

In [None]:
from llama_index.llms.openai import OpenAI

response = OpenAI(model="gpt-4o-mini").complete("What is opencampus.sh?")
print(response)

Uhh, that is kind of a bummer: it does not even know about us. Probably because opencampus.sh is mentioned on only a few websites.

Let's try out the new GPT-4.1 mini model.

In [None]:
from llama_index.core.llms import ChatMessage
llm = OpenAI(model="gpt-4.1-mini")
messages = [
    ChatMessage(role="system", content="You are a helpful assistant."),
    ChatMessage(role="user", content="What is opencampus.sh"),
]
chat_response = llm.chat(messages)

In [None]:
print(chat_response)

Hmm, it seems to drift into hallucinations here.

## Ollama

In [None]:
pip install llama-index-llms-ollama

In [None]:
from llama_index.llms.ollama import Ollama
from llama_index.core.llms import ChatMessage

llm = Ollama(model="gemma3:1b", request_timeout=60.0)

messages = [
    ChatMessage(role="system", content="You are a helpful assistant."),
    ChatMessage(role="user", content="What is opencampus.sh"),
]

chat_response = llm.chat(messages)
print(chat_response)

Oh dear...

## Conclusion

These libraries wrap the LLM call in functions and classes of their own design. What we gain is that we can easily switch between model providers; what we lose is a lot of flexibility, because provider-specific options are hidden behind the wrapper.
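
The trade-off can be illustrated with a minimal sketch. The `CompletionLLM` protocol and the two `Fake...` classes below are hypothetical stand-ins, not LlamaIndex classes; they only mimic the idea that every provider wrapper exposes the same `complete` method, so the calling code stays the same when the provider changes.

```python
from typing import Protocol


class CompletionLLM(Protocol):
    """Hypothetical shared interface, loosely modelled on a wrapper's `complete` method."""

    def complete(self, prompt: str) -> str: ...


class FakeCloudLLM:
    """Stand-in for a hosted provider; returns a canned string instead of calling an API."""

    def complete(self, prompt: str) -> str:
        return f"[cloud] answer to: {prompt}"


class FakeLocalLLM:
    """Stand-in for a locally served model."""

    def complete(self, prompt: str) -> str:
        return f"[local] answer to: {prompt}"


def ask(llm: CompletionLLM, question: str) -> str:
    # The caller depends only on the shared interface, so swapping providers is a
    # one-line change. Anything a provider offers beyond `complete` is out of reach here.
    return llm.complete(question)


for llm in (FakeCloudLLM(), FakeLocalLLM()):
    print(ask(llm, "What is opencampus.sh?"))
```

The same function serves both stand-ins, which is the convenience; the cost is that `ask` can only use what the shared interface exposes.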

Made with GitHub Copilot

# Adding RAG with Cosine Similarity to the Notebook

In [None]:
# Install additional required packages
#!pip install llama-index-vector-stores-simple numpy
!pip install llama-index numpy

First, let's create a small sample document that we can index:

In [None]:
# Create a directory with some sample documents about OpenCampus.sh
import os

# Create a documents directory if it doesn't exist
os.makedirs("documents", exist_ok=True)

# Create a sample document about OpenCampus.sh
with open("documents/opencampus_info.txt", "w") as f:
    f.write("""
OpenCampus.sh is an innovative educational initiative based in Kiel, Germany.
It offers a variety of open and accessible learning opportunities in tech, 
digital skills, and modern professions. The platform provides MOOCs (Massive Open Online Courses), 
workshops, and project-based learning experiences that are often free or low-cost.

Key characteristics of OpenCampus.sh:
1. Community-focused education model
2. Collaboration with industry partners and universities
3. Practical, hands-on learning experiences
4. Focus on modern digital skills and technologies
5. Based in Kiel but with online course offerings

OpenCampus.sh aims to democratize access to education and bridge the gap between academic 
knowledge and practical industry requirements in the digital age.
""")

Now let's implement document loading and indexing with cosine similarity:

In [None]:
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.core.vector_stores.simple import SimpleVectorStore
from llama_index.embeddings.openai import OpenAIEmbedding
import numpy as np

# Load documents
documents = SimpleDirectoryReader("documents").load_data()
print(f"Loaded {len(documents)} documents")

# Create embedding model
embed_model = OpenAIEmbedding(model="text-embedding-3-small")

# Create the in-memory vector store. It takes no distance-function argument;
# in current llama-index versions it ranks by cosine similarity by default.
vector_store = SimpleVectorStore()

# A custom vector store has to be handed to the index through a StorageContext
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Create and populate the index
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    embed_model=embed_model,
    show_progress=True
)

# Save the index
index.storage_context.persist("index_cosine")

Now let's build a retriever and a query engine to perform RAG with cosine similarity:

In [None]:
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.llms.openai import OpenAI

# Create a retriever that returns the 2 most similar chunks
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=2,
)

# Create a query engine; from_args accepts an llm, the plain constructor does not
query_engine = RetrieverQueryEngine.from_args(
    retriever=retriever,
    llm=OpenAI(model="gpt-4o-mini"),
)

# Now let's query about OpenCampus.sh with our RAG system
response = query_engine.query("What is OpenCampus.sh and what does it do?")
print(response)
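
Conceptually, `similarity_top_k=2` means: score every stored chunk against the query and keep the two best. The following is a minimal sketch of that idea in plain Python, not LlamaIndex's actual implementation. The chunk names and the 3-dimensional vectors are made up for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_sim(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return the k (chunk_id, score) pairs most similar to the query, best first."""
    scored = [(chunk_id, cosine_sim(query_vec, vec)) for chunk_id, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings" keyed by a chunk label
chunks = {
    "education": [0.9, 0.1, 0.0],
    "community": [0.7, 0.3, 0.1],
    "weather": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]

for chunk_id, score in top_k(query, chunks, k=2):
    print(f"{chunk_id}: {score:.3f}")
```

Only the retrieved chunks are passed to the LLM as context, so a larger `k` gives the model more material at the price of a longer prompt.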

Let's also add a function to demonstrate how cosine similarity works under the hood:

In [None]:
def cosine_similarity_demo():
    """Demonstrate how cosine similarity works with embeddings"""
    # Get embeddings for two related texts
    text1 = "OpenCampus.sh provides educational opportunities."
    text2 = "Education and learning are offered by OpenCampus."
    text3 = "Hamburg is a city in northern Germany."
    
    # Get embeddings
    embedding1 = embed_model.get_text_embedding(text1)
    embedding2 = embed_model.get_text_embedding(text2)
    embedding3 = embed_model.get_text_embedding(text3)
    
    # Convert to numpy arrays for easier calculation
    emb1 = np.array(embedding1)
    emb2 = np.array(embedding2)
    emb3 = np.array(embedding3)
    
    # Calculate cosine similarity
    def cosine_sim(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    
    sim_1_2 = cosine_sim(emb1, emb2)
    sim_1_3 = cosine_sim(emb1, emb3)
    sim_2_3 = cosine_sim(emb2, emb3)
    
    print(f"Similarity between related texts: {sim_1_2:.4f}")
    print(f"Similarity between text1 and unrelated text: {sim_1_3:.4f}")
    print(f"Similarity between text2 and unrelated text: {sim_2_3:.4f}")
    
    return "As shown, cosine similarity gives higher scores to semantically related texts"

# Run the demonstration
cosine_similarity_demo()
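
The inner helper `cosine_sim` computes the standard cosine similarity formula. Written out, with a small hand-checkable example (the two-dimensional vectors are made up for illustration):

$$
\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
$$

$$
\mathbf{a} = (1, 0),\quad \mathbf{b} = (1, 1):\qquad
\mathbf{a} \cdot \mathbf{b} = 1,\quad
\lVert \mathbf{a} \rVert = 1,\quad
\lVert \mathbf{b} \rVert = \sqrt{2},\quad
\cos(\mathbf{a}, \mathbf{b}) = \frac{1}{\sqrt{2}} \approx 0.7071
$$

Scaling either vector by a positive factor leaves the result unchanged, so the score reflects the direction of the embeddings and not their length.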

Finally, let's add a cell to compare RAG results with the previous direct LLM queries:

In [None]:
# Compare RAG results with direct LLM query
print("--- Direct LLM Query ---")
direct_response = OpenAI(model="gpt-4o-mini").complete("What is OpenCampus.sh?")
print(direct_response)

print("\n--- RAG with Cosine Similarity ---")
rag_response = query_engine.query("What is OpenCampus.sh?")
print(rag_response)
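
To see why the RAG answer differs, it can help to look at which chunks were retrieved. The helper below is a sketch under the assumption that the response object exposes `source_nodes`, each with a `score` and a `node` offering `get_content()`, as LlamaIndex query responses do at the time of writing. `format_sources` is a hypothetical helper name, and the stub objects only exist so the cell runs without an API key.

```python
from types import SimpleNamespace


def format_sources(response, max_chars=60):
    """Return one line per retrieved chunk: its similarity score and a text preview."""
    lines = []
    for item in response.source_nodes:
        # Collapse whitespace so multi-line chunks fit on one line
        preview = " ".join(item.node.get_content().split())[:max_chars]
        score = "n/a" if item.score is None else f"{item.score:.3f}"
        lines.append(f"score={score}  {preview}")
    return lines


# Stub objects standing in for a real response; in the notebook one would pass
# `rag_response` instead of `fake_response`.
fake_node = SimpleNamespace(get_content=lambda: "OpenCampus.sh is an educational initiative.")
fake_response = SimpleNamespace(source_nodes=[SimpleNamespace(node=fake_node, score=0.8123)])

for line in format_sources(fake_response):
    print(line)
```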