## Initial Setup

This step initializes three key components:

- **LLM Chat Model:**  
  Uses `ChatOllama` to interact with a local model (`llama3.2:latest` at `http://192.168.1.99:11434`) with a temperature of 0 for consistent responses.

- **Embedding Model:**  
  Uses `OllamaEmbeddings`, backed by the same `llama3.2:latest` model, to convert text into vectors so that semantically similar passages can be found through similarity search.

- **Vector Store:**  
  Uses `InMemoryVectorStore` to keep embeddings in process memory (nothing is persisted, so the store is empty again after a kernel restart), enabling quick retrieval for tasks such as semantic search or retrieval-augmented generation (RAG).

Together, these components form a minimal setup, running against a self-hosted Ollama server, for prototyping chatbots, semantic search systems, and RAG pipelines.
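To make the roles of the embedding model and the vector store concrete, here is a toy, dependency-free sketch. It assumes a bag-of-words counter in place of `OllamaEmbeddings` (a real embedding model returns dense float vectors) and a hypothetical `ToyInMemoryVectorStore` class. As far as we know, `InMemoryVectorStore` also ranks stored vectors by cosine similarity to the query vector, but this is only an illustration of the idea, not its implementation.

```python
import math
from collections import Counter


# Hypothetical stand-in for an embedding model: a bag-of-words count vector.
# OllamaEmbeddings would instead return a dense list of floats from the model.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyInMemoryVectorStore:
    """Hypothetical store: keeps (text, vector) pairs in a Python list."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.items = []  # list of (text, vector) pairs

    def add_texts(self, texts):
        # Embed each text once, at insertion time
        for text in texts:
            self.items.append((text, self.embed_fn(text)))

    def similarity_search(self, query, k=4):
        # Embed the query, then rank stored texts by cosine similarity
        query_vector = self.embed_fn(query)
        ranked = sorted(
            self.items,
            key=lambda item: cosine_similarity(query_vector, item[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


store = ToyInMemoryVectorStore(embed)
store.add_texts([
    "the cat sat on the mat",
    "python is a programming language",
    "the dog chased the cat",
])
print(store.similarity_search("programming in python", k=1))
```

Because everything lives in a Python list, lookups are fast for small collections but the data disappears when the process ends, which mirrors the trade-off of an in-memory store.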


```python
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_ollama import ChatOllama, OllamaEmbeddings

# Ollama server and model used for both chat and embeddings
OLLAMA_BASEURL = 'http://192.168.1.99:11434/'
OLLAMA_MODEL = 'llama3.2:latest'

# Chat model; temperature=0 keeps answers as deterministic as possible
llm = ChatOllama(
    model=OLLAMA_MODEL,
    temperature=0,
    base_url=OLLAMA_BASEURL,
)

# Embedding model that turns text chunks into vectors
embeddings = OllamaEmbeddings(
    model=OLLAMA_MODEL,
    base_url=OLLAMA_BASEURL,
)

# Vector store that keeps the embedded chunks in process memory
vector_store = InMemoryVectorStore(embeddings)
```

## Load PDF into Knowledge Base

This code snippet performs three key steps:

- **Load PDF:**  
  Uses `PyPDFLoader` to read a PDF file and extract its pages as documents.

- **Split Text:**  
  Uses `RecursiveCharacterTextSplitter` to break the document text into smaller chunks (at most 1000 characters each, with a 200-character overlap between consecutive chunks) and records each chunk's starting character offset in its metadata (`start_index`).

- **Store Chunks:**  
  Saves the resulting text splits into a pre-initialized vector store (via `vector_store.add_documents`), making them available for later retrieval tasks.

It processes each file in the `knowledge_base_items` list, printing its progress along the way, and then confirms that the knowledge base is ready.
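The effect of `chunk_size`, `chunk_overlap`, and the recorded start index can be seen with a simplified sliding-window splitter. This is a hypothetical helper (`split_with_overlap`), not the `RecursiveCharacterTextSplitter` algorithm, which additionally tries to break on paragraph, line, and word boundaries before falling back to raw characters; the sketch only shows how overlapping windows advance by `chunk_size - chunk_overlap` characters.

```python
# Simplified fixed-size character splitter (hypothetical helper). It does NOT
# reproduce RecursiveCharacterTextSplitter's separator-aware splitting; it only
# illustrates chunk_size, chunk_overlap, and a per-chunk start_index.
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200):
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append({"start_index": start, "text": text[start:start + chunk_size]})
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


sample = "x" * 2500
for chunk in split_with_overlap(sample):
    print(chunk["start_index"], len(chunk["text"]))
```

With 1000-character windows and a 200-character overlap, each chunk starts 800 characters after the previous one, so the last 200 characters of one chunk are repeated at the start of the next. That overlap helps a sentence cut at a boundary still appear intact in at least one chunk.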


```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Chunks of at most 1000 characters overlapping by 200 characters;
# add_start_index stores each chunk's offset in metadata["start_index"]
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200, add_start_index=True
)


def load_to_knowledge(file_path):
    """Load a PDF, split it into chunks, and add the chunks to the vector store."""
    print(f"\nProcessing {file_path}")
    loader = PyPDFLoader(file_path)
    docs = loader.load()  # one Document per page
    print(f'\t{len(docs)} pages found')

    splits = text_splitter.split_documents(docs)
    print(f'\t{len(splits)} splits generated')

    print('Saving splits to vector store...')
    vector_store.add_documents(documents=splits)
    print('\tSaved.\n')
```

```python
# PDF files to index into the knowledge base
knowledge_base_items = [
    'docs/SIRA_CV_EN.pdf',
]

for item in knowledge_base_items:
    load_to_knowledge(item)

print('Knowledge base ready')
```

## Generate and Refine Concise Answers Using Retrieval

The following cells compare two approaches to generating a brief answer (at most 10 words) to the same question.

1. **Direct Document Chain:**  
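   - **Prompt Template:** A chat prompt instructs the LLM to answer succinctly and defines two placeholders, `{context}` for the documents and `{input}` for the question.
   - **Chain Creation:** `create_stuff_documents_chain` builds a chain from the LLM and the prompt that "stuffs" all supplied documents into the `{context}` slot of a single prompt.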
   - **Invocation:** When invoked with an empty context, the chain relies solely on the LLM’s internal knowledge.
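A plain-Python sketch of the "stuff" strategy shows why an empty context leaves the model with nothing but the question. The template and the `stuff_prompt` helper below are hypothetical; the assumption, based on our understanding of LangChain's defaults, is that document texts are joined with a blank line (`"\n\n"`) before being substituted into `{context}`.

```python
# Hypothetical template with the same two placeholders as the notebook's prompt.
PROMPT_TEMPLATE = (
    "Answer the user's question using at most 10 words.\n"
    "Context: {context}\n"
    "Question: {input}\n"
)


def stuff_prompt(documents, question, separator="\n\n"):
    """Join all document texts into one context string and fill the template."""
    context = separator.join(documents)  # an empty list gives an empty context
    return PROMPT_TEMPLATE.format(context=context, input=question)


# No documents: the model sees only the question.
print(stuff_prompt([], "Which project is Antonino Sirchia working on at the moment?"))

# Two documents: both are placed, in order, into the single prompt.
print(stuff_prompt(["First chunk.", "Second chunk."], "What do the chunks say?"))
```

Stuffing is the simplest combination strategy: it needs one LLM call, but every document must fit into the model's context window together with the question.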


```python
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

# Alternative: pull a ready-made RAG prompt from LangChain Hub
# https://smith.langchain.com/hub/rlm/rag-prompt
# from langchain import hub
# prompt = hub.pull("rlm/rag-prompt")

prompt = ChatPromptTemplate.from_template("""
Answer the user's question as concisely as possible, using at most 10 words:
Context: {context}
Question: {input}
""")

# "Stuff" chain: inserts all provided documents into {context} in one prompt
chain = create_stuff_documents_chain(
    llm=llm,
    prompt=prompt,
)

# Empty context: the model can only answer from its training data
response = chain.invoke({
    "context": [],
    "input": 'Which project is Antonino Sirchia working on at the moment?',
})
print(response)
```

2. **Retrieval-Enhanced Chain:**  
   - **Retriever:** Wraps the vector store as a retriever that returns the stored chunks most similar to the question.
   - **Retrieval Chain:** `create_retrieval_chain` wraps the original chain, fetching relevant chunks for each question and injecting them into `{context}` automatically.
   - **Invocation:** Running the retrieval chain grounds the answer in the indexed PDF, so it can reflect information the model never saw during training.
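The control flow of a retrieval chain (retrieve, then generate, then return everything together) can be sketched without any model. Everything here is hypothetical: `fake_retriever` ranks documents by word overlap instead of vector similarity, `fake_llm` only reports how much context it received, and the sample documents are invented. The returned keys (`input`, `context`, `answer`) follow the shape we understand `create_retrieval_chain` to produce, which is why the notebook reads `response['answer']`.

```python
# Hypothetical retriever: ranks documents by how many words they share with
# the question (a real retriever would use embedding similarity instead).
def fake_retriever(question, documents, k=4):
    words = set(question.lower().split())
    scored = [(len(words & set(doc.lower().split())), doc) for doc in documents]
    scored = [item for item in scored if item[0] > 0]  # drop non-matching docs
    scored.sort(key=lambda item: item[0], reverse=True)
    return [doc for _, doc in scored[:k]]


# Hypothetical LLM stand-in: reports how many documents it was given.
def fake_llm(prompt_inputs):
    return f"answered using {len(prompt_inputs['context'])} retrieved document(s)"


def retrieval_chain_invoke(inputs, documents):
    context = fake_retriever(inputs["input"], documents)  # 1. retrieve
    answer = fake_llm({"context": context, "input": inputs["input"]})  # 2. generate
    return {**inputs, "context": context, "answer": answer}  # 3. bundle results


docs = ["The team is building a search engine", "Unrelated note about cooking"]
result = retrieval_chain_invoke({"input": "What is the team building?"}, docs)
print(result["answer"])
print(result["context"])
```

Keeping the retrieved `context` in the result makes it easy to inspect which chunks the answer was based on, for example by printing `response['context']` in the cell below.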

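```python
from langchain.chains import create_retrieval_chain

# Expose the vector store as a retriever (similarity search over stored chunks)
retriever = vector_store.as_retriever()

# Retrieve relevant chunks for the input, then pass them to the stuff chain as {context}
retrieval_chain = create_retrieval_chain(retriever, chain)

response = retrieval_chain.invoke({'input': 'Which project is Antonino Sirchia working on at the moment?'})
# The result is a dict containing 'input', 'context' (retrieved documents) and 'answer'
print(response['answer'])
```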

**Key Difference:**  
The second snippet shows how the answer can differ when relevant context is automatically provided by the retriever, compared to relying only on the LLM’s built-in knowledge.