# Q&A on the content of a bunch of files
Goal:
- Read all files in a folder containing text (pdf, word, markdown, ...)
- Ask questions on the content of the files
- Avoid any answers based on information that is not in the source data

## Install dependencies and set up the environment

In [4]:
!pip install -qU llama-index datasets openai transformers cohere pypdf Markdown docx2txt llama-index-readers-file
# !pip install langchain langchainhub llama-index-llms-langchain
!pip install llama-index-llms-ollama


Collecting llama-index-llms-ollama
  Obtaining dependency information for llama-index-llms-ollama from https://files.pythonhosted.org/packages/4f/45/f37075b0b075c56d85c8a6868f4641bdc610e66bf6e056f7021713266be9/llama_index_llms_ollama-0.1.5-py3-none-any.whl.metadata
  Downloading llama_index_llms_ollama-0.1.5-py3-none-any.whl.metadata (585 bytes)
Downloading llama_index_llms_ollama-0.1.5-py3-none-any.whl (3.6 kB)
Installing collected packages: llama-index-llms-ollama
Successfully installed llama-index-llms-ollama-0.1.5


## Set up the API key for OpenAI

In [7]:
import os

os.environ['OPENAI_API_KEY'] = ''  # platform.openai.com

## Load documents
Documents are containers provided by LlamaIndex around the actual source files.
We read all documents uploaded to the Colab instance using the `SimpleDirectoryReader`.


In [6]:
from llama_index.core import SimpleDirectoryReader

filename_fn = lambda filename: {"file_name": filename}
documents = SimpleDirectoryReader("../data/pdf", file_metadata=filename_fn).load_data()

print(documents[0])
len(documents)

Doc ID: 13283e98-a2ec-4494-96b7-87dfb839fe16
Text: Portable Data Recorder   HMG 4000       Operating Manual
(Translation of o riginal instructions )


145
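The `file_metadata` argument takes a function that receives a file path and returns a dictionary of metadata for that file. A slightly richer variant of `filename_fn` might look like the sketch below. The helper name `describe_file` and the keys `file_type` and `file_size` are hypothetical, chosen only for illustration.

```python
from pathlib import Path
import tempfile


def describe_file(file_path: str) -> dict:
    """Return metadata for one source file (hypothetical helper, not part of LlamaIndex)."""
    path = Path(file_path)
    return {
        "file_name": path.name,            # file name without the directory part
        "file_type": path.suffix.lower(),  # e.g. ".pdf" or ".docx"
        "file_size": path.stat().st_size,  # size in bytes
    }


# Small self-check against a temporary file
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "Manual.PDF"
    sample.write_bytes(b"12345")
    sample_metadata = describe_file(str(sample))
    print(sample_metadata)
```

Such a function could be passed as `file_metadata=describe_file` in place of `filename_fn`, so that every loaded document carries the extra fields.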

## Parsing documents and creating embeddings
This step sets up a text splitter (`SentenceSplitter`) to chunk the source data, configures an embedding model that defines the desired embedding, and builds a `VectorStoreIndex` that converts the documents into embeddings.

In [None]:
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.extractors import TitleExtractor, KeywordExtractor
from llama_index.core.ingestion import IngestionPipeline

# Define metadata extractors
transformations = [
    SentenceSplitter(),
    #TitleExtractor(nodes=5),
    #KeywordExtractor(keywords=10),
    #OpenAIEmbedding(model='text-embedding-3-large', embed_batch_size=100)
]

# Create ingestion pipeline
pipeline = IngestionPipeline(transformations=transformations)

nodes = pipeline.run(
    documents=documents,
    show_progress=True
    )

print(nodes[0].metadata)

Parsing nodes:   0%|          | 0/2 [00:00<?, ?it/s]

{'file_name': '/content/converted.docx'}
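To make the chunking step more concrete, here is a simplified sketch of sentence-based splitting with overlap. It is not the `SentenceSplitter` implementation: it counts characters instead of tokens, and the function name `split_into_chunks` and its parameters are assumptions made for this example.

```python
import re
from typing import List


def split_into_chunks(text: str, max_chars: int = 80, overlap_sentences: int = 1) -> List[str]:
    """Greedily pack whole sentences into chunks of roughly max_chars characters.

    The last `overlap_sentences` sentences of a finished chunk are repeated at
    the start of the next chunk so that context is not lost at chunk borders.
    A single sentence longer than max_chars is kept whole.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            chunks.append(" ".join(current))
            # carry the tail of the finished chunk into the next one
            current = current[-overlap_sentences:] if overlap_sentences > 0 else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


sample_text = "Init the driver. Configure the channel. Read the value. Handle errors."
chunks = split_into_chunks(sample_text, max_chars=40, overlap_sentences=1)
for chunk in chunks:
    print(chunk)
```

The overlap trades some duplicated text for a better chance that a sentence and its neighbouring context end up in the same chunk.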


In [None]:
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding(model='text-embedding-3-large', embed_batch_size=100)

index = VectorStoreIndex.from_documents(
    documents,
    embed_model=embed_model
)
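Conceptually, a vector index stores one embedding per chunk and, at query time, returns the chunks whose embeddings are closest to the query embedding. The sketch below illustrates this with a toy bag-of-words vector standing in for a real embedding model. All names (`toy_embed`, `cosine_similarity`, `top_k`) are hypothetical and unrelated to the LlamaIndex internals.

```python
import math
from collections import Counter
from typing import List, Tuple


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k chunks most similar to the query, best first."""
    query_vec = toy_embed(query)
    scored = [(cosine_similarity(query_vec, toy_embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


toy_chunks = [
    "the recorder stores measurement data",
    "analog inputs are read with the adc driver",
    "the battery is charged over usb",
]
results = top_k("how to read analog inputs", toy_chunks, k=1)
print(results[0][1])
```

A real embedding model captures meaning rather than word overlap, but the retrieval step (score every chunk, keep the best `k`) follows the same pattern.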

In [None]:
query_engine = index.as_query_engine()

res = query_engine.query("How to init and get data from analog inputs on TTC 500 in C ?")
print(res)

To initialize and retrieve data from analog inputs on TTC 500 in C, you can follow these steps:

1. Call the function `IO_Driver_Init()` as the first function during initialization to initialize the driver.
2. Use the function `IO_ADC_ChannelInit()` to set up the desired ADC channel with parameters such as channel number, input type, input range, and pull-up/down configuration.
3. Periodically call the task function, where you can include calls to driver task functions.
4. Within the task function, use `IO_ADC_Get()` to retrieve the raw ADC value from the desired ADC channel.
5. Convert the raw ADC value to temperature in degrees Celsius using the function `IO_ADC_BoardTempSbyte()`.
6. Handle any errors or safety callbacks as needed based on the application requirements.

By following these steps, you can successfully initialize and obtain data from analog inputs on TTC 500 in C.


## Querying

In [None]:
from IPython.display import Markdown, display

query_engine = index.as_query_engine()
#chat_engine = index.as_chat_engine()

# define prompt viewing function
def display_prompt_dict(prompts_dict):
    for k, p in prompts_dict.items():
        text_md = f"**Prompt Key**: {k}<br>" f"**Text:** <br>"
        display(Markdown(text_md))
        print(p.get_template())
        display(Markdown("<br><br>"))


prompts_dict = query_engine.get_prompts()
display_prompt_dict(prompts_dict)

# Requires the langchain packages from the commented-out pip line in the install cell
from langchain import hub
langchain_prompt = hub.pull("rlm/rag-prompt")

from llama_index.core.prompts import LangchainPromptTemplate

lc_prompt_tmpl = LangchainPromptTemplate(
    template=langchain_prompt,
    template_var_mappings={"query_str": "question", "context_str": "context"},
)

query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": lc_prompt_tmpl}
)
prompts_dict = query_engine.get_prompts()
display_prompt_dict(prompts_dict)
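# English: "Give me a summary of the 'Evaluation' chapter"
#res = query_engine.chat("Geb mir eine Zusammenfassung des -Evaluation- Kapitel")
# English: "What did the entrance exam for the Volksschule look like?"
res = query_engine.query("Wie sah die Aufnahmeprüfung zur Volksschule aus?")
print(res)
# English: "Which state management does the Flutter frontend use, and how does it relate to the layered architecture?"
#res = query_engine.query("Welches State Management Nutzt das Flutter Frontend und in welcher Beziehung steht es zur Schichten-Architektur?")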

**Prompt Key**: response_synthesizer:text_qa_template<br>**Text:** <br>

Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {query_str}
Answer: 


<br><br>

**Prompt Key**: response_synthesizer:refine_template<br>**Text:** <br>

The original query is as follows: {query_str}
We have provided an existing answer: {existing_answer}
We have the opportunity to refine the existing answer (only if needed) with some more context below.
------------
{context_msg}
------------
Given the new context, refine the original answer to better answer the query. If the context isn't useful, return the original answer.
Refined Answer: 


<br><br>

**Prompt Key**: response_synthesizer:text_qa_template<br>**Text:** <br>

input_variables=['context', 'question'] metadata={'lc_hub_owner': 'rlm', 'lc_hub_repo': 'rag-prompt', 'lc_hub_commit_hash': '50442af133e61576e74536c6556cefe1fac147cad032f4377b60c436e6cdcb6e'} messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'question'], template="You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.\nQuestion: {question} \nContext: {context} \nAnswer:"))]


<br><br>

**Prompt Key**: response_synthesizer:refine_template<br>**Text:** <br>

The original query is as follows: {query_str}
We have provided an existing answer: {existing_answer}
We have the opportunity to refine the existing answer (only if needed) with some more context below.
------------
{context_msg}
------------
Given the new context, refine the original answer to better answer the query. If the context isn't useful, return the original answer.
Refined Answer: 


<br><br>

Die Aufnahmeprüfung zur Volksschule bestand aus einem Diktat, einer Matheaufgabe und einem Aufsatz. Arbeiterkinder mussten einen großen Schritt machen, um sich überhaupt anzumelden und die Prüfung zu bestehen. Die Schulkarriere war für viele eine Herausforderung, da die meisten Schüler auf dem Weg zum Abitur scheiterten.
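The prompt output above shows two templates with different variable names: the default one uses `{context_str}` and `{query_str}`, while the LangChain hub prompt uses `{context}` and `{question}`. The `template_var_mappings` argument bridges the two. The following sketch mimics that renaming with plain string formatting; `fill_template` is a hypothetical helper, not a LlamaIndex function.

```python
from typing import Dict, Optional

# Default-style QA template, copied from the prompt output above
TEXT_QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)

# Template with different variable names, as in the LangChain hub prompt
RAG_TEMPLATE = "Question: {question} \nContext: {context} \nAnswer:"

# Maps the caller's variable names to the names used inside the template
VAR_MAPPINGS = {"query_str": "question", "context_str": "context"}


def fill_template(template: str, mappings: Optional[Dict[str, str]] = None, **values: str) -> str:
    """Rename keyword arguments according to `mappings`, then format the template."""
    mappings = mappings or {}
    renamed = {mappings.get(name, name): value for name, value in values.items()}
    return template.format(**renamed)


default_prompt = fill_template(TEXT_QA_TEMPLATE, query_str="Q?", context_str="C")
mapped_prompt = fill_template(RAG_TEMPLATE, VAR_MAPPINGS, query_str="Q?", context_str="C")
print(mapped_prompt)
```

Both templates tell the model to answer from the supplied context, which supports the goal of avoiding answers that are not grounded in the source data.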


Query engines can be customized with custom `node_postprocessors` and retrievers.
The response mode also defines how the LLM synthesizes the final answer from the retrieved nodes.
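As an illustration of one response mode, the idea behind `tree_summarize` (used in the next cell) can be sketched as a bottom-up reduction: groups of chunks are summarized, then the summaries are summarized, until a single text remains. This is a simplification under stated assumptions: the real mode packs chunks by prompt size rather than a fixed group size, and `fake_llm_summary` is a stand-in for an LLM call.

```python
from typing import Callable, List


def tree_summarize(chunks: List[str], summarize: Callable[[List[str]], str], fan_in: int = 2) -> str:
    """Combine chunks bottom-up: summarize groups of `fan_in` until one text remains."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not chunks:
        return ""
    level = list(chunks)
    while len(level) > 1:
        # one pass over the current level builds the next, shorter level
        level = [summarize(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0]


def fake_llm_summary(group: List[str]) -> str:
    """Stand-in for an LLM call: joins the group so the tree structure stays visible."""
    return "(" + " + ".join(group) + ")"


summary = tree_summarize(["A", "B", "C", "D", "E"], fake_llm_summary)
print(summary)
```

The nesting of the parentheses in the printed result mirrors the levels of the summary tree.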

In [None]:
from llama_index.core import (
    VectorStoreIndex,
    get_response_synthesizer,
)
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor

# configure retriever
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=10,
)

# configure response synthesizer; the response mode is set here because
# from_args() only uses response_mode when no response_synthesizer is passed
response_synthesizer = get_response_synthesizer(response_mode="tree_summarize")

# assemble query engine
custom_query_engine = RetrieverQueryEngine.from_args(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.7)],
)

# query the customized engine
# English: "Give me a summary of the 'Evaluation' chapter"
response = custom_query_engine.query("Geb mir eine Zusammenfassung des -Evaluation- Kapitel")
print(response)

Das Evaluation-Kapitel befasst sich mit der Bewertung der entwickelten Lösung anhand verschiedener Merkmale. Zunächst wird die funktionale Vollständigkeit betrachtet, bei der überprüft wird, ob alle zuvor definierten Ziele erreicht wurden. Es wird festgestellt, dass der Anmelde- und Registrierungsprozess nur im xCollect Frontend verfügbar ist und neue Nutzer in der Appwrite Web-Oberfläche angelegt werden müssen. Das Anlegen und Löschen von Kontext-Analysen ist in der Session-Übersicht von xCollect möglich, jedoch fehlt noch die Funktionalität des automatischen Löschens von Kontext-Analysen. Das Hinzufügen und Synchronisieren von Medien ist enthalten, ebenso wie das gleichzeitige Aufnehmen von Audio-Aufnahmen und Videos. Das Hinzufügen von externen Fotos und Videos ist sowohl in der Desktop- als auch in der Mobile-Anwendung möglich. Daten können exportiert werden, wobei in der Desktop-Anwendung der File Explorer bzw. Finder genutzt wird und in der Mobile-Version die Foto-Galerie. Der Of
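The combination of `similarity_top_k` and `similarity_cutoff` configured above can be pictured as a two-stage filter: take the best-scoring chunks, then discard any whose score falls below the cutoff. The sketch below shows only this filtering logic. `ScoredChunk` and both helper functions are hypothetical names, and the scores are made-up illustration values, not results from this notebook.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredChunk:
    text: str
    score: float  # similarity between the chunk and the query


def take_top_k(chunks: List[ScoredChunk], k: int) -> List[ScoredChunk]:
    """Keep the k highest-scoring chunks, best first."""
    return sorted(chunks, key=lambda c: c.score, reverse=True)[:k]


def apply_similarity_cutoff(chunks: List[ScoredChunk], cutoff: float = 0.7) -> List[ScoredChunk]:
    """Drop every chunk whose score is below the cutoff."""
    return [c for c in chunks if c.score >= cutoff]


# Made-up scores for illustration only
retrieved = [
    ScoredChunk("c", 0.42),
    ScoredChunk("a", 0.91),
    ScoredChunk("b", 0.70),
]
kept = apply_similarity_cutoff(take_top_k(retrieved, k=10), cutoff=0.7)
print([c.text for c in kept])
```

If no chunk survives the cutoff, there is no context left to answer from, which is one way to keep the engine from answering questions the source data does not cover.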