## Document Question Answering Notebook

## Setup: 

- Install the required packages (see requirements.txt).
- Preprocess the documents (see README.md).
- A working internet connection is needed to download the language models.
- Your machine should have at least 8 GB of RAM. A GPU is recommended for faster answer generation.
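
To check which device is visible before running the notebook, something like the following can be used. This is a minimal sketch: `pick_device` is a hypothetical helper and not the notebook's `load_inference_model`, and it assumes PyTorch is the backend (which `sentence_transformers` depends on).

```python
def pick_device(device=None):
    """Return 'cuda' or 'cpu'; an explicit choice wins over auto-detection.

    Hypothetical helper for illustration only.
    """
    if device is not None:
        return device
    try:
        import torch  # fall back to CPU if PyTorch is not installed
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())        # auto-detect
print(pick_device("cpu"))   # force CPU
```

Passing an explicit value mirrors the `device` argument used later in the notebook, where `None` means automatic selection.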

## How to run: 
- Run the Setup cells to download the models and load the preprocessed documents.
- The first run takes a while because the models have to be downloaded, so grab a cup of coffee :)
- The code chooses which model to download depending on whether a GPU is available. To choose manually, set the `device` parameter of the `load_inference_model` function.
- Either choose a predefined example query from the dropdown or enter your own query.
- By default, the top 3 snippets retrieved from the document collection are passed to the model as the basis for its answer.
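
The snippet selection described above can be illustrated with a small ranking example. This is only a sketch under stated assumptions: `cosine` and `top_k_snippets` are hypothetical names, the vectors are toy values rather than real embeddings, and the notebook's own `rag_utils.semantic_search` may work differently.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_snippets(query_vec, docs, doc_vecs, top_k=3):
    """Return the top_k docs ordered by similarity to the query (hypothetical helper)."""
    scored = sorted(
        zip(docs, doc_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [doc for doc, _ in scored[:top_k]]


# Toy data: four chunks with made-up 2-D "embeddings"
docs = [{"text": "antenna"}, {"text": "amplifier"}, {"text": "budget"}, {"text": "combiner"}]
doc_vecs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0], [0.8, 0.6]]
query_vec = [1.0, 0.0]

contexts = [d["text"] for d in top_k_snippets(query_vec, docs, doc_vecs, top_k=3)]
print(contexts)
```

In the notebook itself, the embeddings would come from the sentence-transformer model, and the selected `text` fields are what the language model receives as context.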

## Visualisation:
- Visualise the model output and compare it with the snippets extracted from the documents.
- The visualisation currently highlights the 5 sentences that most influenced the generated answer.
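
One simple way to rank context sentences against an answer is sketched below. It uses plain token overlap (Jaccard similarity) as a stand-in scoring function. This is an assumption for illustration and not necessarily how `rag_visualisation.find_snippet` scores sentences; `split_sentences`, `tokens` and `top_sentences` are hypothetical names.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(text):
    """Lower-cased set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def top_sentences(contexts, answer, num_top=5):
    """Return up to num_top (score, context_index, sentence) tuples, best first."""
    answer_tokens = tokens(answer)
    scored = []
    for ctx_idx, ctx in enumerate(contexts):
        for sent in split_sentences(ctx):
            sent_tokens = tokens(sent)
            union = answer_tokens | sent_tokens
            score = len(answer_tokens & sent_tokens) / len(union) if union else 0.0
            scored.append((score, ctx_idx, sent))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:num_top]


contexts = [
    "The combiner uses OxTi material. The package is called L40.",
    "Costs were reviewed in March.",
]
answer = "The output combiner uses OxTi material."
best = top_sentences(contexts, answer, num_top=2)
for score, ctx_idx, sent in best:
    print(f"{score:.2f}  context {ctx_idx}: {sent}")
```

An embedding-based similarity would likely capture paraphrases better than token overlap; the ranking and top-n selection would stay the same.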



### Setup: 

In [2]:
## automatic reload when imported files change
%load_ext autoreload
%autoreload 2

import json
import sys

import pandas as pd
from sentence_transformers import SentenceTransformer

## make the local rag package importable from the parent directory
sys.path.append('../')
import rag.rag_llm_classes as rag_llm_classes
import rag.rag_retrieval_utils as rag_utils
import rag.rag_visualisation as rag_visualisation


The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [3]:
## Load embedding model 
model_emb = SentenceTransformer('BAAI/bge-large-zh-v1.5')

## Load LLM model
## device: 'cpu', 'cuda' (for GPU), or None to select automatically
llm = rag_llm_classes.load_inference_model(device = None)

## Load document chunks 
input_path = "../datasets/chunked_documents/esa_documents/chunking_cks_1024_ovl_8.json"
docs = rag_utils.load_data(input_path) 

## drop duplicates according to the text column
data = pd.DataFrame(docs)
data.drop_duplicates(subset=['text'], inplace=True)
docs = data.to_json(orient='records')
docs = json.loads(docs)

## Load the first 9 example questions for the dropdown
input_path = "../datasets/example_questions/esa_test_queries.json"
with open(input_path, 'r') as json_file:
    example_queries = json.load(json_file)
example_queries = [item['question'] for item in example_queries[:9]]



Loading GPU model
Loading GPU model


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

../datasets/chunked_documents/esa_documents/chunking_cks_1024_ovl_8


### Question Answering

In [11]:
import ipywidgets as widgets
from IPython.display import display

# Create a dropdown widget
dropdown = widgets.Dropdown(
    options=example_queries,
    value=example_queries[0],
    description='Example Queries:',
)

# Create a text input widget
text_input = widgets.Text(
    value='',
    placeholder='Type your own query here',
    description='Query:',
)

# Create a button widget
submit_button = widgets.Button(
    description='Submit',
)

# Create a dictionary to store the results of the last query for the visualisation cell
output_storage = {'query': None, 'output': None, 'ordered_docs': None, 'n_contexts': None}

# Create a label widget to show the status
status_label = widgets.Label(value='')

# Display the dropdown, text input, button, and status label widgets
display(dropdown)
display(text_input)
display(submit_button)
display(status_label)

# Function to handle query submission
def on_submit(button):
    query = text_input.value.strip()
    if not query:  # If text input is empty, use the dropdown value
        query = dropdown.value.strip()
    
    try:
        # Update status label to indicate processing
        status_label.value = 'Processing...'

        print(f'You selected: "{query}"')
        # Retrieve the top 5 chunks, then keep the first n_con as context for the LLM
        results, ordered_docs = rag_utils.semantic_search(query, docs, model_emb, top_k=5, verbose=False)
        n_con = 3
        contexts = [doc['text'] for doc in ordered_docs[:n_con]]
        print("___________________________________")
        print("WAIT: Running answer generation...")
        output = llm.act(query, contexts)
        print("Done.")
        
        # Store the output in the dictionary for further analysis
        output_storage['query'] = query
        output_storage['output'] = output
        output_storage['ordered_docs'] = ordered_docs
        output_storage['n_contexts'] = n_con
        # Update status label to indicate completion
        status_label.value = 'Done.'
        print("___________________________________")
        print("ANSWER: ", output)
        print("___________________________________")

    except Exception as e:
        # Handle any exceptions and update the status label
        status_label.value = 'Error occurred.'
        print(f"An error occurred: {e}")
    
    # Reset the text input widget after submission
    text_input.value = ''

# Attach the function to the button click event
submit_button.on_click(on_submit)

You selected: "How did the L-band hybrid HPA's integration into the L40 package affect the material of the output combiner's second part, and what was the chosen material for the output combiner in this case?"
___________________________________
WAIT: Running answer generation...
{'question': "How did the L-band hybrid HPA's integration into the L40 package affect the material of the output combiner's second part, and what was the chosen material for the output combiner in this case?", 'prompt': "<|start_header_id|>user<|end_header_id|>\n\nCONTEXT:  load impedance synthesis up to the 3rd harmonic as been obtained with Al2O3 material. In a second time, output combiner downsizing was realized by the use of OxTi material to enable its integration in L40 package. As a consequence, the second part of the output combiner has been replaced with OxTi material as shown on figure below. Figure 4 : Output network schematic (Momentum) of the L-band hybrid HPA 2.1.1.3 RF performance The main RF per

### Visualisation

In [10]:
from IPython.display import display, HTML

# Retrieve the results stored by the last submitted query
output = output_storage['output']
ordered_docs = output_storage['ordered_docs']
n_con = output_storage['n_contexts']

# Snippets that were passed to the model as context
extracts = ordered_docs[:n_con]
contexts = [doc['text'] for doc in extracts]

# Find the top sentences in the snippets for the model output and render them as HTML
queries = [output]
num_top_returned = 5
results = rag_visualisation.find_snippet(contexts, queries, num_top_returned)
html_content = rag_visualisation.generate_interactive_html(extracts, results)
display(HTML(html_content))