# CS-6120-Final-Project: Book Tutor

## Imports

In [1]:
# Imports

import re
import pandas as pd
import fitz # PyMuPDF
import pprint
import torch
import numpy as np
import faiss

from tqdm import tqdm
from nltk.tokenize import sent_tokenize
from preprocessing_helper import *
from retrieval_helper import *
from model_helper import *
from ablation_helper import *

from sentence_transformers import CrossEncoder

import nltk
nltk.download('punkt_tab')

from transformers import AutoTokenizer, AutoModel

[nltk_data] Error loading punkt_tab: <urlopen error [SSL:
[nltk_data]     CERTIFICATE_VERIFY_FAILED] certificate verify failed:
[nltk_data]     unable to get local issuer certificate (_ssl.c:1006)>


## Loading and Preprocessing

In [2]:
pdf_path = "Driver Manual.pdf"

Create a DataFrame from the PDF using a custom function (`create_df_from_pdf`):

In [3]:
df=create_df_from_pdf(pdf_path, method="sentence", num_sentences=10)
df

100%|██████████| 81/81 [00:00<00:00, 250.34it/s]


| | page_number | char_count | word_count | sentence_count | text |
|---|---|---|---|---|---|
| 0 | [3, 4, 5, 6] | 5144 | 893 | 10 | New York State's need for organ and tissue don... |
| 1 | [6] | 1067 | 181 | 10 | However, even if you are licensed somewhere el... |
| 2 | [6] | 1979 | 362 | 10 | You must have a CDL if you drive any vehicle t... |
| 3 | [6, 7] | 1463 | 260 | 10 | Taxi/Livery, Class E - Minimum age is 18. Allo... |
| 4 | [7] | 1216 | 199 | 9 | This license can be used instead of a passport... |
| ... | ... | ... | ... | ... | ... |
| 193 | [76] | 935 | 168 | 10 | Make sure to protect yourself and others from ... |
| 194 | [76, 77] | 3384 | 546 | 7 | You must also report any traffic incident or c... |
| 195 | [77, 78, 79] | 8703 | 1380 | 3 | No written tests. Riverhead 200 Old Country Ro... |
| 196 | [79, 80] | 6151 | 925 | 10 | Lawrence 80 State Highway 310, Suite 3 Canton ... |


## Retrieving the most similar chunks

Load the embedding tokenizer and model:

In [4]:
# Load Hugging Face models
embedding_model_name = "sentence-transformers/all-mpnet-base-v2"
embedding_tokenizer = AutoTokenizer.from_pretrained(embedding_model_name)
embedding_model = AutoModel.from_pretrained(embedding_model_name)

Embed text from each row:

In [5]:
embeddings = []

for _, item in df.iterrows():
    embeddings.append(get_text_embedding(embedding_model, embedding_tokenizer, item["text"]))

df["embedding"] = embeddings
df

| | page_number | char_count | word_count | sentence_count | text | embedding |
|---|---|---|---|---|---|---|
| 0 | [3, 4, 5, 6] | 5144 | 893 | 10 | New York State's need for organ and tissue don... | [0.053695105, 0.30028513, -0.007311916, 0.0446... |
| 1 | [6] | 1067 | 181 | 10 | However, even if you are licensed somewhere el... | [-0.11320977, 0.028662484, 0.09880905, 0.05729... |
| 2 | [6] | 1979 | 362 | 10 | You must have a CDL if you drive any vehicle t... | [-0.1268677, 0.01560301, 0.10079472, -0.073042... |
| 3 | [6, 7] | 1463 | 260 | 10 | Taxi/Livery, Class E - Minimum age is 18. Allo... | [-0.06300462, 0.008507263, 0.082344204, -0.025... |
| 4 | [7] | 1216 | 199 | 9 | This license can be used instead of a passport... | [-0.049608383, 0.07712367, 0.04408017, -0.0048... |
| ... | ... | ... | ... | ... | ... | ... |
| 193 | [76] | 935 | 168 | 10 | Make sure to protect yourself and others from ... | [-0.12103632, -0.12166576, -0.04148209, 0.0716... |
| 194 | [76, 77] | 3384 | 546 | 7 | You must also report any traffic incident or c... | [-0.18676649, 0.042021804, 0.020721635, 0.0203... |
| 195 | [77, 78, 79] | 8703 | 1380 | 3 | No written tests. Riverhead 200 Old Country Ro... | [-0.102082744, 0.16322422, 0.046678375, 0.0929... |
| 196 | [79, 80] | 6151 | 925 | 10 | Lawrence 80 State Highway 310, Suite 3 Canton ... | [-0.096065156, 0.08573288, 0.06285982, 0.15635... |
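The body of `get_text_embedding` (from `retrieval_helper`) is not shown. The all-mpnet-base-v2 model card builds a sentence embedding by mean pooling the token embeddings, weighted by the attention mask, and then L2-normalising the result. Assuming the helper does something similar, the pooling step can be sketched in NumPy as follows (`mean_pool` is a hypothetical name):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Mask-weighted mean of token vectors, followed by L2 normalisation.

    token_embeddings: (seq_len, hidden) last hidden states for one text.
    attention_mask:   (seq_len,) with 1 for real tokens and 0 for padding.
    """
    mask = attention_mask.astype(token_embeddings.dtype)[:, None]  # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)                  # padding rows contribute 0
    pooled = summed / np.maximum(mask.sum(), 1e-9)                  # guard against an all-zero mask
    return pooled / max(np.linalg.norm(pooled), 1e-12)              # unit-length embedding

# Toy check: the third (padding) token must not influence the result.
tokens = np.array([[1.0, 0.0], [3.0, 0.0], [100.0, 100.0]])
print(mean_pool(tokens, np.array([1, 1, 0])))
```

With real data, `token_embeddings` would be `embedding_model(**inputs).last_hidden_state[0]` converted to NumPy, and `attention_mask` would come from the tokenizer output.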


Stack the embeddings into a single tensor:

In [6]:
device = "cuda" if torch.cuda.is_available() else "cpu"

embeddings = torch.tensor(np.array(df["embedding"].tolist()), dtype=torch.float32)
embeddings.shape

torch.Size([198, 768])

Some observations:
- We first tried all-MiniLM-L6-v2 (384-dimensional embeddings; its model card truncates input at 256 word pieces) on paragraph chunks, then all-mpnet-base-v2 (768-dimensional embeddings; its model card truncates at 384 word pieces). The latter returned the better (correct) result.

- A driver's manual contains many review questions, and those question chunks receive similarity scores too, so they can be retrieved instead of the passages that actually answer the query. This needs to be avoided, and adjusting the chunking method helps.

- Possible future exploration for chunking: find the n highest-scoring (most similar) chunks, then add the n chunks before and the n chunks after each of them. Score these expanded windows against the query again, pick from the new scores, and feed the final, larger context into the model.
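The windowing idea in the last bullet could be prototyped roughly as below. This is a sketch under our own assumptions: `expand_and_rescore`, `cosine_scores`, and the `embed_fn` callable are hypothetical names, and, as in the bullet, the same n is used both for the number of hits and for the window radius.

```python
import numpy as np

def cosine_scores(query_vec, matrix):
    """Cosine similarity between one query vector and every row of a matrix."""
    q = query_vec / np.linalg.norm(query_vec)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return m @ q

def expand_and_rescore(query, chunks, embed_fn, n=3):
    """Top-n retrieval, widen each hit by n neighbours per side, then re-score the windows.

    embed_fn: any callable mapping a string to a 1-D vector (hypothetical; in the
    notebook it would wrap get_text_embedding). Returns (start, end, text, score)
    for the best window, where chunks[start:end] were merged into `text`.
    """
    query_vec = embed_fn(query)
    chunk_vecs = np.stack([embed_fn(c) for c in chunks])
    top = np.argsort(-cosine_scores(query_vec, chunk_vecs))[:n]

    windows = []
    for i in top:
        start, end = max(0, int(i) - n), min(len(chunks), int(i) + n + 1)
        windows.append((start, end, " ".join(chunks[start:end])))

    # Re-embed each merged window and score it against the query again.
    window_vecs = np.stack([embed_fn(text) for _, _, text in windows])
    window_scores = cosine_scores(query_vec, window_vecs)
    best = int(np.argmax(window_scores))
    start, end, text = windows[best]
    return start, end, text, float(window_scores[best])
```

Re-embedding the merged text (rather than averaging the chunk vectors) keeps the second scoring pass consistent with how single chunks are scored, at the cost of one extra embedding call per window.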



Query:

In [7]:
query = "What color are the traffic lights?"

Retrieve the top k chunks (here k = 3):

In [8]:
values, indices = retrieve_top_k_similar(query, embeddings, embedding_model, embedding_tokenizer, 3, method="faiss")

In [9]:
indices

array([59, 73, 56])
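`retrieve_top_k_similar` lives in `retrieval_helper.py` and its body is not shown. Exact top-k cosine retrieval amounts to normalising the vectors, taking dot products, and keeping the k largest scores; FAISS's `IndexFlatIP` performs the same exhaustive inner-product search when the vectors are L2-normalised. A NumPy sketch, with the hypothetical name `top_k_cosine`:

```python
import numpy as np

def top_k_cosine(query_vec: np.ndarray, embeddings: np.ndarray, k: int = 3):
    """Return (scores, indices) of the k rows of `embeddings` most similar to `query_vec`."""
    q = query_vec / np.linalg.norm(query_vec)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = e @ q                   # cosine similarity for every chunk
    idx = np.argsort(-scores)[:k]    # highest scores first
    return scores[idx], idx
```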

In [10]:
pprint.pp(df["text"][59])

('The shape tells you the type of route you are on. The sample signs, left to '
 'right, are for state, U.S., and interstate routes. When you plan a trip, use '
 'a highway map to decide which routes to take. During the trip, watch for '
 'destina- tion signs so you will not get lost, or have to turn or stop '
 'suddenly. SERVICE SIGNS: COLOR: Blue, with white letters or symbols MEANING: '
 'Show the location of services, like rest areas, gas stations, camping or '
 'medical facilities. TRAFFIC SIGNALS Traffic Lights Traffic lights are '
 'normally red, yellow and green from the top to bottom or left to right. At '
 'some intersections, there are lone red, yellow or green lights. Some traffic '
 'lights are steady, others flash. Some are round, and some are arrows. State '
 'law requires that if the traffic lights or controls are out of service or do '
 'not operate correctly when you approach an intersection, you must come to a '
 'stop as you would for a stop sign.')


## Reranking (optional)

Reranking uses a cross-encoder model that takes the query together with each of the top n retrieved chunks, scores each (query, chunk) pair jointly, and reorders the chunks by that score. It acts as a double-check on the embedding-based similarity retrieval.
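Conceptually, the reranking step only needs a pairwise scoring function: score every (query, chunk) pair, sort by score, and keep the top entries. The sketch below mirrors the shape of the `CrossEncoder.rank` output in this section (`corpus_id`, `score`, `text`), but `rerank` and `score_fn` are hypothetical; in the notebook the scores come from the mxbai cross-encoder, whereas the toy scorer here just counts shared words.

```python
from typing import Callable, Dict, List

def rerank(query: str, docs: List[str], score_fn: Callable[[str, str], float], top_k: int = 5) -> List[Dict]:
    """Score each (query, doc) pair and return the docs sorted by descending score."""
    scored = [
        {"corpus_id": i, "score": float(score_fn(query, doc)), "text": doc}
        for i, doc in enumerate(docs)
    ]
    scored.sort(key=lambda r: r["score"], reverse=True)  # stable: ties keep retrieval order
    return scored[:top_k]

# Toy word-overlap scorer; a real cross-encoder would replace this.
def overlap(q: str, d: str) -> float:
    return len(set(q.lower().split()) & set(d.lower().split()))

print(rerank("traffic lights color", ["blue service signs", "traffic lights are red"], overlap))
```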

Example:

In [11]:
# Retrieve the chunk texts and put them in a list
docs = df["text"][indices].to_list()
docs

['The shape tells you the type of route you are on. The sample signs, left to right, are for state, U.S., and interstate routes. When you plan a trip, use a highway map to decide which routes to take. During the trip, watch for destina- tion signs so you will not get lost, or have to turn or stop suddenly. SERVICE SIGNS: COLOR: Blue, with white letters or symbols MEANING: Show the location of services, like rest areas, gas stations, camping or medical facilities. TRAFFIC SIGNALS Traffic Lights Traffic lights are normally red, yellow and green from the top to bottom or left to right. At some intersections, there are lone red, yellow or green lights. Some traffic lights are steady, others flash. Some are round, and some are arrows. State law requires that if the traffic lights or controls are out of service or do not operate correctly when you approach an intersection, you must come to a stop as you would for a stop sign.',
 'BLUE, GREEN AND AMBER LIGHTS Personal vehicles driven by volun

In [12]:
# Initialize the rerank model
rerank_model = CrossEncoder("mixedbread-ai/mxbai-rerank-large-v1")

Get reranked results:

In [13]:
results = rerank_model.rank(query, docs, return_documents=True, top_k=5)
results

[{'corpus_id': 0,
  'score': np.float32(0.9699756),
  'text': 'The shape tells you the type of route you are on. The sample signs, left to right, are for state, U.S., and interstate routes. When you plan a trip, use a highway map to decide which routes to take. During the trip, watch for destina- tion signs so you will not get lost, or have to turn or stop suddenly. SERVICE SIGNS: COLOR: Blue, with white letters or symbols MEANING: Show the location of services, like rest areas, gas stations, camping or medical facilities. TRAFFIC SIGNALS Traffic Lights Traffic lights are normally red, yellow and green from the top to bottom or left to right. At some intersections, there are lone red, yellow or green lights. Some traffic lights are steady, others flash. Some are round, and some are arrows. State law requires that if the traffic lights or controls are out of service or do not operate correctly when you approach an intersection, you must come to a stop as you would for a stop sign.'},
 {

For this example, reranking keeps the same order as the embedding retrieval.

## Preparing the templates

First, use a custom function (`get_items_for_prompt`) to build a dictionary holding the query and the retrieved chunks with their page numbers.

In [15]:
dict1 = get_items_for_prompt(query, df, indices)
dict1

{'query': 'What color are the traffic lights?',
 'page1': [30],
 'page2': [35, 36],
 'page3': [29],
 'context1': 'The shape tells you the type of route you are on. The sample signs, left to right, are for state, U.S., and interstate routes. When you plan a trip, use a highway map to decide which routes to take. During the trip, watch for destina- tion signs so you will not get lost, or have to turn or stop suddenly. SERVICE SIGNS: COLOR: Blue, with white letters or symbols MEANING: Show the location of services, like rest areas, gas stations, camping or medical facilities. TRAFFIC SIGNALS Traffic Lights Traffic lights are normally red, yellow and green from the top to bottom or left to right. At some intersections, there are lone red, yellow or green lights. Some traffic lights are steady, others flash. Some are round, and some are arrows. State law requires that if the traffic lights or controls are out of service or do not operate correctly when you approach an intersection, you must
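The body of `get_items_for_prompt` is not shown. Judging only from its output keys (`query`, `page1` to `page3`, `context1` to `context3`), it could look roughly like the sketch below; this is an assumption, and `build_prompt_items` is a hypothetical name. Because the keys match the template placeholders, the result can also be passed straight to `str.format(**items)`.

```python
def build_prompt_items(query, df, indices):
    """Collect the query plus the text and page list of each retrieved chunk.

    `df` only needs to support df["text"][i] and df["page_number"][i]
    (a pandas DataFrame does, and so does a dict of lists).
    """
    items = {"query": query}
    for rank, idx in enumerate(indices, start=1):
        items[f"page{rank}"] = df["page_number"][idx]
        items[f"context{rank}"] = df["text"][idx]
    return items

# Usage with a tiny stand-in for the chunk DataFrame.
toy_df = {"text": ["chunk A", "chunk B", "chunk C"], "page_number": [[1], [2, 3], [4]]}
items = build_prompt_items("What is B?", toy_df, [1, 0])
print("Page {page1}: {context1}\nQuery: {query}".format(**items))
```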

### Templates

In [17]:
template1 = """Instruct:You are my tutor. Your task is to give me answers and explanations to my questions about the topic based on the context I provide. Think carefully about the answer by extracting relevant passages from the context before answering my question. Don’t return your thoughts, only the answer. Make sure your responses are detailed and as explanatory as possible. Optionally quote from the context, citing the page. Do not use your previous knowledge to answer the question.

Following are the examples:


QA Example 1
Context:
Page 1: "Water scarcity affects over 2 billion people worldwide due to climate change and poor resource management."
Page 2: "Desalination is a key technological solution but comes with challenges such as high energy costs and environmental concerns."
Query:
What are the main solutions to water scarcity?
Answer:
Desalination is a significant solution, as noted on Page 2, but it has challenges like high energy costs and environmental impact. Other approaches, such as improved resource management (Page 1), are also critical.

QA Example 2
Context:
Page 1: "Photosynthesis is the process by which plants convert sunlight into energy, primarily occurring in the chloroplasts."
Page 2: "The process consists of light-dependent reactions and the Calvin cycle, where glucose is synthesized."
Query:
What is glucose?
Answer:
Unfortunately, the context provided does not contain the answer to your inquiry.

Context Pages:
Page {page1}: {context1}
Page {page2}: {context2}
Page {page3}: {context3}

Query:
{query}"""

In [18]:
template2 = """Instruct:You are a knowledgeable tutor. Answer the query below only using the given context. Pick the context you find most valuable. You are allowed to use more than one context. If you are not sure about the answer say that you don’t know the answer.

Context Pages:
Page {page1} : {context1}
Page {page2}: {context2}
Page {page3}: {context3}

Query:
{query}

Guidelines for Response:
Provide a detailed, explanatory answer, but do not make it too long.
Optionally quote from the context if helpful, citing the page.
Specify which pages support your response.
Only use the context to answer and do not answer the question if the answer is not in the context. 
If the context does not contain the answer, say that you cannot deduce the answer from the context."""

In [20]:
template3 = """Context: {context}
Query: {query}
Answer:"""

### Populate the templates using the dictionary

In [21]:
final_prompt1 = template1.format(page1=dict1["page1"], page2=dict1["page2"], page3=dict1["page3"],
                                 context1=dict1["context1"], context2=dict1["context2"], context3=dict1["context3"],
                                 query=dict1["query"])

In [22]:
final_prompt2 = template2.format(page1=dict1["page1"], context1=dict1["context1"],
                                 page2=dict1["page2"], context2=dict1["context2"],
                                 page3=dict1["page3"], context3=dict1["context3"],
                                 query=dict1["query"])

In [23]:
final_prompt3 = template3.format(context=dict1["context1"],
                                 query=dict1["query"])

In [24]:
pprint.pp(final_prompt3)

('Context: The shape tells you the type of route you are on. The sample signs, '
 'left to right, are for state, U.S., and interstate routes. When you plan a '
 'trip, use a highway map to decide which routes to take. During the trip, '
 'watch for destina- tion signs so you will not get lost, or have to turn or '
 'stop suddenly. SERVICE SIGNS: COLOR: Blue, with white letters or symbols '
 'MEANING: Show the location of services, like rest areas, gas stations, '
 'camping or medical facilities. TRAFFIC SIGNALS Traffic Lights Traffic lights '
 'are normally red, yellow and green from the top to bottom or left to right. '
 'At some intersections, there are lone red, yellow or green lights. Some '
 'traffic lights are steady, others flash. Some are round, and some are '
 'arrows. State law requires that if the traffic lights or controls are out of '
 'service or do not operate correctly when you approach an intersection, you '
 'must come to a stop as you would for a stop sign.\n'
 'Quer

## MODELS

### GPT2

The first attempt uses the regular pretrained GPT2 model.

In [25]:
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Load the GPT-2 model and tokenizer
gpt_model_name = "gpt2"  # Replace with a fine-tuned model name if available
gpt_tokenizer = GPT2Tokenizer.from_pretrained(gpt_model_name)
gpt_model = GPT2LMHeadModel.from_pretrained(gpt_model_name)

Encode the prompt:

In [26]:
input_prompt = gpt_tokenizer.encode(final_prompt3, return_tensors="pt")

Generate output:

In [27]:
output = gpt_model.generate(
    input_prompt,
    max_new_tokens=100,  # Limit response length
    num_return_sequences=1,
    temperature=0.7,  # Control randomness
    top_k=50,  # Focus on the most likely next words
    do_sample=True
)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


Print the response:

In [28]:
response = gpt_tokenizer.decode(output[0], skip_special_tokens=True)
print(response)

Context: The shape tells you the type of route you are on. The sample signs, left to right, are for state, U.S., and interstate routes. When you plan a trip, use a highway map to decide which routes to take. During the trip, watch for destina- tion signs so you will not get lost, or have to turn or stop suddenly. SERVICE SIGNS: COLOR: Blue, with white letters or symbols MEANING: Show the location of services, like rest areas, gas stations, camping or medical facilities. TRAFFIC SIGNALS Traffic Lights Traffic lights are normally red, yellow and green from the top to bottom or left to right. At some intersections, there are lone red, yellow or green lights. Some traffic lights are steady, others flash. Some are round, and some are arrows. State law requires that if the traffic lights or controls are out of service or do not operate correctly when you approach an intersection, you must come to a stop as you would for a stop sign.
Query: What color are the traffic lights?
Answer: The numbe

The problem is that GPT2 was not trained for question answering. Given a large context, it starts generating but goes around in repetitive circles. Given a smaller context, it does not answer the question the way it should, does not refer to the context, and can also start repeating the same things over and over.
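Because `decode` returns the prompt followed by the continuation, part of the clean-up can be done in plain Python: keep only the text after the last `Answer:` marker and cut it at the first sentence that repeats. This is a hypothetical post-processing helper (`extract_answer` is our name), not part of the notebook's pipeline, and it does not make the model itself any better at QA.

```python
import re

def extract_answer(decoded: str, marker: str = "Answer:") -> str:
    """Return the text after the last `marker`, truncated at the first repeated sentence."""
    answer = decoded.rsplit(marker, 1)[-1].strip()
    sentences = re.split(r"(?<=[.!?])\s+", answer)
    kept, seen = [], set()
    for sentence in sentences:
        key = sentence.strip().lower()
        if key in seen:  # the model started looping; stop here
            break
        seen.add(key)
        kept.append(sentence.strip())
    return " ".join(kept)
```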

Hence, we explored fine-tuning the model on a QA dataset.

We used SQuAD (Stanford Question Answering Dataset), reformatted into context-query-answer examples, to fine-tune GPT2. We limited training to 20,000 examples due to computational constraints.
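The SQuAD conversion code is not included in the notebook. A minimal sketch, assuming records in the Hugging Face `squad` layout (`context`, `question`, and `answers` with a `text` list) and reusing the `Context:`/`Query:`/`Answer:` layout of `template3`; `to_training_text` and `build_training_texts` are hypothetical names:

```python
from itertools import islice
from typing import Dict, Iterable, List

def to_training_text(record: Dict) -> str:
    """Format one SQuAD-style record as a Context/Query/Answer string."""
    answer = record["answers"]["text"][0]  # take the first gold answer
    return f"Context: {record['context']}\nQuery: {record['question']}\nAnswer: {answer}"

def build_training_texts(records: Iterable[Dict], limit: int = 20000) -> List[str]:
    """Format at most `limit` records (the notebook capped training at 20,000 examples)."""
    return [to_training_text(r) for r in islice(records, limit)]
```

With the Hugging Face `datasets` library, `records` could be `load_dataset("squad", split="train")`; whether the original training text also appended an end-of-text token is not stated.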

The fine-tuned model is saved to a folder (`fine_tuned_gpt2_qa`), which must be moved into the notebook's directory before running the following:

In [29]:
model_gpt_new = GPT2LMHeadModel.from_pretrained("./fine_tuned_gpt2_qa")
tokenizer_gpt_new = GPT2Tokenizer.from_pretrained("./fine_tuned_gpt2_qa")

In [30]:
input_prompt_new = tokenizer_gpt_new.encode(final_prompt3, return_tensors="pt")

In [31]:
output = model_gpt_new.generate(
    input_prompt_new,
    max_new_tokens=50,  # Limit response length
    num_return_sequences=1,
    temperature=0.8,  # Control randomness
    top_k=50,  # Focus on the most likely next words
    do_sample=True
)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


In [32]:
response = tokenizer_gpt_new.decode(output[0], skip_special_tokens=True)
print(response)

Context: The shape tells you the type of route you are on. The sample signs, left to right, are for state, U.S., and interstate routes. When you plan a trip, use a highway map to decide which routes to take. During the trip, watch for destina- tion signs so you will not get lost, or have to turn or stop suddenly. SERVICE SIGNS: COLOR: Blue, with white letters or symbols MEANING: Show the location of services, like rest areas, gas stations, camping or medical facilities. TRAFFIC SIGNALS Traffic Lights Traffic lights are normally red, yellow and green from the top to bottom or left to right. At some intersections, there are lone red, yellow or green lights. Some traffic lights are steady, others flash. Some are round, and some are arrows. State law requires that if the traffic lights or controls are out of service or do not operate correctly when you approach an intersection, you must come to a stop as you would for a stop sign.
Query: What color are the traffic lights?
Answer: blue, wit

After fine-tuning on the QA dataset, GPT2 starts answering from the given context, but only in a very basic form: here the answer it extracts begins with "blue", the service-sign color, rather than the traffic-light colors.

The following three models are more capable than GPT2 and were trained in different ways for different purposes.
We wrote one function per model that calls the Hugging Face Inference API, since loading the models locally ran into hardware limitations.

### Mistral-7B

Mistral gives the correct answer but cites two book locations; the second one is incorrect (it comes from another chunk that was passed as additional context in the prompt).

In [33]:
response = query_hf_mistral(final_prompt2)
print("Model Response:\n", response)

Model Response:
 Traffic lights are typically red, yellow, and green, from top to bottom or left to right. This information can be found on pages [30] and [35, 36].


### PHI

Phi answers the question correctly but cites the wrong page (taken from a different context chunk passed to the template).

In [34]:
response = query_hf_phi(final_prompt2)
print("Model Response:\n", response)

Model Response:
 
The traffic lights are normally red, yellow, and green. This information is found on Page [35, 36] of the document.




### QWEN

Qwen responds correctly, and also cites the correct page.

In [35]:
response = query_hf_qwen(final_prompt2)
print("Model Response:\n", response)

Model Response:
 The traffic lights are normally red, yellow, and green, arranged from top to bottom or left to right. This information is provided on Page [30] of the context. 

Quoting from the context:
"Traffic lights are normally red, yellow and green from the top to bottom or left to right." (Page [30])


## Ablation Study

### Picking the chunking method

#### Fixed number of characters:

When using a fixed number of characters, words can end up cut in half, and the flow and main thought of a sentence can be interrupted. If this happens often, it reduces the usefulness of the context passed to the model. Hence, we decided not to use the fixed method.
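A minimal illustration of the problem, assuming plain fixed-width slicing of the text (the actual `method="fixed"` implementation in `preprocessing_helper.py` is not shown and may differ; `fixed_chunks` is a hypothetical name):

```python
def fixed_chunks(text: str, size: int) -> list[str]:
    """Split text into consecutive pieces of exactly `size` characters (the last may be shorter)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

text = "Yield the right-of-way at the intersection."
for piece in fixed_chunks(text, 16):
    print(repr(piece))  # "intersection" ends up split across two chunks
```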

In [147]:
df=create_df_from_pdf(pdf_path, method="fixed")
embeddings = []

for _, item in df.iterrows():
    embeddings.append(get_text_embedding(embedding_model, embedding_tokenizer, item["text"]))

df["embedding"] = embeddings

embeddings = torch.tensor(np.array(df["embedding"].tolist()), dtype=torch.float32)

100%|██████████| 81/81 [00:00<00:00, 235.65it/s]


Set the query and retrieve the chunks:

In [148]:
query = "Who has the right of way in an intersection?"
values, indices = retrieve_top_k_similar(query, embeddings, embedding_model, embedding_tokenizer, 3, method="faiss")
indices

array([182, 208, 181])

In [151]:
# See the chunk
pprint.pp(df["text"][208])

('e directions at the same time, and one travels straight, the other prepares '
 'to turn left, which must yield the right-of-way? • ?If you enter an '
 'intersection to make a left turn, but oncoming traffic prevents the turn '
 'immediately, what should you do? • ?If you reach an intersection that is not '
 'con- trolled at the same time as a driver on your right, and both of you '
 'prepare to go straight, who has the right-of-way? • ?What must you do if you '
 'enter a road from a driveway? • ?You fac')


#### Paragraph

Because paragraphs are split on single or double newlines, in documents such as study books, which contain many titles and questions, a retrieved chunk can easily be just a chapter title or a question. We handled standalone titles by dropping chunks below a minimum word count, but this means titles, which are very important in semantic search, no longer play a role in retrieval. Questions are still a problem, as shown above and in the example below. Hence, we decided not to use paragraphs.
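The paragraph split and title filter described above could look like the sketch below; the regex, the `min_words=5` threshold, and the name `paragraph_chunks` are assumptions for illustration, not the values used in `preprocessing_helper.py`. The example shows the remaining problem: a question line long enough to pass the threshold survives as its own chunk.

```python
import re

def paragraph_chunks(text: str, min_words: int = 5) -> list[str]:
    """Split on single or double newlines and drop short pieces such as standalone titles."""
    pieces = [p.strip() for p in re.split(r"\n{1,2}", text)]
    return [p for p in pieces if len(p.split()) >= min_words]

page = ("RIGHT-OF-WAY\n\n"
        "A driver who approaches an intersection must yield to traffic already in it.\n"
        "• Who has the right-of-way at an uncontrolled intersection?")
for chunk in paragraph_chunks(page):
    print(chunk)  # the title is dropped, but the question is kept as its own chunk
```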

In [152]:
df=create_df_from_pdf(pdf_path, method="paragraph")
embeddings = []

for _, item in df.iterrows():
    embeddings.append(get_text_embedding(embedding_model, embedding_tokenizer, item["text"]))

df["embedding"] = embeddings

embeddings = torch.tensor(np.array(df["embedding"].tolist()), dtype=torch.float32)

100%|██████████| 81/81 [00:00<00:00, 287.26it/s]


Set the query and retrieve the chunks:

In [153]:
query = "Who has the right of way in an intersection?"
values, indices = retrieve_top_k_similar(query, embeddings, embedding_model, embedding_tokenizer, 3, method="faiss")
indices

array([256, 254, 222])

In [154]:
# See the chunk
pprint.pp(df["text"][256])

('• ?If you reach an intersection that is not con- trolled at the same time as '
 'a driver on your right, and both of you prepare to go straight, who has the '
 'right-of-way?')


#### Page

Page-level chunking runs into several problems. First, the context becomes very long for the models to take in. This limits the number of tokens left for generation and can make models struggle if they do not handle long context well. Second, a page can contain two different topics (for example, at the start of a new chapter). When this happens, a model may infer a connection between two unrelated things and relate them in its explanation of the query. We therefore decided not to use page chunks.

In [158]:
df=create_df_from_pdf(pdf_path, method="page")
embeddings = []

for _, item in df.iterrows():
    embeddings.append(get_text_embedding(embedding_model, embedding_tokenizer, item["text"]))

df["embedding"] = embeddings

embeddings = torch.tensor(np.array(df["embedding"].tolist()), dtype=torch.float32)

100%|██████████| 81/81 [00:00<00:00, 300.26it/s]


Set the query and retrieve the chunks:

In [159]:
query = "Who has the right of way in an intersection?"
values, indices = retrieve_top_k_similar(query, embeddings, embedding_model, embedding_tokenizer, 3, method="faiss")
indices

array([38, 33, 36])

In [162]:
# See the chunk
pprint.pp(df["text"][36])

('37 | Driver’s Manual LEFT TURN FROM TWO-WAY ROAD INTO TWO-WAY ROAD: Approach '
 'the turn from the right half of the roadway closest to the center. Try to '
 'use the left side of the intersection to help make sure that you do not '
 'interfere with traffic headed toward you that wants to turn left. Keep to '
 'the right of the cen- terline of the road you enter, but as close as pos- '
 'sible to the center line. Be alert for traffic heading toward you from the '
 'left and from the lane you are about to go across. Motorcycles headed toward '
 'you are hard to see and it is difficult to judge their speed and distance '
 'away. Drivers often fail to see a motorcycle headed toward them and hit it '
 'while they turn across a traffic lane. LEFT TURN FROM TWO-WAY ROAD INTO ONE- '
 'WAY ROAD: Approach the turn from the right half of the road- way closest to '
 'the center. Make the turn before you reach the center of the intersection '
 'and turn into the left lane of the road you enter. 

#### Sentences

We decided to use a fixed number of sentences as our chunking method. This avoids splitting words and thoughts, which was the problem with a fixed number of characters. Because consecutive sentences are grouped into one chunk, we avoid standalone titles (seen with the paragraph approach) while keeping the titles, which are important for semantic search. In the QA parts of the book, where the paragraph approach could return a standalone question, a chunk may now also include the answer to that question.

Finally, we set the number of sentences to 10. Fewer sentences did not give enough context, and more gave too much, so 10 was the sweet spot. It also worked reasonably with the embedding model's maximum input length, although some unusually long chunks (such as the 893-word first chunk) exceed it.
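Grouping sentences into fixed-size chunks while tracking which pages each chunk spans (as in the `page_number` lists of the DataFrame) can be sketched as below. This is an assumption about how `create_df_from_pdf(..., method="sentence")` behaves, not its actual code: the input is a hypothetical list of `(page_number, sentences)` pairs whose sentences were already split (presumably with NLTK's `sent_tokenize`, which the notebook imports), and `sentence_chunks` is our name. Some rows in the tables above have fewer than 10 sentences, so the real helper presumably also breaks chunks at other boundaries.

```python
def sentence_chunks(pages, num_sentences=10):
    """Group consecutive sentences into chunks of `num_sentences`, recording source pages.

    pages: iterable of (page_number, [sentence, ...]) in document order.
    Returns a list of dicts shaped like the notebook's DataFrame rows.
    """
    flat = [(page, s) for page, sentences in pages for s in sentences]
    rows = []
    for start in range(0, len(flat), num_sentences):
        group = flat[start:start + num_sentences]
        text = " ".join(s for _, s in group)
        rows.append({
            "page_number": sorted({page for page, _ in group}),
            "char_count": len(text),
            "word_count": len(text.split()),
            "sentence_count": len(group),
            "text": text,
        })
    return rows
```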

In [166]:
df=create_df_from_pdf(pdf_path, method="sentence")
embeddings = []

for _, item in df.iterrows():
    embeddings.append(get_text_embedding(embedding_model, embedding_tokenizer, item["text"]))

df["embedding"] = embeddings

embeddings = torch.tensor(np.array(df["embedding"].tolist()), dtype=torch.float32)

100%|██████████| 81/81 [00:00<00:00, 304.23it/s]


Set the query and retrieve the chunks:

In [167]:
query = "Who has the right of way in an intersection?"
values, indices = retrieve_top_k_similar(query, embeddings, embedding_model, embedding_tokenizer, 3, method="faiss")
indices

array([67, 81, 70])

In [172]:
# See the chunk
pprint.pp(df["text"][67])

('Driver’s Manual | 34 Most traffic crashes occur at intersections when a '
 'driver makes a turn. Many occur in large parking lots like at shopping '
 'centers. To prevent this type of crash, you must understand the right-of-way '
 'rules and how to make correct turns. RIGHT-OF-WAY Traffic signs, signals and '
 'pavement markings do not always resolve traffic conflicts. A green light, '
 'for example, does not resolve the conflict of when a car turns left at an '
 'intersection while an approaching car goes straight through the '
 'intersection. The right-of-way rules help resolve these conflicts. They tell '
 'you who goes first and who must wait in different conditions. Here are '
 'examples of right-of-way rules: • ?A driver who approaches an intersection '
 'must yield the right-of-way to traffic that is in the intersection. Example: '
 'You approach an intersection. The traffic light is green and you want to '
 'drive straight through.')


### Picking the embedding model

The following exploration compares the indices of the chunks retrieved by three embedding models: "all-MiniLM-L6-v2", "all-mpnet-base-v2", and "multi-qa-mpnet-base-cos-v1".

In [198]:
dfs = compare_pipeline_configurations(["What do I do if my turn lights are not working?", "Who has the right of way in an intersection?",
                                       "Where am I not allowed to park?", "What am I not allowed to do while driving?"], 
                                      pdf_path, chunking_method=["sentence"], 
                                      embedding_tokenizers=["all-MiniLM-L6-v2", "all-mpnet-base-v2", "multi-qa-mpnet-base-cos-v1"], 
                                      similarity_method="cosine", templates=[template2], model_functions=[query_hf_mistral])

100%|██████████| 81/81 [00:00<00:00, 282.65it/s]


In [199]:
dfs

| | Query | Chunking Method | Embedding | Prompt # | Model Function | Response | Indices |
|---|---|---|---|---|---|---|---|
| 0 | What do I do if my turn lights are not working? | sentence | all-MiniLM-L6-v2 | 1 | query_hf_mistral | If your turn signals (or blinkers) are not wor... | [120, 148, 159, 73, 141] |
| 1 | What do I do if my turn lights are not working? | sentence | all-mpnet-base-v2 | 1 | query_hf_mistral | If your turn lights are not working, you shoul... | [159, 120, 74, 148, 151] |
| 2 | What do I do if my turn lights are not working? | sentence | multi-qa-mpnet-base-cos-v1 | 1 | query_hf_mistral | In case your turn signals are not working, it ... | [159, 74, 151, 60, 75] |
| 3 | Who has the right of way in an intersection? | sentence | all-MiniLM-L6-v2 | 1 | query_hf_mistral | In an intersection, the right-of-way rules dic... | [67, 80, 69, 68, 79] |
| 4 | Who has the right of way in an intersection? | sentence | all-mpnet-base-v2 | 1 | query_hf_mistral | In an intersection, the driver who is making a... | [67, 68, 70, 82, 84] |
| 5 | Who has the right of way in an intersection? | sentence | multi-qa-mpnet-base-cos-v1 | 1 | query_hf_mistral | In an intersection, a driver who wants to turn... | [67, 81, 68, 70, 80] |
| 6 | Where am I not allowed to park? | sentence | all-MiniLM-L6-v2 | 1 | query_hf_mistral | You are not allowed to park within 30 feet (10... | [93, 94, 92, 95, 91] |
| 7 | Where am I not allowed to park? | sentence | all-mpnet-base-v2 | 1 | query_hf_mistral | Based on the given context from pages [43, 44]... | [93, 92, 95, 94, 91] |
| 8 | Where am I not allowed to park? | sentence | multi-qa-mpnet-base-cos-v1 | 1 | query_hf_mistral | You are not allowed to park your vehicle withi... | [93, 92, 94, 95, 91] |
| 9 | What am I not allowed to do while driving? | sentence | all-MiniLM-L6-v2 | 1 | query_hf_mistral | You are not allowed to use cell phones or text... | [98, 123, 99, 72, 41] |


After inspecting the retrieved chunks (by index):
- For "What do I do if my turn lights are not working?", all-mpnet-base-v2 performed best: all of its top 3 chunks were relevant, while the other models returned 1 or 2 irrelevant chunks.
- For "Who has the right of way in an intersection?", all-mpnet-base-v2 again performed best. all-MiniLM-L6-v2 retrieved a completely irrelevant second chunk (80), and multi-qa-mpnet-base-cos-v1's second chunk (81) is about passing vehicles.
- For "Where am I not allowed to park?", all-mpnet-base-v2 retrieves the 3 most relevant chunks in exactly the right order, but all 3 models retrieved the same 5 chunks.
- For "What am I not allowed to do while driving?", multi-qa-mpnet-base-cos-v1 performs best. Chunks 98 and 97 are the most relevant; all-MiniLM-L6-v2 fails to retrieve 97, while all-mpnet-base-v2 ranks it 5th.

After running similar tests and manually comparing the results against ground truth, we chose all-mpnet-base-v2 as the embedding model.
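To make the manual comparison repeatable, each model's retrieved indices could be scored against a hand-labelled set of relevant chunks, for example with precision@k and recall@k. In the sketch below the relevant set `{97, 98}` is taken from the note on the driving question, and the retrieved list is the all-MiniLM-L6-v2 row from the table; the helper names are hypothetical.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the first k retrieved indices that are relevant."""
    return sum(1 for i in retrieved[:k] if i in relevant) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant indices that appear among the first k retrieved."""
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)

minilm = [98, 123, 99, 72, 41]  # all-MiniLM-L6-v2, "What am I not allowed to do while driving?"
relevant = {97, 98}             # chunks judged most relevant in the note above
print(precision_at_k(minilm, relevant, 3), recall_at_k(minilm, relevant, 5))
```

The recall@5 of 0.5 here reflects the observation that all-MiniLM-L6-v2 never retrieves chunk 97.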


### Comparing the templates and models:

For each example we use a custom function for ablation studies (`compare_pipeline_configurations`, located in `ablation_helper.py`) and then print the results grouped by query and model.
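`compare_pipeline_configurations` itself is not shown. Its output table suggests it evaluates every combination of query, chunking method, embedding model, template, and model function; the enumeration part can be sketched with `itertools.product`. This is a hypothetical reconstruction (`config_grid` and the `template` key are our names), the per-row retrieval and model calls are omitted, and unlike the notebook's call with a single query string, this sketch expects a list of queries.

```python
from itertools import product

def config_grid(queries, chunking_methods, embeddings, templates, model_functions):
    """Yield one dict per configuration, numbering templates from 1 like the 'Prompt #' column."""
    numbered_templates = list(enumerate(templates, start=1))
    for query, chunking, emb, (prompt_no, template), model_fn in product(
        queries, chunking_methods, embeddings, numbered_templates, model_functions
    ):
        yield {
            "Query": query,
            "Chunking Method": chunking,
            "Embedding": emb,
            "Prompt #": prompt_no,
            "Model Function": model_fn.__name__,
            "template": template,
        }
```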

#### Answer to a question from the context
Both templates do a good job when the question can be answered from the context in the PDF. This holds for all models.

Model observation: in this example, Phi struggles to pick the correct context (it cites pages 35 and 36, whereas QWEN cites the correct page, 30).

Best performing model: QWEN (it consistently cited the correct page and even quoted the source).

In [323]:
dfs = compare_pipeline_configurations("What color are the traffic lights?",
                                      pdf_path, chunking_method=["sentence"], 
                                      embedding_tokenizers=["all-mpnet-base-v2"], 
                                      similarity_method="faiss", templates=[template1, template2], 
                                      model_functions=[query_hf_mistral, query_hf_phi, query_hf_qwen])

100%|██████████| 81/81 [00:00<00:00, 299.51it/s]


In [324]:
for query, group1 in dfs.groupby("Query"):
    print(f"Query: {query}\n\n")
    
    for model, group2 in group1.groupby("Model Function"):
        for _, row in group2.iterrows():
            print("Model Function: " + model)
            print(f"Prompt: {row['Prompt #']}")
            print(row["Response"]+"\n\n")
        print("\n-----------------\n")
        
    print("\n******************\n")

Query: What color are the traffic lights?


Model Function: query_hf_mistral
Prompt: 1
The color of traffic lights are red, yellow, and green. (From Page [35, 36])


Model Function: query_hf_mistral
Prompt: 2
The traffic lights are normally red, yellow, and green. These colors are from top to bottom or left to right, as per the context provided on Page [30].



-----------------

Model Function: query_hf_phi
Prompt: 1

The traffic lights are typically red, yellow, and green, as mentioned on Page 35 and 36. These colors are used to regulate traffic flow, with red indicating a stop, yellow a caution to prepare to stop, and green allowing vehicles to proceed when it is safe.




Model Function: query_hf_phi
Prompt: 2

The traffic lights are normally red, yellow, and green. This information is found on Page [35, 36] of the context.





-----------------

Model Function: query_hf_qwen
Prompt: 1
Traffic lights are normally red, yellow, and green from the top to bottom or left to right, as s

#### Itemized answer/bullet points

When the answer to a question fits a bulleted list, such as an itemization or a procedure, both templates seem to perform well.

Procedural:

Best performing models: PHI and QWEN.

In [325]:
dfs = compare_pipeline_configurations("How to parallel park?",
                                      pdf_path, chunking_method=["sentence"], 
                                      embedding_tokenizers=["all-mpnet-base-v2"], 
                                      similarity_method="faiss", templates=[template1, template2], 
                                      model_functions=[query_hf_mistral, query_hf_phi, query_hf_qwen])

100%|██████████| 81/81 [00:00<00:00, 287.39it/s]


In [326]:
for query, group1 in dfs.groupby("Query"):
    print(f"Query: {query}\n\n")
    
    for model, group2 in group1.groupby("Model Function"):
        for _, row in group2.iterrows():
            print("Model Function: " + model)
            print(f"Prompt: {row['Prompt #']}")
            print(row["Response"]+"\n\n")
        print("\n-----------------\n")
        
    print("\n******************\n")

Query: How to parallel park?


Model Function: query_hf_mistral
Prompt: 1
To parallel park, follow these steps:

1. Select a parking space that is large enough for your vehicle on your side of the road.
2. Check your mirrors before you stop and signal to warn other drivers.
3. Position your vehicle behind the vehicle ahead of the space you have chosen, so that your passenger-side mirror aligns with the rear bumper of the vehicle in front of the parking space.
4. Pull up alongside the vehicle ahead, aligned with the center of the parking space.
5. Signal your intentions to move into the parking space and check your mirrors and blind spots for approaching traffic.
6. Turn the steering wheel all the way to the right and slowly move your vehicle into the parking space, keeping an eye on the vehicle ahead to ensure you have enough room to park.
7. Once your vehicle is halfway into the space, turn the steering wheel all the way to the left, then pull forward until your vehicle is parallel wi

Itemized:

Best model: QWEN with template 2.

In [426]:
dfs = compare_pipeline_configurations("What are the three components of the apparatus?",
                                      "In the Penal Colony Author Franz Kafka.pdf", chunking_method=["sentence"], 
                                      embedding_tokenizers=["all-mpnet-base-v2"], 
                                      similarity_method="faiss", templates=[template1, template2], 
                                      model_functions=[query_hf_mistral, query_hf_phi, query_hf_qwen])

100%|██████████| 28/28 [00:00<00:00, 307.70it/s]


In [427]:
for query, group1 in dfs.groupby("Query"):
    print(f"Query: {query}\n\n")
    
    for model, group2 in group1.groupby("Model Function"):
        for _, row in group2.iterrows():
            print("Model Function: " + model)
            print(f"Prompt: {row['Prompt #']}")
            print(row["Response"]+"\n\n")
        print("\n-----------------\n")
        
    print("\n******************\n")

Query: What are the three components of the apparatus?


Model Function: query_hf_mistral
Prompt: 1
Based on the context provided, the three components of the apparatus can be identified as follows:

1. The Harrow: This is a device that sinks onto the body of the individual being punished, and it is described as having needle-like tips (Page 8).
2. The Steel Cable: This component tightens up into a rod when the harrow is in position, and it is used to perform the punishment (Page 8).
3. The Bed: This is the surface on which the individual being punished lies during the execution of the punishment (Implied from the descriptions of the harrow sinking onto the body and the harrow being adjusted to lightly touch the body).


Model Function: query_hf_mistral
Prompt: 2
Based on the context provided, it appears that the apparatus consists of three main components.

1. The Harrow: This is the part that sinks onto the body of the convicted during punishment. It is described as a "bed" that quiv

#### Answer NOT present in the context

When a query is a trick question whose answer does not appear in the context, we want the model to say that it cannot deduce the answer from the context rather than answering from its pretraining knowledge.

Template 2 consistently produced a response stating that the information cannot be found in the context. With the first template, Mistral still answered the question even though we explicitly instructed it not to.

Best performing model: QWEN
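Judging these refusals was done by reading each response. A rough, hypothetical way to flag them automatically is to match the phrases the models tend to use when declining. The following sketch assumes a hand-picked phrase list, taken from wording in this notebook's outputs. It is a keyword heuristic, not a validated classifier. It will miss partial answers such as "The text does not specify...", and it will flag a response that refuses and then answers anyway.

```python
import re

# Hypothetical phrase list, built from wording the models used in this notebook when
# declining to answer. It is a rough heuristic, not a validated classifier.
REFUSAL_PATTERNS = [
    r"cannot deduce",
    r"does not (?:contain|include|provide|offer)",
    r"not mentioned",
    r"not possible to determine",
]
REFUSAL_RE = re.compile("|".join(REFUSAL_PATTERNS), re.IGNORECASE)


def is_refusal(response):
    """True if the response looks like 'the answer is not in the context'."""
    return REFUSAL_RE.search(response) is not None


print(is_refusal("Therefore, I cannot deduce the answer from the context."))       # True
print(is_refusal("To obtain a private pilot license (PPL), follow these steps:"))  # False
```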

In [403]:
dfs = compare_pipeline_configurations("How do I get a private pilot license?",
                                      pdf_path, chunking_method=["sentence"], 
                                      embedding_tokenizers=["all-mpnet-base-v2"], 
                                      similarity_method="faiss", templates=[template1, template2], 
                                      model_functions=[query_hf_mistral, query_hf_phi, query_hf_qwen])

100%|██████████| 81/81 [00:00<00:00, 291.72it/s]


In [404]:
for query, group1 in dfs.groupby("Query"):
    print(f"Query: {query}\n\n")
    
    for model, group2 in group1.groupby("Model Function"):
        for _, row in group2.iterrows():
            print("Model Function: " + model)
            print(f"Prompt: {row['Prompt #']}")
            print(row["Response"]+"\n\n")
        print("\n-----------------\n")
        
    print("\n******************\n")

Query: How do I get a private pilot license?


Model Function: query_hf_mistral
Prompt: 1
To obtain a private pilot license (PPL), follow these steps:

1. Find a flight school (Page [10]): You can start by finding a certified flight instructor (CFI) and a flight school that offers private pilot training. You can contact the Federal Aviation Administration (FAA) for a list of certified flight schools.

2. Complete the required flight training (Page [10]): To get a PPL, you must complete a minimum of 40 hours of flight time, including 20 hours of flight training from an instructor and 10 hours of solo flight.

3. Obtain a medical certificate (Page not present in context): You must have a third-class medical certificate to get a PPL. If you have any medical conditions that may affect your ability to fly, you must schedule an appointment with an Aviation Medical Examiner (AME).

4. Pass the written knowledge test (Page [10]): You must pass a written knowledge test, which covers topics like

#### Query not detailed enough

There is such a thing as a bad query. If the query uses words that differ from those in the PDF, the embedding model may not score it as similar to the chunks that are actually relevant. If we ask the question below, the retrieved chunk says that the Condemned was "condemned for disobeying and insulting his superior." Later in the text, however, we learn that he was caught sleeping on duty and, rather than accepting the whipping, talked back to his superior. Because the query does not retrieve that chunk, the model says that this is not mentioned in the context.

In addition, the models do respond with the broad answer "The condemned man was sentenced for disobeying and insulting his superior", but some of them quote the wrong sentence.

The best performing model here is Mistral with template 2, which correctly picks the right sentence from the context to paraphrase.
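The vocabulary-mismatch effect can be illustrated with a toy bag-of-words cosine similarity. This is only a lexical stand-in: dense models such as all-mpnet-base-v2 capture meaning beyond shared words, so their scores will differ. The "backstory" sentence is a hypothetical paraphrase of the plot summary above, not a quote from the PDF, and the stopword list is an assumption.

```python
import math
import re
from collections import Counter

# Small, hypothetical stopword list for this toy example.
STOPWORDS = {"a", "an", "and", "did", "for", "he", "his", "of", "on", "the", "to", "was", "what"}


def bag_of_words(text):
    """Lower-cased word counts with stopwords removed."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


query = "What crime did the condemned commit?"
verdict_chunk = "He was condemned for disobeying and insulting his superior."
backstory_chunk = ("He was caught sleeping on duty and talked back to his superior "
                   "instead of taking the whipping.")

q = bag_of_words(query)
print(f"{cosine(q, bag_of_words(verdict_chunk)):.3f}")    # 0.289: shares "condemned"
print(f"{cosine(q, bag_of_words(backstory_chunk)):.3f}")  # 0.000: no shared content words
```

The chunk that actually describes the offense shares no content words with the query, because "crime" appears in neither chunk. Rephrasing the query, for example asking what the condemned man did wrong, is one way to pull such chunks into the top results.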

In [411]:
dfs = compare_pipeline_configurations("What crime did the condemned commit?",
                                      "In the Penal Colony Author Franz Kafka.pdf", chunking_method=["sentence"], 
                                      embedding_tokenizers=["all-mpnet-base-v2"], 
                                      similarity_method="faiss", templates=[template1, template2], 
                                      model_functions=[query_hf_mistral, query_hf_phi, query_hf_qwen])

100%|██████████| 28/28 [00:00<00:00, 330.04it/s]


In [412]:
for query, group1 in dfs.groupby("Query"):
    print(f"Query: {query}\n\n")
    
    for model, group2 in group1.groupby("Model Function"):
        for _, row in group2.iterrows():
            print("Model Function: " + model)
            print(f"Prompt: {row['Prompt #']}")
            print(row["Response"]+"\n\n")
        print("\n-----------------\n")
        
    print("\n******************\n")

Query: What crime did the condemned commit?


Model Function: query_hf_mistral
Prompt: 1
The condemned man was condemned for disobeying and insulting his superior, as stated on Page 1.


Model Function: query_hf_mistral
Prompt: 2
The condemned man committed the crime of disobeying and insulting his superior. This can be inferred from the context provided on page 1, where it is mentioned that the Traveler was invited to attend the execution of a soldier who had been condemned for this offense.



-----------------

Model Function: query_hf_phi
Prompt: 1

The text does not specify the exact crime committed by the condemned. It only mentions that he was condemned for disobeying and insulting his superior, as stated in the context: "Of course, interest in the execution was not very high, not even in the penal colony itself. At least, here in the small, deep, sandy valley, closed in on all sides by barren slopes, apart from the Officer and the Traveler there were present only the Condemned,

#### Hypothetical and what-if questions

PHI and QWEN perform better with template 2, stating that such information is not provided in the context, whereas with template 1 they start hypothesizing. Whether we want the model to hypothesize about a situation while tying it to the current context is a debatable design decision.

In the example run, Mistral always answers the question and starts hypothesizing. The instruction would likely need to be more explicit for Mistral.

In [405]:
dfs = compare_pipeline_configurations("If the story were set in modern times, how might technology like surveillance cameras "
                                      "or smartphones have changed the narrator's actions or the outcome of the story?",
                                      "The_Tell_Tale_Heart.pdf", chunking_method=["sentence"], 
                                      embedding_tokenizers=["all-mpnet-base-v2"], 
                                      similarity_method="faiss", templates=[template1, template2], 
                                      model_functions=[query_hf_mistral, query_hf_phi, query_hf_qwen])

100%|██████████| 6/6 [00:00<00:00, 275.83it/s]


In [406]:
for query, group1 in dfs.groupby("Query"):
    print(f"Query: {query}\n\n")
    
    for model, group2 in group1.groupby("Model Function"):
        for _, row in group2.iterrows():
            print("Model Function: " + model)
            print(f"Prompt: {row['Prompt #']}")
            print(row["Response"]+"\n\n")
        print("\n-----------------\n")
        
    print("\n******************\n")

Query: If the story were set in modern times, how might technology like surveillance cameras                                       or smartphones have changed the narrator's actions or the outcome of the story?


Model Function: query_hf_mistral
Prompt: 1
In modern times, the presence of surveillance cameras or smartphones could have significantly altered the narrator's actions and the outcome of the story in several ways:

1. Detection: With the ubiquity of surveillance cameras in public areas, it is possible that the police may have footage of the narrator carrying the old man's body or disposing of it. This could lead to a more immediate discovery of the crime, as the officers would not need to rely on neighbor's suspicions or information.

2. Evidence: Smartphones can store photos, videos, and other digital evidence. If the old man had a smartphone, it could contain incriminating pictures or conversations between him and the narrator, potentially linking the narrator to the crime.


#### Opinion

This question tests how well the model can infer the reasoning behind the narrator's actions.

QWEN, using template 2, once again performed the best.

In [414]:
dfs = compare_pipeline_configurations("Why do you think the narrator chose to seat the police officers directly above the hidden body?",
                                      "The_Tell_Tale_Heart.pdf", chunking_method=["sentence"], 
                                      embedding_tokenizers=["all-mpnet-base-v2"], 
                                      similarity_method="faiss", templates=[template1, template2], 
                                      model_functions=[query_hf_mistral, query_hf_phi, query_hf_qwen])

100%|██████████| 6/6 [00:00<00:00, 214.83it/s]


In [415]:
for query, group1 in dfs.groupby("Query"):
    print(f"Query: {query}\n\n")
    
    for model, group2 in group1.groupby("Model Function"):
        for _, row in group2.iterrows():
            print("Model Function: " + model)
            print(f"Prompt: {row['Prompt #']}")
            print(row["Response"]+"\n\n")
        print("\n-----------------\n")
        
    print("\n******************\n")

Query: Why do you think the narrator chose to seat the police officers directly above the hidden body?


Model Function: query_hf_mistral
Prompt: 1
The narrator chose to seat the police officers directly above the hidden body to create an impression of innocence and dispel their suspicions, leading them to believe that there was no foul play involved. By doing so, the narrator hoped to buy more time to carry out his plan undetected. [Page 5]


Model Function: query_hf_mistral
Prompt: 2
The narrator chose to seat the police officers directly above the hidden body to increase their confidence and satisfaction with their search, as well as to subtly taunt the old man. By placing himself and the officers in the room where the corpse was hidden, the narrator demonstrated the thoroughness of the search and further solidified the belief that there was no foul play involved. This action can be inferred from Page [5], where the narrator invites the officers to rest in the old man's chamber and 

#### Trick question

The following question is meant to trick the model into thinking that there is an election in the story.

None of the models fall for the trap.

Best model: QWEN with template 2.

In [421]:
dfs = compare_pipeline_configurations("Who wins the election for govern of the penal colony at the end of the story?",
                                      "In the Penal Colony Author Franz Kafka.pdf", chunking_method=["sentence"], 
                                      embedding_tokenizers=["all-mpnet-base-v2"], 
                                      similarity_method="faiss", templates=[template1, template2], 
                                      model_functions=[query_hf_mistral, query_hf_phi, query_hf_qwen])

100%|██████████| 28/28 [00:00<00:00, 360.79it/s]


In [422]:
for query, group1 in dfs.groupby("Query"):
    print(f"Query: {query}\n\n")
    
    for model, group2 in group1.groupby("Model Function"):
        for _, row in group2.iterrows():
            print("Model Function: " + model)
            print(f"Prompt: {row['Prompt #']}")
            print(row["Response"]+"\n\n")
        print("\n-----------------\n")
        
    print("\n******************\n")

Query: Who wins the election for govern of the penal colony at the end of the story?


Model Function: query_hf_mistral
Prompt: 1
The context provided does not offer information about an election or the governance of the penal colony, so it's not possible to determine who wins the election for the governorship based on the information given.


Model Function: query_hf_mistral
Prompt: 2
The context provided does not contain information about an election for the governance of the penal colony. Therefore, I cannot deduce the answer to the query about who wins the election for govern of the penal colony at the end of the story from the given context.



-----------------

Model Function: query_hf_phi
Prompt: 1

The context provided does not contain information about an election or the winner of a governance position for the penal colony. The text focuses on the Traveler's visit to the execution and his interactions with the Officer and the Condemned Man.


QA: Based on the context provided

The following question is meant to trick the model into assuming that the old man came back.

While all the models say that the context does not mention the old man coming back, it would be better if they stated assertively that this never happened, since the old man is dead. The templates could be modified to allow the model to correct the user when the question's premise is wrong.

Best model: QWEN with template 2.
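The following is a minimal sketch of such a modification. It is a hypothetical template, not the notebook's template1 or template2, whose text is not shown in this section. It adds an explicit clause asking the model to point out a false premise. The page-citation format follows the "Page [30]" style seen in the responses. Whether this clause actually makes the models more assertive would need to be checked with the same comparison runs.

```python
# Hypothetical template: the notebook's real template1/template2 strings are not shown here.
FALSE_PREMISE_TEMPLATE = """Answer the question using only the context below and cite the page numbers you used.
If the answer is not in the context, reply: "I cannot deduce the answer from the context."
If the question assumes something that the context contradicts, point out the false assumption and correct it.

Context:
{context}

Question: {query}
Answer:"""


def build_prompt(chunks, query):
    """chunks: list of (page_numbers, text) pairs, e.g. ([5], "...")."""
    context = "\n\n".join(f"Page {pages}: {text}" for pages, text in chunks)
    return FALSE_PREMISE_TEMPLATE.format(context=context, query=query)


prompt = build_prompt([([5], "<retrieved chunk text>")],
                      "When the old man came back from his travels, how did the policemen react?")
print(prompt)
```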

In [418]:
dfs = compare_pipeline_configurations("When the old man came back from his travels, how did the policemen react?",
                                      "The_Tell_Tale_Heart.pdf", chunking_method=["sentence"], 
                                      embedding_tokenizers=["all-mpnet-base-v2"], 
                                      similarity_method="faiss", templates=[template1, template2], 
                                      model_functions=[query_hf_mistral, query_hf_phi, query_hf_qwen])

100%|██████████| 6/6 [00:00<00:00, 218.37it/s]


In [420]:
for query, group1 in dfs.groupby("Query"):
    print(f"Query: {query}\n\n")
    
    for model, group2 in group1.groupby("Model Function"):
        for _, row in group2.iterrows():
            print("Model Function: " + model)
            print(f"Prompt: {row['Prompt #']}")
            print(row["Response"]+"\n\n")
        print("\n-----------------\n")
        
    print("\n******************\n")

Query: When the old man came back from his travels, how did the policemen react?


Model Function: query_hf_mistral
Prompt: 1
Based on the provided context, it is not mentioned when the policemen reacted upon the old man's return from his travels. The passage only describes the events that occurred after the policemen visited the house, and the old man's death. Thus, the context does not contain information about the policemen's reaction upon the old man's return.


Model Function: query_hf_mistral
Prompt: 2
: The context provided does not include information about the policemen's reaction when the old man returned from his travels. Therefore, I cannot deduce the answer from the context.



-----------------

Model Function: query_hf_phi
Prompt: 1

The context does not provide specific details on how the policemen reacted when the old man returned from his travels. The text only mentions that the old man was absent in the country and that the policemen were deputed to search the premis

#### Inference

The following questions are meant to test the model's reasoning and its ability to infer from the context.

QWEN appears to outperform the other models on the first question (Mistral cannot infer the answer from the context, and PHI keeps inventing extra "QA" or "Context" sections in the style of each template).

In [423]:
dfs = compare_pipeline_configurations("Does the traveler appear more interested in the apparatus or the condemned man?",
                                      "In the Penal Colony Author Franz Kafka.pdf", chunking_method=["sentence"], 
                                      embedding_tokenizers=["all-mpnet-base-v2"], 
                                      similarity_method="faiss", templates=[template1, template2], 
                                      model_functions=[query_hf_mistral, query_hf_phi, query_hf_qwen])

100%|██████████| 28/28 [00:00<00:00, 322.55it/s]


In [424]:
for query, group1 in dfs.groupby("Query"):
    print(f"Query: {query}\n\n")
    
    for model, group2 in group1.groupby("Model Function"):
        for _, row in group2.iterrows():
            print("Model Function: " + model)
            print(f"Prompt: {row['Prompt #']}")
            print(row["Response"]+"\n\n")
        print("\n-----------------\n")
        
    print("\n******************\n")

Query: Does the traveler appear more interested in the apparatus or the condemned man?


Model Function: query_hf_mistral
Prompt: 1
The Traveler does not appear more interested in the apparatus or the condemned man, but rather he seems more concerned about the situation and the interaction between the Soldier and the Condemned Man. On page 2, it is stated that the Traveler found it embarrassing to witness the two men, and on page 5, when the Officer explains the sentence inscribed on the Condemned Man's body, the Traveler wants to ask questions but ultimately remains silent (Page 5, 6). Therefore, the Traveler's interest lies in understanding the overall context and situation, rather than the apparatus or the condemned man individually.


Model Function: query_hf_mistral
Prompt: 2
Based on the provided context, it is not explicitly stated that the Traveler appears more interested in the apparatus or the Condemned Man. However, the Traveler seems more bothered by the presence of the two

Across the questions above, QWEN outperforms the other models, mostly when using template 2. Based on these ablation studies, we chose the following pipeline configuration:

- Chunking method: sentences (n=10)
- Embedding model: all-mpnet-base-v2
- Similarity search: FAISS (with an index using Euclidean distance)
- Template: template 2
- Model: QWEN
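FAISS itself supports several metrics (for example `IndexFlatL2` for Euclidean distance and `IndexFlatIP` for inner product). This section does not show which index `retrieval_helper.py` builds or whether it normalizes the embeddings. If the embeddings are L2-normalized, then $\|q - d\|^2 = \|q\|^2 + \|d\|^2 - 2\,q \cdot d = 2 - 2\,q \cdot d$. Ranking by Euclidean distance is then the same as ranking by cosine similarity. The following pure-Python sketch checks this on hypothetical 3-d toy vectors, using exhaustive search in place of FAISS:

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k_l2(query, docs, k):
    """Exhaustive search by smallest squared L2 distance (what a flat L2 index does)."""
    return sorted(range(len(docs)), key=lambda i: squared_l2(query, docs[i]))[:k]


def top_k_cosine(query, docs, k):
    """Exhaustive search by largest dot product, i.e. cosine similarity for unit vectors."""
    return sorted(range(len(docs)), key=lambda i: dot(query, docs[i]), reverse=True)[:k]


# Toy 3-d "embeddings"; real all-mpnet-base-v2 vectors have 768 dimensions.
docs = [normalize(v) for v in ([1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0.5, 0.5, 0.7])]
query = normalize([1, 0.2, 0])

# For unit vectors, ||q - d||^2 = 2 - 2 * (q . d), so both rankings agree.
for d in docs:
    assert math.isclose(squared_l2(query, d), 2 - 2 * dot(query, d), abs_tol=1e-12)

print(top_k_l2(query, docs, 3))      # [1, 0, 3]
print(top_k_cosine(query, docs, 3))  # [1, 0, 3]
```

Without normalization the two rankings can differ, because longer vectors are penalized by L2 distance but favored by a raw inner product.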

# Results

In [428]:
dmv_queries = ["What color are the traffic lights?",
               "Who has the right of way in an intersection?",
               "What does HOV mean?",
               "How to parallel park?",
               "Where am I not allowed to park?",
               "What is the fine for a parking violation?",
               "What am I not allowed to do while driving?",
               "Explain railroad crossings.",
               "What are some driving emergencies?",
               "How can I avoid hitting a deer?",
               "What do I do if my turn lights are not working?",
               "How do I get a private pilot license?"]

In [429]:
kafka_queries = ["What crime did the condemned commit?",
                 "What does the apparatus do?",
                 "What are the three components of the apparatus?",
                 "Does the condemned know why he is being punished?",
                 "Does the traveler appear more interested in the apparatus or the condemned man?",
                 "Does the commandant support the execution practice?",
                 "Where is the penal colony?",
                 "Who wins the election for govern of the penal colony at the end of the story?",
                 "What is the role of the jury in deciding guilt or innocence in the penal colony?",
                 "What ultimately happens to the officer and the machine?"]

In [430]:
poe_queries = ["How does the narrator attempt to prove their sanity to the reader at the beginning of the story?",
               "What was it about the old man that deeply disturbed the narrator, leading to their decision to commit murder?",
               "Why does the narrator insist on opening the lantern so cautiously every night, even though they claim the old man cannot see them?",
               "The narrator says they loved the old man. How does this claim conflict with their actions in the story?",
               "How does the description of the 'vulture eye' serve as a symbol in the story?",
               "Why does the narrator believe the sound of the old man's heart grows louder even after his death? Could the sound have a different source?",
               "Why do you think the narrator chose to seat the police officers directly above the hidden body?",
               "The narrator believes the officers are mocking them by smiling and chatting. Do you think this is true, or is it a figment of their imagination?",
               "How does the narrator's claim of heightened senses both support and contradict their argument of being sane?",
               "If the narrator had not confessed, would the police have discovered the crime based on the evidence presented?",
               "If the story were set in modern times, how might technology like surveillance cameras or smartphones have changed the narrator's actions or the outcome of the story?"]

In [432]:
dfs_poe = compare_pipeline_configurations(poe_queries, "The_Tell_Tale_Heart.pdf", 
                                      chunking_method=["sentence"], 
                                      embedding_tokenizers=["all-mpnet-base-v2"], 
                                      similarity_method="faiss", templates=[template2], 
                                      model_functions=[query_hf_qwen])

100%|██████████| 6/6 [00:00<00:00, 225.28it/s]


In [433]:
dfs_kafka = compare_pipeline_configurations(kafka_queries, "In the Penal Colony Author Franz Kafka.pdf", 
                                      chunking_method=["sentence"], 
                                      embedding_tokenizers=["all-mpnet-base-v2"], 
                                      similarity_method="faiss", templates=[template2], 
                                      model_functions=[query_hf_qwen])

100%|██████████| 28/28 [00:00<00:00, 335.89it/s]


In [434]:
dfs_dmv = compare_pipeline_configurations(dmv_queries, "Driver Manual.pdf", 
                                      chunking_method=["sentence"], 
                                      embedding_tokenizers=["all-mpnet-base-v2"], 
                                      similarity_method="faiss", templates=[template2], 
                                      model_functions=[query_hf_qwen])

100%|██████████| 81/81 [00:00<00:00, 279.17it/s]


### The Tell Tale Heart - Edgar A. Poe

In [440]:
for _, row in dfs_poe.iterrows():
    print("--------------------")
    print("Query: " + row["Query"])
    print("--------------------")
    print("\nModel response:\n" + row["Response"])
    print("\n*******************************************")

--------------------
Query: How does the narrator attempt to prove their sanity to the reader at the beginning of the story?
--------------------

Model response:
The narrator attempts to prove their sanity to the reader by emphasizing their methodical and cautious approach to their plan. Specifically, they describe in great detail how they carefully and systematically went about their actions, which they argue a madman would not be capable of. For example, the narrator states:

"You should have seen how wisely I proceeded—with what caution— with what foresight—with what dissimulation I went to work! I was never kinder to the old man than during the whole week before I killed him. And every night, about midnight, I turned the latch of his door and opened it—oh, so gently! And then, when I had made an opening sufficient for my head, I put in a dark lantern, all closed, closed, so that no light shone out, and then I thrust in my head. Oh, you would have laughed to see how cunningly I thr

### In the Penal Colony - Franz Kafka

In [441]:
for _, row in dfs_kafka.iterrows():
    print("--------------------")
    print("Query: " + row["Query"])
    print("--------------------")
    print("\nModel response:\n" + row["Response"])
    print("\n*******************************************")

--------------------
Query: What crime did the condemned commit?
--------------------

Model response:
Based on the provided context, the condemned man was sentenced for disobeying and insulting his superior. This information is found on Page [1], where it states:

> "Of course, interest in the execution was not very high, not even in the penal colony itself. At least, here in the small, deep, sandy valley, closed in on all sides by barren slopes, apart from the Officer and the Traveler there were present only the Condemned, a vacant-looking man with a broad mouth and dilapidated hair and face, and the Soldier, who held the heavy chain to which were connected the small chains which bound the Condemned Man by his feet and wrist bones, as well as by his neck, and which were also linked to each other by connecting chains. The Condemned Man had an expression of such dog-like resignation that it looked as if one could set him free to roam around the slopes and would only have to whistle at 

### Driver Manual - NYS

In [442]:
for _, row in dfs_dmv.iterrows():
    print("--------------------")
    print("Query: " + row["Query"])
    print("--------------------")
    print("\nModel response:\n" + row["Response"])
    print("\n*******************************************")

--------------------
Query: What color are the traffic lights?
--------------------

Model response:
The traffic lights are normally red, yellow, and green, arranged from top to bottom or left to right. This information is found on Page [30] of the provided context. 

Quoting from the context:
"Traffic lights are normally red, yellow and green from the top to bottom or left to right." (Page [30])

*******************************************
--------------------
Query: Who has the right of way in an intersection?
--------------------

Model response:
The right of way in an intersection is determined by specific rules that help resolve traffic conflicts. According to the Driver’s Manual (Page 34):

1. **Vehicles Already in the Intersection:**
   - A driver approaching an intersection must yield the right-of-way to traffic that is already in the intersection.
   - **Example:** If you approach an intersection with a green light and another vehicle is already in the intersection making a le