# **Experiment 1: Hybrid/Ensemble Retriever**

Combine sparse retriever (BM25 that is based on keyword search) with dense retriever (FAISS that is based on embedding similarity/semantic similarity).

Results are reranked based on the Reciprocal Rank Fusion algorithm (RRF).

The Hybrid/Ensemble approach, combining basic keyword and similarity search, could be applicable to our use case to retrieve a interpolation of keyword-wise relevant (BM25) and semantically-similar (FAISS) results.

In [1]:
%pip install --quiet --upgrade bitsandbytes langchain langchain-community langchain-huggingface transformers beautifulsoup4 faiss-gpu rank_bm25 lark

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m44.1/44.1 kB[0m [31m2.8 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m122.4/122.4 MB[0m [31m7.5 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.4/2.4 MB[0m [31m39.4 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m10.0/10.0 MB[0m [31m105.3 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m85.5/85.5 MB[0m [31m8.8 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m111.0/111.0 kB[0m [31m7.6 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m3.1/3.1 MB[0m [31m79.2 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m49.5/49.5 kB[0m [31m3.5 MB/s[0m eta [36m0:00:00[0m
[?25h

In [2]:
from langchain_core.documents import Document
from langchain.retrievers import EnsembleRetriever # Supports Ensembling of results from multiple retrievers
from langchain_community.retrievers import BM25Retriever
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
import faiss
from langchain_community.docstore.in_memory import InMemoryDocstore
import nltk
from nltk.corpus import stopwords
import re
import pandas as pd
import os

## **User Action Required**

1. Run the code below to create the ```data``` folder

2. Upload the following files to the ```data``` folder
- ```iceland_articles_updated.csv```
- ```finland_articles_updated.csv```


In [3]:
data_folder = os.path.join(os.getcwd(), 'data')
os.makedirs(data_folder, exist_ok=True)

Your folder structure should now look as such:

```
data
  - iceland_articles_updated.csv
  - finland_articles_updated.csv
```

## **Simple experiment to easily understand behaviour of individual retrievers**

**Setup Simple Experiment Data for Experiment 1**

We first set up simple experiment data where we have 2 short documents for each activity category. This is done to have an easier <u>preliminary understanding of the behaviour of the individual retrievers</u> as we are able to quickly look at what words our data contains.

In [5]:
# Simple experiment example data
docs = [
    Document(
        page_content="The best hikes in Norway include the Reinebringen hike in the Lofoten islands. At a modest 448 meters high, Reinebringen is far from one of the highest peaks on the Lofoten islands. Yet this is more than made up for by the iconic view from the summit of Reine. It is not suitable for winter! Also, the trail can be quite demanding as the steps are quite steep.",
        metadata={"activity": 'Hiking', "country": 'Norway'},
    ),
    Document(
        page_content="Unique hike that can be done are volcanic hikes which can be done in Iceland. It is recommended to go with a tour of experienced people!",
        metadata={"activity": 'Hiking', "country": 'Iceland'},
    ),
    Document(
        page_content="Popular food in Norway is seafood! The best seafood in the Nordic region can be found in Norway. The seafood is freshly caught from the arctic ocean. Popular choices include the famous norwegian salmon. Other delicacies include whale steak!",
        metadata={"activity": 'Food', "country": 'Norway'},
    ),
    Document(
        page_content="The famous street food of Iceland is the Hotdog! It is called the Baejarins Beztu Pylsur hot dog is made of a mix of lamb, beef and pork. Other delicacies of iceland include Fish and Chips as well as Tommi's burger.",
        metadata={"activity": 'Food', "country": 'Norway'},
    ),
    Document(
        page_content="Transportation within Reykjavik is fairly convenient as there is a public bus service called BSI. All you need to do is to download their mobile app, follow the instructions, and you're good to go. Transportation to places outside Reykjavik however requires a car. Some options include car rentals as well as booking bus tours.",
        metadata={"activity": 'Transportation', "country": 'Iceland'},
    ),
    Document(
        page_content="Finland is easily accessible with its HSL public transportation services where all you need to do is to download a mobile app and follow the instructions.",
        metadata={"activity": 'Transportation', "country": 'Iceland'},
    ),
    Document(
        page_content="Finland is known for its snowy-like landscape and captivating auroras. One of the best places to stay is the Glass huts in Skyfire village in Rovaniemi, Lapland where you can admire the beautiful northern lights and snowy landscape. The village has its very own restaurant called Sky Huts Restaurant and Bar which offers tailor-made menus by a professional chef using local ingredients.",
        metadata={"activity": 'Accomodation', "country": 'Finland'},
    ),
    Document(
        page_content="A nice place to stay in Norway is the Lofoten Islands, in particlar Unstad which provides a breathtaking view of the mountain valley, ocean, and if you're lucky, northern lights.",
        metadata={"activity": 'Accomodation', "country": 'Norway'},
    ),
]

**Control Variables for Simple Experiment Data**

For simplicity sake, we
- set the number of documents retrieved to be 5
- set the weights for each retriever to be 0.5
- do not chunk/split documents as our examples are short enough

We also fix the embeddings model and vector store index for FAISS based on our prior research
- embeddings model: all-mpnet-base-v2
- index for FAISS: IndexFlatL2

In [6]:
num_docs_retrieved = 5
embeddings_model_name = "sentence-transformers/all-mpnet-base-v2" # https://www.sbert.net/docs/sentence_transformer/pretrained_models.html
embeddings_model = HuggingFaceEmbeddings(model_name=embeddings_model_name)
vector_store_index = faiss.IndexFlatL2(len(embeddings_model.embed_query("hello world")))
bm25_weight = 0.5
faiss_weight = 1-bm25_weight

  from tqdm.autonotebook import tqdm, trange


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/10.6k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/571 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/438M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/363 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/239 [00:00<?, ?B/s]

1_Pooling/config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

In [7]:
# Initialise BM25 retreiver (keyword-based)
bm25_retriever = BM25Retriever.from_documents(
    docs
)
bm25_retriever.k = num_docs_retrieved

# Initialise the FAISS retriever (semantic-similarity)
embeddings_model = HuggingFaceEmbeddings(model_name=embeddings_model_name)
faiss_vector_store = FAISS(
    embedding_function=embeddings_model,
    index=vector_store_index,
    docstore=InMemoryDocstore(),
    index_to_docstore_id={},
)

faiss_vector_store.add_documents(docs)
faiss_retriever = faiss_vector_store.as_retriever(search_type="similarity", search_kwargs={"k": num_docs_retrieved})

# Initialise the hybrid/ensemble retriever
# Uses RRF to sum the rankings of each doc from both retrievers, discounting rankings that are lower.
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, faiss_retriever], weights=[bm25_weight, faiss_weight]
)

**Test Query 1: Testing Hybrid/Ensemble Retriever**

In [8]:
test_query_1 = "What are the best hikes?"

In [9]:
ensemble_retriever.invoke(test_query_1)

[Document(metadata={'activity': 'Hiking', 'country': 'Norway'}, page_content='The best hikes in Norway include the Reinebringen hike in the Lofoten islands. At a modest 448 meters high, Reinebringen is far from one of the highest peaks on the Lofoten islands. Yet this is more than made up for by the iconic view from the summit of Reine. It is not suitable for winter! Also, the trail can be quite demanding as the steps are quite steep.'),
 Document(metadata={'activity': 'Hiking', 'country': 'Iceland'}, page_content='Unique hike that can be done are volcanic hikes which can be done in Iceland. It is recommended to go with a tour of experienced people!'),
 Document(metadata={'activity': 'Accomodation', 'country': 'Finland'}, page_content='Finland is known for its snowy-like landscape and captivating auroras. One of the best places to stay is the Glass huts in Skyfire village in Rovaniemi, Lapland where you can admire the beautiful northern lights and snowy landscape. The village has its v

**Test Query 2: Testing Hybrid/Ensemble Retriever**

In [10]:
test_query_2 = "What are the best food?"

In [11]:
ensemble_retriever.invoke(test_query_2)

[Document(metadata={'activity': 'Hiking', 'country': 'Norway'}, page_content='The best hikes in Norway include the Reinebringen hike in the Lofoten islands. At a modest 448 meters high, Reinebringen is far from one of the highest peaks on the Lofoten islands. Yet this is more than made up for by the iconic view from the summit of Reine. It is not suitable for winter! Also, the trail can be quite demanding as the steps are quite steep.'),
 Document(metadata={'activity': 'Food', 'country': 'Norway'}, page_content='Popular food in Norway is seafood! The best seafood in the Nordic region can be found in Norway. The seafood is freshly caught from the arctic ocean. Popular choices include the famous norwegian salmon. Other delicacies include whale steak!'),
 Document(metadata={'activity': 'Accomodation', 'country': 'Finland'}, page_content='Finland is known for its snowy-like landscape and captivating auroras. One of the best places to stay is the Glass huts in Skyfire village in Rovaniemi, 

**Questions about Hybrid/Ensemble Experiment**
- For <u>Test Query 2</u>, why is a hiking related document being returned first when we are making a query about food?

**Investigating BM25**

<u>Running BM25 with Test Query 2</u>

In [12]:
bm25_retriever.invoke(test_query_2)

[Document(metadata={'activity': 'Hiking', 'country': 'Norway'}, page_content='The best hikes in Norway include the Reinebringen hike in the Lofoten islands. At a modest 448 meters high, Reinebringen is far from one of the highest peaks on the Lofoten islands. Yet this is more than made up for by the iconic view from the summit of Reine. It is not suitable for winter! Also, the trail can be quite demanding as the steps are quite steep.'),
 Document(metadata={'activity': 'Hiking', 'country': 'Iceland'}, page_content='Unique hike that can be done are volcanic hikes which can be done in Iceland. It is recommended to go with a tour of experienced people!'),
 Document(metadata={'activity': 'Food', 'country': 'Norway'}, page_content='Popular food in Norway is seafood! The best seafood in the Nordic region can be found in Norway. The seafood is freshly caught from the arctic ocean. Popular choices include the famous norwegian salmon. Other delicacies include whale steak!'),
 Document(metadata=

<u>Running BM25 with the keyword that we want: 'food'</u>

In [13]:
bm25_retriever.invoke("food")

[Document(metadata={'activity': 'Food', 'country': 'Norway'}, page_content='Popular food in Norway is seafood! The best seafood in the Nordic region can be found in Norway. The seafood is freshly caught from the arctic ocean. Popular choices include the famous norwegian salmon. Other delicacies include whale steak!'),
 Document(metadata={'activity': 'Food', 'country': 'Norway'}, page_content="The famous street food of Iceland is the Hotdog! It is called the Baejarins Beztu Pylsur hot dog is made of a mix of lamb, beef and pork. Other delicacies of iceland include Fish and Chips as well as Tommi's burger."),
 Document(metadata={'activity': 'Accomodation', 'country': 'Norway'}, page_content="A nice place to stay in Norway is the Lofoten Islands, in particlar Unstad which provides a breathtaking view of the mountain valley, ocean, and if you're lucky, northern lights."),
 Document(metadata={'activity': 'Accomodation', 'country': 'Finland'}, page_content='Finland is known for its snowy-lik

<u>Findings from BM25</u>

When using BM25 with the full query, why were the hiking-related documents ranked higher than the food-related documents?
- Other terms in the query are also considered (because stopwords are not removed) into the the BM25 calculation. For instance, the word 'best' and 'are' which are present in the hiking-related documents.


<u>How can we improve BM25</u>
- Drop stopwords from the query so that they are not factored into the BM25 calculation

In [14]:
nltk.download('stopwords')
print("The stopwords provided by the nltk library include:")
print(stopwords.words('english'))
nltk_stopwords = stopwords.words('english')

The stopwords provided by the nltk library include:
['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


In [15]:
full_query = test_query_2
query_words = re.findall(r'\b\w+\b', full_query)
keywords = [keyword for keyword in query_words if keyword.lower() not in nltk_stopwords]
full_query_keywords_only = ' '.join(keywords)
print(f'The original query is: "{full_query}"')
print(f'The new query after removing stopwords is: "{full_query_keywords_only}"')
print("The result is: \n")
bm25_retriever.invoke(full_query_keywords_only)

The original query is: "What are the best food?"
The new query after removing stopwords is: "best food"
The result is: 



[Document(metadata={'activity': 'Food', 'country': 'Norway'}, page_content='Popular food in Norway is seafood! The best seafood in the Nordic region can be found in Norway. The seafood is freshly caught from the arctic ocean. Popular choices include the famous norwegian salmon. Other delicacies include whale steak!'),
 Document(metadata={'activity': 'Food', 'country': 'Norway'}, page_content="The famous street food of Iceland is the Hotdog! It is called the Baejarins Beztu Pylsur hot dog is made of a mix of lamb, beef and pork. Other delicacies of iceland include Fish and Chips as well as Tommi's burger."),
 Document(metadata={'activity': 'Accomodation', 'country': 'Finland'}, page_content='Finland is known for its snowy-like landscape and captivating auroras. One of the best places to stay is the Glass huts in Skyfire village in Rovaniemi, Lapland where you can admire the beautiful northern lights and snowy landscape. The village has its very own restaurant called Sky Huts Restaurant 

**We can see an improvement in the BM25 ranking by using removing stopwords from the query**

**Investigating FAISS**

<u>Running FAISS (IndexL2) with the Test Query 2</u>

In [16]:
faiss_retriever.invoke(test_query_2)

[Document(metadata={'activity': 'Food', 'country': 'Norway'}, page_content='Popular food in Norway is seafood! The best seafood in the Nordic region can be found in Norway. The seafood is freshly caught from the arctic ocean. Popular choices include the famous norwegian salmon. Other delicacies include whale steak!'),
 Document(metadata={'activity': 'Food', 'country': 'Norway'}, page_content="The famous street food of Iceland is the Hotdog! It is called the Baejarins Beztu Pylsur hot dog is made of a mix of lamb, beef and pork. Other delicacies of iceland include Fish and Chips as well as Tommi's burger."),
 Document(metadata={'activity': 'Hiking', 'country': 'Norway'}, page_content='The best hikes in Norway include the Reinebringen hike in the Lofoten islands. At a modest 448 meters high, Reinebringen is far from one of the highest peaks on the Lofoten islands. Yet this is more than made up for by the iconic view from the summit of Reine. It is not suitable for winter! Also, the trail

It can be observed that the embedding similarity approach works well even though the words in the query are not an exact match, where relevant documents are retrieved at the top. This is because of the vector similarity between words in the user query and document.

<br/>
<br/>
<br/>

## **Full experiment to find best weightages for improved Hybrid/Ensemble Approach**

In this experiment, we test the retrieval of the Hybrid/Ensemble Approach (with the removal of stopwords) by adjusting the different weightages, using the full data and test queries

**Define Function to Remove Stopwords**

In [17]:
def remove_stopwords(query, stopwords):
  query_words = re.findall(r'\b\w+\b', query)
  keywords = [keyword for keyword in query_words if keyword.lower() not in stopwords]
  return ' '.join(keywords)

**Control Variables**

- Set number of documents retrieved to be 10
- Fix the embeddings model and vector store index for FAISS based on our prior research
  - embeddings model: all-mpnet-base-v2
  - index for FAISS: IndexFlatL2
- Fix the document chunk/splitting method based on our prior research
  - RecursiveCharacterTextSplitter
    - chunk_size=1000
    - chunk_overlap=200


In [None]:
num_docs_retrieved = 10
embeddings_model_name = "sentence-transformers/all-mpnet-base-v2" # https://www.sbert.net/docs/sentence_transformer/pretrained_models.html
vector_store_index = faiss.IndexFlatL2(len(embeddings_model.embed_query("hello world")))
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200, add_start_index=True
)
chunked_docs = text_splitter.split_documents(docs)

**Experimental Variables**
- Weightage of bm25 (keyword similarity) and FAISS (vector similarity)


In [18]:
bm25_weights = [0.0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0]
faiss_weights = list(reversed(bm25_weights))

**Experiment Test Queries**

In [19]:
test_queries = ['what is the best food to eat in Finland?', 'what is the best food to eat in Iceland?']

**Setup folder structure and filepaths**

In [20]:
article_names = ['finland_articles_updated.csv', 'iceland_articles_updated.csv']
article_fps = [os.path.join('.', 'data', article_name) for article_name in article_names]

In [21]:
output_folder_binary = os.path.join(os.getcwd(), 'experiment_1_output', 'binary_relevance')
output_folder_score = os.path.join(os.getcwd(), 'experiment_1_output', 'score_relevance')
output_folders = [output_folder_binary, output_folder_score]

for output_folder in output_folders:
  os.makedirs(output_folder, exist_ok=True)

In [22]:
docs = []
for article_fp in article_fps:
  df = pd.read_csv(article_fp)
  for _, row in df.iterrows():
    text = row['Title'] + " " + row['Content']

    doc = Document(
        page_content=text,
        metadata={'country': row['Country'], 'source': row['Source'], 'link': row['Article Links']}
    )

    docs.append(doc)

In [24]:
# Initialise BM25 retreiver (keyword-based)
bm25_retriever = BM25Retriever.from_documents(
    chunked_docs
)
bm25_retriever.k = num_docs_retrieved

# Initialise the FAISS retriever (semantic-similarity)
embeddings_model = HuggingFaceEmbeddings(model_name=embeddings_model_name)
faiss_vector_store = FAISS(
    embedding_function=embeddings_model,
    index=vector_store_index,
    docstore=InMemoryDocstore(),
    index_to_docstore_id={},
)
faiss_vector_store.add_documents(chunked_docs)
faiss_retriever = faiss_vector_store.as_retriever(search_type="similarity", search_kwargs={"k": num_docs_retrieved})

In [25]:
# Initialise the hybrid/ensemble retriever with different weightages
# Uses RRF to sum the rankings of each doc from both retrievers, discounting rankings that are lower.
for i in range(len(bm25_weights)):

  ensemble_retriever = EnsembleRetriever(
      retrievers=[bm25_retriever, faiss_retriever], weights=[bm25_weights[i], faiss_weights[i]]
  )

  f_name_binary = os.path.join(output_folder_binary, f'hybrid_retriever__bm25_{bm25_weights[i]}_faiss_{faiss_weights[i]}_binary_relevance.csv')
  f_name_score = os.path.join(output_folder_score, f'hybrid_retriever__bm25_{bm25_weights[i]}_faiss_{faiss_weights[i]}_score_relevance.csv')
  results = []

  for q in test_queries:
    # Remove stopwords from query to improve BM25 performance
    q = remove_stopwords(q, nltk_stopwords)
    retrieved_docs = [d.page_content for d in ensemble_retriever.invoke(q)]

    for idx,retrieved_doc in enumerate(retrieved_docs):
        results.append({
            'idx': idx,
            'query': q,
            'retrieved_doc': retrieved_doc,
            'relevant': ''
        })

  results_df = pd.DataFrame(results)
  results_df.to_csv(f_name_binary, index=False)
  results_df.to_csv(f_name_score, index=False)


Your folder structure should now look as such:

```
data
  - iceland_articles_updated.csv
  - finland_articles_updated.csv

experiment_1_output
  - binary_relevance
    - hybrid_retriever__bm25_0.0_faiss_1.0_binary_relevance.csv
    - hybrid_retriever__bm25_0.1_faiss_0.9_binary_relevance.csv
    - hybrid_retriever__bm25_0.2_faiss_0.8_binary_relevance.csv
    - hybrid_retriever__bm25_0.3_faiss_0.7_binary_relevance.csv
    - hybrid_retriever__bm25_0.4_faiss_0.6_binary_relevance.csv
    - hybrid_retriever__bm25_0.5_faiss_0.5_binary_relevance.csv
    - hybrid_retriever__bm25_0.6_faiss_0.4_binary_relevance.csv
    - hybrid_retriever__bm25_0.7_faiss_0.3_binary_relevance.csv
    - hybrid_retriever__bm25_0.8_faiss_0.2_binary_relevance.csv
    - hybrid_retriever__bm25_0.9_faiss_0.1_binary_relevance.csv
    - hybrid_retriever__bm25_1.0_faiss_0.0_binary_relevance.csv
  - score_relevance
    - hybrid_retriever__bm25_0.0_faiss_1.0_score_relevance.csv
    - hybrid_retriever__bm25_0.1_faiss_0.9_score_relevance.csv
    - hybrid_retriever__bm25_0.2_faiss_0.8_score_relevance.csv
    - hybrid_retriever__bm25_0.3_faiss_0.7_score_relevance.csv
    - hybrid_retriever__bm25_0.4_faiss_0.6_score_relevance.csv
    - hybrid_retriever__bm25_0.5_faiss_0.5_score_relevance.csv
    - hybrid_retriever__bm25_0.6_faiss_0.4_score_relevance.csv
    - hybrid_retriever__bm25_0.7_faiss_0.3_score_relevance.csv
    - hybrid_retriever__bm25_0.8_faiss_0.2_score_relevance.csv
    - hybrid_retriever__bm25_0.9_faiss_0.1_score_relevance.csv
    - hybrid_retriever__bm25_1.0_faiss_0.0_score_relevance.csv
```

## **User Action Required**

Download the folder ```experiment_1_output``` (if its not on your local environment) and start scoring the files :)

<br/>
<br/>
<br/>

## **Conclusion for Experiment 1: Hybrid/Ensemble Retriever**

We can improve Hybrid/Ensemble Retriever by removing stopwords to increase the performance of BM25.