# Evaluating the Ideal Chunk Size for a RAG System using LlamaIndex

# **Introduction**

Retrieval-augmented generation (RAG) combines the retrieval capabilities of search systems with the generative abilities of a large language model (LLM). When implementing a RAG system, one critical parameter that governs the system's efficiency and performance is the `chunk_size`. How do you find the chunk size that works best for retrieval? This is where the LlamaIndex `Response Evaluation` module comes in handy. In this blog post, we'll guide you through the steps to determine the best `chunk_size` using that module. If you're unfamiliar with the `Response Evaluation` module, we recommend reviewing its [documentation](https://docs.llamaindex.ai/en/latest/core_modules/supporting_modules/evaluation/modules.html) before proceeding.

## **Why Chunk Size Matters**

Choosing the right `chunk_size` is a critical decision that can influence the efficiency and accuracy of a RAG system in several ways:

1. **Relevance and Granularity**: A small `chunk_size`, like 128, yields more granular chunks. This granularity, however, presents a risk: vital information might not be among the top retrieved chunks, especially if the `similarity_top_k` setting is as restrictive as 2. Conversely, a chunk size of 512 is likely to encompass all necessary information within the top chunks, ensuring that answers to queries are readily available. To navigate this, we employ the Faithfulness and Relevancy metrics. These measure the absence of ‘hallucinations’ and the ‘relevancy’ of responses based on the query and the retrieved contexts respectively.
2. **Response Generation Time**: As the `chunk_size` increases, so does the volume of information directed into the LLM to generate an answer. While this can ensure a more comprehensive context, it might also slow down the system. Ensuring that the added depth doesn't compromise the system's responsiveness is crucial.

In essence, determining the optimal `chunk_size` is about striking a balance: capturing all essential information without sacrificing speed. It's vital to test thoroughly with various sizes to find a configuration that suits the specific use case and dataset.
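
To make the granularity trade-off concrete, here is a minimal sketch that splits a synthetic text into overlapping chunks and counts them. The helper `split_into_chunks` is hypothetical and counts whitespace-separated words, whereas LlamaIndex's splitters count tokens and respect sentence boundaries, so the numbers are only indicative.

```python
def split_into_chunks(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into word-based chunks of at most chunk_size words.

    Hypothetical helper for illustration only; it is not a LlamaIndex API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(1000))  # synthetic 1000-word text
for size in (128, 512):
    chunks = split_into_chunks(sample, chunk_size=size, overlap=size // 4)
    print(f"chunk_size={size}: {len(chunks)} chunks")
```

Smaller chunks produce many more candidates for the retriever to rank, which is why a restrictive `similarity_top_k` can miss the chunk that holds the answer.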

## **Setup**

Before embarking on the experiment, we need to install the required packages and import the modules we use:

In [11]:
!pip install llama-index llama-index-embeddings-openai spacy


In [1]:
import nest_asyncio

nest_asyncio.apply()

from llama_index.core import (
    SimpleDirectoryReader,
    VectorStoreIndex,
    ServiceContext,
)
from llama_index.core.evaluation import (
    DatasetGenerator,
    FaithfulnessEvaluator,
    RelevancyEvaluator
)
from llama_index.llms.openai import OpenAI

import openai
import time
openai.api_key = ''



## **Load Data**

Let’s load our document.

In [50]:
# Load Data
# reader = SimpleDirectoryReader("../data/web-software-development-1-0/", recursive=True)
document_base_path = "../data/web-software-development-1-0/"
documents_path = f"{document_base_path}13-working-with-databases-i/"
reader = SimpleDirectoryReader(documents_path, recursive=True)

documents = reader.load_data()
print(len(documents))

14


## **Question Generation**

To select the right `chunk_size`, we'll compute metrics like Average Response time, Faithfulness, and Relevancy for various `chunk_sizes`. The `DatasetGenerator` will help us generate questions from the documents.

In [10]:
# To evaluate for each chunk size, we will first generate a set of 40 questions from first 20 pages.
eval_documents = documents[:20]
print("Amount of documents: ", len(eval_documents))

# data_generator = DatasetGenerator.from_documents(documents)
# eval_questions = data_generator.generate_questions_from_nodes(num = 40)

# generated from above, hardcoded to save costs
all_eval_questions = ['What is the importance of using a database in web applications?',
                      'What database management system will be used in this course?',
                      'What are the learning objectives related to working with databases?',
                      'Where can you find a tutorial for SQL basics if you need a refresher?',
                      'How can you start using PostgreSQL according to the document?',
                      'What is the recommended approach for taking PostgreSQL into use for development?',
                      #   'What is the purpose of the Walking skeleton in relation to PostgreSQL?',
                      'What are some options for running PostgreSQL locally?',
                      'Name two hosted services that provide PostgreSQL as a service.',
                      #   'Why does the document strongly recommend using the first option for development when starting to use PostgreSQL?',
                      'What are two options for starting to use PostgreSQL as mentioned in the document?',
                      'What are some examples of hosted services that provide PostgreSQL databases?',
                      #   'Why does the document strongly recommend using the first option for development?',
                      'How can you get started with ElephantSQL according to the document?',
                      "What attributes are included in the table created in the document's example using SQL?",
                      'How can you add names to the table in ElephantSQL according to the document?',
                      "What SQL query can you use to select all rows from the 'names' table in ElephantSQL?",
                      "What library is used in the document's example to access the database programatically?",
                      #   'What information is grayed out in the image of the ElephantSQL details page?',
                      "What is the purpose of the 'id' attribute in the table created in the document's example?",
                      #   'What library is used in the first example to access a PostgreSQL database in the provided code snippet?',
                      #   'How can you specify the database credentials when using Postgres.js in the provided code snippet?',
                      #   'What is the purpose of the `max: 2` parameter in the Postgres.js example?',
                      #   'In the second example, what library is used to access a PostgreSQL database?',
                      'What is the recommended alternative to Deno Postgres mentioned in the document?',
                      #   'How can you establish a connection to a PostgreSQL database using Deno Postgres in the provided code snippet?',
                      #   'What query is executed in the Deno Postgres example to retrieve data from the database?',
                      'What is the significance of having a database client when working with databases?',
                      'What is the default database client mentioned in the document for accessing a PostgreSQL database?',
                      'Where can you find a list of PostgreSQL clients for different operating systems according to the document?',
                      'What database driver is used when working with Deno and PostgreSQL in the provided document?',
                      'How can you create a database client using the Postgres.js driver?',
                      #   'In the example code provided, what SQL query is being executed to retrieve data from the database?',
                      'How does Postgres.js ensure safe query generation when constructing SQL queries?',
                      'What is the purpose of the `sql` function in the Postgres.js driver?',
                      #   'In the example code, how is the result data iterated over to print only the name property?',
                      #   'What SQL statement is used to insert data into a database in the provided document?',
                      #   'After inserting a new name into the database, how many names are present in the database according to the output?',
                      'What flag is required to be used with Deno when working with the Postgres.js driver?',
                      'How can you access the Postgres.js documentation for further details on tagged template literals?']


eval_questions = [
    'What is the significance of having a database client when working with databases?',
    'What is the default database client mentioned in the document for accessing a PostgreSQL database?',
    'Where can you find a list of PostgreSQL clients for different operating systems according to the document?',
]
print("=== EVAL QUESTIONS ===")
print(eval_questions)


Amount of documents:  14
=== EVAL QUESTIONS ===
['What is the importance of using a database in web applications?', 'What database management system will be used in this course?', 'What are the learning objectives related to working with databases?', 'Where can you find a tutorial for SQL basics if you need a refresher?', 'How can you start using PostgreSQL according to the document?', 'What is the recommended approach for taking PostgreSQL into use for development?', 'What are some options for running PostgreSQL locally?', 'Name two hosted services that provide PostgreSQL as a service.', 'What are two options for starting to use PostgreSQL as mentioned in the document?', 'What are some examples of hosted services that provide PostgreSQL databases?', 'How can you get started with ElephantSQL according to the document?', "What attributes are included in the table created in the document's example using SQL?", 'How can you add names to the table in ElephantSQL according to the document?'

## Setting Up Evaluators

We set up the LLM that judges the responses generated during the experiment. The code in this section uses `gpt-3.5-turbo` as the evaluation model; a stronger model such as GPT-4 can be substituted by changing the `model` argument. Two evaluators, `FaithfulnessEvaluator` and `RelevancyEvaluator`, are initialised with the `service_context`.

1. **Faithfulness Evaluator** - Detects hallucination by checking whether the response from a query engine is supported by any of its source nodes.
2. **Relevancy Evaluator** - Checks whether the query was actually answered, that is, whether the response together with the source nodes matches the query.

In [8]:
from llama_index.core import Settings
# We use gpt-3.5-turbo to evaluate the responses; set model="gpt-4" for a stronger judge

llm_evaluate = OpenAI(temperature=0, model="gpt-3.5-turbo")
Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)

# Define service context for llm evaluation
service_context_gpt = ServiceContext.from_defaults(llm=llm_evaluate)

# Define Faithfulness and Relevancy Evaluators
faithfulness_gpt = FaithfulnessEvaluator(service_context=service_context_gpt)
relevancy_gpt = RelevancyEvaluator(service_context=service_context_gpt)

gpt-3.5-turbo




ValueError: 
******
Could not load OpenAI embedding model. If you intended to use OpenAI, please check your OPENAI_API_KEY.
Original error:
No API key found for OpenAI.
Please set either the OPENAI_API_KEY environment variable or openai.api_key prior to initialization.
API keys can be found or created at https://platform.openai.com/account/api-keys

Consider using embed_model='local'.
Visit our documentation for more embedding options: https://docs.llamaindex.ai/en/stable/module_guides/models/embeddings.html#modules
******
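
The error above is raised because `openai.api_key` was set to an empty string. As the message suggests, the key can be read from the `OPENAI_API_KEY` environment variable instead of being hardcoded. The following is a minimal sketch of that idea; `require_api_key` is a hypothetical helper name, not part of any library.

```python
import os


def require_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in env_var, or fail early with a clear message.

    Hypothetical helper; the variable name follows the error message above.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Export it before starting the notebook."
        )
    return key
```

In the setup cell, this could be used as `openai.api_key = require_api_key()`, so a missing key is reported before any index or evaluator is built.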

# Debugging local embeddings

In [52]:
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings

from llama_index.core.schema import IndexNode
from llama_index.core import (
    load_index_from_storage,
    StorageContext,
    VectorStoreIndex,
)
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core import SummaryIndex
from llama_index.core.retrievers import RecursiveRetriever
import os
# from tqdm.notebook import tqdm
import pickle


def build_index(docs, chunk_size, out_path: str):
    print("Chunk size: ", chunk_size)

    embed_model = OpenAIEmbedding(model="text-embedding-3-large",
                                  chunk_size=chunk_size,
                                  )
    Settings.embed_model = embed_model

    nodes = []

    splitter = SentenceSplitter(
        # chunk_overlap must be an integer, so use floor division
        chunk_size=chunk_size, chunk_overlap=chunk_size // 4)
    for idx, doc in enumerate(docs):
        print('Splitting: ' + str(idx))

        cur_nodes = splitter.get_nodes_from_documents([doc])
        for cur_node in cur_nodes:
            # ID will be base + parent
            file_path = doc.metadata["file_path"].split(document_base_path)[1]
            new_node = IndexNode(
                text=cur_node.text or "None",
                index_id=str(file_path),
                metadata=doc.metadata,
                # obj=doc
            )
            nodes.append(new_node)
        

        # Debugging
        print(len(cur_nodes), len(str(doc)), len(str(cur_nodes[0])))
        for xyz in cur_nodes:
            print(xyz)
            print("-")
        print()
        print("----DOC-----")
        print(doc)

        print()
        print()

    print("num nodes: " + str(len(nodes)))

    service_context = ServiceContext.from_defaults(
        llm=llm_evaluate, embed_model=embed_model)

    # save index to disk
    if not os.path.exists(out_path):
        index = VectorStoreIndex(nodes, service_context=service_context)
        index.set_index_id("simple_index")
        index.storage_context.persist(f"./{out_path}")
    else:
        # rebuild storage context
        storage_context = StorageContext.from_defaults(
            persist_dir=f"./{out_path}"
        )
        # load index
        index = load_index_from_storage(
            storage_context, index_id="simple_index", service_context=service_context
            # storage_context, index_id="simple_index", embed_model=embed_model
        )

    return index


build_index(eval_documents, 1024, "Test")

Chunk size:  1024
Splitting: 0
3 398 399
Node ID: a9108669-4796-4216-9135-a70d84c8a297
Text: --- title: "Getting Started" order: 1 published: true ---
<LearningObjectives>  - Knows different options for using a database.
- Knows how to create and access to a database.  </LearningObjectives>
So far, we've worked with web applications that lose their data once
the server is restarted. For the data to be persisted and for it to be
avail...
-
Node ID: 60d50383-32c0-4ccd-9bcb-2696a227f61d
Text: ## Starting to use PostgreSQL  Taking PostgreSQL into use can be
done using multiple approaches. Here, we'll list two options.  - Using
the walking skeleton as recommended in [Course Tools](/web-software-
development-1-0/1-introduction-and-tooling/3-course-tools/) or
otherwise running PostgreSQL locally using Docker or some other
virtualization s...
-
Node ID: bf18903e-c4b3-436c-9586-703d5cce5c2c
Text: ## Example with Postgres.js  Now, you could also run a program
that accesses the database. The foll



KeyboardInterrupt: 

# vs. Storing embeddings (Weaviate)

In [12]:

from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings

from llama_index.core.schema import IndexNode
from llama_index.core import (
    load_index_from_storage,
    StorageContext,
    VectorStoreIndex,
)
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core import SummaryIndex
from llama_index.core.retrievers import RecursiveRetriever
import os
# from tqdm.notebook import tqdm
import pickle

import weaviate
from llama_index.vector_stores.weaviate import WeaviateVectorStore
from IPython.display import Markdown, display

# local
client = weaviate.Client("http://localhost:8080")


def build_index(docs, chunk_size, out_path: str):
    print("Chunk size: ", chunk_size)

    embed_model = OpenAIEmbedding(model="text-embedding-3-large",
                                  chunk_size=chunk_size,
                                  )
    Settings.embed_model = embed_model

    nodes = []

    splitter = SentenceSplitter(
        # chunk_overlap must be an integer, so use floor division
        chunk_size=chunk_size, chunk_overlap=chunk_size // 4)
    for idx, doc in enumerate(docs):
        print('Splitting: ' + str(idx))

        cur_nodes = splitter.get_nodes_from_documents([doc])
        for cur_node in cur_nodes:
            # ID will be base + parent
            file_path = doc.metadata["file_path"].split(document_base_path)[1]
            new_node = IndexNode(
                text=cur_node.text or "None",
                index_id=str(file_path),
                metadata=doc.metadata,
                # obj=doc
            )
            nodes.append(new_node)

    print("num nodes: " + str(len(nodes)))

    service_context = ServiceContext.from_defaults(
        llm=llm_evaluate, embed_model=embed_model)


    index_name = f"LlamaIndexDemo{chunk_size}"
    print("Schema exists already", client.schema.exists(index_name))

    # create the index in Weaviate if it does not exist yet
    if not client.schema.exists(index_name):
        vector_store = WeaviateVectorStore(
            weaviate_client=client, index_name=index_name
        )
        storage_context = StorageContext.from_defaults(vector_store=vector_store)
        # index the nodes split above so that chunk_size actually takes effect
        # (the global `documents` would bypass the splitter configured here)
        index = VectorStoreIndex(
            nodes, storage_context=storage_context, service_context=service_context
        )
    else:
        # load index
        vector_store = WeaviateVectorStore(
            weaviate_client=client, index_name=index_name
        )
        index = VectorStoreIndex.from_vector_store(vector_store, service_context=service_context)

    return index




In [7]:
chunk_sizes = [128, 128, 256, 512, 1024, 2048]
similarities_top_k = [10, 8, 6, 4, 2, 1]

chunk_sizes_vector_only = [] # used if we also want to evaluate vector only instead of hybrid search
similarities_top_k_vector_only = []

n_runs = len(chunk_sizes) + len(chunk_sizes_vector_only)

answers = [[] for _ in range(n_runs)]

time_scores = [[] for _ in range(n_runs)]
faithfulness_scores = [[] for _ in range(n_runs)]
relevancy_scores = [[] for _ in range(n_runs)]

faithfulness_false = [[] for _ in range(n_runs)]
relevancy_false = [[] for _ in range(n_runs)]
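
The chunk sizes and `similarity_top_k` values above are paired so that larger chunks are retrieved in smaller numbers. A rough way to compare the runs is the product of the two, which bounds how much retrieved text reaches the LLM per query. The sketch below assumes `chunk_size` is measured in tokens and ignores the prompt template and metadata, so it is an upper-bound estimate rather than an exact count; `context_budget` is a hypothetical helper.

```python
chunk_sizes = [128, 128, 256, 512, 1024, 2048]
similarities_top_k = [10, 8, 6, 4, 2, 1]


def context_budget(chunk_size: int, top_k: int) -> int:
    """Upper bound on retrieved tokens passed to the LLM for one query."""
    return chunk_size * top_k


budgets = [context_budget(c, k) for c, k in zip(chunk_sizes, similarities_top_k)]
for c, k, b in zip(chunk_sizes, similarities_top_k, budgets):
    print(f"chunk_size={c:>4}, top_k={k:>2} -> at most {b} retrieved tokens")
```

Keeping this budget in a similar range across runs helps separate the effect of granularity from the effect of simply giving the LLM more context.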


## **Response Evaluation For A Chunk Size**

We evaluate each chunk_size based on 3 metrics.

1. Average Response Time.
2. Average Faithfulness.
3. Average Relevancy.

The function `evaluate_response_time_and_accuracy` does just that in three steps:

1. VectorIndex Creation.
2. Building the Query Engine.
3. Metrics Calculation.
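
The metrics step can be sketched independently of LlamaIndex. The function below times each answer, stops the clock before judging, and averages a pass/fail verdict. The names `measure_run`, `answer_fn` and `judge_fn` are hypothetical stand-ins for the query engine and the evaluators, so this is an outline of the bookkeeping under those assumptions, not the notebook's implementation.

```python
import time
from statistics import mean
from typing import Callable, Sequence


def measure_run(
    questions: Sequence[str],
    answer_fn: Callable[[str], str],
    judge_fn: Callable[[str, str], bool],
) -> tuple[float, float]:
    """Return (average latency in seconds, pass rate) over the questions.

    answer_fn stands in for query_engine.query and judge_fn for an evaluator.
    """
    if not questions:
        raise ValueError("questions must not be empty")
    latencies, passes = [], []
    for question in questions:
        start = time.perf_counter()
        answer = answer_fn(question)
        # Stop the clock before judging so evaluation time is not counted.
        latencies.append(time.perf_counter() - start)
        passes.append(1.0 if judge_fn(question, answer) else 0.0)
    return mean(latencies), mean(passes)


# Toy usage with stand-in functions instead of real LLM calls.
avg_time, pass_rate = measure_run(
    ["what is sql?", "what is deno?"],
    answer_fn=lambda q: q.upper(),
    judge_fn=lambda q, a: a.lower() == q,
)
print(f"avg time: {avg_time:.6f}s, pass rate: {pass_rate:.2f}")
```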

In [None]:
# Helper methods

def build_query_engine(vector_index, similarity_top_k, is_hybrid):
    if is_hybrid:
        query_engine = vector_index.as_query_engine(
            similarity_top_k=similarity_top_k, embed_model=Settings.embed_model,
            vector_store_query_mode="hybrid", alpha=0.0  # BM25
        )
        return query_engine
    # -- VEC ONLY --
    query_engine = vector_index.as_query_engine(
        similarity_top_k=similarity_top_k, embed_model=Settings.embed_model,
    )
    return query_engine


def get_document_paths():
    pass


def add_docpath_to_ctx():
    pass

In [19]:
# Define function to calculate average response time, average faithfulness and average relevancy metrics for given chunk size
# gpt-3.5-turbo generates the responses; the evaluators use llm_evaluate (also gpt-3.5-turbo as configured earlier).
Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)


def evaluate_response_time_and_accuracy(chunk_size, similarity_top_k, run_i, eval_questions=eval_questions, label="default", is_hybrid=True):
    """
    Evaluate the average response time, faithfulness, and relevancy of responses for a given chunk size.
    """

    # total_response_time = 0
    # total_faithfulness = 0
    # total_relevancy = 0
    # num_questions = len(eval_questions)

    # Load vector_index
    vector_index = build_index(
        eval_documents, chunk_size,
        f"vector_stores/openai_{chunk_size}-{'hyb' if is_hybrid else 'vec'}-{label}")


    # Build query engine
    query_engine = build_query_engine(vector_index, similarity_top_k, is_hybrid)
    

    # Iterate over each question in eval_questions to compute metrics.
    print("=========== QA pairs")
    for question in eval_questions:
        print("--Q\n", question, "\n--")
        start_time = time.time()
        response_vector = query_engine.query(question)
        # stop the clock right after the query so printing is not timed
        elapsed_time = time.time() - start_time
        print("--A\n", str(response_vector), "\n--")
        answers[run_i].append(response_vector)

        faithfulness_result = faithfulness_gpt.evaluate_response(
            response=response_vector
        ).passing

        relevancy_result = relevancy_gpt.evaluate_response(
            query=question, response=response_vector
        ).passing

        if not faithfulness_result:
            faithfulness_false[run_i].append((question, str(response_vector)))
        if not relevancy_result:
            relevancy_false[run_i].append((question, str(response_vector)))

        # total_response_time += elapsed_time
        # total_faithfulness += faithfulness_result
        # total_relevancy += relevancy_result

        time_scores[run_i].append(elapsed_time)
        faithfulness_scores[run_i].append(faithfulness_result)
        relevancy_scores[run_i].append(relevancy_result)

        print(
            f"t={elapsed_time}, f={faithfulness_result}, r={relevancy_result}\n-------")

    print("===========")
    # average_response_time = total_response_time / num_questions
    # average_faithfulness = total_faithfulness / num_questions
    # average_relevancy = total_relevancy / num_questions

    # return average_response_time, average_faithfulness, average_relevancy


In [67]:

# from llama_index.core.base.response.schema import Response

# Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)

# def evaluate_response_time_and_accuracy_without_rag(eval_questions=eval_questions):
#     """
#     Evaluate the average response time, faithfulness, and relevancy of responses generated by GPT-3.5-turbo for a given chunk size.

#     Parameters:
#     chunk_size (int): The size of data chunks being processed.

#     Returns:
#     tuple: A tuple containing the average response time, faithfulness, and relevancy metrics.
#     """

#     total_response_time = 0
#     total_faithfulness = 0
#     total_relevancy = 0

#     # By default, similarity_top_k is set to 2. To experiment with different values, pass it as an argument to as_query_engine()
#     num_questions = len(eval_questions)

#     # index = VectorStoreIndex(nodes=[])
#     # query_engine = index.as_query_engine(llm=OpenAI())


#     # Iterate over each question in eval_questions to compute metrics.
#     # While BatchEvalRunner can be used for faster evaluations (see: https://docs.llamaindex.ai/en/latest/examples/evaluation/batch_eval.html),
#     # we're using a loop here to specifically measure response time for different chunk sizes.
#     print("=========== QA pairs")
#     for question in eval_questions:
#         print("--Q\n", question, "\n--")
#         start_time = time.time()

#         response = str(OpenAI().complete(question))
#         response_vector = Response(response)
#         # response_vector = query_engine.query(question)

#         print("--A\n", response, "\n--")
#         elapsed_time = time.time() - start_time

#         faithfulness_result = faithfulness_gpt.evaluate_response(
#             response=response_vector
#         ).passing

#         relevancy_result = relevancy_gpt.evaluate_response(
#             query=question, response=response_vector
#         ).passing

#         total_response_time += elapsed_time
#         total_faithfulness += faithfulness_result
#         total_relevancy += relevancy_result

#         # TODO: both response and retrieval evaluation

#     print("===========")
#     average_response_time = total_response_time / num_questions
#     average_faithfulness = total_faithfulness / num_questions
#     average_relevancy = total_relevancy / num_questions

#     return average_response_time, average_faithfulness, average_relevancy

## **Testing Across Different Chunk Sizes**

We'll evaluate a range of chunk sizes to identify which offers the most promising metrics.

In [20]:
from statistics import mean

# Iterate over different chunk sizes to evaluate the metrics to help fix the chunk size.
for chunk_size, similarity_top_k, run_i in zip(chunk_sizes, similarities_top_k, range(n_runs)):
    evaluate_response_time_and_accuracy(
        chunk_size, similarity_top_k, run_i, eval_questions=eval_questions)

response_model_name = Settings.llm.model
evaluation_model_name = llm_evaluate.model

print("============= STATS ============")
print(f"n_questions: {len(eval_questions)}")
print(f"response model: {response_model_name}")
print(f"evaluation model: {evaluation_model_name}")
print("============= MODELS ===========")

for i in range(n_runs):
    time_avg = mean(time_scores[i])
    faithfulness_avg = mean(faithfulness_scores[i])
    relevancy_avg = mean(relevancy_scores[i])
    print(
        f"(hybr-{response_model_name}-{chunk_sizes[i]}*{similarities_top_k[i]}) - avg res time: {time_avg:.2f}s, avg faithfulness: {faithfulness_avg:.2f}, avg relevancy: {relevancy_avg:.2f}")


print("============= QA ============")

for q_i, question in enumerate(eval_questions):
    print(f"---Q evaluated by {evaluation_model_name}: {question}")
    # answers[run][q_i] is the answer given in run `run` to question `q_i`
    for run in range(n_runs):
        print(
            f"(hybr-{response_model_name}-{chunk_sizes[run]}*{similarities_top_k[run]}): {answers[run][q_i]}")


Chunk size:  128
Splitting: 0
Splitting: 1
Splitting: 2
Splitting: 3
Splitting: 4
Splitting: 5
Splitting: 6
Splitting: 7
Splitting: 8
Splitting: 9
Splitting: 10
Splitting: 11
Splitting: 12
Splitting: 13
num nodes: 1330
Schema exists already True
--Q
 What is the importance of using a database in web applications? 
--




--A
 Using a database in web applications is crucial for maintaining data persistence across server restarts, ensuring data availability, and facilitating scalability as the application and user base expand. Databases provide an efficient way to store and retrieve data, offering structured data management capabilities. This allows for seamless data access and manipulation, supporting functionalities like data addition, listing, and removal. Moreover, databases play a key role in upholding data integrity, security, and consistency within web applications, ultimately enhancing user experience and application performance. 
--
t=6.58270788192749, f=True, r=True
-------
--Q
 What database management system will be used in this course? 
--
--A
 PostgreSQL 
--
t=4.220039367675781, f=True, r=True
-------
--Q
 What are the learning objectives related to working with databases? 
--
--A
 The learning objectives related to working with databases include understanding how to create and manage datab

# Evaluating responses without RAG

## Warning
Because of the way the questions are phrased, an LLM without access to the source documents may find them confusing or impossible to answer.
For example:
- "What database management system will be used in this course?"
- "Why does the document strongly recommend using the first option for development when starting to use PostgreSQL?"
 
These have to be cleaned manually before running the evaluation.
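
Manual cleaning can be supported by a simple pre-filter that flags questions likely to depend on the source material. The sketch below uses a small, assumed list of marker phrases; it is a heuristic that will miss some cases and over-flag others, so flagged questions still need a manual look. The names `CONTEXT_MARKERS`, `needs_context` and `flag_context_dependent` are hypothetical.

```python
import re

# Assumed, non-exhaustive phrases that tie a question to the source material.
CONTEXT_MARKERS = re.compile(
    r"\b(?:this|the) (?:document|course)\b"
    r"|\bprovided (?:document|code snippet)\b"
    r"|\bexample code\b",
    re.IGNORECASE,
)


def needs_context(question: str) -> bool:
    """Return True if the question refers to the source material itself."""
    return CONTEXT_MARKERS.search(question) is not None


def flag_context_dependent(questions: list[str]) -> list[str]:
    """Return the questions that should be reviewed before a no-RAG run."""
    return [q for q in questions if needs_context(q)]


candidates = [
    "What is the importance of using a database in web applications?",
    "What database management system will be used in this course?",
    "Why does the document strongly recommend using the first option for "
    "development when starting to use PostgreSQL?",
]
for q in flag_context_dependent(candidates):
    print("review:", q)
```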

In [69]:
# avg_response_time, avg_faithfulness, avg_relevancy = evaluate_response_time_and_accuracy_without_rag(eval_questions=eval_questions)
# print(f"Average Response time: {avg_response_time:.2f}s, Average Faithfulness: {avg_faithfulness:.2f}, Average Relevancy: {avg_relevancy:.2f}")


--Q
 What is the importance of using a database in web applications? 
--
--A
 Using a database in web applications is important for several reasons:

1. Data storage: Databases provide a structured way to store and organize data, making it easier to retrieve and manipulate information. This is essential for web applications that need to store user information, product details, and other data.

2. Data retrieval: Databases allow for efficient retrieval of data, enabling web applications to quickly access and display information to users. This helps improve the performance and responsiveness of the application.

3. Data consistency: Databases help maintain data consistency by enforcing rules and constraints on the data stored within them. This ensures that the data remains accurate and reliable, even as the application grows and evolves.

4. Data security: Databases provide security features such as user authentication, access control, and encryption to protect sensitive information from