# Midterm Challenge Notebook - Mike Dean

In [1]:
!pip install -qU langchain langchain_openai langchain_core==0.2.40 langchain_community
!pip install -qU qdrant_client pymupdf tiktoken ragas pandas

## Task 1.  Dealing with the Data
(Role: AI Solutions Engineer)

In [9]:
import defaults
llm = defaults.default_llm

In [10]:
# Load PDF documents from a directory
import loadReferenceDocuments
separate_pages, one_document = loadReferenceDocuments.loadReferenceDocuments("References/")

<built-in method count of list object at 0x336d94740>


## Chunking Strategies
#### Ingest the PDF by page - page_split
#### Ingest the PDF by page and recombine into single file and then chunk - chunk_split

In [11]:
import splitAndVectorize

page_split_vectorstore = splitAndVectorize.createVectorstore(
    separate_pages,
    "separate_page_collection",
)

page_split_retriever = page_split_vectorstore.as_retriever()

chunk_split_vectorstore = splitAndVectorize.createVectorstore(
    one_document,
    "chunk_split_collection",
    chunk_size=800,
    chunk_overlap=400,
)

chunk_split_retriever = chunk_split_vectorstore.as_retriever()


## Task 1 Deliverables:
1.  Describe the default chunking strategy that I will use:<br>
The default strategy will be to load the two PDF files using `PyMuPDFLoader` just as we have previously done.  This results in each PDF page being its own document.  I have checked sample pages with `tiktoken` and the token count per page is <1000, so these are small enough to just embed without further splitting. I saved these embeddings in `page_split_vectorstore`.

2.  Articulate a chunking strategy that I will also test:<br>
The disadvantage of the default strategy is that there is no chunk overlapping between the pages, and this might worsen the ability connect two pages that are both relevant to a query.  So I  recombine the page_content of all pages into a single string, convert it into a document, and split it with a chunk size of 800 and an overlap of 400 (the default settings used by OpenAI).  This strategy allows chunks to overlap, pehaps adding semantic continuity between adjacent pages.  These were embedded with the same embedding model and saved in `chunk_split_vectorstore`.

3.  Describe how and why I made these decisions:<br>
The default behavior of `PyMuPDFLoader` is not bad and I have been using it for several months.  However, I was splitting each of the documents created, not thinking through that if I had chunk sizes greater than the page itself, this was meaningless.  I also had chunk overlap, but had not thought through the implications of each page being a separate document.  So I made these decision for this Midterm Challenge so I can later compare the performance using RAGAS in Task 5.

## Task 2.  Building a Quick End-to-End Prototype
(Role: AI Systems Engineer)

In [12]:
import prompts
from langchain.prompts import ChatPromptTemplate
from operator import itemgetter
from langchain.schema.output_parser import StrOutputParser

rag_prompt = ChatPromptTemplate.from_template(prompts.rag_prompt_template)

page_split_rag_chain = (
    {"context": itemgetter("question") | page_split_retriever, "question": itemgetter("question")}
    | rag_prompt | llm | StrOutputParser()
)

chunk_split_rag_chain = (
    {"context": itemgetter("question") | chunk_split_retriever, "question": itemgetter("question")}
    | rag_prompt | llm | StrOutputParser()
)

In [13]:
from IPython.display import Markdown, display

## TEST THE TWO CHAINS
page_response = (page_split_rag_chain.invoke({"question": "List the ten major risks of AI?"}))
display(Markdown(page_response))

chunk_response = (chunk_split_rag_chain.invoke({"question": "What are some risks of AI?"}))
display(Markdown(chunk_response))

Based on the provided context, the major risks unique to or exacerbated by Generative AI (GAI) include:

1. Confabulation
2. Dangerous or Violent Recommendations
3. Data Privacy
4. Value Chain and Component Integration
5. Harmful Bias
6. Homogenization
7. CBRN Information or Capabilities (Chemical, Biological, Radiological, and Nuclear)
8. Human-AI Configuration
9. Obscene, Degrading, and/or Abusive Content
10. Information Integrity

Some risks of AI include:

1. **Confabulation:** The production of confidently stated but erroneous or false content that can mislead or deceive users.
2. **Dangerous, Violent, or Hateful Content:** Eased production and access to violent, inciting, radicalizing, or threatening content, including recommendations to carry out self-harm or illegal activities.
3. **Data Privacy:** Leakage and unauthorized use, disclosure, or de-anonymization of personally identifiable information or sensitive data.
4. **Environmental Impacts:** High compute resource utilization in training or operating AI models that may adversely impact ecosystems.
5. **Harmful Bias or Homogenization:** Amplification and exacerbation of historical, societal, and systemic biases; performance disparities between sub-groups or languages due to non-representative training data.
6. **Malicious Use:** Eased access to or synthesis of nefarious information or design capabilities related to dangerous materials or agents, such as chemical, biological, radiological, or nuclear weapons.
7. **Human-AI Configuration:** Risks arising from the interaction between humans and AI systems, including abuse, misuse, and unsafe repurposing by humans.

## Task 2 Deliverables:
1.  Build a live public prototype on Hugging Face, and include the public URL link to my space.<br>

Here is a one minute Loom video that demonstrates the prototype running in Hugging Face.
https://www.loom.com/share/70b741d3e4e14af792572b3aa9106463?sid=5aeb2e51-0f75-4c74-9f8c-6bc6d14aaa52

I used the page split retriever for this prototype, meaning that the documents were broken by page, not recombined, and were chunked as whole pages without overlap.

2.  How did I choose my stack, and why did I select each tool the way it Did? <br>

My stack consists of the following:
- Hardware is Apple Mac Studio
- Editor is VSC as recommended, and I have grown to like it very much  because it includes everything.
- Qdrant is the vector store.  I have used FAISS and Chroma, but Qdrant is fantastic.  I run it locally as a server, though for this Hugging Face situation, I am using a memory-based implementation.
- ChainLit is the interface, as recommended.  Notably, the current version of ChainLit does NOT work on Hugging Face, and the version needs to be locked (chainlit==0.7.700).
- RAGAS is part of my stack for purposes of later evaluation.
- LangChain is used to help simplify the code.

I learned an important lesson about software engineering - notebooks are NOT good ways to organize!  They are great for teaching.  So I reverted to my last 30 years of software development, and started refactoring code OUT OF THE NOTEBOOK, and then calling it inside the notebook.  So you (or others) can use the notebook as a navigation tool, but it doesn't become so unwieldly that you can't figure anything out.



## Task 3.  Creating a Golden Test Data Set
(Role: AI Evaluation and Performance Engineer)

In [14]:
import splitAndVectorize
# create a new splitting without embedding

eval_documents = splitAndVectorize.split_into_chunks(
    one_document,
    chunk_size=400,
    chunk_overlap=100,
)
len(eval_documents)


318

## The following code was used to create tests but I have commented out to avoid repetition.
## I created 50 pairs instead of 20.

In [15]:
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
import defaults

## llm and embedding models were set earlier
# generator_llm = llm
# critic_llm = llm
# embeddings = defaults.default_embedding_model

# generator = TestsetGenerator.from_langchain(
#     generator_llm,
#     critic_llm,
#     embeddings
# )

# distributions = {
#     simple: 0.5,
#     multi_context: 0.4,
#     reasoning: 0.1
# }

# num_qa_pairs = 50 # I increased this from 20 because I have a surplus of OpenAI credits

## I ALREADY RAN THIS SO HAVE COMMENTED IT OUT HERE SO I DON'T DO IT AGAIN
## I WILL READ IN THE CSV FILE TO CONTINUE
# testset = generator.generate_with_langchain_docs(eval_documents, num_qa_pairs, distributions)


### Get the test data from the stored CSV file

In [16]:
# READ IN THE TESTSET FROM THE CSV FILE
import pandas as pd
test_df = pd.read_csv("testset.csv")
test_questions = test_df["question"].values.tolist()
test_groundtruths = test_df["ground_truth"].values.tolist()

In [17]:
from langchain_core.runnables import RunnablePassthrough

retrieval_augmented_qa_chain_chunk = (
    {"context": itemgetter("question") | chunk_split_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | llm, "context": itemgetter("context")}
)
retrieval_augmented_qa_chain_paged = (
    {"context": itemgetter("question") | page_split_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | llm, "context": itemgetter("context")}
)


In [18]:
import evaluateRAGAS
results = evaluateRAGAS.evaluateRAGAS(retrieval_augmented_qa_chain_chunk,
                                                       test_questions, test_groundtruths)

Evaluating:   0%|          | 0/200 [00:00<?, ?it/s]

No statements were generated from the answer.


In [19]:

print(results)

{'faithfulness': 0.8993, 'answer_relevancy': 0.9060, 'context_recall': 0.9150, 'context_precision': 0.8983}


In [20]:
import evaluateRAGAS
paged_results = evaluateRAGAS.evaluateRAGAS(retrieval_augmented_qa_chain_paged,
                                                       test_questions, test_groundtruths)

Evaluating:   0%|          | 0/200 [00:00<?, ?it/s]

No statements were generated from the answer.


In [21]:
print(paged_results)

{'faithfulness': 0.8705, 'answer_relevancy': 0.9228, 'context_recall': 0.8600, 'context_precision': 0.8933}


In [22]:
df_chunked = pd.DataFrame(list(results.items()), columns=['Metric', 'total_chunked'])
df_paged = pd.DataFrame(list(paged_results.items()), columns=['Metric', 'separate_pages'])
df_merged = pd.merge(df_paged, df_chunked, on='Metric')
df_merged

Unnamed: 0,Metric,separate_pages,total_chunked
0,faithfulness,0.870491,0.899317
1,answer_relevancy,0.922842,0.905958
2,context_recall,0.86,0.915
3,context_precision,0.893333,0.898333


## Task 3 Deliverables:
1.  Assess my pipeline using the RAGAS framework including key metrics faithfulness, answer relevancy, context precision, and context recall.  Provide a table of my output results.<br>

I did the evaluation of my prototype model which was based on the PDF documents being divided by pages, but while I was here, I compared this with the other strategy, which was to recombine all the text and then split by chunks with overlap.

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Metric</th>
      <th>separate_pages</th>
      <th>total_chunked</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>faithfulness</td>
      <td>0.870491</td>
      <td>0.899317</td>
    </tr>
    <tr>
      <th>1</th>
      <td>answer_relevancy</td>
      <td>0.922842</td>
      <td>0.905958</td>
    </tr>
    <tr>
      <th>2</th>
      <td>context_recall</td>
      <td>0.860000</td>
      <td>0.915000</td>
    </tr>
    <tr>
      <th>3</th>
      <td>context_precision</td>
      <td>0.893333</td>
      <td>0.898333</td>
    </tr>
  </tbody>
</table>
</div>

When the PDF document is separated by page, and then those pages are INDIVIDUALLY chunked or embedded, the context recall is somewhat less, but other parameters are not really striking.  Both strategies need to be assessed later when we use the finetuned embedding model.

2.  What conclusions can I draw about performance and effectiveness of my pipeline with this information? <br>

- Faithfulness: Measures whether all claims or statements in the answer can be completely inferred from the context that was provided.  The value is the percentage of claims that can be inferred over the total number of claims.  
- Answer relevancy: Measures whether the answer is relevant to the question. It does not matter if the answer is actually correct - only that it directly answers the question without redundancy. 
- Context recall: This measures whether the facts that are in the ground truth reference answer can be inferred from the context that was provided to the LLM.  In a perfect situation, every statement in the ground truth should be able to be linked to the context.  
- Context precision: This measures whether all the elements in the ground truth are in the highest ranked parts of the context.  

**OVERALL CONCLUSION:**<br> 
Not really any significant differences here, though I suspect that letting the pages be kept as separate documents is going to be inferior in the long run.



## Task 4.  Fine-Tuning Open-Source Embeddings
(Role: Machine Learning Engineer)

I performed the fine tuning in a separate notebook (FineTunePartTwo.ipynb) that you can find in this location:
https://github.com/mdean77a/AIE4/blob/main/Midterm/FineTunePartTwo.ipynb

I did this separately because I anticipated needing to use Colab.  To my utter surprise, the training actually worked on my Mac Studio (M1 32 gb, 323 sec) and I could reproduce it on M3 laptop (128 gb, 106 sec).  So I didn't end up having to wrestle with Colab.


## Task 4 Deliverables:
1.  Swap out my existing embedding model for the new fine tuned version.  Provide a link to m fine-tuned embedding model on the Hugging Face Hub.<br>

https://huggingface.co/Mdean77/finetuned_arctic

2.  How did I choose the embedding model for this application?<br>

I selected Snowflake/snowflake-arctic-embed-m because it improved dramatically in our previous exercise with it.

## Task 5.  Assessing Performance
(Role: AI Evaluation and Performance Engineer)

In [23]:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Mdean77/finetuned_arctic")

In [24]:
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
finetune_embeddings = HuggingFaceEmbeddings(model_name="Mdean77/finetuned_arctic")

In [28]:
import splitAndVectorize

# page_split_vectorstore was created earlier, and uses te3 embedder.
page_split_vectorstore = splitAndVectorize.createVectorstore(
    separate_pages,
    "separate_page_collection",
)

arctic_page_split_vectorstore = splitAndVectorize.createVectorstore(
    separate_pages,
    "arctic_separate_page_collection",
    embedding_model=finetune_embeddings,
)

#need to RESPLIT the chunk split for testing 
chunk_split_vectorstore = splitAndVectorize.createVectorstore(
    one_document,
    "te3_chunk_split_collection",
    chunk_size=250,
    chunk_overlap=50,
    embedding_model=finetune_embeddings,
)

# now make chunk split with arctic
arctic_chunk_split_vectorstore = splitAndVectorize.createVectorstore(
    one_document,
    "arctic_chunk_split_collection",
    chunk_size=250,
    chunk_overlap=50,
)


In [29]:
## HERE ARE OUR RETRIEVERS

te3_page_split_retriever = page_split_vectorstore.as_retriever()
arctic_page_split_retriever = arctic_page_split_vectorstore.as_retriever()
te3_chunk_split_retriever = chunk_split_vectorstore.as_retriever()
arctic_chunk_split_retriever = arctic_chunk_split_vectorstore.as_retriever()

## HERE ARE OUR RETRIEVAL CHAINS

te3_chunk_split_chain = (
    {"context": itemgetter("question") | te3_chunk_split_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | llm, "context": itemgetter("context")}
)
te3_page_split_chain = (
    {"context": itemgetter("question") | te3_page_split_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | llm, "context": itemgetter("context")}
)
arctic_chunk_split_chain = (
    {"context": itemgetter("question") | arctic_chunk_split_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | llm, "context": itemgetter("context")}
)
arctic_page_split_chain = (
    {"context": itemgetter("question") | arctic_page_split_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | llm, "context": itemgetter("context")}
)

In [31]:
### NOW RUN OUR EVALUATIONS!
te3_chunk_results = evaluateRAGAS.evaluateRAGAS(te3_chunk_split_chain, test_questions, test_groundtruths)
te3_page_results = evaluateRAGAS.evaluateRAGAS(te3_page_split_chain, test_questions, test_groundtruths)
arctic_chunk_results = evaluateRAGAS.evaluateRAGAS(arctic_chunk_split_chain, test_questions, test_groundtruths)
arctic_page_results = evaluateRAGAS.evaluateRAGAS(arctic_page_split_chain, test_questions, test_groundtruths)

Evaluating:   0%|          | 0/200 [00:00<?, ?it/s]

No statements were generated from the answer.


Evaluating:   0%|          | 0/200 [00:00<?, ?it/s]

No statements were generated from the answer.


Evaluating:   0%|          | 0/200 [00:00<?, ?it/s]

No statements were generated from the answer.


Evaluating:   0%|          | 0/200 [00:00<?, ?it/s]

No statements were generated from the answer.


In [33]:
# Make a table of results!
df_te3_paged = pd.DataFrame(list(te3_page_results.items()), columns=['Metric', "TE3 paged"])
df_te3_chunked = pd.DataFrame(list(te3_chunk_results.items()), columns=['Metric', 'TE3 chunked'])
df_arctic_paged = pd.DataFrame(list(arctic_page_results.items()), columns=['Metric', 'ARCTIC paged'])
df_arctic_chunked = pd.DataFrame(list(arctic_chunk_results.items()), columns=['Metric', 'ARCTIC chunked'])
df_merged = df_te3_paged.merge(df_te3_chunked, on='Metric').merge(df_arctic_paged, on='Metric').merge(df_arctic_chunked, on='Metric')

df_merged

Unnamed: 0,Metric,TE3 paged,TE3 chunked,ARCTIC paged,ARCTIC chunked
0,faithfulness,0.885625,0.870754,0.900033,0.900109
1,answer_relevancy,0.961167,0.909941,0.90993,0.947806
2,context_recall,0.85,0.825,0.87,0.831667
3,context_precision,0.898889,0.874444,0.875556,0.862222


## Task 5 Deliverables:
1.  Test the fine-tuned embedding model using the RAGAS frameworks to quantify any improvements.  Provide results in a table.<br>
2.  Test the two chunking strategies using the RAGAS frameworks to quantify any improvements.  Provide results in a table.<br>
The tables are combined and shown here:
<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Metric</th>
      <th>TE3 paged</th>
      <th>TE3 chunked</th>
      <th>ARCTIC paged</th>
      <th>ARCTIC chunked</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>faithfulness</td>
      <td>0.885625</td>
      <td>0.870754</td>
      <td>0.900033</td>
      <td>0.900109</td>
    </tr>
    <tr>
      <th>1</th>
      <td>answer_relevancy</td>
      <td>0.961167</td>
      <td>0.909941</td>
      <td>0.909930</td>
      <td>0.947806</td>
    </tr>
    <tr>
      <th>2</th>
      <td>context_recall</td>
      <td>0.850000</td>
      <td>0.825000</td>
      <td>0.870000</td>
      <td>0.831667</td>
    </tr>
    <tr>
      <th>3</th>
      <td>context_precision</td>
      <td>0.898889</td>
      <td>0.874444</td>
      <td>0.875556</td>
      <td>0.862222</td>
    </tr>
  </tbody>
</table>
</div>




Previous results earlier in notebook with different chunking (using the TE3):
<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Metric</th>
      <th>separate_pages</th>
      <th>total_chunked</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>faithfulness</td>
      <td>0.870491</td>
      <td>0.899317</td>
    </tr>
    <tr>
      <th>1</th>
      <td>answer_relevancy</td>
      <td>0.922842</td>
      <td>0.905958</td>
    </tr>
    <tr>
      <th>2</th>
      <td>context_recall</td>
      <td>0.860000</td>
      <td>0.915000</td>
    </tr>
    <tr>
      <th>3</th>
      <td>context_precision</td>
      <td>0.893333</td>
      <td>0.898333</td>
    </tr>
  </tbody>
</table>
</div>

3.  The AI Solutions Engineer asks me "Which one is the best to test with internal stakeholders next week, and why?<br>

I have included the results from the four way comparison, but also brought down the chunking comparison that I did earlier for Task 3.  It is striking that there is variability.  Of these four metrics, the faithfulness metric is probably the most important, and the finetuned Snowflake outperforms TE3.  My gestalt from looking at these tables, and also common sense, suggests that combining the PDF pages and THEN chunking them is sensible.  I may have compromised performance here compared with the separate pages because I made the chunks small - the separate pages were embedded together and have more content.  

If you compare the TE3 chunked with the ARCTIC chunked, the ARCITC model is clearly better.  So my recommendation is that we use the finetuned model, but that we experiment with chunk sizes.

In [36]:
## 800 with 400 overlap
arctic_800_400_chunks_vectorstore = splitAndVectorize.createVectorstore(
    one_document,
    "arctic_800_400_chunk_collection",
    chunk_size=800,
    chunk_overlap=400,
)
arctic_800_400_chunk_retriever = arctic_800_400_chunks_vectorstore.as_retriever()

arctic_800_400_chunk_chain = (
    {"context": itemgetter("question") | arctic_800_400_chunk_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | llm, "context": itemgetter("context")}
)
arctic_800_400_chunk_results = evaluateRAGAS.evaluateRAGAS(arctic_800_400_chunk_chain, test_questions, test_groundtruths)

## 1200 with 400 overlap
arctic_1200_400_chunks_vectorstore = splitAndVectorize.createVectorstore(
    one_document,
    "arctic_1200_400_chunk_collection",
    chunk_size=1200,
    chunk_overlap=400,
)
arctic_1200_400_chunk_retriever = arctic_1200_400_chunks_vectorstore.as_retriever()

arctic_1200_400_chunk_chain = (
    {"context": itemgetter("question") | arctic_1200_400_chunk_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | llm, "context": itemgetter("context")}
)
arctic_1200_400_chunk_results = evaluateRAGAS.evaluateRAGAS(arctic_1200_400_chunk_chain, test_questions, test_groundtruths)

# 600 with 100 overlap
arctic_600_100_chunks_vectorstore = splitAndVectorize.createVectorstore(
    one_document,
    "arctic_600_100_chunk_collection",
    chunk_size=600,
    chunk_overlap=100,
)
arctic_600_100_chunk_retriever = arctic_600_100_chunks_vectorstore.as_retriever()

arctic_600_100_chunk_chain = (
    {"context": itemgetter("question") | arctic_600_100_chunk_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | llm, "context": itemgetter("context")}
)
arctic_600_100_chunk_results = evaluateRAGAS.evaluateRAGAS(arctic_600_100_chunk_chain, test_questions, test_groundtruths)

# Make a table of results!
df_600_100 = pd.DataFrame(list(arctic_600_100_chunk_results.items()), columns=['Metric', "600/100"])
df_800_400 = pd.DataFrame(list(arctic_800_400_chunk_results.items()), columns=['Metric', '800/400'])
df_1200_400 = pd.DataFrame(list(arctic_1200_400_chunk_results.items()), columns=['Metric', '1200/400'])

df_merged = df_600_100.merge(df_800_400, on='Metric').merge(df_1200_400, on='Metric')

df_merged

Evaluating:   0%|          | 0/200 [00:00<?, ?it/s]

No statements were generated from the answer.


Evaluating:   0%|          | 0/200 [00:00<?, ?it/s]

No statements were generated from the answer.


Evaluating:   0%|          | 0/200 [00:00<?, ?it/s]

No statements were generated from the answer.


Unnamed: 0,Metric,600/100,800/400,1200/400
0,faithfulness,0.892048,0.931814,0.883423
1,answer_relevancy,0.935133,0.931497,0.951927
2,context_recall,0.915,0.915,0.92
3,context_precision,0.882778,0.908333,0.901111


Unnamed: 0,Metric,Larger Chunks
0,faithfulness,0.872457
1,answer_relevancy,0.908182
2,context_recall,0.845
3,context_precision,0.863333


## Task 6.  Managing Your Boss and User Expectations
(Role: SVP of Technology)

## Task 6 Deliverables:
1.  What is the story that I will give to the CEO to tell the whole company at the launch next month?<br>

2.  There appears to be important information not included in our build.  How might we incorporate relevant white-house briefing information in future versions? <br>
