<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Knowledge-base-chat-with-automatic-data-filtering-use-function-calling" data-toc-modified-id="Knowledge-base-chat-with-automatic-data-filtering-use-function-calling-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Knowledge base chat with automatic data filtering use function calling</a></span><ul class="toc-item"><li><span><a href="#Key-Features:" data-toc-modified-id="Key-Features:-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Key Features:</a></span></li></ul></li><li><span><a href="#Prep-Data" data-toc-modified-id="Prep-Data-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Prep Data</a></span></li><li><span><a href="#Imports" data-toc-modified-id="Imports-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Imports</a></span></li><li><span><a href="#Data-preprocess-utilities" data-toc-modified-id="Data-preprocess-utilities-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Data preprocess utilities</a></span></li><li><span><a href="#Load-and-split-data" data-toc-modified-id="Load-and-split-data-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Load and split data</a></span><ul class="toc-item"><li><span><a href="#Split-Dataframe-with-Metadata" data-toc-modified-id="Split-Dataframe-with-Metadata-5.1"><span class="toc-item-num">5.1&nbsp;&nbsp;</span>Split Dataframe with Metadata</a></span></li><li><span><a href="#Embed-and-save-to-local-dir" data-toc-modified-id="Embed-and-save-to-local-dir-5.2"><span class="toc-item-num">5.2&nbsp;&nbsp;</span>Embed and save to local dir</a></span><ul class="toc-item"><li><span><a href="#Quick-cost-estimation" data-toc-modified-id="Quick-cost-estimation-5.2.1"><span class="toc-item-num">5.2.1&nbsp;&nbsp;</span>Quick cost estimation</a></span></li><li><span><a href="#Embed-and-save" data-toc-modified-id="Embed-and-save-5.2.2"><span class="toc-item-num">5.2.2&nbsp;&nbsp;</span>Embed and save</a></span></li></ul></li></ul></li><li><span><a 
href="#Retrieval-Augmented-Generation-with-data-filter-(via-function-calling)" data-toc-modified-id="Retrieval-Augmented-Generation-with-data-filter-(via-function-calling)-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Retrieval Augmented Generation with data filter (via function calling)</a></span><ul class="toc-item"><li><span><a href="#Helper-function-for-semantic-search-&amp;-filtering" data-toc-modified-id="Helper-function-for-semantic-search-&amp;-filtering-6.1"><span class="toc-item-num">6.1&nbsp;&nbsp;</span>Helper function for semantic search &amp; filtering</a></span></li><li><span><a href="#Function-calling-ultilities-and-schema" data-toc-modified-id="Function-calling-ultilities-and-schema-6.2"><span class="toc-item-num">6.2&nbsp;&nbsp;</span>Function calling ultilities and schema</a></span></li><li><span><a href="#Chat-with-automatically-filtered,-more-relevant-data." data-toc-modified-id="Chat-with-automatically-filtered,-more-relevant-data.-6.3"><span class="toc-item-num">6.3&nbsp;&nbsp;</span>Chat with automatically filtered, more relevant data.</a></span><ul class="toc-item"><li><span><a href="#Chat-with-dataset" data-toc-modified-id="Chat-with-dataset-6.3.1"><span class="toc-item-num">6.3.1&nbsp;&nbsp;</span>Chat with dataset</a></span></li><li><span><a href="#Check-the-context-dataframe-this-answer-is-based-on" data-toc-modified-id="Check-the-context-dataframe-this-answer-is-based-on-6.3.2"><span class="toc-item-num">6.3.2&nbsp;&nbsp;</span>Check the context dataframe this answer is based on</a></span></li></ul></li><li><span><a href="#Another-Example" data-toc-modified-id="Another-Example-6.4"><span class="toc-item-num">6.4&nbsp;&nbsp;</span>Another Example</a></span></li></ul></li></ul></div>

# Knowledge base chat with automatic data filtering use function calling
---

This notebook illustrates a workflow tailored for knowledge-based chat interactions that utilize automatic data filtering through function calls. 

It's common to target specific data subsets; for example, in an Earnings Call Transcript dataset (with multiple companies and many calls across a whole year), one might only be interested in Nvidia's Q3 performance. However, a simple semantic search, even when enriched with date augmentation, often isn't precise enough to isolate that subset. This notebook provides a method for precise question-answering using contextually filtered data.


Here, I'm using a slice of the Earnings Call Transcript files from 2020, but the idea can easily be applied to other datasets.

## Key Features:
1. **Text Splitting and Embedding**: Breaks down large transcripts into manageable chunks for better semantic search, optionally adding contextual metadata to each chunk.
    - see **split_df_with_metadata** for more details
    

2. **Question Answering With Filtered Data**: Retrieval Augmented Generation with pre-filtered, more relevant data. Function calling turns the user's question into an actionable query that filters the data first; retrieval and question answering then run on the filtered subset.

    - For instance, asking a question like "whats Microsoft says about Azure in Q3 that year?" triggers a call like `{'query': 'Azure in Q3', 'company_names': ['Microsoft Corporation (MSFT)'], 'start_date': '2020-07-01', 'end_date': '2020-09-30'}`, which first filters the dataset down to the more relevant records before the RAG and chat process starts.
    - see **get_answer_with_filtered_df** and **chat_completion_with_function_execution** for more details
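
The date range in the example call above follows calendar quarters (Q3 is July 1 to September 30). As a rough illustration of that mapping, here is a minimal sketch; `quarter_date_range` is a hypothetical helper that is not part of this notebook, and it assumes calendar quarters, which may differ from a company's fiscal quarters.

```python
import calendar
from datetime import date
from typing import Tuple


def quarter_date_range(year: int, quarter: int) -> Tuple[str, str]:
    """Return (start_date, end_date) as 'YYYY-MM-DD' strings for a calendar quarter.

    Hypothetical helper for illustration only; assumes calendar quarters.
    """
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3 or 4")
    start_month = 3 * (quarter - 1) + 1  # 1, 4, 7 or 10
    end_month = start_month + 2  # 3, 6, 9 or 12
    last_day = calendar.monthrange(year, end_month)[1]  # number of days in end_month
    start = date(year, start_month, 1)
    end = date(year, end_month, last_day)
    return start.isoformat(), end.isoformat()


print(quarter_date_range(2020, 3))  # ('2020-07-01', '2020-09-30')
```

In the notebook itself this mapping is left to the model, which reads the quarter definitions from the function description.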


# Prep Data

# Imports

In [None]:
!pip install requests
!pip install scipy
!pip install tenacity
!pip install langchain  
!pip install tiktoken
!pip install pandas
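!pip install "openai<1.0"  # this notebook uses the pre-1.0 SDK (openai.Embedding, openai.ChatCompletion)
!pip install pyarrow  # required by pandas for the feather format used below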

In [164]:
import requests
from typing import List, Optional
from scipy import spatial
from tenacity import retry, wait_random_exponential, stop_after_attempt
import json
from IPython.display import display, Markdown, Latex
from langchain.text_splitter import RecursiveCharacterTextSplitter
import tiktoken
import pandas as pd
import openai

GPT_MODEL = "gpt-4-0613"
EMBEDDING_MODEL = "text-embedding-ada-002"
LOCAL_FILENAME = "ECT_selected.feather"

enc = tiktoken.encoding_for_model("gpt-4")
# openai.api_key = "YOUR_API_KEY"

# Data preprocess utilities
These utilities break down large transcripts into manageable chunks for better semantic search, optionally adding contextual metadata to each chunk.
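
To make the idea of overlapping chunks concrete, here is a simplified sketch. It is not the splitter used in this notebook: it counts whitespace-separated words instead of tiktoken tokens and does not split recursively by separators. `chunk_words` and its parameters are hypothetical names chosen for illustration.

```python
def chunk_words(text, chunk_size=8, overlap=2):
    """Split text into chunks of at most chunk_size words, where consecutive
    chunks share `overlap` words. Simplified stand-in for a token-based splitter."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(20))  # made-up 20-word text
for chunk in chunk_words(sample):
    print(chunk)
```

The overlap keeps a little shared text between neighbouring chunks, so a sentence cut at a chunk boundary is still partly visible in the next chunk.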

In [133]:
def get_tiktoken_len(text):
    """Calculate the token length for a given text using tiktoken."""
    return len(enc.encode(text))


def split_text_using_threshold(row, target_col_name, splitter):
    """
    Split a text using the recursive splitter based on token count.
    """
    if row["token_count"] > splitter._chunk_size:
        return splitter.split_text(row[target_col_name])
    return [row[target_col_name]]


def add_context_to_text(row, target_col_name, context_col_names):
    """
    Add context to a chunk of text if it's not the first chunk.
    Constructs a dictionary-like context string using multiple columns.
    """
    if row["chunk_id"] == 0:
        return row[target_col_name]

    context_strings = []
    for col in context_col_names:
        # Keep only the first 300 characters for brevity and to stay within the token limit
        value = str(row[col])[:300]
        context_strings.append(f"{col}: {value}")

    context = "; ".join(context_strings)
    return f"context:{context};\n{row[target_col_name]}"


def split_df_with_metadata(
    df,
    target_col_name,
    max_token_threshold=2500,
    add_context_for_chunks=False,
    context_col_names=[],
    separators=["\n\n", "\n", " ", ""],
):
    """
    Recursively split the texts in a DataFrame column by separators while preserving metadata.
    Optionally prepend additional context built from the metadata columns.
    """
    # Initialize token counts if not already present
    if "token_count" not in df.columns:
        df["token_count"] = df[target_col_name].apply(get_tiktoken_len)

    # Initialize the recursive splitter
    splitter = RecursiveCharacterTextSplitter(
        separators=separators,
        chunk_size=max_token_threshold,
        chunk_overlap=100,
        length_function=get_tiktoken_len,
    )

    # Use helper function for splitting
    df_expanded = df.copy()
    df_expanded["chunks"] = df_expanded.apply(
        lambda row: split_text_using_threshold(row, target_col_name, splitter), axis=1
    )

    # Expand the dataframe based on the split chunks
    df_expanded = df_expanded.explode("chunks")
    df_expanded[target_col_name] = df_expanded["chunks"]
    df_expanded["chunk_id"] = df_expanded.groupby(level=0).cumcount()
    df_expanded.drop(columns=["chunks"], inplace=True)

    # Add context if required using helper function
    if add_context_for_chunks:
        df_expanded[target_col_name] = df_expanded.apply(
            lambda row: add_context_to_text(row, target_col_name, context_col_names),
            axis=1,
        )

    # Update token count post processing
    df_expanded["token_count"] = df_expanded[target_col_name].apply(get_tiktoken_len)

    # Remove empty rows
    df_expanded = df_expanded[df_expanded[target_col_name] != ""]
    df_expanded.dropna(subset=[target_col_name], inplace=True)

    return df_expanded


# Embedding helper functions
async def aembedding_query(text, model=EMBEDDING_MODEL):
    response = await openai.Embedding.acreate(input=text, model=model)
    return response["data"][0]["embedding"]


def embedding_query(text, model=EMBEDDING_MODEL):
    response = openai.Embedding.create(input=text, model=model)
    return response["data"][0]["embedding"]


async def embedding_docs(texts, model=EMBEDDING_MODEL):
    responses = await openai.Embedding.acreate(input=texts, model=model)
    embedding_results = [r["embedding"] for r in responses["data"]]
    return embedding_results

# Load and split data

For a quick demo, use a slice of the 2020 ECT data (Nvidia and Microsoft only).

Raw Data from Kaggle: https://www.kaggle.com/datasets/notis23/earnings-call-us-2020-sentiment-analysis-covid19


In [134]:
ect_data_url = "https://drive.google.com/uc?export=download&id=1vgpNe8ekkCqVqdVdXpoU4QsAvCuKfUp6"
data_df = pd.read_csv(ect_data_url)

In [135]:
data_df.head(2)

| | file_name | article | company | call_date |
|---|---|---|---|---|
| 0 | NVIDIA Corporation (NVDA) CEO Jensen Huang on ... | NVIDIA Corporation (NASDAQ:NVDA) Q4 2020 Earni... | NVIDIA Corporation (NVDA) | 2020-02-13 21:46:00 |
| 1 | Microsoft Corporation (MSFT) CEO Satya Nadella... | Microsoft Corporation (NASDAQ:MSFT) Q2 2020 Ea... | Microsoft Corporation (MSFT) | 2020-01-29 21:38:00 |


## Split Dataframe with Metadata
The text is divided into smaller chunks, each containing up to 800 tokens. 

Optional context/metadata can be added to each chunk for clarity of the larger document.
(This ensures that every chunk retains a sense of its broader context. For instance, even a segment of an article would be aware of its title and publication date.)
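
As an illustration of what the added context looks like, here is a minimal standalone sketch that mirrors the format produced by `add_context_to_text`, using plain dictionaries instead of DataFrame rows. The helper name `add_context` and the chunk texts are made up for this example.

```python
def add_context(chunk_text, chunk_id, metadata, context_cols, max_chars=300):
    """Prefix every chunk except the first with a short metadata summary."""
    if chunk_id == 0:
        return chunk_text  # the first chunk already starts with the document header
    parts = [f"{col}: {str(metadata[col])[:max_chars]}" for col in context_cols]
    context = "; ".join(parts)
    return f"context:{context};\n{chunk_text}"


metadata = {
    "company": "NVIDIA Corporation (NVDA)",
    "call_date": "2020-02-13 21:46:00",
}
chunks = ["Opening remarks ...", "Data center revenue ..."]  # made-up chunk texts
for chunk_id, chunk in enumerate(chunks):
    print(repr(add_context(chunk, chunk_id, metadata, ["company", "call_date"])))
```

Only the later chunks get the prefix, so the first chunk keeps its original text and every other chunk carries the company name and call date along with it.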


In [136]:
new_df = split_df_with_metadata(
    data_df,
    target_col_name="article",
    max_token_threshold=800, # Use 800 token as chunk_size
    add_context_for_chunks=True,
    context_col_names=["company", "call_date"],
    separators=["\n\n", "\n", " ", ""],
)

In [137]:
new_df.head(2)

| | file_name | article | company | call_date | token_count | chunk_id |
|---|---|---|---|---|---|---|
| 0 | NVIDIA Corporation (NVDA) CEO Jensen Huang on ... | NVIDIA Corporation (NASDAQ:NVDA) Q4 2020 Earni... | NVIDIA Corporation (NVDA) | 2020-02-13 21:46:00 | 774 | 0 |
| 0 | NVIDIA Corporation (NVDA) CEO Jensen Huang on ... | context:company: NVIDIA Corporation (NVDA); ca... | NVIDIA Corporation (NVDA) | 2020-02-13 21:46:00 | 797 | 1 |


In [138]:
data_df.shape, new_df.shape

((30, 5), (244, 6))

## Embed and save to local dir

### Quick cost estimation
Token price as of October 2023: $0.0001 per 1K tokens for `text-embedding-ada-002`.


In [139]:
COST_PER_TOKEN_IN_USD = 0.0000001   # $0.0001 / 1K
embedding_cost = round(new_df.token_count.sum() * COST_PER_TOKEN_IN_USD, 2)
print(f"Total embedding cost: {embedding_cost} USD")

Total embedding cost: 0.02 USD
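
As a sanity check on this figure, the arithmetic can be reproduced from the chunk count alone. The sketch below assumes roughly 800 tokens per chunk for all 244 chunks, which is an approximation: some chunks are shorter, and chunks with added context can run slightly over. `embedding_cost_usd` is a hypothetical helper name.

```python
COST_PER_1K_TOKENS_USD = 0.0001  # price used in this notebook (October 2023)


def embedding_cost_usd(total_tokens, cost_per_1k=COST_PER_1K_TOKENS_USD):
    """Cost in USD of embedding total_tokens tokens at a per-1K-token price."""
    return total_tokens / 1000 * cost_per_1k


n_chunks = 244  # rows in new_df
approx_tokens_per_chunk = 800  # assumed: every chunk is near the 800-token limit
approx_total_tokens = n_chunks * approx_tokens_per_chunk  # 195,200 tokens

print(f"Approximate total tokens: {approx_total_tokens}")
print(f"Approximate cost: {round(embedding_cost_usd(approx_total_tokens), 2)} USD")
```

Under these assumptions the estimate rounds to the same 0.02 USD reported above.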


### Embed and save

In [140]:
# Use async call for quick embedding 
new_df['embedding'] = await embedding_docs(new_df['article'].tolist())

In [141]:
# Save to local dir for loading later. Use feather format for faster loading.
new_df.reset_index().to_feather(LOCAL_FILENAME)

# Retrieval Augmented Generation with data filter (via function calling)
Retrieval Augmented Generation with pre-filtered, more relevant data.

Function calling turns the user's question into an actionable query that filters the data; retrieval and question answering then run on the filtered subset.
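
The retrieval step ranks chunks by cosine distance to the question embedding and keeps only those under a distance threshold. The sketch below shows that idea on made-up three-dimensional vectors; real embeddings have many more dimensions, and `top_matches` is a hypothetical helper, not the notebook's implementation.

```python
import math


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_matches(query_vec, docs, top_k=3, max_distance=0.26):
    """Return up to top_k (distance, text) pairs within max_distance, nearest first."""
    scored = [(cosine_distance(query_vec, vec), text) for text, vec in docs]
    kept = sorted((dist, text) for dist, text in scored if dist <= max_distance)
    return kept[:top_k]


# Toy vectors standing in for embeddings (made up for illustration)
docs = [
    ("chunk about Azure", [0.9, 0.1, 0.0]),
    ("chunk about gaming GPUs", [0.1, 0.9, 0.1]),
    ("chunk about cloud revenue", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.2, 0.0]

for dist, text in top_matches(query, docs):
    print(f"{dist:.3f}  {text}")
```

The distance threshold drops the unrelated chunk entirely instead of returning it as a weak third match, which is why filtering and thresholding together tend to give a cleaner context.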

## Helper function for semantic search & filtering

In [142]:
QA_PROMPT_TEMPLATE = """You are a helpful AI assistant for answering questions about Earnings Call Transcripts.
You are given the following extracted parts of an earnings call and a question as reference. Provide a conversational answer.
If you don't know the answer, just say "I'm not sure." Don't try to make up an answer.
Answer in a professional manner, and note that not all the references are relevant; feel free to ignore parts of them.

Question: {question}
=========
Reference:
{context}
=========

Answer:"""


def get_relevant_context(
    question,
    df,
    max_context_len=3000,
    embed_model="text-embedding-ada-002",
    top_k=3,
    print_top_relevant_chunks=False,
    max_distance=0.26,
    column_name_map=None,
):
    """
    Create a context for a question by finding the most similar context from dataframe.
    """

    # Default column names
    if column_name_map is None:
        column_name_map = {
            "embedding": "embedding",
            "token_count": "token_count",
            "text": "article",
        }

    question_embedding = embedding_query(question, model=embed_model)

    # Work on a copy so the caller's (possibly filtered) dataframe is not modified
    df = df.copy()

    # Calculate distances from embeddings
    df["distances"] = distances_from_embeddings(
        question_embedding,
        df[column_name_map["embedding"]].values,
        distance_metric="cosine",
    )

    if print_top_relevant_chunks:
        print(df.sort_values("distances", ascending=True).head(3))

    context_list = []
    current_len = 0
    returned_df_idx = None

    # Sort df by distance
    sorted_df = (
        df[df.distances <= max_distance]
        .sort_values("distances", ascending=True)
        .reset_index(drop=True)
    )

    # Add context by top_k, break if reach max_len
    for i, row in sorted_df.iterrows():
        current_len += row[column_name_map["token_count"]]

        if current_len > max_context_len or i + 1 > top_k:
            break

        context_list.append(row[column_name_map["text"]])
        returned_df_idx = i

    # Check if no context was found
    if not context_list:
        # Use the unfiltered distances: sorted_df may be empty when nothing is within max_distance
        nearest = round(df["distances"].min(), 2)
        print(
            f"No context with sufficient relevancy found, please adjust max_distance if you want results \n(current threshold: {max_distance}, current nearest distance: {nearest})."
        )
        return None, None

    return "\n\n##Context:\n\n".join(context_list), sorted_df.head(returned_df_idx + 1)


def distances_from_embeddings(
    query_embedding: List[float],
    embeddings: List[List[float]],
    distance_metric="cosine",
) -> List[float]:
    """Return the distances between a query embedding and a list of embeddings."""
    distance_metrics = {
        "cosine": spatial.distance.cosine,
        "L1": spatial.distance.cityblock,
        "L2": spatial.distance.euclidean,
        "Linf": spatial.distance.chebyshev,
    }
    distances = [
        distance_metrics[distance_metric](query_embedding, embedding)
        for embedding in embeddings
    ]
    return distances


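def get_filtered_df(
    df: pd.DataFrame,
    start_date=None,
    end_date=None,
    company_names=None,
    company_name_col="company",
    date_col="call_date",
) -> pd.DataFrame:
    """Filter the dataframe by company names and an inclusive date range."""
    if company_names:
        if isinstance(company_names, str):
            company_names = [company_names]
        df = df[df[company_name_col].isin(company_names)]

    # Check and handle date filters
    if start_date or end_date:
        if (
            start_date
            and end_date
            and pd.to_datetime(start_date) > pd.to_datetime(end_date)
        ):
            raise ValueError("start_date should be earlier than or equal to end_date.")

        # Compare timestamps rather than strings, so that a call at
        # "2020-09-30 18:54:00" still matches an end_date of "2020-09-30".
        dates = pd.to_datetime(df[date_col])
        date_filter = dates.notna()
        if start_date:
            date_filter &= dates >= pd.to_datetime(start_date)
        if end_date:
            # Treat end_date as inclusive of the whole day
            end_exclusive = pd.to_datetime(end_date).normalize() + pd.Timedelta(days=1)
            date_filter &= dates < end_exclusive
        df = df[date_filter]

    return df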


def get_answer_with_filtered_df(
    query,
    company_names=[],
    start_date=None,
    end_date=None,
    max_tokens=500,
    df_dir=LOCAL_FILENAME,
    max_context_len=3000,
    embed_model="text-embedding-ada-002",
    top_k=5,
    max_distance=0.3,
    print_top_relevant_chunks=False,
):
    # Read saved_df
    df = pd.read_feather(df_dir).reset_index(drop=True)

    # Get filtered_df based on query and company name
    filtered_df = get_filtered_df(
        df,
        start_date=start_date,
        end_date=end_date,
        company_names=company_names,
    )
    print(
        f"Filtered {len(filtered_df)}  relevant document found given the time frame: {start_date}, {end_date} for company: {company_names}. out of {len(df)} docs."
    )
    if len(filtered_df) == 0:
        # Mimic the ["choices"][0]["message"]["content"] structure, and return
        # a (response, context_df) pair like the normal code path does.
        return {
            "choices": [
                {
                    "message": {
                        "content": f"No relevant document found given the time frame: {start_date}, {end_date} for company: {company_names}."
                    }
                }
            ]
        }, None

    # Get proper context by question.
    context_str, context_df = get_relevant_context(
        query,
        filtered_df,
        max_context_len=max_context_len,
        embed_model=embed_model,
        top_k=top_k,
        print_top_relevant_chunks=print_top_relevant_chunks,
        max_distance=max_distance,
    )

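    # If no chunk passed the distance threshold, skip the model call
    if context_str is None:
        return {
            "choices": [
                {
                    "message": {
                        "content": "I'm not sure. No sufficiently relevant context was found."
                    }
                }
            ]
        }, None

    # Get answer based on context
    cur_prompt = QA_PROMPT_TEMPLATE.format(context=context_str, question=query)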

    answer = openai.ChatCompletion.create(
        model=GPT_MODEL,
        messages=[{"role": "user", "content": cur_prompt}],
        max_tokens=max_tokens,
        temperature=0.2,
    )

    return answer, context_df

## Function calling ultilities and schema
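This section contains the function-calling parts of the code: the function schema passed to the model and the helpers that execute the requested call.

See **get_answer_with_filtered_df** and **get_filtered_df** for more details.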
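
To show the dispatch pattern in isolation, here is a minimal offline sketch. The response fragment is hand-written to match the shape this notebook's code reads (`finish_reason`, `message.function_call.name`, and `arguments` as a JSON string), and `get_answer_stub`, `AVAILABLE_FUNCTIONS` and `dispatch_function_call` are hypothetical names; no API call is made.

```python
import json


def get_answer_stub(query, company_names=None, start_date=None, end_date=None):
    """Hypothetical stand-in for get_answer_with_filtered_df; it only echoes its filters."""
    return {
        "query": query,
        "company_names": company_names or [],
        "start_date": start_date,
        "end_date": end_date,
    }


# Map function names the model may request to local Python callables
AVAILABLE_FUNCTIONS = {"get_answer_with_filtered_df": get_answer_stub}


def dispatch_function_call(choice):
    """Run the function requested in one element of response["choices"], if any."""
    if choice.get("finish_reason") != "function_call":
        return None  # the model answered directly, nothing to execute
    call = choice["message"]["function_call"]
    func = AVAILABLE_FUNCTIONS.get(call["name"])
    if func is None:
        raise ValueError(f"Unknown function requested: {call['name']}")
    arguments = json.loads(call["arguments"])  # arguments arrive as a JSON string
    return func(**arguments)


# Hand-written example fragment, not real model output
example_choice = {
    "finish_reason": "function_call",
    "message": {
        "role": "assistant",
        "content": None,
        "function_call": {
            "name": "get_answer_with_filtered_df",
            "arguments": json.dumps(
                {
                    "query": "Azure in Q3",
                    "company_names": ["Microsoft Corporation (MSFT)"],
                    "start_date": "2020-07-01",
                    "end_date": "2020-09-30",
                }
            ),
        },
    },
}

print(dispatch_function_call(example_choice))
```

Keeping the name-to-callable mapping in one dictionary makes it easy to reject function names the model was never offered.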

In [161]:
get_answer_with_filtered_df_schema = [
    {
        "name": "get_answer_with_filtered_df",
        "description": """Use this function to get answers based on the saved earnings call transcripts. You can use parameters like company names and start and end dates to filter the data and make the answer more relevant.
        
        Q1 (First Quarter): January 1 to March 31
        Q2 (Second Quarter): April 1 to June 30
        Q3 (Third Quarter): July 1 to September 30
        Q4 (Fourth Quarter): October 1 to December 31
        """,
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": """
                            Given the conversation and a follow-up question, rephrase the follow-up question to be a standalone question that is not ambiguous. If the given question does not need any additional context, just return the original one as a standalone question, not changing anything. Only change when actually needed.
                            """,
                },
                "company_names": {
                    "type": "array",
                    "items": {
                        "type": "string",
                        "enum": [
                            "NVIDIA Corporation (NVDA)",
                            "Microsoft Corporation (MSFT)",
                        ],
                        "description": """
                            Choose between 'NVIDIA Corporation (NVDA)' and 'Microsoft Corporation (MSFT)'. If the user mentions a company like "nvidia" or "that graphic card company with green logo", help them choose 'NVIDIA Corporation (NVDA)'. If the company is not in the list, respond with "no I don't have this info".
                            """,
                    },
                    "description": "List of company names to filter the dataframe by.",
                },
                "start_date": {
                    "type": "string",
                    "format": "date",
                    "description": """
                            Use this when the user asks a question that requires a specific date range.
                            The start date to filter the dataframe by, in the format "YYYY-MM-DD". If not mentioned, do not apply this filter.
                            """,
                },
                "end_date": {
                    "type": "string",
                    "format": "date",
                    "description": """
                            Use this when the user asks a question that requires a specific date range.
                            The end date to filter the dataframe by, in the format "YYYY-MM-DD". If not mentioned, do not apply this filter.
                            """,
                },
            },
            "required": ["query"],
        },
    }
]


@retry(wait=wait_random_exponential(min=1, max=40), stop=stop_after_attempt(3))
def chat_completion_request(messages, functions=None, model=GPT_MODEL):
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + openai.api_key,
    }
    json_data = {"model": model, "messages": messages}
    if functions is not None:
        json_data.update({"functions": functions})
    try:
        response = requests.post(
            "https://api.openai.com/v1/chat/completions",
            headers=headers,
            json=json_data,
        )
        return response
    except Exception as e:
        print("Unable to generate ChatCompletion response")
        print(f"Exception: {e}")
        raise  # re-raise so that tenacity's @retry can actually retry the request


# The call_helper_function helper and the Conversation class are adapted from the How_to_call_functions_for_knowledge_retrieval Cookbook example.
class Conversation:
    def __init__(self):
        self.conversation_history = []

    def add_message(self, role, content):
        message = {"role": role, "content": content}
        self.conversation_history.append(message)

    def display_conversation(self, detailed=False):
        # ANSI escape codes, so each role prints in its own colour without extra dependencies
        role_to_color = {
            "system": "\033[31m",  # red
            "user": "\033[32m",  # green
            "assistant": "\033[34m",  # blue
            "function": "\033[35m",  # magenta
        }
        for message in self.conversation_history:
            print(
                f"{role_to_color[message['role']]}{message['role']}: {message['content']}\033[0m\n\n"
            )


def chat_completion_with_function_execution(messages, functions=None):
    """Make a ChatCompletion API call and execute the requested function if the model asks for one."""
    response = chat_completion_request(messages, functions)
    full_message = response.json()["choices"][0]
    if full_message["finish_reason"] == "function_call":
        print("Function generation requested, calling function")
        response, context_df = call_helper_function(messages, full_message)
        return response, context_df
    else:
        print("Function not required, responding to user")
        return response.json(), None


def call_helper_function(messages, full_message):
    """Execute the function requested by the model and return (response, context_df)."""
    function_call = full_message["message"]["function_call"]
    if function_call["name"] != "get_answer_with_filtered_df":
        raise Exception("Function does not exist and cannot be called")

    try:
        # The arguments arrive as a JSON string
        parsed_output = json.loads(function_call["arguments"])
        print(parsed_output)
        return get_answer_with_filtered_df(**parsed_output)
    except Exception as e:
        print("Function execution failed")
        print(f"Error message: {e}")
        raise Exception("Function chat request failed") from e

## Chat with automatically filtered, more relevant data.

In [159]:
# Start with a system message
ect_system_message = """You are a helpful assistant that can pull Earnings Call Transcripts from Nvidia and Microsoft from the year 2020 to answer user questions.
The user will ask questions about the ECT files, and you can find the most relevant documents by narrowing down the timeframe and company name, then answer based on those documents.
Answer in a professional and concise manner.
"""
ect_conv = Conversation()
ect_conv.add_message("system", ect_system_message)

### Chat with dataset 
Note what happens when asking "whats Microsoft says about Azure in Q3 that year?":

It first uses this information to filter the data automatically (`'company_names': ['Microsoft Corporation (MSFT)']`, `'start_date': '2020-07-01'`, `'end_date': '2020-09-30'`).

Then it runs the RAG pipeline on the filtered data (30 out of 244 records).
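
The filtering step can be pictured with a few plain records. The sketch below uses company names and call dates that appear in this notebook's outputs, but it is a standalone illustration with the standard library rather than the pandas-based `get_filtered_df`; `filter_records` is a hypothetical name, and treating the end date as inclusive of the whole day is an assumption of this sketch.

```python
from datetime import datetime, timedelta

# A few (company, call_date) pairs taken from the outputs shown in this notebook
RECORDS = [
    {"company": "Microsoft Corporation (MSFT)", "call_date": "2020-01-29 21:38:00"},
    {"company": "Microsoft Corporation (MSFT)", "call_date": "2020-09-15 18:54:00"},
    {"company": "NVIDIA Corporation (NVDA)", "call_date": "2020-02-13 21:46:00"},
    {"company": "NVIDIA Corporation (NVDA)", "call_date": "2020-05-22 04:29:00"},
    {"company": "NVIDIA Corporation (NVDA)", "call_date": "2020-06-01 18:23:00"},
]


def filter_records(records, company_names=None, start_date=None, end_date=None):
    """Keep records matching the company list and the date range."""
    start = datetime.fromisoformat(start_date) if start_date else None
    # Assumption: end_date covers the whole day, so compare against the next midnight
    end = datetime.fromisoformat(end_date) + timedelta(days=1) if end_date else None
    kept = []
    for record in records:
        call_time = datetime.fromisoformat(record["call_date"])
        if company_names and record["company"] not in company_names:
            continue
        if start is not None and call_time < start:
            continue
        if end is not None and call_time >= end:
            continue
        kept.append(record)
    return kept


q3_msft = filter_records(
    RECORDS,
    company_names=["Microsoft Corporation (MSFT)"],
    start_date="2020-07-01",
    end_date="2020-09-30",
)
print(q3_msft)
```

Only the records that survive this filter are passed on to the semantic search, so unrelated companies and quarters never compete for a place in the context.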

In [145]:
# Add a user message
ect_conv.add_message("user", "whats Microsoft says about Azure in Q3 that year?")
chat_response, context_df = chat_completion_with_function_execution(
    ect_conv.conversation_history, functions=get_answer_with_filtered_df_schema
)

assistant_message = chat_response["choices"][0]["message"]["content"]
ect_conv.add_message("assistant", assistant_message)
print(f"\n\n{assistant_message}")

Function generation requested, calling function
{'query': 'Microsoft comments on Azure', 'company_names': ['Microsoft Corporation (MSFT)'], 'start_date': '2020-07-01', 'end_date': '2020-09-30'}
Filtered 30  relevant document found given the time frame: 2020-07-01, 2020-09-30 for company: ['Microsoft Corporation (MSFT)']. out of 244 docs.


Microsoft's comments on Azure highlight their focus on customer success, trust, and a multi-cloud environment. They emphasize their unique position in the market due to their long-standing relationships with enterprises and their investments in security, compliance, accreditation, and connectivity. They also highlight the success of their hybrid model, Azure Arc, which allows customers to connect their infrastructures into one cloud. This has been particularly successful in supporting a multi-cloud environment. They also mention their focus on cyber sovereignty and their global footprint as key differentiators. 

Microsoft also highlights the success

### Check the context dataframe this answer is based on

In [147]:
context_df[['company', 'call_date']].head(2)

| | company | call_date |
|---|---|---|
| 0 | Microsoft Corporation (MSFT) | 2020-09-15 18:54:00 |
| 1 | Microsoft Corporation (MSFT) | 2020-09-15 18:54:00 |


## Another Example

In [153]:
# If you are not asking a follow-up question, it is recommended to start a new conversation.
ect_conv = Conversation()
ect_conv.add_message("system", ect_system_message)

In [149]:
# Add a user message
ect_conv.add_message("user", "whats Nvidia says about Covid before July?")
chat_response, context_df = chat_completion_with_function_execution(
    ect_conv.conversation_history, functions=get_answer_with_filtered_df_schema
)

assistant_message = chat_response["choices"][0]["message"]["content"]
ect_conv.add_message("assistant", assistant_message)
display(Markdown(assistant_message))

Function generation requested, calling function
{'query': 'Covid impact', 'company_names': ['NVIDIA Corporation (NVDA)'], 'start_date': '2020-01-01', 'end_date': '2020-06-30'}
Filtered 53  relevant document found given the time frame: 2020-01-01, 2020-06-30 for company: ['NVIDIA Corporation (NVDA)']. out of 244 docs.


The Covid-19 pandemic has had several impacts on NVIDIA. On the supply chain side, the company's significant presence in the Asia-Pacific region allowed it to be at the forefront of the early stages of the pandemic. This helped them understand the early signals and navigate the challenges, such as the closure of retail channels which impacted their gaming business. However, they were able to quickly shift towards eTail as people began working and learning from home, leading to an increase in demand for gaming for entertainment. 

In terms of employee health, NVIDIA has been focused on understanding their suppliers and supply chain process, maintaining regular communication to understand the challenges they were facing. They also faced logistical challenges in moving supplies from place to place, but their connections with top suppliers and manufacturers aided them during this process.

On the business side, NVIDIA has been using its unique capabilities to fight the virus. Their technology has been used in various scientific endeavors related to Covid-19, such as sequencing the virus, analyzing drug candidates, imaging the virus at molecular resolution, and identifying elevated body temperature with AI cameras. They are also preparing for future outbreaks by developing an end-to-end computational defense system. 

In terms of financial impact, it's still early to determine the precise impact of the virus on their business. However, they have estimated a possible impact on both their gaming and data center businesses.

In [150]:
context_df[['company', 'call_date']].head(3)

| | company | call_date |
|---|---|---|
| 0 | NVIDIA Corporation (NVDA) | 2020-06-01 18:23:00 |
| 1 | NVIDIA Corporation (NVDA) | 2020-05-22 04:29:00 |
| 2 | NVIDIA Corporation (NVDA) | 2020-05-22 04:29:00 |
