# Question Answering

## Short description and explanation

This function enables ad-hoc question answering over documents by ingesting text into a language model and returning formatted responses.<br>
It accepts:<br>

* A language model<br>
* Text files with content<br>
* Questions to answer<br>
* More inputs can be given for configuration <br>

The model processes the files to build understanding. Questions posed are then answered in one of two modes:

Default mode: <br>
The model directly answers each question using its own capabilities.

Poll mode: <br>
Additional models are included to separately answer each question. An aggregation algorithm determines the best response through consensus between models.<br>
Two options exist for consensus methodology:<br>

Average Answer: <br>
Each model's answer is scored. The response with the highest average score across models is selected. Useful for numeric or ranked responses.

Most Common Answer:<br> The answer that occurs most frequently across models is selected. Useful for textual responses to avoid outliers.

Using multiple models via the poll mode provides accuracy improvements for questions lacking definitive answers, as it refines responses through an ensemble process. <br>
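
The two consensus strategies can be sketched in a few lines of Python. This is a minimal illustration, not the function's actual implementation: the helper names `most_common_answer` and `average_answer` are hypothetical, and the average variant assumes every model returned a numeric rating.

```python
from collections import Counter
from statistics import mean


def most_common_answer(answers):
    """Return the answer that appears most often among the models."""
    return Counter(answers).most_common(1)[0][0]


def average_answer(answers):
    """Average numeric answers (assumption: each answer parses as a number)."""
    return mean(float(answer) for answer in answers)


textual_vote = most_common_answer(["Yes", "No", "Yes"])
numeric_vote = average_answer(["4", "5", "4"])
```

Here three textual answers of "Yes", "No" and "Yes" resolve to "Yes", while the ratings 4, 5 and 4 average to roughly 4.33.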

## Background

At the core, advanced natural language processing (NLP) models called foundation models are being leveraged to read and comprehend the input text files. <br>
Specifically, models such as GPT-3 or Codex from OpenAI are used as the base language model.

These foundation models have been trained on huge datasets, allowing them to understand written content, reason about it, and generate text responsively.<br>
Underneath, they use techniques like transformer-based deep learning to build semantic representations of the text.

When documents are fed into the function, the background process invokes these models to ingest and digest the information.<br>
Sophisticated NLP encodings map the textual data into an embedded space that captures latent meanings and relationships.

This provides the knowledge base for the models to then offer informed answers tailored to any queries about the documents.<br>
The parameters controlling model size and computation time provide tradeoffs between cost, speed, and sophistication of comprehension.

Additionally, the poll option expands on a single model by sampling responses from a number of models. <br>

## Requirements

`transformers` <br>
`torch` <br>
`tqdm` <br>

## Documentation
### Explanation about the function's parameters

`data_path`:  A path to a directory of text files or a path to a text file to ask questions about. <br>

`model_name`: The pre-trained model name from the huggingface hub to use for answering questions. <br>

`questions`: The questions to ask. A list of lists of questions to ask per text file, divided <br>
             into question groups. The groups can be determined by size (in order to <br>
             avoid large inputs to the LLM) or by questioning method (regular or poll-like questioning). <br>
             
`device_map`: A map to use for loading the model on multiple devices. <br>

`model_kwargs`: Keyword arguments to pass for loading the model using HuggingFace's <br>
                _transformers.AutoModelForCausalLM.from_pretrained_ function. <br>
                                           
`auto_gptq_exllama_max_input_length`: For AutoGPTQ models to set and extend the model's input buffer size. <br>

`tokenizer_name`: The tokenizer name from the huggingface hub to use. If not given, the given model name will be used. <br>
                                           
`tokenizer_kwargs`: Keyword arguments to pass for loading the tokenizer using HuggingFace's <br>
                    _transformers.AutoTokenizer.from_pretrained_ function. <br>
                                           
`text_wrapper`: A wrapper for the text of the file. Must have a placeholder ('{}') for the text of the file. <br>

`questions_wrapper`: A wrapper for the questions received. Will be added after the text wrapper in the prompt template. <br>
                     Must have a placeholder ('{}') for the questions. <br>
                                           
`generation_config`: HuggingFace's _GenerationConfig_ keyword arguments to pass to the _generate_ method. <br>
                                           
`questions_config`: A dictionary or list of dictionaries containing specific ways to answer questions (using a poll, for example). <br>
                    Each dictionary in the list is for the corresponding question group and determines the question asking method <br>
                    for that group. <br>
                                           
`batch_size`: Batch size for inference. <br>

`questions_columns`: Columns to use for the dataframe returned. <br>

`verbose`: Whether to present logs of a progress bar and errors. Default: True. <br>
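
To make the roles of `text_wrapper` and `questions_wrapper` concrete, here is a rough sketch of how a prompt could be assembled from them. The `build_prompt` helper, the example wrapper strings, and the newline-joined questions are assumptions made for illustration; the function's internal logic may differ.

```python
def build_prompt(text_wrapper, questions_wrapper, text, questions):
    # The questions wrapper is appended after the text wrapper, so the
    # combined template has two '{}' placeholders: text first, questions second.
    prompt_template = text_wrapper + questions_wrapper
    return prompt_template.format(text, "\n".join(questions))


prompt = build_prompt(
    text_wrapper="Given the following text:\n{}\n",
    questions_wrapper="answer the questions:\n{}\nAnswers:\n",
    text="The apple is red.",
    questions=["1. What is the color of the apple?"],
)
print(prompt)
```

Both wrappers need exactly one `{}` each for this to work, which is why the parameter descriptions above require the placeholder.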


## Demo 1

This is a short and simple example to show the basic use of the function.

First we import everything we need.

In [None]:
import mlrun
import transformers
import tempfile

We create a text file that the model can be asked about.

In [None]:
def _make_data_dir_for_test():
    data_dir = tempfile.mkdtemp()
    content = "The apple is red."
    with open(data_dir + "/test_data.txt", "w") as f:
        f.write(content)
    return data_dir

Then we set the path to the text file we want to ask about, the questions, and the column name for the answer table.

In [None]:
input_path = _make_data_dir_for_test()
question = ["What is the color of the apple?"]
column_name = ["color"]

Now we create the MLRun project and set the function to run in it.

In [None]:
project = mlrun.get_or_create_project(
    name="call-center-demo-1",
    context="./",
    user_project=True,
    parameters={
        "default_image": "yonishelach/mlrun-llm",
    })

In [None]:
func = project.set_function(
    "question-answering.py",
    name="question-answering",
    kind="job",
    image="yonishelach/mlrun-llm",
    handler="answer_questions",
)
# project.save()

Now we run the function with all the parameters we prepared earlier.

In [None]:
demo1_run = func.run(
    handler="answer_questions",
    params={
        "model_name": "distilgpt2",
        "data_path": input_path,
        "questions": question,
        "questions_columns": column_name,
        "generation_config": {
            "do_sample": True,
            "temperature": 0.8,
            "top_p": 0.9,
            "early_stopping": True,
            "max_new_tokens": 20,
        },
    },
    returns=[
        "question_answering_df: dataset",
        "question_answering_errors: result",
    ],
    local=True,
    artifact_path="./"
)

After the run is finished, we can take a look at our answer.

In [None]:
demo1_run.outputs

## Demo 2

This is a much larger example. We show how to use this function to analyze a number of calls between agents and customers of an internet company (all the data is generated by Iguazio). <br>
For something like this, we recommend using a strong model and putting some time into crafting the prompts.

We start with the imports.

In [1]:
import os
import mlrun
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline



As mentioned, this example is a bit more complicated. We give the model a list of questions, and for some of them we also give a list of answers to choose from (e.g. TOPICS, TONES).

In [2]:
TOPICS = [
    "slow internet speed",
    "billing discrepancies",
    "account login problems",
    "setting up a new device",
    "phishing or malware concerns",
    "scheduled maintenance notifications",
    "service upgrades",
    "negotiating pricing",
    "canceling service",
    "customer service feedback",
]

TONES = [
    "Positive",
    "Neutral",
    "Negative",
]

QUESTIONS = [
    [
    f"1. Classify the topic of the text from the following list (choose one): {TOPICS}",
    "2. Write a long summary of the text, focus on the topic (max 50 words).",
    "3. Was the Client's concern addressed, (choose only one) [Yes, No]?",
    f"4. Was the Client tone (choose only one, if not sure choose Neutral) {TONES}? ",
    f"5. Was the Call Center Agent tone (choose only one, if not sure choose Neutral) {TONES}?",
    ],
    [
    "1. Did the agent try to upsale the customer (choose only one) [Yes, No]? (sell any additional product or service)",
    "2. If the agent indeed try to upsale the client, did he succeed (choose only one) [Yes, No]? if he didn't try' answer No",
    "3. Rate the agent's level of empathy (The ability to understand and share the feelings of others) on a scale of 1-5.",
    "4. Rate the agent's level of professionalism (Conducting oneself in a way that is appropriate for the workplace) on a scale of 1-5.",
    "5. Rate the agent's level of kindness (The quality of being friendly, generous, and considerate) on a scale of 1-5.",
    "6. Rate the agent's level of effective communication (The ability to convey information clearly and concisely) on a scale of 1-5.",
    "7. Rate the agent's level of active listening (The process of paying attention to and understanding what someone is saying) on a scale of 1-5.",
    "8. Rate the agent's level of customization (The process of tailoring something to the specific needs or preferences of an individual) on a scale of 1-5.",
    ]
]

qa_questions_columns = [
                        "Issue",
                        "Summary",
                        "is_fixed",
                        "customer_tone",
                        "agent_tone",
                        "upsale_attemted",
                        "upsale_success",
                        "empathy",
                        "professionalism",
                        "kindness",
                        "effective_communication",
                        "active_listening",
                        "customization",
                        ]
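
The column names appear to line up with the questions in order across both groups (5 + 8 = 13 here). A quick sanity check like the following can catch a mismatch before a long run. `map_columns_to_questions` is a hypothetical helper, and the short lists are stand-ins for `QUESTIONS` and `qa_questions_columns`.

```python
from itertools import chain


def map_columns_to_questions(question_groups, columns):
    # Flatten the groups so questions can be paired with columns one to one.
    flat_questions = list(chain.from_iterable(question_groups))
    if len(flat_questions) != len(columns):
        raise ValueError(
            f"Got {len(columns)} columns for {len(flat_questions)} questions"
        )
    return dict(zip(columns, flat_questions))


groups = [["1. Topic?", "2. Summary?"], ["1. Upsale attempted?"]]
mapping = map_columns_to_questions(groups, ["Issue", "Summary", "upsale"])
```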

Another thing we give the model this time is answer examples (one-shot or few-shot answering). This shows the model how we want the answers to be structured or calculated. <br>
So for every file we ask about, the model is presented with this example call and the answers we expect for it.

In [None]:
DEMO_CALL = (
    "Agent: Good afternoon, you've reached [Internet Service Provider] customer support. I'm Megan. How can I assist "
    "you today?\n"
    "Customer: Hello, Megan. This is Lisa. I've noticed some billing discrepancies on my last statement.\n"
    "Agent: I'm sorry to hear that, Lisa. I'd be happy to help you with that. Could you please provide me with your "
    "account number or phone number associated with your account?\n"
    "Customer: Of course, my account number is 123456789.\n"
    "Agent: Thank you, Lisa. Let me pull up your account. I see the billing discrepancies you mentioned. It appears "
    "there was an error in the charges. I apologize for the inconvenience.\n"
    "Customer: Thank you for acknowledging the issue, Megan. Can you please help me get it resolved?\n"
    "Agent: Absolutely, Lisa. I've made note of the discrepancies, and I'll escalate this to our billing department "
    "for investigation and correction. You should see the adjustments on your next statement.\n"
    "Customer: That sounds good, Megan. I appreciate your help.\n"
    "Agent: You're welcome, Lisa. If you have any more questions or concerns in the future, please don't hesitate to "
    "reach out. Is there anything else I can assist you with today?\n"
    "Customer: No, that's all. Thank you for your assistance, Megan.\n"
    "Agent: Not a problem, Lisa. Have a wonderful day, and we'll get this sorted out for you.\n"
    "Customer: You too! Goodbye, Megan.\n"
    "Agent: Goodbye, Lisa!"
)

DEMO_ANSWERS = [(
    "1. billing discrepancies\n"
    "2. The customer contacted the call center regarding billing discrepancies on her statement. The agent "
    "acknowledged the issue, assured the customer it would be resolved, and escalated it to the billing department for "
    "correction.\n"
    "3. Yes.\n"
    "4. Neutral.\n"
    "5. Positive.\n"),
    # "6. No\n"
    # "7. No\n"),
    (
    "1. No\n"
    "2. No\n"
    "3. 4\n"
    "4. 5\n"
    "5. 4\n"
    "6. 5\n"
    "7. 4\n"
    "8. 3"
)]

Then we need to wrap it all nicely so it can be given to the model as a single prompt. This is done with a text wrapper and a questions wrapper. <br>
Both of them are concatenated inside the function with the questions and passed to the model.

In [None]:
TEXT_WRAPPER = [(
    f"<|im_start|>system: You are an AI assistant that answers questions accurately and shortly<|im_end|>\n"
    f"<|im_start|>user: Given the following text:\n"
    f"{DEMO_CALL}\n"
    f"answer the questions as accurately as you can:\n"
    f"{QUESTIONS[i]}<|im_end|>\n"
    f"<|im_start|>assistant:\n"
    f"{DEMO_ANSWERS[i]}<|im_end|>\n"
    f"<|im_start|>user: Given the following text:\n"
    "{}"
) for i in range(len(QUESTIONS))]
QUESTIONS_WRAPPER = (
    " answer the given questions as accurately as you can, do not write more answers than questions:\n"
    "{}<|im_end|>\n"
    "<|im_start|>assistant:\n"
)
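
Because the text wrappers are built with f-strings, it is easy to lose the `{}` placeholder by accident or to introduce an extra one. The following sketch counts the replacement fields in a template using the standard library's `string.Formatter`. The helper name `count_placeholders` is ours and is not part of the function.

```python
from string import Formatter


def count_placeholders(template):
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for chunks that contain no replacement field.
    return sum(
        1
        for _, field_name, _, _ in Formatter().parse(template)
        if field_name is not None
    )


wrapper = "<|im_start|>user: Given the following text:\n{}"
placeholder_count = count_placeholders(wrapper)
```

Applied to the wrappers above, each entry of `TEXT_WRAPPER` and `QUESTIONS_WRAPPER` should report exactly one placeholder. A template with an unbalanced brace makes `Formatter().parse` raise a `ValueError`, which is also a useful signal.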

Another interesting thing we do in this example is use two different questioning methods. The first is the "default" way (represented by an empty dictionary). <br>
The second part of the config is for the second questioning method, which we call "poll". For it we need to choose how many voting models participate <br>
and how to decide the result. We currently support `average` and `most_common`, as shown here.<br>


*An explanation of both methods can be found at the beginning of this notebook.

In [3]:
questions_config = [
    {},
    {
        "type": "poll",
        "poll_count": 3,
        "poll_strategy": "most_common"
    }]
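
To illustrate how each config entry pairs with a question group, here is a simplified sketch of the dispatch. It assumes a hypothetical `ask` callable that sends one question group to a model and returns one answer per question. The real function's internals are not shown in this notebook and may differ.

```python
from collections import Counter
from itertools import cycle


def answer_group(ask, questions, config):
    # An empty config means the default method: ask once and keep the answers.
    if config.get("type") != "poll":
        return ask(questions)
    # Poll: ask poll_count times, then collect the candidate answers per question.
    rounds = [ask(questions) for _ in range(config["poll_count"])]
    candidates_per_question = list(zip(*rounds))
    if config["poll_strategy"] == "most_common":
        return [Counter(c).most_common(1)[0][0] for c in candidates_per_question]
    if config["poll_strategy"] == "average":
        return [sum(float(a) for a in c) / len(c) for c in candidates_per_question]
    raise ValueError(f"Unknown poll strategy: {config['poll_strategy']}")


# Stand-in for a model: cycles through three canned replies.
canned_replies = cycle([["No", "4"], ["No", "5"], ["Yes", "4"]])


def fake_ask(questions):
    return next(canned_replies)


poll_config = {"type": "poll", "poll_count": 3, "poll_strategy": "most_common"}
poll_result = answer_group(fake_ask, ["1. Upsale?", "2. Empathy?"], poll_config)
```

With the canned replies, the three rounds vote "No", "No", "Yes" on the first question and "4", "5", "4" on the second, so the poll settles on "No" and "4".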

Before we run the function using mlrun, we need to set a project, and our function in it.

In [4]:
project = mlrun.get_or_create_project(
    name="call-center-demo-2",
    context="./",
    user_project=True,
    parameters={
        "default_image": "yonishelach/mlrun-llm",
    })


> 2023-12-18 10:18:51,651 [info] Project loaded successfully: {'project_name': 'call-center-demo-zeev55'}


In [19]:
func = project.set_function(
    "question-answering.py",
    name="question-answering",
    kind="job",
    image="yonishelach/mlrun-llm",
    handler="answer_questions",
)
project.save()

<mlrun.projects.project.MlrunProject at 0x7f8bc5b0a370>

The last few parameters we need to set are the model we will use, the input length (not available for all models), and the batch size. <br>
The batch size determines how many files are processed at a time. The larger it is, the faster the process will be, as long as our memory is sufficient.

In [None]:
qa_model = "TheBloke/Mistral-7B-OpenOrca-GPTQ"
auto_gptq_exllama_max_input_length = 4096
batch_size = 1
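
As a rough picture of what `batch_size` controls, the sketch below splits a list of file names into consecutive batches. The `iter_batches` helper and the file names are illustrative only and are not taken from the function's code.

```python
def iter_batches(items, batch_size):
    # Yield consecutive slices of at most batch_size items.
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


files = ["call_1.txt", "call_2.txt", "call_3.txt"]
batches = list(iter_batches(files, batch_size=2))
```

A larger batch means fewer model calls overall, but every call has to fit more text in memory at once.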

Finally, we run the function with all the parameters we prepared.

In [20]:
# Question answering:
run = func.run(
    local=True,
    handler="answer_questions",
    inputs={"data_path": os.path.abspath("./calls")},
    params={
        "model_name": qa_model,
        "device_map": "auto",
        "text_wrapper": TEXT_WRAPPER,
        "questions": QUESTIONS,
        "questions_wrapper": QUESTIONS_WRAPPER,
        "questions_columns": qa_questions_columns,
        "generation_config": {
            "max_new_tokens": 250,
            "do_sample": True,
            "temperature": 0.7,
            "top_p": 0.95,
            "top_k": 40,
            "repetition_penalty": 1.1,
        },
        "questions_config": questions_config,
        "batch_size": batch_size,
        "auto_gptq_exllama_max_input_length": auto_gptq_exllama_max_input_length,
    },
    returns=[
        "question_answering_df: dataset",
        "question_answering_errors: result",
    ],
)


> 2023-12-18 14:33:45,366 [info] Storing function: {'name': 'question-answering-answer-questions', 'uid': '0195a27963d94fb9805084e44b331162', 'db': 'http://mlrun-api:8080'}


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Setting `pad_token_id` to `eos_token_id`:32000 for open-end generation.
You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset

| Field | Value |
| --- | --- |
| project | call-center-demo-zeev55-zeevr |
| uid | ...4b331162 |
| iter | 0 |
| start | Dec 18 14:33:45 |
| state | completed |
| name | question-answering-answer-questions |
| labels | v3io_user=zeevr, kind=local, owner=zeevr, host=jupyter-zeev-gpu-5995df47dc-2ps5g |
| inputs | data_path |
| parameters | model_name=TheBloke/Mistral-7B-OpenOrca-GPTQ, model_kwargs={}, device_map=auto, text_wrapper, questions, questions_wrapper and questions_columns as passed in the run cell above (one rendered text wrapper per question group), generation_config={'max_new_tokens': 250, 'do_sample': True, 'temperature': 0.7, 'top_p': 0.95, 'top_k': 40, 'repetition_penalty': 1.1}, questions_config=[{}, {'poll_count': 3, 'poll_strategy': 'most_common'}], batch_size=1, auto_gptq_exllama_max_input_length=4096 |
| results | question_answering_errors={'.ipynb_checkpoints': "[Errno 21] Is a directory: '/User/call-center/demo-call-center/calls/.ipynb_checkpoints'"} |
| artifacts | question_answering_df |





> 2023-12-18 14:43:08,462 [info] Run execution finished: {'status': 'completed', 'name': 'question-answering-answer-questions'}


In [77]:
run.outputs

{'question_answering_errors': {'10ca54de28508a7c62453f618e55fdd3.txt, 1765b93c42a9343ed9a1da8f93d411e3.txt': 'list.extend() takes exactly one argument (2 given)',
  '2df61e0a2814c75b50e2c040996f5a61.txt, 339fcd0663e93bc6530dbf4987e1d4bc.txt': 'list.extend() takes exactly one argument (2 given)'},
 'question_answering_df': 'store://artifacts/call-center-demo-zeev55-zeevr/question-answering-answer-questions_question_answering_df:f6081a7ff60a4a90bdc84e1f7dfb0d4d'}
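
The keys of `question_answering_errors` in the output above are comma-separated file names that failed together. Assuming that key format holds, a small helper (hypothetical, named `failed_files` here) can flatten them into a list of files to inspect or retry:

```python
def failed_files(errors):
    # Each key may hold several file names joined by commas.
    return [name.strip() for key in errors for name in key.split(",")]


errors = {
    "10ca54de28508a7c62453f618e55fdd3.txt, 1765b93c42a9343ed9a1da8f93d411e3.txt":
        "list.extend() takes exactly one argument (2 given)",
}
to_retry = failed_files(errors)
```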