# Local Ask and Answer Dueling Bots with Small Language Models

### Introduction

You've probably asked a language model a question and had it give you an answer. After all, this is what we most commonly use language models for.

But have you ever received a question from a language model? While not as common, this application of AI has diverse use cases in areas like education, where you might want a model to give you practice questions for a test, and in sales enablement, where you might quiz your business's sales team about your products to improve their ability to make sales.

Now, what if we had a face-off ⚔️ between two different models: one that asks questions about a topic and another that answers them, all without human intervention?

In this notebook, we're going to look at exactly that. We'll provide a sample passage about OpenAI's AI safety team as context to our models. We'll then let our models duel it out! One model will ask questions based on this passage, and another model will respond!

### For Google Colab users

If you are using the free tier of Colab, we highly recommend activating the T4 GPU hardware accelerator. Our models are designed to run with at least 16GB of RAM. Activating the T4 gives the notebook 16GB of GDDR6 GPU memory, as opposed to the ~13GB of RAM Colab provides by default.

To activate T4:
1. Click on the "Runtime" tab
2. Click on "Change runtime type"
3. Select "T4 GPU" under "Hardware accelerator"

NOTE: there is a weekly usage limit on using T4 for free
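
After changing the runtime type, it can be useful to confirm that a GPU is actually visible to the notebook. The snippet below is a minimal sketch that assumes the NVIDIA `nvidia-smi` command-line tool is on the path when a GPU runtime is active. The `gpu_summary` helper is a hypothetical name introduced for illustration and is not part of llmware or Colab.

```python
import shutil
import subprocess


def gpu_summary():
    """Return the `nvidia-smi -L` GPU listing, or None if no GPU tool is found."""
    # Assumption: a GPU runtime exposes the `nvidia-smi` executable on the PATH.
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=False
    )
    # An empty listing is treated the same as no GPU being available.
    return result.stdout.strip() or None


summary = gpu_summary()
print(summary if summary else "No NVIDIA GPU detected; check the runtime type.")
```

If the message about no GPU appears, the runtime is most likely still set to CPU.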

### Streamlit example

We have an [interactive Streamlit program](https://github.com/llmware-ai/llmware/blob/main/examples/UI/dueling_chatbot.py) for this example in our repository. To run it, navigate to the directory the file is located in, and run `streamlit run dueling_chatbot.py` in a terminal.

### Installing and importing dependencies

In [1]:
!pip install llmware


In [2]:
from llmware.models import ModelCatalog

### Generating questions

This function shows how to generate questions about the `source_passage` using the `slim-q-gen-tiny-tool` model. The workflow is as follows:
- Load the model using `ModelCatalog`
- Use `function_call()` to generate a question from the model `number_of_tries` times
- Add each generated question to the `questions` list only if it is not already in the list
- Output the `questions`.

In [3]:
def hello_world_test(source_passage, q_model="slim-q-gen-tiny-tool", number_of_tries=10, question_type="question", temperature=0.5):

    """ Shows a basic example of generating questions from a text passage, running a number of inferences,
    and then keeping only the unique questions generated.

        -- source_passage = text passage
        -- number_of_tries = integer number of times to call the model to generate a question
        -- question_type = "question" | "boolean" | "multiple choice"

    """

    #   recommend using temperature of 0.2 - 0.8 - for multiple choice, use lower end of the range
    q_model = ModelCatalog().load_model(q_model, sample=True, temperature=temperature)

    questions = []

    for x in range(0, number_of_tries):

        response = q_model.function_call(source_passage, params=[question_type], get_logits=False)

        # expect response in the form of:  "llm_response": {"question": ["generated question?"] }

        if response:
            if "llm_response" in response:
                if "question" in response["llm_response"]:
                    new_q = response["llm_response"]["question"]

                    #   keep only new questions
                    if new_q and new_q not in questions:
                        questions.append(new_q)

                print(f"inference {x} - response: {response['llm_response']}")

    print(f"\nDe-duped list of questions created\n")
    for i, question in enumerate(questions):

        print(f"new generated questions: {i} - {question}")

    return questions
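
The de-duplication step can be tried without downloading a model. The sketch below swaps in a hypothetical `StubQuestionModel` that cycles through canned questions and returns them in the same `{"llm_response": {"question": [...]}}` shape this notebook expects. Both `StubQuestionModel` and `collect_unique_questions` are illustrative names, not part of llmware.

```python
from itertools import cycle


class StubQuestionModel:
    """Hypothetical stand-in for a question-generation model (not an llmware class)."""

    def __init__(self, canned_questions):
        self._questions = cycle(canned_questions)

    def function_call(self, source_passage, params=None, get_logits=False):
        # Mimic the response shape used in this notebook.
        return {"llm_response": {"question": [next(self._questions)]}}


def collect_unique_questions(model, source_passage, number_of_tries=10,
                             question_type="question"):
    """Call the model repeatedly and keep each distinct question once, in order."""
    questions = []
    for _ in range(number_of_tries):
        response = model.function_call(source_passage, params=[question_type],
                                       get_logits=False)
        # Guard against a missing response or an unexpected response shape.
        llm_response = response.get("llm_response") if response else None
        new_q = llm_response.get("question") if isinstance(llm_response, dict) else None
        if new_q and new_q not in questions:
            questions.append(new_q)
    return questions


stub = StubQuestionModel(["Who resigned?", "Who leads the committee?", "Who resigned?"])
unique_questions = collect_unique_questions(stub, "any passage", number_of_tries=6)
print(unique_questions)
```

Here six calls yield only two distinct questions. Each question stays wrapped in its own list, as in the real model output, and the comparison is an exact match on that list.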

### Dueling AIs

This function will allow us to generate the questions using the same process as above, but then have a different model answer those questions. The process is as follows:
- Generate the `questions` list using the same steps as the previous function.
- Load in the answer model using `ModelCatalog`
- Answer each question in `questions` using the `inference()` function of the answer model.

In [4]:
def ask_and_answer_game(source_passage, q_model="slim-q-gen-tiny-tool", number_of_tries=10, question_type="question",
                        temperature=0.5):

    """ Shows a simple two model game of using q-gen model to generate a question, and then a second model
    to answer the question generated. """

    #   this is the model that will generate the 'question'
    q_model = ModelCatalog().load_model(q_model, sample=True, temperature=temperature)

    #   this will be the model used to 'answer' the question
    answer_model = ModelCatalog().load_model("bling-phi-3-gguf")

    questions = []

    print(f"\nGenerating a set of questions automatically from the source passage.\n")

    for x in range(0,number_of_tries):

        response = q_model.function_call(source_passage, params=[question_type], get_logits=False)

        if response:
            if "llm_response" in response:
                if "question" in response["llm_response"]:
                    new_q = response["llm_response"]["question"]

                    #   only keep new questions
                    if new_q and new_q not in questions:
                        questions.append(new_q)

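        #   guard the print so a missing or malformed response does not raise an error
        if response and "llm_response" in response:
            print(f"inference - {x} - response: {response['llm_response']}")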

    print("\nAnswering the generated questions\n")
    for i, question in enumerate(questions):

        print(f"\nquestion: {i} - {question}")
        if isinstance(question, list) and len(question) > 0:
            response = answer_model.inference(question[0], add_context=source_passage)
            print("response: ", response["llm_response"])

    return True
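
Each generated question arrives as a single-item list, which is why the function above passes `question[0]` to the answer model. The sketch below isolates that answering step using a hypothetical `StubAnswerModel` in place of the real model, so it runs without any downloads. `StubAnswerModel` and `answer_questions` are illustrative names and are not part of llmware.

```python
class StubAnswerModel:
    """Hypothetical stand-in for the answer model (not an llmware class)."""

    def inference(self, prompt, add_context=None):
        # Return the same top-level key the notebook reads from the real model.
        return {"llm_response": f"(stub answer to: {prompt})"}


def answer_questions(answer_model, questions, source_passage):
    """Answer each question, skipping entries that are not non-empty lists."""
    qa_pairs = []
    for question in questions:
        if isinstance(question, list) and len(question) > 0:
            response = answer_model.inference(question[0], add_context=source_passage)
            qa_pairs.append((question[0], response["llm_response"]))
    return qa_pairs


qa_pairs = answer_questions(
    StubAnswerModel(),
    [["Who resigned?"], [], "not a list"],
    "any passage",
)
for q, a in qa_pairs:
    print(f"question: {q}\nresponse: {a}")
```

Returning the question and answer pairs, rather than `True`, is one possible design choice that makes the results easier to reuse, for example in a quiz interface.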

### Main block

Here, we define our source text and call both of the functions above.

In [5]:
#   test passage pulled from CNBC news story on Tuesday, May 28, 2024
test_passage = ("OpenAI said Tuesday it has established a new committee to make recommendations to the "
                "company’s board about safety and security, weeks after dissolving a team focused on AI safety.  "
                "In a blog post, OpenAI said the new committee would be led by CEO Sam Altman as well as "
                "Bret Taylor, the company’s board chair, and board member Nicole Seligman.  The announcement "
                "follows the high-profile exit this month of an OpenAI executive focused on safety, "
                "Jan Leike. Leike resigned from OpenAI leveling criticisms that the company had "
                "under-invested in AI safety work and that tensions with OpenAI’s leadership had "
                "reached a breaking point.")

In [6]:
hello_world_test(test_passage,q_model="slim-q-gen-tiny-tool",number_of_tries=10, question_type="question", temperature=0.5)

inference 0 - response: {'question': ['Who are the two members of the new safety and security committee?']}
inference 1 - response: {'question': ['What is the name of the new committee?']}
inference 2 - response: {'question': ['What is the name of the executive who resigned?']}
inference 3 - response: {'question': ['What is the name of the CEO of the company?']}
inference 4 - response: {'question': ['What is the name of the board chair?']}
inference 5 - response: {'question': ['Who is the chairman of OpenAI?']}
inference 6 - response: {'question': ['What is a list of the key points in the announcement?']}
inference 7 - response: {'question': ['What is a list of three key points?']}
inference 8 - response: {'question': ['What is the name of the executive who resigned?']}
inference 9 - response: {'question': ['What is the name of the executive who resigned from OpenAI?']}

De-duped list of questions created

new generated questions: 0 - ['Who are the two members of the new safety and sec

[['Who are the two members of the new safety and security committee?'],
 ['What is the name of the new committee?'],
 ['What is the name of the executive who resigned?'],
 ['What is the name of the CEO of the company?'],
 ['What is the name of the board chair?'],
 ['Who is the chairman of OpenAI?'],
 ['What is a list of the key points in the announcement?'],
 ['What is a list of three key points?'],
 ['What is the name of the executive who resigned from OpenAI?']]

We can see the question generated by each of the `number_of_tries` inferences, followed by the final question list with the duplicates removed.
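
Because `hello_world_test` returns each question wrapped in its own list, a small flattening step may be convenient before displaying or storing the questions. The sketch below uses three entries copied from the returned list above. `flatten_questions` is a hypothetical helper introduced for illustration.

```python
def flatten_questions(nested_questions):
    """Turn [['q1'], ['q2']] into ['q1', 'q2'], skipping empty entries."""
    return [q for entry in nested_questions for q in entry]


sample = [
    ['What is the name of the new committee?'],
    ['What is the name of the executive who resigned?'],
    ['What is the name of the executive who resigned from OpenAI?'],
]
flat = flatten_questions(sample)
for i, question in enumerate(flat):
    print(f"{i}: {question}")
```

The last two entries differ only in their final words. The notebook's de-duplication is an exact match, so near-duplicates like these are both kept.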

In [7]:
ask_and_answer_game(test_passage,q_model="slim-q-gen-phi-3-tool", number_of_tries=10, question_type="question", temperature=0.5)


Generating a set of questions automatically from the source passage.

inference - 0 - response: {'question': ['What is the name of the executive who resigned?']}
inference - 1 - response: {'question': ['When was this announcement made?']}
inference - 2 - response: {'question': ['What is the name of one of the members of the new committee?']}
inference - 3 - response: {'question': ['What is one of the names of the people who will lead the new committee?']}
inference - 4 - response: {'question': ['What is the name of the person who will lead the new committee?']}
inference - 5 - response: {'question': ['Who is leading the new advisory group?']}
inference - 6 - response: {'question': ['What is the name of the executive who resigned?']}
inference - 7 - response: {'question': ['What is the name of the person who is leading the new committee?']}
inference - 8 - response: {'question': ['What is one role of Nicole Seligman?']}
inference - 9 - response: {'question': ['What is one of the names 

True

Here, we can see each question generated, then the responses to each unique (non-duplicate) question.