# Question Answering Tutorial

In [1]:
import os
import mlrun

## Import function from hub
To import the function from the hub instead of the local file, replace `function.yaml` with `hub://question-answering`.

In [2]:
qa_function = mlrun.import_function("function.yaml")

## Write data file

In [3]:
os.makedirs("data", exist_ok=True)

In [4]:
%%writefile data/test-data.txt
The apple color is red.

Overwriting data/test-data.txt
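
The two cells above rely on the `%%writefile` notebook magic, which is not available in a plain Python script. The following is a minimal sketch of roughly the same step using only the standard library. The helper name `write_text_file` is hypothetical and is not part of MLRun or of the question-answering function.

```python
from pathlib import Path


def write_text_file(directory: str, name: str, text: str) -> Path:
    """Create `directory` if needed and write `text` to `directory/name`."""
    target_dir = Path(directory)
    # Same effect as os.makedirs(directory, exist_ok=True)
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / name
    # Overwrites the file if it already exists
    path.write_text(text, encoding="utf-8")
    return path


# Hypothetical helper, used here with the same file and sentence as the notebook
path = write_text_file("data", "test-data.txt", "The apple color is red.\n")
```

Any file placed in the `data` directory this way can then be picked up through the `text_files_directory` parameter used in the next section.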


## Running question answering

In [7]:
qa_run = qa_function.run(
        handler="answer_questions",
        params={
            "model": "gpt2",
            "text_wrapper": (
                "Given the following sentence:\n"
                "-----\n"
                "{}\n"
                "-----"
            ),
            "questions": [
                "What is the color of the apple?",
            ],
            "questions_columns": [
                "color",
            ],
            "generation_config": {
                "do_sample": True,
                "temperature": 0.8,
                "top_p": 0.9,
                "early_stopping": True,
                "max_new_tokens": 20,
            },
            "text_files_directory": "data",
        },
        returns=[
            "question_answering_df: dataset",
            "question_answering_errors: result",
        ],
        local=True,
)

> 2023-08-13 11:31:10,580 [info] Storing function: {'name': 'question-answering-answer-questions', 'uid': '13fda961ca544460a48aa8df2f1249b8', 'db': 'http://mlrun-api:8080'}
> 2023-08-13 11:31:10,696 [info] Creating prompt template
> 2023-08-13 11:31:10,696 [info] Prompt template created:

Given the following sentence:
-----
{}
-----
Answer the questions:
1. What is the color of the apple?

Answer:
1. 

> 2023-08-13 11:31:10,699 [info] Loading tokenizer: gpt2
> 2023-08-13 11:31:10,875 [info] Tokenizer loaded
> 2023-08-13 11:31:10,875 [info] Loading generation configuration
> 2023-08-13 11:31:10,876 [info] Generation configuration loaded.
> 2023-08-13 11:31:10,876 [info] Loading model 'gpt2' and tokenizer into a pipeline
> 2023-08-13 11:31:12,542 [info] Model loaded, pipeline created


Generating answers:   0%|          | 0/1 [00:00<?, ?file/s]

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


project,uid,iter,start,state,name,labels,inputs,parameters,results,artifacts
default,...2f1249b8,0,Aug 13 11:31:10,completed,question-answering-answer-questions,v3io_user=yonis; kind=; owner=yonis; host=jupyter-yoni-67b54c579f-fhq2n,,"model=gpt2; text_wrapper=Given the following sentence:\n-----\n{}\n-----; questions=['What is the color of the apple?']; questions_columns=['color']; generation_config={'do_sample': True, 'temperature': 0.8, 'top_p': 0.9, 'early_stopping': True, 'max_new_tokens': 20}; text_files_directory=data",question_answering_errors={},question_answering_df





> 2023-08-13 11:31:13,407 [info] Run execution finished: {'status': 'completed', 'name': 'question-answering-answer-questions'}
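
The "Prompt template created" log entry shows how `text_wrapper` and `questions` end up in a single prompt. The sketch below reconstructs that template from the logged output only. It is an assumption about the layout, not the hub function's actual code, and the name `build_prompt_template` is hypothetical.

```python
from typing import List


def build_prompt_template(text_wrapper: str, questions: List[str]) -> str:
    """Assemble a template shaped like the one printed in the run log."""
    # Number the questions starting from 1, one per line
    numbered = "\n".join(
        f"{index}. {question}" for index, question in enumerate(questions, start=1)
    )
    # The "{}" placeholder inside text_wrapper is left untouched here
    return (
        f"{text_wrapper}\n"
        "Answer the questions:\n"
        f"{numbered}\n"
        "\n"
        "Answer:\n"
        "1. "
    )


text_wrapper = "Given the following sentence:\n-----\n{}\n-----"
template = build_prompt_template(text_wrapper, ["What is the color of the apple?"])

# Fill the placeholder with the content of one text file
prompt = template.format("The apple color is red.")
print(prompt)
```

Because the placeholder is filled with `str.format`, this sketch assumes the questions themselves contain no curly braces.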


## See the results
This notebook only demonstrates how to call the function. For better answers, use a stronger LLM than `gpt2`.

In [8]:
qa_run.artifact("question_answering_df").as_df()

Unnamed: 0,text_file,color
