## Exploration of the Question Answering

### The Q and A Tutorial from HF

Question answering tasks return an answer given a question. If you’ve ever asked a virtual assistant like Alexa, Siri or Google what the weather is, then you’ve used a question answering model before. There are two common types of question answering tasks:
1. Extractive: extract the answer from the given context.
2. Abstractive: generate an answer from the context that correctly answers the question.

Lets use a bigger model like the [Bloom 3B](https://huggingface.co/bigscience/bloom-3b) and fine tune it as per the guide [here](https://huggingface.co/docs/transformers/tasks/question_answering).


In [1]:
!pip install transformers datasets evaluate

Collecting evaluate
  Downloading evaluate-0.4.1-py3-none-any.whl.metadata (9.4 kB)
Collecting responses<0.19 (from evaluate)
  Downloading responses-0.18.0-py3-none-any.whl (38 kB)
Downloading evaluate-0.4.1-py3-none-any.whl (84 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m84.1/84.1 kB[0m [31m1.1 MB/s[0m eta [36m0:00:00[0ma [36m0:00:01[0m
[?25hInstalling collected packages: responses, evaluate
Successfully installed evaluate-0.4.1 responses-0.18.0


In [3]:
!pip install ipywidgets

Collecting ipywidgets
  Downloading ipywidgets-8.1.1-py3-none-any.whl.metadata (2.4 kB)
Collecting widgetsnbextension~=4.0.9 (from ipywidgets)
  Downloading widgetsnbextension-4.0.9-py3-none-any.whl.metadata (1.6 kB)
Collecting jupyterlab-widgets~=3.0.9 (from ipywidgets)
  Downloading jupyterlab_widgets-3.0.9-py3-none-any.whl.metadata (4.1 kB)
Downloading ipywidgets-8.1.1-py3-none-any.whl (139 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m139.4/139.4 kB[0m [31m1.6 MB/s[0m eta [36m0:00:00[0ma [36m0:00:01[0m
[?25hDownloading jupyterlab_widgets-3.0.9-py3-none-any.whl (214 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m214.9/214.9 kB[0m [31m4.8 MB/s[0m eta [36m0:00:00[0ma [36m0:00:01[0m
[?25hDownloading widgetsnbextension-4.0.9-py3-none-any.whl (2.3 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.3/2.3 MB[0m [31m23.1 MB/s[0m eta [36m0:00:00[0ma [36m0:00:01[0m
[?25hInstalling collected packages: widgetsnbexte

> ⚠️ IMPORTANT ⚠️: must login via terminal: huggingface-cli login --token=$hf (take the hf token from env)
> The env is configured when the docker image was built via the ENV directive

In [1]:
from datasets import load_dataset

squad = load_dataset("squad")

Downloading data:   0%|          | 0.00/14.5M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/1.82M [00:00<?, ?B/s]

Generating train split: 0 examples [00:00, ? examples/s]

Generating validation split: 0 examples [00:00, ? examples/s]

In [3]:
len(squad["train"]), squad["train"][3422]

(87599,
 {'id': '56d5202a2593cc1400307a96',
  'title': '2008_Sichuan_earthquake',
  'context': "China Mobile had more than 2,300 base stations suspended due to power disruption or severe telecommunication traffic congestion. Half of the wireless communications were lost in the Sichuan province. China Unicom's service in Wenchuan and four nearby counties was cut off, with more than 700 towers suspended.",
  'question': 'Whose service in Wenchuan was cut off?',
  'answers': {'text': ['China Unicom'], 'answer_start': [200]}})

### Preprocess the Data with the Bloom Tokenizer

In [3]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-3b")

In [4]:
tokenizer.is_fast

True

Explore how to find beginning and end answer positions.. It seems we already have the beginning answer position. 

But the idea is to find the indices of the start and end tokens of the answer in the tokenized context. If they exist. Otherwise we set the label to (0,0) which is the [CLS] token.

In [3]:
train_set = squad["train"]
questions = [q.strip() for q in train_set["question"]]
inputs = tokenizer(
        questions,
        train_set["context"],
        max_length=512,
        truncation="only_second",
        return_offsets_mapping=True,
        padding="max_length",
    )

# TEST WITH SMALL SET FIRST!
# train_set = squad["train"][:1000]
# questions = [q.strip() for q in train_set["question"]]
# inputs = tokenizer(
#         questions,
#         train_set["context"],
#         max_length=512,
#         truncation="only_second",
#         return_offsets_mapping=True,
#         padding="max_length",
#     )

In [39]:
inputs.keys()

dict_keys(['input_ids', 'attention_mask', 'offset_mapping'])

In [9]:
offset_mapping = inputs.pop("offset_mapping")

In [10]:
answers = train_set["answers"]

In [25]:
answers[555], questions[555]

({'text': ['2013 Met Gala'], 'answer_start': [399]},
 'Of what event was Beyonce honorary chair?')

In [17]:
offset, answer, sequence_ids = offset_mapping[555], answers[555], inputs.sequence_ids(555)

In [18]:
idx = 0
while sequence_ids[idx] != 1:
    idx += 1
context_start = idx

while sequence_ids[idx] == 1 and idx < len(sequence_ids) - 1:
    idx += 1

context_end = idx
# this is the index of the tokens that the context starts and ends.
context_start, context_end

(371, 511)

In [22]:
start_positions = []
end_positions = []
start_char = answer["answer_start"][0]
end_char = start_char + len(answer["text"][0])
start_char, end_char

(399, 412)

In [24]:
if offset[context_start][0] > end_char or offset[context_end][1] < start_char:
    start_positions.append(0)
    end_positions.append(0)
else:
    # Otherwise it's the start and end token positions
    idx = context_start
    while idx <= context_end and offset[idx][0] <= start_char:
        idx += 1
    start_positions.append(idx - 1)
    idx = context_end
    while idx >= context_start and offset[idx][1] >= end_char:
        idx -= 1
    end_positions.append(idx + 1)

start_positions, end_positions

([463, 463], [465, 465])

In [33]:
# Test to see if we have found the correct positions for answer 555 from above
tokenizer.decode(inputs["input_ids"][555][463:465+1]).strip()

'2013 Met Gala'

#### Putting the Preprocessing Function Together

NOTE: this does not provide overflowing_tokens and sample mapping.
More in Chapter 6 in the NLP guide
In this case the context is not too long, but some of the examples in the dataset have very long contexts that will exceed the maximum length we set (which is 384 in this case). As we saw in Chapter 6 when we explored the internals of the question-answering pipeline, we will deal with long contexts by creating several training features from one sample of our dataset, with a sliding window between them.

In [2]:
def preprocess_function(examples):
    questions = [q.strip() for q in examples["question"]]
    inputs = tokenizer(
        questions,
        examples["context"],
        max_length=512,
        truncation="only_second",
        return_offsets_mapping=True,
        padding="max_length",
    )

    offset_mapping = inputs.pop("offset_mapping")
    answers = examples["answers"]
    start_positions = []
    end_positions = []

    for i, offset in enumerate(offset_mapping):
        answer = answers[i]
        start_char = answer["answer_start"][0]
        end_char = answer["answer_start"][0] + len(answer["text"][0])
        sequence_ids = inputs.sequence_ids(i)

        # Find the start and end of the context
        idx = 0
        while sequence_ids[idx] != 1:
            idx += 1
        context_start = idx
        while sequence_ids[idx] == 1 and idx < len(sequence_ids) - 1:
            idx += 1
        context_end = idx

        # If the answer is not fully inside the context, label it (0, 0)
        if offset[context_start][0] > end_char or offset[context_end][1] < start_char:
            start_positions.append(0)
            end_positions.append(0)
        else:
            # Otherwise it's the start and end token positions
            idx = context_start
            while idx <= context_end and offset[idx][0] <= start_char:
                idx += 1
            start_positions.append(idx - 1)

            idx = context_end
            while idx >= context_start and offset[idx][1] >= end_char:
                idx -= 1
            end_positions.append(idx + 1)

    inputs["start_positions"] = start_positions
    inputs["end_positions"] = end_positions
    return inputs

In [6]:
from transformers import DefaultDataCollator

data_collator = DefaultDataCollator()

In [9]:
from transformers import AutoModelForQuestionAnswering, TrainingArguments, Trainer

model = AutoModelForQuestionAnswering.from_pretrained("bigscience/bloom-3b", load_in_8bit=True)

Some weights of BloomForQuestionAnswering were not initialized from the model checkpoint at bigscience/bloom-3b and are newly initialized: ['qa_outputs.weight', 'qa_outputs.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [8]:
tokenized_squad = squad.map(preprocess_function, batched=True, remove_columns=squad["train"].column_names)

Map: 100%|██████████| 87599/87599 [00:08<00:00, 10119.04 examples/s]


In [15]:
from peft import LoraConfig

peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",
    task_type="CAUSAL_LM",
)

In [16]:
model.add_adapter(peft_config)

ValueError: Adapter with name default already exists. Please use a different name.

In [17]:
training_args = TrainingArguments(
    output_dir="./data/testing_fine_tune_qa",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    push_to_hub=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_squad["train"],
    eval_dataset=tokenized_squad["validation"],
    tokenizer=tokenizer,
    data_collator=data_collator,
)

In [18]:
trainer.train()

OutOfMemoryError: CUDA out of memory. Tried to allocate 288.00 MiB. GPU 0 has a total capacty of 23.65 GiB of which 61.88 MiB is free. Process 227035 has 69.91 MiB memory in use. Process 228647 has 22.71 GiB memory in use. Of the allocated memory 21.61 GiB is allocated by PyTorch, and 651.24 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Out of Memory. I will need to try the trl and quantinisation with bits and bytes next

### Trying the Qlora with BitsAndBytes

original notebook [here](https://github.com/curiousily/Get-Things-Done-with-Prompt-Engineering-and-LangChain/blob/master/14.fine-tuning-llama-2-7b-on-custom-dataset.ipynb). Video explanation [here](https://www.youtube.com/watch?v=MDA3LUKNl1E).

In [2]:
!pip install trl

Collecting trl
  Downloading trl-0.7.6-py3-none-any.whl.metadata (10 kB)
Collecting tyro>=0.5.11 (from trl)
  Downloading tyro-0.6.1-py3-none-any.whl.metadata (7.7 kB)
Collecting docstring-parser>=0.14.1 (from tyro>=0.5.11->trl)
  Downloading docstring_parser-0.15-py3-none-any.whl (36 kB)
Collecting rich>=11.1.0 (from tyro>=0.5.11->trl)
  Downloading rich-13.7.0-py3-none-any.whl.metadata (18 kB)
Collecting shtab>=1.5.6 (from tyro>=0.5.11->trl)
  Downloading shtab-1.6.5-py3-none-any.whl.metadata (7.3 kB)
Collecting markdown-it-py>=2.2.0 (from rich>=11.1.0->tyro>=0.5.11->trl)
  Downloading markdown_it_py-3.0.0-py3-none-any.whl.metadata (6.9 kB)
Collecting mdurl~=0.1 (from markdown-it-py>=2.2.0->rich>=11.1.0->tyro>=0.5.11->trl)
  Downloading mdurl-0.1.2-py3-none-any.whl (10.0 kB)
Downloading trl-0.7.6-py3-none-any.whl (139 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m139.3/139.3 kB[0m [31m2.2 MB/s[0m eta [36m0:00:00[0m [36m0:00:01[0m
[?25hDownloading tyro-0.6.

In [8]:
import pandas as pd
import torch
from transformers import (
    AutoModelForQuestionAnswering,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    Trainer
)

from datasets import load_dataset

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
from peft import LoraConfig, get_peft_model



In [9]:
model = AutoModelForQuestionAnswering.from_pretrained("bigscience/bloom-3b", 
                                                        quantization_config=bnb_config,
                                                        device_map="auto")


Some weights of BloomForQuestionAnswering were not initialized from the model checkpoint at bigscience/bloom-3b and are newly initialized: ['qa_outputs.weight', 'qa_outputs.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [10]:
model

BloomForQuestionAnswering(
  (transformer): BloomModel(
    (word_embeddings): Embedding(250880, 2560)
    (word_embeddings_layernorm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
    (h): ModuleList(
      (0-29): 30 x BloomBlock(
        (input_layernorm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
        (self_attention): BloomAttention(
          (query_key_value): Linear4bit(in_features=2560, out_features=7680, bias=True)
          (dense): Linear4bit(in_features=2560, out_features=2560, bias=True)
          (attention_dropout): Dropout(p=0.0, inplace=False)
        )
        (post_attention_layernorm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
        (mlp): BloomMLP(
          (dense_h_to_4h): Linear4bit(in_features=2560, out_features=10240, bias=True)
          (gelu_impl): BloomGelu()
          (dense_4h_to_h): Linear4bit(in_features=10240, out_features=2560, bias=True)
        )
      )
    )
    (ln_f): LayerNorm((2560,), eps=1e-05, eleme

In [13]:
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type="QUESTION_ANS",
    target_modules=["query_key_value"],
)
peft_model = get_peft_model(model, peft_config)

In [14]:
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-3b")

In [7]:
# squad = load_dataset("squad")

Test with subset of the squad to see if it will fit in memory

In [15]:
squad = load_dataset("squad", split="train[:10000]").shuffle()
squad = squad.train_test_split(test_size=0.2)

In [12]:
squad

DatasetDict({
    train: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 8000
    })
    test: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 2000
    })
})

In [16]:
from transformers import DefaultDataCollator

data_collator = DefaultDataCollator()

In [17]:
tokenized_squad = squad.map(preprocess_function, batched=True, remove_columns=squad["train"].column_names)

Map:   0%|          | 0/8000 [00:00<?, ? examples/s]

Map:   0%|          | 0/2000 [00:00<?, ? examples/s]

In [9]:
tokenized_squad["train"].column_names

['input_ids', 'attention_mask', 'start_positions', 'end_positions']

In [18]:
tokenizer.decode(tokenized_squad["train"]["input_ids"][188], skip_special_tokens=True), 

('What should the government of China be responsible for providing to earthquake survivors?Experts point out that the earthquake hit an area that has been largely neglected and untouched by China\'s economic rise. Health care is poor in inland areas such as Sichuan, highlighting the widening gap between prosperous urban dwellers and struggling rural people. Vice Minister of Health Gao Qiang told reporters in Beijing that the "public health care system in China is insufficient." The Vice Minister of Health also suggested that the government would pick up the costs of care to earthquake victims, many of whom have little or no insurance: "The government should be responsible for providing medical treatment to them," he said.',)

In [19]:
entry = 188
start = tokenized_squad["train"]["start_positions"][entry]
end = tokenized_squad["train"]["end_positions"][entry]
tokenizer.decode(tokenized_squad["train"]["input_ids"][entry][start:end+1]).strip(), squad["train"]["answers"][entry]["text"]

('medical treatment', ['medical treatment'])

In [20]:
model.config.quantization_config.to_dict()

{'quant_method': <QuantizationMethod.BITS_AND_BYTES: 'bitsandbytes'>,
 'load_in_8bit': False,
 'load_in_4bit': True,
 'llm_int8_threshold': 6.0,
 'llm_int8_skip_modules': None,
 'llm_int8_enable_fp32_cpu_offload': False,
 'llm_int8_has_fp16_weight': False,
 'bnb_4bit_quant_type': 'nf4',
 'bnb_4bit_use_double_quant': False,
 'bnb_4bit_compute_dtype': 'float16'}

In [21]:

# training_arguments = TrainingArguments(
#     per_device_train_batch_size=4,
#     gradient_accumulation_steps=4,    
#     optim="adamw_hf",
#     logging_steps=1,
#     learning_rate=2e-5,
#     num_train_epochs=1,
#     eval_steps=0.1,
#     evaluation_strategy="steps",
#     save_strategy="epoch",
#     output_dir="./data/testing_fine_tune_qa",
#     weight_decay=0.01,
#     save_safetensors=True,
#     push_to_hub=True,
#     fp16=False
# )
training_arguments = TrainingArguments(
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,
    optim="paged_adamw_32bit",
    logging_steps=1,
    learning_rate=1e-4,
    max_grad_norm=0.3,
    num_train_epochs=2,
    evaluation_strategy="steps",
    eval_steps=0.2,
    save_strategy="epoch",
    group_by_length=True,
    output_dir="./data/testing_fine_tune_qa",
    weight_decay=0.01,
    save_safetensors=True,
    push_to_hub=True,
    fp16=False
)


In [22]:
trainer = Trainer(
    model=peft_model,
    args=training_arguments,
    train_dataset=tokenized_squad["train"],
    eval_dataset=tokenized_squad["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
)


# trainer = SFTTrainer(
#     model=model,
#     train_dataset=tokenized_squad["train"],
#     eval_dataset=tokenized_squad["validation"],
#     peft_config=peft_config,
#     dataset_text_field="input_ids",
#     max_seq_length=4096,
#     tokenizer=tokenizer,
#     args=training_arguments,
# )
     


In [23]:
trainer.train()



Step,Training Loss,Validation Loss
200,1.9148,1.660156
400,1.6726,1.350586
600,1.0625,1.238281
800,0.8001,1.188477
1000,0.3615,1.170898


Checkpoint destination directory ./data/testing_fine_tune_qa/checkpoint-500 already exists and is non-empty.Saving will proceed but saved results may be invalid.


TrainOutput(global_step=1000, training_loss=2.0134426651000976, metrics={'train_runtime': 1844.4274, 'train_samples_per_second': 8.675, 'train_steps_per_second': 0.542, 'total_flos': 1.16255789088768e+17, 'train_loss': 2.0134426651000976, 'epoch': 2.0})

In [33]:
trainer.save_model("./data/saved-test")

In [23]:
import gc

del model
del peft_model
del tokenizer
del tokenized_squad
del squad
torch.cuda.empty_cache()
gc.collect()

3164

In [34]:
from transformers import pipeline

question_answerer = pipeline("question-answering", model="./data/saved-test/")


Some weights of BloomForQuestionAnswering were not initialized from the model checkpoint at bigscience/bloom-3b and are newly initialized: ['qa_outputs.weight', 'qa_outputs.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [37]:
question = "How many programming languages does BLOOM support?"
context = "BLOOM has 176 billion parameters and can generate text in 46 languages natural languages and 13 programming languages."

In [38]:
question_answerer(question=question, context=context)

{'score': 0.29287680983543396, 'start': 117, 'end': 49, 'answer': ''}

In [41]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./data/testing_fine_tune_qa")
inputs = tokenizer(question, context, return_tensors="pt")

In [42]:
import torch
from transformers import AutoModelForQuestionAnswering

model = AutoModelForQuestionAnswering.from_pretrained("./data/testing_fine_tune_qa")
with torch.no_grad():
    outputs = model(**inputs)

Some weights of BloomForQuestionAnswering were not initialized from the model checkpoint at bigscience/bloom-3b and are newly initialized: ['qa_outputs.weight', 'qa_outputs.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Loading adapter weights from ./data/testing_fine_tune_qa led to unexpected keys not found in the model:  ['qa_outputs.modules_to_save.bias', 'qa_outputs.original_module.bias']. 


### Trying another model with no quant

In [3]:
import pandas as pd
import torch
from transformers import (
    AutoModelForQuestionAnswering,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    Trainer
)

from datasets import load_dataset
model_small = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

squad = load_dataset("squad", split="train[:5000]")
squad = squad.train_test_split(test_size=0.2)
tokenized_squad = squad.map(preprocess_function, batched=True, remove_columns=squad["train"].column_names)

Some weights of DistilBertForQuestionAnswering were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Map:   0%|          | 0/4000 [00:00<?, ? examples/s]

Map:   0%|          | 0/1000 [00:00<?, ? examples/s]

In [4]:
tokenized_squad.column_names

{'train': ['input_ids', 'attention_mask', 'start_positions', 'end_positions'],
 'test': ['input_ids', 'attention_mask', 'start_positions', 'end_positions']}

In [5]:
from transformers import DefaultDataCollator

data_collator = DefaultDataCollator()

In [8]:
training_args = TrainingArguments(
    output_dir="my_small_qa",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    push_to_hub=True,
)

trainer = Trainer(
    model=model_small,
    args=training_args,
    train_dataset=tokenized_squad["train"],
    eval_dataset=tokenized_squad["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
)

In [9]:
trainer.train()

Epoch,Training Loss,Validation Loss
1,No log,0.111887
2,0.369100,0.110987
3,0.369100,0.087143


TrainOutput(global_step=750, training_loss=0.27876301829020184, metrics={'train_runtime': 85.3802, 'train_samples_per_second': 140.548, 'train_steps_per_second': 8.784, 'total_flos': 1567837200384000.0, 'train_loss': 0.27876301829020184, 'epoch': 3.0})

In [10]:
from transformers import pipeline

question_answerer = pipeline("question-answering", model="./data/my_small_qa")


In [15]:
question = "How far from Warsaw does the Vistula river's environment change noticeably?"
context = "There are 13 natural reserves in Warsaw – among others, Bielany Forest, Kabaty Woods, Czerniaków Lake. About 15 kilometres (9 miles) from Warsaw, the Vistula river's environment changes strikingly and features a perfectly preserved ecosystem, with a habitat of animals that includes the otter, beaver and hundreds of bird species. There are also several lakes in Warsaw – mainly the oxbow lakes, like Czerniaków Lake, the lakes in the Łazienki or Wilanów Parks, Kamionek Lake. There are lot of small lakes in the parks, but only a few are permanent – the majority are emptied before winter to clean them of plants and sediments."

In [16]:
question_answerer(question=question, context=context)

{'score': 5.878135200809709e-10, 'start': 627, 'end': 628, 'answer': '.'}