# Intro
This notebook builds a simple dataset which will be a base for our RAG system.
The dataset will be composed of random wikipedia pages. It makes a good corpus to practice building
RAGs for a common usecase: internal documentation question answering. There are of course a few differences, including
scale (depending on the company) and lack of specialised lingo and concepts which are often out of
the LLMs training distribution (Wikipedia is actually in pre-training corpus of many LLMs).

Some chunking methods implemented can also be found in popular libraries (eg: LangChain). We are rewriting them for fun here.

In [1]:
import os
import re
from typing import Callable, Iterable, Dict, Any
from tqdm import tqdm

import numpy as np
from datasets import Dataset, load_dataset, load_from_disk
from huggingface_hub import InferenceClient

LOCAL_DATASET_FOLDER = "local_datasets"

In [2]:
# Saves some time to avoid fetching and parsing pages on your own
# by loading a HF dataset of wikipedia pages
wiki_data = load_dataset("wikipedia", "20220301.simple")["train"]

Downloading builder script:   0%|          | 0.00/36.7k [00:00<?, ?B/s]

Downloading readme:   0%|          | 0.00/16.0k [00:00<?, ?B/s]

In [3]:
wiki_data

Dataset({
    features: ['id', 'url', 'title', 'text'],
    num_rows: 205328
})

In [4]:
# Have a look at one element
first_el = wiki_data[0].copy() # for safe edits
first_el["text"] = first_el["text"][:200] + "..." # for easier readability
first_el

{'id': '1',
 'url': 'https://simple.wikipedia.org/wiki/April',
 'title': 'April',
 'text': 'April is the fourth month of the year in the Julian and Gregorian calendars, and comes between March and May. It is one of four months to have 30 days.\n\nApril always begins on the same day of week as ...'}

We are pretty close to the dataset format we need for indexing. The main blocker is that the text field is too long for the limited context size of some embedding models we'd like to use (eg: BERT uses a context of ~512 token which is about ~512 words). We could also use
larger models with larger context sizes, but research also suggest that models tend to lose track of some information in large context sizes: https://huggingface.co/papers/2307.03172.

As a result, it seems favorable to keep using a relatively small context size -> We'll need to chunk our text examples. The rest of the notebook plays with different methods to do it.

In [5]:
# Define utils to apply a chunking method to the dataset per batch
# We'll define different chunking methods to use after this
def get_chunk_from_batch(
    examples_batch: Iterable,
    chunk_text_method: Callable,
    **chunk_text_method_kwargs: Dict[str, int]
) -> Dict[str, Iterable[Any]]:
    """
    Apply 'chunk_text_method' to the examples_batch and returns
    a dictionnary in the formatexpected by the Dataset.map method (Dict[str, Iterable[Features values]])
    """
    example_ids = []
    example_urls = []
    example_titles = []
    chunks = []
    for ind, example_text in enumerate(examples_batch["text"]):
        for chunk in chunk_text_method(example_text, **chunk_text_method_kwargs):
            example_ids.append(examples_batch["id"][ind])
            example_titles.append(examples_batch["title"][ind])
            example_urls.append(examples_batch["url"][ind])
            chunks.append(chunk)
    return {
        "id": list(range(len(chunks))),
        "original_id": example_ids,
        "title": example_titles,
        "url": example_urls,
        "text_chunk": chunks
    }

# Chunking strategies

## (Dummy) Fixed-length chunking (with some overlap)


In [6]:
CHUNK_SIZE_WORDS = 300
OVERLAP_SIZE_WORDS = 10

# Alternative to using LangChain methods
# We do it indepedently of any tokeniser to make it generic (using words as a unit), at the risk
# of having issues with model context size later on if the number of tokens in the chunk is too high
def chunk_text_with_fixed_length(
    text: str,
    chunk_size_words: int = CHUNK_SIZE_WORDS,
    overlap_size_words: int = OVERLAP_SIZE_WORDS
) -> Iterable[str]:
    text_no_new_lines = text.replace("\n", " ")
    text_split = text_no_new_lines.split(" ")

    total_words = len(text_split)
    # iterate over words, chunk
    word_index = 0
    while word_index < total_words:
        yield " ".join(text_split[word_index:word_index+chunk_size_words])
        word_index += chunk_size_words - overlap_size_words

In [7]:
wiki_data_chunked = wiki_data.map(
    lambda example_batch: get_chunk_from_batch(example_batch, chunk_text_with_fixed_length),
    batched=True,
    remove_columns=["id", "title", "text", "url"] # Removes columns because of row expansion
)

Map:   0%|          | 0/205328 [00:00<?, ? examples/s]

In [8]:
wiki_data_chunked[0:2]

{'id': [0, 1],
 'url': ['https://simple.wikipedia.org/wiki/April',
  'https://simple.wikipedia.org/wiki/April'],
 'title': ['April', 'April'],
 'original_id': ['1', '1'],
 'text_chunk': ["April is the fourth month of the year in the Julian and Gregorian calendars, and comes between March and May. It is one of four months to have 30 days.  April always begins on the same day of week as July, and additionally, January in leap years. April always ends on the same day of the week as December.  April's flowers are the Sweet Pea and Daisy. Its birthstone is the diamond. The meaning of the diamond is innocence.  The Month   April comes between March and May, making it the fourth month of the year. It also comes first in the year out of the four months that have 30 days, as June, September and November are later in the year.  April begins on the same day of the week as July every year and on the same day of the week as January in leap years. April ends on the same day of the week as December e

In [9]:
# Save data back to disk
FIXED_LENGTH_CHUNK_DATASET_NAME = f"wiki-data-chunked-fixed-length-CS{CHUNK_SIZE_WORDS}-OS{OVERLAP_SIZE_WORDS}"
wiki_data_chunked.save_to_disk(
    os.path.join(LOCAL_DATASET_FOLDER, FIXED_LENGTH_CHUNK_DATASET_NAME)
)

Saving the dataset (0/1 shards):   0%|          | 0/263959 [00:00<?, ? examples/s]


## Paragraph recursive chunking

In [11]:
# Parse main sections and try to use those as chunks,
# Sections that are too long are split by sub-sections, and the same logic is applied recursively
# Parsing sections is done differently depending on the document format. For markdown, we'd split on '#' then '##' etc.
# with this corpus, paragraphs and sections are split with '\n\n' and it's hard to infer sections titles besides checking the size of the section
# We can simply assume that '\n\n' represent relatively good semantic breaks, and recursively use those to break sections that are too long in 'half'

# Makes the assumption that individual paragraphs are all smaller than section size

# Prepend all sections with title and subtitle
# cut on titles, looks at section sizes, if too long, cut

In [12]:
def chunk_text_recursively_per_section(text: str, max_chunk_size_words: int = CHUNK_SIZE_WORDS) -> Iterable[str]:
    SPLIT_STR = "\n\n"
    text_split = text.split(" ")
    if len(text_split) > max_chunk_size_words and SPLIT_STR not in text:
        # We can't split the text further and it's too big, resolve to dummy chunking strategy
        return list(chunk_text_with_fixed_length(text, max_chunk_size_words, OVERLAP_SIZE_WORDS))
    elif len(text_split) <= max_chunk_size_words:
        return [text]
    else:
        # There's at least one split candidate in the text, pick the best one
        # (=the one that looks to be the closest to the middle)
        text_len = len(text)
        all_potential_splits = [m.start() for m in re.finditer(SPLIT_STR, text)]
        all_potential_splits_distances_to_half = [abs(split_ind - text_len//2) for split_ind in all_potential_splits]
        best_split_ind = all_potential_splits[all_potential_splits_distances_to_half.index(min(all_potential_splits_distances_to_half))]

        return chunk_text_recursively_per_section(text[0:best_split_ind], max_chunk_size_words) + chunk_text_recursively_per_section(text[(best_split_ind+len(SPLIT_STR)):], max_chunk_size_words)
        

In [13]:
wiki_data_chunked_recursive = wiki_data.map(
    lambda example_batch: get_chunk_from_batch(example_batch, chunk_text_recursively_per_section),
    batched=True,
    remove_columns=["id", "title", "text", "url"] # Removes columns because of row expansion
)

Map:   0%|          | 0/205328 [00:00<?, ? examples/s]

In [14]:
wiki_data_chunked[0:2]

{'id': [0, 1],
 'url': ['https://simple.wikipedia.org/wiki/April',
  'https://simple.wikipedia.org/wiki/April'],
 'title': ['April', 'April'],
 'original_id': ['1', '1'],
 'text_chunk': ["April is the fourth month of the year in the Julian and Gregorian calendars, and comes between March and May. It is one of four months to have 30 days.  April always begins on the same day of week as July, and additionally, January in leap years. April always ends on the same day of the week as December.  April's flowers are the Sweet Pea and Daisy. Its birthstone is the diamond. The meaning of the diamond is innocence.  The Month   April comes between March and May, making it the fourth month of the year. It also comes first in the year out of the four months that have 30 days, as June, September and November are later in the year.  April begins on the same day of the week as July every year and on the same day of the week as January in leap years. April ends on the same day of the week as December e

In [15]:
# Save data back to disk
RECURSIVE_DATASET_NAME = f"wiki-data-chunked-recursive-CS{CHUNK_SIZE_WORDS}"
wiki_data_chunked_recursive.save_to_disk(
    os.path.join(LOCAL_DATASET_FOLDER, RECURSIVE_DATASET_NAME)
)

Saving the dataset (0/1 shards):   0%|          | 0/271938 [00:00<?, ? examples/s]

## [Optional - can be skipped] Modelling approach

If the previous method yields disappointing results, which could happen if the segmentation of sections is harder to work with, we could use a slightly more esoteric approach using a language model to detects interesting splitting points.

This could be done in different ways which may have varying performance depending on the dataset, a few similar ideas include:
    - Simply ask an LLM for the split points
    - Use an embedding model to capture the semantic meaning of each sentence, and add a split point where the topic seems to shift significantly
    - Use a model trained on 'Next Sentence Prediction' and add a split point where the model confidently says sentences are disconnected.

We'll try the latter here with [BERT](https://huggingface.co/google-bert/bert-base-uncased). #TODO replace with MiniLM

In [16]:
from transformers import BertTokenizer, BertForNextSentencePrediction
import torch
import nltk

In [17]:
def is_sentence_next(bert_model, bert_tokeniser, sentence_a, sentence_b):
    encoding = bert_tokeniser(sentence_a, sentence_b, return_tensors="pt")
    outputs = bert_model(**encoding, labels=torch.LongTensor([1]))

    # Decision logic to decide if we'd like to break the chunk here.
    # Finding the right threshold/logic requires some trials and errors and probably depends on the dataset used
    return outputs.logits[0, 0] > outputs.logits[0, 1] # Same sentence more likely than random
        

def chunk_text_with_bert(text: str, max_chunk_size_words: int = CHUNK_SIZE_WORDS):
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")
    model.eval()
    model.to("cpu") # switch to CUDA if you have a GPU

    # Split per sentence
    text_sentences = nltk.sent_tokenize(text)

    # Make sure sentences are all small enough to be sent forward. A dummy approach is to
    # truncate those.
    text_sentences = [
        " ".join(sentence.split(" ")[:int(0.8*max_chunk_size_words)])
        for sentence in text_sentences
    ]
    
    chunks = []
    last_sentence = text_sentences[0]
    current_chunk = text_sentences[0]

    for sentence in text_sentences:
        current_chunk_size_words = len(current_chunk.split(" "))
        sentence_size_words = len(sentence.split(" "))
        if current_chunk_size_words + sentence_size_words > max_chunk_size_words:
            # we have to split in any case
            chunks.append(current_chunk)
            current_chunk = sentence
        if is_sentence_next(model, tokenizer, last_sentence, sentence):
            # add to current chunk and continue
            current_chunk = current_chunk + sentence
        else:
            # split
            chunks.append(current_chunk)
            current_chunk = sentence
        last_sentence = sentence
    
    return chunks


In [18]:
# Method is quite slow and would benefit optimisation! for learning purpose only
wiki_data_chunked_w_model = wiki_data.select(range(20)).map( # Truncating dataset on purpose!
    lambda example_batch: get_chunk_from_batch(example_batch, chunk_text_with_bert),
    batched=True,
    remove_columns=["id", "title", "text", "url"], # Removes columns because of row expansion
    batch_size=16
)

Map:   0%|          | 0/20 [00:00<?, ? examples/s]



In [19]:
wiki_data_chunked_w_model[0:2]

{'id': [0, 1],
 'url': ['https://simple.wikipedia.org/wiki/April',
  'https://simple.wikipedia.org/wiki/April'],
 'title': ['April', 'April'],
 'original_id': ['1', '1'],
 'text_chunk': ["April is the fourth month of the year in the Julian and Gregorian calendars, and comes between March and May.April is the fourth month of the year in the Julian and Gregorian calendars, and comes between March and May.It is one of four months to have 30 days.April always begins on the same day of week as July, and additionally, January in leap years.April always ends on the same day of the week as December.April's flowers are the Sweet Pea and Daisy.Its birthstone is the diamond.The meaning of the diamond is innocence.",
  "The Month \n\nApril comes between March and May, making it the fourth month of the year.It also comes first in the year out of the four months that have 30 days, as June, September and November are later in the year.April begins on the same day of the week as July every year and on

In [20]:
# Save data back to disk
MODEL_DATASET_NAME = f"wiki-data-chunked-w-model-CS{CHUNK_SIZE_WORDS}"
wiki_data_chunked_w_model.save_to_disk(
    os.path.join(LOCAL_DATASET_FOLDER, MODEL_DATASET_NAME)
)

Saving the dataset (0/1 shards):   0%|          | 0/75 [00:00<?, ? examples/s]

# Questions-Answers generation
It would be excellent to have human-curated questions answers pairs to evaluate our retrieval logic.
(a bit like in https://www.kaggle.com/datasets/rtatman/questionanswer-dataset?resource=download)

If we can't afford this, we can always generate questions/answers with an LLM as well.

The notebook implements both methods below, if you have time on your hands to annotate things!

In [21]:
dataset_to_add_questions_to = RECURSIVE_DATASET_NAME

In [22]:
wiki_data_chunked = load_from_disk(
    os.path.join(LOCAL_DATASET_FOLDER, dataset_to_add_questions_to)
)

In [23]:
wiki_data_chunked

Dataset({
    features: ['id', 'url', 'title', 'original_id', 'text_chunk'],
    num_rows: 271938
})

## Manual hand-labeling

In [None]:
questions_dataset_name = f"{dataset_to_add_questions_to}-questions"

data = []
while True: # Interrupt when you'd like to stop
    rnd_chunk_id = np.random.randint(len(wiki_data_chunked))
    print("-----------------------")
    print("-----------------------")
    print("New chunk to annotate!")
    print("-----------------------")
    print("-----------------------")
    print(wiki_data_chunked[rnd_chunk_id]["text_chunk"])
    question = input("Type a question:")
    answer = input("Type the answer to the question:")

    new_el = {
        "chunk_id": rnd_chunk_id,
        "question": question,
        "answer" : answer
    }
    data.append(new_el)
    
    print(f"New data point: {new_el}")
    

In [None]:
data[:5]

In [None]:
questions_dataset = Dataset.from_list(data)

In [None]:
questions_dataset.save_to_disk(
    os.path.join(LOCAL_DATASET_FOLDER, questions_dataset_name)
)

## LLM labeling

In [24]:
# For fun, we can tryout the HF inference API
# os.environ["HF_TOKEN_SERVERLESS_API"] = "hf_*"
token = os.environ["HF_TOKEN_SERVERLESS_API"] # ADD YOUR TOKEN TO YOUR ENV! (It's a free service)
client = InferenceClient(
    token=token,
)

In [29]:
def fetch_question_pair_from_llm(text):
    response = client.chat_completion(
    	model="meta-llama/Meta-Llama-3-8B-Instruct",
    	messages=[
            # Prompt can be improved, LLM sometimes outputs things like "Who are notable figures mentioned in this list?"
            # which obviously doesnt work as we won't have access to the list... What would you suggest we change?
            {"role": "user", "content": "You are a helpful assistant. You will receive text chunks in quotes from users that originate from a wikipedia page. Your task will be to create a question/answer pair from this text chunk, with the answer being present in the chunk. Answer the query in the form [question] END_QUESTION [answer], nothing more. Please write short questions!"},
            {"role": "assistant", "content": "Sure! understood."},
            {"role": "user", "content": f"'{text}'"}],
    	max_tokens=50,
    )

    llm_output = response.choices[0]["message"]["content"]
    
    result = llm_output.split(" END_QUESTION ")
    if len(result) != 2:
        print("LLM Failed! returning None")
        return None, None
    
    question, answer = result
    return question, answer

    


In [30]:
# Test it once before sending multiple requests
fetch_question_pair_from_llm(
    wiki_data_chunked[0]["text_chunk"]
)

("What are April's flowers?", 'Sweet Pea and Daisy.')

In [None]:
questions_llm_data = []

In [31]:
N_REQUESTS = 100

for _ in tqdm(range(N_REQUESTS)):
    rnd_chunk_id = np.random.randint(len(wiki_data_chunked))
    text_chunk = wiki_data_chunked[rnd_chunk_id]["text_chunk"]
    print("--- New ELEMENT ---")
    print("Fetching a question for:")
    print(text_chunk)
    print("LLM answered:")

    question, answer = fetch_question_pair_from_llm(text_chunk)
    
    new_el = {
        "chunk_id": rnd_chunk_id,
        "question": question,
        "answer" : answer
    }
    questions_llm_data.append(new_el)
    
    print(f"New data point: {new_el}")

  0%|                                                                                       | 0/100 [00:00<?, ?it/s]

--- New ELEMENT ---
Fetching a question for:
Vega has not much of the elements heavier than helium. Vega is a suspected variable star that may vary slightly in magnitude in a periodic manner. It is spinning rapidly with a velocity of 274 km/s at the equator. This is causing the equator to bulge outward because of centrifugal effects, and, as a result, there is a variation of temperature across the star's photosphere that reaches a maximum at the poles. From Earth, Vega is being observed from the direction of one of these poles.

Based on an observed excess emission of infrared radiation, Vega appears to have a circumstellar disk of dust. This dust is likely to be the result of collisions between objects in an orbiting disk of debris, similar to the Kuiper belt in the Solar System. Stars that display an infrared excess because of dust emission are termed Vega-like stars. Irregularities in Vega's disk also suggest the presence of at least one planet, likely to be about the size of Jupite

  1%|▊                                                                              | 1/100 [00:00<01:30,  1.09it/s]

New data point: {'chunk_id': 148893, 'question': 'What causes the equator to bulge outward on the star Vega?', 'answer': 'It is spinning rapidly with a velocity of 274 km/s.'}
--- New ELEMENT ---
Fetching a question for:
Cauby Peixoto (10 February 1931 – 15 May 2016) was a Brazilian singer, and actor. His career lasted from the late 1940s until his death in 2016. He is known for his deep voice and hairstyles. His genres include jazz and soft rock.

Peixoto was known for his album "Maxximum". He also appeared in movies. He was known for his appearance as a singer in the movie Com Água na.

He had a brief career in the United States in the 1950s, where he presented under the pseudonyms Ron Coby or Coby Dijon.

Peixoto was born in Niterói, Rio de Janeiro. He studied at a Salesian school in Niteroi. Peixoto died in São Paulo, São Paulo from pneumonia on 15 May 2016, aged 85.

Cauby was named by Time and Life magazines as the Brazilian Elvis Presley.

References

Other websites

 Official s

  2%|█▌                                                                             | 2/100 [00:01<01:33,  1.05it/s]

New data point: {'chunk_id': 173321, 'question': '"Who was Cauby Peixoto named by?"', 'answer': 'Cauby Peixoto was named by Time and Life magazines.'}
--- New ELEMENT ---
Fetching a question for:
Nolberto Solano (born 12 December 1974) is a Peruvian football player. He has played for Peru national team.

Club career statistics

|-
|1994||rowspan="4"|Sporting Cristal||rowspan="4"|Primera División||0||0||||||||||||||0||0
|-
|1995||38||12||||||||||||||38||12
|-
|1996||26||13||||||||||||||26||13
|-
|1997||11||7||||||||||||||11||7

|-
|1997/98||Boca Juniors||Primera División||32||5||||||||||||||32||5

|-
|1998/99||rowspan="6"|Newcastle United||rowspan="6"|Premier League||29||6||7||0||1||0||2||0||39||6
|-
|1999/00||30||3||3||0||1||0||6||1||40||4
|-
|2000/01||33||6||1||1||4||0||colspan="2"|-||38||7
|-
|2001/02||37||7||5||1||4||0||6||4||52||12
|-
|2002/03||31||7||1||0||1||0||12||1||43||8
|-
|2003/04||12||0||2||0||1||0||5||1||20||1
|-
|2003/04||rowspan="3"|Aston Villa||rowspan="3"|Premier Leagu

  3%|██▎                                                                            | 3/100 [00:03<01:41,  1.05s/it]

New data point: {'chunk_id': 96462, 'question': 'When was Nolberto Solano born?', 'answer': '12 December 1974'}
--- New ELEMENT ---
Fetching a question for:
Okinawa is the name for the biggest island in the Ryūkyū Islands, far south of Japan. It is also the common name for Okinawa Prefecture, which controls the Southern Ryūkyū Islands.

The capital of Okinawa is Naha. Naha is on the island with the most people, Okinawa island. Okinawa used to be called Great Lew Chew Island.

Okinawa is home to the Okinawan people, who also inhabit the minor islands surrounding Okinawa as well. There are also many Japanese in Okinawa.

Many of Okinawa's islands are scenic, and there are many beaches there. 

The temperature in Okinawa is often warm or hot.  Many animals make their home around the Okinawa islands.  For example, sea turtles, jellyfish, and many kinds of birds all live around Okinawa.

An average person in Okinawa lives to be older than 100 years old. Some people think that is because Oki

  4%|███▏                                                                           | 4/100 [00:04<01:40,  1.05s/it]

New data point: {'chunk_id': 28196, 'question': '"What is the capital of Okinawa?"', 'answer': '"Naha"'}
--- New ELEMENT ---
Fetching a question for:
Lina Anatolyevna Cheryazova (, 1 November 1968 – 23 March 2019) was an Uzbek freestyle skier. She won a gold medal at the 1994 Winter Olympics. Cheryazova was born in Tashkent, Uzbekistan.

Cheryazova died on 23 March 2019 in Novosibirsk, Russia from complications of ALS, aged 50.

References

1968 births
2019 deaths
Deaths from motor neurone disease
Olympic gold medalists
Uzbekistani people
LLM answered:


  5%|███▉                                                                           | 5/100 [00:04<01:29,  1.06it/s]

New data point: {'chunk_id': 202696, 'question': 'What year was Lina Cheryazova born?', 'answer': '1968'}
--- New ELEMENT ---
Fetching a question for:
Jerome (; ; ;  – 30 September 420), was a Christian priest, theologian and historian. He lived in the Roman Empire. He is best known for translating the Bible into Latin (the Vulgate). He is recognised as a saint by the Catholic Church, the Eastern Orthodox Church, the Lutheran Church, and the Church of England (Anglican Communion).

References

Other websites
Saint Jerome at the Catholic Encyclopedia

340s births
420 deaths
Ancient Roman historians
Ancient Roman writers
Bible translators
Christian clergy
Christian saints
Christian theologians
Priests
LLM answered:


  6%|████▋                                                                          | 6/100 [00:05<01:18,  1.19it/s]

New data point: {'chunk_id': 145697, 'question': 'When did Saint Jerome die?', 'answer': '30 September 420'}
--- New ELEMENT ---
Fetching a question for:
A city is a classification of municipalities used in the Canadian province of British Columbia.

List of cities

Notes:

Gallery

References

British Columbia
LLM answered:


  7%|█████▌                                                                         | 7/100 [00:06<01:08,  1.36it/s]

New data point: {'chunk_id': 202868, 'question': 'What classification of municipalities is used in British Columbia?', 'answer': 'A city'}
--- New ELEMENT ---
Fetching a question for:
Sasha Calle (; born August 7, 1995) is an American actress. She is known for her work in the soap opera The Young and the Restless as chef Lola Rosales. She will play the superheroine Supergirl in the DC Extended Universe media franchise, starting in The Flash (2022).

Calle was born in Boston, Massachusetts.  She is of Colombian descent.

References

Other websites

Living people
1995 births
Actors from Boston, Massachusetts
American television actors
American movie actors
LLM answered:


  8%|██████▎                                                                        | 8/100 [00:06<01:05,  1.41it/s]

New data point: {'chunk_id': 251013, 'question': 'Where was Sasha Calle born?', 'answer': 'Boston, Massachusetts'}
--- New ELEMENT ---
Fetching a question for:
Anger Management is a 2003 American slapstick comedy movie directed by Peter Segal. It was written by David S. Dorfman. It stars Adam Sandler, Jack Nicholson, Marisa Tomei, John Turturro, and Woody Harrelson. It was produced by Revolution Studios. It has cameo appearances of Rudy Giuliani, Bob Sheppard, and Derek Jeter. It was released to positive reviews on April 11, 2003.

Other websites
 
 
 
 
 

2003 comedy movies
American comedy movies
Movies set in New York City
Movies directed by Peter Segal
LLM answered:


  9%|███████                                                                        | 9/100 [00:07<01:10,  1.28it/s]

New data point: {'chunk_id': 166746, 'question': 'Who directed the 2003 American slapstick comedy movie Anger Management?', 'answer': 'Adam Sandler'}
--- New ELEMENT ---
Fetching a question for:
El Llano (The Plain) is a Dominican municipality in the Elías Piña province.

Population
The municipality had, in 2010, a total population of 4,193: 2,161 men and 2,032 women. The urban population was  of the total population.

History
El Llano became a municipality by the law 687 of 2 July 1974. Before that date, it was part of Comendador.

Geography
El Llano has an area of . It is to the east of Comendador, next to the San Juan province.

Administrative division
The municipality of El Llano has only one municipal district; this is:

Economy
The main economic activity of the municipality is farming. The main crop is rice.

References 

Settlements in the Dominican Republic
1974 establishments in North America
1970s establishments in the Dominican Republic
LLM answered:


 10%|███████▊                                                                      | 10/100 [00:08<01:10,  1.29it/s]

New data point: {'chunk_id': 44286, 'question': 'What is the name of the province where El Llano is located?', 'answer': 'El Lías Piña'}
--- New ELEMENT ---
Fetching a question for:
Sacriston is a village in County Durham, England.

References

Villages in England
Settlements in County Durham
LLM answered:


 11%|████████▌                                                                     | 11/100 [00:08<01:03,  1.40it/s]

New data point: {'chunk_id': 81689, 'question': 'What is Sacriston?', 'answer': 'A village in County Durham, England.'}
--- New ELEMENT ---
Fetching a question for:
Bulgaria national volleyball team is the national volleyball team of Bulgaria. They have participated in the FIVB world league ever since 1994. Bulgaria played at the first world championship in 1949.

Honors 
 Olympics: Silver medal (1980)
 World Championship: Silver Medal (1970) - Bronze medal (1949, 1952, 1986, 2000)
 World Cup: Bronze medal (2007)
 European Championship: Silver medal (1951)

References

Other websites 
 BG Vollyball

National volleyball teams
Sport in Bulgaria
LLM answered:


 12%|█████████▎                                                                    | 12/100 [00:09<01:04,  1.36it/s]

New data point: {'chunk_id': 164938, 'question': 'What year did Bulgaria first participate in the FIVB world league?', 'answer': '1994'}
--- New ELEMENT ---
Fetching a question for:
24
Volodymyr Bondarenko, 68, Ukrainian politician, Deputy (1996–2014) and Head of the Kyiv City State Administration (2014).
O. Chandrashekar, 86, Indian footballer (national team).
Anatoliy Chizhov, 87, Soviet-born Russian engineer (Progress Rocket Space Centre) and politician, Deputy (1989–1991). (death announced on this date)
Olabiyi Durojaiye, 88, Nigerian politician, Senator (1999–2003), COVID-19.
Dale Derby, 72, American physician and politician, member of the Oklahoma House of Representatives (2016–2018), drowned.
Léopold K. Fakambi, 78, Beninese agronomist and engineer.
Nicholas Felice, 94, American politician, Mayor of Fair Lawn (1972–1974) and member of the New Jersey General Assembly (1982–2002).
Hissène Habré, 79, Chadian politician and convicted war criminal, Prime Minister (1978–1979) and Pres

 13%|██████████▏                                                                   | 13/100 [00:11<01:19,  1.09it/s]

New data point: {'chunk_id': 239965, 'question': 'Here is the question/answer pair:\n\nWho was the English Hall of Fame drummer who died at the age of 80?', 'answer': 'Charlie Watts'}
--- New ELEMENT ---
Fetching a question for:
Stacy Martin (born 1 January 1991) is a French-English actress. She is best known for her role as the younger version of Charlotte Gainsbourg's character Joe in Nymphomaniac. Martin also appeared as J. Paul Getty's secretary in Ridley Scott's All the Money in the World. 

Martin was born in Paris. She lives in London with her partner Daniel Blumberg.

References

Other websites

 

1991 births
Living people
French movie actors
English movie actors
Actors from Paris
Actors from London
LLM answered:


 14%|██████████▉                                                                   | 14/100 [00:11<01:09,  1.23it/s]

New data point: {'chunk_id': 191447, 'question': 'Who is the partner of Stacy Martin?', 'answer': 'Daniel Blumberg'}
--- New ELEMENT ---
Fetching a question for:
Svay Rieng is a province in Cambodia. The capital is Svay Rieng while the largest city is Bavet.

There are 6 districts and 2 municipalities:
Bavet Municipality
Chanthrea
Kampong Rou
Romdoul
Romeas Haek
Svay Chrom
Svay Rieng Municipality
Svay Theab

References

Provinces of Cambodia
Establishments in Cambodia
1907 establishments in Asia
LLM answered:


 15%|███████████▋                                                                  | 15/100 [00:12<01:07,  1.25it/s]

New data point: {'chunk_id': 218937, 'question': 'What is the capital of Svay Rieng province in Cambodia?', 'answer': 'Svay Rieng'}
--- New ELEMENT ---
Fetching a question for:
Annéot is a commune. It is found in the Yonne department in the center of France.

References
INSEE

Communes in Yonne
LLM answered:


 16%|████████████▍                                                                 | 16/100 [00:12<00:59,  1.41it/s]

New data point: {'chunk_id': 134278, 'question': 'Where is Annéot found?', 'answer': 'France'}
--- New ELEMENT ---
Fetching a question for:
Michael English (24 December 1930 – 13 July 2019) was a British Labour Party politician. He was a Member of Parliament for the British Parliament for Nottingham West from 1964 to 1983. 

English died on 13 July 2019 at the age of 88.

References

Other websites
 Peter Fry interview at History of Parliament Online

1930 births
2019 deaths
Former Labour Party (UK) MPs
UK MPs 1964–1966
UK MPs 1970–1974
UK MPs 1974
UK MPs 1974–1979
UK MPs 1979–1983
Politicians from Nottinghamshire
People from Nottingham
People from Liverpool
LLM answered:


 17%|█████████████▎                                                                | 17/100 [00:13<00:56,  1.46it/s]

New data point: {'chunk_id': 207119, 'question': 'What year was Michael English born?', 'answer': '1930'}
--- New ELEMENT ---
Fetching a question for:
There are 93 counties in the U.S. state of Nebraska.

Nebraska's postal abbreviation is NE and its FIPS state code is 31.

County list

|}

Former counties of Nebraska
 Clay (1855-64) Formed from un-organized and dissolved into Gage and Lancaster County.
 Jackson (1855-6) Formed from un-organized and dissolved to the Fillmore County and un-organized.
 Johnson (1855-6) Formed from un-organized and dissolved to un-organized
 Blackbird (1855-88) Formed from Burt County and dissolved to Thurston County
 Loup (1855-6) Formed from Burt and un-organized and then dissolved Madison, Monroe and Platte Counties
 Jones (1856-66) Formed from un-organized and dissolved into Jefferson County.
 Grant, Harrison, Jackson, Lynn, Monroe and Taylor counties listed in 1870 (But no proof on where)

Notes 

Nebraska
LLM answered:


 18%|██████████████                                                                | 18/100 [00:14<00:56,  1.45it/s]

New data point: {'chunk_id': 122070, 'question': 'How many counties are in the state of Nebraska?', 'answer': '93'}
--- New ELEMENT ---
Fetching a question for:
Pamela Ann Melroy (born September 17, 1961) is a retired United States Air Force officer and a former NASA astronaut. She is the Deputy Administrator of NASA since 2021 during the Joe Biden administration. She served as pilot on Space Shuttle missions STS-92 and STS-112 and commanded mission STS-120 before leaving the agency in August 2009. 

Melroy will be inducted into the United States Astronaut Hall of Fame at the Kennedy Space Center Visitor Complex.

References

1961 births
American astronauts
Living people
Scientists from California
Politicians from California
Engineers from California
Military people from California
People from Palo Alto, California
US Democratic Party politicians
LLM answered:


 19%|██████████████▊                                                               | 19/100 [00:15<01:06,  1.21it/s]

New data point: {'chunk_id': 249389, 'question': '"Who is the Deputy Administrator of NASA?"', 'answer': '"Pamela Ann Melroy"'}
--- New ELEMENT ---
Fetching a question for:
30
 Yawovi Agboyibo, 76, Togolese politician, Prime Minister (2006–2007).
 Michael Angelis, 68, English actor (Boys from the Blackstuff, The Liver Birds, Thomas & Friends), heart attack.
 Bobby Dimond, 90, Australian rugby league footballer (Western Suburbs, New South Wales, national team).
 Elsa Dorfman, 83, American photographer, kidney failure.
 Michel Gauthier, 70, Canadian politician, Quebec MNA (1981–1988), Leader of the Opposition (1996–1997) and MP (1994–2007), lung cancer.
 Józef Grzesiak, 79, Polish Olympic bronze medalist boxer (1964), problems caused by dementia.
 Hassan Hosny, 88, Egyptian actor (Nasser 56, El-Limby, Bobbos) and comedian, heart attack.
 Mady Mesplé, 89, French opera singer, problems caused by Parkinson's disease.
 Bobby Morrow, 84, American sprinter, Olympic champion (1956), anemia and 

 20%|███████████████▌                                                              | 20/100 [00:16<01:15,  1.06it/s]

New data point: {'chunk_id': 218660, 'question': 'What was the age of Yawovi Agboyibo?', 'answer': '76'}
--- New ELEMENT ---
Fetching a question for:
The mute e also occurs in the ending of verbs (usually -en).

Consonants 
 b
 c
 d
 f
 g/ch – not pronounced as the English G; the Dutch G is pronounced in the back of the throat with a "scratching" sound. In the south of the Netherlands, the G is spoken differently (so called soft G) than in the north (hard G).
 h
 j – like "y" in you
 k
 l
 m
 n
 p
 q – only used rarely; spoken as k
 r – not like English; the Dutch R is a more rolling R
 s
 t
 v
 w
 x – only used rarely, mostly in foreign words, pronounced as ks
 z

Note: In words that end with "-d", the "-d" is pronounced like "-t".

Grammar 
The grammar of Dutch is slightly more complex than that of English. Word order in sentences is different in complex sentences. The basic simple sentence-structure is subject-verb.
LLM answered:


 21%|████████████████▍                                                             | 21/100 [00:17<01:08,  1.15it/s]

New data point: {'chunk_id': 5588, 'question': 'What diacritic is used for the ending of Dutch verbs?', 'answer': 'mute e'}
--- New ELEMENT ---
Fetching a question for:
The Río Negro department () is a department of Uruguay. The capital is the city of Fray Bentos.

Its ISO 3166-2 code  is UY-RN.

History 
The Río Negro department was created on 20 March 1880. Before that time, its territory was part of the Paysandú department.

Geography

The Río Negro department is in the western part of Uruguay, along the Uruguay river. It has an area of . The population is 54,765 inhabitants (2011 census), one of the departments with fewest inhabitants, for a population density of  inhabitants/km2.

There are two chains of low hills in the department. These chains of low and rounded hills are named cuchillas in the country. The two cuchillas in the department are cuchilla de Navarro to the south and forming the border with the Durazno department, and cuchilla de Haedo that goes from the northeast to

 22%|█████████████████▏                                                            | 22/100 [00:18<01:07,  1.16it/s]

New data point: {'chunk_id': 183521, 'question': 'What is the ISO 3166-2 code of the Río Negro department?', 'answer': 'UY-RN.'}
--- New ELEMENT ---
Fetching a question for:
Concerts
As of 6 October 2020, the only music concert at BMO Field was performed by the progressive rock group Genesis on September 7, 2007.

Milestones

The first goal at BMO Field was scored by Eddie Johnson for the Kansas City Wizards in a 1–0 Major League Soccer win over home side Toronto FC in the stadium opener on April 28, 2007. The first Toronto FC goal at the stadium was Danny Dichio's first-half strike against Chicago Fire on May 12, 2007 (also his club's first MLS goal).

The first goal at BMO Field scored by a Canadian came at the official opening on May 11, 2007, in a U-20 friendly between Canada and Argentina. David Edgar scored a penalty in a 2–1 defeat for Canada, just four minutes after Gómez had scored the first international goal at the stadium.

Costa Rica's Víctor Núñez scored the first senior 

 23%|█████████████████▉                                                            | 23/100 [00:18<01:01,  1.26it/s]

New data point: {'chunk_id': 236143, 'question': "'Who scored the first goal at BMO Field?'", 'answer': 'Eddie Johnson'}
--- New ELEMENT ---
Fetching a question for:
Mills County is the name of two counties in the United States:
 Mills County, Iowa
 Mills County, Texas

Other 
 Roger Mills County, Oklahoma
LLM answered:


 24%|██████████████████▋                                                           | 24/100 [00:19<00:57,  1.32it/s]

New data point: {'chunk_id': 196591, 'question': 'How many counties have the name "Mills County" in the United States?', 'answer': 'Two'}
--- New ELEMENT ---
Fetching a question for:
Mont-Disse is a commune of the Pyrénées-Atlantiques département in the southwestern part of France.

Mont-Disse
LLM answered:


 25%|███████████████████▌                                                          | 25/100 [00:20<00:56,  1.32it/s]

New data point: {'chunk_id': 71102, 'question': 'What commune is located in the Pyrénées-Atlantiques department of France?', 'answer': 'Mont-Disse'}
--- New ELEMENT ---
Fetching a question for:
Prunus japonica (also Cerasus japonica) is a shrub in the genus Prunus.  It is also called Korean cherry, Flowering almond or Oriental bush cherry.  It is used for ornamental use. Its range goes from Central China through to the Korean peninsula. P. maximowiczii, the Miyama cherry is also often called Korean cherry.

Description
The shrub reaches 1.5 m by 1.5 m. Its flowers are hermaphrodite.  They are pollinated by insects. The plant blossoms in May. Its fruit reaches about 14 mm. It is used in making pies.

Every fruit has one seed. The plant usually grows from seed.

Other uses
The leaves of this plant procure a green dye, while the fruit procures a greenish to grayish dye.

Varieties
There are several varieties:
P. japonica eujaponica
P. japonica gracillima
P. kerii
P. japonica nakaii, origi

 26%|████████████████████▎                                                         | 26/100 [00:21<00:59,  1.24it/s]

New data point: {'chunk_id': 125340, 'question': "What is the average size of Prunus japonica's fruit?", 'answer': '14\xa0mm'}
--- New ELEMENT ---
Fetching a question for:
Lyudmila Mikhaylovna Alexeyeva (, , 20 July 1927 — 8 December 2018) was a Russian historian and leading human rights activist. She was a founding member of the Moscow Helsinki Watch Group. Alexeyeva was one of the last Soviet dissidents active in modern Russia. She participated in the Strategy-31 protests. She was born in Yevpatoria, Crimea, Soviet Union.

Alexeyeva died on 8 December 2018 in Moscow at the age of 91.

References

Other websites

 Lyudmila Alexeyeva's blog on LiveJournal
 The Alexeyeva File, National Security Archive Electronic Briefing Book, 20 July 2012
 

1927 births
2018 deaths
Russian writers
Russian historians
Human rights activists
LLM answered:


 27%|█████████████████████                                                         | 27/100 [00:22<01:01,  1.18it/s]

New data point: {'chunk_id': 199669, 'question': 'Where was Lyudmila Mikhaylovna Alexeyeva born?', 'answer': 'Yevpatoria, Crimea, Soviet Union'}
--- New ELEMENT ---
Fetching a question for:
Heinz Oberhummer (19 May 1941 – 24 November 2015) was an Austrian physicist and skeptic. He was born in Bischofshofen and raised in Obertauern, Austria. Oberhummer was professor emeritus of Theoretical Physics at the Atominstitut of the Vienna University of Technology. His main research area was nucleosynthesis.

Oberhummer died of pneumonia in Vienna, Austria, aged 74.

References

Other websites
 Articles of Heinz Oberhummer in arxiv.org
 Kann das alles Zufall sein? - Geheimnisvolles Universum, Ecowin-Verlag, 2008; Award for "Best Popular Science Book in Austria 2009"
 Science Busters
 Cinema and Science (CISCI)
 Neue Erkenntnisse zur Entstehung der Grundlagen für Leben 

1941 births
2015 deaths
Austrian educators
Austrian physicists
Austrian scientists
Deaths from pneumonia
Disease-related deaths

 28%|█████████████████████▊                                                        | 28/100 [00:23<01:02,  1.14it/s]

New data point: {'chunk_id': 169210, 'question': 'Where was Heinz Oberhummer born?', 'answer': 'Heinz Oberhummer was born in Bischofshofen.'}
--- New ELEMENT ---
Fetching a question for:
The Kansas City metropolitan area surrounds the city of Kansas City, Missouri. It includes 15 counties in both Missouri and Kansas. With a population of 2.34 million people, it is the second largest metropolitan area in Missouri behind Greater St. Louis. However, the Kansas city metropolitan area is currently showing a 3.1 percent growth rate. The St. Louis metropolitan area is only growing at a rate of .7 percent.

References 

Kansas City, Missouri
Metropolitan areas of the United States
LLM answered:


 29%|██████████████████████▌                                                       | 29/100 [00:23<00:57,  1.23it/s]

New data point: {'chunk_id': 171027, 'question': 'How large is the Kansas City metropolitan area?', 'answer': '2.34 million people'}
--- New ELEMENT ---
Fetching a question for:
Ranvir Singh (born 11 August 1977) is a British television presenter. She is the Political Editor and newsreader and a deputy presenter for Good Morning Britain.

Early life
Singh was born in 1977 in Preston, Lancashire in a Sikh family. Educated at Kirkham Grammar School, an independent school in Kirkham, Lancashire, she graduated from the University of Lancaster with a degree in English and Philosophy. She then gained a postgraduate qualification in journalism at the School of Journalism, Media and Communication, University of Lancashire.

Career
Singh joined BBC Radio Lancashire in 2002, initially on work experience before being a six-month contract. She then moved to BBC GMR, covering the 2002 Commenwealth Games in Manchester. Singh joined the BBC North West regional news programme North West Tonight in 200

 30%|███████████████████████▍                                                      | 30/100 [00:24<00:54,  1.28it/s]

New data point: {'chunk_id': 249838, 'question': 'Where was Ranvir Singh born?', 'answer': '1977'}
--- New ELEMENT ---
Fetching a question for:
Boones Creek is an unincorporated community and neighborhood of Johnson City, in north Washington County, Tennessee. Almost all of Boones Creek has been annexed by Johnson City and has become a neighborhood of Johnson City. Much of it has the postal addresses of Gray, Tennessee. It follows Boone's Creek and other tributaries of Boone Lake. It is a part of the Tri-Cities area.

History
The community was the first permanent European settlement in Tennessee. It was named for the creek that runs through it. The creek is named for pioneer Daniel Boone.

In the center of Boones Creek is a historic marker that tells the origins of the community's name.  Daniel Boone was a frontiersman and hunted over large areas of the early frontier lands.  On one of these hunting trips, he was chased by the local Indians.  He hid under a waterfall on Boones Creek, n

 31%|████████████████████████▏                                                     | 31/100 [00:25<00:51,  1.34it/s]

New data point: {'chunk_id': 270520, 'question': 'Who was the community of Boones Creek named after?', 'answer': 'Daniel Boone'}
--- New ELEMENT ---
Fetching a question for:
Carlos José Castilho (27 November 1927 – 2 February 1987) was a former Brazilian football player. He has played for Brazil national team.

References

1927 births
1987 deaths
Footballers from Rio de Janeiro
Brazilian association football goalkeepers
LLM answered:


 32%|████████████████████████▉                                                     | 32/100 [00:25<00:49,  1.38it/s]

New data point: {'chunk_id': 82644, 'question': 'Who was the Brazilian football player who played for the Brazil national team?', 'answer': 'Carlos José Castilho'}
--- New ELEMENT ---
Fetching a question for:
Saint-Aubin-de-Branne is a commune. It is found in the Aquitaine region in the Gironde department in the southwest of France.

Communes in Gironde
LLM answered:


 33%|█████████████████████████▋                                                    | 33/100 [00:26<00:48,  1.38it/s]

New data point: {'chunk_id': 75796, 'question': 'What is the region where Saint-Aubin-de-Branne is found?', 'answer': 'Aquitaine'}
--- New ELEMENT ---
Fetching a question for:
Simbox distributions 
Iliad introduced the Simboxes in Italy, a new type of sales point, conceived and created by the French group, Aures and already in use since 2014 on the French market by Free Mobile; these are SIM card vending machines, which allow customers to register and purchase it independently.

Network 
Iliad signed an agreement, respectively, with Cellnex (February 2018) and INWIT (February 2019) in order to install its antennas on their towers.

Iliad Italia utilises, inter alia, CommScope telecommunications equipment and collaborates with Cisco Systems (April 2019) and Nokia (September 2019) to implement a state-of-the-art national network (IPv6) in Italy based on segment routing (SRv6) and to achieve its 5G network.

References

Other websites
  

2010s establishments in Italy
2016 establishments 

 34%|██████████████████████████▌                                                   | 34/100 [00:27<00:45,  1.45it/s]

New data point: {'chunk_id': 210003, 'question': 'Who introduced Simboxes in Italy?', 'answer': 'Iliad'}
--- New ELEMENT ---
Fetching a question for:
Hurricane Nate was an Atlantic hurricane in the 2005 Atlantic hurricane season. It came close to Bermuda, but did not land there. Nate was the 14th named storm and 7th hurricane of the 2005 season. It started southwest of Bermuda on September 5. It then moved northeast very slowly. It passed south of Bermuda and moved into colder waters. Then, it began to weaken. Nate later became part of a larger weather system.

The storm caused no damage because it stayed at sea. However, one person was killed from rip currents that were caused by the storm. Nate caused light rainfall and gusty winds on Bermuda. Canadian Navy ships that were carrying supplies to help people after Hurricane Katrina were slowed down because of Nate.

Hurricanes in Bermuda
2005 Atlantic hurricane season
2005 in Bermuda
2000s in New Jersey
2005 in the United States
LLM ans

 35%|███████████████████████████▎                                                  | 35/100 [00:27<00:49,  1.31it/s]

New data point: {'chunk_id': 78342, 'question': 'When did Hurricane Nate start southwest of Bermuda?', 'answer': 'September 5'}
--- New ELEMENT ---
Fetching a question for:
The Colisée Pepsi (English: Pepsi Coliseum), formerly the Colisée de Québec (English: Quebec Coliseum) was a multi-purpose arena in Quebec City, Quebec. The arena opened on December 8, 1949, and was formerly home to the Quebec Remparts of the Quebec Major Junior Hockey League (QMJHL). It was also the home of the Quebec Nordiques of the World Hockey Association (WHA) and National Hockey League (NHL) from 1972 until 1995 when they relocated to Denver, Colorado to become the Avalanche.

The 1971 Memorial Cup championship series was hosted at the arena and saw the Remparts defeat the Edmonton Oil Kings two games to none. The first game of the 1974 Summit Series between the Canadian WHA all-stars and the Soviet national team was held at the Coliseum, as well as one game in each of the 1976 and 1991 Canada Cups.

Referenc

 36%|████████████████████████████                                                  | 36/100 [00:28<00:46,  1.36it/s]

New data point: {'chunk_id': 158310, 'question': 'What city was the Colisée Pepsi located in?', 'answer': 'Quebec City'}
--- New ELEMENT ---
Fetching a question for:
A visitor center (also called a visitor centre or tourist information center) is a place where people can find information about the place where the visitor center is located. It is generally used by people who are on vacation or holiday so they can learn about things to do and see.

Buildings and structures
Tourism
LLM answered:


 37%|████████████████████████████▊                                                 | 37/100 [00:29<00:55,  1.14it/s]

New data point: {'chunk_id': 181039, 'question': 'What is a typical purpose of a visitor center?', 'answer': 'Visitor centers are used by people who are on vacation or holiday so they can learn about things to do and see.'}
--- New ELEMENT ---
Fetching a question for:
Peyrieu is a commune. It is found in the region Auvergne-Rhône-Alpes in the Ain department  in the east of France.
LLM answered:


 38%|█████████████████████████████▋                                                | 38/100 [00:30<00:53,  1.17it/s]

New data point: {'chunk_id': 70164, 'question': 'What region is Peyrieu found in?', 'answer': 'Auvergne-Rhône-Alpes'}
--- New ELEMENT ---
Fetching a question for:
Kevin Brady Dillon (born August 19, 1965) is an American actor. He played Johnny "Drama" Chase on the HBO comedy series Entourage and Bunny in the war movie Platoon.

References

1965 births
Living people
American movie actors
American television actors
American stage actors
Actors from New York
LLM answered:


 39%|██████████████████████████████▍                                               | 39/100 [00:31<00:46,  1.30it/s]

New data point: {'chunk_id': 242197, 'question': 'When was Kevin Brady Dillon born?', 'answer': 'August 19, 1965'}
--- New ELEMENT ---
Fetching a question for:
Soil structure (clumps)
The smallest parts of soil are sand and silt and clay. Those small parts join to make larger parts we call "clumps" or "aggregates". The clumps are made when sand and silt and clay stick together. The humus and clay and  minerals in the soil are like glue. The glue sticks the sand and silt and clay together and makes clumps. The clumps make shapes by themselves. Some soils have small round clumps. Other soils have large, hard and flat clumps. The soil with small round clumps is best because it lets in air and water. A little glue is best. If the soil has only a little glue there will be space for water and air and the soil will be soft. If the soil has too much glue the soil will be hard. If the soil has no glue, there will be no space in the soil for air and water. A soil with no spaces is not healthy. W

 40%|███████████████████████████████▏                                              | 40/100 [00:31<00:45,  1.31it/s]

New data point: {'chunk_id': 6436, 'question': '"What are the smallest parts of soil?"', 'answer': '"sand and silt and clay"'}
--- New ELEMENT ---
Fetching a question for:
Statenville is a census-designated place (CDP) in the U.S. state of Georgia. It is the county seat of Echols County.

Census-designated places in Georgia (U.S. state)
County seats in Georgia
LLM answered:


 41%|███████████████████████████████▉                                              | 41/100 [00:32<00:41,  1.41it/s]

New data point: {'chunk_id': 227974, 'question': 'What is the county seat of Echols County?', 'answer': 'Statenville'}
--- New ELEMENT ---
Fetching a question for:
The Austin serial bombings were a series of five parcel bomb explosions which occurred from March 2 - 20, 2018 in Austin, Texas, United States. They killed two civilians and the bomber, as well as injuring another six people.

The suspected bomber was Mark Anthony Conditt, age 23, who lived in Pflugerville, Texas, outside Austin.

Background
The Austin Police Department (APD) believe the explosions were connected and considered the possibility that they are racially motivated. They have also warned civilians to not open suspicious packages, and to call the police.

Austin police officially connected the March 2 bombing following the bombings on March 12. None of the packages were mailed, instead they were placed near the individuals' homes. Two of the bombs were triggered upon being picked up, another was triggered upon bein

 42%|████████████████████████████████▊                                             | 42/100 [00:33<00:40,  1.42it/s]

New data point: {'chunk_id': 190011, 'question': 'Who was the first victim of the Austin serial bombings?', 'answer': 'Anthony Stephan House'}
--- New ELEMENT ---
Fetching a question for:
Tendring is a village and civil parish in Tendring district, Essex, England. In 2001 there were 679 people living in Tendring.

References 

Villages in Essex
Civil parishes in Essex
LLM answered:


 43%|█████████████████████████████████▌                                            | 43/100 [00:33<00:39,  1.45it/s]

New data point: {'chunk_id': 126908, 'question': 'What is the population of Tendring?', 'answer': '679'}
--- New ELEMENT ---
Fetching a question for:
A mid-engine layout describes the location of an automobile engine between the front and rear axles.  A physics term, moment of inertia, shows how hard it is to turn a moving object. In a front engine front-wheel drive car, the drive wheels also have to steer the car, causing torque steer (pull to one side during Acceleration). Front-wheel drive can cause the vehicle to oversteer (turn more sharply than the driver expects) in a corner. A front engine rear-wheel drive can have good weight distribution (balance front to rear), but has a higher moment of inertia than mid engine layout. The mid-engine layout has none of these disadvantages. It has better weight distribution and a lower moment of inertia. Its main disadvantage is that the engine, mounted in the middle, leaves much less room for passengers and cargo. However, in racing there is

 44%|██████████████████████████████████▎                                           | 44/100 [00:34<00:43,  1.27it/s]

New data point: {'chunk_id': 173874, 'question': 'What is the main disadvantage of a mid-engine layout in a car?', 'answer': 'The engine, mounted in the middle, leaves much less room for passengers and cargo.'}
--- New ELEMENT ---
Fetching a question for:
is an old province of Japan in the area of Miyagi Prefecture and Iwate Prefecture on the island of Honshū.  Along with Rikuchū and Mutsu Provinces, it was sometimes called .  The history of the province started in 1868 and ended in 1872.

History

in 1868, Rikuaen was separated from Mutsu.

In the Meiji period, the provinces of Japan were converted into prefectures.  Maps of Japan and Rikuzen Province were reformed in the 1870s.

Related pages
 Provinces of Japan
 Prefectures of Japan
 List of regions of Japan
 List of islands of Japan
 Sanriku

References

Other websites 

  Murdoch's map of provinces, 1903

Former provinces of Japan
Iwate Prefecture
Miyagi Prefecture
LLM answered:


 45%|███████████████████████████████████                                           | 45/100 [00:35<00:45,  1.21it/s]

New data point: {'chunk_id': 118948, 'question': 'What is the geographic area where Rikuzen Province was located?', 'answer': 'Iwate Prefecture and Miyagi Prefecture'}
--- New ELEMENT ---
Fetching a question for:
The Armed Forces DNA Identification Laboratory (AFDIL) is a laboratory in America that studies DNA. It is run by the United States Armed Forces.

AFDIL usually uses fingerprints from ID card records.

Forensics
Military of the United States
Research organizations in the United States
LLM answered:


 46%|███████████████████████████████████▉                                          | 46/100 [00:36<00:43,  1.25it/s]

New data point: {'chunk_id': 104448, 'question': 'What is the name of the laboratory in America that studies DNA?', 'answer': 'AFDIL'}
--- New ELEMENT ---
Fetching a question for:
Birthday (Zoroastrianism)  March 26 - Day of Democracy (Mali)  March 28 - Serfs Emancipation Day (Tibet)  March 28 - Teachers' Day (Czech Republic and Slovakia)  March 29 - Boganda Day (Central African Republic)  March 29 - Youth Day (Republic of China)  March 31 - Cesar Chavez Day (United States)  March 31 - Freedom Day (Malta)
LLM answered:


 47%|████████████████████████████████████▋                                         | 47/100 [00:37<00:37,  1.41it/s]

New data point: {'chunk_id': 796, 'question': 'What is the day of democracy?', 'answer': 'March 26'}
--- New ELEMENT ---
Fetching a question for:
Carroll County is a county in the U.S. state of Ohio. In 2010, 28,836 people lived there. The county seat is Carrollton.

1830s establishments in Ohio
1833 establishments in the United States
Ohio counties
LLM answered:


 48%|█████████████████████████████████████▍                                        | 48/100 [00:37<00:35,  1.48it/s]

New data point: {'chunk_id': 223698, 'question': 'Where is the county seat of Carroll County in Ohio?', 'answer': 'Carrollton'}
--- New ELEMENT ---
Fetching a question for:
In 1977, the federal government placed alligators on the endangered species list. They were removed from the endangered list in 1987 and Florida allowed selective hunting in 1988.

Bird and turtle habitats
In 1987, Brevard County, Florida hosted the last member of the Dusky Seaside Sparrow, now extinct. There have been only two extinct bird species since listing of endangered species began in 1973. This event has presented a challenge to ensure that other environmental concerns are addressed quickly.

The Florida Scrub Jay has been thought to be threatened for many years, because the species is territorial and cannot move to better grounds when its habitat is in danger.

Nesting beaches of loggerhead sea turtles are protected.

References

Other websites
The Living Marine Resources of the Western Central Atlantic, F

 49%|██████████████████████████████████████▏                                       | 49/100 [00:38<00:35,  1.44it/s]

New data point: {'chunk_id': 108872, 'question': '"In what year were alligators removed from the endangered species list?"', 'answer': 'In 1987'}
--- New ELEMENT ---
Fetching a question for:
 Authumes (71013)
 Bantanges (71018)
 Beaurepaire-en-Bresse (71027)
 Beauvernois (71028)
 Bellevesvre (71029)
 Bosjean (71044)
 Bouhans (71045)
 Branges (71056)
 Brienne (71061)
 Bruailles (71064)
 Champagnat (71079)
 Charette-Varennes (71101)
 Condal (71143)
 Cuiseaux (71157)
 Cuisery (71158)
 Dampierre-en-Bresse (71168)
 Devrouze (71173)
 Diconne (71175)
 Dommartin-lès-Cuiseaux (71177)
 Flacey-en-Bresse (71198)
 Frangy-en-Bresse (71205)
 Fretterans (71207)
 Frontenard (71208)
 Frontenaud (71209)
 Huilly-sur-Seille (71234)
 Joudes (71243)
 Jouvençon (71244)
 Juif (71246)
 L'Abergement-de-Cuisery (71001)
 La Chapelle-Naude (71092)
 La Chapelle-Saint-Sauveur (71093)
 La Chapelle-Thècle (71097)
 La Chaux (71121)
 La Frette (71206)
 La Genête (71213)
 La Racineuse (71364)
 Lays-sur-le-Doubs (71254)
 L

 50%|███████████████████████████████████████                                       | 50/100 [00:39<00:38,  1.31it/s]

New data point: {'chunk_id': 161881, 'question': 'What is the name of the department for which Louhans is a commune?', 'answer': 'Louhans'}
--- New ELEMENT ---
Fetching a question for:
Hank Stuever is an American journalist working for the Washington Post. He writes for its Style section about entertainment.

Other websites
Hank Stuever on Twitter.

1968 births
Living people
LLM answered:


 51%|███████████████████████████████████████▊                                      | 51/100 [00:39<00:35,  1.39it/s]

New data point: {'chunk_id': 207310, 'question': 'What year was Hank Stuever born?', 'answer': '1968'}
--- New ELEMENT ---
Fetching a question for:
A number and its opposite always add to zero. So the sum of −3 and +3 is 0. We can write this either as −3 + 3 = 0 or as 3 + (− 3) = 0. In addition, a number and its opposite are said to "cancel each other out".

The set of negative real numbers is sometimes written as .

Arithmetic with negative numbers 
 Adding a negative number to something is the same as subtracting a positive number from it. For example, to add the negative number "−1" to the number "9" is the same as subtracting one from nine. In symbols:
 9 + (−1) = 9 − 1 = 8
 Subtracting a negative number from something is the same as adding a positive number to it. For example, to subtract the negative number "−8" from the number "6" is the same as adding the number "6" and the number "8". In symbols:
 6 − (−8) = 6 + 8 = 14
 A negative number multiplied by another negative number p

 52%|████████████████████████████████████████▌                                     | 52/100 [00:41<00:45,  1.06it/s]

New data point: {'chunk_id': 19299, 'question': 'Here are the question/answer pairs:\n\nWhat happens when you add a negative number to something?', 'answer': 'Adding a negative number to something is the same as subtracting a positive number from it.\n\nWhat is the result of subtracting a negative number from'}
--- New ELEMENT ---
Fetching a question for:
Sheila Rena Ingram (March 23, 1957 – September 1, 2020) was an American athlete. She competed mainly in the 400 metres. She was born in Washington, D.C.. She competed for United States in the 1976 Summer Olympics, winning a silver medal.

Ingram died on September 1, 2020 at the age of 63.

References

1957 births
2020 deaths
American track and field athletes
American Olympic silver medalists
Sportspeople from Washington, D.C.
LLM answered:


 53%|█████████████████████████████████████████▎                                    | 53/100 [00:42<00:40,  1.16it/s]

New data point: {'chunk_id': 234549, 'question': 'Where was Sheila Rena Ingram born?', 'answer': 'Washington, D.C.'}
--- New ELEMENT ---
Fetching a question for:
The Asian Handball Nations Championship is the official competition for senior national handball teams of Asia (since 2018, also includes teams from Oceania), and takes place every two years. In addition to crowning the Asian champions, the tournament also serves as a qualifying tournament for the World Championship.

Summary

Medal table

Related pages
Asian Women's Handball Championship
Asian Men's Junior Handball Championship
Asian Men's Youth Handball Championship

Other websites
Asian Handball Federation
Archives at Todor66.com

 
Handball
LLM answered:


 54%|██████████████████████████████████████████                                    | 54/100 [00:42<00:37,  1.22it/s]

New data point: {'chunk_id': 233212, 'question': 'What is the official competition for senior national handball teams of Asia?', 'answer': 'The Asian Handball Nations Championship'}
--- New ELEMENT ---
Fetching a question for:
Factors affecting price elasticity of demand  Factors that affect the price elasticity of demand for a good include:  Availability of substitutes: When substitutes of a good is more available, consumers would be able to choose between the different types of goods available. Hence, a good with more substitutes available would be more price elastic as an increase in price would cause many consumers to change to one of the substitutes.  Closeness of substitutes: If the substitutes of a good is very similar to the original good, consumers are able to change between the different types of goods easily as it makes no difference in using either good. For example, the demand for a specific brand of sugar is likely to be price elastic as the taste of sugar is largely the 

 55%|██████████████████████████████████████████▉                                   | 55/100 [00:44<00:42,  1.06it/s]

New data point: {'chunk_id': 166234, 'question': '"What affects the price elasticity of demand?"', 'answer': 'Availability of substitutes, closeness of substitutes, degree of necessity, and proportion of income.'}
--- New ELEMENT ---
Fetching a question for:
The Citroën Visa is a car produced by Citroën. Built in Rennes between 1978 and 1988 when it was replaced by the AX. Some were used in France by National Police.

Visa
1970s automobiles
1980s automobiles
LLM answered:


 56%|███████████████████████████████████████████▋                                  | 56/100 [00:44<00:39,  1.11it/s]

New data point: {'chunk_id': 139354, 'question': 'What was the production period of the Citroën Visa?', 'answer': '1978-1988'}
--- New ELEMENT ---
Fetching a question for:
Rohrbachgraben is a municipality in the administrative district of Oberaargau in the canton of Berne in Switzerland.

Villages
Glasbach, Liemberg, Ganzenberg, Flückigen, Kaltenegg, Matten, Wald and Wil.

References

Other websites

 Official website of Rohrbachgraben 

Municipalities of Bern
LLM answered:


 57%|████████████████████████████████████████████▍                                 | 57/100 [00:45<00:35,  1.22it/s]

New data point: {'chunk_id': 175075, 'question': 'Where is Rohrbachgraben located?', 'answer': 'Switzerland'}
--- New ELEMENT ---
Fetching a question for:
The large-billed crow (Corvus macrorhynchos) is a type of crow found in East Asia. Sometimes people think that this bird is a raven. This crow is of least concern so it is safe. It is also called the jungle crow. The bird was called a 'gorilla crow' in Japan, this is because people said it looked like one when they were looking at it.

References

Corvids
Birds of Asia
LLM answered:


 58%|█████████████████████████████████████████████▏                                | 58/100 [00:46<00:38,  1.10it/s]

New data point: {'chunk_id': 213150, 'question': 'What is the other name for the large-billed crow?', 'answer': 'Jungle crow'}
--- New ELEMENT ---
Fetching a question for:
A climbing route is a trail that a mountain climber uses to go up a mountain.
It is often simply called a climb.

Examples : D1 on The Diamond and The Nose on El Capitan are climbing routes.

The difficulties and lengths of climbing routes (as well as descriptions and photographies) can be found in books called climbing guides. For instance, :en:Fifty Classic Climbs of North America is a climbing guide that describes climbing routes of North America.

Mountains
Rock climbing
Trails
LLM answered:


 59%|██████████████████████████████████████████████                                | 59/100 [00:47<00:36,  1.13it/s]

New data point: {'chunk_id': 118200, 'question': 'What is a climbing route used for?', 'answer': 'A climbing route is used for a mountain climber to go up a mountain.'}
--- New ELEMENT ---
Fetching a question for:
East Kilbride is a large suburban town in the South Lanarkshire council area of Scotland. It is Scotland's first new town, and lies on high ground on the south side of the Cathkin Braes, about  southeast of Glasgow city centre. The Rotten Calder river flows along the east side of the settlement, northwards toward the River Clyde. The town is also known as the Polo Mint city due to its many roundabouts.

Landmarks

Dollan Baths 
One of the most significant buildings of an earlier phase of development was Dollan Baths leisure complex (opened 1968) which has Grade A listed status.

Twin towns 
  Ballerup, Denmark

Notable people 
 William and John Hunter, medical pioneers, were born at Long Calderwood within the present-day area of East Kilbride.
 Lorraine Kelly, television pres

 60%|██████████████████████████████████████████████▊                               | 60/100 [00:48<00:33,  1.20it/s]

New data point: {'chunk_id': 66827, 'question': 'What is the first new town in Scotland?', 'answer': 'East Kilbride'}
--- New ELEMENT ---
Fetching a question for:
The Church was treated especially badly in lands that Germany took over.  In Austria, Catholic property was taken, Catholic organizations were closed, and many priests were sent to Dachau.  In Czechoslovakia, the Nazis refused to let people follow religious orders; closed schools; made religious teachings illegal; and sent priests to concentration camps.  Catholic people, bishops, clergy, and nuns protested and attacked Nazi policies.

In 1942, Dutch bishops protested the mistreatment of the Jews. When the Dutch Archbishop refused to obey the Nazis, the Gestapo rounded up Catholic "Jews" and sent 92 to Auschwitz  A Dutch Catholic nun named Edith Stein was murdered at Auschwitz.  So was Maximilian Kolbe, a Polish priest.  Both were eventually made into Catholic saints by Pope John Paul II in the 1980s.  Other Catholic victims 

 61%|███████████████████████████████████████████████▌                              | 61/100 [00:48<00:32,  1.21it/s]

New data point: {'chunk_id': 171436, 'question': 'What did the Nazis refuse to let people follow in Czechoslovakia?', 'answer': 'Catholic religious orders'}
--- New ELEMENT ---
Fetching a question for:
State Grid Corporation of China (SGCC; ) is a company owned by the government of the People's Republic of China which provides electricity to Northern China, Northeastern China, Eastern China, Central China and Northwestern China.

Website
 State Grid Corporation of China

China
Electric power companies
LLM answered:


 62%|████████████████████████████████████████████████▎                             | 62/100 [00:49<00:30,  1.26it/s]

New data point: {'chunk_id': 148078, 'question': 'Who is the owner of State Grid Corporation of China?', 'answer': "Government of the People's Republic of China"}
--- New ELEMENT ---
Fetching a question for:
Saint-Vaast-sur-Seulles is a commune. It is found in the region Basse-Normandie in the Calvados department in the northwest of France.

Communes in Calvados
LLM answered:


 63%|█████████████████████████████████████████████████▏                            | 63/100 [00:50<00:29,  1.25it/s]

New data point: {'chunk_id': 77643, 'question': 'What region is Saint-Vaast-sur-Seulles found in?', 'answer': 'Basse-Normandie'}
--- New ELEMENT ---
Fetching a question for:
Högsby Municipality () is a municipality in Kalmar County in southern Sweden. The seat is in Högsby.

References

Other websites
 Högsby Municipality

Hogsby Municipality
LLM answered:


 64%|█████████████████████████████████████████████████▉                            | 64/100 [00:51<00:26,  1.34it/s]

New data point: {'chunk_id': 143940, 'question': 'Where is the seat of Högsby Municipality?', 'answer': 'Högsby'}
--- New ELEMENT ---
Fetching a question for:
Because composers were now writing opera it was important for the audience to hear the words clearly. In the Renaissance the groups of a choir were often singing several different words using different melodies all at once. This was called “polyphony”. Polyphony was widely used in instrumental music, but was not used in opera, which needed to tell a story without being confusing.

When a soloist in an opera sings a song (an aria) the aria is in a particular mood. They called this “affection”. There were several “affections” or moods: there were arias about revenge, jealousy, anger, love, despair, peaceful happiness etc. Each movement in a concerto also had one particular mood. Music from later periods is different. For example: Haydn in the Classical Period would often change its mood during a piece.

Suite
The Baroque suite is a

 65%|██████████████████████████████████████████████████▋                           | 65/100 [00:51<00:25,  1.37it/s]

LLM Failed! returning None
New data point: {'chunk_id': 24270, 'question': None, 'answer': None}
--- New ELEMENT ---
Fetching a question for:
Aftermath
Chapman did not try to get away, and was reading a book, The Catcher in the Rye, when police came to the scene. They arrested Chapman, who later pled guilty to Lennon's murder, telling the court God had told him to do so. He was sentenced to twenty-five years to life in prison in 1981. He later wrote to Yoko Ono, trying to apologize to her and explain his actions, but she did not answer. In 1985, actor Mark Lindsay Chapman lost the chance to portray Lennon in a movie about his life with Ono, because his name was similar to that of Lennon's killer.

When Chapman became eligible for parole, he applied but was turned down. Ono gave a statement to the court, telling them that Lennon's death still hurt her and their son Sean Lennon every day, and they still felt his loss. She reminded them of how Chapman's act was only one among many celebri

 66%|███████████████████████████████████████████████████▍                          | 66/100 [00:52<00:25,  1.34it/s]

New data point: {'chunk_id': 39159, 'question': 'What book was John Lennon reading when police arrived?', 'answer': 'The Catcher in the Rye'}
--- New ELEMENT ---
Fetching a question for:
Igael Tumarkin (Hebrew: יגאל תומרקין; 23 October 1933 – 12 August 2021) was an Israeli painter and sculptor.

Tumarkin's best known works are the Holocaust and Revival memorial in Rabin Square, Tel Aviv and his sculptures honoring dead soldiers in the Negev.

References

1933 births
2021 deaths
German sculptors
German painters
Naturalized citizens of Israel
Israeli sculptors
Israeli painters
People from Dresden
LLM answered:


 67%|████████████████████████████████████████████████████▎                         | 67/100 [00:53<00:25,  1.32it/s]

New data point: {'chunk_id': 258347, 'question': 'Who was Igael Tumarkin?', 'answer': 'Igael Tumarkin'}
--- New ELEMENT ---
Fetching a question for:
(4) There are a large number of synonyms of the herb names, in both Chinese, English and Latin. This makes it even more essential to use the written form of the Chinese names to ensure accuracy.

(5) Some herbs are prone to adulteration, substitution, or both. For example, Ling Zhi Cao (Cordyceps) can be substituted with an inactive form of the herb. If an extract is first made with water, the herb will be tasteless when dried and re-used/recycled. On the other hand, the genuine Ling Zhi tastes bitter, but few consumers would know the difference, especially if the herb is mixed in a prescription and boiled together with other herbs. The pharmacists themselves may not be aware, as some may trust their suppliers implicitly, and do not conduct regular testing.

(6) Animal-based "herbs" are prone to substitution or adulteration, including Xion

 68%|█████████████████████████████████████████████████████                         | 68/100 [00:54<00:28,  1.11it/s]

New data point: {'chunk_id': 70715, 'question': 'What is an example of an herb prone to adulteration?', 'answer': 'Ling Zhi Cao (Cordyceps).'}
--- New ELEMENT ---
Fetching a question for:
Deer Creek is a town of Oklahoma in the United States.

Towns in Oklahoma
LLM answered:


 69%|█████████████████████████████████████████████████████▊                        | 69/100 [00:55<00:24,  1.28it/s]

New data point: {'chunk_id': 207841, 'question': 'Where is Deer Creek located?', 'answer': 'Oklahoma'}
--- New ELEMENT ---
Fetching a question for:
The Wolf of Wall Street is an 2013 American black comedy movie directed by Martin Scorsese and produced by Leonardo DiCaprio. The movie is about a stockbroker who works on Wall Street in New York City during the 1990s. He runs a company that commits stock fraud and insider trading. The story is based on the memoirs of Jordan Belfort.

DiCaprio plays Belfort (the stockbroker), Matthew McConaughey plays Mark Hanna, Jonah Hill plays Donnie Azoff, Rob Reiner plays Max Belfort and Kyle Chandler plays Denham. P. J. Byrne, Jon Favreau, Joanna Lumley, Margot Robbie, Spike Jonze, and Edward Herrmann also appear in the movie. The movie is 179 minutes long. Some sex scenes were cut away to prevent MPAA from giving NC-17. It was released on December 25, 2013. It was nominated for five Academy Awards, including Best Picture.

Release dates

Other websit

 70%|██████████████████████████████████████████████████████▌                       | 70/100 [00:56<00:31,  1.04s/it]

New data point: {'chunk_id': 148860, 'question': 'What year was the movie "The Wolf of Wall Street" released?', 'answer': '2013'}
--- New ELEMENT ---
Fetching a question for:
Ugni is a genus of plants of the myrtle family (Myrtaceae). Four species belong to this genus, all from western America.

Description

They are small evergreen shrubs. The leaves are simple, entire, opposite, elliptical; they are 1–2 cm long and 0.2-2.5 cm broad, dark green, and with a spicy scent if broken into many small pieces.

The solitary flowers are usually hanging; they are 1–2 cm diameter with four or five white or pale pink petals and many short stamens. The fruit is a small red or purple berry, 1 cm in diameter, with many seeds.

Name
The name comes from Uñi with which the Mapuches (native people from south-central Chile and southwestern Argentina) name the fruits of the best known species of the genus, Ugni molinae.

The genus was formerly often included in either Myrtus or Eugenia; it is distinguished

 71%|███████████████████████████████████████████████████████▍                      | 71/100 [00:57<00:29,  1.02s/it]

New data point: {'chunk_id': 143487, 'question': 'Here is the first question/answer pair:\n\nWhat is the family of plants that Ugni belongs to?', 'answer': 'Myrtaceae'}
--- New ELEMENT ---
Fetching a question for:
Miaphysitism (or henophysitism) is an idea about the nature of Christ. The idea says that Jesus Christ had two different aspects, one godly, and one human. It says these two aspects are united in one nature. They are indistinguishable, and they co-exist.  This is very close to the idea of dualism which says that the mind and the body are separate things that combine to make one unit, the person. In the case of miaphysitism, the separate things are Jesus' divine and human traits.  

Miaphysitism has often been considered by Chalcedonian Christians to be a form of monophysitism, but the Oriental Orthodox Churches themselves reject this characterization, Recently, the Eastern Orthodox and Roman Catholic Churches have begun to take this position more seriously.

References

Chris

 72%|████████████████████████████████████████████████████████▏                     | 72/100 [00:59<00:32,  1.18s/it]

New data point: {'chunk_id': 54805, 'question': 'What is miaphysitism?', 'answer': 'According to the text, miaphysitism is an idea about the nature of Christ that says Jesus Christ had two different aspects, one godly, and one human, united in one nature.'}
--- New ELEMENT ---
Fetching a question for:
Gabriella Tucci (4 August 1929 – 11 July 2020) was an Italian operatic soprano. She was born in Rome. She made her debut at La Scala in Milan in 1959, as Mimi in La bohème.

Tucci performed as Donna Elvira in Don Giovanni, Elvira in I puritani, Gilda in Rigoletto, Violetta in La traviata,  and Marguerite in Faust, as well as Maddalena in Andrea Chénier and the title role in Tosca.

Tucci died on 11 July 2020 in Rome, aged 90.

References

Other websites
 GabriellaTucci.it

1929 births
2020 deaths
Italian opera singers
Italian stage actors
Italian singers
Actors from Rome
LLM answered:


 73%|████████████████████████████████████████████████████████▉                     | 73/100 [00:59<00:26,  1.00it/s]

New data point: {'chunk_id': 229394, 'question': 'Where was Gabriella Tucci born?', 'answer': 'Rome'}
--- New ELEMENT ---
Fetching a question for:
Covington County is the name of two counties in the United States:
 Covington County, Alabama
 Covington County, Mississippi
LLM answered:


 74%|█████████████████████████████████████████████████████████▋                    | 74/100 [01:00<00:21,  1.19it/s]

New data point: {'chunk_id': 208163, 'question': 'Where is Covington County located?', 'answer': 'Alabama and Mississippi'}
--- New ELEMENT ---
Fetching a question for:
Kama Sutra: A Tale of Love is a 1996 erotic romance movie. While this movie was being shot in India, the name of the movie was kept secret because the Indian government would not have allowed filming with the name "Kama Sutra". It was instead named "Maya & Tara". When the movie was finished, it was banned in India due to the erotic scenes, especially homosexual scenes.

In the United States, this movie was originally rated NC-17. It was later trimmed and re-rated R.

Other websites 

1996 movies
1990s LGBT movies
1990s romantic drama movies
English-language movies
Indian movies
LLM answered:


 75%|██████████████████████████████████████████████████████████▌                   | 75/100 [01:01<00:21,  1.19it/s]

New data point: {'chunk_id': 143372, 'question': 'What was the original rating of Kama Sutra: A Tale of Love in the United States?', 'answer': 'NC-17'}
--- New ELEMENT ---
Fetching a question for:
Margival is a commune. It is found in the region Picardie in the Aisne department in the north of France.

Communes in Aisne
LLM answered:


 76%|███████████████████████████████████████████████████████████▎                  | 76/100 [01:02<00:20,  1.17it/s]

New data point: {'chunk_id': 72234, 'question': 'What is the region where Margival is located?', 'answer': 'Margival is located in the region Picardie.'}
--- New ELEMENT ---
Fetching a question for:
Fünfstetten is a municipality in the district of Donau-Ries in Bavaria in Germany. The mayor is currently Christa Lechner of the CSU party.

References 

Donau-Ries
LLM answered:


 77%|████████████████████████████████████████████████████████████                  | 77/100 [01:02<00:20,  1.14it/s]

New data point: {'chunk_id': 182650, 'question': 'What is the current mayor of Fünfstetten?', 'answer': 'Christa Lechner'}
--- New ELEMENT ---
Fetching a question for:
Nabil Elaraby (Arabic: نبيل العربي; born 15 March 1935) is an Egyptian diplomat. He was Secretary-General of the Arab League from 1 July 2011 to 3 July 2016. He was Foreign Affairs Minister of Egypt in Essam Sharaf's government from March to June 2011.

Since December 2008 he is the Director of the Regional Cairo Centre for International Commercial Arbitration.

References

1935 births
Living people
Ambassadors of Egypt
Egyptian politicians
People from Cairo
Secretaries General of the Arab League
LLM answered:


 78%|████████████████████████████████████████████████████████████▊                 | 78/100 [01:04<00:21,  1.05it/s]

New data point: {'chunk_id': 195419, 'question': 'Who was the Secretary-General of the Arab League from 1 July 2011 to 3 July 2016?', 'answer': 'Nabil Elaraby'}
--- New ELEMENT ---
Fetching a question for:
Jeotgalicoccus psychrophilus is a gram-positive bacterium.
The cells are coccoid. It is psychrophilic. It growth between 4 and 34 °C.

It belongs to the family Staphylococcaceae.

References

Gram-positive bacteria
LLM answered:


 79%|█████████████████████████████████████████████████████████████▌                | 79/100 [01:05<00:19,  1.05it/s]

New data point: {'chunk_id': 163544, 'question': 'What is the type of bacterium Jeotgalicoccus psychrophilus?', 'answer': 'Gram-positive bacterium'}
--- New ELEMENT ---
Fetching a question for:
Brave is a free web browser made by Brave Software, Inc. based on the Chromium web browser, which is the browser Google Chrome is also based on. The browser web blocks ads and website trackers. Brave has a Get paid to surf business model, where users get small amounts of a cryptocurrency called the Basic Attention Token for seeing ads inside the browser, which they can give to their favorite authors.

As of 2019, Brave has been created for Windows, macOS, Linux, Android, and iOS. It includes 5 search engines, including duckduckgo as the main.

References 

Web browsers
LLM answered:


 80%|██████████████████████████████████████████████████████████████▍               | 80/100 [01:05<00:17,  1.14it/s]

New data point: {'chunk_id': 210775, 'question': 'What platforms does Brave browser support?', 'answer': 'Android, iOS, Linux, macOS, Windows'}
--- New ELEMENT ---
Fetching a question for:
There are 816 communes in the Aisne département in France.

Aisne
LLM answered:


 81%|███████████████████████████████████████████████████████████████▏              | 81/100 [01:06<00:15,  1.20it/s]

New data point: {'chunk_id': 71779, 'question': 'How many communes are in the Aisne département in France?', 'answer': '816'}
--- New ELEMENT ---
Fetching a question for:
Brian Roy "Spinner" Spencer (September 3, 1949 – June 3, 1988) was a Canadian professional ice hockey left winger. He played a career total of 10 seasons in the National Hockey League (NHL). He played for the Toronto Maple Leafs, New York Islanders, Buffalo Sabres and Pittsburgh Penguins.

Career
Spencer was drafted 55th overall by the Toronto Maple Leafs in the 1969 NHL Amateur Draft. On December 12, 1970, He was called up to play for the Leafs in what would be his first NHL game on television. Spencer phoned his father Roy in British Columbia to tell him to watch the game that night on Hockey Night in Canada. There was an interview with Spencer between periods of the game. However, in British Columbia, instead of the Toronto Maple Leafs/Chicago Black Hawks game being shown, CBC Television aired a game featuring the 

 82%|███████████████████████████████████████████████████████████████▉              | 82/100 [01:07<00:14,  1.22it/s]

New data point: {'chunk_id': 158877, 'question': 'What team was Spencer drafted by in the 1969 NHL Amateur Draft?', 'answer': 'Toronto Maple Leafs'}
--- New ELEMENT ---
Fetching a question for:
Osteoporosis is the weakening of bones in the body. It is caused by lack of calcium deposited in the bones. This lack of calcium causes the bones to become brittle. They break easily. Side effects include limping. Some symptoms late in the disease include pain in the bones, bones breaking very easily and lower back pain due to spinal bone fractures. It is more likely for a woman to get osteoporosis than a man. Elderly people are more likely to develop osteoporosis than younger people.  The amount of calcium in the bones decreases as a person gets older. There are three kinds of osteoporosis.

There is no cure for osteoporosis. A person can keep it from happening by exercising and taking the right amount of calcium each day. 75 million people in the United States, Japan, and Europe have osteoporo

 83%|████████████████████████████████████████████████████████████████▋             | 83/100 [01:08<00:14,  1.21it/s]

New data point: {'chunk_id': 88389, 'question': 'Here are the questions and answers:\n\nWho is more likely to get osteoporosis?', 'answer': 'A woman.'}
--- New ELEMENT ---
Fetching a question for:
Lowell Fillmore "Sly" Dunbar (born 10 May 1952) is a Jamaican reggae drummer. He is best known as one half of the prolific Jamaican rhythm section and reggae production duo Sly and Robbie.

References

1952 births
Living people
Jamaican entertainers
Reggae musicians
Drummers
Record producers
LLM answered:


 84%|█████████████████████████████████████████████████████████████████▌            | 84/100 [01:08<00:12,  1.32it/s]

New data point: {'chunk_id': 266116, 'question': "What is Sly Dunbar's birth year?", 'answer': '1952'}
--- New ELEMENT ---
Fetching a question for:
Bengt Walter Feldreich (12 September 1925 – 21 October 2019) was a Swedish radio and television journalist, producer and television presenter. He worked for public service between 1950 until 1985. He produced and hosted the show "Snillen spekulerar" on SVT, the annual Christmas Eve broadcasts from the same channel. He was born in Stockholm.

Feldreich died on 21 October 2019 in Stockholm at the age of 94, from pneumonia.

References

Deaths from pneumonia
Television producers
Swedish radio personalities
Swedish television personalities
Swedish television presenters
Swedish journalists
2019 deaths
1925 births
LLM answered:


 85%|██████████████████████████████████████████████████████████████████▎           | 85/100 [01:09<00:11,  1.34it/s]

New data point: {'chunk_id': 210489, 'question': 'Who was Bengt Walter Feldreich?', 'answer': 'Bengt Walter Feldreich'}
--- New ELEMENT ---
Fetching a question for:
Immunology is the study of the immune system. The immune system is the parts of the body which work against infection and parasitism by other living things. Immunology deals with the working of the immune system in health and diseases, and with malfunctions of the immune system.

An immune system is present in all plants and animals. We know this because biologists have found genes coding for toll-like receptors in many different metazoans. These toll-like receptors can recognise bacteria as 'foreign', and are the starting-point for immune reactions. The type of immunity which is triggered by the toll-like receptors is called innate immunity. This is because it is entirely inherited in our genome, and is fully working as soon as our tissues and organs are properly developed.

Vertebrates, and only vertebrates, have a second

 86%|███████████████████████████████████████████████████████████████████           | 86/100 [01:10<00:11,  1.24it/s]

New data point: {'chunk_id': 666, 'question': 'What is the study of the immune system?', 'answer': 'Innate immunity.'}
--- New ELEMENT ---
Fetching a question for:
Lincoln Park is a designated community area in North Side, Chicago, Illinois.

As of 2015, the neighborhood is primarily made up of young urban professionals, recent college graduates, and young families. The slang terms Trixie and Chad have their origins in Lincoln Park. The 2003 Chicago balcony collapse occurred in Lincoln Park.

Community areas of Chicago
LLM answered:


 87%|███████████████████████████████████████████████████████████████████▊          | 87/100 [01:10<00:09,  1.39it/s]

New data point: {'chunk_id': 152813, 'question': 'What neighborhood is Lincoln Park located in North Side of?', 'answer': 'Chicago'}
--- New ELEMENT ---
Fetching a question for:
Politics 
Egypt is a country that has had many different rulers and many political systems.  After World War II, Egypt was still ruled by a king, Farouk of Egypt (11 February 1920 – 18 March 1965). He was the last ruler of the Muhammad Ali dynasty.

Farouk was overthrown on 23 July 1952 by a military coup. The coup was led by Muhammad Naguib, and Gamal Abdel Nasser. From then on, Egypt had military rulers or rulers who had the backing of the army and many citizens. 

Nasser became President, from 1956 to 1970. Later rulers were Anwar Sadat, and Hosni Mubarak.  

Abdel Fattah el-Sisi became President in 2014.

Revolution of 2011 

In January 2011, thousands of protesters gathered in Cairo. They wanted Hosni Mubarak to leave office. He had been the President for almost 30 years. On February 11, 2011, Vice Preside

 88%|████████████████████████████████████████████████████████████████████▋         | 88/100 [01:12<00:11,  1.04it/s]

LLM Failed! returning None
New data point: {'chunk_id': 446, 'question': None, 'answer': None}
--- New ELEMENT ---
Fetching a question for:
A politician (from Classical Greek πόλις, "polis") is a person active in party politics, or a person holding or seeking office in government.  In democratic countries, politicians seek elective positions within a government through elections or, at times, temporary appointment to replace politicians who have died, resigned or have been otherwise removed from office.  In non-democratic countries, they employ other means of reaching power through appointment, bribery, revolutions and intrigues.    

Some politicians are experienced in the art or science of government.  Politicians propose, support and create laws or policies that govern the land and, by extension, its people.  The word politician is sometimes replaced with the euphemism statesman.  Basically, a "politician" can be anyone who seeks to achieve political power in any bureaucratic instit

 89%|█████████████████████████████████████████████████████████████████████▍        | 89/100 [01:13<00:12,  1.10s/it]

LLM Failed! returning None
New data point: {'chunk_id': 19994, 'question': None, 'answer': None}
--- New ELEMENT ---
Fetching a question for:
Bayerbach is a municipality in Landshut in Bavaria in Germany.

Geography 
It is on the Bayerbacher Bach, a stream which flows into the Kleine Laber.

Subdivisions 
Bayerbach bei Ergoldsbach consists of 22 villages:

Neighbouring communities 
 Ergoldsbach
 Postau
 Weng
 Mengkofen (Dingolfing-Landau)
 Laberweinting (Straubing-Bogen)
 Mallersdorf-Pfaffenberg (Straubing-Bogen)

References

Landshut (district)
LLM answered:


 90%|██████████████████████████████████████████████████████████████████████▏       | 90/100 [01:14<00:10,  1.02s/it]

New data point: {'chunk_id': 266791, 'question': 'What is the name of the river that flows into the Kleine Laber?', 'answer': 'Bayerbacher Bach'}
--- New ELEMENT ---
Fetching a question for:
Usually, people only see hagfish when nets that sweep the sea floor are pulled up. Every fish, even the dead ones at the bottom of the sea, are brought up into the boat by the net. In some of those dead fish, hagfish are found eating. The smelly fish are dumped onto the deck of ships with the hagfish poking out from their bodies.

Slime 
When hagfish are afraid, they make slime. This slime comes out of the sides of the hagfish's body. They are able to make enough slime to completely fill a two-gallon bucket. The reason such a small fish can make so much slime is because the slime comes out in strings that quickly swell up much bigger when they are in the water.Their unusual way of eating and their slime has made many people call the hagfish the most "disgusting" of all sea creatures. Although hagfi

 91%|██████████████████████████████████████████████████████████████████████▉       | 91/100 [01:15<00:08,  1.12it/s]

LLM Failed! returning None
New data point: {'chunk_id': 94226, 'question': None, 'answer': None}
--- New ELEMENT ---
Fetching a question for:
Romualdas Ozolas (January 31, 1939 – April 6, 2015) was a Lithuanian politician, activist, writer and educator. He was a member of the Lithuanian branch of the Communist Party of the Soviet Union from 1973 to 1990. He was also a member of the Lithuanian independence movement, the Sąjūdis Initiative Group from 1988 to 1990. 

In 1988 he founded the nationalist Vilnija organization. He joined the Lithuanian Centre Union political party in 1993 and became its chairman. In 1996, he was elected to the Seimas and served until 2000.

References

1939 births
2015 deaths
Lithuanian politicians
Communists
Activists
Lithuanian writers
Lithuanian educators
LLM answered:


 92%|███████████████████████████████████████████████████████████████████████▊      | 92/100 [01:15<00:06,  1.19it/s]

New data point: {'chunk_id': 164000, 'question': 'Who was Romualdas Ozolas?', 'answer': 'Romualdas Ozolas'}
--- New ELEMENT ---
Fetching a question for:
The American River is a river in northern California in the United States. It is a tributary of the Sacramento River. The river starts in the Sierra Nevada and flows west through the city of Sacramento. The river is  long with a watershed of .

Folsom Dam and Folsom Lake are located on the American river east of Sacramento.

Tributaries
North Fork American River
Middle Fork American River
South Fork American River

Rivers of California
El Dorado County, California
Placer County, California
Sacramento County, California
LLM answered:


 93%|████████████████████████████████████████████████████████████████████████▌     | 93/100 [01:16<00:05,  1.37it/s]

New data point: {'chunk_id': 146240, 'question': 'Where does the American River flow?', 'answer': 'Long'}
--- New ELEMENT ---
Fetching a question for:
The southwestern border of Connecticut, where it abuts New York State, is marked by a panhandle in Fairfield County, containing Greenwich, Stamford, Fairfield, Westport, Wilton, and Darien, housing some of the wealthiest residents in the world. This irregularity in the boundary is the result of territorial disputes in the late 1600s, culminating with New York giving up its claim to this area, whose residents considered themselves part of Connecticut, in exchange for an equivalent area extending northwards from Ridgefield, Connecticut to the Massachusetts border as well as undisputed claim to Rye, New York.

Areas maintained by the National Park Service include Appalachian National Scenic Trail; Quinebaug & Shetucket Rivers Valley National Heritage Corridor; and Weir Farm National Historic Site.

Cities and towns in Connecticut
 Ansonia
 

 94%|█████████████████████████████████████████████████████████████████████████▎    | 94/100 [01:17<00:04,  1.32it/s]

New data point: {'chunk_id': 14876, 'question': 'What does the name "Connecticut" come from?', 'answer': 'Quinnehtukqut'}
--- New ELEMENT ---
Fetching a question for:
Child abandonment is when a parent leaves their child. There are many causes of child abandonment, including poverty or mental illness. In many countries, if a children are abandoned, they become orphans and live in an orphanage. They are raised there until they reach 18.

In the United States, "Safe Haven Laws" allow parents to leave their infants at certain safe places.  The goal of these laws is to keep parents from leaving their infants in unsafe places, or killing them.  In some states, leaving a baby at a Safe Haven is thought of as child abandonment, and a complaint may be filed in family court.  But as long as the baby has not been hurt, the parents cannot be charged with a crime for leaving their baby at a Safe Haven.  The baby is given to state child protection workers, who find a safe place for the baby and try

 95%|██████████████████████████████████████████████████████████████████████████    | 95/100 [01:18<00:03,  1.32it/s]

New data point: {'chunk_id': 93417, 'question': 'What are the causes of child abandonment?', 'answer': 'Poverty or mental illness.'}
--- New ELEMENT ---
Fetching a question for:
Maurice Far Eckhard Tio (born July 26, 1983 in Barcelona) is a cyclist from Spain.  He has a disability: He has cerebral palsy and is a C2 type athlete. He competed at the 2004 Summer Paralympics in cycling. He competed at the 2008 Summer Paralympics in cycling. He competed at the 2012 Summer Paralympics in cycling. He was the third person to finish in the C2 road trial race.

References 

Spanish cyclists
Living people
1983 births
Spanish Paralympic bronze medalists
2004 Summer Paralympics
2008 Summer Paralympics
2012 Summer Paralympics
People from Barcelona
LLM answered:


 96%|██████████████████████████████████████████████████████████████████████████▉   | 96/100 [01:18<00:03,  1.32it/s]

New data point: {'chunk_id': 140760, 'question': 'Who was the third person to finish in the C2 road trial race?', 'answer': 'Maurice Far Eckhard Tio'}
--- New ELEMENT ---
Fetching a question for:
James Karen (born James Karnofsky; November 28, 1923 – October 23, 2018) was an American actor. He was known for his roles as Martin Frohm in The Pursuit of Happyness and as Ben Hubbard in Superman Returns. He also starred in Poltergeist and The Return of the Living Dead.

Karen was born on November 28, 1923 in Wilkes-Barre, Pennsylvania to a Russian-Jewish family. He studied at Neighborhood Playhouse School of the Theatre. Karen was married to Susan Reed from 1958 until they divorced in 1967. They had one child. He married Alba Francesca in 1986.  

Karen died on October 23, 2018 in Los Angeles at the age of 94. The cause was cardiopulmonary arrest.

References

Other websites
 
 Great Character Actors / James Karen 
Pathmark TV commercial with James Karen (YouTube)

1923 births
2018 deaths
D

 97%|███████████████████████████████████████████████████████████████████████████▋  | 97/100 [01:19<00:02,  1.37it/s]

New data point: {'chunk_id': 137883, 'question': '"Where was James Karen born?"', 'answer': '"Wilkes-Barre, Pennsylvania"'}
--- New ELEMENT ---
Fetching a question for:
The Lavalleja Department () is a department in the southeast of Uruguay. Its capital is Minas.

Departments of Uruguay
LLM answered:


 98%|████████████████████████████████████████████████████████████████████████████▍ | 98/100 [01:20<00:01,  1.46it/s]

New data point: {'chunk_id': 229994, 'question': 'What is the capital of the Lavalleja Department?', 'answer': 'Minas'}
--- New ELEMENT ---
Fetching a question for:
Yanta District (, pinyin:Yàntǎ Qū, meaning "Wild goose Tower District") is a District of Xi'an, it's in southern part of Xi'an. Area of it is 151.44km2, and as of November 2010, 1,178,529 people live in here. The name of Yanta District is come from Big Wild Goose Tower, a tower that builded in 652 A.D. It's famous in culture and tour. The district have many university.

History 

In September 1954, Yanta District was set up. It was merge from Xi'an No.9 District, a part of Chang'an County, a part of Weiqu District and some other place.

In May 1960, some part of Xincheng District, Lianhu District and Beilin District was merged to Yanta District. But that was revoked in April 1962.

In September 1965, Yanta District was changed to Xi'an suburb toghter with Weiyang District, original Epang District and some other place. It's 

 99%|█████████████████████████████████████████████████████████████████████████████▏| 99/100 [01:20<00:00,  1.35it/s]

New data point: {'chunk_id': 126276, 'question': 'What is the name of the tower that Yanta District was named after?', 'answer': 'The Wild Goose Tower'}
--- New ELEMENT ---
Fetching a question for:
Valbroye is a municipality in the Broye-Vully district in the canton of Vaud in Switzerland. On 1 July 2011 the former municipalities of Cerniaz, Combremont-le-Grand, Combremont-le-Petit, Granges-près-Marnand, Marnand, Sassel, Seigneux and Villars-Bramard joined together to become the new municipality of Valbroye.

References

Other websites 

 Official website 

2011 establishments in Switzerland
Municipalities of Vaud
LLM answered:


100%|█████████████████████████████████████████████████████████████████████████████| 100/100 [01:21<00:00,  1.23it/s]

New data point: {'chunk_id': 209089, 'question': 'What is the name of the new municipality created in 2011?', 'answer': 'Valbroye'}





In [38]:
# Filter out None,None tuples if the model output is incorrect
questions_llm_data = [element for element in questions_llm_data if element["question"] is not None and element["answer"] is not None]

In [39]:
questions_dataset_llm = Dataset.from_list(questions_llm_data)

In [40]:
questions_dataset_llm_name = f"{dataset_to_add_questions_to}-questions-llm"

questions_dataset_llm.save_to_disk(
    os.path.join(LOCAL_DATASET_FOLDER, questions_dataset_llm_name)
)

Saving the dataset (0/1 shards):   0%|          | 0/96 [00:00<?, ? examples/s]