## Download Data Set

Download the English training split of a [BBC news article dataset from Hugging Face](https://huggingface.co/datasets/csebuetnlp/xlsum) and preview one record.

In [1]:
from datasets import load_dataset
dataset = load_dataset("csebuetnlp/xlsum", "english", split="train")
dataset[15]

{'id': 'uk-scotland-highlands-islands-51206457',
 'url': 'https://www.bbc.com/news/uk-scotland-highlands-islands-51206457',
 'title': 'New virtual reality experience of Scottish waters',
 'summary': "Scotland's opportunities for sailing and boating on rivers, lochs and seas are being promoted in a new campaign.",
 'text': "A series of 360 degree virtual reality videos have been produced as part of #MustSeaScotland. St Kilda, Islay, Skye and Inverness Marina are among the locations featured. Sail Scotland has created the campaign with other organisations, including the National Trust for Scotland and VisitScotland. The campaign comes during Scotland's Year of Coasts and Waters 2020. All images are the copyright of Airborne Lens."}

## Store Data in Vector Database

Instantiate an in-memory [ChromaDB](https://www.trychroma.com/) vector database client and create a new, empty collection.

In [2]:
import chromadb
chroma_client = chromadb.Client()
collection = chroma_client.create_collection(name="bbc_news_articles")

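Add news articles to the vector database. Each article's summary is the document that gets embedded, and its title and full text are stored alongside it as metadata. Since we aren't providing our own custom embeddings, ChromaDB uses the [Sentence Transformers](https://www.sbert.net/) `all-MiniLM-L6-v2` model to create the embeddings automatically.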

In [3]:
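number_of_articles = 2000

# Build metadata only for the articles being stored, rather than
# converting the entire training split to a Python list first.
metadata = [
    {
        "title": x["title"],
        "text": x["text"],
    } for x in dataset.select(range(number_of_articles))
]

collection.add(
    ids=dataset['id'][:number_of_articles],
    documents=dataset['summary'][:number_of_articles],  # summaries are embedded
    metadatas=metadata,
)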
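
The `ids`, `documents`, and `metadatas` arguments are parallel lists: the entry at position `i` in each list describes the same article. As a minimal sketch of that pairing, the helper below builds the three lists from a sequence of records. The name `build_add_payload` and the toy records are hypothetical and only for illustration, and skipping repeated ids is a precaution assumed here, not something the notebook does.

```python
def build_add_payload(records, limit):
    """Split article records into parallel ids/documents/metadatas lists."""
    ids, documents, metadatas = [], [], []
    seen = set()
    for record in records:
        if len(ids) >= limit:
            break
        if record["id"] in seen:
            continue  # precaution: keep each id only once
        seen.add(record["id"])
        ids.append(record["id"])
        documents.append(record["summary"])  # the text that would be embedded
        metadatas.append({"title": record["title"], "text": record["text"]})
    return ids, documents, metadatas


# Toy records standing in for dataset rows (not real articles).
sample_records = [
    {"id": "a-1", "title": "Title A", "summary": "Summary A", "text": "Text A"},
    {"id": "b-2", "title": "Title B", "summary": "Summary B", "text": "Text B"},
    {"id": "a-1", "title": "Title A", "summary": "Summary A", "text": "Text A"},
    {"id": "c-3", "title": "Title C", "summary": "Summary C", "text": "Text C"},
]

ids, documents, metadatas = build_add_payload(sample_records, limit=3)
print(ids)  # ['a-1', 'b-2', 'c-3']
```

In the notebook, the three returned lists could then be passed to `collection.add(ids=ids, documents=documents, metadatas=metadatas)`.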

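## Get Relevant Data from Vector Database

Query the collection with a natural-language phrase and retrieve the five closest articles. A smaller distance means a closer match.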

In [4]:
collection.query(
    query_texts=["earthquakes and natural disasters"],
    n_results=5
)

{'ids': [['32549706',
   'uk-england-37653952',
   'world-asia-17329880',
   'science-environment-32472310',
   'world-10785301']],
 'distances': [[0.8070799112319946,
   0.8324653506278992,
   0.9004838466644287,
   0.9371232986450195,
   1.0706489086151123]],
 'metadatas': [[{'text': 'By Kate RaviliousScience writer Chile\'s earthquake barely made the news, whilst Nepal\'s has brought complete and utter devastation. How did two such similar earthquakes have such disparate effects? A huge part of the answer is, of course, building standards and wealth. Since Chile\'s terrible M9.5 earthquake in 1960, where over 5,500 people died, the country has taken big steps in modernising its buildings, designing them to withstand the shaking produced by great earthquakes. Meanwhile, in Nepal, few buildings were up to code, and many toppled when the earthquake struck. But wealth and building codes don\'t tell the entire story: the geology is different, too. Nepal sits on a continental collision zo
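
The result is a dictionary of nested lists, with one inner list per entry in `query_texts`. The sketch below shows one way to flatten the first query's results into rows. The helper name `flatten_query_result` is hypothetical. The ids and distances are copied from the output above, while the titles are placeholders, because the metadata in that output is cut off.

```python
def flatten_query_result(result, query_index=0):
    """Turn one query's nested result lists into a list of row dicts."""
    ids = result["ids"][query_index]
    distances = result["distances"][query_index]
    metadatas = result["metadatas"][query_index]
    return [
        {"id": doc_id, "distance": distance, "title": meta.get("title")}
        for doc_id, distance, meta in zip(ids, distances, metadatas)
    ]


# Stand-in for the dictionary returned by collection.query().
example_result = {
    "ids": [[
        "32549706",
        "uk-england-37653952",
        "world-asia-17329880",
        "science-environment-32472310",
        "world-10785301",
    ]],
    "distances": [[
        0.8070799112319946,
        0.8324653506278992,
        0.9004838466644287,
        0.9371232986450195,
        1.0706489086151123,
    ]],
    # Placeholder titles: the real metadata is not fully shown above.
    "metadatas": [[{"title": f"placeholder title {n}"} for n in range(5)]],
}

rows = flatten_query_result(example_result)
for row in rows:
    print(f'{row["distance"]:.3f}  {row["id"]}  {row["title"]}')
```

In the output above the distances increase down the list, so the first row is the closest match to the query.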

## Query Large Language Model