# Workshop 1 - Question and Answers
In this workshop, you will learn how to write prompts and feed them into LLMs. You
will also learn how to use different prompting techniques to improve the responses
from the LLM.

## Loading and Exploring the Dataset
The workshop uses the [`facebook/ExploreToM`](https://huggingface.co/datasets/facebook/ExploreToM) dataset from [HuggingFace](https://huggingface.co).

In [1]:
# TODO: Load the following libraries: datasets
from datasets import load_dataset

In [2]:
# Dataset name
dataset_name = "facebook/ExploreToM"

# TODO: load dataset
ds = load_dataset(dataset_name)

In [3]:

# TODO: number of rows in the dataset
print(ds.shape)

# TODO: Keys in the dataset
print(ds.keys())

# TODO: Feature names
print(ds['train'].features)

# TODO: Display a single row
idx = 100
for k, v in ds['train'][idx].items():
   print(f'key = {k}, value = {v}')

{'train': (13309, 18)}
dict_keys(['train'])
{'story_structure': Value('string'), 'infilled_story': Value('string'), 'question': Value('string'), 'expected_answer': Value('string'), 'qprop=params': Value('string'), 'qprop=nth_order': Value('int64'), 'qprop=non_unique_mental_state': Value('bool'), 'sprop=is_false_belief_story_1st': Value('bool'), 'sprop=is_false_belief_story_1st_and_2nd': Value('bool'), 'sprop=story_accuracy_1st_raw': Value('float64'), 'sprop=story_accuracy_1st_infilled': Value('float64'), 'sprop=global_idx': Value('int64'), 'param=story_type': Value('string'), 'param=num_stories_total': Value('int64'), 'param=max_sentences': Value('int64'), 'param=num_people': Value('int64'), 'param=num_moves': Value('int64'), 'param=num_rooms': Value('int64')}
key = story_structure, value = Leslie entered the main tent. Leslie left the main tent. Isabella entered the storage trailer. Isabella moved the stuffed rabbit to the wooden chest, which is also located in the storage trailer. Le

In [4]:
# TODO: import pipeline
from transformers import pipeline

## `pipeline`
[`pipeline`](https://huggingface.co/docs/transformers/en/main_classes/pipelines) is an easy-to-use API for running inference. It wraps the task-specific pipelines and hides most of their complexity, so you can focus on the model and the task.

You can use `pipeline` for summarisation, image classification, audio generation, and more. The full list of supported `pipeline` tasks is [here](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.pipeline.task).

In [5]:
# TODO: Create a question-answering pipeline with an explicitly chosen model
qna = pipeline('question-answering', model="distilbert/distilbert-base-cased-distilled-squad")


Device set to use cpu


In [8]:
# Prepare a question
idx = 10
question = ds['train'][idx]['question']
#story = ds['train'][idx]['story_structure']  # alternative context: the terse story
story = ds['train'][idx]['infilled_story']  # context used here: the narrative story
expected_ans = ds['train'][idx]['expected_answer']

# query the model
predicted_ans = qna(question=question, context=story)

print(story)
print(question)
print('------------------------------')
print(predicted_ans)
print(expected_ans)

The bustling theater was a hive of activity on this chilly autumn evening, its worn wooden floors and faded velvet curtains a testament to years of countless performances. The dimly lit green room, tucked away backstage, was a cozy refuge from the chaos, its plush armchairs and ornate wooden chest offering a warm respite for those who needed it. Dylan slipped away from the crowd, disappearing behind a tattered curtain as he made his way to a more secluded space. The soft glow of a floor lamp in the green room enveloped him, providing a sense of tranquility amidst the pre-show chaos. Dylan carefully placed the pocket watch in the ornate wooden chest, hidden from view, and Clayton caught a glimpse of this sneaky maneuver from his secret vantage point. Clayton's eyes narrowed as he pushed open the creaky door and stepped into the green room, the sudden movement making the softly lit space seem almost anticipatory. In a swift motion, Dylan delved into the leather satchel, repositioning its
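
The `question-answering` pipeline returns a dictionary with `answer`, `score`, `start` and `end` keys, so printing it next to `expected_ans` leaves the comparison to the eye. The following is a minimal sketch of an automated check, assuming SQuAD-style normalisation (lowercasing, dropping punctuation and English articles). The helpers `normalise` and `exact_match` and the `sample_prediction` dictionary are hypothetical illustrations and are not part of the workshop code.

```python
import re
import string

def normalise(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(predicted: str, expected: str) -> bool:
    """True when both answers are identical after normalisation."""
    return normalise(predicted) == normalise(expected)

# Hypothetical prediction, shaped like a question-answering pipeline result.
sample_prediction = {"score": 0.42, "start": 10, "end": 29, "answer": "The production room."}

print(exact_match(sample_prediction["answer"], "production room"))
```

In the cell above, the same check could be applied as `exact_match(predicted_ans['answer'], expected_ans)`.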

## Manual Inference - Question and Answer
In this section, we will look at what `pipeline` does under the hood to perform its inference. This will give us a better understanding of the major steps involved.

In [9]:
# TODO: load tokenizer
from transformers import AutoTokenizer

## DistilBERT base cased distilled SQuAD
DistilBERT is a small, fast, cheap and light Transformer model trained by distilling BERT base. More details [here](https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).

In [10]:
model_name = "distilbert/distilbert-base-cased-distilled-squad"

In [11]:
# TODO: Create a tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)


In [12]:
# TODO: Encode text
text = 'Thou shall not create a machine in the likeness of the human mind'

# encode the text
enc_text = tokenizer(text, return_tensors='pt')

# print the tokenized text
print(enc_text)


{'input_ids': tensor([[  101,   157, 14640,  4103,  1136,  2561,   170,  3395,  1107,  1103,
          1176,  1757,  1104,  1103,  1769,  1713,   102]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}


In [13]:
print(enc_text.input_ids[0].shape)

torch.Size([17])


In [None]:
for i in enc_text.input_ids[0]:
   print(i, tokenizer.decode(i))

In [None]:
# TODO: Encoding multiple texts
message = [
   "Big black bug bleeds black blood",
   "To call a pipeline on many items, you can call it with a list.",
   "Custom Generative AI Systems for Enterprises"
]

enc_message = tokenizer(message, return_tensors='pt', padding=True)

print(enc_message)

{'input_ids': tensor([[  101,  2562,  1602, 15430, 24752,  1116,  1602,  1892,   102,     0,
             0,     0,     0,     0,     0,     0,     0,     0],
        [  101,  1706,  1840,   170, 15826,  1113,  1242,  4454,   117,  1128,
          1169,  1840,  1122,  1114,   170,  2190,   119,   102],
        [  101, 25456,  9066, 15306, 19016,  6475,  1111, 16949,   102,     0,
             0,     0,     0,     0,     0,     0,     0,     0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]])}
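
The output above shows what `padding=True` does: shorter sequences are extended with the padding id `0` up to the length of the longest sequence, and the attention mask marks real tokens with `1` and padding with `0`. A minimal pure-Python sketch of that idea follows. `pad_batch` is a hypothetical helper written for illustration, not the tokenizer's actual implementation; the token ids are copied from the output above.

```python
def pad_batch(batch, pad_id=0):
    """Right-pad every id list to the longest length and build attention masks."""
    max_len = max(len(ids) for ids in batch)
    input_ids, attention_mask = [], []
    for ids in batch:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)             # real ids, then padding
        attention_mask.append([1] * len(ids) + [0] * n_pad)  # 1 = real, 0 = padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# First two encoded messages from the output above, without padding.
batch = [
    [101, 2562, 1602, 15430, 24752, 1116, 1602, 1892, 102],
    [101, 1706, 1840, 170, 15826, 1113, 1242, 4454, 117, 1128,
     1169, 1840, 1122, 1114, 170, 2190, 119, 102],
]

padded = pad_batch(batch)
print(padded["input_ids"][0])
print(padded["attention_mask"][0])
```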


In [22]:
# TODO: Decode text
dec_message = tokenizer.decode(enc_message.input_ids[0], skip_special_tokens=True)

print(dec_message)

Big black bug bleeds black blood


## Working with LLMs
Create an instance of the Large Language Model (LLM). We will then create a simple
prompt, tokenize it, and feed the tokenized prompt to the LLM. The response
from the LLM will then be decoded into human-readable text.

In [19]:
# TODO: Load libraries
from transformers import AutoModelForQuestionAnswering

In [20]:
# TODO: Load question answer model
model = AutoModelForQuestionAnswering.from_pretrained(model_name)


In [23]:
# TODO: Encode context and question
idx = 50
question = ds['train'][idx]['question']
story = ds['train'][idx]['story_structure']
#story = ds['train'][idx]['infilled_story']
expected_ans = ds['train'][idx]['expected_answer']

#TODO: tokenize the question and story


In [24]:
# TODO: Tokenize the inputs
inputs = tokenizer(question, story, return_tensors='pt', padding=True)

print(inputs)


{'input_ids': tensor([[  101,  1130,  1134,  1395,  1674, 23929,  8495,  1389,  1341,  1115,
         20587,  1209,  3403,  1111,  1103,  5444,   136,   102, 23929,  8495,
          1389,  2242,  1103,  1707,  1395,   119, 23929,  8495,  1389,  1427,
          1103,  5444,  1106,  1103, 22823,  2884,   117,  1134,  1110,  1145,
          1388,  1107,  1103,  1707,  1395,   119, 23929,  8495,  1389,  1427,
          1103,  5444,  1106,  1103,  5439,  2068, 20492,   117,  1134,  1110,
          1145,  1388,  1107,  1103,  1707,  1395,   119,  1799,  1142,  2168,
          1108,  5664,   117,  6010,  9491,  1142,  2168,  1107,  3318,   113,
          1105,  1178,  1142,  2168,   114,   119,  6010,  2242,  1103,  1707,
          1395,   119, 20587,  2242,  1103,  1707,  1395,   119,  6010,  1427,
          1103,  5444,  1106,  1103,  5828,  5092,  9055,   117,  1134,  1110,
          1145,  1388,  1107,  1103,  1707,  1395,   119,   102]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1

In [25]:
# Pass the question and story/context to the model
result = model(input_ids=inputs.input_ids, attention_mask=inputs.attention_mask)

print(result)

QuestionAnsweringModelOutput(loss=None, start_logits=tensor([[ -5.7032,  -7.5931,  -7.8459,  -9.3639, -10.1221,  -8.7499,  -9.9979,
         -10.7957,  -9.2579,  -9.9846,  -9.4543,  -9.9322,  -8.5640,  -9.3167,
          -9.1187,  -9.1940,  -7.6195,  -7.1851,  -3.6156,  -8.7231,  -9.2405,
          -4.0915,   2.3793,   2.5612,  -5.5461,  -6.3208,  -2.5113,  -8.1883,
          -8.5451,  -4.6734,  -5.9906,  -5.5206,  -5.8241,   0.0854,   0.5292,
          -4.8312,  -7.8227,  -7.5964,  -8.1513,  -7.8242,  -6.0995,  -1.7951,
           1.1506,   1.5036,  -5.9132,  -6.2261,  -2.8108,  -8.6988,  -9.0689,
          -5.2408,  -6.5220,  -6.3959,  -7.1263,  -1.7462,  -1.0422,  -4.2214,
          -6.6802,  -8.8318,  -8.5046,  -8.9208,  -8.3934,  -7.5931,  -3.1230,
          -0.1627,   0.3004,  -6.4978,  -7.7543,  -7.3457,  -7.8658,  -8.8552,
         -10.3558,  -9.7628,  -9.0849,  -1.3072,  -8.3199,  -7.7853,  -9.4434,
          -9.0954,  -7.9018,  -9.0172, -10.4548,  -9.4144,  -8.7126, -10.0420,

In [27]:
# Ensure minimum and maximum token length in the answer
import torch
def ensure_size(input_ids, answer, min_length = 2, max_length = 5):
   ans_start = torch.argmax(answer['start_logits'])
   ans_end = torch.argmax(answer['end_logits']) + 1
   ans_length = ans_end - ans_start
   if ans_length < min_length:
      ans_end = min(ans_start + min_length, len(input_ids[0]))
   elif ans_length > max_length:
      ans_end = ans_start + max_length
   ans_start = max(0, ans_start)
   ans_end = min(len(input_ids[0]), ans_end)
   return (ans_start, ans_end)

In [28]:
# TODO: Return an answer span of 2 to 5 tokens (the defaults of ensure_size)
bounds = ensure_size(inputs.input_ids, result)
print(bounds)

(tensor(23), tensor(25))
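
`ensure_size` takes the argmax of the start and end logits independently, so the end can land before the start, in which case it falls back to a fixed-length span. A common alternative is to score every valid pair jointly. The sketch below is one possible way to do that in pure Python, assuming the logits have been converted to plain lists (for example with `.tolist()`). `softmax`, `best_span` and the toy logits are hypothetical illustrations, not the method used in this workshop.

```python
import math

def softmax(logits):
    """Turn a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def best_span(start_logits, end_logits, max_length=5):
    """Return (start, end_exclusive, score) maximising p_start * p_end
    over spans with start <= end and at most max_length tokens."""
    p_start = softmax(start_logits)
    p_end = softmax(end_logits)
    best = (0, 1, 0.0)
    for s, ps in enumerate(p_start):
        for e in range(s, min(s + max_length, len(p_end))):
            score = ps * p_end[e]
            if score > best[2]:
                best = (s, e + 1, score)
    return best

# Toy logits where the independent argmaxes disagree:
# the best start is index 2, but the best end is index 1.
start_logits = [-5.0, 0.1, 2.5, -3.0, -4.0]
end_logits = [-6.0, 3.0, -2.0, 2.0, -5.0]

start, end, score = best_span(start_logits, end_logits)
print(start, end, round(score, 3))
```

The returned bounds use the same convention as `ensure_size` (exclusive end), so they could be used to slice `inputs.input_ids[0]` in the same way.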


In [32]:
# get the answer from the story
enc_answer = inputs.input_ids[0][bounds[0]: bounds[1]]

print(enc_answer)

print(tokenizer.decode(enc_answer))
print(expected_ans)

tensor([1707, 1395])
production room
production room


In [None]:
# TODO: Try this yourself

context = """
Dickens wrote A Christmas Carol during a period when the British were exploring and re-evaluating past Christmas traditions, 
including carols, and newer customs such as cards and Christmas trees. He was influenced by the experiences of his own youth and 
by the Christmas stories of other authors, including Washington Irving and Douglas Jerrold. Dickens had written three Christmas 
stories prior to the novella, and was inspired following a visit to the Field Lane Ragged School, one of several establishments for 
London's street children. The treatment of the poor and the ability of a selfish man to redeem himself by transforming into a more 
sympathetic character are the key themes of the story. There is discussion among academics as to whether this is a fully secular 
story or a Christian allegory.
"""

question = "How many stories has Dickens written?"

