LLM Take-home assignment round
Solve these questions programmatically. Try to answer as many as possible and to finish within 48 hours. Some of them may require exploration, so do try.
1. Use a pre-trained google/flan-t5-small as the model.
2. Verify if the summarization task works.
3. Verify if the Q&A task works.
4. Verify if English to French translation task works.
5. Programmatically print the names of all the model layers and their dimensions.
6. Programmatically print the total number of parameters/weights in this model.
7. Set the tensor in final layer (decoder.final_layer_norm.weight) to all zeros.
8. Verify if the Q&A task works after resetting the weights of the above layer.
9. Replace the decoder.final_layer_norm.weight with a layer of smaller dimensions and adjust all the dependent layers to match the dimension.
10. Reload the original google/flan-t5-small model.
11. Train the model for a Q&A task that takes a context as additional input along with the question. You can use the SQuAD dataset (https://rajpurkar.github.io/SQuAD-explorer/) or the smaller TopiOCQA dataset (https://mcgill-nlp.github.io/topiocqa/). Choose an appropriate task prefix/trigger word and justify the choice.
12. Evaluate the quality of the model.
The next discussion will be around the solution and getting deeper into certain algorithms.
 

Use a pre-trained google/flan-t5-small as the model.

In [10]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, T5ForConditionalGeneration, T5Tokenizer

In [11]:
MODEL_PATH = "google/flan-t5-small"
# tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
# model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_PATH)
tokenizer = T5Tokenizer.from_pretrained(MODEL_PATH)
model = T5ForConditionalGeneration.from_pretrained(MODEL_PATH)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [5]:
def tokenize_function(prompt, sample, padding="max_length"):
    #add prompt and input:
    inputs = [prompt + sample]
    model_inputs = tokenizer(inputs, max_length=512, padding=padding, truncation=True, return_tensors='pt')
    return model_inputs

Verify if the summarization task works.

In [9]:
# prompt = "We have need to summarize large information and you're given that information  " \
    # "create a short summary of the information that captures key arguments and important information."
# prompt = "Summarize the given text in 'three sentences', focusing on the topic and capture maximum information."
# prompt = "summarize: "
prompt = "summarize the following text : "
# sample = """The sergeant brought out a small, withered monkey's paw, and threw it on the table. "There's magic in what I generally say, so perhaps there may be in this too," he said mysteriously. Mrs. White examined it carefully. It was a dry, shrivelled hand with thick, coarse black hairs. Three wrinkles ran across the palm, and down these ran three more, straight from the wrist. Mrs. White had never seen anything so ugly. "Well, I don't see any good you can get from it," she said as she dropped it back into the little brown box."""
sample = '''The invention of the printing press by Johannes Gutenberg in 1440 marked a revolutionary turning point in human history. This innovation allowed for the mass production of books and other printed materials, which significantly increased literacy rates and the dissemination of knowledge. Before the printing press, information was primarily copied by hand in monasteries, making it a slow and laborious process. Only the wealthy and powerful had access to books and written knowledge. With the arrival of the printing press, information became more accessible to the public, fostering the growth of education, science, and culture. The printing press also played a crucial role in the development of the Renaissance and the Protestant Reformation by enabling the widespread circulation of ideas and religious texts. (Source: Britannica - History of the Printing Press https://www.britannica.com/topic/printing-publishing/The-Gutenberg-press)'''
tokenized_sample = tokenize_function(prompt, sample)

summarized_ids = model.generate(**tokenized_sample, max_new_tokens=256)
summary = tokenizer.decode(summarized_ids[0], skip_special_tokens=True)
summary

'The printing press was a key element in the development of the Renaissance and Protestant Reformation, and was a key factor in the development of the Renaissance.'

In [13]:
#different approach
from transformers import pipeline
summarizer = pipeline("summarization", model=MODEL_PATH)

prompt = "summarize the following text : "
# sample = """The sergeant brought out a small, withered monkey's paw, and threw it on the table. "There's magic in what I generally say, so perhaps there may be in this too," he said mysteriously. Mrs. White examined it carefully. It was a dry, shrivelled hand with thick, coarse black hairs. Three wrinkles ran across the palm, and down these ran three more, straight from the wrist. Mrs. White had never seen anything so ugly. "Well, I don't see any good you can get from it," she said as she dropped it back into the little brown box."""
sample = '''The invention of the printing press by Johannes Gutenberg in 1440 marked a revolutionary turning point in human history. This innovation allowed for the mass production of books and other printed materials, which significantly increased literacy rates and the dissemination of knowledge. Before the printing press, information was primarily copied by hand in monasteries, making it a slow and laborious process. Only the wealthy and powerful had access to books and written knowledge. With the arrival of the printing press, information became more accessible to the public, fostering the growth of education, science, and culture. The printing press also played a crucial role in the development of the Renaissance and the Protestant Reformation by enabling the widespread circulation of ideas and religious texts. (Source: Britannica - History of the Printing Press https://www.britannica.com/topic/printing-publishing/The-Gutenberg-press)'''
sample_summary = summarizer(prompt+sample)
print("summary: " + sample_summary[0]['summary_text'])

Your max_length is set to 200, but your input_length is only 198. Since this is a summarization task, where outputs shorter than the input are typically wanted, you might consider decreasing max_length manually, e.g. summarizer('...', max_length=99)


summary: The printing press was invented by Johannes Gutenberg in 1440, enabling the mass production of books and other printed materials, which significantly increased literacy rates and the dissemination of knowledge.


Verify if the Q&A task works.


In [14]:
prompt = "Answer the following question by reasoning step by step from given context."
question = "What does increased oxygen concentrations in the patient's lungs displace?"
context = """Hyperbaric (high-pressure) medicine uses special oxygen chambers
to increase the partial pressure of O 2 around the patient and, when needed,
the medical staff. Carbon monoxide poisoning, gas gangrene, and decompression
sickness (the 'bends') are sometimes treated using these devices. Increased
O 2 concentration in the lungs helps to displace carbon monoxide from the
heme group of hemoglobin. Oxygen gas is poisonous to the anaerobic bacteria
that cause gas gangrene, so increasing its partial pressure helps kill them.
Decompression sickness occurs in divers who decompress too quickly after
a dive, resulting in bubbles of inert gas, mostly nitrogen and helium, forming
in their blood. Increasing the pressure of O 2 as soon as possible is part
of the treatment."""
sample = "question : " + question + "context : " + context

tokenized_sample = tokenize_function(prompt, sample)
answer_ids = model.generate(**tokenized_sample, max_new_tokens=256)
answer = tokenizer.decode(answer_ids[0], skip_special_tokens=True)
answer

'Carbon monoxide poisoning, gas gangrene, and decompression sickness'
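
The prompt above is assembled by plain string concatenation, so the instruction, the question, and the context run together without separating whitespace (for example `...displace?context : Hyperbaric...`). The sketch below shows one possible way to join the same three pieces with explicit separators. The helper name `build_qa_prompt` and the `question:` / `context:` labels are illustrative assumptions, not a format that flan-t5 requires; the returned string would be tokenized in place of `prompt + sample`.

```python
def build_qa_prompt(instruction: str, question: str, context: str) -> str:
    """Join instruction, question and context with explicit separators.

    Hypothetical helper: the labels are an illustrative choice.
    """
    # Collapse the hard line breaks inside the context into single spaces.
    flat_context = " ".join(context.split())
    parts = [
        instruction.strip(),
        f"question: {question.strip()}",
        f"context: {flat_context}",
    ]
    # A single space between parts keeps the pieces from running together.
    return " ".join(parts)


example = build_qa_prompt(
    "Answer the following question from the given context.",
    "What does increased oxygen concentration displace?",
    "Increased O2 concentration in the lungs\nhelps to displace carbon monoxide.",
)
print(example)
```

Whether this changes the model's answer would need to be checked by re-running the cell; the sketch only removes the formatting ambiguity.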

In [18]:
#different approach
from transformers import pipeline
qa = pipeline("question-answering", model=MODEL_PATH)

prompt = "Answer the following question with reasoning step by step from given context."
question = "What does increased oxygen concentrations in the patient's lungs displace?"
context = """Hyperbaric (high-pressure) medicine uses special oxygen chambers
to increase the partial pressure of O.2 around the patient and, when needed,
the medical staff. Carbon monoxide poisoning, gas gangrene, and decompression
sickness (the 'bends') are sometimes treated using these devices. Increased
O.2 concentration in the lungs helps to displace carbon monoxide from the
heme group of hemoglobin. Oxygen gas is poisonous to the anaerobic bacteria
that cause gas gangrene, so increasing its partial pressure helps kill them.
Decompression sickness occurs in divers who decompress too quickly after
a dive, resulting in bubbles of inert gas, mostly nitrogen and helium, forming
in their blood. Increasing the pressure of O.2 as soon as possible is part
of the treatment."""

question = prompt + ' ' + question

answer = qa(question=question, context=context)
print(answer)

Some weights of T5ForQuestionAnswering were not initialized from the model checkpoint at google/flan-t5-small and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


{'score': 0.00010763200407382101, 'start': 522, 'end': 562, 'answer': '\nDecompression sickness occurs in divers'}

#The question-answering pipeline loads T5ForQuestionAnswering, whose qa_outputs head is newly initialized (see the warning above), so the near-zero score and the extracted span are not meaningful without fine-tuning that head.


Verify if English to French translation task works.

In [16]:
def translate(input_text, src_lang, to_lang):
    prompt = f"Translate {src_lang} to {to_lang}: {input_text}"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    outputs = model.generate(input_ids, max_new_tokens=256)
    # Decoding without skip_special_tokens keeps the leading "<pad>" and the trailing "</s>".
    model_translation = tokenizer.decode(outputs[0])
    # Slice off "<pad>" (5 characters) and "</s>" (4 characters).
    final_translation = model_translation[5:-4]
    return final_translation

sample = """who is the prime minister of India?"""

translated_text = translate(sample, 'English', 'French')
translated_text

" quelle sa premier ministre de l'Inde?"

In [17]:
#different approach
from transformers import pipeline

translator = pipeline('translation_en_to_fr', model=MODEL_PATH)#, src_lang="en", tgt_lang="fr")
# text = "Hello, How is the day?"
text = """who is the prime minister of India?"""


translated_text = translator(text)

print(translated_text[0]['translation_text'])  

qui est le premier ministre de l'Inde?


Programmatically print the names of all the model layers and their dimensions.

In [8]:
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-small")

encoder = model.encoder
decoder = model.decoder

print("Encoder Layers:")
for i, layer in enumerate(encoder.block):
    print(f"Layer {i+1}:")
    print(f"{type(layer).__name__}")
    for name, param in layer.named_parameters():
        print(f"Name: {name}, Dimension: {param.size()}")

Encoder Layers:
Layer 1:
T5Block
Name: layer.0.SelfAttention.q.weight, Dimension: torch.Size([384, 512])
Name: layer.0.SelfAttention.k.weight, Dimension: torch.Size([384, 512])
Name: layer.0.SelfAttention.v.weight, Dimension: torch.Size([384, 512])
Name: layer.0.SelfAttention.o.weight, Dimension: torch.Size([512, 384])
Name: layer.0.SelfAttention.relative_attention_bias.weight, Dimension: torch.Size([32, 6])
Name: layer.0.layer_norm.weight, Dimension: torch.Size([512])
Name: layer.1.DenseReluDense.wi_0.weight, Dimension: torch.Size([1024, 512])
Name: layer.1.DenseReluDense.wi_1.weight, Dimension: torch.Size([1024, 512])
Name: layer.1.DenseReluDense.wo.weight, Dimension: torch.Size([512, 1024])
Name: layer.1.layer_norm.weight, Dimension: torch.Size([512])
Layer 2:
T5Block
Name: layer.0.SelfAttention.q.weight, Dimension: torch.Size([384, 512])
Name: layer.0.SelfAttention.k.weight, Dimension: torch.Size([384, 512])
Name: layer.0.SelfAttention.v.weight, Dimension: torch.Size([384, 512])
Na

In [20]:
print("Decoder Layers:")
for i, layer in enumerate(decoder.block):
    print(f"Layer {i+1}:")
    print(f"{type(layer).__name__}")
    for name, param in layer.named_parameters():
        print(f"Name: {name}, Dimension: {param.size()}")

Decoder Layers:
Layer 1:
T5Block
Name: layer.0.SelfAttention.q.weight, Dimension: torch.Size([384, 512])
Name: layer.0.SelfAttention.k.weight, Dimension: torch.Size([384, 512])
Name: layer.0.SelfAttention.v.weight, Dimension: torch.Size([384, 512])
Name: layer.0.SelfAttention.o.weight, Dimension: torch.Size([512, 384])
Name: layer.0.SelfAttention.relative_attention_bias.weight, Dimension: torch.Size([32, 6])
Name: layer.0.layer_norm.weight, Dimension: torch.Size([512])
Name: layer.1.EncDecAttention.q.weight, Dimension: torch.Size([384, 512])
Name: layer.1.EncDecAttention.k.weight, Dimension: torch.Size([384, 512])
Name: layer.1.EncDecAttention.v.weight, Dimension: torch.Size([384, 512])
Name: layer.1.EncDecAttention.o.weight, Dimension: torch.Size([512, 384])
Name: layer.1.layer_norm.weight, Dimension: torch.Size([512])
Name: layer.2.DenseReluDense.wi_0.weight, Dimension: torch.Size([1024, 512])
Name: layer.2.DenseReluDense.wi_1.weight, Dimension: torch.Size([1024, 512])
Name: layer.2.

Programmatically print the total number of parameters/weights in this model.

In [21]:
total_params = sum(p.numel() for p in model.parameters())

print(f"Total Parameters/Weights in flan-t5-small: {total_params}")

Total Parameters/Weights in flan-t5-small: 76961152
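
As a cross-check, the total can be rebuilt by hand from the dimensions in the layer listing above. The sketch below is plain arithmetic. The block sizes come from the listing; the vocabulary size (32128), the number of blocks per stack (8), and the untied `lm_head` are assumptions taken from the published flan-t5-small configuration rather than from this notebook's output.

```python
# Sizes read from the layer listing above.
D_MODEL = 512      # hidden size
INNER = 384        # rows of the q/k/v projections
D_FF = 1024        # feed-forward size
REL_BIAS = 32 * 6  # relative_attention_bias, present only in the first block of each stack

# Assumptions taken from the published flan-t5-small config, not from the listing.
VOCAB = 32128          # vocabulary size
N_LAYERS = 8           # blocks in the encoder and in the decoder
UNTIED_LM_HEAD = True  # lm_head has its own weight matrix

attention = 4 * INNER * D_MODEL    # q, k, v, o
feed_forward = 3 * D_FF * D_MODEL  # wi_0, wi_1, wo
layer_norm = D_MODEL               # one weight vector, no bias

encoder_block = attention + layer_norm + feed_forward + layer_norm
decoder_block = 2 * (attention + layer_norm) + feed_forward + layer_norm

# Each stack adds one relative_attention_bias and one final_layer_norm.
encoder = N_LAYERS * encoder_block + REL_BIAS + layer_norm
decoder = N_LAYERS * decoder_block + REL_BIAS + layer_norm
embeddings = VOCAB * D_MODEL  # shared input embedding, counted once
lm_head = VOCAB * D_MODEL if UNTIED_LM_HEAD else 0

total = embeddings + lm_head + encoder + decoder
print(f"{total:,}")  # 76,961,152
```

Under these assumptions the hand count agrees with the value printed by `sum(p.numel() for p in model.parameters())`.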


Set the tensor in final layer (decoder.final_layer_norm.weight) to all zeros.
Verify if the Q&A task works after resetting the weights of the above layer.

In [15]:
# for p in model.decoder.parameters():
#     print(p)


In [3]:
decoder = model.decoder
decoder_weight = decoder.final_layer_norm.weight

decoder_weight.data.fill_(0)
model.save_pretrained("modified1_flan-t5-small")

tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 

In [6]:
model1 = T5ForConditionalGeneration.from_pretrained("modified1_flan-t5-small")

prompt = "Answer the following question by reasoning step by step from given context."
question = "What does increased oxygen concentrations in the patient's lungs displace?"
context = """Hyperbaric (high-pressure) medicine uses special oxygen chambers
to increase the partial pressure of O 2 around the patient and, when needed,
the medical staff. Carbon monoxide poisoning, gas gangrene, and decompression
sickness (the 'bends') are sometimes treated using these devices. Increased
O 2 concentration in the lungs helps to displace carbon monoxide from the
heme group of hemoglobin. Oxygen gas is poisonous to the anaerobic bacteria
that cause gas gangrene, so increasing its partial pressure helps kill them.
Decompression sickness occurs in divers who decompress too quickly after
a dive, resulting in bubbles of inert gas, mostly nitrogen and helium, forming
in their blood. Increasing the pressure of O 2 as soon as possible is part
of the treatment."""
sample = "question : " + question + "context : " + context

tokenized_sample = tokenize_function(prompt, sample)
answer_ids = model1.generate(**tokenized_sample, max_new_tokens=256)
answer = tokenizer.decode(answer_ids[0], skip_special_tokens=True)
answer

''

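#After setting decoder.final_layer_norm.weight to all zeros, the Q&A output is an EMPTY string: the norm multiplies its input by this weight, so the decoder's final hidden states become all zeros and the model can no longer produce a meaningful answer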
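
A minimal sketch of why the output collapses, written in plain Python rather than against the real model. T5's layer norm is an RMS-style norm: it rescales the hidden vector and multiplies it elementwise by `weight`, with no bias term. The toy vectors and the three-row "vocabulary" below are hypothetical values chosen only for illustration.

```python
import math


def t5_layer_norm(hidden, weight, eps=1e-6):
    """RMS-style norm: scale by 1/sqrt(mean(x^2) + eps), then by weight (no bias)."""
    variance = sum(x * x for x in hidden) / len(hidden)
    scale = 1.0 / math.sqrt(variance + eps)
    return [w * x * scale for x, w in zip(hidden, weight)]


def lm_head(hidden, rows):
    """Bias-free linear projection: one logit per vocabulary row."""
    return [sum(h * r for h, r in zip(hidden, row)) for row in rows]


hidden = [0.5, -1.2, 2.0, 0.3]         # toy decoder hidden state
vocab_rows = [[0.1, 0.2, -0.3, 0.4],   # toy lm_head weight for 3 "tokens"
              [0.9, -0.5, 0.7, 0.0],
              [-0.2, 0.8, 0.1, 0.6]]

normal_logits = lm_head(t5_layer_norm(hidden, [1.0] * 4), vocab_rows)
zeroed_logits = lm_head(t5_layer_norm(hidden, [0.0] * 4), vocab_rows)

# With a zero weight every logit is zero, so the first index wins the tie.
greedy_token = zeroed_logits.index(max(zeroed_logits))
print(normal_logits)
print(zeroed_logits, greedy_token)
```

In the real model, greedy decoding over all-equal logits would likewise pick the first index, and token id 0 is `<pad>` in the T5 vocabulary. A sequence of `<pad>` tokens decodes to nothing once `skip_special_tokens=True` is applied, which is consistent with the empty string observed above.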

Replace the decoder.final_layer_norm.weight with a layer of smaller dimensions and adjust all the dependent layers to match the dimension

In [67]:
model = T5ForConditionalGeneration.from_pretrained(MODEL_PATH)
model.decoder.final_layer_norm.weight.size()

torch.Size([512])

In [57]:
import torch
new_dim = 256 #new dimension that is lower than the dimension of final_layer_norm

In [58]:
# Decreasing the dimension of the weights of the final_layer_norm
final_layer = model.decoder.final_layer_norm
final_layer.weight = torch.nn.Parameter(torch.randn(new_dim))

In [60]:
#Adjusting the weights of the related layers: every axis of size 512 (d_model) becomes new_dim
#Note: only the tensors are swapped; model.config.d_model, the embeddings and lm_head still use 512
for i, layer in enumerate(model.decoder.block):
    for name, param in layer.named_parameters():
        if 'layer_norm' in name:
            param.data = torch.randn(new_dim)
        elif any(p in name for p in ('Attention.q', 'Attention.k', 'Attention.v')):
            # q/k/v of both SelfAttention and EncDecAttention
            param.data = torch.randn(384, new_dim)
        elif 'Attention.o' in name:
            param.data = torch.randn(new_dim, 384)
        elif 'DenseReluDense.wi_' in name:
            # wi_0 and wi_1
            param.data = torch.randn(1024, new_dim)
        elif 'DenseReluDense.wo' in name:
            param.data = torch.randn(new_dim, 1024)

In [61]:
# Printing updated dimensions of the decoder layers
print("Decoder Layers:")
for i, layer in enumerate(model.decoder.block):
    print(f"Layer {i+1}:")
    print(f"{type(layer).__name__}")
    for name, param in layer.named_parameters():
        print(f"Name: {name}, Dimension: {param.size()}")

Decoder Layers:
Layer 1:
T5Block
Name: layer.0.SelfAttention.q.weight, Dimension: torch.Size([384, 256])
Name: layer.0.SelfAttention.k.weight, Dimension: torch.Size([384, 256])
Name: layer.0.SelfAttention.v.weight, Dimension: torch.Size([384, 256])
Name: layer.0.SelfAttention.o.weight, Dimension: torch.Size([256, 384])
Name: layer.0.SelfAttention.relative_attention_bias.weight, Dimension: torch.Size([32, 6])
Name: layer.0.layer_norm.weight, Dimension: torch.Size([256])
Name: layer.1.EncDecAttention.q.weight, Dimension: torch.Size([384, 256])
Name: layer.1.EncDecAttention.k.weight, Dimension: torch.Size([384, 256])
Name: layer.1.EncDecAttention.v.weight, Dimension: torch.Size([384, 256])
Name: layer.1.EncDecAttention.o.weight, Dimension: torch.Size([256, 384])
Name: layer.1.layer_norm.weight, Dimension: torch.Size([256])
Name: layer.2.DenseReluDense.wi_0.weight, Dimension: torch.Size([1024, 256])
Name: layer.2.DenseReluDense.wi_1.weight, Dimension: torch.Size([1024, 256])
Name: layer.2.

In [63]:
#Adjusting the encoder blocks as well: the decoder's cross-attention consumes the encoder's hidden states, so both stacks must use the same width
#Note: encoder.final_layer_norm is not part of encoder.block, so this loop leaves it at 512
for i, layer in enumerate(model.encoder.block):
    for name, param in layer.named_parameters():
        if 'layer_norm' in name:
            param.data = torch.randn(new_dim)
        elif any(p in name for p in ('SelfAttention.q', 'SelfAttention.k', 'SelfAttention.v')):
            param.data = torch.randn(384, new_dim)
        elif 'SelfAttention.o' in name:
            param.data = torch.randn(new_dim, 384)
        elif 'DenseReluDense.wi_' in name:
            # wi_0 and wi_1
            param.data = torch.randn(1024, new_dim)
        elif 'DenseReluDense.wo' in name:
            param.data = torch.randn(new_dim, 1024)

In [64]:
#printing the updated dimensions of the encoder layers
print("Encoder Layers:")
for i, layer in enumerate(model.encoder.block):
    print(f"Layer {i+1}:")
    print(f"{type(layer).__name__}")
    for name, param in layer.named_parameters():
        print(f"Name: {name}, Dimension: {param.size()}")

Encoder Layers:
Layer 1:
T5Block
Name: layer.0.SelfAttention.q.weight, Dimension: torch.Size([384, 256])
Name: layer.0.SelfAttention.k.weight, Dimension: torch.Size([384, 256])
Name: layer.0.SelfAttention.v.weight, Dimension: torch.Size([384, 256])
Name: layer.0.SelfAttention.o.weight, Dimension: torch.Size([256, 384])
Name: layer.0.SelfAttention.relative_attention_bias.weight, Dimension: torch.Size([32, 6])
Name: layer.0.layer_norm.weight, Dimension: torch.Size([256])
Name: layer.1.DenseReluDense.wi_0.weight, Dimension: torch.Size([1024, 256])
Name: layer.1.DenseReluDense.wi_1.weight, Dimension: torch.Size([1024, 256])
Name: layer.1.DenseReluDense.wo.weight, Dimension: torch.Size([256, 1024])
Name: layer.1.layer_norm.weight, Dimension: torch.Size([256])
Layer 2:
T5Block
Name: layer.0.SelfAttention.q.weight, Dimension: torch.Size([384, 256])
Name: layer.0.SelfAttention.k.weight, Dimension: torch.Size([384, 256])
Name: layer.0.SelfAttention.v.weight, Dimension: torch.Size([384, 256])
Na

In [68]:
model.save_pretrained("new_dim_256_flan-t5-small")
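
The loops above resize only the tensors inside `encoder.block` and `decoder.block`. A few parameters outside the blocks also carry a 512-wide axis, and the saved config still records the original hidden size. The sketch below lists those shapes in plain Python and shows what they would become. The names `shared.weight` and `lm_head.weight` and the vocabulary size 32128 are assumptions based on the standard T5 parameter layout and the published config; they do not appear in the truncated listings of this notebook.

```python
OLD_D_MODEL, NEW_D_MODEL = 512, 256

# Parameters with a d_model axis that live outside encoder.block / decoder.block.
# The vocabulary size 32128 is an assumption taken from the published config.
outside_blocks = {
    "shared.weight": (32128, 512),
    "lm_head.weight": (32128, 512),
    "encoder.final_layer_norm.weight": (512,),
    "decoder.final_layer_norm.weight": (512,),
}

# A few in-block shapes from the listings, for comparison.
inside_blocks = {
    "SelfAttention.q.weight": (384, 512),
    "SelfAttention.o.weight": (512, 384),
    "SelfAttention.relative_attention_bias.weight": (32, 6),
    "DenseReluDense.wi_0.weight": (1024, 512),
}


def resized(shape, old=OLD_D_MODEL, new=NEW_D_MODEL):
    """Replace every axis equal to the old hidden size with the new one."""
    return tuple(new if dim == old else dim for dim in shape)


all_shapes = {**outside_blocks, **inside_blocks}
new_shapes = {name: resized(shape) for name, shape in all_shapes.items()}
for name, shape in new_shapes.items():
    print(f"{name}: {all_shapes[name]} -> {shape}")
```

Matching on the axis size works here only because no other dimension of the model equals 512 (the others are 384, 1024, 32128, 32 and 6). For a model that is meant to be reloaded, building it from a config with the smaller hidden size is probably a safer route than swapping tensors in place.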

The fine-tuning of the flan-t5-small model is implemented in a separate notebook.
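
For the evaluation step in the task list (item 12), a common choice for SQuAD-style Q&A is exact match (EM) together with token-level F1. The sketch below follows the usual SQuAD answer normalization (lowercase, strip punctuation, drop articles, collapse whitespace). Whether the separate notebook uses these metrics is not stated here, and the reference answer in the usage example is an assumed gold answer for the hyperbaric-medicine question.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation, drop articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Prediction copied from the Q&A cell above; the reference is an assumed gold answer.
prediction = "Carbon monoxide poisoning, gas gangrene, and decompression sickness"
reference = "carbon monoxide"
em = exact_match(prediction, reference)
f1 = f1_score(prediction, reference)
print(em, round(f1, 2))
```

Averaging both scores over a held-out split gives one number per metric for comparing the base and fine-tuned models.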