# Workshop 1 - Summarization 

In [5]:
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM, GenerationConfig

## T5 Models

<code>flan-t5</code> is an instruction-tuned Text-To-Text Transfer Transformer (T5) model that can perform zero-shot NLP tasks such as summarization, simple reasoning, and question answering.

Some T5 models on Hugging Face:
- [<code>google/flan-t5-base</code>](https://huggingface.co/google/flan-t5-base)
- [<code>google/flan-t5-small</code>](https://huggingface.co/google/flan-t5-small)
- [<code>google/flan-t5-xl</code>](https://huggingface.co/google/flan-t5-xl)
- [<code>google/flan-t5-xxl</code>](https://huggingface.co/google/flan-t5-xxl) - the largest model

Complete list of [T5 models](https://huggingface.co/models?search=google/flan) on Hugging Face.
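
With a zero-shot model, the task is selected purely by the wording of the prompt. The sketch below shows one way to keep a few task prompts in one place. The template wording and the `build_prompt` helper are assumptions made for illustration, not an official FLAN prompt format.

```python
# Hypothetical prompt templates; the wording is an assumption, not an official format.
TEMPLATES = {
    "summarize": "Write a short summary for this article: {text}",
    "question": "Answer the question based on the context.\nContext: {text}\nQuestion: {question}",
    "sentiment": "Is the sentiment of this review positive or negative? {text}",
}


def build_prompt(task, text, **extra):
    """Fill the template for `task` with the input text and any extra fields."""
    if task not in TEMPLATES:
        raise ValueError(f"unknown task: {task!r}")
    return TEMPLATES[task].format(text=text.strip(), **extra)


print(build_prompt("summarize", "Companies and groups backing carbon-capture technology ..."))
```

The resulting string can be tokenized and passed to the model in the same way as any other prompt.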

In [6]:
model_name = 'google/flan-t5-base'
#model_name = 'sshleifer/distilbart-cnn-12-6'

In [7]:
# TODO: Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

In [8]:
# TODO: Print the model
print(model)

T5ForConditionalGeneration(
  (shared): Embedding(32128, 768)
  (encoder): T5Stack(
    (embed_tokens): Embedding(32128, 768)
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=768, out_features=768, bias=False)
              (k): Linear(in_features=768, out_features=768, bias=False)
              (v): Linear(in_features=768, out_features=768, bias=False)
              (o): Linear(in_features=768, out_features=768, bias=False)
              (relative_attention_bias): Embedding(32, 12)
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerFF(
            (DenseReluDense): T5DenseGatedActDense(
              (wi_0): Linear(in_features=768, out_features=2048, bias=False)
              (wi_1): Linear(in_features=768, out_features=2048, bias=False)
              (wo):
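
The layer shapes in the printout are enough to estimate where the parameters sit. The sketch below is plain arithmetic on the numbers shown above; it skips the `wo` layer because its shape is cut off in the output, so the feed-forward figure is a partial count.

```python
# Sizes read from the printed model above.
d_model, d_ff, vocab = 768, 2048, 32128
num_buckets, num_heads = 32, 12  # relative_attention_bias: Embedding(32, 12)

embedding_params = vocab * d_model          # shared token embedding
attention_params = 4 * d_model * d_model    # q, k, v, o (bias=False)
gated_ff_input_params = 2 * d_model * d_ff  # wi_0 and wi_1 only; wo not counted
head_dim = d_model // num_heads             # width of each attention head

print(f"embedding:          {embedding_params:,}")       # 24,674,304
print(f"self-attention:     {attention_params:,}")       # 2,359,296
print(f"wi_0 + wi_1:        {gated_ff_input_params:,}")  # 3,145,728
print(f"per-head dimension: {head_dim}")                 # 64
```

The shared embedding alone holds roughly ten times as many weights as one self-attention layer.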

In [None]:
text = """ 
Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;

Then took the other, as just as fair,
And having perhaps the better claim,
Because it was grassy and wanted wear;
Though as for that the passing there
Had worn them really about the same,

And both that morning equally lay
In leaves no step had trodden black.
Oh, I kept the first for another day!
Yet knowing how way leads on to way,
I doubted if I should ever come back.

I shall be telling this with a sigh
Somewhere ages and ages hence:
Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.
"""

In [9]:
text = """ 
When a traveler in north central Massachusetts takes the wrong fork
at the junction of the Aylesbury pike just beyond Dean's Corners he
comes upon a lonely and curious country. The ground gets higher, and
the brier-bordered stone walls press closer and closer against the ruts
of the dusty, curving road. The trees of the frequent forest belts
seem too large, and the wild weeds, brambles, and grasses attain a
luxuriance not often found in settled regions. At the same time the
planted fields appear singularly few and barren; while the sparsely
scattered houses wear a surprizing uniform aspect of age, squalor, and
dilapidation. Without knowing why, one hesitates to ask directions
from the gnarled, solitary figures spied now and then on crumbling
doorsteps or in the sloping, rock-strewn meadows. Those figures are
so silent and furtive that one feels somehow confronted by forbidden
things, with which it would be better to have nothing to do. When a
rise in the road brings the mountains in view above the deep woods,
the feeling of strange uneasiness is increased. The summits are too
rounded and symmetrical to give a sense of comfort and naturalness, and
sometimes the sky silhouettes with especial clearness the queer circles
of tall stone pillars with which most of them are crowned.
"""

In [18]:
text = """ 
Companies and groups backing carbon-capture technology, which critics slam as an excuse to keep burning fossil fuels, have deployed more than 500 participants to the COP30 climate talks, according to a list compiled by an NGO and shared exclusively with AFP.
The list, assembled by the Center for International Environmental Law (CIEL), names oil and gas giants such as ExxonMobil, Shell and BP, along with Brazil's state-owned Petrobras and China National Petroleum Corp.
"""

In [19]:
# TODO: Create a prompt
prompt = f''' 
Write a short summary for this article: {text}
'''

print(prompt)


 
Write a short summary for this article:  
Companies and groups backing carbon-capture technology, which critics slam as an excuse to keep burning fossil fuels, have deployed more than 500 participants to the COP30 climate talks, according to a list compiled by an NGO and shared exclusively with AFP.
The list, assembled by the Center for International Environmental Law (CIEL), names oil and gas giants such as ExxonMobil, Shell and BP, along with Brazil's state-owned Petrobras and China National Petroleum Corp.




In [20]:
# TODO: tokenize the text
enc_prompt = tokenizer(prompt, return_tensors='pt')
print(enc_prompt)

{'input_ids': tensor([[ 8733,     3,     9,   710,  9251,    21,    48,  1108,    10, 11239,
            11,  1637, 16057,  4146,    18,  4010,  2693,   748,     6,    84,
          6800,     7,     3,     7,    40,   265,    38,    46, 10553,    12,
           453,  9706, 15722,  2914,     7,     6,    43, 16163,    72,   145,
          2899,  3008,    12,     8,     3, 25032,  1458,  3298,  6927,     6,
          1315,    12,     3,     9,   570,     3, 16678,    57,    46,   445,
          5577,    11,  2471,  9829,    28,     3, 26487,     5,    37,   570,
             6, 17583,    57,     8,  1166,    21,  1331,  9185,  2402,    41,
          3597,  3577,   201,  3056,  1043,    11,  1807,  6079,     7,   224,
            38,  1881,   226,   106,   329,    32,  3727,     6, 16040,    11,
             3, 11165,     6,   590,    28,  9278,    31,     7,   538,    18,
          9160, 17786,  1939,     7,    11,  1473,   868, 30306, 10052,     5,
             3,     1]]), 'attention_m
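
To see what the IDs stand for, they can be mapped back to token strings. The trailing `1` in `input_ids` is T5's end-of-sequence token `</s>`. The helper below is a hypothetical sketch named `preview_tokens`; it assumes it is given a Hugging Face tokenizer, whose `convert_ids_to_tokens` method it calls.

```python
def preview_tokens(tokenizer, input_ids, limit=10):
    """Return (id, token) pairs for the first `limit` ids of one sequence."""
    ids = [int(i) for i in input_ids[:limit]]       # works for lists and 1-D tensors
    tokens = tokenizer.convert_ids_to_tokens(ids)   # ids -> subword strings
    return list(zip(ids, tokens))
```

In this notebook it could be called as `preview_tokens(tokenizer, enc_prompt.input_ids[0])`, and `enc_prompt.input_ids.shape[1]` gives the total number of tokens in the prompt.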

In [21]:
# TODO: Generate summary with model 
enc_result = model.generate(enc_prompt.input_ids)

In [22]:
# TODO: Decode the summary
print(enc_result)

summary = tokenizer.decode(enc_result[0], skip_special_tokens=True)
print(summary)

tensor([[    0,     3,  8656,   688,    11,  1637, 16057,  4146,  4105,   748,
            43, 16163,    72,   145,  2899,  3008,    12,     8,     3, 25032,
          1458]])
Several companies and groups backing carbon capture technology have deployed more than 500 participants to the COP30
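
The decoded summary stops mid-sentence after "COP30". A likely cause (an assumption, not verified against this run) is that `generate` was called without an explicit length setting and fell back to a short default limit. The sketch below passes `max_new_tokens` explicitly and forwards the attention mask along with the input IDs. The `summarize` helper and the value 60 are illustrative choices.

```python
def summarize(text, model, tokenizer, max_new_tokens=60):
    """Generate a summary with an explicit cap on the number of new tokens."""
    prompt = f"Write a short summary for this article: {text.strip()}"
    enc = tokenizer(prompt, return_tensors="pt")
    # **enc passes both input_ids and attention_mask to generate()
    output_ids = model.generate(**enc, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With the objects loaded earlier, this would be called as `print(summarize(text, model, tokenizer))`.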


In [29]:
config = GenerationConfig(
    do_sample=True,
    temperature=0.1,
)

In [30]:
enc_result = model.generate(enc_prompt.input_ids, generation_config=config)
summary = tokenizer.decode(enc_result[0], skip_special_tokens=True)
print(summary)

More than 500 companies and groups backing carbon capture technology have deployed more than 500 participants to COP30
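
With `do_sample=True` the next token is drawn at random from the model's probability distribution, and `temperature` rescales the scores before that draw. The sketch below illustrates the effect in plain Python on made-up scores; it is a simplified picture of temperature scaling, not the library's internal code.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                             # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (1.0, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

At a temperature of 0.1 nearly all of the probability lands on the top-scoring token, so sampling behaves almost like greedy decoding. It is still random, though, which may be why this run produced different wording from the earlier one.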
