# Installing Packages

In [20]:
!pip install torch
!pip install transformers

# Load Model and Tokenizer

In [21]:
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Download (or load from the local cache) the pretrained T5-base weights and its tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained('t5-base')
tokenizer = AutoTokenizer.from_pretrained('t5-base')


For now, this behavior is kept to avoid breaking backwards compatibility when padding/encoding with `truncation is True`.
- Be aware that you SHOULD NOT rely on t5-base automatically truncating your input to 512 when padding/encoding.
- If you want to encode/pad to sequences longer than 512 you can either instantiate this tokenizer with `model_max_length` or pass `max_length` when encoding/padding.


# Input Text

In [22]:
text = """
As Kingdom preachers, we also face the challenge of overcoming a language barrier.
In the revelation that he gave to the apostle John, Jesus disclosed that in our day the
good news would overcome that obstacle. (Read Revelation 14:6, 7.) How so? We are giving
as many people as possible a chance to respond to the Kingdom message. Today, people around
the world can read Bible-based material on our jw.org website because it contains information
in over 1,000 languages! Approval was given to translate the Enjoy Life Forever! book—our main
study tool for making disciples—into more than 700 languages! Spiritual food has also been made
available by means of video recordings for the deaf and publications in braille for the blind.
We are seeing Bible prophecies being fulfilled. People “out of all the languages of the nations”
are learning to speak the “pure language” of Bible truth. (Zech. 8:23; Zeph. 3:9)
All of this is being accomplished under the capable oversight of Jesus Christ.
"""


# Tokenize Text

In [23]:
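# T5 is text-to-text: the "summarize: " prefix tells it which task to perform.
# Inputs longer than tokenizer.model_max_length (512 tokens for t5-base) are truncated.
tokens_input = tokenizer.encode("summarize: " + text,
                                return_tensors='pt',
                                max_length=tokenizer.model_max_length,
                                truncation=True)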
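Passing `truncation=True` together with `max_length=tokenizer.model_max_length` caps the encoded input at 512 tokens for `t5-base`. The passage above is likely well under that limit, so nothing is cut here, but the sketch below mimics the idea on plain lists of token ids. It assumes right-side truncation and a single appended end-of-sequence id (1, which is T5's `</s>`); the content ids are hypothetical, and this is an illustration rather than the library's implementation.

```python
EOS_ID = 1  # assumption: T5's </s> token id


def truncate_with_eos(token_ids, max_length, eos_id=EOS_ID):
    """Truncate content ids from the right so that content + EOS fits in max_length."""
    if max_length < 1:
        raise ValueError("max_length must be at least 1")
    # Reserve one slot for the end-of-sequence marker.
    kept = list(token_ids[: max_length - 1])
    return kept + [eos_id]


ids = list(range(100, 700))  # 600 hypothetical content ids, too many to fit
encoded = truncate_with_eos(ids, max_length=512)
print(len(encoded), encoded[-2:])  # 512 [610, 1]

short = truncate_with_eos([5, 6], max_length=512)
print(short)  # [5, 6, 1]: short inputs are left untouched apart from EOS
```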

# Generate Summary

In [24]:
# Beam search with 2 beams; a large positive length_penalty favours longer outputs
summary_ids = model.generate(tokens_input, min_length=80,
                             max_length=150, length_penalty=15,
                             num_beams=2)
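With `num_beams=2`, `generate` keeps two candidate sequences and, in the Hugging Face beam search, ranks finished candidates roughly by their summed log-probability divided by their length raised to `length_penalty`. Because log-probabilities are negative, a larger exponent shrinks the score of long sequences toward zero, so `length_penalty=15` pushes strongly toward long summaries. The sketch below reproduces that scoring rule with made-up numbers for two hypothetical beams.

```python
def beam_score(sum_logprobs, length, length_penalty):
    """Length-normalised beam score: sum of log-probs divided by length**length_penalty."""
    return sum_logprobs / (length ** length_penalty)


# Two hypothetical finished beams: (sum of log-probs, length in tokens).
short_beam = (-10.0, 20)  # shorter and more likely
long_beam = (-30.0, 40)   # longer and less likely

for penalty in (1.0, 15.0):
    s = beam_score(*short_beam, penalty)
    l = beam_score(*long_beam, penalty)
    winner = "long" if l > s else "short"
    print(f"length_penalty={penalty}: short={s:.3g}, long={l:.3g} -> {winner} beam wins")
```

With a penalty of 1 the shorter, more probable beam wins; with 15 the longer beam wins even though its raw log-probability is much lower.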

In [25]:
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
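The ids returned by `generate` for T5 typically begin with the decoder start token (`<pad>`, id 0) and end with `</s>` (id 1); `skip_special_tokens=True` drops such markers from the decoded string. The toy decoder below shows only that filtering step. Its vocabulary is hypothetical, and real T5 decoding also merges SentencePiece subword pieces, which this sketch does not attempt.

```python
# Hypothetical toy vocabulary; only pad=0 and </s>=1 match real T5 ids.
VOCAB = {0: "<pad>", 1: "</s>", 5: "we", 6: "are", 7: "giving"}
SPECIAL_IDS = {0, 1}


def decode(ids, skip_special_tokens=False):
    """Map ids to tokens and join them, optionally dropping special tokens."""
    tokens = [VOCAB[i] for i in ids
              if not (skip_special_tokens and i in SPECIAL_IDS)]
    return " ".join(tokens)


generated = [0, 5, 6, 7, 1]  # decoder start (<pad>), content, end-of-sequence
print(decode(generated))                            # <pad> we are giving </s>
print(decode(generated, skip_special_tokens=True))  # we are giving
```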

In [26]:
summary 

'we are giving as many people as possible a chance to respond to the Kingdom message. people around the world can read Bible-based material on our jw.org website. people "out of all the languages of the nations" are learning to speak the "pure language" of Bible truth. we are seeing prophecies being fulfilled and people "out of all the languages of the nations" are learning to speak the "pure language" of Bible truth.'

This summary was generated with `min_length=80` and `max_length=150`. Next, we tweak those two parameters to see whether the output changes.
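Since the input text never changes, the experiment below only needs to re-run generation with different length settings and compare the decoded strings. A small helper can do that and group settings that yield identical output. It is a sketch: `compare_settings` is a hypothetical name, and it accepts any `summarize(min_length, max_length)` callable so it can be demonstrated here with a stub instead of the real model.

```python
def compare_settings(summarize, settings):
    """Call summarize(min_len, max_len) for each pair and group identical outputs."""
    groups = {}
    for min_len, max_len in settings:
        text = summarize(min_len, max_len)
        groups.setdefault(text, []).append((min_len, max_len))
    return groups


# Hypothetical stub standing in for the model: it just cuts a fixed summary at max_len words.
FULL = "one two three four five"


def fake_summarize(min_len, max_len):
    return " ".join(FULL.split()[:max_len])


result = compare_settings(fake_summarize, [(1, 10), (2, 20), (1, 3)])
for text, pairs in result.items():
    print(pairs, "->", text)

# With the notebook's objects, a real wrapper could look like this (not run here):
# def summarize(min_len, max_len):
#     ids = model.generate(tokens_input, min_length=min_len, max_length=max_len,
#                          length_penalty=15, num_beams=2)
#     return tokenizer.decode(ids[0], skip_special_tokens=True)
```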

In [27]:
tokens_input = tokenizer.encode("summarize: " + text,
                              return_tensors='pt',
                              max_length=tokenizer.model_max_length,
                              truncation=True)

In [28]:
summary_ids = model.generate(tokens_input, min_length=100, 
                           max_length=200, length_penalty=15, 
                           num_beams=2)

In [29]:
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

In [30]:
summary

'we are giving as many people as possible a chance to respond to the Kingdom message. people around the world can read Bible-based material on our jw.org website. people "out of all the languages of the nations" are learning to speak the "pure language" of Bible truth. we are seeing prophecies being fulfilled and people "out of all the languages of the nations" are learning to speak the "pure language" of Bible truth.'

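Increasing `min_length` by 20 (to 100) and `max_length` by 50 (to 200) produces exactly the same output. This suggests the model's summary already ends on its own before 150 tokens and is at least about 100 tokens long, so neither new limit actually constrains generation.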

In [31]:
tokens_input = tokenizer.encode("summarize: " + text,
                              return_tensors='pt',
                              max_length=tokenizer.model_max_length,
                              truncation=True)

In [32]:
summary_ids = model.generate(tokens_input, min_length=200, 
                           max_length=400, length_penalty=15, 
                           num_beams=2)

In [33]:
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

In [34]:
summary

'we are giving as many people as possible a chance to respond to the Kingdom message. people around the world can read Bible-based material on our jw.org website. people "out of all the languages of the nations" are learning to speak the "pure language" of Bible truth. we are seeing prophecies being fulfilled and people "out of all the languages of the nations" are learning to speak the "pure language" of Bible truth. jw.org: "we are giving as many            .                                                            " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " "'

Raising `min_length` to 200 and `max_length` to 400 (an increase of 100 and 200 over the previous run) produced the same summary followed by meaningless noise: a repeated fragment (`jw.org: "we are giving as many`) and a long run of quotation marks.
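`min_length` works by forbidding the end-of-sequence token until the output reaches that length; in `transformers` this is done by setting the EOS score to negative infinity. If the model's natural summary ends well before 200 tokens, it has to keep emitting its next-best tokens, which is consistent with the filler seen above. The toy greedy decoder below illustrates the mechanism with a hypothetical scoring function (id 99 stands for "filler"), assuming EOS id 1 as in T5.

```python
import math

EOS_ID = 1  # assumption: T5's </s> id


def toy_scores(step):
    """Hypothetical next-token scores: the 'model' wants to stop after 3 content tokens."""
    if step < 3:
        return {10 + step: 0.0, EOS_ID: -5.0}  # a content token is preferred
    return {EOS_ID: 0.0, 99: -3.0}             # EOS preferred; 99 is filler


def greedy_decode(min_length, max_length):
    output = []
    while len(output) < max_length:
        scores = dict(toy_scores(len(output)))
        if len(output) < min_length:
            scores[EOS_ID] = -math.inf  # min_length: EOS is not allowed yet
        token = max(scores, key=scores.get)
        output.append(token)
        if token == EOS_ID:
            break
    return output


print(greedy_decode(min_length=0, max_length=10))  # [10, 11, 12, 1]
print(greedy_decode(min_length=6, max_length=10))  # [10, 11, 12, 99, 99, 99, 1]
```

`generate` also accepts `no_repeat_ngram_size` and `repetition_penalty`, which may reduce repetition, but no decoding setting can make a model produce meaningful text far beyond the point where its summary naturally ends.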

In [35]:
tokens_input = tokenizer.encode("summarize: " + text,
                              return_tensors='pt',
                              max_length=tokenizer.model_max_length,
                              truncation=True)

In [36]:
summary_ids = model.generate(tokens_input, min_length=40, 
                           max_length=100, length_penalty=15, 
                           num_beams=2)

In [37]:
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

In [38]:
summary

'we are giving as many people as possible a chance to respond to the Kingdom message. people around the world can read Bible-based material on our jw.org website. people "out of all the languages of the nations" are learning to speak the "pure language" of Bible truth. we are seeing prophecies being fulfilled and people "out of all the languages of the nations" are learning to speak the "pure language" of Bible truth'

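Setting `min_length=40` and `max_length=100` produced a meaningful summary, but not a genuinely shorter one: the text is identical to the first summary except for the missing final period, which suggests generation was cut off at the 100-token limit rather than ending naturally.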
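To tell a summary that ended naturally from one that was cut off, one can check whether the generated ids end with EOS or simply fill `max_length`. The helper below is a sketch under that assumption (`stopped_at_max_length` is a hypothetical name, EOS id 1 as in T5); with the notebook's objects it could be called as `stopped_at_max_length(summary_ids[0].tolist(), max_length=100)`.

```python
EOS_ID = 1  # assumption: T5's </s> id


def stopped_at_max_length(generated_ids, max_length, eos_id=EOS_ID):
    """Return True if generation ran out of room instead of emitting EOS."""
    return len(generated_ids) >= max_length and generated_ids[-1] != eos_id


print(stopped_at_max_length([0, 5, 6, 7], max_length=4))   # True: cut off
print(stopped_at_max_length([0, 5, 6, 1], max_length=10))  # False: ended with EOS
```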