<a href="https://colab.research.google.com/github/rakeshdintakurthi/PRODIGY_GA_01/blob/main/textgenerate1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install -q transformers


In [2]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

torch_device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# add the EOS token as PAD token to avoid warnings
model = AutoModelForCausalLM.from_pretrained("gpt2", pad_token_id=tokenizer.eos_token_id).to(torch_device)


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [4]:
# encode context the generation is conditioned on
model_inputs = tokenizer('I am batman of this city', return_tensors='pt').to(torch_device)

# generate up to 100 new tokens with greedy search
greedy_output = model.generate(**model_inputs, max_new_tokens=100)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(greedy_output[0], skip_special_tokens=True))


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------
I am batman of this city, and I am a man of the people. I am a man of the people. I am a man of the people. I am a man of the people. I am a man of the people. I am a man of the people. I am a man of the people. I am a man of the people. I am a man of the people. I am a man of the people. I am a man of the people. I am a man of the people. I am
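
The loop in the output above is typical of greedy search: each step keeps only the single most probable next token, so once the model enters a high-probability cycle it has no way out. The following is a minimal sketch of that behaviour, not the `transformers` implementation. `NEXT_TOKEN_PROBS` and `greedy_decode` are hypothetical names, the probabilities are made up, and the toy model looks only at the last token, whereas GPT-2 conditions on the whole prefix.

```python
# Hypothetical toy "language model": last token -> {next token: probability}.
# All numbers are invented for illustration.
NEXT_TOKEN_PROBS = {
    "I": {"am": 0.9, "was": 0.1},
    "am": {"a": 0.6, "the": 0.4},
    "a": {"man": 0.7, "hero": 0.3},
    "man": {".": 0.8, "of": 0.2},
    ".": {"I": 0.9, "The": 0.1},
}


def greedy_decode(start, max_new_tokens):
    """Append the most probable next token at every step."""
    tokens = [start]
    for _ in range(max_new_tokens):
        candidates = NEXT_TOKEN_PROBS[tokens[-1]]
        # Greedy choice: take the argmax and never reconsider it.
        tokens.append(max(candidates, key=candidates.get))
    return tokens


print(" ".join(greedy_decode("I", 9)))
# prints: I am a man . I am a man .
```

Because the choice at each step is deterministic, the same five tokens repeat for as long as generation continues.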


In [6]:
# activate beam search and early_stopping
beam_output = model.generate(
    **model_inputs,
    max_new_tokens=100,
    num_beams=5,
    early_stopping=True
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(beam_output[0], skip_special_tokens=True))


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------
I am batman of this city."

"I am batman of this city."

"I am batman of this city."

"I am batman of this city."

"I am batman of this city."

"I am batman of this city."

"I am batman of this city."

"I am batman of this city."

"I am batman of this city."

"I am batman of this city."

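Beam search keeps the `num_beams` most probable partial sequences at each step instead of only one, so it can recover a sequence whose first token is not the locally best choice. The sketch below illustrates the idea on a toy model and is not how `model.generate` is implemented. `PROBS` and `beam_search` are hypothetical, the probabilities are invented, and there is no end-of-sequence handling or length penalty.

```python
import math

# Hypothetical toy model: last token -> {next token: probability}.
PROBS = {
    "<s>": {"the": 0.5, "a": 0.4, "an": 0.1},
    "the": {"dog": 0.4, "cat": 0.3, "end": 0.3},
    "a": {"dog": 0.9, "cat": 0.1},
    "an": {"owl": 1.0},
}


def beam_search(start, steps, num_beams):
    """Return the highest-scoring sequence after `steps` expansions."""
    beams = [([start], 0.0)]  # (tokens, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for tokens, score in beams:
            for token, prob in PROBS[tokens[-1]].items():
                candidates.append((tokens + [token], score + math.log(prob)))
        # Keep only the num_beams best partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]
    return beams[0][0]


print(beam_search("<s>", 2, num_beams=1))  # one beam behaves like greedy search
print(beam_search("<s>", 2, num_beams=2))
```

With one beam the search commits to "the" (0.5) and ends with total probability 0.5 × 0.4 = 0.2. With two beams it also keeps "a" alive and finds "a dog" with 0.4 × 0.9 = 0.36.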

In [8]:
# set no_repeat_ngram_size to 2
beam_output = model.generate(
    **model_inputs,
    max_new_tokens=100,
    num_beams=5,
    no_repeat_ngram_size=2,
    early_stopping=True
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(beam_output[0], skip_special_tokens=True))


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------
I am batman of this city, and I am not afraid of you."

"I will not be afraid," said he, "for I know that you are my friend. You are the only one who can help me. I will do my best to help you, but I cannot do it without your help. If you do not give me your aid, I shall not know what to do with you. But if I do give you my help, you will be glad of it, for you have given me the

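Setting `no_repeat_ngram_size=2` means no 2-gram may appear twice in the generated sequence. One way to picture this: before each step, find every token that would complete an n-gram already present and rule it out. The helper below is a sketch of that bookkeeping only. `banned_next_tokens` is a hypothetical name, and it works on word strings rather than token ids.

```python
def banned_next_tokens(tokens, ngram_size):
    """Return the tokens that would repeat an n-gram already in `tokens`."""
    if ngram_size < 1 or len(tokens) < ngram_size:
        return set()
    # The last (ngram_size - 1) tokens form the prefix of the next n-gram.
    prefix = tuple(tokens[len(tokens) - (ngram_size - 1):])
    banned = set()
    for i in range(len(tokens) - ngram_size + 1):
        ngram = tokens[i:i + ngram_size]
        if tuple(ngram[:-1]) == prefix:
            banned.add(ngram[-1])
    return banned


history = "I am a man of the people . I am".split()
print(banned_next_tokens(history, 2))
# prints: {'a'}
```

Here "am a" has already occurred, so "a" is excluded after the second "am" and the search has to continue differently.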

In [9]:
# set num_return_sequences > 1 (it must not exceed num_beams)
beam_outputs = model.generate(
    **model_inputs,
    max_new_tokens=100,
    num_beams=5,
    no_repeat_ngram_size=2,
    num_return_sequences=5,
    early_stopping=True
)

# now we have 5 output sequences
print("Output:\n" + 100 * '-')
for i, beam_output in enumerate(beam_outputs):
    print("{}: {}".format(i, tokenizer.decode(beam_output, skip_special_tokens=True)))


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------
0: I am batman of this city, and I am not afraid of you."

"I will not be afraid," said he, "for I know that you are my friend. You are the only one who can help me
1: I am batman of this city, and I am not afraid of you."

"I will not be afraid," said he, "for I know that you are my friend. You are the only one who can save me
2: I am batman of this city, and I am not afraid of you."

"I will not be afraid," said he, "for I know that you are my friend. You are the only one who knows that I
3: I am batman of this city, and I am not afraid of you."

"I will not be afraid," said he, "for I know that you are my friend. You are the only one who knows what I
4: I am batman of this city, and I am not afraid of you."

"I will not be afraid," said he, "for I know that you are my friend. You are the only one who knows me.


In [11]:
# set seed to reproduce results. Feel free to change the seed though to get different results
from transformers import set_seed
set_seed(42)

# activate sampling and deactivate top_k by setting top_k sampling to 0
sample_output = model.generate(
    **model_inputs,
    max_new_tokens=100,
    do_sample=True,
    top_k=0
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(sample_output[0], skip_special_tokens=True))


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------
I am batman of this city but what I do you need to know. Therefore I will bring my word with you to restrain you from theespantry and to urge you to avoid this mark: Bombs are only for Arthas lives your honour. The first thing to do is to bring that Banners which I hope you will throw."

Beside him sat the old Dargon such a rattling Palestine smoking this snuffadace, and sat chiefly with both of them often singing to each other—ch


In [13]:
# set seed to reproduce results. Feel free to change the seed though to get different results
set_seed(42)

# use temperature to decrease the sensitivity to low probability candidates
sample_output = model.generate(
    **model_inputs,
    max_new_tokens=100,
    do_sample=True,
    top_k=0,
    temperature=0.6,
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(sample_output[0], skip_special_tokens=True))


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------
I am batman of this city.

Who am I?

I am a man of good will, who is a jester, and a thief.

Who am I?

I am a man of good looks, who is a man of good character.

Who am I?

I am a man of good manners, who is a man of good taste.

Who am I?

I am a man of good will, who is a man of good character.
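
Temperature divides the logits before the softmax. Values below 1 sharpen the distribution, so likely tokens become more likely and unlikely ones less so, which is why the text above is more coherent than the unscaled sample. The sketch below shows the effect on three made-up logits. `softmax_with_temperature` is a hypothetical helper, not a `transformers` function.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax over logits / temperature, computed in a numerically stable way."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # invented scores for three candidate tokens
for temperature in (1.0, 0.6):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At `temperature=0.6` the top candidate receives a larger share of the probability mass than at `1.0`, and the least likely candidate a smaller one.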



In [15]:
# set seed to reproduce results. Feel free to change the seed though to get different results
set_seed(42)

# set top_k to 50
sample_output = model.generate(
    **model_inputs,
    max_new_tokens=100,
    do_sample=True,
    top_k=50
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(sample_output[0], skip_special_tokens=True))


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------
I am batman of this city but what I do in that city is different than everything else about my life."

While in London, the man told the Times: 'I feel very ashamed to be called a beggar. I am a journalist, so I can't even speak English. I am still in this society.

'But I am still there in the capital. I am in a very private life. I have not come out on my own; I am an independent citizen.

'I am
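
Top-k sampling keeps only the `k` most probable next tokens and redistributes the probability mass among them before sampling. The sketch below shows this on an invented four-token distribution. `top_k_filter` is a hypothetical helper, and, following the convention used in the cells above, `k=0` leaves the distribution unchanged.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize; k <= 0 disables filtering."""
    if k <= 0:
        return dict(probs)
    kept = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Invented next-token probabilities for illustration.
probs = {"city": 0.5, "town": 0.3, "cave": 0.15, "xylophone": 0.05}
print(top_k_filter(probs, 2))
```

With `k=2` only "city" and "town" survive, rescaled to 0.625 and 0.375, so the unlikely tail can never be sampled.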


In [16]:
# set seed to reproduce results. Feel free to change the seed though to get different results
set_seed(42)

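# deactivate top_k sampling and sample only from the smallest set of tokens whose cumulative probability reaches 0.92 (top_p)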
sample_output = model.generate(
    **model_inputs,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.92,
    top_k=0
)

print("Output:\n" + 100 * '-')
print(tokenizer.decode(sample_output[0], skip_special_tokens=True))


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------
I am batman of this city but what I do you need to know. Therefore I will bring my word with you to restrain you from the evil spread and to urge you to avoid this mark:

The Americans, this is
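
Top-p (nucleus) sampling chooses the cutoff dynamically: it keeps the smallest set of most probable tokens whose cumulative probability reaches `top_p`, so the candidate set grows when the model is uncertain and shrinks when it is confident. The sketch below uses an invented distribution. `top_p_filter` is a hypothetical helper and not the `transformers` implementation.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest high-probability set whose cumulative mass reaches top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Invented next-token probabilities for illustration.
probs = {"city": 0.5, "town": 0.3, "cave": 0.15, "xylophone": 0.05}
print(sorted(top_p_filter(probs, 0.92)))
# prints: ['cave', 'city', 'town']
```

With `top_p=0.92` the first three tokens are needed to reach the threshold. A lower value such as `0.6` would keep only "city" and "town".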


In [17]:
# set seed to reproduce results. Feel free to change the seed though to get different results
set_seed(42)

# set top_k = 50 and set top_p = 0.95 and num_return_sequences = 3
sample_outputs = model.generate(
    **model_inputs,
    max_new_tokens=40,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    num_return_sequences=3,
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(sample_outputs):
    print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Output:
----------------------------------------------------------------------------------------------------
0: I am batman of this city but what I do in that city is different than everything else about my life."

While in London, the man told the Times: "I feel very ashamed."

A spokesman for the
1: I am batman of this city. I am the best and I am the best who will succeed you. I am the greatest of men and I am my master. I am the captain, I am the commander, I am the
2: I am batman of this city, and will do so in no time," added the president. "To make it go away was my responsibility as leader of this country."

Cuba's current vice-president, Jose A
