# Implementing Text Generation with the Transformers Library

In [1]:
from transformers import pipeline



In [2]:
text_generator = pipeline("text-generation")

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
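The warning above recommends naming the model and revision explicitly. A minimal sketch of doing so follows; the model name and revision are the ones reported in the warning, while the constant names and the helper `build_text_generator` are hypothetical names introduced here for illustration.

```python
# Model name and revision as reported by the pipeline warning.
MODEL_NAME = "gpt2"
MODEL_REVISION = "6c0e608"


def build_text_generator():
    """Create a text-generation pipeline pinned to an explicit model and revision."""
    # Imported inside the helper (an illustrative choice) so that defining it
    # does not load the library or download anything.
    from transformers import pipeline

    return pipeline("text-generation", model=MODEL_NAME, revision=MODEL_REVISION)
```

Calling `text_generator = build_text_generator()` should then give a pipeline equivalent to the default one, without relying on the library's fallback choice.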

In [4]:
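# Greedy decoding: do_sample=False always picks the most likely next token,
# so the output is deterministic. max_length counts the prompt tokens as well.
print(text_generator("As far as I am concerned, I will",
                     max_length=50, do_sample=False))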

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'As far as I am concerned, I will be the first to admit that I am not a fan of the idea of a "free market." I think that the idea of a free market is a bit of a stretch. I think that the idea'}]


In [8]:
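# Sampling: do_sample=True draws each next token at random from the predicted
# distribution, so the output changes from run to run.
print(text_generator("As far as I am concerned, I will",
                     max_length=50, do_sample=True))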

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "As far as I am concerned, I will have to do it. I just hope it's not for me because I don't want it to be my only option because of the way there are two very different things he and I are experiencing on Earth"}]
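The difference between the two calls above comes down to how the next token is chosen at each step. The sketch below illustrates this with a toy distribution; the tokens, probabilities, and function names are invented for illustration and are not taken from GPT-2 or from the library.

```python
import random

# Toy next-token distribution (made-up values, not model output).
next_token_probs = {"be": 0.5, "have": 0.3, "go": 0.2}


def pick_greedy(probs):
    """Return the most likely token, as greedy decoding does at each step."""
    return max(probs, key=probs.get)


def pick_sampled(probs, rng):
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[token] for token in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)
print(pick_greedy(next_token_probs))  # always "be"
print([pick_sampled(next_token_probs, rng) for _ in range(5)])
```

Greedy selection returns the same token every time, which is why the `do_sample=False` output is repeatable and tends to loop on similar phrases, while sampling produces more varied text.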


# Combining with a Tokenizer

In [9]:
# Import the required classes
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model together with its matching tokenizer
model = AutoModelForCausalLM.from_pretrained("xlnet-base-cased")
tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")



# Padding Text and Prompt

In [10]:
# Padding text: a short passage placed before the prompt to give the model context
PADDING_TEXT = """In 1991, the remains of Russian Tsar Nicholas II and his family
(except for Alexei and Maria) are discovered.
The voice of Nicholas's young son, Tsarevich Alexei Nikolaevich, narrates the
remainder of the story. 1883 Western Siberia,
a young Grigori Rasputin is asked by his father and a group of men to perform magic.
Rasputin has a vision and denounces one of the men as a horse thief. Although his
father initially slaps him for making such an accusation, Rasputin watches as the
man is chased outside and beaten. Twenty years later, Rasputin sees a vision of
the Virgin Mary, prompting him to become a priest. Rasputin quickly becomes famous,
with people, even a bishop, begging for his blessing. <eod> </s> <eos>"""

# Prompt
prompt = "Today the weather is really nice and I am planning on "

# Generating the Continuation

In [11]:
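# Tokenize the padding text followed by the prompt, keeping only the input IDs
inputs = tokenizer(PADDING_TEXT + prompt, add_special_tokens=False,
                   return_tensors="pt")["input_ids"]

# Character length of the decoded input, used below to strip it from the output
prompt_length = len(tokenizer.decode(inputs[0]))
# Sample up to 250 tokens in total (input included), with top-k and top-p filtering
outputs = model.generate(inputs, max_length=250, do_sample=True,
                         top_p=0.95, top_k=60)
# Drop the decoded input text, then put the prompt back in front of the continuation
generated = prompt + tokenizer.decode(outputs[0])[prompt_length + 1 :]

print(generated)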

This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (-1). Depending on the model, you may observe exceptions, performance degradation, or nothing at all.


Today the weather is really nice and I am planning on flying to India for 4 weeks. I will be staying in the Bangalore-Lampin, the first part of a few weeks in India before being to India. I think that will be a great plan to go back for a few weeks before I get back. I think of it as a plan of study for three weeks, leaving the city of Bangalore and going back
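The `top_k=60` and `top_p=0.95` arguments used above restrict which tokens can be sampled at each step. The following is a simplified sketch of that idea on a toy distribution, not the library's actual implementation; the function name `top_k_top_p_filter` and the example probabilities are assumptions made for illustration.

```python
def top_k_top_p_filter(probs, top_k, top_p):
    """Keep the top_k most likely tokens, then the smallest leading group whose
    cumulative probability reaches top_p, and renormalize what is left."""
    # Rank tokens from most to least likely and apply the top-k cut.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]

    # Renormalize so the surviving probabilities sum to 1.
    total = sum(p for _, p in ranked)
    ranked = [(token, p / total) for token, p in ranked]

    # Nucleus (top-p) cut: stop once the cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize again over the final candidate set.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Toy distribution (made-up values, not model output).
toy_probs = {"flying": 0.4, "going": 0.3, "staying": 0.2, "sleeping": 0.07, "xyz": 0.03}
filtered = top_k_top_p_filter(toy_probs, top_k=4, top_p=0.9)
print(filtered)  # only "flying", "going" and "staying" remain
```

Sampling then draws from the filtered distribution only, which removes very unlikely tokens while still leaving room for variation.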
