# Implementing Text Generation with the Transformers Package

In [1]:
# Import the required package
from transformers import pipeline

In [2]:
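# Load the model (no model name is given, so the pipeline uses its default
# checkpoint, which was GPT-2 for this run)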
text_generator = pipeline("text-generation")

In [8]:
# Test: deterministic decoding (do_sample=False)
print(text_generator("As far as I am concerned, I will", 
                     max_length=50, do_sample=False))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'As far as I am concerned, I will be the first to admit that I am not a fan of the idea of a "free market." I think that the idea of a free market is a bit of a stretch. I think that the idea'}]


In [6]:
# Test: random sampling (do_sample=True)
print(text_generator("As far as I am concerned, I will", 
                     max_length=50, do_sample=True))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'As far as I am concerned, I will not be using the name \'Archer\', even though it\'d make all of me cry!\n\n"I\'ll wait until they leave me, you know, on this little ship, of course,'}]
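The two calls above differ only in `do_sample`. As a rough illustration of that difference, here is a minimal sketch of one decoding step over a toy distribution. The tokens, the probabilities, and the helper names `pick_greedy` and `pick_sampled` are all made up for this example; they are not part of the Transformers API, and the real pipeline works on model logits over the full vocabulary.

```python
import random

# Hypothetical next-token distribution for a single decoding step.
# Tokens and probabilities are invented for illustration only.
next_token_probs = {"be": 0.40, "not": 0.25, "go": 0.20, "wait": 0.15}


def pick_greedy(probs):
    """Return the most probable token (roughly what do_sample=False does
    with a single beam)."""
    return max(probs, key=probs.get)


def pick_sampled(probs, rng):
    """Draw one token at random, weighted by its probability
    (roughly what do_sample=True does)."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
greedy_picks = [pick_greedy(next_token_probs) for _ in range(5)]
sampled_picks = [pick_sampled(next_token_probs, rng) for _ in range(5)]

print("greedy :", greedy_picks)   # the same token every time
print("sampled:", sampled_picks)  # depends on the random draws
```

Always taking the most probable token is deterministic, which is consistent with the repetitive phrasing in the first output above, while sampling trades that stability for variety.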


## Combining with a Tokenizer

In [9]:
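# Import the required classes
# (TFAutoModelWithLMHead is deprecated in later Transformers releases
# in favor of task-specific classes such as TFAutoModelForCausalLM)
from transformers import TFAutoModelWithLMHead, AutoTokenizer

# Load the model together with its tokenizer
model = TFAutoModelWithLMHead.from_pretrained("xlnet-base-cased")
tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")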



All model checkpoint layers were used when initializing TFXLNetLMHeadModel.

All the layers of TFXLNetLMHeadModel were initialized from the model checkpoint at xlnet-base-cased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFXLNetLMHeadModel for predictions without further training.


In [10]:
# For short prompts, XLNet usually needs padding text prepended
PADDING_TEXT = """In 1991, the remains of Russian Tsar Nicholas II and his family
(except for Alexei and Maria) are discovered.
The voice of Nicholas's young son, Tsarevich Alexei Nikolaevich, narrates the
remainder of the story. 1883 Western Siberia,
a young Grigori Rasputin is asked by his father and a group of men to perform magic.
Rasputin has a vision and denounces one of the men as a horse thief. Although his
father initially slaps him for making such an accusation, Rasputin watches as the
man is chased outside and beaten. Twenty years later, Rasputin sees a vision of
the Virgin Mary, prompting him to become a priest. Rasputin quickly becomes famous,
with people, even a bishop, begging for his blessing. <eod> </s> <eos>"""

# Prompt
prompt = "Today the weather is really nice and I am planning on "

In [11]:
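# Generate a continuation
inputs = tokenizer.encode(PADDING_TEXT + prompt, add_special_tokens=False,
                          return_tensors="tf")
# Number of input tokens. Slicing by token count is safer than slicing the
# decoded string by character count: decoding the input with
# skip_special_tokens=True drops "<eod>" and "</s>", so its length undercounts
# the decoded output and the cut lands early, repeating the end of the prompt
# (as in the "anning on" of the output recorded below).
prompt_length = inputs.shape[1]
outputs = model.generate(inputs, max_length=250, do_sample=True, top_p=0.95,
                         top_k=60)
generated = prompt + tokenizer.decode(outputs[0][prompt_length:])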

print(generated)

Today the weather is really nice and I am planning on anning on getting some good photos. I need to take some long-running pictures of the past few weeks and "in the moment."<eop> We are on a beach, right on the coast of Alaska. It is beautiful. It is peaceful. It is very quiet. It is peaceful. I am trying not to be too self-centered. But if the sun doesn
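The `generate` call above passes `top_k=60` and `top_p=0.95`, which restrict the set of tokens that sampling may choose from. The following is a simplified sketch of that idea on a toy distribution, not the library's implementation: the helper name `top_k_top_p_filter`, the tokens, and the probabilities are assumptions made for illustration, and the real code operates on logits rather than on a dictionary of probabilities.

```python
def top_k_top_p_filter(probs, top_k, top_p):
    """Keep the top_k most probable tokens, then the shortest prefix of them
    whose cumulative (renormalized) probability reaches top_p."""
    # Top-k: rank tokens by probability and keep at most top_k of them.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    ranked = ranked[:top_k]

    # Renormalize so the surviving candidates sum to 1.
    mass = sum(p for _, p in ranked)
    ranked = [(token, p / mass) for token, p in ranked]

    # Top-p (nucleus): keep tokens until the cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize once more so the result is a valid distribution.
    kept_mass = sum(p for _, p in kept)
    return {token: p / kept_mass for token, p in kept}


# Hypothetical next-token probabilities, invented for illustration.
toy_probs = {"hiking": 0.50, "reading": 0.30, "napping": 0.15, "xylophone": 0.05}

filtered = top_k_top_p_filter(toy_probs, top_k=3, top_p=0.8)
print(filtered)
```

With these toy numbers, `top_k=3` removes the least likely token and `top_p=0.8` then keeps only the two most likely ones. Sampling from the filtered distribution avoids very unlikely tokens while still leaving room for variation.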
