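# Deploy Your First Language Model
In the previous sections, we covered the different classes in Hugging Face's AutoModel family. Here we load a model with a small parameter count, distilgpt2, and use it for a short demonstration.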

In [8]:
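import os

# Set the mirror endpoint before importing transformers:
# huggingface_hub reads HF_ENDPOINT when it is first imported.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# DistilGPT2, a distilled (smaller) version of GPT-2
model_name = "distilgpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)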


In [9]:
# Move the model to the GPU if one is available; otherwise keep it on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)


GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-5): 6 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D(nf=2304, nx=768)
          (c_proj): Conv1D(nf=768, nx=768)
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D(nf=3072, nx=768)
          (c_proj): Conv1D(nf=768, nx=3072)
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
)
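
The printed module shapes are enough to estimate the model's size by hand. The sketch below is a back-of-the-envelope count, not something read from the library. It assumes that each `Conv1D(nf, nx)` holds an `nx x nf` weight plus a bias of size `nf`, that each `LayerNorm` holds a weight and a bias, and that `lm_head` shares its weight with `wte` (so it adds no parameters). The helper names are hypothetical.

```python
# Sizes taken from the printed architecture above
vocab_size, n_positions, d_model, n_layers = 50257, 1024, 768, 6


def layer_norm_params(dim):
    # Assumption: one weight vector and one bias vector
    return 2 * dim


def conv1d_params(nx, nf):
    # Assumption: an (nx x nf) weight matrix plus a bias of size nf
    return nx * nf + nf


embedding_params = vocab_size * d_model + n_positions * d_model  # wte + wpe

block_params = (
    layer_norm_params(d_model)              # ln_1
    + conv1d_params(d_model, 3 * d_model)   # attn.c_attn (nf=2304)
    + conv1d_params(d_model, d_model)       # attn.c_proj
    + layer_norm_params(d_model)            # ln_2
    + conv1d_params(d_model, 4 * d_model)   # mlp.c_fc (nf=3072)
    + conv1d_params(4 * d_model, d_model)   # mlp.c_proj
)

# Assumption: lm_head is tied to wte, so it is not counted again
total_params = embedding_params + n_layers * block_params + layer_norm_params(d_model)

print(f"{total_params:,}")  # 81,912,576
```

If these assumptions hold, the result (about 82 million) should be close to what `model.num_parameters()` reports for the loaded model.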

In [11]:
# Run inference
# Put the model in evaluation mode (disables dropout)
model.eval()
input_text = "Hello GPT"
# Encode the input text and move the tensors to the same device as the model
inputs = tokenizer(input_text, return_tensors="pt")
inputs = {key: value.to(device) for key, value in inputs.items()}
beam_widths = [1, 3, 5]
# Generate text once for each beam width
for beam in beam_widths:
    with torch.no_grad():
        outputs = model.generate(
            **inputs,  # input_ids and attention_mask
            max_length=200,  # maximum total length in tokens, prompt included
            num_beams=beam,  # beam width: candidate sequences kept at each step; 1 disables beam search (greedy decoding)
            no_repeat_ngram_size=2,  # no 2-gram may appear more than once in the output
            early_stopping=True,  # beam search only: stop once num_beams finished candidates exist; not valid when num_beams=1 (see the warning in the output)
        )
    # Decode the highest-scoring sequence back into text
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(f"Generated text with beam width {beam}:")
    print(generated_text)
    print("-" * 50)


The following generation flags are not valid and may be ignored: ['early_stopping']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Generated text with beam width 1:
Hello GPT is a free and open source software project that aims to provide a platform for developers to build and use GPGP-based GPSP based GPCs. GPP is an open-source software development platform that is designed to allow developers and developers of GGPP to use the GPRP protocol.

GPP has been developed by the OpenPGp Foundation and is based on the open standards of the GNU/Linux Foundation. The GP Protocol is the standard protocol for GPs that are used by GIPP and GSPP. It is also the protocol used for the use of a GTP protocol, GSMTP, and other GMPP protocols.
--------------------------------------------------
Generated text with beam width 3:
Hello GPT.

This article is part of a series of articles on the topic, and will be updated as more information becomes available.
--------------------------------------------------
Generated text with beam width 5:
Hello GPT.

This article was originally published on The Conversation. Read the original article.
--------------------------------------------------
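
The three runs differ only in `num_beams`, yet they produce different text. To make the mechanism concrete, here is a minimal sketch of beam search over a hand-written toy model. `TOY_MODEL` and `beam_search` are hypothetical names invented for this illustration, and the sketch omits details of the real `generate` implementation such as length penalties and batching.

```python
import math

# Hypothetical toy "language model": next-token probabilities given the previous token
TOY_MODEL = {
    "<bos>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.4, "y": 0.3, "<eos>": 0.3},
    "b": {"z": 0.9, "<eos>": 0.1},
    "x": {"<eos>": 1.0},
    "y": {"<eos>": 1.0},
    "z": {"<eos>": 1.0},
}


def beam_search(model, num_beams, max_length=5):
    """Return (tokens, log_probability) of the best sequence found."""
    beams = [(["<bos>"], 0.0)]  # unfinished sequences with cumulative log-probability
    finished = []
    for _ in range(max_length):
        # Expand every unfinished sequence by every possible next token
        candidates = []
        for tokens, score in beams:
            for token, prob in model[tokens[-1]].items():
                candidates.append((tokens + [token], score + math.log(prob)))
        # Keep only the num_beams highest-scoring candidates
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:num_beams]:
            if tokens[-1] == "<eos>":
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break
    # Fall back to unfinished beams if nothing reached <eos> within max_length
    return max(finished or beams, key=lambda c: c[1])


for width in (1, 2):
    tokens, score = beam_search(TOY_MODEL, num_beams=width)
    print(width, tokens, round(math.exp(score), 2))
```

With a beam width of 1 the search commits to `a` because it is the likeliest first token, and ends with a sequence of probability 0.24. With a width of 2 it also keeps `b` alive and finds the sequence `b z`, whose overall probability of 0.36 is higher. This is the sense in which a wider beam can find higher-scoring sequences, although a higher score does not guarantee more interesting text.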

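
The `no_repeat_ngram_size=2` setting used above forbids any 2-gram from appearing twice. The idea can be sketched as follows, on plain Python lists rather than token IDs: before each step, find every token that would complete an n-gram already present in the sequence and exclude it. `banned_next_tokens` is a hypothetical helper written for this illustration, not a function from the transformers library.

```python
def banned_next_tokens(tokens, ngram_size):
    """Return the tokens that would complete an n-gram already present in `tokens`."""
    if ngram_size < 1 or len(tokens) < ngram_size - 1:
        return set()
    # The last (n - 1) tokens are the prefix of the n-gram about to be completed
    prefix = tuple(tokens[len(tokens) - (ngram_size - 1):])
    banned = set()
    for i in range(len(tokens) - ngram_size + 1):
        ngram = tuple(tokens[i:i + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned


history = ["the", "cat", "sat", "on", "the"]
print(banned_next_tokens(history, 2))  # {'cat'}
```

Here the sequence ends in `the`, and the 2-gram `the cat` has already occurred, so `cat` may not be generated next. During real generation the score of each banned token is suppressed so that it cannot be chosen.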