# 2. NLP Model Applications: Text Summarization

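HuggingFace hosts a huge model hub. Some of its models are mature, well-established ones that give reasonably good predictions out of the box, with no additional training on your side. This is what is often called zero-shot learning.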

In [1]:
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

### 1) Download the model

In [3]:
# Download the model (uncomment to run; HF_ENDPOINT points at a mirror)
#!HF_ENDPOINT=https://hf-mirror.com hf download Falconsai/text_summarization --local-dir ../models/Falconsai/text_summarization
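
The same download can also be done from Python. The following is a minimal sketch, assuming the `huggingface_hub` package is installed. `is_model_downloaded` and `download_model` are hypothetical helper names introduced here for illustration, and treating the presence of `config.json` as "already downloaded" is a simplifying assumption.

```python
import os
from pathlib import Path


def is_model_downloaded(local_dir):
    """Return True if local_dir already contains a config.json file."""
    return (Path(local_dir) / "config.json").is_file()


def download_model(repo_id, local_dir, endpoint=None):
    """Download a model snapshot into local_dir unless it is already there."""
    if is_model_downloaded(local_dir):
        return str(local_dir)
    if endpoint is not None:
        # Assumption: HF_ENDPOINT is read when huggingface_hub is imported,
        # so it is set before the lazy import below.
        os.environ["HF_ENDPOINT"] = endpoint
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


# Example call (needs network access on the first run):
# download_model("Falconsai/text_summarization",
#                "../models/Falconsai/text_summarization",
#                endpoint="https://hf-mirror.com")
```

Skipping the download when `config.json` exists keeps repeated notebook runs from hitting the network.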

### 2) Load the model with pipeline

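With the pipeline tool, the caller only needs to state the task type. The pipeline then selects a suitable model automatically and returns predictions directly. A specific model can also be passed through the `model` argument, as is done here with the locally downloaded checkpoint. If the predictions already meet the caller's needs, no further training is required.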

The pipeline API is very concise and hides a large amount of complex low-level code, so even non-specialists can use it easily.

In [4]:
# Text summarization
from transformers import pipeline
summarizer = pipeline(task="summarization",
                      model="../models/Falconsai/text_summarization",
                      device=device)

Device set to use cuda
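
To make concrete what the pipeline hides, here is a rough sketch of the three steps involved: tokenize, generate, decode. `summarize_manually` is a hypothetical helper written for this note, not the pipeline's actual implementation. The truncation to 512 tokens and the `max_new_tokens` value are assumptions, and the model and inputs are assumed to be on the same device (CPU in the commented usage).

```python
def summarize_manually(tokenizer, model, text, max_new_tokens=100):
    """Summarize text with an explicitly passed tokenizer and seq2seq model."""
    # 1) Text -> token ids (assumption: truncate long inputs to 512 tokens).
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    # 2) Token ids -> generated summary ids, using deterministic decoding.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                do_sample=False)
    # 3) Summary ids -> text, dropping special tokens such as padding and EOS.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Usage with the locally downloaded checkpoint (model stays on CPU here):
# from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
# path = "../models/Falconsai/text_summarization"
# tokenizer = AutoTokenizer.from_pretrained(path)
# model = AutoModelForSeq2SeqLM.from_pretrained(path)
# print(summarize_manually(tokenizer, model, "Some long English text ..."))
```

The result may differ from the pipeline's, since the pipeline applies its own generation defaults.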


### 3) Inspect the model configuration

In [5]:
# Print the model's configuration
print(summarizer.model.config)

T5Config {
  "architectures": [
    "T5ForConditionalGeneration"
  ],
  "classifier_dropout": 0.0,
  "d_ff": 2048,
  "d_kv": 64,
  "d_model": 512,
  "decoder_start_token_id": 0,
  "dense_act_fn": "relu",
  "dropout_rate": 0.1,
  "dtype": "float32",
  "eos_token_id": 1,
  "feed_forward_proj": "relu",
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "is_gated_act": false,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "n_positions": 512,
  "num_decoder_layers": 6,
  "num_heads": 8,
  "num_layers": 6,
  "output_past": true,
  "pad_token_id": 0,
  "relative_attention_max_distance": 128,
  "relative_attention_num_buckets": 32,
  "task_specific_params": {
    "summarization": {
      "early_stopping": true,
      "length_penalty": 2.0,
      "max_length": 200,
      "min_length": 30,
      "no_repeat_ngram_size": 3,
      "num_beams": 4
    },
    "translation_en_to_de": {
      "early_stopping": true,
      "max_length": 300,
      "num_beams": 4,
      "prefix": "transl

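A few of the printed fields can be related with simple arithmetic. The sketch below copies values by hand from the output above into a plain dict (it is not a real `T5Config` object) and derives the total attention width, the feed-forward expansion ratio, and the total number of Transformer blocks.

```python
# Values copied by hand from the printed T5Config; a plain dict for illustration.
config = {
    "d_model": 512, "d_kv": 64, "num_heads": 8, "d_ff": 2048,
    "num_layers": 6, "num_decoder_layers": 6,
}

# Total attention projection width: heads times per-head key/value size.
attn_inner_dim = config["num_heads"] * config["d_kv"]
# Width of the feed-forward layer relative to the hidden size.
ffn_ratio = config["d_ff"] / config["d_model"]
# Encoder blocks plus decoder blocks.
total_blocks = config["num_layers"] + config["num_decoder_layers"]

print(attn_inner_dim, ffn_ratio, total_blocks)  # 512 4.0 12
```

Here `num_heads * d_kv` equals `d_model`, so the attention projections keep the hidden size unchanged.
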
### 4) Define the input text

In [6]:
ARTICLE = """ Long long ago, there lived a king. He loved horses. One day he asked 
an artist to draw him a beautiful horse. The artist said, "All right, but you must 
wait." So the king waited. He waited and waited. At last, after a year he could not 
wait any longer. He went to see the artist himself. Quickly the artist brought out 
paper and a brush. In five minutes he finished drawing a very beautiful horse. 
The king was angry. "You can draw a good horse in five minutes, yet you kept me 
waiting for a year. Why?" "Come with me, please." said the artist. They went to the 
artist's workroom. There they saw piles and piles of paper. On every piece of paper 
was a picture of a horse. "It took me more than a year to learn to draw a beautiful 
horse in five minutes." the artist said.
"""

"""
很久很久以前，有一位国王。他喜欢马。一天，他请一位画家给他画一匹漂亮的马。艺术家说:“好吧，
但你必须等一等。”于是国王等待着。他等了又等。一年之后，他终于等不下去了。他亲自去看了那位
艺术家。画家很快拿出了纸和画笔。不到五分钟，他就画好了一匹又大又漂亮的马。国王很生气。“你
五分钟就能画一匹好马，可你却让我等了一年。为什么?”“请跟我来。”艺术家说。他们去了艺术家的
工作室。他们在那里看到了成堆的纸张。每张纸上都画着一匹马。这位艺术家说:“我花了一年多的时
间才学会在五分钟内画出一匹漂亮的马。”
"""


### 5) Summarize the text

In [7]:
result = summarizer(ARTICLE, max_length=100, min_length=30, do_sample=False)
print(result)

Both `max_new_tokens` (=256) and `max_length`(=100) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


[{'summary_text': 'The king asked an artist to draw him a beautiful horse in five minutes . After a year he could not wait any longer . "Come with me, please," said the artist .'}]
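
The warning above indicates that `max_length=100` was not the effective limit in this run: the pipeline's `max_new_tokens` (256) took precedence. A minimal sketch of one way to avoid the conflict is to pass `max_new_tokens` instead of `max_length`. `summarize` is a hypothetical wrapper introduced here, and whether the warning disappears may depend on the installed `transformers` version.

```python
def summarize(summarizer, text, max_new_tokens=100, min_length=30):
    """Call a summarization pipeline with an explicit new-token budget."""
    outputs = summarizer(
        text,
        max_new_tokens=max_new_tokens,  # replaces max_length to avoid the conflict
        min_length=min_length,
        do_sample=False,                # deterministic decoding, as in the call above
    )
    # The pipeline returns a list with one dict per input text.
    return outputs[0]["summary_text"]


# Usage with the objects defined earlier in this notebook:
# print(summarize(summarizer, ARTICLE))
```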
