# HF Transformers Core Modules: Pipelines

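**Pipelines** are a simple, beginner-friendly way to run inference with a model.

A pipeline is an object that abstracts away most of the complex code in the Transformers library and exposes a simple API dedicated to a range of tasks, including **named entity recognition, masked language modeling, sentiment analysis, feature extraction, and question answering**.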


| Modality                    | Task                         | Description                                                                                      | Pipeline API                                    |
| --------------------------- | ---------------------------- | ------------------------------------------------------------------------------------------------ | ----------------------------------------------- |
| Audio                       | Audio classification         | Assign a label to an audio file                                                                  | `pipeline(task="audio-classification")`         |
|                             | Automatic speech recognition | Transcribe the speech in an audio file into text                                                 | `pipeline(task="automatic-speech-recognition")` |
| Computer vision             | Image classification         | Assign a label to an image                                                                       | `pipeline(task="image-classification")`         |
|                             | Object detection             | Predict the bounding boxes and classes of objects in an image                                    | `pipeline(task="object-detection")`             |
|                             | Image segmentation           | Assign a label to each pixel of an image (supports semantic, panoptic, and instance segmentation) | `pipeline(task="image-segmentation")`           |
| Natural language processing | Text classification          | Assign a label to a given sequence of text                                                       | `pipeline(task="sentiment-analysis")`           |
|                             | Token classification         | Assign a label to each token in a sequence (person, organization, location, etc.)                | `pipeline(task="ner")`                          |
|                             | Question answering           | Extract the answer from the text, given a context and a question                                 | `pipeline(task="question-answering")`           |
|                             | Summarization                | Generate a summary of a text sequence or document                                                | `pipeline(task="summarization")`                |
|                             | Translation                  | Translate text from one language into another                                                    | `pipeline(task="translation")`                  |
| Multimodal                  | Document question answering  | Answer a question about a document, given the document and the question                          | `pipeline(task="document-question-answering")`  |
|                             | Visual question answering    | Answer a question about an image, given the image and the question                               | `pipeline(task="vqa")`                          |


For the full list of tasks supported by Pipelines, see: https://huggingface.co/docs/transformers/task_summary
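Some identifiers in the table, such as `sentiment-analysis`, `ner`, and `vqa`, are short aliases for longer task names. A minimal sketch of alias resolution follows. The `resolve_task` helper is hypothetical (it is not a library function), and the mapping is an assumption that should be checked against the installed Transformers version.

```python
# Hypothetical helper for illustration. The alias table is an assumption
# about what Transformers accepts; verify it for your installed version.
TASK_ALIASES = {
    "sentiment-analysis": "text-classification",
    "ner": "token-classification",
    "vqa": "visual-question-answering",
}


def resolve_task(task: str) -> str:
    """Return the canonical task name for an alias, or the name unchanged."""
    return TASK_ALIASES.get(task, task)


for name in ("sentiment-analysis", "ner", "vqa", "summarization"):
    print(f"{name} -> {resolve_task(name)}")
```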


## Pipeline API

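The **Pipeline API** is a wrapper around all the other available pipelines. It is instantiated like any other pipeline, and it lowers the cost of learning and using AI inference.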
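To make the "wrapper" idea concrete, here is a toy sketch of the three stages a pipeline typically chains together: preprocessing, a forward pass, and postprocessing. `ToyPipeline` and `keyword_model` are made-up names, and the scoring rule stands in for a real neural network. This is not how Transformers implements its pipelines, only an illustration of the pattern.

```python
from typing import Callable, Dict, List


class ToyPipeline:
    """Toy stand-in for a pipeline: preprocess -> forward -> postprocess."""

    def __init__(self, model: Callable[[List[str]], Dict[str, float]]):
        self.model = model

    def preprocess(self, text: str) -> List[str]:
        # A real pipeline would call a tokenizer here.
        return text.lower().split()

    def forward(self, tokens: List[str]) -> Dict[str, float]:
        return self.model(tokens)

    def postprocess(self, scores: Dict[str, float]) -> Dict[str, object]:
        # Keep only the best-scoring label.
        label = max(scores, key=scores.get)
        return {"label": label, "score": scores[label]}

    def __call__(self, text: str) -> Dict[str, object]:
        return self.postprocess(self.forward(self.preprocess(text)))


def keyword_model(tokens: List[str]) -> Dict[str, float]:
    # Made-up scoring rule standing in for a neural network.
    hits = sum(token in {"good", "great"} for token in tokens)
    positive = 0.9 if hits else 0.2
    return {"positive": positive, "negative": 1.0 - positive}


toy = ToyPipeline(keyword_model)
print(toy("This library is great"))
```

The caller only sees a single call that maps raw text to a labeled result, which is the convenience the Pipeline API provides.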

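- NLP (natural language processing)
    - Text classification: labels a sequence of text (a sentence, a paragraph, or a whole document) with one of a predefined set of classes.
        - Sentiment analysis: labels text by polarity (such as positive or negative), which can support decision-making in fields like politics, finance, and marketing.
        - Content classification: labels text by topic to help organize and filter information in news and social media feeds (weather, sports, finance, and so on).
    - Token classification: assigns each token a label from a predefined set of classes.
        - Named entity recognition (NER): labels tokens by entity category, such as organization, person, location, or date. NER is especially popular in biomedical settings, where it can label genes, proteins, and drug names.
        - Part-of-speech (POS) tagging: labels tokens by part of speech, such as noun, verb, or adjective. POS tagging helps translation systems understand how two identical words differ grammatically ("bank" as a noun versus "bank" as a verb).
    - Question answering: another token-level task that returns an answer to a question, sometimes with context (open-domain) and sometimes without context (closed-domain). It happens whenever we ask a virtual assistant something, such as whether a restaurant is open. It can also provide customer or technical support and help search engines retrieve the relevant information you are asking for.
        - Extractive: given a question and some context, the model must extract a span of text from the context as the answer.
        - Abstractive: given a question and some context, the answer is generated from the context.
    - Summarization: creates a shorter version of a longer text while preserving as much of the original document's meaning as possible. Summarization is a sequence-to-sequence task; it outputs a text sequence shorter than the input. Many long-form documents can be summarized to help readers grasp the main points quickly. Legislative bills, legal and financial documents, patents, and scientific papers are examples of documents that can be summarized to save readers time and serve as a reading aid.
        - Extractive: identifies and extracts the most important sentences from the original text.
        - Abstractive: generates the target summary from the original text (it may include new words that do not appear in the input document).
- Audio
    - Audio classification: labels audio data with a class from a predefined set.
    - Automatic speech recognition: transcribes speech into text.
- Computer vision
    - Image classification: labels an entire image with a class from a predefined set.
    - Object detection: unlike image classification, object detection identifies multiple objects in an image along with their locations (defined by bounding boxes).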
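The extractive/abstractive distinction in the list above can be illustrated with a deliberately simple extractive summarizer. This is a frequency-based sketch with a made-up helper name (`extractive_summary`), not a method used by any model in this document. It can only return sentences that already exist in the input, which is what makes it extractive.

```python
import re
from collections import Counter


def extractive_summary(text: str, n_sentences: int = 1) -> str:
    """Pick the sentences whose words are, on average, most frequent in the text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    ranked = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Emit the selected sentences in their original order.
    return " ".join(s for s in sentences if s in ranked)


doc = "Pipelines wrap models. Pipelines make inference with models simple. Cats sleep."
print(extractive_summary(doc))
```

An abstractive summarizer, by contrast, generates new text and may use words that never appear in the source.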

## Comparing different models on the same task

The tasks below are compared with a focus on Chinese-language NLP.

### Text Classification

In [11]:
# Sentiment classification: Chinese vs. English
from transformers import pipeline
sentiment_task = pipeline(
    model="lxyuan/distilbert-base-multilingual-cased-sentiments-student", 
    top_k=1
)
sentiment_task("今儿北京可真冷啊!")

[[{'label': 'negative', 'score': 0.8286268711090088}]]

In [12]:
sentiment_task("Today Beijing is really cold.")

[[{'label': 'negative', 'score': 0.7933650612831116}]]

In [13]:
sentiment_task("你学东西真的好快，理论课一讲就明白了!")

[[{'label': 'positive', 'score': 0.953176736831665}]]

In [14]:
sentiment_task("You learn things really quickly. You understand the theory class as soon as it is taught.")

[[{'label': 'positive', 'score': 0.7639099359512329}]]
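The `top_k=1` argument used above keeps only the highest-scoring label for each input. A minimal sketch of that postprocessing step follows, assuming a three-label head and made-up logits. The label names and numbers are illustrative and are not taken from the model above.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    """Turn raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_labels(logits: List[float], labels: List[str], k: int = 1) -> List[Dict]:
    """Return the k most probable labels, highest score first."""
    ranked = sorted(zip(labels, softmax(logits)), key=lambda pair: pair[1], reverse=True)
    return [{"label": label, "score": score} for label, score in ranked[:k]]


# Made-up logits for illustration only.
labels = ["positive", "neutral", "negative"]
print(top_k_labels([0.2, 0.5, 2.4], labels, k=1))
```

Raising `k` returns more labels in descending score order, which helps when the top score is not decisive.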

### Token Classification

In [19]:
# Token classification on Chinese text
from transformers import AutoTokenizer,AutoModelForTokenClassification
from transformers import pipeline
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large-finetuned-conll03-english")
model = AutoModelForTokenClassification.from_pretrained("xlm-roberta-large-finetuned-conll03-english")
classifier = pipeline("ner", model=model, tokenizer=tokenizer)
classifier("中国的首都是北京，去北京可以到天安门广场上看升国旗，可以通过百度的搜索更多详细信息。")

Some weights of the model checkpoint at xlm-roberta-large-finetuned-conll03-english were not used when initializing XLMRobertaForTokenClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing XLMRobertaForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing XLMRobertaForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'entity': 'I-LOC',
  'score': 0.9879116,
  'index': 2,
  'word': '中国的',
  'start': 0,
  'end': 3},
 {'entity': 'I-LOC',
  'score': 0.99999654,
  'index': 5,
  'word': '北京',
  'start': 6,
  'end': 8},
 {'entity': 'I-LOC',
  'score': 0.9999956,
  'index': 8,
  'word': '北京',
  'start': 10,
  'end': 12},
 {'entity': 'I-LOC',
  'score': 0.999691,
  'index': 11,
  'word': '天',
  'start': 15,
  'end': 16},
 {'entity': 'I-LOC',
  'score': 0.99995625,
  'index': 12,
  'word': '安',
  'start': 16,
  'end': 17},
 {'entity': 'I-LOC',
  'score': 0.9998363,
  'index': 13,
  'word': '门',
  'start': 17,
  'end': 18},
 {'entity': 'I-ORG',
  'score': 0.9885372,
  'index': 22,
  'word': '百度',
  'start': 30,
  'end': 32}]

In [20]:
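# With entity grouping enabled, 天安门 is recognized as a single entity
# (aggregation_strategy="simple" replaces the deprecated grouped_entities=True)
classifier = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")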
classifier("中国的首都是北京，去北京可以到天安门广场上看升国旗，可以通过百度的搜索更多详细信息。")

[{'entity_group': 'LOC',
  'score': 0.9879116,
  'word': '中国的',
  'start': 0,
  'end': 3},
 {'entity_group': 'LOC',
  'score': 0.99999654,
  'word': '北京',
  'start': 6,
  'end': 8},
 {'entity_group': 'LOC',
  'score': 0.9999956,
  'word': '北京',
  'start': 10,
  'end': 12},
 {'entity_group': 'LOC',
  'score': 0.99982786,
  'word': '天安门',
  'start': 15,
  'end': 18},
 {'entity_group': 'ORG',
  'score': 0.9885372,
  'word': '百度',
  'start': 30,
  'end': 32}]
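The difference between the two outputs above comes from merging neighbouring tokens that carry the same entity type. A simplified sketch of that merge follows, assuming tokens are joined when their `index` values are consecutive and their labels match, and that the group score is the mean of the member scores. `group_entities` is a hypothetical helper, and the library's own grouping logic handles more cases (for example `B-`/`I-` prefixes and subword joining).

```python
from typing import Dict, List


def group_entities(tokens: List[Dict]) -> List[Dict]:
    """Merge consecutive tokens that share an entity type (simplified)."""
    groups: List[Dict] = []
    last_index = None
    for tok in tokens:
        label = tok["entity"].split("-", 1)[-1]  # "I-LOC" -> "LOC"
        adjacent = last_index is not None and tok["index"] == last_index + 1
        if groups and adjacent and groups[-1]["entity_group"] == label:
            group = groups[-1]
            group["word"] += tok["word"]  # plain concatenation suits Chinese text
            group["end"] = tok["end"]
            group["scores"].append(tok["score"])
        else:
            groups.append({"entity_group": label, "word": tok["word"],
                           "start": tok["start"], "end": tok["end"],
                           "scores": [tok["score"]]})
        last_index = tok["index"]
    for group in groups:
        scores = group.pop("scores")
        group["score"] = sum(scores) / len(scores)  # mean of member scores
    return groups


# Four tokens copied from the ungrouped output above.
tokens = [
    {"entity": "I-LOC", "score": 0.9999956, "index": 8, "word": "北京", "start": 10, "end": 12},
    {"entity": "I-LOC", "score": 0.999691, "index": 11, "word": "天", "start": 15, "end": 16},
    {"entity": "I-LOC", "score": 0.99995625, "index": 12, "word": "安", "start": 16, "end": 17},
    {"entity": "I-LOC", "score": 0.9998363, "index": 13, "word": "门", "start": 17, "end": 18},
]
for group in group_entities(tokens):
    print(group)
```

The mean of the three token scores for 天, 安, and 门 is about 0.9998279, which agrees with the grouped score reported above.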

In [35]:
# English comparison with entity grouping
classifier("Hugging Face is a French company based in New York City.")

[{'entity_group': 'ORG',
  'score': 0.9999173,
  'word': 'Hugging Face',
  'start': 0,
  'end': 12},
 {'entity_group': 'MISC',
  'score': 0.9999932,
  'word': 'French',
  'start': 18,
  'end': 24},
 {'entity_group': 'LOC',
  'score': 0.99999696,
  'word': 'New York City',
  'start': 42,
  'end': 55}]

### Question Answering

In [44]:
# Chinese question answering (extractive)
from transformers import AutoModelForQuestionAnswering,AutoTokenizer,pipeline
model = AutoModelForQuestionAnswering.from_pretrained('uer/roberta-base-chinese-extractive-qa')
tokenizer = AutoTokenizer.from_pretrained('uer/roberta-base-chinese-extractive-qa')
QA = pipeline('question-answering', model=model, tokenizer=tokenizer)
QA_input = {'question': "什么时候去天安门广场可以看到升国旗？",'context': "北京天安门广场每天早上6点左右会有升国旗仪式。"}
preds = QA(QA_input)
print(
    f"score: {round(preds['score'], 4)}, start: {preds['start']}, end: {preds['end']}, answer: {preds['answer']}"
)

score: 0.2947, start: 7, end: 15, answer: 每天早上6点左右
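An extractive QA model scores every position as a possible start and a possible end of the answer, and the result is the best-scoring span. A minimal sketch of that selection follows, assuming the span score is the product of the start and end probabilities and that spans are capped at a maximum length. `best_span` is a hypothetical helper, the probabilities are made up, and positions here are token indices, whereas the output above reports character offsets.

```python
from typing import List, Tuple


def best_span(start_probs: List[float], end_probs: List[float],
              max_len: int = 15) -> Tuple[int, int, float]:
    """Return (start, end, score) of the best span with start <= end."""
    best = (0, 0, 0.0)
    for i, p_start in enumerate(start_probs):
        # Only consider ends at or after the start, within max_len tokens.
        for j in range(i, min(i + max_len, len(end_probs))):
            score = p_start * end_probs[j]
            if score > best[2]:
                best = (i, j, score)
    return best


# Made-up per-token probabilities for a six-token context.
start_probs = [0.05, 0.10, 0.60, 0.10, 0.10, 0.05]
end_probs = [0.05, 0.05, 0.10, 0.20, 0.50, 0.10]
print(best_span(start_probs, end_probs))
```

Under this scoring, a low overall score such as the one above means the start and end probabilities were not both concentrated on the same span.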


In [47]:
# English comparison: the model fails to produce the answer
QA_input = {'question': "What is the capital of China?",'context': "On 1 October 1949, CCP Chairman Mao Zedong formally proclaimed the People's Republic of China in Tiananmen Square, Beijing."}
preds = QA(QA_input)
print(
    f"score: {round(preds['score'], 4)}, start: {preds['start']}, end: {preds['end']}, answer: {preds['answer']}"
)

score: 0.0, start: 3, end: 4, answer: 1


In [45]:
# English comparison: answered correctly
QA_input = {'question': "What is the name of the repository?",'context': "The name of the repository is huggingface/transformers"}
preds = QA(QA_input)
print(
    f"score: {round(preds['score'], 4)}, start: {preds['start']}, end: {preds['end']}, answer: {preds['answer']}"
)

score: 0.0312, start: 30, end: 54, answer: huggingface/transformers
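The `start` and `end` values in the three outputs above are character offsets into the context, with `end` exclusive, so slicing the context reproduces each answer. The check below reuses the contexts and offsets from those cells; `answer_from_offsets` is a small helper named here for illustration.

```python
def answer_from_offsets(context: str, start: int, end: int) -> str:
    """Slice the answer out of the context (end is exclusive)."""
    return context[start:end]


# (context, start, end, answer) copied from the three cells above.
cases = [
    ("北京天安门广场每天早上6点左右会有升国旗仪式。", 7, 15, "每天早上6点左右"),
    ("On 1 October 1949, CCP Chairman Mao Zedong formally proclaimed the "
     "People's Republic of China in Tiananmen Square, Beijing.", 3, 4, "1"),
    ("The name of the repository is huggingface/transformers", 30, 54,
     "huggingface/transformers"),
]

for context, start, end, answer in cases:
    extracted = answer_from_offsets(context, start, end)
    print(f"[{start}:{end}] -> {extracted!r} (matches: {extracted == answer})")
```

Because an extractive model can only return a slice of the context, it cannot answer a question whose answer has to be inferred rather than copied.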


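### Summarization

In [34]:
# Chinese summarization; the few models tried gave unsatisfactory results
import re
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Raw strings avoid invalid-escape warnings for the regex patterns
WHITESPACE_HANDLER = lambda k: re.sub(r'\s+', ' ', re.sub(r'\n+', ' ', k.strip()))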

article_text = """小米汽车su7在续航方面，根据不同配置，分别搭载来自襄阳弗迪的磷酸铁锂电池组（比亚迪），以及来自宁德时代的三元锂电池组，电池容量分别为73.6千瓦时、94.3千瓦时和101千瓦时，其中73.6千瓦时对应的纯电续航里程700km，94.3千瓦时对应的纯电续航为830km，101千瓦时对应的纯电续航里程分别为800km（均为CLTC工况）。另外，小米SU7搭载了800V高压平台，配合宁德时代的麒麟电池，可以做到充电5分钟，220km续航，充电15分钟，510km续航的表现。"""

model_name = "csebuetnlp/mT5_m2m_crossSum_enhanced"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

get_lang_id = lambda lang: tokenizer._convert_token_to_id(
    model.config.task_specific_params["langid_map"][lang][1]
) 

target_lang = "chinese_simplified" # see the model card for the list of available language names

input_ids = tokenizer(
    [WHITESPACE_HANDLER(article_text)],
    return_tensors="pt",
    padding="max_length",
    truncation=True,
    max_length=512
)["input_ids"]

output_ids = model.generate(
    input_ids=input_ids,
    decoder_start_token_id=get_lang_id(target_lang),
    max_length=84,
    no_repeat_ngram_size=2,
    num_beams=4,
)[0]

summary = tokenizer.decode(
    output_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)

print(summary)


<extra_id_59> 中国小米汽车su7在续航方面,搭载来自宁德时代的三元锂电池组。

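The decoded summary above still begins with an `<extra_id_59>` marker even though `skip_special_tokens=True` was set. Why it survives decoding is not established here. A small postprocessing sketch that strips such `<extra_id_N>` markers from the final string follows; `strip_sentinels` is a hypothetical helper, not part of Transformers.

```python
import re


def strip_sentinels(text: str) -> str:
    """Remove `<extra_id_N>` markers and collapse the leftover whitespace."""
    without_markers = re.sub(r"<extra_id_\d+>", " ", text)
    return re.sub(r"\s+", " ", without_markers).strip()


raw = "<extra_id_59> 中国小米汽车su7在续航方面,搭载来自宁德时代的三元锂电池组。"
print(strip_sentinels(raw))
```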

In [50]:
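# English summarization for comparison; the summary focuses on different points
import re
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Raw strings avoid invalid-escape warnings for the regex patterns
WHITESPACE_HANDLER = lambda k: re.sub(r'\s+', ' ', re.sub(r'\n+', ' ', k.strip()))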

article_text = """In this work, we presented the Transformer, the first sequence transduction model based entirely on attention, 
    replacing the recurrent layers most commonly used in encoder-decoder architectures with multi-headed self-attention. 
    For translation tasks, the Transformer can be trained significantly faster than architectures based on recurrent or convolutional layers. 
    On both WMT 2014 English-to-German and WMT 2014 English-to-French translation tasks, we achieve a new state of the art. 
    In the former task our best model outperforms even all previously reported ensembles."""

model_name = "csebuetnlp/mT5_m2m_crossSum_enhanced"
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

get_lang_id = lambda lang: tokenizer._convert_token_to_id(
    model.config.task_specific_params["langid_map"][lang][1]
) 

target_lang = "english" # see the model card for the list of available language names

input_ids = tokenizer(
    [WHITESPACE_HANDLER(article_text)],
    return_tensors="pt",
    padding="max_length",
    truncation=True,
    max_length=512
)["input_ids"]

output_ids = model.generate(
    input_ids=input_ids,
    decoder_start_token_id=get_lang_id(target_lang),
    max_length=84,
    no_repeat_ngram_size=2,
    num_beams=4,
)[0]

summary = tokenizer.decode(
    output_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)

print(summary)

<extra_id_69> The first sequence transduction model based on attention has been revealed in the WMT 2014 English-to-French translation task.

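Both summarization cells pass the article through `WHITESPACE_HANDLER` before tokenizing. The sketch below shows what that normalization does to a multi-line, indented string. It also checks that a single `\s+` substitution gives the same result here, since `\s` already matches newlines. `normalize_whitespace` is a name introduced for this example.

```python
import re

# Same behaviour as the handler used in the cells above.
WHITESPACE_HANDLER = lambda k: re.sub(r"\s+", " ", re.sub(r"\n+", " ", k.strip()))


def normalize_whitespace(text: str) -> str:
    """Collapse every run of whitespace (including newlines) to one space."""
    return re.sub(r"\s+", " ", text.strip())


sample = """In this work, we presented the Transformer,
    the first sequence transduction model
    based entirely on attention."""

print(normalize_whitespace(sample))
print(normalize_whitespace(sample) == WHITESPACE_HANDLER(sample))
```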

### Automatic Speech Recognition

In [1]:
model_name_or_path = "openai/whisper-large-v2"
model_dir = "./models/whisper-large-v2-asr-int8"

language = "Chinese (China)"
language_decode = "chinese"
task = "transcribe"

#### Testing the openai/whisper-large-v2 base model
The transcription is correct in content, but it comes out in English rather than Chinese.

In [2]:
from transformers import AutoModelForSpeechSeq2Seq, AutoTokenizer, AutoProcessor

base_model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_name_or_path, load_in_8bit=True, device_map="auto"
)

base_model.eval()

WhisperForConditionalGeneration(
  (model): WhisperModel(
    (encoder): WhisperEncoder(
      (conv1): Conv1d(80, 1280, kernel_size=(3,), stride=(1,), padding=(1,))
      (conv2): Conv1d(1280, 1280, kernel_size=(3,), stride=(2,), padding=(1,))
      (embed_positions): Embedding(1500, 1280)
      (layers): ModuleList(
        (0-31): 32 x WhisperEncoderLayer(
          (self_attn): WhisperSdpaAttention(
            (k_proj): Linear8bitLt(in_features=1280, out_features=1280, bias=False)
            (v_proj): Linear8bitLt(in_features=1280, out_features=1280, bias=True)
            (q_proj): Linear8bitLt(in_features=1280, out_features=1280, bias=True)
            (out_proj): Linear8bitLt(in_features=1280, out_features=1280, bias=True)
          )
          (self_attn_layer_norm): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
          (activation_fn): GELUActivation()
          (fc1): Linear8bitLt(in_features=1280, out_features=5120, bias=True)
      

In [3]:
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, language=language, task=task)
processor = AutoProcessor.from_pretrained(model_name_or_path, language=language, task=task)
feature_extractor = processor.feature_extractor

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [4]:
test_audio = "./data/audio/20240405_002217.mp3"

In [5]:
from transformers import AutomaticSpeechRecognitionPipeline

pipeline_base = AutomaticSpeechRecognitionPipeline(model=base_model, tokenizer=tokenizer, feature_extractor=feature_extractor)

# Note: forced_decoder_ids is computed here but is not passed to the pipeline calls below
forced_decoder_ids = processor.get_decoder_prompt_ids(language=language_decode, task=task)

In [6]:
import torch
with torch.cuda.amp.autocast():
    text_base = pipeline_base(test_audio, max_new_tokens=255)["text"]
text_base



' Use complete data to compare the change of Twin Loss and Validation Loss in the training set. After the training is completed, use the test set to perform model assessment.'

#### Testing openai/whisper-large-v2 after LoRA fine-tuning on Chinese data from mozilla-foundation/common_voice_11_0
After LoRA fine-tuning on Chinese speech data, the model outputs Chinese.

In [7]:
from peft import PeftModel

peft_model = PeftModel.from_pretrained(base_model, model_dir)

peft_model.eval()

PeftModel(
  (base_model): LoraModel(
    (model): WhisperForConditionalGeneration(
      (model): WhisperModel(
        (encoder): WhisperEncoder(
          (conv1): Conv1d(80, 1280, kernel_size=(3,), stride=(1,), padding=(1,))
          (conv2): Conv1d(1280, 1280, kernel_size=(3,), stride=(2,), padding=(1,))
          (embed_positions): Embedding(1500, 1280)
          (layers): ModuleList(
            (0-31): 32 x WhisperEncoderLayer(
              (self_attn): WhisperSdpaAttention(
                (k_proj): Linear8bitLt(in_features=1280, out_features=1280, bias=False)
                (v_proj): lora.Linear8bitLt(
                  (base_layer): Linear8bitLt(in_features=1280, out_features=1280, bias=True)
                  (lora_dropout): ModuleDict(
                    (default): Dropout(p=0.05, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (default): Linear(in_features=1280, out_features=4, bias=False)
   

In [8]:
pipeline_base_peft = AutomaticSpeechRecognitionPipeline(model=peft_model, tokenizer=tokenizer, feature_extractor=feature_extractor)
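The truncated module printout above shows `lora_A` as `Linear(in_features=1280, out_features=4, bias=False)` wrapped around `v_proj`, which suggests a LoRA rank of 4. A rough parameter count for one adapted layer follows, assuming `lora_B` mirrors `lora_A` with shape 4 to 1280 and no bias. The printout is cut off before `lora_B`, so that shape is an assumption.

```python
def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: lora_A plus lora_B, no biases."""
    # lora_A maps in_features -> rank; lora_B maps rank -> out_features (assumed).
    return rank * in_features + rank * out_features


in_features = out_features = 1280
rank = 4

base = in_features * out_features + out_features  # frozen weight plus bias
added = lora_params(in_features, out_features, rank)

print(f"base layer parameters: {base}")
print(f"LoRA parameters added: {added}")
print(f"trainable share:       {added / base:.2%}")
```

Under these assumptions each adapted projection trains well under 1% of the parameters of the layer it wraps.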

In [9]:
import torch
with torch.cuda.amp.autocast():
    text_base_peft = pipeline_base_peft(test_audio, max_new_tokens=255)["text"]
text_base_peft



'使用完整的数据训练及对比Train Loss和Validation Loss变化，训练完成后使用测试级进行模型评估。'
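One simple way to quantify the difference between the two transcriptions is the share of letters that are Chinese characters. The sketch below counts characters in the basic CJK Unified Ideographs range (U+4E00 to U+9FFF). `cjk_ratio` is a hypothetical helper, and the two strings are copied from the outputs above.

```python
def cjk_ratio(text: str) -> float:
    """Fraction of alphabetic characters that are basic CJK ideographs."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    cjk = sum("\u4e00" <= ch <= "\u9fff" for ch in letters)
    return cjk / len(letters)


base_text = (" Use complete data to compare the change of Twin Loss and Validation "
             "Loss in the training set. After the training is completed, use the "
             "test set to perform model assessment.")
peft_text = "使用完整的数据训练及对比Train Loss和Validation Loss变化，训练完成后使用测试级进行模型评估。"

print(f"base model: {cjk_ratio(base_text):.2f}")
print(f"LoRA model: {cjk_ratio(peft_text):.2f}")
```

The base transcription contains no Chinese characters, while most letters in the LoRA transcription are Chinese, with English terms such as "Train Loss" left untranslated.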