## Listing the task types supported by Pipeline

In [1]:
from transformers.pipelines import SUPPORTED_TASKS

In [2]:
# Print a running index, the task name, and its modality type
for i, (k, v) in enumerate(SUPPORTED_TASKS.items(), start=1):
    print(i, k, v["type"])

1 audio-classification audio
2 automatic-speech-recognition multimodal
3 text-to-audio text
4 feature-extraction multimodal
5 text-classification text
6 token-classification text
7 question-answering text
8 table-question-answering text
9 visual-question-answering multimodal
10 document-question-answering multimodal
11 fill-mask text
12 summarization text
13 translation text
14 text2text-generation text
15 text-generation text
16 zero-shot-classification text
17 zero-shot-image-classification multimodal
18 zero-shot-audio-classification multimodal
19 image-classification image
20 image-feature-extraction image
21 image-segmentation multimodal
22 image-to-text multimodal
23 object-detection multimodal
24 zero-shot-object-detection multimodal
25 depth-estimation image
26 video-classification video
27 mask-generation multimodal
28 image-to-image image
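
The flat listing above is easier to scan when grouped by modality. The following is a minimal sketch of that grouping. `group_by_type` and `sample_tasks` are hypothetical names introduced here, and the sample is a small hand-copied subset of the listing so the block runs without transformers installed.

```python
from collections import defaultdict


def group_by_type(tasks):
    """Map each modality type to the sorted list of task names of that type.

    `tasks` is any mapping shaped like SUPPORTED_TASKS: task name -> dict
    containing a "type" key.
    """
    groups = defaultdict(list)
    for name, spec in tasks.items():
        groups[spec["type"]].append(name)
    return {task_type: sorted(names) for task_type, names in groups.items()}


# Hypothetical stand-in: a few (task, type) pairs copied from the listing.
sample_tasks = {
    "audio-classification": {"type": "audio"},
    "text-classification": {"type": "text"},
    "fill-mask": {"type": "text"},
    "image-classification": {"type": "image"},
    "video-classification": {"type": "video"},
}

groups = group_by_type(sample_tasks)
for task_type, names in groups.items():
    print(task_type, names)
```

With transformers installed, calling `group_by_type(SUPPORTED_TASKS)` should give the same grouping for the full task list.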


## Creating and using a Pipeline

In [3]:
from transformers import pipeline

In [4]:
pipe = pipeline("text-classification")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
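
The warning above recommends naming the model and revision explicitly. One way to do that, sketched under the assumption that the default model and revision printed in the warning are the ones wanted, is to keep the arguments in one place and build the pipeline from them. `PIPELINE_KWARGS` and `build_pipeline` are hypothetical names; the import is done lazily so the block itself runs without downloading anything.

```python
# Model name and revision are copied from the warning message above.
PIPELINE_KWARGS = {
    "task": "text-classification",
    "model": "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "af0f99b",
}


def build_pipeline(kwargs=PIPELINE_KWARGS):
    """Create a pipeline with a pinned model and revision."""
    # Imported here so that defining this helper does not require transformers.
    from transformers import pipeline

    return pipeline(**kwargs)


print(PIPELINE_KWARGS)
```

Calling `build_pipeline()` should then behave like `pipeline("text-classification")` in the cell above, without the "No model was supplied" warning.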


In [5]:
pipe("very good")

[{'label': 'POSITIVE', 'score': 0.9998520612716675}]

In [6]:
pipe("stupid")

[{'label': 'NEGATIVE', 'score': 0.9997732043266296}]

In [7]:
pipe(["very good!", "vary bad!"])

[{'label': 'POSITIVE', 'score': 0.9998525381088257},
 {'label': 'NEGATIVE', 'score': 0.9991207718849182}]

## Specifying the task type and a model to create a Pipeline for that model

In [8]:
pipe = pipeline("text-classification", model="uer/roberta-base-finetuned-dianping-chinese")

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


In [9]:
pipe("很好")

[{'label': 'positive (stars 4 and 5)', 'score': 0.9320553541183472}]

## Loading the model first, then creating the pipeline

In [10]:
from transformers import AutoModelForSequenceClassification,AutoTokenizer

In [11]:
# With this approach, both model and tokenizer must be passed explicitly
model = AutoModelForSequenceClassification.from_pretrained("uer/roberta-base-finetuned-dianping-chinese")
tokenizer = AutoTokenizer.from_pretrained("uer/roberta-base-finetuned-dianping-chinese")
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)
# "sentiment-analysis" is an alias of "text-classification", so this is equivalent:
# pipe = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


In [12]:
pipe("好")

[{'label': 'positive (stars 4 and 5)', 'score': 0.9181749820709229}]

## Comparing CPU and GPU inference time

In [13]:
pipe.model.device  # defaults to CPU

device(type='cpu')

CPU

In [15]:
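import time
import torch

# Average the latency of 100 single-sentence calls
times = []
for i in range(100):
    torch.cuda.synchronize()  # wait for pending CUDA work so it is not counted
    start = time.time()
    pipe("我觉得不太行！")
    torch.cuda.synchronize()  # make sure the call has fully finished
    times.append(time.time() - start)
print(sum(times) / 100)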

0.06038113594055176


GPU

In [16]:
# pipe = pipeline("text-classification", model=model, tokenizer=tokenizer, device="cuda")  # GPU, by device string
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer, device=0)  # GPU, by index; either form works
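
Hard-coding `device=0` fails on a machine without a GPU. A minimal sketch of a fallback is shown below; `pick_device` is a hypothetical helper name, and it assumes a PyTorch backend.

```python
def pick_device():
    """Return "cuda:0" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to move to a GPU.
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
```

The returned string can be passed as `device=device` when creating the pipeline, since `pipeline` accepts a device string as well as an integer index.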

In [19]:
times = []
for i in range(100):
    torch.cuda.synchronize()
    start = time.time()
    pipe("我觉得不太行！")
    torch.cuda.synchronize()
    times.append(time.time() - start)
print(sum(times)/100)

0.014055802822113037
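
The two means above (about 0.060 s on CPU and 0.014 s on GPU) correspond to roughly a 4.3x speedup in this run. Since the same timing loop appears twice, it can be factored into a helper. The sketch below is one possible version, not the notebook's own code: `benchmark` and its parameters are hypothetical, it adds a few warm-up calls, and it uses `time.perf_counter`, which is better suited to measuring short intervals than `time.time`.

```python
import time


def benchmark(fn, n_runs=100, warmup=5, sync=None):
    """Return the mean wall-clock seconds per call of fn().

    `sync` is an optional zero-argument callable run before and after each
    timed call, e.g. torch.cuda.synchronize when timing GPU work.
    """
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    times = []
    for _ in range(n_runs):
        if sync is not None:
            sync()
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)


# Stand-in workload so the block runs on its own.
mean_s = benchmark(lambda: sum(range(1000)), n_runs=20, warmup=2)
print(f"{mean_s:.6f} s per call")
```

For the pipeline in this section, the call would look like `benchmark(lambda: pipe("我觉得不太行！"), sync=torch.cuda.synchronize)`.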


## Finding out which parameters a Pipeline accepts

In [20]:
qa_pipe = pipeline("question-answering", model="uer/roberta-base-chinese-extractive-qa")

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.


In [21]:
qa_pipe

<transformers.pipelines.question_answering.QuestionAnsweringPipeline at 0x23c64115a20>

In [22]:
from transformers import QuestionAnsweringPipeline 

In [23]:
qa_pipe(question="中国的首都是哪里？", context="中国的首都是北京", max_answer_len=1)   # max_answer_len=1 limits the answer to one token

{'score': 0.002287399722263217, 'start': 6, 'end': 7, 'answer': '北'}
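
Arguments such as `max_answer_len` are, as far as I know, documented in the docstring of the pipeline class's `__call__` method, which is why the concrete class is identified and imported above. The sketch below shows a generic way to print that docstring. `call_doc` and `DemoPipeline` are hypothetical names, and `DemoPipeline` is only a stand-in so the block runs without transformers.

```python
import inspect


def call_doc(obj):
    """Return the cleaned docstring of __call__ for a class or an instance."""
    cls = obj if inspect.isclass(obj) else type(obj)
    return inspect.getdoc(cls.__call__) or ""


class DemoPipeline:
    """Hypothetical stand-in for a real pipeline class."""

    def __call__(self, text, max_answer_len=15):
        """Return a truncated answer.

        Args:
            text: the input text.
            max_answer_len: maximum length of the returned answer.
        """
        return text[:max_answer_len]


doc = call_doc(DemoPipeline())
print(doc)
```

With the objects from this section, `print(call_doc(qa_pipe))` or `help(QuestionAnsweringPipeline.__call__)` should list the arguments the question-answering pipeline accepts.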

## What a Pipeline does behind the scenes

In [24]:
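# Import only the names that are used instead of a wildcard import
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch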



In [25]:
tokenizer = AutoTokenizer.from_pretrained("uer/roberta-base-finetuned-dianping-chinese")
model = AutoModelForSequenceClassification.from_pretrained("uer/roberta-base-finetuned-dianping-chinese")

loading configuration file config.json from cache at C:\Users\MingyueHu\.cache\huggingface\hub\models--uer--roberta-base-finetuned-dianping-chinese\snapshots\25faf1874b21e76db31ea9c396ccf2a0322e0071\config.json
Model config BertConfig {
  "_name_or_path": "uer/roberta-base-finetuned-dianping-chinese",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "negative (stars 1, 2 and 3)",
    "1": "positive (stars 4 and 5)"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "negative (stars 1, 2 and 3)": 0,
    "positive (stars 4 and 5)": 1
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.42.4",
  "

In [26]:
input_text = "我觉得不太行！"
inputs = tokenizer(input_text, return_tensors="pt")
inputs

{'input_ids': tensor([[ 101, 2769, 6230, 2533,  679, 1922, 6121, 8013,  102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1]])}
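
The tokenizer output above can be read as follows, assuming the standard BERT Chinese vocabulary in which id 101 is `[CLS]` and id 102 is `[SEP]`: the sentence is wrapped in these two special tokens, and each of its characters maps to one id. The sketch below checks this on the values copied from the output; the variable names are illustrative.

```python
text = "我觉得不太行！"
# Copied from the tokenizer output above.
input_ids = [101, 2769, 6230, 2533, 679, 1922, 6121, 8013, 102]
attention_mask = [1] * len(input_ids)  # no padding, so every position is attended

CLS_ID, SEP_ID = 101, 102  # assumption: standard BERT Chinese vocab ids

has_special_tokens = input_ids[0] == CLS_ID and input_ids[-1] == SEP_ID
content_ids = input_ids[1:-1]  # ids of the sentence itself

print(has_special_tokens)
print(len(text), len(content_ids))  # one id per character in this example
```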

In [27]:
res = model(**inputs)
res

SequenceClassifierOutput(loss=None, logits=tensor([[ 1.7376, -1.8681]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

In [28]:
logits = res.logits
# Softmax turns the raw logits into class probabilities
probs = torch.softmax(logits, dim=-1)
probs

tensor([[0.9736, 0.0264]], grad_fn=<SoftmaxBackward0>)

In [29]:
pred = torch.argmax(logits).item()
pred

0

In [30]:
model.config.id2label

{0: 'negative (stars 1, 2 and 3)', 1: 'positive (stars 4 and 5)'}

In [31]:
result = model.config.id2label.get(pred)
result

'negative (stars 1, 2 and 3)'
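
The post-processing steps above (softmax, argmax, label lookup) can be reproduced without PyTorch to check the numbers by hand. This is a minimal sketch, not the library's implementation: `postprocess` is a hypothetical name, and the logits are the rounded values printed earlier, so the score only matches to about three decimal places.

```python
import math


def postprocess(logits, id2label):
    """Softmax the logits, pick the most likely class, and look up its label."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    pred = max(range(len(probs)), key=probs.__getitem__)  # argmax
    return {"label": id2label[pred], "score": probs[pred]}


id2label = {0: "negative (stars 1, 2 and 3)", 1: "positive (stars 4 and 5)"}
result = postprocess([1.7376, -1.8681], id2label)
print(result)
```

The returned dictionary has the same `label` and `score` keys as the entries the text-classification pipeline returned in the earlier cells.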