Using pipeline

In [1]:
from transformers import pipeline

In [10]:
sentiment_pipeline = pipeline("sentiment-analysis",model='yiyanghkust/finbert-tone', framework="tf")

Some layers from the model checkpoint at yiyanghkust/finbert-tone were not used when initializing TFBertForSequenceClassification: ['dropout_37']
- This IS expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertForSequenceClassification were initialized from the model checkpoint at yiyanghkust/finbert-tone.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForSequenceClassification for predictions without further training.


Device set to use 0


In [11]:
result = sentiment_pipeline(["I love my college!",'I hate people opinion about AI'])
print(result)

[{'label': 'Neutral', 'score': 0.9842071533203125}, {'label': 'Negative', 'score': 0.9997913241386414}]


In [12]:
type(result[0])

dict
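
Since each item in the result is a plain dict with 'label' and 'score' keys, ordinary Python is enough to work with it. A minimal sketch, reusing the output printed above as a literal; the names `texts` and `summary` are only for illustration:

```python
# Pipeline output copied from the cell above, paired with the input texts.
texts = ["I love my college!", "I hate people opinion about AI"]
result = [
    {"label": "Neutral", "score": 0.9842071533203125},
    {"label": "Negative", "score": 0.9997913241386414},
]

# Build one readable line per input: text, predicted label, rounded score.
summary = [
    f"{text!r} -> {item['label']} ({item['score']:.4f})"
    for text, item in zip(texts, result)
]
for line in summary:
    print(line)
```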

What if there is no pipeline and I want to use a different tokenizer and model?

In [25]:
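from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    BertTokenizer,  # used in the BertTokenizer.from_pretrained cell
    TFAutoModelForSequenceClassification,  # used in the TensorFlow model cell
)
import torch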

In [26]:
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")


In [27]:
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

In [32]:
model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-multilingual-cased")

All PyTorch model weights were used when initializing TFBertForSequenceClassification.

Some weights or buffers of the TF 2.0 model TFBertForSequenceClassification were not initialized from the PyTorch model and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [33]:
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [34]:
text = "I love learning about NLP!"
tokens = tokenizer(text, return_tensors="pt")


In [30]:
print("Tokenized Words:", tokenizer.tokenize(text))
print("Token IDs:", tokens["input_ids"].numpy().tolist()[0])

Tokenized Words: ['i', 'love', 'learning', 'about', 'nl', '##p', '!']
Token IDs: [101, 1045, 2293, 4083, 2055, 17953, 2361, 999, 102]
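
The two outputs above differ in length, and 'nl' / '##p' are two pieces of one word. A small sketch in plain Python, using the printed tokens and IDs as literals, shows both points; `merge_wordpieces` is a hypothetical helper written for this note, not a transformers function:

```python
# Tokens and IDs copied from the output above.
tokens = ["i", "love", "learning", "about", "nl", "##p", "!"]
token_ids = [101, 1045, 2293, 4083, 2055, 17953, 2361, 999, 102]


def merge_wordpieces(pieces):
    """Glue '##' continuation pieces back onto the preceding piece."""
    words = []
    for piece in pieces:
        if piece.startswith("##") and words:
            words[-1] += piece[2:]
        else:
            words.append(piece)
    return words


words = merge_wordpieces(tokens)
print(words)

# The ID list is two entries longer than the token list because the
# tokenizer call adds [CLS] (101) at the start and [SEP] (102) at the end.
extra_ids = len(token_ids) - len(tokens)
print(extra_ids)
```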


In [36]:
outputs = model(**tokens)
print(outputs)

SequenceClassifierOutput(loss=None, logits=tensor([[-0.1579,  0.2654]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)


In [38]:
import tensorflow as tf

logits = outputs.logits
probs = tf.nn.softmax(logits.detach().numpy(), axis=-1)  # Convert to NumPy first

print(probs)


tf.Tensor([[0.39572778 0.60427225]], shape=(1, 2), dtype=float32)


In [39]:
label_idx = tf.argmax(probs, axis=-1).numpy()[0]
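# Placeholder names: the classification head of bert-base-uncased is newly
# initialized (see the warning above), so this prediction is not meaningful.
labels = ["NEGATIVE", "POSITIVE"]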
print(f"Predicted Sentiment: {labels[label_idx]}, Probability: {probs.numpy().max():.4f}")


Predicted Sentiment: POSITIVE, Probability: 0.6043
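
To see what the softmax step does, here is a minimal sketch that recomputes the probabilities by hand with only the standard library. The logits are the rounded values printed above, so the result should match the framework output only approximately:

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)  # subtract the max so exp() cannot overflow
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Logits copied (rounded) from the SequenceClassifierOutput above.
logits = [-0.1579, 0.2654]
probs = softmax(logits)
print([round(p, 4) for p in probs])
```

The larger logit gets the larger probability, and the two values sum to 1.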


Other examples of pipelines

In [40]:
generator = pipeline("text-generation", model="gpt2")

prompt = "There once lived a king"
result = generator(prompt, max_length=30, num_return_sequences=1)

print(result)

Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'There once lived a king who had built a palace from gold, as it had in some of his previous residences, with gold on top, and gold'}]
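
The 'generated_text' shown above starts with the prompt itself. If only the newly generated part is wanted, a sketch like the following strips the prompt in plain Python; it reuses the printed output as a literal, and `continuation` is just an illustrative name:

```python
prompt = "There once lived a king"

# Output copied from the cell above.
result = [{
    "generated_text": (
        "There once lived a king who had built a palace from gold, as it had "
        "in some of his previous residences, with gold on top, and gold"
    )
}]

generated = result[0]["generated_text"]

# Drop the prompt only if the text really begins with it.
if generated.startswith(prompt):
    continuation = generated[len(prompt):]
else:
    continuation = generated

print(continuation.strip())
```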


In [41]:
qa_pipeline = pipeline("question-answering")

context = """The Great Wall of China is a historic fortification that stretches
over 13,000 miles. It was primarily built to protect against invasions and
was constructed during the Ming Dynasty."""

question = "Who built the Great Wall of China?"

result = qa_pipeline(question=question, context=context)

print(result)

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


Device set to use cpu


{'score': 0.44635364413261414, 'start': 171, 'end': 183, 'answer': 'Ming Dynasty'}
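
Note that the context says when the wall was built, not who built it, so the model returns the closest span it can find with a moderate score. One possible way to handle this is to discard low-score answers. A minimal sketch, where `answer_or_none` and the 0.5 cutoff are assumptions made for illustration and not part of the pipeline API:

```python
# Result dict copied from the output above.
result = {
    "score": 0.44635364413261414,
    "start": 171,
    "end": 183,
    "answer": "Ming Dynasty",
}


def answer_or_none(qa_result, min_score=0.5):
    """Return the answer only if its score reaches min_score (hypothetical cutoff)."""
    if qa_result["score"] >= min_score:
        return qa_result["answer"]
    return None


print(answer_or_none(result))                 # score is below the default cutoff
print(answer_or_none(result, min_score=0.4))  # a looser cutoff accepts it
```

The right cutoff depends on the model and the data, so it would need to be tuned rather than taken from this sketch.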
