# **Loading Models and Running Inference with Hugging Face, With and Without `pipeline()`**

In this notebook, we explore the Hugging Face `transformers` library for NLP tasks. We start by manually implementing text classification and text generation with DistilBERT and GPT-2, handling model loading, tokenization, inference, and output processing ourselves. Then we use the `pipeline()` function to achieve the same results with far less code. Comparing the two approaches shows how `pipeline()` streamlines NLP implementation and saves time and effort.

# 1- Perform text classification and text generation using DistilBERT and GPT-2 models without pipeline().

In [2]:
%pip install torch
%pip install transformers

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


In [3]:
from transformers import pipeline
from transformers import DistilBertForSequenceClassification, DistilBertTokenizer
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# You can also use this section to suppress warnings generated by your code:
def warn(*args, **kwargs):
    pass
import warnings
warnings.warn = warn
warnings.filterwarnings('ignore')

# Text classification with DistilBERT

## Load the model and tokenizer

Let's start by initializing a DistilBERT tokenizer and model fine-tuned on the SST-2 dataset for sentiment analysis. This setup enables efficient sentiment classification of text using a pretrained transformer model.



In [4]:
# Load the tokenizer and model

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")


## Preprocess the input text
This code tokenizes the input text and converts it into PyTorch tensors (`return_tensors="pt"`), the format the model expects. Replace `text` with your own input string.


In [13]:
# Define the input text (replace with your own string)
text = "I'm so excited for my upcoming vacation to Hawaii!"

# Tokenize the input text
inputs = tokenizer(text, return_tensors="pt")

print(inputs)

{'input_ids': tensor([[  101,  1045,  1005,  1049,  2061,  7568,  2005,  2026,  9046, 10885,
          2000,  7359,   999,   102]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}


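The `input_ids` are the indices of each token in the model's vocabulary; 101 and 102 are the special `[CLS]` and `[SEP]` tokens that the tokenizer adds at the start and end of the sequence. The `attention_mask` tells the model which positions hold real tokens (1) and which hold padding (0), so padding is ignored during computation. Here every value is 1 because a single sentence needs no padding.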
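Zeros in the mask only appear when sequences of different lengths are batched together (for example with `padding=True`). The following minimal pure-Python sketch shows what that padding does. The helper `pad_batch` is hypothetical, and the pad id `0` is an assumption (it is `[PAD]` in the BERT uncased vocabulary that DistilBERT uses). The first sequence reuses the IDs printed above, and the second is a made-up shorter one.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad token-ID lists to a common length and build attention masks.

    Hypothetical helper that mimics tokenizer padding; pad_id=0 is assumed.
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks a real token the model should attend to, 0 marks padding
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# IDs from the output above ([CLS] ... [SEP]) plus a made-up shorter sequence
batch = pad_batch([
    [101, 1045, 1005, 1049, 2061, 7568, 2005, 2026, 9046, 10885, 2000, 7359, 999, 102],
    [101, 7592, 102],
])
print(batch["attention_mask"][1])
```

The second mask starts with three 1s and ends with eleven 0s, so the model ignores the padded positions.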

## Perform inference
To run inference, we use the `torch.no_grad()` context manager to disable gradient tracking. Gradients are only needed for training, so turning them off reduces memory use and speeds up computation.
The `**inputs` syntax unpacks the dictionary into keyword arguments, which passes the tokenized inputs directly to the model:


In [14]:
# Perform inference
with torch.no_grad():
    outputs = model(**inputs)

Alternatively, instead of unpacking the dictionary with `**inputs`, you can pass `input_ids` and `attention_mask` as separate keyword arguments:



In [None]:
#with torch.no_grad():
#    logits = model(input_ids=inputs['input_ids'], attention_mask=inputs['attention_mask']).logits
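The two calling styles are equivalent because `**` expands each dictionary key into a keyword argument of the same name. The following minimal pure-Python sketch demonstrates this. Here `fake_forward` is a hypothetical stand-in for `model(...)`, and `toy_inputs` is a made-up dictionary, named so that it does not overwrite the notebook's `inputs`.

```python
def fake_forward(input_ids, attention_mask):
    """Hypothetical stand-in for model(...): reports what it received."""
    return {"n_tokens": len(input_ids), "n_attended": sum(attention_mask)}

toy_inputs = {"input_ids": [101, 1045, 102], "attention_mask": [1, 1, 1]}

# Dictionary unpacking: each key becomes a keyword argument
via_unpacking = fake_forward(**toy_inputs)
# Explicit keyword arguments, as in the commented-out cell above
via_keywords = fake_forward(input_ids=toy_inputs["input_ids"],
                            attention_mask=toy_inputs["attention_mask"])
print(via_unpacking == via_keywords)  # True
```

Unpacking also forwards any extra keys the tokenizer returns, such as `token_type_ids` for BERT. For this reason, the dictionary keys must match the model's parameter names.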

### Get the logits
The logits are the model's raw, unnormalized scores, one per class. We extract them from the model output for further processing, such as determining the predicted class or computing probabilities. Their shape is `[batch_size, num_labels]`; here that means one sentence and two classes:


In [16]:
logits = outputs.logits
logits.shape

torch.Size([1, 2])

## Post-process the output
Convert the logits to probabilities and get the predicted class:


In [17]:
# Convert logits to probabilities
probs = torch.softmax(logits, dim=-1)

# Get the predicted class
predicted_class = torch.argmax(probs, dim=-1)

# Map the predicted class index to its label
# (the model also stores this mapping in model.config.id2label)
labels = ["NEGATIVE", "POSITIVE"]
predicted_label = labels[predicted_class.item()]

print(f"Predicted label: {predicted_label}")

Predicted label: POSITIVE
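`torch.softmax` computes $p_i = e^{z_i} / \sum_j e^{z_j}$ over the logits $z$. The following minimal pure-Python sketch uses made-up logits, not the model's actual values. It shows that the probabilities sum to 1 and that softmax preserves ordering, so taking the `argmax` of the logits picks the same class as taking it over the probabilities.

```python
import math

def softmax(values):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [-2.0, 2.5]            # made-up [NEGATIVE, POSITIVE] scores
toy_probs = softmax(toy_logits)
toy_labels = ["NEGATIVE", "POSITIVE"]

# Index of the most probable class (same as argmax over the raw logits)
best = max(range(len(toy_probs)), key=toy_probs.__getitem__)
print(toy_labels[best], round(toy_probs[best], 4))
```

Because the argmax is unchanged, the softmax step is only needed when you want the probability values themselves, such as the `score` that `pipeline()` reports.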


# Text generation with GPT-2 

## Load the tokenizer
Load the pretrained GPT-2 tokenizer, which converts text into tokens the model can process.

In [19]:
# Load the tokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")


Load the pre-trained GPT-2 model with a language modeling head, which generates text based on input tokens.

In [20]:
# Load the model
model = GPT2LMHeadModel.from_pretrained("gpt2")


## Preprocess the input text  
Tokenize the prompt and convert it into a model-friendly format: token indices (input IDs) plus an attention mask.

In [21]:
# Prompt
prompt = "I love generative AI class"

# Tokenize the input text
inputs = tokenizer(prompt, return_tensors="pt")
inputs

{'input_ids': tensor([[  40, 1842, 1152,  876, 9552, 1398]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1]])}

## Perform inference
Generate text with the model using temperature sampling. The arguments are:

- **input_ids:** Token IDs of the prompt, from the tokenizer
- **attention_mask:** Mask indicating which tokens to attend to
- **pad_token_id:** Padding token ID; GPT-2 has no dedicated padding token, so it is set to the end-of-sequence token ID
- **max_length:** Maximum total length of each output sequence, *including* the prompt tokens
- **num_return_sequences:** Number of sequences to generate
- **do_sample:** Enables sampling instead of greedy decoding
- **temperature:** Controls randomness in generation (lower = more focused, higher = more random)


### Notes on Sampling Parameters

- **temperature = 1.0** → baseline (no scaling)  
- **temperature < 1.0** → more focused / less random  
  - Example: `0.7` makes outputs safer and more deterministic  
- **temperature > 1.0** → more diverse / random  
  - Example: `1.2 – 1.5` encourages creativity but may reduce coherence  
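Temperature divides the logits before the softmax (`softmax(logits / T)`). The following pure-Python sketch uses made-up next-token logits to show how the distribution changes. It only illustrates the math and does not call `generate()`.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax of logits / temperature (stable: subtract the max first)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

next_token_logits = [3.0, 2.0, 1.0, 0.5]   # made-up scores for four candidate tokens

for t in (0.7, 1.0, 1.5):
    probs_t = softmax_with_temperature(next_token_logits, t)
    print(f"T={t}: top-token probability = {probs_t[0]:.3f}")
```

The probability of the top token shrinks as `T` grows. A lower temperature sharpens the distribution toward the most likely token, while a higher one flattens it, which increases diversity.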

You can also combine with:  
- **top_k = 50** → sample only from the 50 most likely tokens  
- **top_p = 0.9** → nucleus sampling: sample only from the smallest set of most likely tokens whose cumulative probability reaches 90%  
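Both filters restrict which tokens can be sampled, then renormalize the remaining probabilities. The following sketch applies them to a made-up probability list in pure Python. It mirrors the idea behind the `top_k` and `top_p` arguments but is not the library's implementation; for example, boundary and tie handling in `transformers` may differ.

```python
def top_k_filter(probs, k):
    """Keep the k most likely token indices, renormalized."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in ranked)
    return {i: probs[i] / total for i in ranked}

def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose mass reaches p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

token_probs = [0.5, 0.25, 0.2, 0.04, 0.01]   # made-up next-token distribution

print(top_k_filter(token_probs, k=2))    # keeps tokens 0 and 1
print(top_p_filter(token_probs, p=0.9))  # keeps tokens 0, 1 and 2
```

In practice you can combine these filters with temperature in a single `generate()` call, for example `do_sample=True, temperature=0.7, top_k=50, top_p=0.9`.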


In [24]:
output_ids = model.generate(
    inputs.input_ids, 
    attention_mask=inputs.attention_mask,
    pad_token_id=tokenizer.eos_token_id,
    max_length=50, 
    num_return_sequences=1,
    do_sample=True,        # enable sampling
    temperature=0.7        # controls randomness (lower = more deterministic, higher = more random)
)

output_ids



tensor([[   40,  1842,  1152,   876,  9552,  1398,  6637,    11,   290,   314,
          1101,  1016,   284,   307, 11065,   428,   757,    11,   523,   314,
          1101,  2111,   284,  1394,   428,  7243,   287,  2000,    13,   198,
           198,    40,   716,  3058,  1762,   319,   257,   649,  1398,  7483,
           329,   257,  1398,  7483,   329,   617,   286,   262,  1152,   876]])

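Alternatively, you can run a single forward pass with the model, as in the classification example. Note that this does **not** generate text. It returns logits over the vocabulary for every input position, and the logits at the last position score only the next token.

```python
with torch.no_grad():
    outputs = model(**inputs)

# outputs.logits has shape [batch_size, sequence_length, vocab_size]
next_token_id = outputs.logits[0, -1].argmax().item()
print(tokenizer.decode([next_token_id]))
```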


## Post-process the output  
Decode the generated tokens to get the text:


In [25]:
# Decode the generated text
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(generated_text)

I love generative AI classifications, and I'm going to be studying this again, so I'm trying to keep this topic in mind.

I am currently working on a new classifier for a classifier for some of the generative
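Conceptually, `generate()` repeats a single forward pass in a loop: score the next token, pick one, append it, and feed the longer sequence back in. The following minimal pure-Python sketch implements greedy decoding. `toy_next_token_logits` is a hypothetical stand-in for the model (a lookup table keyed on the last token), and the token strings are made up.

```python
def toy_next_token_logits(sequence):
    """Hypothetical 'model': scores candidate next tokens from the last token only."""
    table = {
        "I": {"love": 2.0, "am": 1.0},
        "love": {"generative": 1.5, "AI": 1.0},
        "generative": {"AI": 3.0},
        "AI": {"class": 2.0, "<eos>": 1.0},
        "class": {"<eos>": 2.5},
    }
    return table.get(sequence[-1], {"<eos>": 0.0})

def greedy_generate(prompt_tokens, max_length=10, eos="<eos>"):
    """Append the highest-scoring token until EOS or max_length (prompt included)."""
    sequence = list(prompt_tokens)
    while len(sequence) < max_length:
        scores = toy_next_token_logits(sequence)
        next_token = max(scores, key=scores.get)   # greedy choice, like do_sample=False
        if next_token == eos:
            break
        sequence.append(next_token)
    return sequence

print(" ".join(greedy_generate(["I"])))
```

Setting `do_sample=True`, as in the `generate()` cell above, replaces the `max` choice with a random draw from the temperature-scaled, optionally top-k/top-p-filtered distribution.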


# 2- Perform text classification and text generation using DistilBERT and GPT-2 models with pipeline().

# Hugging Face `pipeline()` function

The `pipeline()` function from the Hugging Face `transformers` library is a high-level API designed to simplify the usage of pretrained models for various natural language processing (NLP) tasks. It abstracts the complexities of model loading, tokenization, inference, and post-processing, allowing users to perform complex NLP tasks with just a few lines of code.

## Definition

```python
transformers.pipeline(
    task: str,
    model: Optional = None,
    config: Optional = None,
    tokenizer: Optional = None,
    feature_extractor: Optional = None,
    framework: Optional = None,
    revision: str = 'main',
    use_fast: bool = True,
    model_kwargs: Dict[str, Any] = None,
    **kwargs
)
```

## Parameters

- **task**: `str`
  - The task to perform, such as "text-classification", "text-generation", "question-answering", etc.
  - Example: `"text-classification"`

- **model**: `Optional`
  - The model to use. This can be a string (model identifier from Hugging Face model hub), a path to a directory containing model files, or a pre-loaded model instance.
  - Example: `"distilbert-base-uncased-finetuned-sst-2-english"`

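- **config**: `Optional`
  - The model configuration to use. This can be a string (model identifier or path to a directory) or a pre-loaded config object, for example one created with `AutoConfig.from_pretrained(...)`.
  - Example: `"distilbert-base-uncased-finetuned-sst-2-english"`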

- **tokenizer**: `Optional`
  - The tokenizer to use. This can be a string, a path to a directory, or a pre-loaded tokenizer instance.
  - Example: `"bert-base-uncased"`

- **feature_extractor**: `Optional`
  - The feature extractor to use for tasks with non-text inputs (e.g., audio or image preprocessing).
  - Example: `"facebook/wav2vec2-base-960h"`

- **framework**: `Optional`
  - The framework to use, either `"pt"` for PyTorch or `"tf"` for TensorFlow. If not specified, it will be inferred.
  - Example: `"pt"`

- **revision**: `str`, default `'main'`
  - The specific model version to use (branch, tag, or commit hash).
  - Example: `"v1.0"`

- **use_fast**: `bool`, default `True`
  - Whether to use the fast version of the tokenizer if available.
  - Example: `True`

- **model_kwargs**: `Dict[str, Any]`, default `None`
  - Additional keyword arguments passed to the model during initialization.
  - Example: `{"output_hidden_states": True}`

- **kwargs**: `Any`
  - Additional keyword arguments passed to the pipeline components.

## Task types

The `pipeline()` function supports a wide range of NLP tasks. Here are some of the common tasks:

1. **Text Classification**: `text-classification`
   - **Purpose**: Classify text into predefined categories.
   - **Use Cases**: Sentiment analysis, spam detection, topic classification.

2. **Text Generation**: `text-generation`
   - **Purpose**: Generate coherent text based on a given prompt.
   - **Use Cases**: Creative writing, dialogue generation, story completion.

3. **Question Answering**: `question-answering`
   - **Purpose**: Answer questions based on a given context.
   - **Use Cases**: Building Q&A systems, information retrieval from documents.

4. **Named Entity Recognition (NER)**: `ner` (or `token-classification`)
   - **Purpose**: Identify and classify named entities (like people, organizations, locations) in text.
   - **Use Cases**: Extracting structured information from unstructured text.

5. **Summarization**: `summarization`
   - **Purpose**: Summarize long pieces of text into shorter, coherent summaries.
   - **Use Cases**: Document summarization, news summarization.

6. **Translation**: `translation_xx_to_yy` (e.g., `translation_en_to_fr`)
   - **Purpose**: Translate text from one language to another.
   - **Use Cases**: Language translation, multilingual applications.

7. **Fill-Mask**: `fill-mask`
   - **Purpose**: Predict masked words in a sentence (useful for masked language modeling).
   - **Use Cases**: Language modeling tasks, understanding model predictions.

8. **Zero-Shot Classification**: `zero-shot-classification`
   - **Purpose**: Classify text into categories without needing training data for those categories.
   - **Use Cases**: Flexible and adaptable classification tasks.

9. **Feature Extraction**: `feature-extraction`
   - **Purpose**: Extract hidden state features from text.
   - **Use Cases**: Downstream tasks requiring text representations, such as clustering, similarity, or further custom model training.


### Example 1: Text classification using `pipeline()`

This example demonstrates text classification using the `pipeline()` function. We load a pretrained model and classify a sample text.

#### Load the model:
Initialize the pipeline for `text-classification` with the `"distilbert-base-uncased-finetuned-sst-2-english"` model, which is fine-tuned for sentiment analysis.

#### Classify the sample text:
Use the classifier to evaluate the text "Congratulations! You've won a free ticket to the Bahamas. Reply WIN to claim." The classifier returns the classification result, which is then printed. This model predicts sentiment only (POSITIVE or NEGATIVE) and does not detect spam, so this spam-like message is simply labeled POSITIVE.



In [26]:
# Load a sentiment-analysis model for text classification
classifier = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")

# Classify a sample text
result = classifier("Congratulations! You've won a free ticket to the Bahamas. Reply WIN to claim.")
print(result)

Device set to use cpu


[{'label': 'POSITIVE', 'score': 0.9997586607933044}]


#### Output

The output will be a list of dictionaries, where each dictionary contains:

- `label`: The predicted label (e.g., "POSITIVE" or "NEGATIVE").
- `score`: The confidence score for the prediction.
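Because the result is plain Python data, you can post-process it directly. The following minimal sketch assumes a hypothetical confidence threshold of 0.9, below which the prediction is treated as uncertain. The first record reuses the output above, and the second is made up.

```python
def summarize_predictions(results, threshold=0.9):
    """Turn pipeline-style [{'label', 'score'}] records into readable strings.

    The threshold is a hypothetical cut-off, not a pipeline() setting.
    """
    summaries = []
    for record in results:
        verdict = record["label"] if record["score"] >= threshold else "UNCERTAIN"
        summaries.append(f"{verdict} ({record['score']:.2%})")
    return summaries

pipeline_output = [
    {"label": "POSITIVE", "score": 0.9997586607933044},  # from the run above
    {"label": "NEGATIVE", "score": 0.61},                 # made-up low-confidence case
]
print(summarize_predictions(pipeline_output))
```

If you pass a list of strings to `classifier`, it returns one such dictionary per input, so the same helper also works on batches.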


### Example 2: Language detection using `pipeline()`

In this example, you will use the `pipeline()` function to perform language detection. You will load a pretrained language detection model and use it to identify the language of a sample text.

#### Load the language detection model:
We initialize the pipeline for the `text-classification` task, specifying the model `"papluca/xlm-roberta-base-language-detection"`. This model is fine-tuned for language detection.

#### Classify the sample text:
We use the classifier to detect the language of a Persian sample text, "من خیلی برای تعطیلات آینده‌ام در هاوایی هیجان‌زده‌ام!" ("I'm so excited for my upcoming vacation in Hawaii!"). The `classifier` returns the classification result, which is then printed.


In [27]:
from transformers import pipeline

classifier = pipeline("text-classification", model="papluca/xlm-roberta-base-language-detection")
result = classifier("من خیلی برای تعطیلات آینده‌ام در هاوایی هیجان‌زده‌ام!")
print(result)


Device set to use cpu


[{'label': 'ur', 'score': 0.18315279483795166}]

This prediction is wrong: `ur` is Urdu, and the low score (about 0.18) shows that the model is unsure. Persian does not appear to be among the languages this model was trained to detect (check its model card), so the model falls back to a language written in a related script. The next cell tries a model whose label set includes Persian.


In [36]:
from transformers import pipeline

classifier = pipeline("text-classification", model="Mike0307/multilingual-e5-language-detection")
result = classifier("من خیلی برای تعطیلات آینده‌ام در هاوایی هیجان‌زده‌ام!")
print(result)


Device set to use cpu


[{'label': 'LABEL_30', 'score': 0.9999822378158569}]

This prediction is confident, but the label is the generic `LABEL_30` because the model's configuration does not provide human-readable label names. The next cell maps the index to a language using the order listed in the model card.


In [37]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_name = "Mike0307/multilingual-e5-language-detection"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Languages listed in the model card, in order (index == label id)
languages = [
    "Arabic","Basque","Breton","Catalan","Chinese_China","Chinese_Hongkong",
    "Chinese_Taiwan","Chuvash","Czech","Dhivehi","Dutch","English",
    "Esperanto","Estonian","French","Frisian","Georgian","German","Greek",
    "Hakha_Chin","Indonesian","Interlingua","Italian","Japanese","Kabyle",
    "Kinyarwanda","Kyrgyz","Latvian","Maltese","Mongolian","Persian","Polish",
    "Portuguese","Romanian","Romansh_Sursilvan","Russian","Sakha","Slovenian",
    "Spanish","Swedish","Tamil","Tatar","Turkish","Ukranian","Welsh"
]

clf = pipeline("text-classification", model=model, tokenizer=tokenizer, device=-1)

text = "من خیلی برای تعطیلات آینده‌ام در هاوایی هیجان‌زده‌ام!"
out = clf(text)[0]               # e.g. {'label': 'LABEL_30', 'score': 0.9999}
idx = int(out["label"].split("_")[1])
lang = languages[idx]

print(f"Detected language: {lang} (id={idx}), score={out['score']:.4f}")


Device set to use cpu


Detected language: Persian (id=30), score=1.0000
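The index parsing above assumes every label looks like `LABEL_<n>`. The following sketch is slightly more defensive: the hypothetical helper `label_to_language` uses `rpartition` and falls back to the raw label when the index is missing or out of range. As an alternative, assigning `model.config.id2label = dict(enumerate(languages))` before creating the pipeline should make the pipeline return language names directly, since it reads label names from the model config.

```python
def label_to_language(label, languages):
    """Map a generic 'LABEL_<n>' string to a name from an index-ordered list.

    Returns the label unchanged if it has no valid integer index.
    """
    prefix, _, suffix = label.rpartition("_")
    if prefix and suffix.isdigit() and int(suffix) < len(languages):
        return languages[int(suffix)]
    return label

sample_languages = ["Arabic", "Basque", "Breton"]  # first entries of the list above

print(label_to_language("LABEL_2", sample_languages))   # Breton
print(label_to_language("LABEL_99", sample_languages))  # LABEL_99 (out of range)
print(label_to_language("Persian", sample_languages))   # Persian (already a name)
```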


### Example 3: Text generation with `pipeline()`

This example uses the `pipeline()` function for text generation. We load a pretrained model and generate text from a prompt.
#### Initialize the text generation model:
Initialize the pipeline for `text-generation` with the `"gpt2"` model, a popular choice for text generation tasks.

In [39]:
# Initialize the text generation pipeline with GPT-2
generator = pipeline("text-generation", model="gpt2")

Device set to use cpu


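#### Generate text based on a given prompt:
Use the generator with the prompt "I love generative AI class" and specify these parameters:
- `max_length=50` to cap the total output (prompt included) at 50 tokens
- `truncation=True` to truncate the prompt if it exceeds the model's maximum input length
- `num_return_sequences=1` to generate one sequence

The generator returns the generated text, which is then printed. As the warning in the output shows, the pipeline's default `max_new_tokens` (256) takes precedence over `max_length`. That is why the text below is much longer than 50 tokens. To limit the length, pass `max_new_tokens=50` instead.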
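For a decoder-only model such as GPT-2, `max_length` also counts the prompt tokens, whereas `max_new_tokens` counts only generated tokens and takes precedence when both are set. The hypothetical helper `new_token_budget` below captures this arithmetic; it is not a `transformers` function. With the 6-token prompt from the manual GPT-2 example and `max_length=50`, the model can add at most 44 new tokens, which is consistent with the 50 token IDs returned there.

```python
def new_token_budget(prompt_len, max_length=None, max_new_tokens=None):
    """Hypothetical helper: upper bound on tokens a decoder-only model may add.

    max_new_tokens takes precedence; otherwise max_length includes the prompt.
    """
    if max_new_tokens is not None:
        return max_new_tokens
    if max_length is not None:
        return max(0, max_length - prompt_len)
    raise ValueError("set max_length or max_new_tokens")

print(new_token_budget(6, max_length=50))                       # 44
print(new_token_budget(6, max_length=50, max_new_tokens=256))   # 256
```

Generation can also stop earlier if the model emits its end-of-sequence token, so the budget is an upper bound.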


In [40]:
# Generate text based on a given prompt
prompt = "I love generative AI class"
result = generator(prompt, max_length=50, num_return_sequences=1, truncation=True)

# Print the generated text
print(result[0]['generated_text'])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=50) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


I love generative AI classifiers, and I also love the fact that it's easy to use and maintain, though I've never used it in my own codebase.

Using generative AI, it's easy to run a classifier from a custom generated class, and it's easy to pick a class based on data, like a value of type int, and run a class based on a value of type type int.

class C ( classname int ) { static int c = 0 ; } class D ( classname int ) { static int c = 0 ; } class E ( classname int ) { static int c = 0 ; } function main ( ) { //... }

This is where generative AI really comes into its own. It's not just that C is able to run a classifier, but it can also run classifiers from a custom generated class. It's easier to use, and less to worry about.

The main advantage of generative AI is that it can use just as many parameters as you want, instead of trying to find the best way to fit it. It also makes it easy to write your own classifiers, which is usually a good thing.

class C ( classname int ) { static i

### Example 4: Text-to-Text Generation with T5 and `pipeline()`

This example uses the `pipeline()` function for text-to-text generation with the T5 model. We load a pretrained T5 model and use a prompt to translate a sentence from English to French.

#### Load the Model:
Initialize the pipeline for text2text-generation with the "t5-small" model, a versatile model capable of various text-to-text tasks, including translation.


In [45]:
# Initialize the text-to-text generation pipeline with T5
generator = pipeline("text2text-generation", model="t5-small")


Device set to use cpu


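#### Generate Translation:
Use the generator to translate "How are you?" from English to French with the prompt "translate English to French: How are you?". Specify:
- `max_length=50` to limit the output to 50 tokens
- `num_return_sequences=1` for a single translation

The generator returns the translated text, which is then printed. As with GPT-2, the warning in the output shows that a default `max_new_tokens` took precedence over `max_length`. For a short translation like this one, the limit makes no practical difference.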
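T5 is text-to-text: the task is selected by a plain-text prefix on the input rather than by a separate model head. The following minimal sketch builds such prompts. The helper `t5_prompt` is hypothetical. The prefixes listed are the ones commonly used with the original T5 checkpoints; this is an assumption you can check against `task_specific_params` in the model's config.

```python
T5_PREFIXES = {
    "summarize": "summarize: ",
    "en_to_de": "translate English to German: ",
    "en_to_fr": "translate English to French: ",
    "en_to_ro": "translate English to Romanian: ",
}

def t5_prompt(task, text):
    """Prepend the (assumed) T5 task prefix to the input text."""
    try:
        return T5_PREFIXES[task] + text
    except KeyError:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(T5_PREFIXES)}") from None

print(t5_prompt("en_to_fr", "How are you?"))
```

The returned string matches the hand-written prompt used in the next cell. The dedicated `translation_en_to_fr` pipeline task should apply this prefix for you.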

In [46]:
# Translate using the T5 task prefix in the prompt
prompt = "translate English to French: How are you?"
result = generator(prompt, max_length=50, num_return_sequences=1)

# Print the generated text
print(result[0]['generated_text'])

Both `max_new_tokens` (=256) and `max_length`(=50) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


Comment êtes-vous?


# Benefits of pipeline()
- **Less code:** Reduces boilerplate code for NLP tasks.
- **Readability:** Improves code readability and expressiveness.
- **Time-saving:** Automates model loading, tokenization, inference, and post-processing.
- **Consistent API:** Enables easy experimentation and prototyping with a unified API.
- **Framework flexibility:** Handles the underlying framework (TensorFlow or PyTorch) automatically.

# Use pipeline() for:
- **Rapid prototyping:** Quickly test NLP applications or models.
- **Simple tasks:** Common NLP tasks that `pipeline()` supports well.
- **Deployment:** Environments that prioritize simplicity and ease of use.

# Avoid pipeline() for:
- **Custom tasks:** Tasks requiring a level of customization that `pipeline()` does not support.
- **Performance tuning:** Cases that need fine-grained control over the model and tokenization for optimization.


# Fill-mask task with BERT with `pipeline()`

Use the `pipeline()` function to perform a fill-mask task with the BERT model. Load a pre-trained BERT model to predict the masked word in a given sentence.


### Instructions

1. **Initialize the fill-mask pipeline** with the BERT model.
2. **Create a prompt** with a masked token.
3. **Generate text** by filling in the masked token.
4. **Print the generated text** with the predictions.


In [53]:
from transformers import pipeline

# Whole-word masking BERT tends to give nicer fills
fill_mask = pipeline("fill-mask", model="bert-large-uncased-whole-word-masking")

mask = fill_mask.tokenizer.mask_token  # "[MASK]" for BERT, "<mask>" for RoBERTa
#prompt = f"The capital of the United States of America is {mask}."
prompt = f"The capital of the Iran is {mask}."
preds = fill_mask(prompt, top_k=5)
for p in preds:
    print(f"{p['token_str']!r:>12}  score={p['score']:.4f}  -> {p['sequence']}")


Some weights of the model checkpoint at bert-large-uncased-whole-word-masking were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


    'tehran'  score=0.9968  -> the capital of the iran is tehran.
      'iran'  score=0.0015  -> the capital of the iran is iran.
   'yerevan'  score=0.0006  -> the capital of the iran is yerevan.
    'kerman'  score=0.0005  -> the capital of the iran is kerman.
      'baku'  score=0.0001  -> the capital of the iran is baku.
