<a href="https://colab.research.google.com/github/gouthamkallempudi/Deep-Learning/blob/master/HuggingFace_Pipeline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [3]:
import warnings
warnings.filterwarnings('ignore')

In [4]:
from transformers import pipeline
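
Every pipeline in this notebook passes `device = 0`, which selects the first CUDA GPU and is not expected to work on a CPU-only machine. A minimal sketch of a fallback, assuming PyTorch is the backend; `pick_device` is a hypothetical helper name, not part of transformers:

```python
def pick_device() -> int:
    """Return 0 (first CUDA GPU) when one is available, otherwise -1 (CPU)."""
    try:
        import torch  # assumption: PyTorch is the installed backend
    except ImportError:
        return -1
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print(device)
```

Passing `device = pick_device()` in place of `device = 0` should let the same cells run on either kind of machine.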

Let's start with sentiment analysis.

In [5]:
nlp = pipeline(
    task = "sentiment-analysis",
    model = "distilbert-base-uncased-finetuned-sst-2-english",
    device = 0)

print(nlp("I love this movie!"))

Device set to use cuda:0


[{'label': 'POSITIVE', 'score': 0.9998775720596313}]
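
The `score` above is a probability: for a classifier with more than one label, the pipeline applies a softmax to the model's raw logits and reports the winning class. A minimal sketch of that step in plain Python. The logits are hypothetical values chosen for illustration, and the label order is an assumption about this checkpoint:

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


id2label = {0: "NEGATIVE", 1: "POSITIVE"}  # assumed label order
logits = [-4.3, 4.7]                       # hypothetical model output

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
result = [{"label": id2label[best], "score": probs[best]}]
print(result)
```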


Named Entity Recognition (NER)

In [7]:
from transformers import AutoConfig

config = AutoConfig.from_pretrained("dbmdz/bert-large-cased-finetuned-conll03-english")

ner = pipeline(
    task = "ner",
    model = "dbmdz/bert-large-cased-finetuned-conll03-english",
    config = config,
    device = 0
)

print(ner("Hugging face Inc. is based in New York."))

Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cuda:0


[{'entity': 'I-ORG', 'score': np.float32(0.98959446), 'index': 1, 'word': 'Hu', 'start': 0, 'end': 2}, {'entity': 'I-ORG', 'score': np.float32(0.8172372), 'index': 2, 'word': '##gging', 'start': 2, 'end': 7}, {'entity': 'I-ORG', 'score': np.float32(0.9829742), 'index': 3, 'word': 'face', 'start': 8, 'end': 12}, {'entity': 'I-ORG', 'score': np.float32(0.99925417), 'index': 4, 'word': 'Inc', 'start': 13, 'end': 16}, {'entity': 'I-LOC', 'score': np.float32(0.9987984), 'index': 9, 'word': 'New', 'start': 30, 'end': 33}, {'entity': 'I-LOC', 'score': np.float32(0.9988224), 'index': 10, 'word': 'York', 'start': 34, 'end': 38}]
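
The output above is per word piece, so "Hugging" arrives split into `Hu` and `##gging`. The pipeline can merge pieces itself via its `aggregation_strategy` argument (for example `"simple"`). The sketch below shows the idea by hand, using the character offsets from the output. It is a simplification: it merges consecutive tokens of the same type and ignores the B-/I- distinction. `group_entities` is a hypothetical helper name.

```python
text = "Hugging face Inc. is based in New York."

# Token-level output copied from the cell above (scores rounded).
tokens = [
    {"entity": "I-ORG", "score": 0.9896, "index": 1, "start": 0, "end": 2},
    {"entity": "I-ORG", "score": 0.8172, "index": 2, "start": 2, "end": 7},
    {"entity": "I-ORG", "score": 0.9830, "index": 3, "start": 8, "end": 12},
    {"entity": "I-ORG", "score": 0.9993, "index": 4, "start": 13, "end": 16},
    {"entity": "I-LOC", "score": 0.9988, "index": 9, "start": 30, "end": 33},
    {"entity": "I-LOC", "score": 0.9988, "index": 10, "start": 34, "end": 38},
]


def group_entities(text, tokens):
    """Merge consecutive tokens that share an entity type into one span."""
    spans = []
    for tok in tokens:
        label = tok["entity"].split("-", 1)[-1]  # "I-ORG" -> "ORG"
        last = spans[-1] if spans else None
        if last and last["label"] == label and tok["index"] == last["last_index"] + 1:
            # Same type and directly adjacent: extend the current span.
            last["end"] = tok["end"]
            last["last_index"] = tok["index"]
            last["scores"].append(tok["score"])
        else:
            spans.append({"label": label, "start": tok["start"], "end": tok["end"],
                          "last_index": tok["index"], "scores": [tok["score"]]})
    return [
        {"entity_group": s["label"],
         "word": text[s["start"]:s["end"]],  # slice the original text by offsets
         "score": sum(s["scores"]) / len(s["scores"])}
        for s in spans
    ]


grouped = group_entities(text, tokens)
for entity in grouped:
    print(entity["entity_group"], entity["word"])
```

Slicing the original string by `start` and `end` recovers the surface text without having to strip the `##` markers.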


In [8]:
print(config)

BertConfig {
  "_num_labels": 9,
  "architectures": [
    "BertForTokenClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "directionality": "bidi",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "id2label": {
    "0": "O",
    "1": "B-MISC",
    "2": "I-MISC",
    "3": "B-PER",
    "4": "I-PER",
    "5": "B-ORG",
    "6": "I-ORG",
    "7": "B-LOC",
    "8": "I-LOC"
  },
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "label2id": {
    "B-LOC": 7,
    "B-MISC": 1,
    "B-ORG": 5,
    "B-PER": 3,
    "I-LOC": 8,
    "I-MISC": 2,
    "I-ORG": 6,
    "I-PER": 4,
    "O": 0
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_token_id": 0,
  "pooler_fc_size": 768,
  "pooler_num_attention_heads": 12,
  "pooler_num_fc_layers": 3,
  "pooler_size_per_head": 128,
  "pooler_type": "first_token_transform",
  "positi

Fill Mask Task

In [10]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

fill_mask = pipeline(
    task = "fill-mask",
    model = "bert-base-cased",
    tokenizer = tokenizer,
    device = 0
)

print(fill_mask("The Capital of Germany is [MASK]."))

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForMaskedLM: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cuda:0


[{'score': 0.1705125868320465, 'token': 3206, 'token_str': 'Berlin', 'sequence': 'The Capital of Germany is Berlin.'}, {'score': 0.09991458803415298, 'token': 12312, 'token_str': 'Cologne', 'sequence': 'The Capital of Germany is Cologne.'}, {'score': 0.08590444177389145, 'token': 8339, 'token_str': 'Hamburg', 'sequence': 'The Capital of Germany is Hamburg.'}, {'score': 0.07850676774978638, 'token': 9529, 'token_str': 'Frankfurt', 'sequence': 'The Capital of Germany is Frankfurt.'}, {'score': 0.06017882004380226, 'token': 13269, 'token_str': 'Stuttgart', 'sequence': 'The Capital of Germany is Stuttgart.'}]
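
Each `score` above is a probability over the whole vocabulary, so the five candidates returned need not sum to 1. A small sketch that picks the best candidate and checks how much probability mass the top five cover, using values copied from the output (rounded to four decimals):

```python
# Candidates copied from the output above (scores rounded).
candidates = [
    {"score": 0.1705, "token_str": "Berlin"},
    {"score": 0.0999, "token_str": "Cologne"},
    {"score": 0.0859, "token_str": "Hamburg"},
    {"score": 0.0785, "token_str": "Frankfurt"},
    {"score": 0.0602, "token_str": "Stuttgart"},
]

template = "The Capital of Germany is [MASK]."

best = max(candidates, key=lambda c: c["score"])
filled = template.replace("[MASK]", best["token_str"])
top5_mass = sum(c["score"] for c in candidates)

print(filled)
print(f"probability mass covered by the top 5: {top5_mass:.3f}")
```

Roughly half of the mass lies outside the top five here, which suggests the model is far from certain even though the first candidate is correct.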


tokenizer → preprocesses text inputs

feature_extractor → preprocesses image/audio inputs (for images, newer transformers releases prefer image_processor)

Image Classification

In [22]:
from transformers import AutoFeatureExtractor
from PIL import Image
import requests

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/cats.png"
image = Image.open(requests.get(url, stream=True).raw)

# Load the feature extractor (optional)
feature_extractor = AutoFeatureExtractor.from_pretrained("google/vit-base-patch16-224")

image_classifier = pipeline(
    task="image-classification",
    model="google/vit-base-patch16-224",
    feature_extractor=feature_extractor,
    device=0
)

print(image_classifier(image))

Fast image processor class <class 'transformers.models.vit.image_processing_vit_fast.ViTImageProcessorFast'> is available for this model. Using slow image processor class. To use the fast image processor class set `use_fast=True`.
Device set to use cuda:0


[{'label': 'tabby, tabby cat', 'score': 0.27686837315559387}, {'label': 'tiger cat', 'score': 0.2763683497905731}, {'label': 'Egyptian cat', 'score': 0.14028170704841614}, {'label': 'hay', 'score': 0.0253145769238472}, {'label': 'wool, woolen, woollen', 'score': 0.019932707771658897}]
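
The top two labels above are almost tied (about 0.277 versus 0.276), so the top-1 label alone overstates the model's confidence. A minimal sketch of flagging such cases; `is_ambiguous` and the `min_margin` threshold are hypothetical choices for illustration, not part of transformers:

```python
# Top predictions copied from the output above (scores rounded).
predictions = [
    {"label": "tabby, tabby cat", "score": 0.2769},
    {"label": "tiger cat", "score": 0.2764},
    {"label": "Egyptian cat", "score": 0.1403},
]


def is_ambiguous(predictions, min_margin=0.05):
    """Return True when the two best scores are closer than min_margin."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    if len(ranked) < 2:
        return False
    return ranked[0]["score"] - ranked[1]["score"] < min_margin


print(is_ambiguous(predictions))
```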


In [14]:
from transformers import AutoImageProcessor
# Load image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/cats.png"
image = Image.open(requests.get(url, stream=True).raw)

# Load image processor
image_processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")

image_pipe = pipeline(
    task="image-classification",
    model="google/vit-base-patch16-224",
    image_processor=image_processor,
    device=0
)

print(image_pipe(image))

Fast image processor class <class 'transformers.models.vit.image_processing_vit_fast.ViTImageProcessorFast'> is available for this model. Using slow image processor class. To use the fast image processor class set `use_fast=True`.
Device set to use cuda:0


[{'label': 'tabby, tabby cat', 'score': 0.27686837315559387}, {'label': 'tiger cat', 'score': 0.2763683497905731}, {'label': 'Egyptian cat', 'score': 0.14028170704841614}, {'label': 'hay', 'score': 0.0253145769238472}, {'label': 'wool, woolen, woollen', 'score': 0.019932707771658897}]
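
The checkpoint name `google/vit-base-patch16-224` encodes what the image processor prepares for: 224×224 inputs cut into 16×16 patches. A short sketch of the resulting sequence length, assuming the standard ViT layout with one extra classification token:

```python
image_size = 224   # pixels per side, from the checkpoint name
patch_size = 16    # pixels per patch side, from the checkpoint name

patches_per_side = image_size // patch_size
num_patches = patches_per_side ** 2
sequence_length = num_patches + 1  # assumption: one classification token is prepended

print(patches_per_side, num_patches, sequence_length)
```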


Multimodal Models

Use the processor argument for models that combine modalities, such as:

Text + images (like BLIP, Flamingo, Donut)

Audio + text (like Whisper)

Text + speech + vision (like some MetaAI and Google models)



In [16]:
# image to text

from transformers import BlipProcessor

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")

image_to_text = pipeline(
    task="image-to-text",
    model="Salesforce/blip-image-captioning-base",
    processor=processor,
    device=0
)


url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/cats.png"
image = Image.open(requests.get(url, stream=True).raw)

print(image_to_text(image))


Device set to use cuda:0


[{'generated_text': 'a cat laying on a pile of yarn'}]


In [21]:
asr_pipeline = pipeline(
    task="automatic-speech-recognition",
    model="openai/whisper-base",
    device=0  # Use GPU
)

# Point to an audio file (.wav, .flac, or another supported format)
# A local file path works here as well as a URL
audio_file = "https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac"

# Run speech-to-text
print(asr_pipeline(audio_file))

Device set to use cuda:0


{'text': ' I have a dream that one day this nation will rise up and live out the true meaning of its creed.'}
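
Whisper works on 16 kHz audio in fixed 30-second windows, so longer recordings have to be processed in pieces (the pipeline exposes a `chunk_length_s` argument for this). A simplified sketch of the arithmetic, ignoring any overlap between windows; `window_count` is a hypothetical helper and the clip lengths are made-up examples:

```python
import math

SAMPLE_RATE = 16_000   # Hz, the sampling rate Whisper expects
WINDOW_SECONDS = 30    # length of Whisper's fixed input window


def window_count(num_samples: int) -> int:
    """Number of 30-second windows needed to cover the audio, without overlap."""
    seconds = num_samples / SAMPLE_RATE
    return max(1, math.ceil(seconds / WINDOW_SECONDS))


short_clip = window_count(13 * SAMPLE_RATE)  # hypothetical 13-second clip
long_clip = window_count(95 * SAMPLE_RATE)   # hypothetical 95-second clip
print(short_clip, long_clip)
```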


"Visual Q&A and Captioning Tool"

Accepts an image from the user.

Generates:

A caption (image-to-text).

Answers to a textual question about the image (visual question answering / VQA).

In [20]:

# Load models
caption_pipeline = pipeline(
    task="image-to-text",
    model="Salesforce/blip-image-captioning-base",
    device=0
)

vqa_pipeline = pipeline(
    task="vqa",
    model="dandelin/vilt-b32-finetuned-vqa",
    device=0
)

# Load image
image_path = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/cats.png"
image = Image.open(requests.get(image_path, stream=True).raw)

# Generate caption
caption_result = caption_pipeline(image)
caption = caption_result[0]['generated_text']
print(f"\n🖼️  Caption: {caption}")

# Ask user question about image
question = input("\n❓ Ask a question about this image: ")

# Run VQA
vqa_result = vqa_pipeline(image, question=question)
answer = vqa_result[0]['answer']
print(f"\n🤖 Answer: {answer}")


Device set to use cuda:0
Device set to use cuda:0



🖼️  Caption: a cat laying on a pile of yarn

❓ Ask a question about this image: whats the color of the cat

🤖 Answer: gray and white
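
The two steps above can be wrapped in one function so the tool is reusable and testable without a GPU. This is a sketch under assumptions: `caption_and_answer` is a hypothetical name, and the two `fake_*` functions are stand-ins that only mimic the output shapes seen in this notebook. In real use, `caption_pipeline` and `vqa_pipeline` would be passed instead.

```python
def caption_and_answer(image, question, caption_fn, vqa_fn):
    """Caption an image and answer one question about it."""
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    caption = caption_fn(image)[0]["generated_text"]
    answer = vqa_fn(image, question=question)[0]["answer"]
    return {"caption": caption, "question": question, "answer": answer}


# Hypothetical stand-ins that return the same shapes as the real pipelines.
def fake_caption(image):
    return [{"generated_text": "a cat laying on a pile of yarn"}]


def fake_vqa(image, question):
    return [{"answer": "gray and white"}]


result = caption_and_answer(None, "whats the color of the cat", fake_caption, fake_vqa)
print(result)
```

Taking the pipelines as arguments keeps model loading out of the function, so each model is loaded once and reused across questions.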
