# Hugging Face Pipeline Demonstration
This notebook demonstrates how to use Hugging Face's `transformers` library, focusing on pipelines for various NLP tasks. We will cover the following topics:

1. Sentiment Analysis
2. Named Entity Recognition (NER)
3. Question Answering
4. Text Generation


## Installation

Ensure that you have the necessary package (`transformers`) installed. If you don't have it yet, uncomment and run the following cell:

In [1]:
# !pip install transformers

The `pipeline` function in `transformers` wraps a pre-trained model, its tokenizer, and the pre- and post-processing steps behind a single call, giving easy access to tasks like sentiment analysis, text generation, and more.

In [2]:
from transformers import pipeline

---

## 1. Sentiment Analysis with Hugging Face Pipelines
In this section, we'll use the sentiment analysis pipeline, which analyzes whether a given text expresses a positive or negative sentiment.

In [3]:
# Create a sentiment-analysis pipeline
classifier = pipeline('sentiment-analysis', model='distilbert-base-uncased-finetuned-sst-2-english')

# Analyze sentiment of the given text
result = classifier("I love using Hugging Face transformers!")


In [4]:
print(f"Sentiment Analysis: {result}")

Sentiment Analysis: [{'label': 'POSITIVE', 'score': 0.9971315860748291}]


The model confidently predicts a positive sentiment with a score of 0.997.
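The pipeline returns a list with one dictionary per input text, so downstream code usually unpacks the label and score. A minimal sketch of that step follows; it hard-codes the output printed above so it runs without downloading a model. The `describe` helper and the 0.9 threshold are illustrative assumptions, not part of `transformers`.

```python
# Output copied from the cell above, hard-coded so this sketch runs offline.
result = [{'label': 'POSITIVE', 'score': 0.9971315860748291}]

def describe(prediction, threshold=0.9):
    """Summarize one prediction; `threshold` is an arbitrary illustrative cutoff."""
    label = prediction['label']
    score = prediction['score']
    confidence = "high" if score >= threshold else "low"
    return f"{label} ({score:.3f}, {confidence} confidence)"

summary = describe(result[0])
print(summary)  # POSITIVE (0.997, high confidence)
```

A sensible threshold depends on the application; the value here only illustrates the pattern.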

---

## 2. Named Entity Recognition (NER)

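We can use the `ner` pipeline to identify named entities (e.g., organizations, locations) in a text. With `aggregation_strategy="simple"`, word pieces that belong to the same entity are merged into a single span, so each result describes a whole entity rather than an individual token.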

In [5]:
# Create a NER pipeline
ner = pipeline("ner", model='dbmdz/bert-large-cased-finetuned-conll03-english', aggregation_strategy="simple")

# Identify named entities in the text
result = ner("Apple is looking at buying a startup in San Francisco.")

Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [6]:
print(f"NER: {result}")

NER: [{'entity_group': 'ORG', 'score': 0.9992366, 'word': 'Apple', 'start': 0, 'end': 5}, {'entity_group': 'LOC', 'score': 0.99951744, 'word': 'San Francisco', 'start': 40, 'end': 53}]


The model correctly identifies "Apple" as an organization (ORG) and "San Francisco" as a location (LOC), both with high confidence.
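The `start` and `end` fields are character offsets into the input string, which makes it straightforward to recover each entity from the original text and group entities by type. The sketch below hard-codes the output shown above (scores rounded) so it runs offline; the `grouped` structure is an illustrative choice, not something the pipeline produces.

```python
from collections import defaultdict

text = "Apple is looking at buying a startup in San Francisco."

# Output copied from the cell above (scores rounded), hard-coded to run offline.
entities = [
    {'entity_group': 'ORG', 'score': 0.9992, 'word': 'Apple', 'start': 0, 'end': 5},
    {'entity_group': 'LOC', 'score': 0.9995, 'word': 'San Francisco', 'start': 40, 'end': 53},
]

grouped = defaultdict(list)
for entity in entities:
    # Slice the original text by character offsets; the 'word' field may be
    # normalized differently by some tokenizers, so the offsets are the safer source.
    span = text[entity['start']:entity['end']]
    grouped[entity['entity_group']].append(span)

print(dict(grouped))  # {'ORG': ['Apple'], 'LOC': ['San Francisco']}
```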

---

## 3. Question Answering

We can use the `question-answering` pipeline to extract answers from a given context based on a question.

In [7]:
# Create a question-answering pipeline
qa_pipeline = pipeline("question-answering", model='distilbert-base-cased-distilled-squad')

# Provide a question and context as keyword arguments
result = qa_pipeline(
    question="Where is Hugging Face based?",
    context="Hugging Face is based in New York City."
)


In [8]:
print(f"Question Answering: {result}")

Question Answering: {'score': 0.9694607853889465, 'start': 25, 'end': 38, 'answer': 'New York City'}


The model accurately identifies "New York City" as the answer with a confidence score of 0.969.
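This pipeline is extractive: the answer is a span of the context, and `start` and `end` are its character offsets. A small sketch, using the context and output from above as literals so it runs offline, checks that the offsets reproduce the answer and marks the span in place. The bracket highlighting is purely illustrative.

```python
context = "Hugging Face is based in New York City."

# Output copied from the cell above, hard-coded to run offline.
result = {'score': 0.9694607853889465, 'start': 25, 'end': 38, 'answer': 'New York City'}

# The offsets index directly into the context string.
extracted = context[result['start']:result['end']]
assert extracted == result['answer']

# Mark the answer span inside the original context.
highlighted = (
    context[:result['start']] + "[" + extracted + "]" + context[result['end']:]
)
print(highlighted)  # Hugging Face is based in [New York City].
```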

---

## 4. Text Generation

The `text-generation` pipeline generates a continuation of a given prompt. Its default model is GPT-2; here we request `gpt2` explicitly so the choice of model is visible in the code.

In [9]:
# Create a text generation pipeline
generator = pipeline("text-generation", model='gpt2')

# Generate two sequences, each capped at 40 tokens in total (max_length includes the prompt tokens)
results = generator("Once upon a time,", max_length=40, num_return_sequences=2)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [10]:
for result in results:
    print(f"Text Generation: {result}")

Text Generation: {'generated_text': 'Once upon a time, we do not know how to communicate properly to someone that a system that has such information would send. And these are times of transition. Whether and how they learn of a message'}
Text Generation: {'generated_text': 'Once upon a time, when God would call the universe to a halt by throwing a bomb or by shooting somebody, we would be confronted with what comes to pass when our species, in a state of'}
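Each result is a dictionary whose `generated_text` value begins with the prompt itself, as the output above shows. When only the newly generated part is needed, the prompt can be stripped afterwards. The sketch below does this on the two outputs above, hard-coded so it runs offline; `continuation` is a hypothetical helper, not a `transformers` function.

```python
prompt = "Once upon a time,"

# Outputs copied from the cell above, hard-coded to run offline.
results = [
    {'generated_text': 'Once upon a time, we do not know how to communicate properly to someone that a system that has such information would send. And these are times of transition. Whether and how they learn of a message'},
    {'generated_text': 'Once upon a time, when God would call the universe to a halt by throwing a bomb or by shooting somebody, we would be confronted with what comes to pass when our species, in a state of'},
]

def continuation(generated_text, prompt):
    """Return the text that follows the prompt, or the full text if the prompt is absent."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text

continuations = [continuation(r['generated_text'], prompt) for r in results]
for i, text in enumerate(continuations, start=1):
    print(f"Sequence {i}: {text}")
```

Because sampling is random, rerunning the generation cell will produce different text; the helper works the same way on any output that starts with the prompt.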


In [12]:
# Display the number of parameters in the model
print(f"Model parameter count: {generator.model.num_parameters():,}")

Model parameter count: 124,439,808


This is the total parameter count of the base (smallest) GPT-2 model; by default `num_parameters()` counts every parameter, and all of them are trainable here. The larger GPT-2 variants (Medium, Large, and XL) have significantly more parameters, which lets them capture more complex patterns and generally produce higher-quality text, at the cost of more memory and compute.
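The count can be reproduced by hand from the model's dimensions. The sketch below assumes the published base GPT-2 configuration, which is not printed in this notebook: a vocabulary of 50,257 tokens, 1,024 positions, a hidden size of 768, and 12 transformer blocks. It also assumes the output layer shares its weights with the token embedding, so it adds no parameters of its own.

```python
# Assumed base GPT-2 configuration (not shown elsewhere in this notebook).
vocab_size, n_positions, n_embd, n_layer = 50257, 1024, 768, 12

# Token and position embeddings.
embeddings = vocab_size * n_embd + n_positions * n_embd

# One layer norm has a weight and a bias vector.
layer_norm = 2 * n_embd

# Attention: fused query/key/value projection plus output projection (weights + biases).
attention = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)

# Feed-forward network: expand to 4x the hidden size, then project back.
mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)

# Each block has two layer norms, attention, and the feed-forward network.
per_block = 2 * layer_norm + attention + mlp

# All blocks, the embeddings, and the final layer norm.
total = embeddings + n_layer * per_block + layer_norm
print(f"{total:,}")  # 124,439,808
```

The result matches the value reported by `num_parameters()` above; roughly two thirds of the parameters sit in the transformer blocks and most of the rest in the token embedding.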