# Cracking Open the Hugging Face Transformers Library

Code authored by: Shawhin Talebi <br>
Blog: https://medium.com/towards-data-science/cracking-open-the-hugging-face-transformers-library-350aa0ef0161

### Import Modules

In [1]:
import sys

major, minor, micro = sys.version_info[:3]
print(f"Your Python version is {major}.{minor}.{micro}")

Your Python version is 3.10.0


In [2]:
from transformers import pipeline, Conversation
import gradio as gr



### Code

#### Sentiment Analysis

In [3]:
# toy example 1: no model is named, so the pipeline falls back to a default for the task
pipeline(task="sentiment-analysis")("Love this!")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'label': 'POSITIVE', 'score': 0.9998745918273926}]

In [4]:
# toy example 2: same task, with the model named explicitly
pipeline(task="sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")("Love this!")

[{'label': 'POSITIVE', 'score': 0.9998745918273926}]
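
The `score` returned above is a probability for the predicted label. As a rough sketch of that last post-processing step, assuming the pipeline applies a softmax to the model's two raw logits and reports the highest-scoring class: the logits and the `id2label` mapping below are illustrative placeholders, not values read from the actual model.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# hypothetical values for illustration only (assumptions, not model output)
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
logits = [-4.3, 4.7]

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)  # index of the largest probability
result = [{"label": id2label[best], "score": probs[best]}]
print(result)
```

The resulting list has the same shape as the pipeline output: one dictionary per input, holding a `label` and a `score`.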

#### More Sentiment Analysis

In [5]:
# defining classifier
classifier = pipeline(task="sentiment-analysis", model="distilbert-base-uncased-finetuned-sst-2-english")

In [6]:
classifier("Hate this.")

[{'label': 'NEGATIVE', 'score': 0.9997110962867737}]

In [7]:
# the classifier also accepts a list of texts
text_list = ["This is great",
             "Thanks for nothing",
             "You've got to work on your face",
             "You're beautiful, never change!"]

classifier(text_list)

[{'label': 'POSITIVE', 'score': 0.9998785257339478},
 {'label': 'POSITIVE', 'score': 0.9680055975914001},
 {'label': 'NEGATIVE', 'score': 0.8776113986968994},
 {'label': 'POSITIVE', 'score': 0.9998120665550232}]
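
The predictions come back in the same order as the inputs, so they can be paired with `zip`. The sketch below copies the inputs and outputs from the cells above and flags low-confidence predictions. The 0.9 cutoff is an arbitrary assumption for illustration. Note that "Thanks for nothing" is labelled POSITIVE with high confidence, so a high score does not guarantee a correct label.

```python
# inputs and outputs copied from the cells above
text_list = ["This is great",
             "Thanks for nothing",
             "You've got to work on your face",
             "You're beautiful, never change!"]

predictions = [{'label': 'POSITIVE', 'score': 0.9998785257339478},
               {'label': 'POSITIVE', 'score': 0.9680055975914001},
               {'label': 'NEGATIVE', 'score': 0.8776113986968994},
               {'label': 'POSITIVE', 'score': 0.9998120665550232}]

# print each text next to its predicted label and score
for text, pred in zip(text_list, predictions):
    print(f"{pred['label']:<8} {pred['score']:.3f}  {text}")

# collect texts whose score falls below the (assumed) cutoff
CUTOFF = 0.9
uncertain = [text for text, pred in zip(text_list, predictions) if pred["score"] < CUTOFF]
print(uncertain)
```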

In [8]:
# for a model with many target labels, top_k=None returns the score for every label instead of only the top one
classifier = pipeline(task="text-classification", model="SamLowe/roberta-base-go_emotions", top_k=None)



In [8]:
classifier(text_list[0])  # "This is great"

[[{'label': 'admiration', 'score': 0.9526104927062988},
  {'label': 'approval', 'score': 0.03047208860516548},
  {'label': 'neutral', 'score': 0.015236231498420238},
  {'label': 'excitement', 'score': 0.006063772831112146},
  {'label': 'gratitude', 'score': 0.005296189337968826},
  {'label': 'joy', 'score': 0.004475208930671215},
  {'label': 'curiosity', 'score': 0.004322327673435211},
  {'label': 'realization', 'score': 0.004089603666216135},
  {'label': 'optimism', 'score': 0.004077219869941473},
  {'label': 'disapproval', 'score': 0.004076569341123104},
  {'label': 'annoyance', 'score': 0.003528739558532834},
  {'label': 'surprise', 'score': 0.0029730673413723707},
  {'label': 'disappointment', 'score': 0.0027346380520612},
  {'label': 'love', 'score': 0.0026945790741592646},
  {'label': 'amusement', 'score': 0.002486741403117776},
  {'label': 'confusion', 'score': 0.002360740676522255},
  {'label': 'pride', 'score': 0.002101337304338813},
  {'label': 'sadness', 'score': 0.001773052
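
The scores listed above add up to more than 1 (the first four alone sum to about 1.004), which suggests each label is scored independently rather than through a single softmax. Assuming that is the case, one common way to use such output is to keep every label above a cutoff instead of only the top one. The sketch below uses the first five entries copied from the output. The 0.01 threshold is a hypothetical choice.

```python
# first five entries of the output above, copied verbatim
scores = [
    {'label': 'admiration', 'score': 0.9526104927062988},
    {'label': 'approval', 'score': 0.03047208860516548},
    {'label': 'neutral', 'score': 0.015236231498420238},
    {'label': 'excitement', 'score': 0.006063772831112146},
    {'label': 'gratitude', 'score': 0.005296189337968826},
]

# the partial sum already exceeds 1, so the scores are not one probability distribution
total = sum(item["score"] for item in scores)
print(f"sum of first {len(scores)} scores: {total:.4f}")

# keep every label at or above the (hypothetical) threshold
THRESHOLD = 0.01
selected = [item["label"] for item in scores if item["score"] >= THRESHOLD]
print(selected)
```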

#### Summarization

In [9]:
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")



In [14]:
text = """
Hugging Face is an AI company that has become a major hub for open-source machine learning. 
Their platform has 3 major elements which allow users to access and share machine learning resources. 
First, is their rapidly growing repository of pre-trained open-source machine learning models for things such as natural language processing (NLP), computer vision, and more. 
Second, is their library of datasets for training machine learning models for almost any task. 
Third, and finally, is Spaces which is a collection of open-source ML apps.

The power of these resources is that they are community generated, which leverages all the benefits of open source i.e. cost-free, wide diversity of tools, high quality resources, and rapid pace of innovation. 
While these make building powerful ML projects more accessible than before, there is another key element of the Hugging Face ecosystem—their Transformers library.
"""
summarized_text = summarizer(text, min_length=5, max_length=140)[0]['summary_text']
summarized_text 

'Hugging Face is an AI company that has become a major hub for open-source machine learning. They have 3 major elements which allow users to access and share machine learning resources.'

In [15]:
classifier(summarized_text)

[[{'label': 'neutral', 'score': 0.9101782441139221},
  {'label': 'approval', 'score': 0.08781375735998154},
  {'label': 'realization', 'score': 0.023256298154592514},
  {'label': 'annoyance', 'score': 0.006623796187341213},
  {'label': 'admiration', 'score': 0.004981068894267082},
  {'label': 'disapproval', 'score': 0.004730131011456251},
  {'label': 'optimism', 'score': 0.0033590758685022593},
  {'label': 'disappointment', 'score': 0.002619005972519517},
  {'label': 'confusion', 'score': 0.001953981351107359},
  {'label': 'excitement', 'score': 0.0012417063117027283},
  {'label': 'disgust', 'score': 0.0011407796991989017},
  {'label': 'joy', 'score': 0.0010540119837969542},
  {'label': 'amusement', 'score': 0.0009572377894073725},
  {'label': 'love', 'score': 0.000887105125002563},
  {'label': 'desire', 'score': 0.0008553270599804819},
  {'label': 'curiosity', 'score': 0.000826105650048703},
  {'label': 'anger', 'score': 0.0007336381822824478},
  {'label': 'caring', 'score': 0.0006971

#### Conversational

In [16]:
# no task is given here: the pipeline infers it from the model's metadata on the Hub
chatbot = pipeline(model="facebook/blenderbot-400M-distill")



In [17]:
conversation = Conversation("Hi I'm Sahil, how are you?")
conversation = chatbot(conversation)


No chat template is defined for this tokenizer - using the default template for the BlenderbotTokenizerFast class. If the default is not appropriate for your model, please set `tokenizer.chat_template` to an appropriate template. See https://huggingface.co/docs/transformers/main/chat_templating for more information.



In [18]:
conversation

Conversation id: 0f6afd47-5192-4cfe-b2a6-86a4490ad7dc
user: Hi I'm Sahil, how are you?
assistant:  I'm doing well. How are you doing this evening? I just got home from work.

In [19]:
conversation.add_user_input("Where do you work?")
conversation = chatbot(conversation)

In [20]:
conversation

Conversation id: 0f6afd47-5192-4cfe-b2a6-86a4490ad7dc
user: Hi I'm Sahil, how are you?
assistant:  I'm doing well. How are you doing this evening? I just got home from work.
user: Where do you work?
assistant:  I work at a grocery store. What do you do for a living? Do you have any hobbies?

### Deploy Chatbot UI

#### Text Sentiment Chatbot

In [22]:
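def top3_text_classes(message, history):
    # classifier is the go_emotions pipeline defined above; it returns [[{label, score}, ...]] sorted by score
    # keep the top 3 entries and strip the list/dict brackets so each label prints on its own line
    return str(classifier(message)[0][:3]).replace('}, {', '\n').replace('[{', '').replace('}]', '')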
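
The chain of `replace` calls above depends on how Python prints a list of dictionaries. As a sketch of an alternative, the hypothetical helper `format_top_classes` below (not part of the original notebook) builds the display string directly from the labels and scores. The sample data is copied from the classifier output shown earlier.

```python
def format_top_classes(predictions, k=3):
    """Return the k highest-scoring labels, one 'label: score' pair per line."""
    top = sorted(predictions, key=lambda p: p["score"], reverse=True)[:k]
    return "\n".join(f"{p['label']}: {p['score']:.3f}" for p in top)

# sample predictions copied from the classifier output on the summarized text
sample = [
    {'label': 'neutral', 'score': 0.9101782441139221},
    {'label': 'approval', 'score': 0.08781375735998154},
    {'label': 'realization', 'score': 0.023256298154592514},
    {'label': 'annoyance', 'score': 0.006623796187341213},
]

formatted = format_top_classes(sample)
print(formatted)
```

Inside the Gradio function, the same helper could be called as `format_top_classes(classifier(message)[0])`.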


In [23]:
demo_sentiment = gr.ChatInterface(top3_text_classes, title="Text Sentiment Chatbot", description="Enter your text, and the chatbot will classify the sentiment.")


In [24]:
demo_sentiment.launch()


Running on local URL:  http://127.0.0.1:7861

To create a public link, set `share=True` in `launch()`.




#### Summarizer Chatbot

In [27]:
def summarizer_bot(message, history):
    # min_length and max_length are counted in tokens, so max_length=10 gives a very short summary
    return summarizer(message, min_length=5, max_length=10)[0]['summary_text']

In [28]:

demo_summarizer = gr.ChatInterface(summarizer_bot, title="Summarizer Chatbot", description="Enter your text, and the chatbot will return the summarized version.")

demo_summarizer.launch()

Running on local URL:  http://127.0.0.1:7863

To create a public link, set `share=True` in `launch()`.




#### Vanilla Chatbot

In [29]:
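message_list = []
response_list = []

def vanilla_chatbot(message, history):
    # pass copies of the stored turns so they are only updated explicitly below
    conversation = Conversation(text=message, past_user_inputs=list(message_list), generated_responses=list(response_list))
    conversation = chatbot(conversation)
    response = conversation.generated_responses[-1]

    # record this turn so the next call includes it as context
    message_list.append(message)
    response_list.append(response)

    return response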

demo_chatbot = gr.ChatInterface(vanilla_chatbot, title="Vanilla Chatbot", description="Enter text to start chatting.")

demo_chatbot.launch()

Running on local URL:  http://127.0.0.1:7864

To create a public link, set `share=True` in `launch()`.
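
The function above keeps its own module-level lists and ignores the `history` argument that Gradio passes in. A sketch of an alternative is to derive the two lists from `history` itself, assuming it arrives as a list of `[user, bot]` pairs. That format depends on the Gradio version and settings, so treat it as an assumption. The helper name `split_history` is hypothetical.

```python
def split_history(history):
    """Split [[user, bot], ...] pairs into two parallel lists."""
    past_user_inputs = [user for user, bot in history]
    generated_responses = [bot for user, bot in history]
    return past_user_inputs, generated_responses

# example history built from the conversation shown earlier (assumed pair format)
history = [
    ["Hi I'm Sahil, how are you?",
     " I'm doing well. How are you doing this evening? I just got home from work."],
]

users, bots = split_history(history)
print(users)
print(bots)
```

The two lists could then be passed as `past_user_inputs` and `generated_responses` when building the `Conversation`, which keeps each chat session's context separate instead of sharing one global history.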


