# **Introduction to HuggingFace**
---

## **Install Transformers**

In [1]:
%pip install transformers

Collecting transformers
  Downloading transformers-4.35.0-py3-none-any.whl (7.9 MB)
Collecting huggingface-hub<1.0,>=0.16.4 (from transformers)
  Downloading huggingface_hub-0.19.0-py3-none-any.whl (311 kB)
Collecting tokenizers<0.15,>=0.14 (from transformers)
  Downloading tokenizers-0.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.8 MB)
Collecting safetensors>=0.3.1 (from transformers)
  Downloading safetensors-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)

## **Pipeline**

In [2]:
from transformers import pipeline
classifier = pipeline('sentiment-analysis')

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [3]:
classifier("We are very happy to show you the HuggingFace Transformers library")

[{'label': 'POSITIVE', 'score': 0.9997480511665344}]

In [4]:
classifier("Pizza is not that good")

[{'label': 'NEGATIVE', 'score': 0.9997690320014954}]

In [5]:
classifier("Pizza is not that good but Toppings are awesome")

[{'label': 'POSITIVE', 'score': 0.9997850060462952}]

**Pass a list of sentences and show the output for each sentence**

In [6]:
results = classifier(["We are happy to show you the HuggingFace Library",
                      "Hope you don't hate it",
                      "Please rate 5* if you are happy",
                      "if Not happy share your feedback"])

for result in results:
  print(f"Label : {result['label']}, with score : {result['score']}")

Label : POSITIVE, with score : 0.9998206496238708
Label : POSITIVE, with score : 0.9816809296607971
Label : POSITIVE, with score : 0.9995607733726501
Label : NEGATIVE, with score : 0.5089950561523438
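The last sentence is labelled NEGATIVE with a score of only about 0.51, barely better than chance for a two-label classifier. A minimal sketch for flagging such borderline predictions in a batch of pipeline results follows; the `flag_uncertain` helper and its 0.6 threshold are hypothetical choices, not part of the transformers API.

```python
def flag_uncertain(results, threshold=0.6):
    """Return (index, result) pairs whose score is below `threshold`.

    `results` is a list of {'label': ..., 'score': ...} dicts, the shape the
    sentiment-analysis pipeline returns for a list of inputs.
    The default threshold of 0.6 is an arbitrary, hypothetical choice.
    """
    return [(i, r) for i, r in enumerate(results) if r["score"] < threshold]


# Scores copied from the output above
results = [
    {"label": "POSITIVE", "score": 0.9998206496238708},
    {"label": "POSITIVE", "score": 0.9816809296607971},
    {"label": "POSITIVE", "score": 0.9995607733726501},
    {"label": "NEGATIVE", "score": 0.5089950561523438},
]

for i, r in flag_uncertain(results):
    print(f"Sentence {i} is uncertain: {r['label']} ({r['score']:.3f})")
```

Only the fourth sentence (index 3) falls below the threshold here; raising the threshold makes the filter stricter.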


## **nlptown/bert-base-multilingual-uncased-sentiment**

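This model is intended for direct use as a sentiment analysis model for product reviews in six languages (English, Dutch, German, French, Spanish, and Italian), or for further fine-tuning on related sentiment analysis tasks. It predicts the sentiment of a review as a rating from 1 to 5 stars.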

In [7]:
classifier = pipeline('sentiment-analysis', model='nlptown/bert-base-multilingual-uncased-sentiment')

In [9]:
# Spanish for "We hope you don't hate it."
classifier("Esperamos que no lo odie.")

[{'label': '3 stars', 'score': 0.33688199520111084}]
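This checkpoint predicts a rating from `1 star` to `5 stars`, and by default the pipeline reports only the top label. If you have scores for several labels (recent transformers versions return all of them when the pipeline is called with `top_k=None`), you can collapse them into one expected rating. A hedged sketch; `stars_from_label` and `expected_stars` are hypothetical helpers, not library functions.

```python
def stars_from_label(label):
    """Parse labels like '1 star' or '4 stars' into an integer rating."""
    return int(label.split()[0])


def expected_stars(scores):
    """Score-weighted average rating over a list of {'label', 'score'} dicts.

    Scores are renormalised, so a partial list (e.g. only the top labels)
    still yields a rating between 1 and 5.
    """
    total = sum(s["score"] for s in scores)
    if total == 0:
        raise ValueError("scores must not all be zero")
    return sum(stars_from_label(s["label"]) * s["score"] for s in scores) / total


# With only the single top label from the output above, the result is just 3
print(expected_stars([{"label": "3 stars", "score": 0.33688199520111084}]))
```

The expected rating is more informative when the top score is low, as here (about 0.34), because it reflects how the remaining probability is spread over neighbouring ratings.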

## **Creating the Model and Tokenizer**
**`from_pretrained` loads a model or tokenizer from a checkpoint name on the Hugging Face Hub (or from a local directory)**

In [10]:
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

In [12]:
model_name = "nlptown/bert-base-multilingual-uncased-sentiment"

# This model only exists in PyTorch, so we use the `from_pt` flag to import that model in TensorFlow
model = TFAutoModelForSequenceClassification.from_pretrained(model_name, from_pt=True)

tokenizer = AutoTokenizer.from_pretrained(model_name)

classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)

All PyTorch model weights were used when initializing TFBertForSequenceClassification.

All the weights of TFBertForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForSequenceClassification for predictions without further training.


In [13]:
classifier("I am a Good Boy")

[{'label': '4 stars', 'score': 0.4229269027709961}]

## **distilbert-base-uncased-finetuned-sst-2-english**

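This DistilBERT checkpoint is fine-tuned on SST-2 for binary sentiment classification (`NEGATIVE`/`POSITIVE`). It is the default model of the `sentiment-analysis` pipeline used above.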

In [14]:
model_name = "distilbert-base-uncased-finetuned-sst-2-english"

tf_model = TFAutoModelForSequenceClassification.from_pretrained(model_name)

tokenizer = AutoTokenizer.from_pretrained(model_name)

All PyTorch model weights were used when initializing TFDistilBertForSequenceClassification.

All the weights of TFDistilBertForSequenceClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.


## **Tokenizer**

In [15]:
inputs = tokenizer("We are very happy to show you the hugging face library")
inputs

{'input_ids': [101, 2057, 2024, 2200, 3407, 2000, 2265, 2017, 1996, 17662, 2227, 3075, 102], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
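The tokenizer lowercases the text, splits it into words and punctuation, breaks each word into WordPiece sub-word units by greedy longest-match-first lookup, maps each unit to a vocabulary ID, and wraps the sequence in the special `[CLS]` (ID 101) and `[SEP]` (ID 102) tokens visible at both ends of `input_ids`. The `attention_mask` is all 1s because nothing is padded. The toy sketch below illustrates that process; the tiny vocabulary and the `wordpiece`/`toy_encode` helpers are hypothetical, and the real fast tokenizer also handles accent stripping, Chinese characters, and other details omitted here.

```python
import re


def wordpiece(word, vocab, unk="[UNK]"):
    """Split one word into WordPiece units by greedy longest-match-first lookup."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # non-initial pieces carry the ## prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no valid split: the whole word maps to [UNK]
        pieces.append(match)
        start = end
    return pieces


def toy_encode(text, vocab):
    """Lowercase, split into words/punctuation, apply WordPiece, add [CLS]/[SEP]."""
    words = re.findall(r"\w+|[^\w\s]", text.lower())
    tokens = ["[CLS]"]
    for word in words:
        tokens.extend(wordpiece(word, vocab))
    tokens.append("[SEP]")
    return tokens, [vocab[t] for t in tokens]


# Hypothetical toy vocabulary (the real one has about 30k entries)
vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
         "hug": 4, "##ging": 5, "face": 6, "!": 7}
tokens, ids = toy_encode("Hugging face!", vocab)
print(tokens)  # ['[CLS]', 'hug', '##ging', 'face', '!', '[SEP]']
print(ids)     # [2, 4, 5, 6, 7, 3]
```

With the real vocabulary, `hugging` is a single entry (ID 17662 in the output above), so it is not split; the toy vocabulary forces a split only to show how `##` continuation pieces work.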

## **Tokenizer with Padding**

In [20]:
tf_batch = tokenizer(["We are very happy to show you the hugging face library.", "We hope you don't hate it."],
                     padding = True,
                     truncation = True,
                     max_length = 512,
                     return_tensors = "tf")

In [23]:
for key,value in tf_batch.items():
  print(f"{key}: {value.numpy().tolist()}")

input_ids: [[101, 2057, 2024, 2200, 3407, 2000, 2265, 2017, 1996, 17662, 2227, 3075, 1012, 102], [101, 2057, 3246, 2017, 2123, 1005, 1056, 5223, 2009, 1012, 102, 0, 0, 0]]
attention_mask: [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]]
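With `padding=True`, every sequence is padded to the longest one in the batch using the `[PAD]` ID (0 in this vocabulary), and the attention mask marks real tokens with 1 and padding with 0 so the model ignores the padded positions; `truncation=True, max_length=512` cuts anything longer than 512 tokens. A pure-Python sketch of that bookkeeping follows. `pad_batch` is a hypothetical helper, and it assumes each input already ends with `[SEP]`, whereas the real tokenizer truncates the text tokens before adding the special tokens.

```python
def pad_batch(batch_ids, pad_id=0, max_length=512):
    """Truncate each sequence to max_length and pad all to the longest one.

    Returns (input_ids, attention_mask) as lists of lists, mirroring
    padding=True, truncation=True for single-sentence inputs.
    """
    truncated = []
    for ids in batch_ids:
        if len(ids) > max_length:
            # Keep the final special token ([SEP]) when cutting a sequence short
            ids = ids[: max_length - 1] + ids[-1:]
        truncated.append(ids)

    longest = max(len(ids) for ids in truncated)
    input_ids, attention_mask = [], []
    for ids in truncated:
        n_pad = longest - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return input_ids, attention_mask


# Unpadded IDs of the two sentences, taken from the output above
batch = [
    [101, 2057, 2024, 2200, 3407, 2000, 2265, 2017, 1996, 17662, 2227, 3075, 1012, 102],
    [101, 2057, 3246, 2017, 2123, 1005, 1056, 5223, 2009, 1012, 102],
]
input_ids, attention_mask = pad_batch(batch)
for key, value in (("input_ids", input_ids), ("attention_mask", attention_mask)):
    print(f"{key}: {value}")  # matches the padded tf_batch output above
```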


## **Using the Model**

In [24]:
tf_outputs = tf_model(tf_batch)
print(tf_outputs)

TFSequenceClassifierOutput(loss=None, logits=<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[-4.229283  ,  4.541509  ],
       [ 0.08181807, -0.04179327]], dtype=float32)>, hidden_states=None, attentions=None)
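The logits are raw, unnormalised scores, one per class. Taking the arg-max of each row gives the predicted class index, and the model config's `id2label` mapping turns it into a label name (in the notebook, `tf.argmax(tf_outputs.logits, axis=-1)` and `tf_model.config.id2label` do this). A minimal pure-Python sketch; `logits_to_labels` is a hypothetical helper, and the `id2label` dict below is written out by hand to match this checkpoint's config.

```python
def logits_to_labels(logits, id2label):
    """Pick the highest-scoring class index for each row and map it to a name."""
    labels = []
    for row in logits:
        best = max(range(len(row)), key=lambda i: row[i])
        labels.append(id2label[best])
    return labels


# Logits copied from the output above
logits = [[-4.229283, 4.541509], [0.08181807, -0.04179327]]
# Normally read from tf_model.config.id2label
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
print(logits_to_labels(logits, id2label))  # ['POSITIVE', 'NEGATIVE']
```

The second sentence wins by a margin of only about 0.12 in logit space, which is why its probability in the next step ends up close to 0.5.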


## **Predictions**

In [25]:
import tensorflow as tf
tf_predictions = tf.nn.softmax(tf_outputs.logits, axis=-1)
print(tf_predictions)

tf.Tensor(
[[1.5517653e-04 9.9984479e-01]
 [5.3086352e-01 4.6913645e-01]], shape=(2, 2), dtype=float32)
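`tf.nn.softmax` turns each row of logits into probabilities that sum to 1, using p_i = exp(z_i) / sum_j exp(z_j). The pure-Python sketch below reproduces the tensor above from the printed logits. Subtracting the row maximum before exponentiating does not change the result but avoids overflow for large logits.

```python
import math


def softmax(row):
    """Numerically stable softmax: subtract the row max before exponentiating."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


# Logits copied from the model output above
logits = [[-4.229283, 4.541509], [0.08181807, -0.04179327]]
probs = [softmax(row) for row in logits]
for row in probs:
    print([f"{p:.7e}" for p in row])
```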


In [26]:
import tensorflow as tf
# Passing labels also makes the model return a per-example loss
tf_outputs = tf_model(tf_batch, labels=tf.constant([1, 0]))

In [27]:
print(tf_outputs)

TFSequenceClassifierOutput(loss=<tf.Tensor: shape=(2,), dtype=float32, numpy=array([1.5519845e-04, 6.3325030e-01], dtype=float32)>, logits=<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[-4.229283  ,  4.541509  ],
       [ 0.08181807, -0.04179327]], dtype=float32)>, hidden_states=None, attentions=None)
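The returned loss values match the per-example cross-entropy, -log softmax(z)[y], where y is each row's label. The pure-Python sketch below recomputes both values from the printed logits and the labels `[1, 0]`; it writes out the standard formula (as log-sum-exp minus the true-class logit) and is not the library's internal code.

```python
import math


def cross_entropy(logits, labels):
    """Per-example cross-entropy: logsumexp(z) - z[label]."""
    losses = []
    for row, label in zip(logits, labels):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        losses.append(log_sum_exp - row[label])
    return losses


# Logits copied from the output above, with the labels passed to the model
logits = [[-4.229283, 4.541509], [0.08181807, -0.04179327]]
losses = cross_entropy(logits, [1, 0])
print(losses)
```

The first loss is tiny because the model already gives label 1 almost all of the probability; the second is much larger because label 0 receives only about 0.53.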


## **Save the Model**

In [None]:
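save_directory = "./saved_model"  # hypothetical path; any writable directory works
# Save the DistilBERT model that matches the current `tokenizer`
tokenizer.save_pretrained(save_directory)
tf_model.save_pretrained(save_directory)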

## **Load Saved Model**

In [None]:
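from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained(save_directory)
# The weights were saved by a TensorFlow model, so `from_pt` is not needed
tf_model = TFAutoModelForSequenceClassification.from_pretrained(save_directory)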