### Transformers with Hugging Face

In this chapter, we discuss the Hugging Face Transformers library in Python and walk through examples that show how to use pretrained models to perform NLP tasks.

Topics:

- Hugging Face
- NLP applications:
  - Sentiment analysis
  - Named Entity Recognition: automatically pull key terms from text
  - Zero-Shot Classification: classify text without training on labeled examples
  - Text Summarization
  - Text Generation
  - Document Similarity

#### Hugging Face

Hugging Face is the company that created the Transformers Python library. The library is popular because it makes it easy for data professionals to access and use pretrained LLMs.

Hugging Face also hosts the Model Hub, which contains over 1 million pretrained, open-source models (beyond base models, these include variants, fine-tuned models, experimental models, and more).

Note: GitHub is the community for uploading and sharing code; the Model Hub is the community for uploading and sharing pretrained models.

Hugging Face workflow:

- Determine the goal: different types of tasks use different types of LLMs.
  - Encoder-only: Sentiment Analysis, Named Entity Recognition
  - Decoder-only: Text Generation
  - Encoder-decoder: Zero-Shot Classification, Text Summarization
  - Embedding: Document Similarity

- Identify a pretrained model on Hugging Face's Model Hub, where we can sort models by popularity, downloads, and so on.

- Specify the input data (a single string, a series/column of text, and so on).

- Apply the pretrained model: pass in the input data and view the output(s).

Note: Additional steps (such as fine-tuning) are needed if we want to adapt the model to our own data.
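The four workflow steps map fairly directly onto the `pipeline` function from transformers. The sketch below is a hypothetical helper: `GOAL_TO_TASK` and `apply_pretrained` are illustrative names, not part of the library, and it assumes the pipeline task strings shown. Document Similarity is left out because it is usually done by comparing embedding vectors rather than through a single pipeline task.

```python
# Hypothetical lookup from the goals listed above to transformers pipeline task names
GOAL_TO_TASK = {
    "sentiment analysis": "sentiment-analysis",              # encoder-only
    "named entity recognition": "ner",                       # encoder-only
    "zero-shot classification": "zero-shot-classification",  # encoder-decoder
    "text summarization": "summarization",                   # encoder-decoder
    "text generation": "text-generation",                    # decoder-only
}


def apply_pretrained(goal, inputs, model=None, device=-1, **call_kwargs):
    """Goal -> task, pick a model, pass the inputs, return the model's outputs."""
    # Imported inside the function so the lookup table works even without transformers
    from transformers import pipeline

    task = GOAL_TO_TASK[goal.lower()]
    # model=None lets transformers fall back to its default checkpoint for the task
    nlp = pipeline(task, model=model, device=device)
    # Extra keyword arguments go to the pipeline call itself
    return nlp(inputs, **call_kwargs)
```

For example, `apply_pretrained("zero-shot classification", "These chips arrived stale.", candidate_labels=["shipping", "taste"])` forwards `candidate_labels` to the pipeline call, which zero-shot classification requires.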

For the rest of this chapter, we create a new environment called "nlp_transformers" and install the following packages:

- Python
- Jupyter Notebook
- pandas
- NumPy
- scikit-learn
- openpyxl (to read Excel files)
- transformers
- PyTorch (the backend that runs the transformers models; TensorFlow is an alternative)

Note: If you run into issues installing any of these packages, try installing Jupyter Notebook, transformers, and PyTorch with `pip` instead.
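After installing, a quick way to confirm the environment is complete is to ask Python which versions are installed. A minimal sketch using the standard library's `importlib.metadata`; it assumes the pip distribution names shown (`notebook` for Jupyter Notebook, `torch` for PyTorch), and `check_packages` is an illustrative name:

```python
from importlib.metadata import PackageNotFoundError, version

# pip distribution names (assumed): "notebook" is Jupyter Notebook, "torch" is PyTorch
REQUIRED = ("notebook", "pandas", "numpy", "scikit-learn", "openpyxl", "transformers", "torch")


def check_packages(names=REQUIRED):
    """Return {package: installed version, or None if it is not installed}."""
    report = {}
    for name in names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for pkg, ver in check_packages().items():
        print(f"{pkg:<14} {ver or 'MISSING'}")
```

Anything reported as `MISSING` can then be installed into the "nlp_transformers" environment.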

#### Sentiment Analysis with LLMs

Recall that sentiment analysis determines how positive or negative a piece of text is.

The default model for sentiment analysis is `DistilBERT` (encoder-only). It is a distilled, smaller variant of `BERT` with fewer parameters, so it runs faster.

Syntax:

```python
from transformers import pipeline

sentiment_analyzer = pipeline('sentiment-analysis',
                              model="distilBERT/distilbert-...",
                              device=-1)
```

The `pipeline` function lets us specify the task we want to perform:

- `'sentiment-analysis'`: the task.
- `model = "distilBERT/distilbert-..."`: the ID of a particular pretrained model on the Hugging Face Model Hub. If `model` is omitted, the pipeline falls back to a default model for the task.
- `device=-1`: run the model on the CPU only. Setting `device=0` runs it on the first GPU instead.
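As a minimal sketch (assuming transformers and PyTorch are installed), the helper below builds the pipeline without pinning a model, so transformers uses its default sentiment checkpoint; pass a Model Hub ID as `model=` to pin one. Each call returns one dictionary per input with a `label` and a `score` (the model's confidence in that label). `build_sentiment_analyzer` and `signed_score` are hypothetical helpers; `signed_score` assumes `POSITIVE`/`NEGATIVE` labels and folds label and confidence into one number on the same -1 to 1 range as a VADER compound score.

```python
def build_sentiment_analyzer(device=-1):
    """Create a sentiment-analysis pipeline (device=-1: CPU, device=0: first GPU)."""
    from transformers import pipeline  # needs transformers plus a backend such as PyTorch

    # No model= argument: transformers picks its default sentiment checkpoint (and warns).
    return pipeline("sentiment-analysis", device=device)


def signed_score(result):
    """Map {'label': 'POSITIVE' or 'NEGATIVE', 'score': p} to a value in [-1, 1]."""
    return result["score"] if result["label"] == "POSITIVE" else -result["score"]


# Usage (downloads the default model on first run):
# analyzer = build_sentiment_analyzer()
# results = analyzer(["I love these chips!", "These chips were stale."])
# [signed_score(r) for r in results]
```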

```python
import pandas as pd

# Read data (pd.read_excel uses openpyxl for .xlsx files)
df = pd.read_excel('Chapter6_Popchip_Reviews_Sentiment.xlsx')
df.head(3)
```

| Unnamed: 0 | Id | UserId | Rating | Priority | Title | Text | Sentiment_VADER |
|---|---|---|---|---|---|---|---|
| 0 | 23689 | A21SYVGVNG8RAS | 5 | Low | Yummy snacks! | Popchips are the bomb!! I use the parmesan ga... | 0.9244 |
| 1 | 23690 | AQJYXC0MPRQJL | 5 | Low | Great chip that is different from the rest | I like the puffed nature of this chip that mak... | 0.7269 |
| 2 | 23691 | A30NYUHEDLWI0Y | 5 | High | Great Alternative to Potato Chips | I just love these chips! I was always a big f... | 0.979 |


```python
# Note: Only part of the "Text" is visible above. Setting max_colwidth to None
# shows the complete text in every column.
pd.set_option('display.max_colwidth', None)
```
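Setting the option globally affects every later display in the notebook. If full text is only needed for one display, pandas also provides `pd.option_context`, which changes an option temporarily. A minimal sketch on a small hypothetical DataFrame (`demo`, not the Popchip data):

```python
import pandas as pd

# Hypothetical stand-in data: one 200-character string
demo = pd.DataFrame({"Text": ["word " * 40]})

# Default display truncates strings longer than display.max_colwidth (50) and adds "..."
print(demo)

# option_context widens the column only inside the with-block, then restores the setting
with pd.option_context("display.max_colwidth", None):
    print(demo)

print(pd.get_option("display.max_colwidth"))  # the previous value again
```

After a global `pd.set_option`, `pd.reset_option('display.max_colwidth')` restores the default.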

```python
df.head(3)
```

| Unnamed: 0 | Id | UserId | Rating | Priority | Title | Text | Sentiment_VADER |
|---|---|---|---|---|---|---|---|
| 0 | 23689 | A21SYVGVNG8RAS | 5 | Low | Yummy snacks! | Popchips are the bomb!! I use the parmesan garlic to scoop up cottage cheese as a healthy alternative to chips and dip. My healthy eating program is saved. | 0.9244 |
| 1 | 23690 | AQJYXC0MPRQJL | 5 | Low | Great chip that is different from the rest | I like the puffed nature of this chip that makes it more unique in the chip market. I ordered the Salt and Vinegar and absolutely love that flavor, hands down my favorite chip ever. I have tried the cheddar and regular flavors as well. The cheddar is about a 4/5 and the regular is about a 3/5 because I prefer strong flavors and obviously that would not be the case for the regular. The Salt and Vinegar is kind of weak compared to some regular S&V chips, but is quite flavorful and makes you wanting to come back for more. | 0.7269 |
| 2 | 23691 | A30NYUHEDLWI0Y | 5 | High | Great Alternative to Potato Chips | I just love these chips! I was always a big fan of potato chips, but haven't had one since I discovered popchips. They are great for dipping or all alone. I am constantly re-ordering them. One note however-if you are on a low salt diet these chips are probably not for you. They are high in sodium. We go through a case every two months. If you love them it pays to join the subscribe and save program through Amazon. You save money and stay supplied! | 0.979 |
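With the reviews loaded, a sentiment pipeline can score the `Text` column so its results sit next to `Sentiment_VADER`. A minimal sketch: `add_model_sentiment` and the `Sentiment_DistilBERT_*` column names are hypothetical. The analyzer is passed in as any callable that returns one `{'label', 'score'}` dict per string, which keeps the helper testable without downloading a model. The commented usage assumes that `truncation=True` is forwarded to the tokenizer so reviews longer than the model's input limit are cut rather than raising an error.

```python
import pandas as pd


def add_model_sentiment(df, analyzer, text_col="Text", prefix="Sentiment_DistilBERT"):
    """Return a copy of df with model sentiment label and score columns added.

    `analyzer` takes a list of strings and returns one
    {'label': ..., 'score': ...} dict per string, such as a
    transformers sentiment-analysis pipeline.
    """
    results = analyzer(df[text_col].tolist())
    out = df.copy()  # leave the original DataFrame unchanged
    out[f"{prefix}_label"] = [r["label"] for r in results]
    out[f"{prefix}_score"] = [r["score"] for r in results]
    return out


# Usage with a real pipeline (downloads the default model on first run):
# from transformers import pipeline
# clf = pipeline("sentiment-analysis", device=-1)
# scored = add_model_sentiment(df, lambda texts: clf(texts, truncation=True))
# scored[["Rating", "Sentiment_VADER", "Sentiment_DistilBERT_label", "Sentiment_DistilBERT_score"]]
```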
