# Hugging Face Crash Course

Pre-training followed by fine-tuning is one of the dominant paradigms in natural language processing. Under this paradigm, models are developed in two phases:

1. Pre-train a large language model to develop generalizable language abilities.
2. Add a task-specific layer on top (typically a classifier) and train it on your specific objective.


This notebook is meant to help you get started with Hugging Face and the transformers library. The library is fairly easy to use, but it operates at a high level of abstraction. If you plan to dive deeper into Hugging Face, I recommend brushing up on NLP concepts to make sure you understand what's going on under the hood.

Also note that you will most likely run up against cloud.gov's memory limits when you install these packages and run this notebook.

In [1]:
from transformers import pipeline, AutoModelForSequenceClassification

## Auto Models

Auto Models are generic classes that are instantiated as the architecture matching the pre-trained model you specify. In other words, they give you easy access to BERT, RoBERTa, Llama, and other powerful LLMs through a single interface.
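
To make the idea concrete, the dispatch behind an Auto class can be pictured as a lookup from a checkpoint's declared model type to a concrete class. The sketch below is a toy illustration of that pattern, not the library's actual implementation: `MODEL_REGISTRY`, `ToyAutoModel`, and the stand-in model classes are all hypothetical names.

```python
# Toy illustration of the "Auto class" pattern: pick a concrete class
# from the model type declared in a checkpoint's configuration.
# Everything here is a hypothetical stand-in, not transformers code.

class ToyBertClassifier:
    def __init__(self, num_labels):
        self.num_labels = num_labels


class ToyGPT2Classifier:
    def __init__(self, num_labels):
        self.num_labels = num_labels


# Maps a model type string to the class that implements it.
MODEL_REGISTRY = {
    "bert": ToyBertClassifier,
    "gpt2": ToyGPT2Classifier,
}


class ToyAutoModel:
    @staticmethod
    def from_config(config, num_labels=2):
        """Instantiate whichever class matches config['model_type']."""
        model_type = config["model_type"]
        if model_type not in MODEL_REGISTRY:
            raise ValueError(f"Unsupported model type: {model_type!r}")
        return MODEL_REGISTRY[model_type](num_labels=num_labels)


bert_like = ToyAutoModel.from_config({"model_type": "bert"}, num_labels=2)
gpt_like = ToyAutoModel.from_config({"model_type": "gpt2"}, num_labels=3)
print(type(bert_like).__name__, bert_like.num_labels)
print(type(gpt_like).__name__, gpt_like.num_labels)
```

The point of the pattern is that the calling code stays the same while the returned object changes with the checkpoint, which is what the cells below rely on.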

In [2]:
NUM_LABELS = 2

For demonstration purposes, we're telling the Auto Model class to add a classification layer with two possible labels. `NUM_LABELS` can be set to any number of classes.

In [3]:
bert_model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels = NUM_LABELS)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [4]:
bert_model

BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-12,

In [5]:
distilbert = AutoModelForSequenceClassification.from_pretrained('distilbert/distilbert-base-uncased', num_labels = NUM_LABELS)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [6]:
distilbert

DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
 

In [7]:
gpt = AutoModelForSequenceClassification.from_pretrained('openai-community/gpt2', num_labels = NUM_LABELS)

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at openai-community/gpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [8]:
gpt

GPT2ForSequenceClassification(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (score): Linear(in_features=768, out_features=2, bias=False)
)
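
The three warnings above name the tensors that were freshly initialized for classification. As a rough check of how small these heads are compared with the pre-trained body, the sketch below counts their parameters by hand. It assumes a hidden size of 768 (as printed in the summaries), that BERT's `classifier` is a single linear layer, and that DistilBERT's `pre_classifier` is a 768-to-768 linear layer. The last two are assumptions, since the printed summaries are cut off before the head. The helper `linear_params` is a hypothetical name.

```python
HIDDEN_SIZE = 768  # from the printed model summaries
NUM_LABELS = 2


def linear_params(in_features, out_features, bias=True):
    """Number of trainable parameters in one fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


head_params = {
    # classifier.weight + classifier.bias (assumed single linear layer)
    "bert": linear_params(HIDDEN_SIZE, NUM_LABELS),
    # pre_classifier (assumed 768 -> 768) + classifier
    "distilbert": linear_params(HIDDEN_SIZE, HIDDEN_SIZE)
    + linear_params(HIDDEN_SIZE, NUM_LABELS),
    # score.weight only: the summary shows bias=False
    "gpt2": linear_params(HIDDEN_SIZE, NUM_LABELS, bias=False),
}

for name, count in head_params.items():
    print(f"{name}: {count:,} newly initialized parameters")
```

Under these assumptions, changing `NUM_LABELS` only changes the size of the final layer. These newly initialized parameters are the ones the warning is telling you to train.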

## Pipelines

Pipelines are a quick way to use models for inference. They are helpful when you don't need to modify an off-the-shelf model and just want predictions from it.

In addition to general-purpose LLMs, Hugging Face also hosts many models that have already been fine-tuned for specific tasks (e.g. sentiment classification, masked language modeling, text generation). 

Note that anyone can host a model on Hugging Face. Make sure you know where your model comes from, what kind of data it was trained on, and what its limitations are.

### Sentiment Classification

In [9]:
sentiment = pipeline('text-classification', model = 'distilbert-base-uncased-finetuned-sst-2-english')

In [10]:
sentiment('I love natural language processing')

[{'label': 'POSITIVE', 'score': 0.9998213648796082}]

In [11]:
sentiment('I hate natural language processing')

[{'label': 'NEGATIVE', 'score': 0.9997296929359436}]
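
The `score` in these results is a probability. For a single-label classifier like this one, the pipeline turns the model's raw outputs (logits) into probabilities with a softmax and reports the largest one. Here is a minimal sketch of that post-processing step, using made-up logits and an assumed label mapping. `ID2LABEL`, `softmax`, and `postprocess` are illustrative names, not part of the library.

```python
import math

# Assumed label mapping for a two-class sentiment model.
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(logits, id2label=ID2LABEL):
    """Return the top label and its probability, mirroring the output shape above."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


# Hypothetical logits, chosen only for illustration.
print(postprocess([-4.2, 4.4]))
print(postprocess([4.1, -3.9]))
```

A score close to 1 means the two logits are far apart. It says how confident the model is, not that the prediction is correct.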

### Named Entity Recognition

In [12]:
ner = pipeline('ner', model = 'dbmdz/bert-large-cased-finetuned-conll03-english')

Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [13]:
ner('I live in Washington DC in an apartment')

[{'entity': 'I-LOC',
  'score': 0.99932003,
  'index': 4,
  'word': 'Washington',
  'start': 10,
  'end': 20},
 {'entity': 'I-LOC',
  'score': 0.9992067,
  'index': 5,
  'word': 'DC',
  'start': 21,
  'end': 23}]

In [15]:
ner('President Joe Biden signed the AI Executive Order in October 2023')

[{'entity': 'I-PER',
  'score': 0.99860793,
  'index': 2,
  'word': 'Joe',
  'start': 10,
  'end': 13},
 {'entity': 'I-PER',
  'score': 0.9977203,
  'index': 3,
  'word': 'B',
  'start': 14,
  'end': 15},
 {'entity': 'I-PER',
  'score': 0.9935376,
  'index': 4,
  'word': '##iden',
  'start': 15,
  'end': 19},
 {'entity': 'I-MISC',
  'score': 0.9788887,
  'index': 7,
  'word': 'AI',
  'start': 31,
  'end': 33},
 {'entity': 'I-MISC',
  'score': 0.9170989,
  'index': 8,
  'word': 'Executive',
  'start': 34,
  'end': 43},
 {'entity': 'I-MISC',
  'score': 0.99059474,
  'index': 9,
  'word': 'Order',
  'start': 44,
  'end': 49}]
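
By default the pipeline reports one entry per token, which is why `Biden` is split into the word pieces `B` and `##iden`. The pipeline accepts an `aggregation_strategy` argument (for example `"simple"`) that merges these for you. To show roughly what such merging involves, here is a minimal sketch that groups adjacent tokens of the same entity type using their character offsets. The function `group_entities` and its one-character gap rule are a simplification for illustration, not the library's algorithm.

```python
def group_entities(tokens, text, max_gap=1):
    """Merge adjacent tokens that share an entity type into one span.

    Tokens are merged when the next one starts at most `max_gap`
    characters after the previous one ends (word pieces or a single space).
    """
    groups = []
    for tok in tokens:
        label = tok["entity"].split("-", 1)[-1]  # 'I-PER' -> 'PER'
        last = groups[-1] if groups else None
        if (
            last is not None
            and last["entity_group"] == label
            and tok["start"] - last["end"] <= max_gap
        ):
            last["end"] = tok["end"]
            last["scores"].append(tok["score"])
        else:
            groups.append({
                "entity_group": label,
                "start": tok["start"],
                "end": tok["end"],
                "scores": [tok["score"]],
            })
    return [
        {
            "entity_group": g["entity_group"],
            "word": text[g["start"]:g["end"]],
            "score": sum(g["scores"]) / len(g["scores"]),  # mean token score
        }
        for g in groups
    ]


text = "President Joe Biden signed the AI Executive Order in October 2023"
# Token-level output copied from the cell above (only the fields needed here).
tokens = [
    {"entity": "I-PER", "score": 0.99860793, "start": 10, "end": 13},
    {"entity": "I-PER", "score": 0.9977203, "start": 14, "end": 15},
    {"entity": "I-PER", "score": 0.9935376, "start": 15, "end": 19},
    {"entity": "I-MISC", "score": 0.9788887, "start": 31, "end": 33},
    {"entity": "I-MISC", "score": 0.9170989, "start": 34, "end": 43},
    {"entity": "I-MISC", "score": 0.99059474, "start": 44, "end": 49},
]

for entity in group_entities(tokens, text):
    print(entity["entity_group"], repr(entity["word"]), round(entity["score"], 4))
```

Slicing the original text by offsets, rather than joining the `word` fields, avoids having to strip the `##` prefix from word pieces.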