<a href="https://colab.research.google.com/github/gulabpatel/Anomaly_Detection_Python/blob/main/HuggingFace_Transformer_Pipeline_5.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![Hugging Face](https://miro.medium.com/max/2000/1*Z4mGaMsu34LfyE76QAi9qA.png)


# Using HuggingFace

In this tutorial, we will learn how to use the various functionalities offered by [HuggingFace](https://huggingface.co/).

## About Hugging Face

- HuggingFace is 'On a mission to solve NLP, One commit at a time', as per their tagline. 

- The HuggingFace Transformers library is closing in on 26K stars on GitHub and provides state-of-the-art Transformer-based models, their pretrained weights, and a lot more (as we will see today).

- They recently released their Tokenizers library, which is implemented in Rust, an added advantage for speed.


Help is available [here](https://www.youtube.com/watch?v=B5M_F9dYHOM)

In [1]:
!pip install transformers



# Pipeline

https://github.com/huggingface/transformers#quick-tour-of-pipelines

In [2]:
from transformers import pipeline

## Sentiment Analysis

In [3]:
nlp = pipeline('sentiment-analysis')
nlp('We are very happy to include pipeline into the transformers repository.')

[{'label': 'POSITIVE', 'score': 0.9978193640708923}]
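
The pipeline returns a list with one dictionary per input, each holding a `label` and a `score`. As a small illustration, the sketch below folds the two fields into a single signed number. The helper name `signed_score` is hypothetical, and the code assumes the only labels are `POSITIVE` and `NEGATIVE`, as in the output shown above.

```python
# Result copied from the sentiment-analysis output above.
results = [{'label': 'POSITIVE', 'score': 0.9978193640708923}]

def signed_score(result):
    """Map a pipeline result to [-1, 1]: negative labels flip the sign of the score."""
    # Assumption: labels are either 'POSITIVE' or 'NEGATIVE'.
    return result['score'] if result['label'] == 'POSITIVE' else -result['score']

scores = [signed_score(r) for r in results]
print(scores)
```

A signed score like this can be convenient when sorting or averaging many predictions.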

## Question Answering

In [4]:
nlp = pipeline('question-answering')
nlp({
    'question': 'What is my name ?',
    'context': 'My name is Rachit, I work at HuggingFace'
})

{'answer': 'Rachit', 'end': 17, 'score': 0.9968027472496033, 'start': 11}
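
Judging by the values printed above, `start` and `end` are character offsets into the `context` string, with `end` exclusive. A minimal check that uses only the printed output:

```python
# Values copied from the question-answering cell and its output above.
context = 'My name is Rachit, I work at HuggingFace'
result = {'answer': 'Rachit', 'end': 17, 'score': 0.9968027472496033, 'start': 11}

# Slice the context with the reported offsets (end treated as exclusive).
span = context[result['start']:result['end']]
print(span)
```

The offsets are handy when the answer has to be highlighted in the original text.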

## Predicting Masks

In [5]:
nlp = pipeline('fill-mask')
nlp('I hope you <mask> this video')

[{'score': 0.7073913812637329,
  'sequence': 'I hope you enjoyed this video',
  'token': 3776,
  'token_str': ' enjoyed'},
 {'score': 0.1367352455854416,
  'sequence': 'I hope you enjoy this video',
  'token': 2254,
  'token_str': ' enjoy'},
 {'score': 0.1335318684577942,
  'sequence': 'I hope you liked this video',
  'token': 6640,
  'token_str': ' liked'},
 {'score': 0.005779118277132511,
  'sequence': 'I hope you like this video',
  'token': 101,
  'token_str': ' like'},
 {'score': 0.005615219473838806,
  'sequence': 'I hope you appreciated this video',
  'token': 10874,
  'token_str': ' appreciated'}]
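
Each candidate comes with a `score`, which can be read as the model's probability for that token at the masked position. The sketch below picks the best candidate and adds up the five scores to see how much probability mass the top five cover. Only the fields it needs are copied from the output above.

```python
# Scores and tokens copied from the fill-mask output above.
predictions = [
    {'score': 0.7073913812637329, 'token_str': ' enjoyed'},
    {'score': 0.1367352455854416, 'token_str': ' enjoy'},
    {'score': 0.1335318684577942, 'token_str': ' liked'},
    {'score': 0.005779118277132511, 'token_str': ' like'},
    {'score': 0.005615219473838806, 'token_str': ' appreciated'},
]

# Highest-scoring candidate; strip the leading space that is part of the token.
best = max(predictions, key=lambda p: p['score'])
best_word = best['token_str'].strip()

# Total probability covered by the five candidates returned.
top5_mass = sum(p['score'] for p in predictions)

print(best_word, round(top5_mass, 3))
```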

## Named Entity Recognition (NER)

In [6]:
nlp = pipeline('ner')
nlp('It is me, Rachit, I work at HuggingFace')

[{'end': 12,
  'entity': 'I-PER',
  'index': 5,
  'score': 0.9992461800575256,
  'start': 10,
  'word': 'Ra'},
 {'end': 15,
  'entity': 'I-PER',
  'index': 6,
  'score': 0.988210916519165,
  'start': 12,
  'word': '##chi'},
 {'end': 16,
  'entity': 'I-PER',
  'index': 7,
  'score': 0.991302490234375,
  'start': 15,
  'word': '##t'},
 {'end': 30,
  'entity': 'I-ORG',
  'index': 12,
  'score': 0.9990648031234741,
  'start': 28,
  'word': 'Hu'},
 {'end': 35,
  'entity': 'I-ORG',
  'index': 13,
  'score': 0.9934505820274353,
  'start': 30,
  'word': '##gging'},
 {'end': 36,
  'entity': 'I-ORG',
  'index': 14,
  'score': 0.9981080889701843,
  'start': 35,
  'word': '##F'},
 {'end': 39,
  'entity': 'I-ORG',
  'index': 15,
  'score': 0.9968755841255188,
  'start': 36,
  'word': '##ace'}]
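
The output above is at the word-piece level: `Rachit` is split into `Ra`, `##chi`, `##t`, where the `##` prefix marks a continuation of the previous piece. The sketch below shows one way to stitch such pieces back into whole entities. The helper `merge_word_pieces` is hypothetical (not part of the library), and it assumes that pieces of one word share the same tag and have touching character offsets.

```python
# Token-level results copied from the NER output above (scores omitted for brevity).
tokens = [
    {'entity': 'I-PER', 'word': 'Ra', 'start': 10, 'end': 12},
    {'entity': 'I-PER', 'word': '##chi', 'start': 12, 'end': 15},
    {'entity': 'I-PER', 'word': '##t', 'start': 15, 'end': 16},
    {'entity': 'I-ORG', 'word': 'Hu', 'start': 28, 'end': 30},
    {'entity': 'I-ORG', 'word': '##gging', 'start': 30, 'end': 35},
    {'entity': 'I-ORG', 'word': '##F', 'start': 35, 'end': 36},
    {'entity': 'I-ORG', 'word': '##ace', 'start': 36, 'end': 39},
]

def merge_word_pieces(tokens):
    """Join '##' continuation pieces onto the preceding piece with the same tag."""
    merged = []
    for tok in tokens:
        is_piece = tok['word'].startswith('##')
        text = tok['word'][2:] if is_piece else tok['word']
        previous = merged[-1] if merged else None
        if (is_piece and previous is not None
                and previous['entity'] == tok['entity']
                and previous['end'] == tok['start']):
            # Continuation: extend the previous entity in place.
            previous['word'] += text
            previous['end'] = tok['end']
        else:
            # Start of a new entity.
            merged.append({'entity': tok['entity'], 'word': text,
                           'start': tok['start'], 'end': tok['end']})
    return merged

entities = merge_word_pieces(tokens)
for entity in entities:
    print(entity['entity'], entity['word'], entity['start'], entity['end'])
```

The library can also group sub-tokens itself. The exact argument depends on the installed version, so it is worth checking the pipeline documentation before relying on a manual merge like this one.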

In [7]:
nlp.model.save_pretrained('...')

# Text Generation

In [8]:
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

In [9]:
tokeniser = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')

## Single Word Prediction

In [10]:
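text = "let us see how this turns"
indexed_tokens = tokeniser.encode(text)          # text -> list of token ids
tokens_tensor = torch.tensor([indexed_tokens])   # add a batch dimension: shape [1, num_tokens]
model.eval()                                     # disable dropout for inference

# Use the GPU when one is available, otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokens_tensor = tokens_tensor.to(device)
model.to(device)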

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0): Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
      ... (output truncated)

In [11]:
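with torch.no_grad():              # no gradients are needed for inference
  outputs = model(tokens_tensor)
  predictions = outputs[0]         # logits, shape [batch, sequence_length, vocab_size]

print(predictions.shape)

# Greedy choice: take the highest-scoring token at the last position
predicted_index = torch.argmax(predictions[0, -1, :]).item()
predicted_text = tokeniser.decode(indexed_tokens + [predicted_index])

print(predicted_text)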

torch.Size([1, 6, 50257])
let us see how this turns out
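
The printed shape `[1, 6, 50257]` reads as one sequence in the batch, six input tokens, and one score per entry of the 50257-token vocabulary. Only the scores at the last position matter for predicting the next token. The toy sketch below mirrors that indexing with plain lists. The five-word vocabulary and the numbers are made up for illustration and have nothing to do with GPT-2's real vocabulary.

```python
# Hypothetical 5-token vocabulary and scores, laid out as logits[batch][position][vocab_id].
vocab = ['out', 'up', 'down', 'into', 'around']
logits = [[
    [0.1, 0.3, 0.2, 0.0, 0.4],   # scores after the first input token
    [1.2, 0.1, 0.5, 0.3, 0.2],   # scores after the second input token
    [2.5, 1.0, 0.3, 0.7, 0.1],   # scores after the last input token
]]

# Same idea as torch.argmax(predictions[0, -1, :]): best vocabulary id at the last position.
last_position = logits[0][-1]
predicted_index = max(range(len(last_position)), key=lambda i: last_position[i])

print(predicted_index, vocab[predicted_index])
```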


## Looping for Multi-Word Generation

In [12]:
steps = 0  # number of tokens generated so far
text = "i am very exited to present to you this"
while steps < 50:
  steps += 1
  indexed_tokens = tokeniser.encode(text)
  tokens_tensors = torch.tensor([indexed_tokens])
  tokens_tensors = tokens_tensors.to(model.device)  # keep the input on the same device as the model
  with torch.no_grad():
    outputs = model(tokens_tensors)
    predictions = outputs[0]
  # Greedy decoding: append the most likely next token and repeat
  predicted_index = torch.argmax(predictions[0, -1, :]).item()
  text = tokeniser.decode(indexed_tokens + [predicted_index])

print(text)

i am very exited to present to you this new book. I am very excited to share with you the first chapter of the book, "The Secret of the Soul."


The Secret of the Soul is a book that I have been reading for over a year now. I have been
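
The loop above always appends the single most likely token (greedy decoding), and the generated text already starts to repeat itself ("I have been ... I have been"). The toy sketch below shows how a deterministic argmax loop can get stuck in a cycle. It uses a hypothetical three-word lookup table in place of GPT-2, so the numbers are illustrative only.

```python
# Hypothetical next-token scores for a tiny bigram "model" (not GPT-2).
next_scores = {
    'i':    {'have': 0.6, 'am': 0.4},
    'have': {'been': 0.9, 'a': 0.1},
    'been': {'i': 0.6, 'reading': 0.4},
}

def greedy_generate(start, steps):
    """Repeatedly append the highest-scoring successor of the last token."""
    tokens = [start]
    for _ in range(steps):
        candidates = next_scores[tokens[-1]]
        tokens.append(max(candidates, key=candidates.get))
    return tokens

generated = greedy_generate('i', 6)
print(' '.join(generated))
```

Because the choice at each step depends only on the scores, the same context always leads to the same continuation, which is why sampling-based strategies are often preferred for open-ended text.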


## Or: Using the `run_generation.py` Script

In [13]:
!git clone https://github.com/huggingface/pytorch-transformers.git

fatal: destination path 'pytorch-transformers' already exists and is not an empty directory.


In [14]:
!python pytorch-transformers/examples/text-generation/run_generation.py \
    --model_type=gpt2 \
    --length=100 \
    --model_name_or_path=gpt2

04/03/2021 15:58:14 - INFO - __main__ -   Namespace(device=device(type='cuda'), fp16=False, k=0, length=100, model_name_or_path='gpt2', model_type='gpt2', n_gpu=1, no_cuda=False, num_return_sequences=1, p=0.9, padding_text='', prefix='', prompt='', repetition_penalty=1.0, seed=42, stop_token=None, temperature=1.0, xlm_language='')
Model prompt >>> i am very excited to present to you this
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
=== GENERATED SEQUENCE 1 ===
2021-04-03 15:58:30.511049: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
i am very excited to present to you this blend!

"From the moment you had the opportunity to sample the original Mjuushin no Čve ( A New World Tea), you immediately became hooked!"

This 3oz. glass bottle from Punglish brings a massive candy coating to this beast of a lemonade stand, something that piques your interest. In the past I've tried the unsweetened f
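
The `Namespace(...)` line in the log above shows `p=0.9` and `k=0`, which suggests the script samples from the model with top-p (nucleus) sampling instead of always taking the argmax. The sketch below illustrates the idea on a made-up four-token distribution: keep the smallest set of most likely tokens whose probabilities add up to at least `p`, renormalise, and sample from that set. The function `top_p_filter` and the probabilities are hypothetical and are not taken from the script.

```python
import random

def top_p_filter(probs, p):
    """Keep the most likely tokens until their total mass reaches p, then renormalise."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, total = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        total += prob
        if total >= p:
            break
    return {token: prob / total for token, prob in kept}

# Hypothetical next-token distribution.
probs = {'blend': 0.5, 'book': 0.3, 'video': 0.15, 'zebra': 0.05}
nucleus = top_p_filter(probs, p=0.9)

# Sample one token from the filtered distribution (seeded for repeatability).
rng = random.Random(42)
choice = rng.choices(list(nucleus), weights=list(nucleus.values()), k=1)[0]

print(sorted(nucleus), choice)
```

With `p=0.9` the unlikely `zebra` token is dropped, while the remaining three can all still be chosen, which keeps the output varied without drifting into very improbable words. The "Curious Case of Neural Text Degeneration" paper linked at the end of this notebook covers this sampling strategy.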

# Summarising text using HuggingFace

In [15]:
text = 'Shakespeare occupies a position unique in world literature. Other poets, such as Homer and Dante, and novelists, such as Leo Tolstoy and Charles Dickens, have transcended national barriers, but no writer’s living reputation can compare to that of Shakespeare, whose plays, written in the late 16th and early 17th centuries for a small repertory theatre, are now performed and read more often and in more countries than ever before. The prophecy of his great contemporary, the poet and dramatist Ben Jonson, that Shakespeare “was not of an age, but for all time,” has been fulfilled. It may be audacious even to attempt a definition of his greatness, but it is not so difficult to describe the gifts that enabled him to create imaginative visions of pathos and mirth that, whether read or witnessed in the theatre, fill the mind and linger there. He is a writer of great intellectual rapidity, perceptiveness, and poetic power. Other writers have had these qualities, but with Shakespeare the keenness of mind was applied not to abstruse or remote subjects but to human beings and their complete range of emotions and conflicts. Other writers have applied their keenness of mind in this way, but Shakespeare is astonishingly clever with words and images, so that his mental energy, when applied to intelligible human situations, finds full and memorable expression, convincing and imaginatively stimulating. As if this were not enough, the art form into which his creative energies went was not remote and bookish but involved the vivid stage impersonation of human beings, commanding sympathy and inviting vicarious participation. Thus, Shakespeare’s merits can survive translation into other languages and into cultures remote from that of Elizabethan England.'

In [16]:
from transformers import BartTokenizer, BartForConditionalGeneration

model = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')

In [17]:
inputs = tokenizer.batch_encode_plus([text], max_length=1024, truncation=True, return_tensors='pt')

# Beam search with 4 beams; max_length caps the summary length in tokens
summary_ids = model.generate(inputs['input_ids'], num_beams=4, max_length=100, early_stopping=False)

for ids in summary_ids:
    short = tokenizer.decode(ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)

    print(len(text), len(short))  # character counts of the original and the summary
    print(short)


1761 341
Shakespeare occupies a position unique in world literature. He is a writer of great intellectual rapidity, perceptiveness, and poetic power. Other writers have had these qualities, but with Shakespeare the keenness of mind was applied not to abstruse or remote subjects but to human beings and their complete range of emotions and conflicts.
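
The two numbers printed before the summary are character counts (`len` on Python strings), whereas `max_length=100` in `generate` limits the number of tokens. A quick calculation on the printed values shows how much shorter the summary is:

```python
# Character counts copied from the output above.
original_chars, summary_chars = 1761, 341

# Fraction of the original length that the summary retains.
ratio = summary_chars / original_chars
print(f"{ratio:.1%}")
```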


HuggingFace Transformers: https://github.com/huggingface/transformers

BART: https://arxiv.org/abs/1910.13461

Curious Case of Neural Text Degeneration: https://arxiv.org/abs/1904.09751