# DistilBERT / BERT - huggingface

***Frank Xu***

The DistilBERT model was proposed in the blog post "Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT" and in the paper "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter". DistilBERT is a small, fast, cheap and light Transformer model trained by distilling BERT base. It has 40% fewer parameters than bert-base-uncased and runs 60% faster, while preserving over 95% of BERT’s performance as measured on the GLUE language understanding benchmark. The summarization examples below use DistilBART checkpoints, which are smaller models derived from BART, alongside the full-size BART model.
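To make "trained by distilling" a little more concrete, here is a minimal sketch of a soft-target distillation loss, in which a student model is pushed to match the teacher's temperature-softened output distribution. This is a plain-Python illustration of a common formulation, not the training code used for DistilBERT; the function names, the example logits, and the temperature value are assumptions chosen for illustration.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between the softened teacher and student distributions."""
    teacher_probs = softmax_with_temperature(teacher_logits, temperature)
    student_probs = softmax_with_temperature(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))

# Hypothetical logits over a 3-token vocabulary (illustrative values only).
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]   # nearly agrees with the teacher
far_student = [0.5, 1.0, 4.0]     # ranks the tokens in the opposite order

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)

# The student that agrees with the teacher receives the lower loss.
print(loss_close < loss_far)
```

In real training this term is usually combined with other losses and computed on batches of tensors; the sketch only shows the shape of the objective.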

For specific documentation please refer to: https://huggingface.co/transformers/model_doc/distilbert.html

***Environment requirements:***

***Python*** with the following libraries:

torch

torchvision

transformers (installed from the GitHub source)
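Before running the cells, it can help to confirm that the libraries listed above are importable. The following is a small sketch using only the standard library; the helper name `check_libraries` is hypothetical and not part of any of these packages.

```python
import importlib.util

# Libraries listed in the environment requirements above.
REQUIRED = ["torch", "torchvision", "transformers"]

def check_libraries(names):
    """Return {name: True/False} depending on whether each top-level package can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

status = check_libraries(REQUIRED)
for name, found in status.items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

If transformers is reported as missing, one way to install it from source is `pip install git+https://github.com/huggingface/transformers`; a regular release from PyPI may work just as well for the checkpoints used here.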

In [1]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, BartTokenizer, BartForConditionalGeneration

ARTICLE_TO_SUMMARIZE = "The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct."

**Model 1: DistilBART trained on the cnn_dailymail dataset**

Model info: https://huggingface.co/sshleifer/distilbart-cnn-12-6

In [2]:
tokenizer1 = AutoTokenizer.from_pretrained("sshleifer/distilbart-cnn-12-6")  
model1 = AutoModelForSeq2SeqLM.from_pretrained("sshleifer/distilbart-cnn-12-6")

In [3]:
# Tokenize the article into PyTorch tensors
inputs = tokenizer1([ARTICLE_TO_SUMMARIZE], return_tensors='pt')
# Generate the summary token ids, then decode them back into text
summary1 = model1.generate(inputs['input_ids'], early_stopping=True)
print([tokenizer1.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in summary1])

[' The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building . It was the first structure to reach a height of 300 metres . Excluding transmitters, it is the second tallest free-standing structure in France after the Millau Viaduct .']
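The tokenize, generate, and decode steps in the cell above are the same for every checkpoint in this notebook, so they can be wrapped in one function. This is a minimal sketch; `summarize` is a hypothetical helper name, and it assumes the tokenizer and model behave like the Hugging Face objects loaded above.

```python
def summarize(text, tokenizer, model, **generate_kwargs):
    """Tokenize `text`, generate summary token ids, and decode them to strings.

    `tokenizer` must be callable and provide `decode`; `model` must provide
    `generate`. Extra keyword arguments are forwarded to `model.generate`.
    """
    inputs = tokenizer([text], return_tensors="pt")
    summary_ids = model.generate(inputs["input_ids"], **generate_kwargs)
    return [
        tokenizer.decode(ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)
        for ids in summary_ids
    ]

# Example call, assuming the notebook variables are in scope:
# summarize(ARTICLE_TO_SUMMARIZE, tokenizer1, model1, early_stopping=True)
```

Passing the tokenizer and model in as arguments keeps the helper independent of any one checkpoint, so the same call can be reused with the other tokenizer and model pairs.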


**Model 2: DistilBART trained on the xsum dataset** (about 610 MB)

Model info: https://huggingface.co/sshleifer/distilbart-xsum-12-6

In [4]:
tokenizer2 = AutoTokenizer.from_pretrained("sshleifer/distilbart-xsum-12-6")  
model2 = AutoModelForSeq2SeqLM.from_pretrained("sshleifer/distilbart-xsum-12-6")

In [5]:
inputs = tokenizer2([ARTICLE_TO_SUMMARIZE], return_tensors='pt')
summary2 = model2.generate(inputs['input_ids'], early_stopping=True)
print([tokenizer2.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in summary2])

[' The Eiffel Tower in Paris has officially opened its doors to the public.']


**Model 3: Facebook's BART trained on the cnn_dailymail dataset** (about 1.63 GB)

Model info: https://huggingface.co/facebook/bart-large-cnn
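The download sizes quoted for models 2 and 3 (about 610 MB and about 1.63 GB) can be turned into a rough parameter count by dividing by the bytes used per parameter. The sketch below is back-of-the-envelope arithmetic only: the notebook does not state which dtype each checkpoint was saved in, so both 4-byte and 2-byte storage are shown as assumptions, decimal units are assumed, and file overhead is ignored.

```python
def estimate_parameters(file_size_bytes, bytes_per_parameter):
    """Rough parameter count implied by a checkpoint size and a storage width."""
    return file_size_bytes / bytes_per_parameter

MB = 10**6  # decimal units are an assumption
GB = 10**9

sizes = {
    "distilbart-xsum-12-6": 610 * MB,  # size quoted in this notebook
    "bart-large-cnn": 1.63 * GB,       # size quoted in this notebook
}

for name, size in sizes.items():
    # The storage dtype is an assumption; both common widths are shown.
    for dtype, width in (("float32", 4), ("float16", 2)):
        millions = estimate_parameters(size, width) / 1e6
        print(f"{name} as {dtype}: ~{millions:.0f}M parameters")
```

For an exact figure, `sum(p.numel() for p in model.parameters())` on a loaded PyTorch model counts the parameters directly.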

In [6]:
tokenizer3 = BartTokenizer.from_pretrained('facebook/bart-large-cnn')
model3 = BartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')

In [7]:
inputs = tokenizer3([ARTICLE_TO_SUMMARIZE], return_tensors='pt')
summary3 = model3.generate(inputs['input_ids'], early_stopping=True)
print([tokenizer3.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in summary3])

['The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world.']
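Comparing the three outputs, the cnn_dailymail models mostly reuse sentences from the article, while the xsum model produces a new sentence that states something the article does not say. One rough way to quantify this is the fraction of a summary's word bigrams that never occur in the source text. The sketch below is a simple illustration; the function names are hypothetical, and the short `source` excerpt stands in for the full article.

```python
import re

def word_ngrams(text, n=2):
    """Lowercased word n-grams of `text` as a set of tuples."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def novel_ngram_fraction(summary, source, n=2):
    """Fraction of the summary's n-grams that never occur in the source."""
    summary_ngrams = word_ngrams(summary, n)
    if not summary_ngrams:
        return 0.0
    novel = summary_ngrams - word_ngrams(source, n)
    return len(novel) / len(summary_ngrams)

# Two sentences taken from the article above, used as a short stand-in source.
source = (
    "The tower is 324 metres (1,063 ft) tall, about the same height as an "
    "81-storey building, and the tallest structure in Paris. Its base is "
    "square, measuring 125 metres (410 ft) on each side."
)
copied = "Its base is square, measuring 125 metres (410 ft) on each side."
rewritten = "The Eiffel Tower in Paris has officially opened its doors to the public."

print(novel_ngram_fraction(copied, source))     # copied text has no novel bigrams
print(novel_ngram_fraction(rewritten, source))  # mostly novel bigrams
```

With the notebook variables in scope, `novel_ngram_fraction(text, ARTICLE_TO_SUMMARIZE)` can be applied to each decoded summary. A high value signals that the summary is worth checking against the article for unsupported claims, though it does not by itself show that the summary is wrong.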
