# Using pretrained models (PyTorch)

This notebook is explained in chapter 4, section 2 of the Hugging Face course: [Using pretrained models](https://huggingface.co/course/chapter4/2?fw=pt)

The original code for this notebook is in the Hugging Face notebooks repository, linked here through SageMaker Studio Lab: [section2_pt.ipynb](https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter4/section2_pt.ipynb)

## Run conditions

This notebook has been tested in the following environment:
- Environment: Project created in [Paperspace Gradient](https://gradient.paperspace.com) with Python 3.9.13.
- Machine: P5000 (30 GiB RAM, 8 CPUs, 16 GiB GPU); more details on [Paperspace Machines](https://docs.paperspace.com/gradient/machines/).
- IDE: Visual Studio Code using remote Jupyter server.

## Install dependencies

Install the Transformers, Datasets, and Evaluate libraries to run this notebook.

In [2]:
# Install the libraries datasets v2.7.1, evaluate v0.3.0, and transformers v4.25.1 with quiet and upgrade flags.
%pip install -q datasets==2.7.1 evaluate==0.3.0 transformers==4.25.1 --upgrade

Note: you may need to restart the kernel to use updated packages.


## Using the pipeline() function

In [3]:
# Import pipeline from Transformers
from transformers import pipeline

# Create a fill-mask pipeline backed by the CamemBERT base model.
fill_mask = pipeline(
    "fill-mask",
    model="camembert-base"
)
# Predict the most likely tokens for the <mask> position in a sentence.
fill_mask("Le camembert est un fromage <mask>.")




[{'score': 0.1250426024198532,
  'token': 4364,
  'token_str': 'suisse',
  'sequence': 'Le camembert est un fromage suisse.'},
 {'score': 0.08812045305967331,
  'token': 430,
  'token_str': 'français',
  'sequence': 'Le camembert est un fromage français.'},
 {'score': 0.06494113802909851,
  'token': 5060,
  'token_str': 'traditionnel',
  'sequence': 'Le camembert est un fromage traditionnel.'},
 {'score': 0.037365105003118515,
  'token': 19456,
  'token_str': 'fermier',
  'sequence': 'Le camembert est un fromage fermier.'},
 {'score': 0.035099998116493225,
  'token': 875,
  'token_str': 'blanc',
  'sequence': 'Le camembert est un fromage blanc.'}]
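
The pipeline returns a plain list of dictionaries, so ordinary Python is enough to post-process it. The sketch below copies the output shown above (scores rounded to four decimals) and picks the best candidate, then keeps only candidates above a cutoff. The name `min_score` and its value of 0.05 are arbitrary choices for illustration, not part of the Transformers API.

```python
# Output of the fill-mask call above, copied by hand with rounded scores.
results = [
    {"score": 0.1250, "token": 4364, "token_str": "suisse",
     "sequence": "Le camembert est un fromage suisse."},
    {"score": 0.0881, "token": 430, "token_str": "français",
     "sequence": "Le camembert est un fromage français."},
    {"score": 0.0649, "token": 5060, "token_str": "traditionnel",
     "sequence": "Le camembert est un fromage traditionnel."},
    {"score": 0.0374, "token": 19456, "token_str": "fermier",
     "sequence": "Le camembert est un fromage fermier."},
    {"score": 0.0351, "token": 875, "token_str": "blanc",
     "sequence": "Le camembert est un fromage blanc."},
]

# Select the candidate with the highest score.
best = max(results, key=lambda r: r["score"])
print(best["sequence"])

# Keep only candidates at or above an arbitrary, illustrative cutoff.
min_score = 0.05  # hypothetical threshold, chosen for this example only
confident = [r["token_str"] for r in results if r["score"] >= min_score]
print(confident)
```

In a live notebook, `results` would simply be the return value of `fill_mask(...)` rather than a hand-copied list.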

In [4]:
# Import CamembertTokenizer and CamembertForMaskedLM from Transformers.
from transformers import CamembertTokenizer, CamembertForMaskedLM

# Load the tokenizer and the model.
tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
model = CamembertForMaskedLM.from_pretrained("camembert-base")
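
Loading the tokenizer and model directly leaves the post-processing to you. As a rough sketch of how fill-mask scores are typically derived, the snippet below applies a softmax to a vector of logits and keeps the top-k entries. The five-token vocabulary and the logit values are made up for illustration and were not produced by `camembert-base`; with the real model, the vector would be the row of `model(**inputs).logits` at the `<mask>` position, and the helper name `top_k_predictions` is hypothetical.

```python
import math


def top_k_predictions(logits, id_to_token, k=3):
    """Apply a softmax to `logits` and return the k most probable tokens."""
    # Subtract the maximum logit before exponentiating for numerical stability.
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token ids by probability, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [
        {"score": probs[i], "token": i, "token_str": id_to_token[i]}
        for i in ranked[:k]
    ]


# Toy data (assumption): a five-token vocabulary with invented logits.
toy_vocab = ["suisse", "français", "traditionnel", "fermier", "blanc"]
toy_logits = [2.0, 1.6, 1.3, 0.8, 0.7]

predictions = top_k_predictions(toy_logits, toy_vocab, k=3)
for p in predictions:
    print(f"{p['token_str']}: {p['score']:.3f}")
```

The returned dictionaries mirror the `score`, `token`, and `token_str` keys seen in the pipeline output, although the toy token ids here are list positions rather than real vocabulary ids.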