<a href="https://colab.research.google.com/github/rahiakela/huggingface-transformers-practice/blob/main/huggingface-course/04-sharing-model-and-tokenizers/1_using_pretrained_models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Using pretrained models

The Model Hub makes it simple to select an appropriate model and to use it from any downstream library in a few lines of code. Let’s take a look at how to actually use one of these models, and how to contribute back to the community.

Let’s say we’re looking for a French-language model that can perform mask filling.


<img src='https://huggingface.co/course/static/chapter4/camembert.gif?raw=1' width='800'/>

Install the Transformers and Datasets libraries to run this notebook.

In [None]:
!pip -q install datasets transformers[sentencepiece]

We select the `camembert-base` checkpoint to try it out. The identifier `camembert-base` is all we need to start using it! As you’ve seen in previous chapters, we can instantiate it using a `pipeline`:

In [4]:
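from transformers import pipeline

# Build a fill-mask pipeline from the checkpoint identifier alone
camembert_fill_mask = pipeline("fill-mask", model="camembert-base")
# <mask> marks the position the model should fill in
results = camembert_fill_mask("Le camembert est <mask> :)")

results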

[{'score': 0.49091005325317383,
  'sequence': 'Le camembert est délicieux :)',
  'token': 7200,
  'token_str': 'délicieux'},
 {'score': 0.1055697426199913,
  'sequence': 'Le camembert est excellent :)',
  'token': 2183,
  'token_str': 'excellent'},
 {'score': 0.03453313186764717,
  'sequence': 'Le camembert est succulent :)',
  'token': 26202,
  'token_str': 'succulent'},
 {'score': 0.0330314114689827,
  'sequence': 'Le camembert est meilleur :)',
  'token': 528,
  'token_str': 'meilleur'},
 {'score': 0.03007650189101696,
  'sequence': 'Le camembert est parfait :)',
  'token': 1654,
  'token_str': 'parfait'}]
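
The pipeline returns a plain list of dictionaries, so it can be post-processed with ordinary Python. The following is a minimal sketch that hard-codes the first three predictions from the output above so it runs without downloading the model. The helper names `best_prediction` and `confident_tokens`, and the `0.1` threshold, are illustrative assumptions and not part of the Transformers API.

```python
# First three predictions copied from the pipeline output above.
results = [
    {"score": 0.49091005325317383, "sequence": "Le camembert est délicieux :)",
     "token": 7200, "token_str": "délicieux"},
    {"score": 0.1055697426199913, "sequence": "Le camembert est excellent :)",
     "token": 2183, "token_str": "excellent"},
    {"score": 0.03453313186764717, "sequence": "Le camembert est succulent :)",
     "token": 26202, "token_str": "succulent"},
]


def best_prediction(predictions):
    """Return the prediction dictionary with the highest score."""
    return max(predictions, key=lambda p: p["score"])


def confident_tokens(predictions, threshold=0.1):
    """Return the predicted tokens whose score is at least `threshold` (hypothetical cutoff)."""
    return [p["token_str"] for p in predictions if p["score"] >= threshold]


best = best_prediction(results)
print(best["token_str"], round(best["score"], 3))  # délicieux 0.491
print(confident_tokens(results))  # ['délicieux', 'excellent']
```

In a live notebook, the hard-coded list can be replaced by the `results` variable returned by the pipeline call.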

As you can see, loading a model within a `pipeline` is extremely simple. The only thing you need to watch out for is that the chosen checkpoint is suitable for the task it’s going to be used for. 

For example, here we are loading the `camembert-base` checkpoint in the `fill-mask` pipeline, which is completely fine. But if we were to load this checkpoint in the `text-classification` pipeline, the results would not make any sense, because `camembert-base` was trained with a masked language modeling head and has no trained classification head for this task! We recommend using the task selector in the Hugging Face Hub interface in order to select the appropriate checkpoints:

<img src='https://huggingface.co/course/static/chapter4/tasks.png?raw=1' width='800'/>
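
The same check can be approximated from code. The sketch below assumes the `huggingface_hub` package is installed (it is a dependency of Transformers) and that the checkpoint's repository declares a task on the Hub, which `model_info` exposes as `pipeline_tag`. The helper names `task_matches` and `checkpoint_supports` are hypothetical and exist only for illustration.

```python
def task_matches(pipeline_tag, task):
    """Return True when the task declared on the Hub equals the task we plan to run.

    `pipeline_tag` may be None when a repository declares no task; that case
    is treated as a mismatch.
    """
    return pipeline_tag is not None and pipeline_tag == task


def checkpoint_supports(checkpoint, task):
    """Look up the checkpoint's declared task on the Hub (network call) and compare it."""
    # Imported lazily so that task_matches stays usable without the library installed.
    from huggingface_hub import model_info

    info = model_info(checkpoint)
    return task_matches(info.pipeline_tag, task)
```

Calling `checkpoint_supports("camembert-base", "fill-mask")` queries the Hub and is expected to return `True` as long as the model's metadata lists `fill-mask` as its task. This only compares metadata, so it complements reading the model card and does not replace it.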

You can also instantiate the checkpoint using the model architecture directly:

In [5]:
from transformers import CamembertTokenizer, CamembertForMaskedLM

tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
model = CamembertForMaskedLM.from_pretrained("camembert-base")

However, we recommend using the `Auto*` classes instead, as these are by design architecture-agnostic. While the previous code sample limits users to checkpoints loadable in the CamemBERT architecture, using the `Auto*` classes makes switching checkpoints simple:

In [6]:
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("camembert-base")
model = AutoModelForMaskedLM.from_pretrained("camembert-base")
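
To illustrate why the `Auto*` classes make switching checkpoints simple, here is a rough sketch of mask filling done by hand instead of through a `pipeline`. It assumes PyTorch is installed. The function names `build_prompt` and `fill_mask_manually`, and the `{mask}` placeholder convention, are illustrative choices and not part of the library. The placeholder is swapped for `tokenizer.mask_token`, so the same template works for checkpoints whose mask token is not `<mask>`.

```python
def build_prompt(template, mask_token):
    """Replace the `{mask}` placeholder with the checkpoint's own mask token."""
    return template.format(mask=mask_token)


def fill_mask_manually(template, checkpoint="camembert-base", top_k=5):
    """Return the `top_k` (token, probability) pairs for the first masked position.

    Downloads the checkpoint on first use, so this needs network access.
    """
    # Imported lazily so build_prompt can be used without these libraries.
    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForMaskedLM.from_pretrained(checkpoint)

    text = build_prompt(template, tokenizer.mask_token)
    inputs = tokenizer(text, return_tensors="pt")

    # Inference only, so gradient tracking is not needed.
    with torch.no_grad():
        logits = model(**inputs).logits

    # Locate the first mask token in the (single) input sequence.
    is_mask = inputs["input_ids"][0] == tokenizer.mask_token_id
    mask_position = int(is_mask.nonzero(as_tuple=True)[0][0])

    # Turn the vocabulary logits at that position into probabilities.
    probabilities = logits[0, mask_position].softmax(dim=-1)
    top = torch.topk(probabilities, top_k)

    return [
        (tokenizer.decode([token_id]), score)
        for score, token_id in zip(top.values.tolist(), top.indices.tolist())
    ]
```

For example, `fill_mask_manually("Le camembert est {mask} :)")` should produce predictions comparable to the pipeline output shown earlier, and passing a different masked language modeling checkpoint as `checkpoint` requires no other code change.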

> When using a pretrained model, make sure to check how it was trained, on which datasets, and what its limitations and biases are. All of this information should be indicated on its model card.