# Auto Class

* [Auto Classes](https://huggingface.co/docs/transformers/model_doc/auto) 

> In many cases, the architecture you want to use can be guessed from the name or the path of the pretrained model you are supplying to the from_pretrained() method. Instantiating one of AutoConfig, AutoModel, and AutoTokenizer will directly create a class of the relevant architecture. For instance

NOTE: ```AutoModel``` is the PyTorch class, because Hugging Face started its implementation with PyTorch and added TensorFlow support later.

* [How to load a pretrained TF model using AutoModel? #2773](https://github.com/huggingface/transformers/issues/2773)

> It seems that ```AutoModel``` defaultly loads the pretrained PyTorch models, but how can I use it to load a pretrained TF model?  
> you should use ```TFAutoModel``` instead

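```
from transformers import AutoModel, TFAutoModel

# For PyTorch (requires torch to be installed)
model = AutoModel.from_pretrained("bert-base-cased")

# For TensorFlow (requires tensorflow to be installed)
model = TFAutoModel.from_pretrained("bert-base-cased")
```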
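To make the idea of "guessing the architecture from the name" concrete, here is a toy sketch of name-based dispatch. It is not the transformers implementation: `ToyAutoModel`, `ToyBert`, `ToyGPT2`, and `TOY_REGISTRY` are hypothetical names invented for illustration, and the real library may rely on more than the name (for example, the model's configuration).

```python
# Toy sketch of name-based dispatch; NOT the transformers implementation.
from dataclasses import dataclass


@dataclass
class ToyBert:
    name: str


@dataclass
class ToyGPT2:
    name: str


# Hypothetical registry: architecture keyword -> concrete class.
TOY_REGISTRY = {"bert": ToyBert, "gpt2": ToyGPT2}


class ToyAutoModel:
    @classmethod
    def from_pretrained(cls, name: str):
        # Pick the first architecture whose keyword appears in the name.
        for keyword, model_cls in TOY_REGISTRY.items():
            if keyword in name.lower():
                return model_cls(name)
        raise ValueError(f"Cannot infer an architecture from {name!r}")


model = ToyAutoModel.from_pretrained("bert-base-cased")
print(type(model).__name__)  # ToyBert
```

The caller never names the concrete class; the factory method picks it, which is the convenience the Auto classes provide.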


## Auto Tokenizer

You can use ```AutoTokenizer``` for **both** PyTorch and TensorFlow. Prefer the **fast** (Rust-backed) tokenizers over the pure-Python ones wherever a model provides one.


* [AutoTokenizer](https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoTokenizer)

> This is a generic tokenizer class that will be instantiated as one of the tokenizer classes of the library when created with the AutoTokenizer.from_pretrained() class method.

* [AutoTokenizer.from_pretrained](https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoTokenizer.from_pretrained)

> ```
> from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
> ```



In [1]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    pretrained_model_name_or_path='bigscience/bloom-560m', 
    use_fast=True
)


# Configurations

## call

The [```__call__```](https://huggingface.co/transformers/internal/tokenization_utils.html#transformers.tokenization_utils_base.PreTrainedTokenizerBase.__call__) method, invoked as ```tokenizer(input)```, generates the ```token ids``` and ```attention masks``` to feed into the model.

The attention mask holds one flag per token: 1 tells the model to attend to the token, 0 tells it to ignore the token (for example, padding).
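The flags matter once sequences of different lengths are padded to a common length in a batch. Below is a minimal pure-Python sketch of that idea. It is not the tokenizer's own code: `pad_batch` is a hypothetical helper, and both the pad id of 0 and the right-side padding are assumptions made for the example.

```python
def pad_batch(batch, pad_id=0):
    """Right-pad every id list to the longest one and build matching masks."""
    max_len = max(len(ids) for ids in batch)
    input_ids, attention_mask = [], []
    for ids in batch:
        n_pad = max_len - len(ids)
        # Real tokens keep their ids; padding positions get pad_id.
        input_ids.append(ids + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


padded = pad_batch([[36, 31346, 13502], [36, 17]])
print(padded["input_ids"])       # [[36, 31346, 13502], [36, 17, 0]]
print(padded["attention_mask"])  # [[1, 1, 1], [1, 1, 0]]
```

With a real tokenizer, passing a list of sentences together with `padding=True` produces padded ids and masks of this shape.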




In [2]:
tokenized = tokenizer("A tokenizer is in charge of preparing the inputs for a model.")
print(type(tokenized))
print(tokenized)

<class 'transformers.tokenization_utils_base.BatchEncoding'>
{'input_ids': [36, 31346, 13502, 632, 361, 17817, 461, 105814, 368, 59499, 613, 267, 5550, 17], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}


## encode

The [encode](https://huggingface.co/transformers/internal/tokenization_utils.html#transformers.tokenization_utils_base.PreTrainedTokenizerBase.encode) method returns only the ```token ids```, without the attention mask.

In [3]:
ids = tokenizer.encode("A tokenizer is in charge of preparing the inputs for a model.")
print(ids)

[36, 31346, 13502, 632, 361, 17817, 461, 105814, 368, 59499, 613, 267, 5550, 17]


## decode

The [decode](https://huggingface.co/transformers/internal/tokenization_utils.html#transformers.tokenization_utils_base.PreTrainedTokenizerBase.decode) method converts the ```ids``` back into a string.

In [4]:
tokenizer.decode(ids)

'A tokenizer is in charge of preparing the inputs for a model.'
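The encode/decode round trip can be illustrated with a toy whitespace tokenizer. This is only a sketch under simplifying assumptions: `ToyTokenizer` is a hypothetical class with a vocabulary built from a single sentence, whereas real tokenizers such as BLOOM's work on subword units and a fixed pretrained vocabulary.

```python
# Toy whitespace tokenizer; real tokenizers use subword units.
class ToyTokenizer:
    def __init__(self, corpus):
        # Assign one integer id per distinct word, in sorted order.
        words = sorted(set(corpus.split()))
        self.token_to_id = {w: i for i, w in enumerate(words)}
        self.id_to_token = {i: w for w, i in self.token_to_id.items()}

    def encode(self, text):
        # Map each word to its id (raises KeyError on unknown words).
        return [self.token_to_id[w] for w in text.split()]

    def decode(self, ids):
        # Map ids back to words and rejoin them with single spaces.
        return " ".join(self.id_to_token[i] for i in ids)


text = "a tokenizer prepares the inputs for a model"
toy = ToyTokenizer(text)
toy_ids = toy.encode(text)
print(toy_ids)                      # [0, 6, 4, 5, 2, 1, 0, 3]
print(toy.decode(toy_ids) == text)  # True
```

The two lookup tables are inverses of each other, which is why decoding the encoded ids recovers the original string.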