# Different Precision

<div class="alert alert-info">

This tutorial is available as an IPython notebook at [Malaya/example/different-precision](https://github.com/huseinzol05/Malaya/tree/master/example/different-precision).
    
</div>

Read more about half-precision weights at https://huggingface.co/docs/diffusers/optimization/fp16#half-precision-weights

In [1]:
%%time

import malaya
import logging
logging.basicConfig(level = logging.INFO)

CPU times: user 2.88 s, sys: 3.46 s, total: 6.34 s
Wall time: 2.21 s




In [2]:
import torch

In [3]:
# https://discuss.pytorch.org/t/finding-model-size/130275

def get_model_size_mb(model):
    """Return the size in MB of the parameters and buffers of `model.model`."""
    # Each tensor occupies (number of elements) x (bytes per element).
    param_size = 0
    for param in model.model.parameters():
        param_size += param.nelement() * param.element_size()
    buffer_size = 0
    for buffer in model.model.buffers():
        buffer_size += buffer.nelement() * buffer.element_size()
    # Convert bytes to mebibytes.
    return (param_size + buffer_size) / 1024**2
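The helper above sums `nelement() * element_size()` over every parameter and buffer, then divides by 1024² to report mebibytes. The same arithmetic can be sketched without PyTorch. The `size_mb` helper and the tensor shapes below are hypothetical and only illustrate the formula; they are not part of Malaya.

```python
from math import prod

def size_mb(tensors):
    """Sum element_count * bytes_per_element over (shape, itemsize) pairs, in MiB."""
    total_bytes = sum(prod(shape) * itemsize for shape, itemsize in tensors)
    return total_bytes / 1024**2

# Hypothetical tensors: one 512 x 2048 weight matrix and one 512-element vector.
fp32 = [((512, 2048), 4), ((512,), 4)]  # 4 bytes per FP32 element
fp16 = [((512, 2048), 2), ((512,), 2)]  # 2 bytes per FP16 element

print(size_mb(fp32))
print(size_mb(fp16))
```

Halving the bytes per element halves the reported size, which is the effect the sections below measure on a real model.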

### Load default precision, FP32

In [5]:
model = malaya.translation.huggingface(model = 'mesolitica/translation-t5-small-standard-bahasa-cased')

Loading the tokenizer from the `special_tokens_map.json` and the `added_tokens.json` will be removed in `transformers 5`,  it is kept for forward compatibility, but it is recommended to update your `tokenizer_config.json` by uploading it again. You will see the new `added_tokens_decoder` attribute that will store the relevant information.
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. If you see this, DO NOT PANIC! This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [6]:
get_model_size_mb(model)

230.765625
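As a sanity check, the reported size can be converted back into an element count. This is a minimal sketch that assumes every parameter and buffer is stored as FP32 (4 bytes per element); the variable names are illustrative.

```python
reported_mb = 230.765625  # value returned by get_model_size_mb above
bytes_per_fp32 = 4        # assumption: all tensors are FP32

total_bytes = reported_mb * 1024**2
elements = total_bytes / bytes_per_fp32

print(f"{int(elements):,} elements")
```

Under that assumption the model holds roughly 60 million elements, which is in line with a T5-small-sized checkpoint.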

In [7]:
model.generate(['i like chicken'])



['Saya suka ayam']

### Load FP16

**Only works on GPU**.

In [9]:
model = malaya.translation.huggingface(model = 'mesolitica/translation-t5-small-standard-bahasa-cased',
                                            torch_dtype=torch.float16)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [10]:
get_model_size_mb(model)

139.3828125
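The FP16 figure is more than half of the FP32 figure (half would be about 115.4 MB), and the tutorial does not explain the gap. One plausible reading, stated here as an assumption, is that Transformers keeps the T5 feed-forward output projections (`wo`) in FP32 when loading in half precision. The sketch below also assumes the usual T5-small dimensions (`d_model = 512`, `d_ff = 2048`, 6 encoder and 6 decoder blocks), which were not read from this checkpoint.

```python
MB = 1024**2

# Element count derived from the FP32 size reported earlier (4 bytes per element).
total_elements = int(230.765625 * MB / 4)

# Assumption: 12 feed-forward output projections (`wo`), each 512 x 2048, stay in FP32.
kept_fp32 = 12 * 512 * 2048
cast_fp16 = total_elements - kept_fp32

estimated_mb = (cast_fp16 * 2 + kept_fp32 * 4) / MB
print(estimated_mb)
```

If the assumptions hold, the estimate lines up with the size reported above; to confirm it on a real model, inspect the `dtype` of each parameter directly.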

### Load INT8

Requires the latest versions of `accelerate` and `bitsandbytes`:

```bash
pip3 install accelerate bitsandbytes
```

**Only works on GPU**.

In [12]:
model = malaya.translation.huggingface(model = 'mesolitica/translation-t5-small-standard-bahasa-cased',
                                            load_in_8bit=True, device_map='auto')

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [13]:
get_model_size_mb(model)

109.3828125

In [14]:
model.generate(['i like chicken'])

['Saya suka ayam']

### Load INT4

Requires the latest versions of `accelerate` and `bitsandbytes`:

```bash
pip3 install accelerate bitsandbytes
```

**Only works on GPU**.

In [15]:
model = malaya.translation.huggingface(model = 'mesolitica/translation-t5-small-standard-bahasa-cased',
                                            load_in_4bit=True, device_map='auto')

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [16]:
get_model_size_mb(model)

94.3828125

In [17]:
model.generate(['i like chicken'])

['Saya suka ayam']
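To compare the four precisions side by side, the sizes reported in this tutorial can be expressed relative to the FP32 baseline. This is a small sketch that only reuses the numbers printed above; the dictionary and variable names are illustrative.

```python
# Sizes in MB as reported by get_model_size_mb in this tutorial.
sizes_mb = {
    "FP32": 230.765625,
    "FP16": 139.3828125,
    "INT8": 109.3828125,
    "INT4": 94.3828125,
}

baseline = sizes_mb["FP32"]
for name, mb in sizes_mb.items():
    saving = 1 - mb / baseline  # fraction of memory saved versus FP32
    print(f"{name}: {mb:9.2f} MB, {saving:.1%} smaller than FP32")
```

These figures cover only parameter and buffer storage for this one model; they say nothing about translation quality or speed at each precision, which need to be measured separately.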