# bf16, fp16 or fp32 pretraining detection

The goal is to autodetect whether a model was pretrained in bf16, fp16 or fp32 precision. This matters because bf16-pretrained models tend to overflow when subsequently fine-tuned in fp16 (mixed precision).

We know that fp16's largest finite value is 65504 (just under `2**16=65536`), so it should be easy to look at the weights: if they are large, the model has most likely been trained in something other than fp16 precision (mixed or not).
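
To make the range difference concrete, here is a minimal sketch that derives the largest finite value of each format from its exponent and mantissa widths. The helper name `max_finite` and the `FORMATS` table are hypothetical, introduced only for illustration; the bit widths are the standard ones for fp16, bf16 and fp32.

```python
def max_finite(exp_bits, mant_bits):
    """Largest finite value of an IEEE-style binary float with the given field widths."""
    bias = 2 ** (exp_bits - 1) - 1
    # the all-ones exponent code is reserved for inf/NaN,
    # so the largest usable exponent code is 2**exp_bits - 2
    max_exp = (2 ** exp_bits - 2) - bias
    return (2 - 2.0 ** -mant_bits) * 2.0 ** max_exp

# (exponent bits, mantissa bits) for each format; hypothetical lookup table
FORMATS = {"fp16": (5, 10), "bf16": (8, 7), "fp32": (8, 23)}

for name, (exp_bits, mant_bits) in FORMATS.items():
    print(f"{name} | {max_finite(exp_bits, mant_bits):.4e}")
# fp16 | 6.5504e+04
# bf16 | 3.3895e+38
# fp32 | 3.4028e+38
```

bf16 trades mantissa bits for the same exponent width as fp32, which is why its range is so much wider than fp16's.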

Let's write a script to look at the absolute min/max values of any model's weights.

In [1]:
import torch
import logging
import transformers

In [2]:
from transformers import AutoModel

## Module weights abs min/max analyser

In [3]:
def abs_min_max(modules, verbose=True):
    """
    modules is a list of sub-modules to search recursively.

    This can be the whole model, but sometimes you only want to inspect some submodules.
    """
    if verbose:
        print("\nSearching:")
        print("module | params")
    # start from +inf so that any real weight replaces the initial minimum
    abs_min, abs_max = float("inf"), 0.0
    for i, m in enumerate(modules):
        j = -1  # stays -1 if the module has no parameters
        for j, p in enumerate(m.parameters(recurse=True)):
            # detach: we only read values, no autograd graph is needed
            p_abs = p.detach().abs()
            p_abs_max = p_abs.max().item()
            p_abs_min = p_abs.min().item()
            if p_abs_min < abs_min: abs_min = p_abs_min
            if p_abs_max > abs_max: abs_max = p_abs_max
        if verbose:
            # j is the index of the last parameter visited in this module
            print(f"{i:>6} | {j}")
    return abs_min, abs_max

The only concern I have here is that some models trained in mixed precision may have some segments trained in fp32, and those segments could end up with larger weights. This seems very unlikely, since they still have to interact with the rest of the system, but more thought is needed.

In [11]:
# Let's look at t5-small in verbose mode
model = AutoModel.from_pretrained("t5-small")

# let's look at just transformer blocks
abs_min, abs_max = abs_min_max([model.encoder.block, model.decoder.block])
print("\nResults:")
print("abs min   | abs max")
print(f"{abs_min:.3e} | {abs_max:.3e}")

# now the whole model
abs_min, abs_max = abs_min_max([model])
print("\nResults:")
print("abs min   | abs max")
print(f"{abs_min:.3e} | {abs_max:.3e}")

del model


Searching:
module | params
     0 | 48
     1 | 78

Results:
abs min   | abs max
5.442e-09 | 6.850e+01

Searching:
module | params
     0 | 130

Results:
abs min   | abs max
5.442e-09 | 7.920e+02
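
The whole-model abs max (7.920e+02) is much larger than that of the transformer blocks alone (6.850e+01), so the largest weight must live outside the blocks. A minimal sketch for locating such weights by name follows. The helper `top_abs_max` is hypothetical, and the toy `nn.Sequential` model with a planted large value is only a stand-in so the example runs without downloading anything; on a real model you would pass the loaded `model` instead.

```python
import torch
from torch import nn

def top_abs_max(model, k=5):
    """Return the k parameters with the largest absolute value as (name, abs max) pairs."""
    stats = []
    with torch.no_grad():
        for name, p in model.named_parameters():
            if p.numel() == 0:
                continue  # max() is undefined on empty tensors
            stats.append((name, p.abs().max().item()))
    # largest abs max first
    return sorted(stats, key=lambda s: s[1], reverse=True)[:k]

# toy stand-in model (assumption, not one of the models analysed here)
toy = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
with torch.no_grad():
    toy[1].weight[0, 0] = 123.0  # plant one large weight

for name, value in top_abs_max(toy, k=2):
    print(f"{name:<12} | {value:.3e}")
```

Reporting names rather than a single global maximum would also help with the concern above about individual segments carrying larger weights than the rest of the model.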


## Multiple model weights abs min/max analyser

Now let's write a nice wrapper to process many models

In [5]:
def models_abs_min_max(mnames):
    transformers.logging.set_verbosity_error() # be quiet
    print(f"{'name':^40} | {'abs min':^9} | {'abs max':^9} ")
    print(f"{'-'*40}-|-{'-'*9}-|-{'-'*9}-")
    for mname in mnames:
        model = AutoModel.from_pretrained(mname)
        abs_min, abs_max = abs_min_max([model], verbose=False)
        print(f"{mname:<40} | {abs_min:.3e} | {abs_max:.3e}")
        del model

## bf16 models

Let's look at bf16-pretrained models

In [6]:
# bf16-pretrained models
mnames = ["t5-small", "t5-base", "t5-large", "google/mt5-small", "google/mt5-base", "google/mt5-large",
          "google/bigbird-pegasus-large-arxiv", "google/pegasus-cnn_dailymail", "google/pegasus-large", "google/pegasus-multi_news", "google/pegasus-xsum"
]
models_abs_min_max(mnames)

                  name                   |  abs min  |  abs max  
-----------------------------------------|-----------|-----------
t5-small                                 | 5.442e-09 | 7.920e+02
t5-base                                  | 1.273e-10 | 5.600e+02
t5-large                                 | 3.638e-11 | 5.200e+02
google/mt5-small                         | 3.201e-09 | 1.140e+02
google/mt5-base                          | 1.848e-09 | 1.135e+02
google/mt5-large                         | 1.892e-10 | 1.750e+02
google/bigbird-pegasus-large-arxiv       | 0.000e+00 | 2.424e+02
google/pegasus-cnn_dailymail             | 0.000e+00 | 2.416e+02
google/pegasus-large                     | 0.000e+00 | 2.417e+02
google/pegasus-multi_news                | 0.000e+00 | 2.412e+02
google/pegasus-xsum                      | 0.000e+00 | 2.418e+02


We can see big abs max weight values, consistently above 1e2, so a max weight > 1e2 perhaps makes a model a good candidate for the bf16 group.

## fp16 models

Let's look at fp16-pretrained models

In [7]:
# fp16-pretrained models
mnames = ["allenai/longformer-base-4096", "allenai/longformer-large-4096", 
          "allenai/led-base-16384", "allenai/led-large-16384", "lvwerra/codeparrot"
         ]
models_abs_min_max(mnames)

                  name                   |  abs min  |  abs max  
-----------------------------------------|-----------|-----------
allenai/longformer-base-4096             | 0.000e+00 | 1.510e+00
allenai/longformer-large-4096            | 0.000e+00 | 1.146e+00
allenai/led-base-16384                   | 0.000e+00 | 1.600e+01
allenai/led-large-16384                  | 0.000e+00 | 2.320e+01
lvwerra/codeparrot                       | 6.578e-12 | 1.671e+00


So we can see the fp16 abs max weights are quite small: they are in the range of 1e0 to 1e1.
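
As a quick sanity check, the 1e2 cutoff suggested for the bf16 group can be applied to the two tables so far. This is only a sketch: the helper `looks_like_bf16` and the constant `BF16_THRESHOLD` are hypothetical names, the cutoff is the tentative heuristic from above rather than a validated constant, and the values are copied from the printed tables.

```python
# abs max values copied from the bf16 and fp16 tables above
BF16_ABS_MAX = {
    "t5-small": 7.920e+02, "t5-base": 5.600e+02, "t5-large": 5.200e+02,
    "google/mt5-small": 1.140e+02, "google/mt5-base": 1.135e+02,
    "google/mt5-large": 1.750e+02,
    "google/bigbird-pegasus-large-arxiv": 2.424e+02,
    "google/pegasus-cnn_dailymail": 2.416e+02,
    "google/pegasus-large": 2.417e+02,
    "google/pegasus-multi_news": 2.412e+02,
    "google/pegasus-xsum": 2.418e+02,
}
FP16_ABS_MAX = {
    "allenai/longformer-base-4096": 1.510e+00,
    "allenai/longformer-large-4096": 1.146e+00,
    "allenai/led-base-16384": 1.600e+01,
    "allenai/led-large-16384": 2.320e+01,
    "lvwerra/codeparrot": 1.671e+00,
}

BF16_THRESHOLD = 1e2  # tentative heuristic cutoff, not a validated constant

def looks_like_bf16(abs_max, threshold=BF16_THRESHOLD):
    """Flag a model as a bf16 candidate when its largest weight exceeds the cutoff."""
    return abs_max > threshold

for group, table in [("bf16", BF16_ABS_MAX), ("fp16", FP16_ABS_MAX)]:
    flagged = [name for name, value in table.items() if looks_like_bf16(value)]
    print(f"{group}: {len(flagged)}/{len(table)} flagged as bf16 candidates")
# bf16: 11/11 flagged as bf16 candidates
# fp16: 0/5 flagged as bf16 candidates
```

On this small sample the cutoff separates the two groups cleanly, though the margin on the bf16 side is thin (the mt5 models sit just above 1e2).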

The "led" ones are oddly high. They are supposed to be the same as longformer, which is fp16.

## fp32 models

Let's look at fp32-pretrained models

In [None]:
# fp32-pretrained models
mnames = ["gsarti/it5-small", "gsarti/it5-base", "gsarti/it5-base-oscar", "gsarti/it5-large", "EleutherAI/gpt-neo-2.7B", ]
models_abs_min_max(mnames)

                  name                   |  abs min  |  abs max  
-----------------------------------------|-----------|-----------
gsarti/it5-small                         | 6.114e-08 | 4.693e+02
gsarti/it5-base                          | 1.068e-08 | 1.598e+03
gsarti/it5-base-oscar                    | 3.638e-12 | 2.092e+01
gsarti/it5-large                         | 2.094e-09 | 4.388e+04


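Mostly big abs max numbers, up to 4.388e+04 for `gsarti/it5-large`; only `gsarti/it5-base-oscar` stays in the 1e1 range.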

XXX: I suspect "EleutherAI/gpt-neo-2.7B" is in the wrong category, as its abs max is very low.

XXX: need more inputs
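
To see how far abs max alone can go, the sketch below compares the fp32 rows with the range spanned by the bf16 rows. The values are copied from the printed tables; the variable and helper names are hypothetical.

```python
# abs max values copied from the bf16 and fp32 tables above
BF16_ABS_MAX = [7.920e+02, 5.600e+02, 5.200e+02, 1.140e+02, 1.135e+02, 1.750e+02,
                2.424e+02, 2.416e+02, 2.417e+02, 2.412e+02, 2.418e+02]
FP32_ABS_MAX = {
    "gsarti/it5-small": 4.693e+02,
    "gsarti/it5-base": 1.598e+03,
    "gsarti/it5-base-oscar": 2.092e+01,
    "gsarti/it5-large": 4.388e+04,
}

def value_range(values):
    """Return (lowest, highest) of the observed values."""
    return min(values), max(values)

bf16_lo, bf16_hi = value_range(BF16_ABS_MAX)

# fp32 models whose abs max falls inside the observed bf16 range
inside_bf16_range = [n for n, v in FP32_ABS_MAX.items() if bf16_lo <= v <= bf16_hi]
# fp32 models that a "> 1e2" cutoff would label as bf16 candidates
above_cutoff = [n for n, v in FP32_ABS_MAX.items() if v > 1e2]

print(f"bf16 range: {bf16_lo:.3e} .. {bf16_hi:.3e}")
print(f"fp32 inside bf16 range: {inside_bf16_range}")
print(f"fp32 above 1e2: {len(above_cutoff)}/{len(FP32_ABS_MAX)}")
# bf16 range: 1.135e+02 .. 7.920e+02
# fp32 inside bf16 range: ['gsarti/it5-small']
# fp32 above 1e2: 3/4
```

So on this sample a large abs max hints at "not fp16", but it does not by itself distinguish bf16 from fp32 pretraining.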

## Unknown models

Let's look at some unknown models

In [9]:
# fp32? (XXX: need to check)
mnames = ["bigscience/T0_3B"] 
# mnames = ["bigscience/T0pp", "bigscience/T0_3B"] "bigscience/T0pp" is huge!
models_abs_min_max(mnames)

                  name                   |  abs min  |  abs max  
-----------------------------------------|-----------|-----------
bigscience/T0_3B                         | 5.755e-13 | 1.680e+02


Need to check how it was trained. It looks like bf16 to me.

In [14]:
# fp32? (XXX: need to check)
#mnames = ["google/pegasus-pubmed"] 
#mnames = ["EleutherAI/gpt-neo-1.3B"] 
# mnames = ["bigscience/T0pp", "bigscience/T0_3B"] "bigscience/T0pp" is huge!

#mnames = [""] 
#models_abs_min_max(mnames)