
[model loading] framework-agnostic dtype parameter #13246

Open
stas00 opened this issue Aug 24, 2021 · 3 comments
Labels: WIP

stas00 (Contributor) commented Aug 24, 2021

This is split off from one of the discussions in #13209:

  1. It all started with trying to load torch models under either the desired dtype or the dtype of the pretrained model, and thus avoid 2x memory usage, e.g. when the model only needs to be fp16. So we added torch_dtype to from_pretrained and from_config.
  2. Then we started storing torch_dtype in the config file, for possible future automatic loading of the model in the optimal "regime".
  3. This resulted in a discrepancy where the same symbol sometimes means torch.dtype and at other times a string like "float32", since we can't store torch.dtype in JSON.
  4. Then in fix AutoModel.from_pretrained(..., torch_dtype=...) #13209 (comment) we started discussing how dtype is really the same across pt/tf/flax, and that perhaps we should just use dtype in the config and variables, have it consistently be a string ("float32"), and convert it to the right dtype object of the desired framework at the point of use, e.g. getattr(torch, "float32").

A possible solution is to deprecate torch_dtype and replace it with a dtype string, both in the config and in the function argument.
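A minimal sketch of what the point-of-use conversion could look like (the resolve_dtype helper is hypothetical, not existing transformers API):

```python
import torch
import tensorflow as tf
import jax.numpy as jnp

def resolve_dtype(dtype_str, framework):
    """Convert a config-stored dtype string like "float32" into the
    corresponding dtype object of the requested framework.
    Hypothetical helper, for illustration only."""
    if framework == "pt":
        return getattr(torch, dtype_str)      # e.g. torch.float32
    if framework == "tf":
        return tf.dtypes.as_dtype(dtype_str)  # e.g. tf.float32
    if framework == "flax":
        return jnp.dtype(dtype_str)           # e.g. dtype('float32')
    raise ValueError(f"unknown framework: {framework}")

# Usage: resolve_dtype("float16", "pt") -> torch.float16
```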

Possible conflicts with the naming:

  1. We already have a dtype attribute in modeling_utils, which returns torch.dtype based on the dtype of the first parameter:

    https://github.com/huggingface/transformers/blob/master/src/transformers/modeling_utils.py#L205

    The context is different, but this is still something to consider to avoid ambiguity (see the sketch below).
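A simplified sketch of what that attribute does, assuming the usual first-parameter lookup (the actual implementation at the link above handles more edge cases, e.g. models without floating-point parameters):

```python
import torch
from torch import nn

class ModuleUtilsMixin:
    """Mixin for nn.Module subclasses (simplified for illustration)."""

    @property
    def dtype(self) -> torch.dtype:
        # Report the dtype of the first parameter as the model's dtype.
        return next(self.parameters()).dtype
```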

I may have missed some other areas. So please share if something else needs to be added.

Additional notes:

#13098 - the idea of the PR is exactly to disentangle the parameter dtype from the matmul/computation dtype. In Flax, it's common practice that the dtype parameter defines the matmul/computation dtype rather than the parameter dtype, see: https://flax.readthedocs.io/en/latest/_autosummary/flax.linen.Dense.html#flax.linen.Dense.dtype
So for Flax, I don't really think it would make sense to use a config.dtype to define the weights dtype, as it would be quite confusing given Flax's computation dtype parameter.
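A minimal sketch illustrating the distinction, assuming default Flax behavior as documented for flax.linen.Dense:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

# In flax.linen.Dense, `dtype` sets the computation (matmul) dtype,
# not the dtype the parameters are stored in.
layer = nn.Dense(features=4, dtype=jnp.float16)
params = layer.init(jax.random.PRNGKey(0), jnp.ones((1, 8)))

# Parameters are still stored in float32 by default.
print(jax.tree_util.tree_map(lambda p: p.dtype, params))

# The forward pass computes (and returns) in float16.
y = layer.apply(params, jnp.ones((1, 8)))
print(y.dtype)  # float16
```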

@LysandreJik, @sgugger, @patrickvonplaten

LysandreJik (Member)

Would like to ping @Rocketknight1 regarding the TensorFlow management of types, and @patil-suraj for Flax.

Rocketknight1 (Member)

This should work in TensorFlow too - you can use tf.dtypes.as_dtype(dtype_string) to turn strings into TF dtype objects.
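For example, a quick illustration of that call:

```python
import tensorflow as tf

# Convert a config-stored dtype string into a TF dtype object.
dtype = tf.dtypes.as_dtype("float16")
print(dtype)                 # <dtype: 'float16'>
print(dtype == tf.float16)   # True
```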

Joy-Lunkad commented Sep 2, 2021

@Rocketknight1 Sorry, but can you please elaborate on how to load the model in TensorFlow, or point me in the right direction? I am new to Hugging Face and I have been looking all over for instructions on how to do it. Thank you.

@huggingface huggingface deleted a comment from github-actions bot Sep 27, 2021
@stas00 stas00 added the WIP label Sep 27, 2021