
Safetensors serialization by default #27064

Merged 18 commits into main on Oct 31, 2023

Conversation

@LysandreJik (Member) commented Oct 25, 2023

This PR aims to do one thing but is larger than expected. I'm happy to break it down into smaller PRs if it helps for reviewing.

This PR aims to switch safe serialization to True by default for torch models. In doing so, it revealed a few bugs in the existing implementation and safetensors support that this PR fixes.

Additionally, safetensors support is added for Flax models, so that models saved from PyTorch after this PR merges can be used in both TensorFlow and Flax, and models saved from TensorFlow/Flax can be loaded in PyTorch models.

The following should be worked on shortly to enable switching to safetensors by default for TensorFlow and Flax as well:

  • There is no support for sharded weights in TensorFlow
  • There is no support for sharded weights in Flax

Additionally, I'll contribute some documentation making the following clear:

  • TensorFlow models can load safetensors checkpoints saved from PyTorch and TensorFlow, but not from Flax. This can eventually be worked on; meanwhile, I'll document workarounds to get models saved in Flax working in TensorFlow for those interested.
  • Same, but for Flax models loaded from TensorFlow

Thanks, @Rocketknight1, for the help on TensorFlow's side.


HuggingFaceDocBuilderDev commented Oct 25, 2023

The documentation is not available anymore as the PR was closed or merged.

Comment on lines +3060 to +3081
if (
    is_safetensors_available()
    and isinstance(resolved_archive_file, str)
    and resolved_archive_file.endswith(".safetensors")
):
    with safe_open(resolved_archive_file, framework="pt") as f:
        metadata = f.metadata()

    if metadata.get("format") == "pt":
        pass
    elif metadata.get("format") == "tf":
        from_tf = True
        logger.info("A TensorFlow safetensors file is being loaded in a PyTorch model.")
    elif metadata.get("format") == "flax":
        from_flax = True
        logger.info("A Flax safetensors file is being loaded in a PyTorch model.")
    else:
        raise ValueError(
            f"Incompatible safetensors file. File metadata is not ['pt', 'tf', 'flax'] but {metadata.get('format')}"
        )

    from_pt = not (from_tf | from_flax)
@LysandreJik (Member Author):
this is necessary to enable loading safetensors files saved from TensorFlow/Jax into PyTorch models

@LysandreJik LysandreJik marked this pull request as ready for review October 30, 2023 14:50
@Narsil (Contributor) left a comment:

Very nice!

Not that bad for such a big change.

For the testing part I see loading in PT from PT/TF/Flax, but not the other ways:

  • TF - TF
  • TF - Flax
  • TF - PT
  • Flax - Flax
  • Flax - TF
  • Flax - PT

From your initial comment I understand it's not possible, but it's not entirely clear to me why (you mention sharded weights; is that the only restriction? If so, from what I read it should be okay-ish to at least load those, no?)

  • tests/models/auto/test_modeling_tf_auto.py (resolved)
  • tests/test_modeling_utils.py (resolved)
@sanchit-gandhi (Contributor) left a comment:

Thanks for working on this @LysandreJik!

  • src/transformers/modeling_flax_pytorch_utils.py (resolved)
  • src/transformers/modeling_flax_utils.py (several resolved threads)
  • tests/test_modeling_flax_utils.py (two resolved threads)
LysandreJik and others added 3 commits October 31, 2023 10:01
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
@LysandreJik (Member Author):

@Narsil, this is what is currently supported and not supported:

  • TF - TF: supported, tested here:

        def test_safetensors_tf_from_tf(self):
            model = TFBertModel.from_pretrained("hf-internal-testing/tiny-bert-tf-only")
            with tempfile.TemporaryDirectory() as tmp_dir:
                model.save_pretrained(tmp_dir, safe_serialization=True)
                new_model = TFBertModel.from_pretrained(tmp_dir)
                for p1, p2 in zip(model.weights, new_model.weights):
                    self.assertTrue(np.allclose(p1.numpy(), p2.numpy()))

  • TF - Flax: not supported
  • TF - PT: supported, tested here:

        @require_safetensors
        @is_pt_tf_cross_test
        def test_safetensors_tf_from_torch(self):
            hub_model = TFBertModel.from_pretrained("hf-internal-testing/tiny-bert-tf-only")
            model = BertModel.from_pretrained("hf-internal-testing/tiny-bert-pt-only")
            with tempfile.TemporaryDirectory() as tmp_dir:
                model.save_pretrained(tmp_dir, safe_serialization=True)
                new_model = TFBertModel.from_pretrained(tmp_dir)
                for p1, p2 in zip(hub_model.weights, new_model.weights):
                    self.assertTrue(np.allclose(p1.numpy(), p2.numpy()))

  • Flax - Flax: supported, tested here:

        @require_safetensors
        def test_safetensors_flax_from_flax(self):
            model = FlaxBertModel.from_pretrained("hf-internal-testing/tiny-bert-flax-only")
            with tempfile.TemporaryDirectory() as tmp_dir:
                model.save_pretrained(tmp_dir, safe_serialization=True)
                new_model = FlaxBertModel.from_pretrained(tmp_dir)
                self.assertTrue(check_models_equal(model, new_model))

  • Flax - TF: not supported
  • Flax - PT: supported, tested here:

        @require_safetensors
        @require_torch
        def test_safetensors_flax_from_torch(self):
            hub_model = FlaxBertModel.from_pretrained("hf-internal-testing/tiny-bert-flax-only")
            model = BertModel.from_pretrained("hf-internal-testing/tiny-bert-pt-only")
            with tempfile.TemporaryDirectory() as tmp_dir:
                model.save_pretrained(tmp_dir, safe_serialization=True)
                new_model = FlaxBertModel.from_pretrained(tmp_dir)
                self.assertTrue(check_models_equal(hub_model, new_model))

From your initial comment I understand it's not possible, but it's not entirely clear to me why (you mention sharded weights; is that the only restriction? If so, from what I read it should be okay-ish to at least load those, no?)

I mention this in the PR description:

TensorFlow models can load safetensors checkpoints saved from PyTorch and TensorFlow, but not from Flax. This can eventually be worked on; meanwhile, I'll document workarounds to get models saved in Flax working in TensorFlow for those interested.

It should be pretty straightforward to enable it, but I suspect extremely little usage for a TF <> Flax conversion where no PyTorch conversion exists. I'm planning to add this to the documentation and IMO we can work on it afterwards if there are requests.

@amyeroberts (Collaborator) left a comment:

Very nice! 🔥

Just some small nits and Qs for my own understanding

  • src/transformers/modeling_tf_utils.py (resolved)
    def test_safetensors_flax_from_sharded_msgpack_with_sharded_safetensors_hub(self):
        # This should not raise even if there are two types of sharded weights
        # This should discard the safetensors weights in favor of the msgpack sharded weights
        FlaxBertModel.from_pretrained("hf-internal-testing/tiny-bert-flax-safetensors-msgpack-sharded")
amyeroberts (Collaborator):
Just to make sure I've understood correctly: TF and Flax models can't load sharded weights from safetensors. So, if this passes, we know the model has successfully loaded the msgpack sharded weights?

LysandreJik (Member Author):
Yes, that's exactly right! This was raised by Sanchit: a previous version of the implementation prioritized safetensors, realized they were sharded, and errored out; but if sharded msgpack weights are also in the repo, we want to load those first.
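A hypothetical sketch of that resolution order (the helper name is made up, and the index filenames follow the usual transformers conventions; this is not the PR's actual code):

```python
import os
import tempfile


def pick_flax_sharded_checkpoint(folder):
    """Hypothetical helper mirroring the priority described above: a sharded
    msgpack index wins over a sharded safetensors index, because Flax cannot
    load sharded safetensors yet."""
    msgpack_index = os.path.join(folder, "flax_model.msgpack.index.json")
    safetensors_index = os.path.join(folder, "model.safetensors.index.json")
    if os.path.isfile(msgpack_index):
        return msgpack_index  # discard the safetensors shards in its favor
    if os.path.isfile(safetensors_index):
        raise ValueError("Sharded safetensors cannot be loaded in Flax yet.")
    raise FileNotFoundError(f"No sharded checkpoint found in {folder}")


# With both index files present, the msgpack one is picked and loading does
# not raise -- the situation the test above exercises.
folder = tempfile.mkdtemp()
for name in ("flax_model.msgpack.index.json", "model.safetensors.index.json"):
    open(os.path.join(folder, name), "w").close()

picked = pick_flax_sharded_checkpoint(folder)
print(os.path.basename(picked))  # "flax_model.msgpack.index.json"
```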

amyeroberts (Collaborator):
Thanks for explaining!

  • tests/test_modeling_tf_utils.py (two resolved threads)
  • src/transformers/modeling_flax_utils.py (resolved)
     # init random models
     model = cls(config, *model_args, _do_init=_do_init, **model_kwargs)

-    if from_pt:
+    if from_pt or safetensors_from_pt:
amyeroberts (Collaborator):
Exposing my lack of knowledge about safetensors here: if safetensors_from_pt is True, is the reason we call load_pytorch_checkpoint_in_flax_state_dict(model, resolved_archive_file, is_sharded) that the serialized weights are in "PyTorch format" and therefore can't be loaded using cls.load_flax_weights?

LysandreJik (Member Author):

Yes, that's correct! This way we call load_pytorch_checkpoint_in_flax_state_dict with the safetensors file, and that method checks whether the file ends with .safetensors in order to load it the PyTorch way.

amyeroberts (Collaborator):
Thanks for explaining!

@@ -986,15 +988,15 @@ def load_tf_weights_from_safetensors(model, resolved_archive_file, ignore_mismat
     # Read the safetensors file
     with safe_open(resolved_archive_file, framework="tf") as safetensors_archive:
         mismatched_layers = []
-        weight_names = [format_weight_name(w.name, _prefix=_prefix) for w in model.weights]
+        weight_names = [strip_model_name_and_prefix(w.name, _prefix=_prefix) for w in model.weights]
amyeroberts (Collaborator):
As:

  • the previous function format_weight_name added _prefix, whereas the new function strip_model_name_and_prefix removes _prefix if it's present, and
  • weight_names are compared to those in safetensors_archive,

can we still load safetensors saved before this PR into our TF models?

@Rocketknight1 (Member):

@amyeroberts I believe the previous code was just completely incorrect! Essentially, the relevant workflow (saving TF composite encoder-decoder models as safetensors and then reloading the checkpoint in TF) was not being tested, and actually didn't work because of the name-prefix bug.

As such, I don't think there's a backwards-compatibility issue here, because previous checkpoints weren't working at all. My suspicion is that not many people were saving encoder-decoder models in TF, and not many TF users were saving safetensors, so the intersection of that Venn diagram was tiny enough that no one noticed the bug for a long time!

@Rocketknight1 (Member) commented Oct 31, 2023:
Also, just to clarify: _prefix is almost always None or "" when this function is called, in which case the behaviour is unchanged after this bugfix. _prefix is only defined when loading composite models like EncoderDecoder, which is the workflow that was broken before this.
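To make the fix concrete, here is a hedged re-implementation sketch (not the actual transformers code; the weight names are illustrative TF scoped names) of what the new stripping behaviour does to a weight name before it is compared against the archive keys:

```python
from typing import Optional


def strip_model_name_and_prefix(name: str, _prefix: Optional[str] = None) -> str:
    """Hypothetical re-implementation for illustration only: drop a
    composite-model _prefix (e.g. "encoder") if present, then drop the
    leading model-name scope (e.g. "tf_bert_model/")."""
    if _prefix and name.startswith(_prefix + "/"):
        name = name[len(_prefix) + 1:]
    parts = name.split("/")
    if len(parts) > 1:
        name = "/".join(parts[1:])
    return name


# Plain model: only the leading model-name scope is removed.
print(strip_model_name_and_prefix("tf_bert_model/bert/pooler/dense/kernel:0"))
# "bert/pooler/dense/kernel:0"

# Composite model: _prefix is stripped too, instead of being prepended
# the way the old format_weight_name did.
print(strip_model_name_and_prefix(
    "encoder/tf_bert_model/bert/embeddings/weight:0", _prefix="encoder"
))
# "bert/embeddings/weight:0"
```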

amyeroberts (Collaborator):
Thanks for explaining!

  • src/transformers/modeling_tf_utils.py (resolved)
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
@LysandreJik (Member Author):

I will proceed to merge this and write a small explanatory doc tomorrow. I would like for the slow tests to run on this before the release.

@LysandreJik LysandreJik merged commit 113ebf8 into main Oct 31, 2023
21 checks passed
@LysandreJik LysandreJik deleted the safetensors-by-default branch October 31, 2023 18:16
@Narsil (Contributor) commented Nov 1, 2023

Awesome! Thanks a LOT for this.

EduardoPach pushed a commit to EduardoPach/transformers that referenced this pull request Nov 19, 2023
* Safetensors serialization by default

* First pass on the tests

* Second pass on the tests

* Third pass on the tests

* Fix TF weight loading from TF-format safetensors

* Specific encoder-decoder fixes for weight crossloading

* Add VisionEncoderDecoder fixes for TF too

* Change filename test for pt-to-tf

* One missing fix for TFVisionEncoderDecoder

* Fix the other crossload test

* Support for flax + updated tests

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Sanchit's comments

* Sanchit's comments 2

* Nico's comments

* Fix tests

* cleanup

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: Matt <rocketknight1@gmail.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>