Bugfix: Flatten nested outputs in legacy keras2 .h5 models #19669

Closed

Conversation

@torzdf commented May 5, 2024

There are several bugs when loading legacy Keras 2.x models into Keras 3 with slightly esoteric layouts. This particular bug is the 'easy' one to patch.

The problem

Effectively, Keras 2 allowed for nested outputs, whilst Keras 3 does not. When creating the following model:

from keras import Input, Model
from keras.layers import Dense

inputs = Input((4,))

dense1 = Dense(2)(inputs)
dense2 = Dense(2)(dense1)
dense3 = Dense(2)(dense2)

model = Model(inputs, outputs=[[dense1, dense2], dense3])

This model successfully builds in Keras 2, but (correctly) does not in Keras 3:

ValueError: When providing `outputs` as a list/tuple, all values in the list/tuple must be KerasTensors

The impact of this, when loading a legacy Keras 2 .h5 model with nested outputs into Keras 3, is the following:

Traceback (most recent call last):
  File "/mnt/Data/git/keras/bug2.py", line 17, in <module>
    legacy = load_model("legacy_nested_outputs.h5")
  File "/mnt/Data/git/keras/keras/src/saving/saving_api.py", line 183, in load_model
    return legacy_h5_format.load_model_from_hdf5(
  File "/mnt/Data/git/keras/keras/src/legacy/saving/legacy_h5_format.py", line 133, in load_model_from_hdf5
    model = saving_utils.model_from_config(
  File "/mnt/Data/git/keras/keras/src/legacy/saving/saving_utils.py", line 95, in model_from_config
    return serialization.deserialize_keras_object(
  File "/mnt/Data/git/keras/keras/src/legacy/saving/serialization.py", line 495, in deserialize_keras_object
    deserialized_obj = cls.from_config(
  File "/mnt/Data/git/keras/keras/src/models/model.py", line 517, in from_config
    return functional_from_config(
  File "/mnt/Data/git/keras/keras/src/models/functional.py", line 579, in functional_from_config
    output_tensors = map_tensors(config["output_layers"])
  File "/mnt/Data/git/keras/keras/src/models/functional.py", line 573, in map_tensors
    tensor_list = [get_tensor(*v) for v in tensors]
  File "/mnt/Data/git/keras/keras/src/models/functional.py", line 573, in <listcomp>
    tensor_list = [get_tensor(*v) for v in tensors]
TypeError: functional_from_config.<locals>.get_tensor() missing 1 required positional argument: 'tensor_index'

It is hard to create a reproducible gist for this one, as it relies on the model being created in Keras 2 and then loaded in Keras 3. Unfortunately, Colab scrubs the environment when you upgrade Keras, losing the legacy saved .h5 file. The attached gist includes the output of model.get_config() in Keras 2.x to demonstrate the issue: https://colab.research.google.com/drive/1w3ZPOl5YsuBXobIATakgLVKFnRlCE53e?usp=sharing

The Fix

The fix is implemented in the code that is already in place to load legacy models. Specifically, it looks for:

  • layers that have the output_layers key in their config
  • layers that do not have the trainable key in their config (used to identify that the layer comes from a Keras 2 model)
    • There may be better ways to identify this, but at this point in the code we only have access to the current layer config being processed

When a layer matches (a minimal sketch of this flattening step follows this list):

  • All output information is flattened to a single list
  • A check is performed to make sure that the list's length is divisible by 3 (name, node_index, tensor_index)
    • If not, the output_layers are not touched, and we are left to fail downstream
  • The outputs are regrouped into a single list of output information lists: [[name, node_index, tensor_index], ...]
  • A check is performed to make sure that each of the final output entries conforms to [str, int, int]
    • If not, the output_layers are not touched, and we are left to fail downstream
  • config["config"]["output_layers"] is updated with the new flattened list of outputs
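For illustration, here is a minimal sketch of that flattening step. The helper name _flatten_legacy_output_layers is hypothetical and this is not the PR's exact code; it just mirrors the steps listed above.

# Hypothetical sketch of the flattening described above, not the PR's code.
# `config` is the deserialized model config dict taken from the legacy .h5 file.
def _flatten_legacy_output_layers(config):
    output_layers = config.get("config", {}).get("output_layers")
    if output_layers is None:
        return

    flat = []

    def _collect(item):
        # Recursively collect leaf values from arbitrarily nested lists.
        if isinstance(item, (list, tuple)):
            for sub in item:
                _collect(sub)
        else:
            flat.append(item)

    _collect(output_layers)

    # Each output should be a (name, node_index, tensor_index) triple.
    if len(flat) % 3 != 0:
        return  # Leave output_layers untouched and fail downstream.

    triples = [flat[i : i + 3] for i in range(0, len(flat), 3)]
    if not all(
        isinstance(name, str) and isinstance(node, int) and isinstance(tensor, int)
        for name, node, tensor in triples
    ):
        return  # Leave output_layers untouched and fail downstream.

    config["config"]["output_layers"] = triples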

Testing

Again, I am happy to implement a unit test, but since this function relies on a model created in Keras 2 and then opened in Keras 3, it is not altogether straightforward. Either a legacy model file would need to be generated, or a hard-coded config would need to be added to the test (as per the gist). Neither of these approaches seems ideal, as they would only test a very specific model structure.

Any feedback/update requirements are appreciated.

@fchollet (Member) commented May 5, 2024

Thanks for the PR!

Effectively, Keras 2 allowed for nested outputs, whilst Keras 3 does not.

It is on our roadmap to re-allow nested inputs/outputs in Functional models. This would fix the issue. But in the meantime we can apply your fix (though we will have to remove it entirely once it is no longer necessary). That said, I am not sure how much work it is to re-allow nested inputs/outputs; perhaps doing it directly could be less work.

It is hard to create a repeatable gist for this one, as it relies on the model being created in Keras 2 and then being loaded in Keras 3

In such cases I would recommend attaching the saved file to a GitHub issue that contains the Keras 3 code snippet.

Again, I am happy to implement a unit test, but as this function relies on a model created in Keras 2, but then opened in Keras 3, it is not altogether straightforward

We try to avoid unit tests that depend on file assets. What I would recommend is to create a new integration test file legacy_model_loading.py in integration_tests/, and make it download (e.g. via keras.utils.get_file()) a saved file stored on GitHub. You could just use a GitHub issue attachment to store it. We'll reuse that new integration test file to add more cases in the future.
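For illustration, a minimal sketch of such an integration test might look like the following; the file name, download URL, and expected output count are placeholders based on the example model above, not real assets.

# integration_tests/legacy_model_loading.py -- hypothetical sketch, not real code.
import keras


def test_load_legacy_nested_outputs_h5():
    # The origin URL is a placeholder for a file attached to a GitHub issue.
    path = keras.utils.get_file(
        fname="legacy_nested_outputs.h5",
        origin="https://github.com/keras-team/keras/files/<attachment-id>/legacy_nested_outputs.h5",
    )
    model = keras.models.load_model(path)
    # The example model above has three outputs once the nesting is flattened.
    assert len(model.outputs) == 3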

@torzdf (Author) commented May 5, 2024

It is on our roadmap to re-allow nested inputs/outputs in Functional models. This would fix the issue. But in the meantime we can apply your fix (though we will have to remove it entirely once it is no longer necessary).

That's fine. It's just about getting it working in the meantime :)

We try to avoid unit tests that depend on file assets. What I would recommend is to create a new integration test file legacy_model_loading.py in integration_tests/, and make it download (e.g. via keras.utils.get_file()) a saved file stored on GitHub. You could just use a GitHub issue attachment to store it. We'll reuse that new integration test file to add more cases in the future.

I tend to agree on file assets. I will look to update with an issue and a unit test tomorrow.

@torzdf (Author) commented May 6, 2024

Ok, in setting up a test file, I have encountered an issue with this fix. It appears that the 'trainable' config key was added in Keras 2.12, so my test for identifying Keras 2 models is not effective.

I see two potential solutions for this:

  1. Just flatten outputs for all legacy models, regardless. This should not have any impact, as already-flattened outputs will remain flattened (they will just be recompiled to be the same as they originally were).
  2. Move this code to where the legacy models are loaded, and handle updating the config to a format supported by Keras 3 there.

My preference is option 2, for a few reasons:

  • At this point in the code, we can get the actual Keras version that saved the model file directly from the .h5 attributes (see the sketch right after this list). There is no need to implement any kind of vague check based on dictionary structure.
  • I alluded to there being several bugs in Keras 2 -> Keras 3 model loading. The other major bug is that inbound nodes from shared Functional models need to be reduced by 1 (as per Fix loading nested Functional models from config #19509); a rough sketch of that adjustment appears further below. I'll raise an issue/PR about this with more detail when I have a better steer on the best approach, but in a nutshell, loading a Keras 2 model with shared Functional models leads to infinite recursion, and thus lock-up, within saving_utils.model_from_config. Unfortunately, this issue cannot easily be fixed within model_from_config, as we only have access to the current layer being processed, and whilst we can get the name of the inbound layers, we cannot get their class_name to identify whether the inbound layer is a Functional model or not.
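For reference, a minimal sketch of reading that attribute (Keras writes a keras_version attribute at the root of the .h5 files it saves):

# Sketch only: read the Keras version recorded in a legacy .h5 file's attributes.
import h5py


def saved_keras_version(filepath):
    with h5py.File(filepath, "r") as f:
        version = f.attrs.get("keras_version", "unknown")
    # Older h5py versions may return bytes rather than str.
    if hasattr(version, "decode"):
        version = version.decode("utf-8")
    return version


# e.g. saved_keras_version("legacy_nested_outputs.h5") might return "2.10.0"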

Given that we cannot solve for shared Functional inputs in the most logical place (when deserializing the actual layers), it makes more sense to me to group all the required legacy updating together (at the point of loading from the .h5 file), by rewriting the config to a format that is supported by Keras 3. This removes the requirement to handle branching logic for Keras 2/3 model files from the main model loading code, makes it trivial to identify a Keras 2.x model, and keeps all of the Keras 2 -> 3 specific code together.
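As an illustration of the kind of config rewrite described in the second bullet above, here is a rough sketch of the inbound-node adjustment. It assumes the Keras 2 layout where each inbound connection is stored as [layer_name, node_index, tensor_index, kwargs]; the helper name is hypothetical and this is not the finalized fix.

# Rough sketch only: decrement node indices for connections into nested
# Functional (Model) layers, as described above. Not the finalized fix.
def _adjust_shared_functional_nodes(model_config):
    layers = model_config["config"]["layers"]
    functional_names = {
        layer["name"]
        for layer in layers
        if layer["class_name"] in ("Functional", "Model")
    }
    for layer in layers:
        for node in layer.get("inbound_nodes", []):
            for connection in node:
                inbound_name, node_index = connection[0], connection[1]
                if inbound_name in functional_names and node_index > 0:
                    connection[1] = node_index - 1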

Also related:

Although, I am not sure how much work it is to re-allow nested inputs/outputs, perhaps doing it directly could be less work.

If it were me (it is not, and I am obviously far less knowledgeable about how things hang together than you)... Assuming that, under the hood, it does not matter whether outputs/inputs are nested or not, and that nesting is purely a mechanism to help the end user make sense of their inputs/outputs (which appears to be the case), then I would let the end user nest inputs/outputs, but immediately flatten them when processing the model. In this way, the proposed fix in this PR (when finalized) would handle the differences in serialization between Keras 2/3, and no further changes to how Keras 3 works with nested inputs/outputs would be required.
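As a toy illustration of that suggestion (not Keras internals): nested structures could be accepted from the user and flattened immediately with keras.tree.

from keras import tree

nested_outputs = [["dense1", "dense2"], "dense3"]  # stand-ins for KerasTensors
flat_outputs = tree.flatten(nested_outputs)
print(flat_outputs)  # ['dense1', 'dense2', 'dense3']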

@fchollet (Member) commented May 6, 2024

I took a look at implementing support for arbitrary input/output structures, and as it turns out it should be pretty easy. I'll do that instead when I can find some time later today.

Note that this does not resolve the question of legacy model loading (I don't remember how they are structured). We'll need the integration tests in any case.

@fchollet (Member) commented May 6, 2024

Done -- please check if it works for loading your model. IMO 50% chance it works. If it doesn't we'll look at the old saving format and special case it. In any case, we should add a legacy model loading integration test.

@torzdf (Author) commented May 6, 2024

Done -- please check if it works for loading your model. IMO 50% chance it works. If it doesn't we'll look at the old saving format and special case it. In any case, we should add a legacy model loading integration test.

I'll check this when I get a chance...

I PR'd this one first as it was the easier of the two fixes that are roadblocking my project. Without being able to load shared Functional models from Keras 2 into Keras 3, though, I can't even get to the point of unraveling/loading nested outputs...

As things stand, I have code in my project that rewrites Keras 2 config into a format that Keras 3 will load. It is not ideal, which is why I was looking to contribute this fix upstream into Keras itself.

I am more than happy to implement and PR integration tests for legacy model loading, and to handle the Keras 2 shared Functional model code (I already have the code written for my own project); however, the question remains, as per my last post, of the best place to implement it. Basically, I don't want to spend the time creating a PR to fix this bug if it is ultimately rejected for being in the wrong place.

In the meantime, I will close this PR and my related issue as resolved. If I should press on and raise a PR to fix the Keras 2 Functional model issue, please do let me know; otherwise I'll rely on my downstream hackery.

@torzdf closed this May 6, 2024