Bugfix: Flatten nested outputs in legacy keras2 .h5 models #19669
Conversation
Thanks for the PR!
It is on our roadmap to re-allow nested inputs/outputs in Functional models, which would fix the issue. But in the meantime we can apply your fix (though we will have to remove it entirely once it is no longer necessary). That said, I am not sure how much work it is to re-allow nested inputs/outputs; perhaps doing that directly could be less work.
In such cases I would recommend attaching the saved file to a GitHub issue that contains the Keras 3 code snippet.
We try to avoid unit tests that depend on file assets. What I would recommend is to create a new integration test file.
That's fine. It's just about getting it working in the meantime :)
I tend to agree on file assets. I will look to update with an issue and a unit test tomorrow.
Ok, in setting up a test file, I have encountered an issue with this fix. It appears that the 'trainable' config key was added in Keras 2.12, so my test for identifying Keras 2 models is not effective. I see 2 potential solutions for this:
My preference is 2 for a few reasons:
Given that we cannot solve for shared functional inputs in the most logical place (when deserializing the actual layers), it makes more sense to me to group all the required legacy updating together (at the point of loading from the .h5 file), by re-writing the config to a format that is supported by Keras 3. This removes the need to handle branching logic for Keras 2/3 model files in the main model-loading code, makes it trivial to identify a Keras 2.x model, and keeps all of the Keras 2 -> 3 conversion code together. Also related:
If it were me (it is not, and I am obviously far less knowledgeable about how things hang together than you)... Assuming that, under the hood, it does not matter whether outputs/inputs are nested and the nesting is purely a mechanism to help the end user make sense of their inputs/outputs (which appears to be the case), then I would let the end user nest inputs/outputs but immediately flatten them when processing the model. In this way the proposed fix in this PR (when finalized) would handle the differences in serialization between Keras 2/3, and no further changes to how Keras 3 works with nested inputs/outputs would be required.
I took a look at implementing support for arbitrary input/output structures, and as it turns out it should be pretty easy. I'll do that instead when I can find some time later today. Note that this does not resolve the question of legacy model loading (I don't remember how they are structured). We'll need the integration tests in any case.
Done -- please check if it works for loading your model. IMO 50% chance it works. If it doesn't we'll look at the old saving format and special case it. In any case, we should add a legacy model loading integration test.
I'll check this when I get a chance... I PR'd this first as it was the easier fix of the 2 that are roadblocking my project. Without being able to load shared functional models from Keras 2 into Keras 3, I can't even get to the point of unravelling/loading nested outputs.

As things stand, I have code in my project that re-writes Keras 2 config into a format that Keras 3 will load. It is not ideal, which is why I was looking to contribute this fix upstream into Keras itself. I am more than happy to implement and PR integration tests for legacy model loading, and to handle the Keras 2 shared functional model code (I already have the code written for my own project). However, the issue still remains, as per my last post, of the best place to implement it. Basically, I don't want to spend the time creating a PR to fix this bug if it is ultimately rejected for being in the wrong place.

In the meantime, I will close this PR and my related issue as resolved. If I should press on and raise a PR to fix the Keras 2 Functional model issue, please do let me know; otherwise I'll rely on my downstream hackery.
There are several bugs when loading legacy Keras 2.x models into Keras 3 with slightly esoteric layouts. This particular bug is the 'easy' one to patch.
The problem
Effectively, Keras 2 allowed for nested outputs, whilst Keras 3 does not. When creating the following model:
This model successfully builds in Keras 2, but (correctly) does not in Keras 3:
The impact of this, when loading a legacy Keras 2 .h5 model with nested outputs into Keras 3, is the following:
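The original snippet and traceback are in the gist linked below rather than reproduced here, but the shape of the problem can be sketched from the description. The following is an illustrative (assumed, not taken from the author's actual model) `output_layers` entry, showing the nested form Keras 2 serializes versus the flat form Keras 3 expects:

```python
# Illustrative values only -- assumed, not from the author's model.
# Keras 2 serializes nested model outputs as nested lists inside the
# "output_layers" entry of the model config; Keras 3 expects a flat
# list of [layer_name, node_index, tensor_index] triples.
legacy_output_layers = [
    ["a", 0, 0],    # a top-level output
    [["b", 0, 0]],  # a nested output, e.g. from outputs=[a, [b]]
]

# The flat structure Keras 3 can load:
flat_output_layers = [
    ["a", 0, 0],
    ["b", 0, 0],
]
```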
It is hard to create a repeatable gist for this one, as it relies on the model being created in Keras 2 and then being loaded in Keras 3. Unfortunately Colab scrubs the environment when you look to upgrade Keras, losing our legacy saved .h5 file. The attached gist includes the output of `model.get_config()` in Keras 2.x to demonstrate the issue: https://colab.research.google.com/drive/1w3ZPOl5YsuBXobIATakgLVKFnRlCE53e?usp=sharing

The Fix
The fix is implemented in the code that is in place to load legacy models. Specifically it looks for models with:

- An `output_layers` key in their config
- No `trainable` key in their config (used to identify that this layer comes from a Keras 2 model)

When a layer matches:

- Each output is flattened to (`name`, `node_index`, `tensor_index`)
- If the outputs are not structured as `[[name, node_index, tensor_index], ... ]`, the `output_layers` are not touched, and we are left to fail downstream
- If any flattened entry is not of the form `[str, int, int]`, the `output_layers` are not touched, and we are left to fail downstream
- Otherwise, `config["config"]["output_layers"]` is updated with our new flattened list of outputs

Testing
Again, I am happy to implement a unit test, but as this function relies on a model created in Keras 2 and then opened in Keras 3, it is not altogether straightforward. Either a legacy model file would need to be generated, or a hard-coded config would need to be added to the test (as per the gist). Neither of these approaches appears ideal, as they will only be built to test a very specific model structure.
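For the hard-coded-config approach, the flattening step described above could be sketched roughly as follows. This is an illustrative reimplementation, not the PR's actual code; the function name and config values are assumed for the example:

```python
# Hedged sketch of the flattening step: recursively collapse nested
# output entries into a flat list of [name, node_index, tensor_index].
def flatten_output_layers(output_layers):
    flat = []
    for entry in output_layers:
        # A well-formed leaf entry is [str, int, int].
        if (
            len(entry) == 3
            and isinstance(entry[0], str)
            and isinstance(entry[1], int)
            and isinstance(entry[2], int)
        ):
            flat.append(entry)
        else:
            # Otherwise treat it as a nested list and recurse.
            flat.extend(flatten_output_layers(entry))
    return flat

# Hard-coded config in the spirit of the gist (values assumed).
config = {"config": {"output_layers": [["a", 0, 0], [["b", 0, 0], ["c", 0, 0]]]}}
config["config"]["output_layers"] = flatten_output_layers(
    config["config"]["output_layers"]
)
assert config["config"]["output_layers"] == [["a", 0, 0], ["b", 0, 0], ["c", 0, 0]]
```

Note that this sketch flattens everything it can rather than leaving malformed entries untouched, so it is looser than the fail-downstream guard conditions described in the fix above.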
Any feedback/update requirements are appreciated.