Provided model cannot be used with the new Tensorflow pickle loader. (module tensorflow.python.training.tracking missing) #67

Closed
latendresse opened this issue Dec 8, 2023 · 2 comments
Assignees
Labels
question Request for help or information

Comments

@latendresse

Hello,

I tried using the trained model GNN_Edge_MLP_MoLeR__2022-02-24_07-16-23_best.pkl you provided, but generating 10 molecules failed while the model was being loaded. It appears that this pkl format is no longer handled by the newer TensorFlow version. Please see the Python trace below.

Do you have another trained model to try, or should we downgrade to an older version of Tensorflow?

Thank you,

-- Mario

molecule_generation sample /home/azureuser/molecule-generation/model 10
2023-12-08 01:49:50.037579: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
  File "/home/azureuser/miniconda3/envs/moler-env/bin/molecule_generation", line 8, in <module>
    sys.exit(main())
  File "/home/azureuser/miniconda3/envs/moler-env/lib/python3.10/site-packages/molecule_generation/cli/cli.py", line 35, in main
    run_and_debug(lambda: commands[args.command].run_from_args(args), getattr(args, "debug", False))
  File "/home/azureuser/miniconda3/envs/moler-env/lib/python3.10/site-packages/dpu_utils/utils/debughelper.py", line 21, in run_and_debug
    func()
  File "/home/azureuser/miniconda3/envs/moler-env/lib/python3.10/site-packages/molecule_generation/cli/cli.py", line 35, in <lambda>
    run_and_debug(lambda: commands[args.command].run_from_args(args), getattr(args, "debug", False))
  File "/home/azureuser/miniconda3/envs/moler-env/lib/python3.10/site-packages/molecule_generation/cli/sample.py", line 30, in run_from_args
    print_samples(
  File "/home/azureuser/miniconda3/envs/moler-env/lib/python3.10/site-packages/molecule_generation/cli/sample.py", line 13, in print_samples
    with load_model_from_directory(model_dir, **model_kwargs) as model:
  File "/home/azureuser/miniconda3/envs/moler-env/lib/python3.10/site-packages/molecule_generation/wrapper.py", line 187, in load_model_from_directory
    model_class = get_model_class(ModelWrapper._get_model_file(model_dir))
  File "/home/azureuser/miniconda3/envs/moler-env/lib/python3.10/site-packages/molecule_generation/utils/model_utils.py", line 74, in get_model_class
    data_to_load = pickle.load(in_file)
ModuleNotFoundError: No module named 'tensorflow.python.training.tracking'
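
For readers hitting the same trace: a pickle file records the import path of each class it contains, not the class definition, so loading fails as soon as a recorded module can no longer be imported. The sketch below reproduces that failure mode in isolation. It does not touch TensorFlow or the MoLeR checkpoint; the module name `hypothetical_legacy_tracking` and the `Tracker` class are made up for illustration.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a library module that a later release removes.
MODULE_NAME = "hypothetical_legacy_tracking"
legacy = types.ModuleType(MODULE_NAME)


class Tracker:
    def __init__(self, name):
        self.name = name


# Make the class look as if it were defined inside the stand-in module.
Tracker.__module__ = MODULE_NAME
legacy.Tracker = Tracker
sys.modules[MODULE_NAME] = legacy

payload = pickle.dumps(Tracker("weights"))

# Simulate upgrading to a release where the module no longer exists.
del sys.modules[MODULE_NAME]


def try_load(data):
    """Return the unpickled object, or the import error raised while loading."""
    try:
        return pickle.loads(data)
    except ModuleNotFoundError as err:
        return err


result = try_load(payload)
print(type(result).__name__, result)
```

The same mechanism would explain the error above if the checkpoint references classes under `tensorflow.python.training.tracking` and the installed TensorFlow no longer ships that module path.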

@kmaziarz
Collaborator

kmaziarz commented Dec 8, 2023

Yes, there is a backward compatibility issue introduced around tensorflow 2.14. I wanted to take a look at handling this, but for now I would recommend downgrading. The newest tensorflow versions are fishy anyway, as I've recently found that starting with 2.10 there are memory leaks in certain scenarios (in MoLeR's case, they seem to appear when repeatedly encoding a lot of molecules in a loop). Neither the compatibility issues nor the potential leaks happen in 2.9, so I would recommend that version unless you really need to use a newer one.
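
As a rough illustration of the advice above, the following sketch maps a TensorFlow version string to the problems mentioned in this thread. The thresholds (leaks starting with 2.10, loading problems from around 2.14) are taken from this comment and are approximate; the function names are hypothetical and not part of `molecule_generation`.

```python
import re
from importlib import metadata


def parse_major_minor(version):
    """Extract (major, minor) from a version string such as '2.9.3'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def known_issues(tf_version):
    """List the issues from this thread that may apply to a given version."""
    major_minor = parse_major_minor(tf_version)
    issues = []
    if major_minor >= (2, 10):
        issues.append("memory leaks when repeatedly encoding many molecules")
    if major_minor >= (2, 14):  # "around 2.14" according to the thread
        issues.append("pretrained checkpoint may fail to unpickle")
    return issues


def installed_tensorflow_version():
    """Return the installed tensorflow version, or None if it is not installed."""
    try:
        return metadata.version("tensorflow")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_tensorflow_version()
    if version is None:
        print("tensorflow is not installed in this environment")
    else:
        print(version, known_issues(version) or "no issues known from this thread")
```

Reading the version through `importlib.metadata` avoids importing TensorFlow itself, so the check stays cheap and can run before any model loading is attempted.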

@kmaziarz kmaziarz self-assigned this Dec 8, 2023
@kmaziarz kmaziarz added the question Request for help or information label Dec 8, 2023
kmaziarz added a commit that referenced this issue Dec 12, 2023
This PR addresses two sources of memory leaks apparent when repeatedly
encoding many molecules in a loop, both originating from `tensorflow`:
- First, there is a very mild leak, caused by `tensorflow` not fully
cleaning up some of its internals, which appears across many
`tensorflow` versions.
- Second, there is also a bigger leak introduced in `tensorflow` version
`2.10`.

The first issue is addressed by manually clearing
`_py_funcs_used_in_graph`, while for the second I temporarily pin the
supported `tensorflow` version to `<2.10`, awaiting the issue to be
fixed upstream. The pin also avoids backward compatibility problems that
start to appear in `2.14` and prevent the pretrained checkpoint from
being loaded (see #67).
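
One way to check for this kind of leak in your own loop is to compare traced memory before and after many iterations. The sketch below uses the standard-library `tracemalloc` module with two made-up step functions; in practice `step` would be a call that encodes a batch of molecules. Note that `tracemalloc` only sees allocations made through Python's allocator, so a leak inside native TensorFlow code would need a process-level measurement instead.

```python
import tracemalloc


def python_memory_growth(step, iterations):
    """Return the growth in traced Python memory (bytes) over repeated calls."""
    tracemalloc.start()
    try:
        step()  # warm-up, so one-time allocations are not counted as growth
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            step()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


# Hypothetical stand-ins for an encoding call: one retains memory, one does not.
_retained = []


def leaky_step():
    _retained.append(bytearray(10_000))  # reference kept forever


def steady_step():
    bytearray(10_000)  # released immediately


leak_growth = python_memory_growth(leaky_step, 200)
steady_growth = python_memory_growth(steady_step, 200)
print(f"leaky: {leak_growth} bytes, steady: {steady_growth} bytes")
```

Growth that scales with the number of iterations suggests something is being retained across calls, while a flat result only rules out Python-level leaks.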
@kmaziarz
Collaborator

kmaziarz commented Jan 4, 2024

Closing as we now require tensorflow<2.10 (due to memory leaks that got introduced in 2.10).

@kmaziarz kmaziarz closed this as completed Jan 4, 2024