
Help in finetuning generator #21

Closed
gmseabra opened this issue May 17, 2022 · 7 comments · Fixed by #30
Labels: enhancement (New feature or request)

@gmseabra

Hi,

I'm trying an experiment: fine-tuning the generator on a small set of molecules with specific properties, so that it will generate new molecules with similar properties. However, I'm running into some errors that I have been unable to solve. I'd really appreciate it if anyone could shed some light on what I'm doing wrong.

What I'm doing:

  1. Take a set of 10 molecules, split it 80:10:10 into train:valid:test, and put the resulting files into the folder finetune_moler/input.
  2. Run MoLeR in pre-process mode:
    $ molecule_generation preprocess finetune_moler/input finetune_moler/output finetune_moler/trace
  3. Then try to fine-tune the provided pre-trained model on the small set of molecules above:

```
molecule_generation train MoLeR finetune_moler/trace \
    --load-saved-model ./PRETRAINED_MODEL/GNN_Edge_MLP_MoLeR__2022-02-24_07-16-23_best.pkl \
    --load-weights-only \
    --save-dir finetune_moler/tuned_model
```

The pre-process step seems to run just fine. But in the fine-tuning step, I'm getting the following error:

(dumps a lot of informational messages)
Traceback (most recent call last):
  File "/opt/miniconda3/envs/moler-env/bin/molecule_generation", line 33, in <module>
    sys.exit(load_entry_point('molecule-generation', 'console_scripts', 'molecule_generation')())
  File "/home/seabra/work/source/repos/microsoft/molecule-generation/molecule_generation/cli/cli.py", line 35, in main
    run_and_debug(lambda: commands[args.command].run_from_args(args), getattr(args, "debug", False))
  File "/opt/miniconda3/envs/moler-env/lib/python3.10/site-packages/dpu_utils/utils/debughelper.py", line 21, in run_and_debug
    func()
  File "/home/seabra/work/source/repos/microsoft/molecule-generation/molecule_generation/cli/cli.py", line 35, in <lambda>
    run_and_debug(lambda: commands[args.command].run_from_args(args), getattr(args, "debug", False))
  File "/home/seabra/work/source/repos/microsoft/molecule-generation/molecule_generation/cli/train.py", line 140, in run_from_args
    loaded_model_dataset = training_utils.get_model_and_dataset(
  File "/opt/miniconda3/envs/moler-env/lib/python3.10/site-packages/tf2_gnn/cli_utils/model_utils.py", line 319, in get_model_and_dataset
    load_weights_verbosely(trained_model_file, model)
  File "/opt/miniconda3/envs/moler-env/lib/python3.10/site-packages/tf2_gnn/cli_utils/model_utils.py", line 148, in load_weights_verbosely
    K.batch_set_value(tfvar_weight_tuples)
  File "/opt/miniconda3/envs/moler-env/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/opt/miniconda3/envs/moler-env/lib/python3.10/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 911, in assign
    raise ValueError(
ValueError: Cannot assign value to variable ' decoder/node_categorical_features_embedding/categorical_features_embedding:0': Shape mismatch.The variable shape (98, 64), and the assigned value shape (166, 64) are incompatible.

Could someone point me to what I'm doing wrong? Would it be possible to get an example of successfully fine-tuning the model?

Thanks a lot!
Gustavo.

@kmaziarz kmaziarz self-assigned this May 17, 2022
@kmaziarz
Collaborator

The steps you tried are quite reasonable! The missing ingredient is that preprocessing computes various vocabularies of motif/atom types based on the data, and these vocabularies determine the shapes of some layers in the model. Those shapes are already fixed when a pretrained model is loaded, hence the shape mismatch. (Intuitively, the error is saying that the number of motif/atom types found in your fine-tuning dataset is smaller than the number found in the original dataset, which is expected.)
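For intuition, the failure mode can be reproduced in miniature with NumPy (a hedged sketch using the shapes from the traceback above; this is not MoLeR code, just the general mechanism of copying checkpoint weights into a variable whose shape was sized by a different vocabulary):

```python
import numpy as np

EMBEDDING_DIM = 64

# Shapes taken from the traceback (illustrative only): the fine-tuning
# vocabulary yields 98 atom/motif types, while the pretrained checkpoint
# was built with 166.
finetune_embedding = np.zeros((98, EMBEDDING_DIM))
pretrained_weights = np.zeros((166, EMBEDDING_DIM))

try:
    # Analogous to what happens inside K.batch_set_value: assign the
    # checkpoint's weights into the smaller, freshly-built variable.
    finetune_embedding[:] = pretrained_weights
except ValueError as err:
    print(f"Shape mismatch: {err}")
```

Running this prints a broadcasting error, which is the NumPy analogue of the `Cannot assign value to variable` failure above.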

So, one would have to tell preprocessing to use the vocabularies from the pretrained checkpoint instead of computing them afresh. This isn't supported in the current code (we did briefly experiment with fine-tuning, but not enough for it to end up in the release), but it shouldn't be hard to add. I'll hack something together this week and share it with you as a branch; once you've tested it, I can make a PR and merge it into main.

@kmaziarz kmaziarz added the enhancement New feature or request label May 19, 2022
@gmseabra
Author

Oh, I see, I get it now.

Thank you so much!

@kmaziarz
Collaborator

kmaziarz commented May 20, 2022

@gmseabra: can you pull kmaziarz/finetuning, re-install the package from there, and try fine-tuning again?

The only change to the workflow you described is passing --pretrained-model-path when doing preprocessing. However, note that by default molecule_generation train will run validation every 5000 steps and keep training until there is no improvement on the validation dataset. If you're fine-tuning on a small set of molecules, it may make sense to set this interval lower (so that training has a chance to stop before it overfits) and/or limit the total number of such validation rounds. For example, passing

```
--model-params-override '{"num_train_steps_between_valid": 50}' --max-epochs 8
```

means that training will run for some multiple of 50 steps, at most 8 * 50 = 400, but possibly fewer if validation stops improving.
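For concreteness, the step budget implied by those two settings works out as follows (a small sketch; here each "epoch" corresponds to one round of validation):

```python
num_train_steps_between_valid = 50  # from --model-params-override
max_epochs = 8                      # from --max-epochs

# Training pauses for validation every 50 steps; with at most 8 such
# rounds, the hard upper bound on total training steps is:
max_total_steps = num_train_steps_between_valid * max_epochs
print(max_total_steps)  # 400; early stopping may end training sooner
```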

Let me know how this goes!

@kmaziarz
Collaborator

@gmseabra Did you have any luck with fine-tuning?

@kmaziarz kmaziarz linked a pull request Aug 12, 2022 that will close this issue
@MeilinaR

Hi! I've been trying to replicate this example with the steps you provided above, fine-tuning on a small set of 3K molecules, but I still encounter the following error (running everything on Colab now). I took the existing checkpoint and wanted to fine-tune it on the smaller set:

```
!molecule_generation preprocess input output trace \
    --pretrained-model-path --load-saved-model /content/drive/MyDrive/subset_gpu_finetuning/moler/molecule-generation/best_model/GNN_Edge_MLP_MoLeR__2022-02-24_07-16-23_best.pkl
```

```
!molecule_generation train MoLeR trace \
    --model-params-override '{"num_train_steps_between_valid": 50}' --max-epochs 8 \
    --load-saved-model /content/drive/MyDrive/subset_gpu_finetuning/moler/molecule-generation/best_model/GNN_Edge_MLP_MoLeR__2022-02-24_07-16-23_best.pkl \
    --load-weights-only
```

But even when aligning the metadata with what the model was originally trained on (I took the same Guacamol files), it still wouldn't run. This is the error I encountered:

Traceback (most recent call last):
  File "/usr/local/bin/molecule_generation", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/site-packages/molecule_generation/cli/cli.py", line 35, in main
    run_and_debug(lambda: commands[args.command].run_from_args(args), getattr(args, "debug", False))
  File "/usr/local/lib/python3.10/site-packages/dpu_utils/utils/debughelper.py", line 21, in run_and_debug
    func()
  File "/usr/local/lib/python3.10/site-packages/molecule_generation/cli/cli.py", line 35, in <lambda>
    run_and_debug(lambda: commands[args.command].run_from_args(args), getattr(args, "debug", False))
  File "/usr/local/lib/python3.10/site-packages/molecule_generation/cli/train.py", line 140, in run_from_args
    loaded_model_dataset = training_utils.get_model_and_dataset(
  File "/usr/local/lib/python3.10/site-packages/tf2_gnn/cli_utils/model_utils.py", line 319, in get_model_and_dataset
    load_weights_verbosely(trained_model_file, model)
  File "/usr/local/lib/python3.10/site-packages/tf2_gnn/cli_utils/model_utils.py", line 148, in load_weights_verbosely
    K.batch_set_value(tfvar_weight_tuples)
  File "/usr/local/lib/python3.10/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/usr/local/lib/python3.10/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 1022, in assign
    raise ValueError(
ValueError: Cannot assign value to variable ' decoder/node_type_selector/MLP_final_layer/kernel:0': Shape mismatch.The variable shape (256, 142), and the assigned value shape (256, 167) are incompatible.

I'm not sure whether I did something wrong, and where I could fix it!

@kmaziarz
Collaborator

kmaziarz commented Oct 9, 2023

Hi @MeilinaR, sorry for the silence, I forgot to respond to this one. Are you still having issues? I noticed the command you pasted passes --load-saved-model to preprocess; is this a typo?

@kmaziarz kmaziarz reopened this Oct 9, 2023
@kmaziarz
Collaborator

kmaziarz commented Jan 4, 2024

Closing due to lack of activity.

@kmaziarz kmaziarz closed this as completed Jan 4, 2024