Skip to content

fix: dynamic oracle loading (sklearn version)#168

Merged
futianfan merged 8 commits into
mims-harvard:mainfrom
jannisborn:sklearn_versioning
Aug 21, 2022
Merged

fix: dynamic oracle loading (sklearn version)#168
futianfan merged 8 commits into
mims-harvard:mainfrom
jannisborn:sklearn_versioning

Conversation

@jannisborn
Copy link
Copy Markdown
Contributor

@jannisborn jannisborn commented Aug 3, 2022

Addresses #163

This is a draft PR with a rough sketch. Let's discuss here :)

  • Draft PR for dynamic loading of oracles dependent on sklearn version.
  • Idea: Determine the sklearn version at runtime. Then, load the correct version of the oracle, i.e., the one that was trained with the sklearn version that the current user has. Disadvantage: Users with different sklearn version will see slightly different predictions for the same Oracle-smiles pair.
  • The idea would be to just append a _sklearn>024 to the filename of the original model on your COS
  • This should be minimally invasive as it is perfectly backwards compatible and does not have side effects

TODO:

  • I do not know the codebase well enough to see whether this sklearn versioning problem might arise in other places of the package as well.

@jannisborn
Copy link
Copy Markdown
Contributor Author

jannisborn commented Aug 3, 2022

@kexinhuang12345 @futianfan

  • I tried to merge the work from [upgrading oracle] drd2, jnk3, gsk3b oracle is compatible with scikit… #167 into this PR.
  • I debugged with 3 sklearn versions, 0.23.0 (which was stable already before), 0.24.2 (broken up to now) and a very recent release (1.0.2 also broken up to now).
  • It seems functional in all cases, that's really great :)
  • I only debugged with the GSK3 oracle
  • only drawback that I see now is that if a user downloads an oracle and THEN upgrades sklearn, an error will occur. This could be prevented by saving the models locally with different names such as gsk3_legacy and gsk3_current. But not sure it's worth it - up to you
  • I decided to update the model loading logic with the new method load_pickled_model since it removes redundant code

For me, it looks like this is problem is solved. Thanks for the support & maintenannce

@jannisborn jannisborn marked this pull request as ready for review August 3, 2022 18:38
@futianfan
Copy link
Copy Markdown
Collaborator

futianfan commented Aug 20, 2022

@jannisborn
Hi Jannis, i create a new branch that merge #167 (my version) and #168 (your version). Please see details at https://github.com/mims-harvard/TDC/tree/sklearn_merge_168_167
we hope to use two oracle files: jnk3.pkl (<=0.23 sklearn) and jnk3_current.pkl (>0.23 sklearn). It would be great if you can take a look. If you find it is ok, can you use the new code to submit PR again? thanks so much for your contribution!

@jannisborn
Copy link
Copy Markdown
Contributor Author

Hi @futianfan
Should be done now! you can delete your sklearn_merge_168_167 branch, it's integrated here

@futianfan futianfan merged commit c8f596b into mims-harvard:main Aug 21, 2022
@jannisborn
Copy link
Copy Markdown
Contributor Author

Thanks! it would be really great if you could release a new version :)
Then we could update our TdC dependency to that version and relax sklearn

@jannisborn jannisborn deleted the sklearn_versioning branch August 21, 2022 08:07
@kexinhuang12345
Copy link
Copy Markdown
Collaborator

Thanks Jannis! We do plan to release a version in the next week

@jannisborn
Copy link
Copy Markdown
Contributor Author

HI @kexinhuang12345, any updates on this?

@kexinhuang12345
Copy link
Copy Markdown
Collaborator

Hi Jannis, sorry for the delay. We were planning to release it with #161 but there is some data upload issue holding the schedule back. Do you need it urgently? if so, we can release it under 0.3.8 today - let me know!

@jannisborn
Copy link
Copy Markdown
Contributor Author

jannisborn commented Sep 5, 2022

Hi Kexin, thanks for the rapid feedback. No it's not super urgent, we were hoping to integrate some more TdC oracles (the ones affected by this PR) into our library and we're getting a bit behind schedule. I dont want to cause you more work, we just prefer installing from PyPI rather than from source, so having a new release anytime soon would be cool :)

@kexinhuang12345
Copy link
Copy Markdown
Collaborator

I see that makes sense! how about we will wait for another two days? On Wednesday if the data issue is still not resolved, we will release a new version first. Hope this would work!

@jannisborn
Copy link
Copy Markdown
Contributor Author

Cool, thanks a lot!

@kexinhuang12345
Copy link
Copy Markdown
Collaborator

Hey Jannis, a quick heads up, TDC is just updated to 0.3.7!

@jannisborn
Copy link
Copy Markdown
Contributor Author

Thanks a ton Kexin, much appreciated 👍🏼

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants