Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: dynamic oracle loading (sklearn version) #168

Merged
merged 8 commits into from
Aug 21, 2022
Merged

fix: dynamic oracle loading (sklearn version) #168

merged 8 commits into from
Aug 21, 2022

Conversation

jannisborn
Copy link
Contributor

@jannisborn jannisborn commented Aug 3, 2022

Addresses #163

This is a draft PR with a rough sketch. Let's discuss here :)

  • Draft PR for dynamic loading of oracles dependent on sklearn version.
  • Idea: Determine the sklearn version at runtime. Then, load the correct version of the oracle, i.e., the one that was trained with the sklearn version that the current user has. Disadvantage: Users with different sklearn version will see slightly different predictions for the same Oracle-smiles pair.
  • The idea would be to just append a _sklearn>024 to the filename of the original model on your COS
  • This should be minimally invasive as it is perfectly backwards compatible and does not have side effects

TODO:

  • I do not know the codebase well enough to see whether this sklearn versioning problem might arise in other places of the package as well.

@jannisborn
Copy link
Contributor Author

jannisborn commented Aug 3, 2022

@kexinhuang12345 @futianfan

  • I tried to merge the work from [upgrading oracle] drd2, jnk3, gsk3b oracle is compatible with scikit… #167 into this PR.
  • I debugged with 3 sklearn versions, 0.23.0 (which was stable already before), 0.24.2 (broken up to now) and a very recent release (1.0.2 also broken up to now).
  • It seems functional in all cases, that's really great :)
  • I only debugged with the GSK3 oracle
  • only drawback that I see now is that if a user downloads an oracle and THEN upgrades sklearn, an error will occur. This could be prevented by saving the models locally with different names such as gsk3_legacy and gsk3_current. But not sure it's worth it - up to you
  • I decided to update the model loading logic with the new method load_pickled_model since it removes redundant code

For me, it looks like this is problem is solved. Thanks for the support & maintenannce

@futianfan
Copy link
Collaborator

futianfan commented Aug 20, 2022

@jannisborn
Hi Jannis, i create a new branch that merge #167 (my version) and #168 (your version). Please see details at https://github.com/mims-harvard/TDC/tree/sklearn_merge_168_167
we hope to use two oracle files: jnk3.pkl (<=0.23 sklearn) and jnk3_current.pkl (>0.23 sklearn). It would be great if you can take a look. If you find it is ok, can you use the new code to submit PR again? thanks so much for your contribution!

@jannisborn
Copy link
Contributor Author

Hi @futianfan
Should be done now! you can delete your sklearn_merge_168_167 branch, it's integrated here

@futianfan futianfan merged commit c8f596b into mims-harvard:main Aug 21, 2022
@jannisborn
Copy link
Contributor Author

Thanks! it would be really great if you could release a new version :)
Then we could update our TdC dependency to that version and relax sklearn

@jannisborn jannisborn deleted the sklearn_versioning branch August 21, 2022 08:07
@kexinhuang12345
Copy link
Collaborator

Thanks Jannis! We do plan to release a version in the next week

@jannisborn
Copy link
Contributor Author

HI @kexinhuang12345, any updates on this?

@kexinhuang12345
Copy link
Collaborator

Hi Jannis, sorry for the delay. We were planning to release it with #161 but there is some data upload issue holding the schedule back. Do you need it urgently? if so, we can release it under 0.3.8 today - let me know!

@jannisborn
Copy link
Contributor Author

jannisborn commented Sep 5, 2022

Hi Kexin, thanks for the rapid feedback. No it's not super urgent, we were hoping to integrate some more TdC oracles (the ones affected by this PR) into our library and we're getting a bit behind schedule. I dont want to cause you more work, we just prefer installing from PyPI rather than from source, so having a new release anytime soon would be cool :)

@kexinhuang12345
Copy link
Collaborator

I see that makes sense! how about we will wait for another two days? On Wednesday if the data issue is still not resolved, we will release a new version first. Hope this would work!

@jannisborn
Copy link
Contributor Author

Cool, thanks a lot!

@kexinhuang12345
Copy link
Collaborator

Hey Jannis, a quick heads up, TDC is just updated to 0.3.7!

@jannisborn
Copy link
Contributor Author

Thanks a ton Kexin, much appreciated 👍🏼

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants