Notebook tutorial for the bindingdb_deepdta example #165
Codecov Report
@@           Coverage Diff           @@
##             main     #165   +/-   ##
=======================================
  Coverage   89.35%   89.35%
=======================================
  Files          46       46
  Lines        4405     4405
=======================================
  Hits         3936     3936
  Misses        469      469

Continue to review full report at Codecov.
Force-pushed from 830c820 to 252240d.
Re the speed: not sure if this is relevant, but it may help. The PyTorch data loading documentation (https://pytorch.org/docs/stable/data.html) has a "Single- and Multi-process Data Loading" section that covers the num_workers parameter of DataLoader. If the dataset is big, changing this parameter can help.
Thanks @Schobs!
Force-pushed from 04085fe to 8c13ea9.
I think this is nearly ready (although performance may need improvement). A couple of questions:
cc @haipinglu, @pz-white
Hi @bobturneruk, sorry for the late reply; I was travelling to the UK and self-isolating last week. Regarding your questions:
I like the deepdta tutorial, which is very concise and clear! Many thanks; I see no problems from my side.
About the RDKit pip package: I remember that Haiping and I tried to use it in the test actions, but it failed to build into the environment. The pip package is unavailable, and here is an explanation from the RDKit author: http://rdkit.blogspot.com/2019/11/why-rdkit-isnt-available-on-pypi.html. If you find that it actually works, that would be good and we can improve our test workflow again!
@pz-white I think we discussed rdkit-pypi before. The explanation you quoted from the author dates from 2019, and rdkit-pypi was released after that. I looked into it with you months ago.
I suggest we do not install rdkit via pip as it is not available on Windows (yet).
@pz-white - I can't see that a seed is explicitly set - can you advise on where it is set, please, and I'll make a note to help new users?
Hi @bobturneruk
I think so; the suggested way to install rdkit is from conda-forge.
Sorry, I made a mistake: the seed is not set in this example. It should be set explicitly with kale's set_seed() API, just as you did in the digits example.
That's great. I think the CSV output is enough and easy for new users to understand.
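For readers unfamiliar with what a seed helper does, here is a minimal sketch. It is a hypothetical stand-in and not kale's set_seed() implementation: the name seed_everything and the default seed value are assumptions for illustration, and NumPy and PyTorch are seeded only if they happen to be installed.

```python
import random


def seed_everything(seed=1000):
    """Hypothetical helper: seed the common RNGs so repeated runs match."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch

        torch.manual_seed(seed)
    except ImportError:
        pass  # PyTorch not installed; nothing to seed
    return seed


# Seeding twice with the same value reproduces the same random draws.
seed_everything(2020)
first = [random.random() for _ in range(3)]
seed_everything(2020)
second = [random.random() for _ in range(3)]
```

Calling a helper like this once at the top of the notebook, before any data splitting or model initialisation, is what makes results comparable between runs.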
Force-pushed from 8ca9a3a to 14350ae.
@@ -0,0 +1 @@
tb_logs/*
We should keep the ignore rules in a single .gitignore under the root for compactness.
@bobturneruk Did you notice the comment above? I don't think it is necessary to create a separate .gitignore. Is there a reason that this needs to be here rather than in the root?
I'll fix.
Fixed.
@bobturneruk Can we have this PR merged today with a note on the remaining issues? Otherwise, since you'll be on leave soon, September will be the next chance to look into this, and that will be too long (e.g. for our proposal and paper). Many thanks. The priority should be at least making the Colab notebook work.
conda /
Modify:
to
to test on Colab.
Currently only works locally or on Colab.
I have the following error when running the notebook at https://colab.research.google.com/github/pykale/pykale/blob/bindingdb_deepdta_tutorial/examples/bindingdb_deepdta/tutorial.ipynb:
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-6-94619e9e84fa> in <module>()
2
3 cfg = get_cfg_defaults()
----> 4 cfg.merge_from_file(cfg_path)
5 cfg.freeze()
6 print(cfg)
/usr/local/lib/python3.7/dist-packages/yacs/config.py in merge_from_file(self, cfg_filename)
209 def merge_from_file(self, cfg_filename):
210 """Load a yaml config file and merge it this CfgNode."""
--> 211 with open(cfg_filename, "r") as f:
212 cfg = self.load_cfg(f)
213 self.merge_from_other_cfg(cfg)
FileNotFoundError: [Errno 2] No such file or directory: './configs/tutorial.yaml'
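The traceback shows a relative path being resolved against the notebook's working directory, which on Colab need not be the example folder. A minimal sketch of a guard that fails earlier with a clearer message is below; the helper name resolve_cfg_path and its base_dir argument are hypothetical and not part of the tutorial or of yacs.

```python
from pathlib import Path


def resolve_cfg_path(cfg_path, base_dir=None):
    """Hypothetical helper: return an absolute config path or raise a clear error."""
    # Resolve relative paths against base_dir, or the current directory by default.
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    candidate = (base / cfg_path).resolve()
    if not candidate.is_file():
        raise FileNotFoundError(
            f"Config file not found: {candidate} "
            f"(working directory: {Path.cwd()}). "
            "Change into the example directory or pass base_dir."
        )
    return candidate
```

In the notebook this could be used as cfg.merge_from_file(str(resolve_cfg_path(cfg_path))), so that a missing file reports both the resolved path and the working directory.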
Does #165 (comment) help?
The above shows that adding a notebook test would be useful, but I'm not sure how to resolve the git clone path issue (before merging, the notebook is not in main).
Yes, it helped. No problem now. I need to restart to rerun. Thanks.
Thanks @bobturneruk. I've done one pass and it looks great except that we need @pz-white to confirm that the result (test loss>5) is reasonable and the algorithm is working as expected. Once we confirm that, it will be merged. Enjoy your holiday!
@pz-white Another issue is whether we could make it faster. It currently takes 20 minutes or so.
Fixes #164.
Description
Adds a notebook tutorial for the bindingdb_deepdta example.
Status
Work in progress
Types of changes