Notebook tutorial for the bindingdb_deepdta example #165

bobturneruk · 2021-06-29T13:16:54Z

Fixes #164.

Description

Adds a notebook tutorial for the bindingdb_deepdta example.

Status

Work in progress

Types of changes

Non-breaking change (fix or new feature that would not break existing functionality).

bobturneruk · 2021-06-29T13:31:06Z

Colab: https://colab.research.google.com/github/pykale/pykale/blob/bindingdb_deepdta_tutorial/examples/bindingdb_deepdta/tutorial.ipynb

Binder: https://mybinder.org/v2/gh/pykale/pykale/bindingdb_deepdta_tutorial?filepath=examples%2Fbindingdb_deepdta%2Ftutorial.ipynb

codecov-commenter · 2021-06-29T13:41:02Z

Codecov Report

Merging #165 (336f32e) into main (6176de0) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main     #165   +/-   ##
=======================================
  Coverage   89.35%   89.35%           
=======================================
  Files          46       46           
  Lines        4405     4405           
=======================================
  Hits         3936     3936           
  Misses        469      469

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6176de0...336f32e.

Schobs · 2021-07-15T13:30:43Z

Re the speed: not sure if this is relevant, but it may help. Here is the documentation for the DataLoader: https://pytorch.org/docs/stable/data.html. The "Single- and Multi-process Data Loading" section covers the num_workers parameter of the DataLoader. If the dataset is big, changing this parameter can help.
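
A minimal sketch of the num_workers idea, assuming PyTorch is installed. The helper name `make_loader` and the toy dataset are hypothetical and are not taken from the tutorial; whether extra workers actually help depends on how expensive it is to load each item.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(dataset, batch_size: int = 32, num_workers: int = 0) -> DataLoader:
    """Hypothetical helper: build a DataLoader with a configurable worker count.

    num_workers=0 loads batches in the main process; a positive value
    starts that many worker subprocesses when the loader is iterated.
    """
    return DataLoader(dataset, batch_size=batch_size, shuffle=False, num_workers=num_workers)


# Toy stand-in for a real dataset: 100 one-feature samples with dummy targets.
toy = TensorDataset(torch.arange(100, dtype=torch.float32).unsqueeze(1), torch.zeros(100))

main_process_loader = make_loader(toy, num_workers=0)
worker_loader = make_loader(toy, num_workers=2)  # workers are only started on iteration

# Iterate the single-process loader and count the batches it yields.
n_batches = sum(1 for _ in main_process_loader)
print(n_batches)
```

Timing one epoch with each setting is the simplest way to judge whether a larger num_workers is worth it on a given machine.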

bobturneruk · 2021-07-15T13:31:37Z

Thanks @Schobs!

bobturneruk · 2021-07-21T12:03:03Z

I think this is nearly ready (although performance may need improvement).

Couple of questions:

Does this tutorial need a seed to be set? I guess yes.
How do we best handle output / logger output?

cc @haipinglu , @pz-white

bobturneruk · 2021-07-22T13:56:46Z

See https://pypi.org/project/rdkit-pypi/

peizhenbai · 2021-07-27T19:24:11Z

Hi @bobturneruk

Sorry for the late reply; I was travelling to the UK and self-isolating last week. Regarding your questions:

> Does this tutorial need a seed to be set? I guess yes.

Indeed, the seed should be fixed in this example, and I see that your code already does this.

> How do we best handle output/logger output?

Not sure. Do you mean that the default TensorBoardLogger should be replaced in the tutorial? I chose the TBLogger because it is good for visualizing the training loss at each epoch. Since the tutorial only runs for a few epochs, the csv_logger you used is the better choice.

I like the deepdta tutorial, which is very concise and clear! Many thanks; it looks fine from my side.

peizhenbai · 2021-07-27T19:31:12Z

> See https://pypi.org/project/rdkit-pypi/

About the RDKit pip package: I remember that Haiping and I tried to use it in the test actions, but it failed to build into the environment. The pip package is unavailable, and here is an explanation from the RDKit author: http://rdkit.blogspot.com/2019/11/why-rdkit-isnt-available-on-pypi.html. If you find that it actually works, that would be good and we can improve our test workflow again!

haipinglu · 2021-07-27T20:56:01Z

> See https://pypi.org/project/rdkit-pypi/
>
> About the RDKit pip package: I remember that Haiping and I tried to use it in the test actions, but it failed to build into the environment. The pip package is unavailable, and here is an explanation from the RDKit author: http://rdkit.blogspot.com/2019/11/why-rdkit-isnt-available-on-pypi.html. If you find that it actually works, that would be good and we can improve our test workflow again!

@pz-white I think we discussed rdkit-pypi before. The quoted explanation from the author dates from 2019, and rdkit-pypi was released after that. I looked into it with you months ago.

bobturneruk · 2021-07-29T08:55:24Z

I suggest we do not install rdkit via pip as it is not available on Windows (yet).

bobturneruk · 2021-07-29T08:58:40Z

@pz-white - I can't see that a seed is explicitly set - can you advise on where it is set, please, and I'll make a note to help new users?

bobturneruk · 2021-07-29T09:00:37Z

@pz-white - regarding the logger - our digits example creates csv output:

does / should this example do so too?

peizhenbai · 2021-07-29T10:56:09Z

Hi @bobturneruk

> I suggest we do not install rdkit via pip as it is not available on Windows (yet).

I agree; installing rdkit from conda-forge is the suggested route.

> I can't see that a seed is explicitly set - can you advise on where it is set, please, and I'll make a note to help new users?

Sorry, I made a mistake: the seed is not set in this example. It should be set explicitly with kale's set_seed() API, just as you did in the digits example.
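
To illustrate what fixing a seed buys, here is a minimal sketch. The function `set_seed_sketch` is a hypothetical stand-in written for this illustration; it is not kale's set_seed() implementation, and the seed value 2020 is arbitrary. It seeds Python's `random` module and, only if they happen to be installed, NumPy and PyTorch.

```python
import os
import random


def set_seed_sketch(seed: int = 2020) -> None:
    """Hypothetical stand-in for a seed helper (not PyKale's implementation)."""
    # Only affects Python processes started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np  # optional dependency

        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch  # optional dependency

        torch.manual_seed(seed)
    except ImportError:
        pass


def draw(n: int = 3) -> list:
    """Draw n pseudo-random floats from Python's global generator."""
    return [random.random() for _ in range(n)]


set_seed_sketch(2020)
first = draw()
set_seed_sketch(2020)
second = draw()
print(first == second)  # True: re-seeding reproduces the same draws
```

Calling such a helper once at the top of the notebook, before data splitting and model creation, is the usual placement.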

> regarding the logger - our digits example creates csv output.

That's great. I think the csv output is enough and easy for new users to understand.

haipinglu · 2021-08-11T10:10:26Z

examples/bindingdb_deepdta/.gitignore

@@ -0,0 +1 @@
+tb_logs/*


We should keep the ignore rules in a single .gitignore under the root for compactness.

@bobturneruk Did you notice the comment above? I don't think it is necessary to create a separate .gitignore. Is there a reason that this needs to be here rather than in the root?

haipinglu · 2021-08-12T07:39:50Z

@bobturneruk Can we have this PR merged today with a note on remaining issues? Otherwise, September will be the next time looking into this since you'll be on leave soon, and that will be too long (e.g. for our proposal and paper). Many thanks.

The priority should be making the Colab work at least.

bobturneruk · 2021-08-12T09:50:37Z

conda / environment.yml seems to work OK locally (Windows 10).

bobturneruk · 2021-08-12T13:51:30Z

Modify:

!git clone https://github.com/pykale/pykale.git

to

!git clone -b bindingdb_deepdta_tutorial https://github.com/pykale/pykale.git

to test on Colab.

bobturneruk · 2021-08-12T13:57:56Z

Currently only works locally or on Colab.

haipinglu · 2021-08-12T14:16:38Z

I have the following error when running the notebook at https://colab.research.google.com/github/pykale/pykale/blob/bindingdb_deepdta_tutorial/examples/bindingdb_deepdta/tutorial.ipynb
What's the possible cause? Don't you have the same error?

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-6-94619e9e84fa> in <module>()
      2 
      3 cfg = get_cfg_defaults()
----> 4 cfg.merge_from_file(cfg_path)
      5 cfg.freeze()
      6 print(cfg)

/usr/local/lib/python3.7/dist-packages/yacs/config.py in merge_from_file(self, cfg_filename)
    209     def merge_from_file(self, cfg_filename):
    210         """Load a yaml config file and merge it this CfgNode."""
--> 211         with open(cfg_filename, "r") as f:
    212             cfg = self.load_cfg(f)
    213         self.merge_from_other_cfg(cfg)

FileNotFoundError: [Errno 2] No such file or directory: './configs/tutorial.yaml'

bobturneruk · 2021-08-12T14:17:55Z

Does #165 (comment) help?

haipinglu · 2021-08-12T14:32:25Z

The above shows that adding a notebook test would be useful, but I'm not sure how to resolve the git clone path issue (before merging, the notebook is not in main).

haipinglu · 2021-08-12T20:19:41Z

> Does #165 (comment) help?

Yes, it helped. No problem now. I need to restart to rerun. Thanks.
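
The relative path `./configs/tutorial.yaml` in the traceback is resolved against the notebook's current working directory, so it fails whenever the clone or the working directory is not the expected one. A minimal sketch of a guard that reports this more clearly follows; `resolve_cfg_path` is a hypothetical helper, not part of PyKale or yacs, and the temporary directory only stands in for the cloned example folder.

```python
import tempfile
from pathlib import Path


def resolve_cfg_path(cfg_path: str, base_dir: str = ".") -> Path:
    """Hypothetical helper: return an absolute config path or raise a descriptive error."""
    candidate = (Path(base_dir) / cfg_path).resolve()
    if not candidate.is_file():
        raise FileNotFoundError(
            f"Config not found: {candidate} (working directory: {Path.cwd()}). "
            "Check that the intended branch was cloned and that the notebook "
            "changed into the example directory."
        )
    return candidate


# Demonstration in a throwaway directory standing in for the example folder.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "configs").mkdir()
    (Path(tmp) / "configs" / "tutorial.yaml").write_text("# placeholder config\n")

    found = resolve_cfg_path("./configs/tutorial.yaml", base_dir=tmp)
    print(found.name)

    try:
        resolve_cfg_path("./configs/missing.yaml", base_dir=tmp)
    except FileNotFoundError as err:
        message = str(err)
        print("missing config reported")
```

In the notebook, the resolved path could then be passed on as `cfg.merge_from_file(str(resolve_cfg_path(cfg_path)))`.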

haipinglu

Thanks @bobturneruk. I've done one pass and it looks great except that we need @pz-white to confirm that the result (test loss>5) is reasonable and the algorithm is working as expected. Once we confirm that, it will be merged. Enjoy your holiday!

haipinglu · 2021-08-13T12:38:41Z

@pz-white Another issue is whether we could make it faster. It currently takes 20 minutes or so.
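
One common way to shorten a tutorial run is to train on a random subset of the data, which is what the later commit "decrase running time by sampling a subset" names. How the notebook does this is not shown in this thread, so the following is only a sketch under assumptions: `sample_subset_indices`, the dataset size and the fraction are all hypothetical.

```python
import random


def sample_subset_indices(dataset_size: int, fraction: float, seed: int = 0) -> list:
    """Hypothetical helper: pick a reproducible random subset of sample indices."""
    if dataset_size <= 0:
        raise ValueError("dataset_size must be positive")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # local generator, leaves the global state untouched
    k = max(1, int(dataset_size * fraction))
    return sorted(rng.sample(range(dataset_size), k))


indices = sample_subset_indices(dataset_size=1000, fraction=0.25, seed=0)
print(len(indices))  # 250
```

With PyTorch, such indices could be passed to `torch.utils.data.Subset(dataset, indices)` before building the DataLoader.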

bobturneruk added the documentation and work-in-progress labels Jun 29, 2021

github-actions bot added this to In progress in v0.1.0 Jun 29, 2021

bobturneruk force-pushed the bindingdb_deepdta_tutorial branch from 830c820 to 252240d Compare July 9, 2021 11:38

bobturneruk force-pushed the bindingdb_deepdta_tutorial branch from 04085fe to 8c13ea9 Compare July 21, 2021 11:30

bobturneruk force-pushed the bindingdb_deepdta_tutorial branch from 8ca9a3a to 14350ae Compare August 5, 2021 11:16

haipinglu reviewed Aug 11, 2021

bobturneruk added 9 commits August 12, 2021 09:45

8f300ab pip -> conda for rdkit in tutorials
55c21c3 empty notebook
574c793 copy paste error
c5bd850 conda on colab
5101335 increase chance to run on colab
75d5cec better use of conda on colab
ee0e822 colab save problem
044dfcc absolute path
fbf92e8 clone specific branch (for now)
55a88c8 correct colab cd
f94b838 drop myBinder support (for now)

bobturneruk marked this pull request as ready for review August 12, 2021 13:57

bobturneruk requested review from haipinglu and peizhenbai August 12, 2021 13:57

haipinglu added 4 commits August 12, 2021 22:14

ead96d0 Merge branch 'main' into bindingdb_deepdta_tutorial
06a1140 remove local gitignore
d395264 Update notebook description in doc
017738c revise notebook

haipinglu reviewed Aug 12, 2021

haipinglu added 2 commits August 12, 2021 23:16

0a2232b minor correction and refinement
ac0ad80 Syn updates in notebook

peizhenbai and others added 4 commits August 13, 2021 18:43

9fb8b09 decrase running time by sampling a subset
e44cf1b fix seed
b5a2019 Improve description
336f32e More precise description

haipinglu enabled auto-merge August 13, 2021 19:43

haipinglu approved these changes Aug 13, 2021

haipinglu merged commit cd0ae42 into main Aug 13, 2021

v0.1.0 automation moved this from In progress to Done Aug 13, 2021

haipinglu deleted the bindingdb_deepdta_tutorial branch August 13, 2021 20:31

github-actions bot mentioned this pull request Sep 10, 2021

Release 0.1.0rc3 #216

Merged

1 task
