Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Notebook tutorial for the bindingdb_deepdta example #165

Merged
merged 40 commits into from Aug 13, 2021

Conversation

bobturneruk
Copy link
Collaborator

Fixes #164.

Description

Adds a notebook tutorial for the bindingdb_deepdta example.

Status

Work in progress

On the right (delete these after selection):

  • Select a reviewer if ready for review. Use the suggested one if unsure.

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).

@bobturneruk bobturneruk added documentation Improvements or additions to documentation work-in-progress Work in progress that should NOT be merged labels Jun 29, 2021
@github-actions github-actions bot added this to In progress in v0.1.0 Jun 29, 2021
@codecov-commenter
Copy link

codecov-commenter commented Jun 29, 2021

Codecov Report

Merging #165 (336f32e) into main (6176de0) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #165   +/-   ##
=======================================
  Coverage   89.35%   89.35%           
=======================================
  Files          46       46           
  Lines        4405     4405           
=======================================
  Hits         3936     3936           
  Misses        469      469           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6176de0...336f32e. Read the comment docs.

@Schobs
Copy link
Member

Schobs commented Jul 15, 2021

RE the speed. Not sure if relevant but it may help. Here is some documentation for the dataloader: https://pytorch.org/docs/stable/data.html. If you go to "Single- and Multi-process Data Loading" section it talks about the num_workers parameter in the DataLoader. If the dataset is big changing this parameter can help.

@bobturneruk
Copy link
Collaborator Author

Thanks @Schobs!

@bobturneruk
Copy link
Collaborator Author

I think this is nearly ready (although performance may need improvement).

Couple of questions:

  • Does this tutorial need a seed to be set? I guess yes.
  • How do we best handle output / logger output?

cc @haipinglu , @pz-white

@bobturneruk
Copy link
Collaborator Author

@peizhenbai
Copy link
Collaborator

peizhenbai commented Jul 27, 2021

Hi @bobturneruk

Sorry for the late reply as I was traveling to UK and self-isolating last week. Regarding your questions:

  • Does this tutorial need a seed to be set? I guess yes.
    Indeed, the seed should be fixed on this example and I see it has been achieved in your code.
  • How do we best handle output/logger output?
    Not sure, do you mean the default TensorBoardLogger should be replaced in the tutorial? I chose the TBLogger as it is good to visualize the training loss at each epoch. Considering it has only several epochs need to run in tutorial, the csv_logger you used is better.

I like the deepdta tutorial, which is very concise and clear! Many thanks and it looks no problem from my side.

@peizhenbai
Copy link
Collaborator

See https://pypi.org/project/rdkit-pypi/

About the RDKit pip package, I remembered that Haiping and I tried to use it on test actions. However, it failed to be built into the environment. The pip package is unavailable and here is an instruction from RDKit author: http://rdkit.blogspot.com/2019/11/why-rdkit-isnt-available-on-pypi.html. If you find it actually works, that would be good and we can improve our test workflow again!

@haipinglu
Copy link
Member

See https://pypi.org/project/rdkit-pypi/

About the RDKit pip package, I remembered that Haiping and I tried to use it on test actions. However, it failed to be built into the environment. The pip package is unavailable and here is an instruction from RDKit author: http://rdkit.blogspot.com/2019/11/why-rdkit-isnt-available-on-pypi.html. If you find it actually works, that would be good and we can improve our test workflow again!

@pz-white I think we discussed rdkit-pypi before. That quoted instruction from the author was in 2019 and rdkit-pypi was released after that. I have looked into it with you months ago.

@bobturneruk
Copy link
Collaborator Author

I suggest we do not install rdkit via pip as it is not available on Windows (yet).

@bobturneruk
Copy link
Collaborator Author

@pz-white - I can't see that a seed is explicitly set - can you advise on where it is set, please, and I'll make a note to help new users?

@bobturneruk
Copy link
Collaborator Author

@pz-white - regarding the logger - our digits example creates csv output:

image

does / should this example do so too?

@peizhenbai
Copy link
Collaborator

peizhenbai commented Jul 29, 2021

Hi @bobturneruk

I suggest we do not install rdkit via pip as it is not available on Windows (yet).

I think so, the rdkit is suggested to be installed by conda-forge.

I can't see that a seed is explicitly set - can you advise on where it is set, please, and I'll make a note to help new users?

Sorry I made a mistake, the seed is not set in this example. It should be explicitly set with kale's set_seed() API, just like you did in digits'.

regarding the logger - our digits example creates csv output.

That's great, I think the csv output is enough and easily understand for new users.

@@ -0,0 +1 @@
tb_logs/*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have ignore in just one under root for compactness.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bobturneruk Did you notice this above comment? I think it is not necessary to create separate gitignore. Is there a reason that this needs to be here rather than in the root?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll fix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@haipinglu
Copy link
Member

haipinglu commented Aug 12, 2021

@bobturneruk Can we have this PR merged today with a note on remaining issues? Otherwise, September will be the next time looking into this since you'll be on leave soon, and that will be too long (e.g. for our proposal and paper). Many thanks.

The priority should be making the Colab work at least.

@bobturneruk
Copy link
Collaborator Author

conda / environment.yml seems to work OK locally (windows 10).

@bobturneruk
Copy link
Collaborator Author

Modify:

!git clone https://github.com/pykale/pykale.git

to

!git clone -b bindingdb_deepdta_tutorial https://github.com/pykale/pykale.git 

to test on collab.

@bobturneruk bobturneruk marked this pull request as ready for review August 12, 2021 13:57
@bobturneruk
Copy link
Collaborator Author

Currently only works locally or on collab.

@haipinglu
Copy link
Member

I have the following error when running the notebook at https://colab.research.google.com/github/pykale/pykale/blob/bindingdb_deepdta_tutorial/examples/bindingdb_deepdta/tutorial.ipynb
What's the possible cause? Don't you have the same error?

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-6-94619e9e84fa> in <module>()
      2 
      3 cfg = get_cfg_defaults()
----> 4 cfg.merge_from_file(cfg_path)
      5 cfg.freeze()
      6 print(cfg)

/usr/local/lib/python3.7/dist-packages/yacs/config.py in merge_from_file(self, cfg_filename)
    209     def merge_from_file(self, cfg_filename):
    210         """Load a yaml config file and merge it this CfgNode."""
--> 211         with open(cfg_filename, "r") as f:
    212             cfg = self.load_cfg(f)
    213         self.merge_from_other_cfg(cfg)

FileNotFoundError: [Errno 2] No such file or directory: './configs/tutorial.yaml'

@bobturneruk
Copy link
Collaborator Author

Does #165 (comment) help?

@haipinglu
Copy link
Member

The above shows that adding notebook test will be useful, but not sure how to resolve the git clone path issue (before merging, the notebook is not in main)

@haipinglu
Copy link
Member

Does #165 (comment) help?

Yes, it helped. No problem now. I need to restart to rerun. Thanks.

Copy link
Member

@haipinglu haipinglu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bobturneruk. I've done one pass and it looks great except that we need @pz-white to confirm that the result (test loss>5) is reasonable and the algorithm is working as expected. Once we confirm that, it will be merged. Enjoy your holiday!

@haipinglu
Copy link
Member

@pz-white Another issue is whether we could make it faster. Now it takes 20mins or so.

@haipinglu haipinglu merged commit cd0ae42 into main Aug 13, 2021
v0.1.0 automation moved this from In progress to Done Aug 13, 2021
@haipinglu haipinglu deleted the bindingdb_deepdta_tutorial branch August 13, 2021 20:31
@github-actions github-actions bot mentioned this pull request Sep 10, 2021
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation work-in-progress Work in progress that should NOT be merged
Projects
No open projects
v0.1.0
  
Done
Development

Successfully merging this pull request may close these issues.

BindingDB DeepDTA Notebook
5 participants