Diagnose & Speed-up Hypergraph tutorials #215

Closed
ninamiolane opened this issue Sep 19, 2023 · 10 comments
@ninamiolane
Collaborator

What?

Testing the tutorials on hypergraphs takes ~15 minutes, whereas testing the tutorials on other domains takes ~2-5 minutes (see screenshot).

There is probably one tutorial on hypergraphs that takes very long and slows down the whole GitHub Actions workflow.

Find out which one and whether it can be accelerated.

Why?

A slow testing workflow slows down all the contributors, who have to wait for all tests to pass before being able to move on.

Image

@devendragovil
Contributor

@ninamiolane

Analysis

I have analyzed the runtime of all unit tests. The hypergraph tutorials do indeed take the longest. Here are the five slowest tests:

| Category | Name | Run time (sec) |
| --- | --- | --- |
| Hypergraph | DHGCN | 208 |
| Hypergraph | Hypersage | 176 |
| Hypergraph | UniGCNII | 81 |
| Hypergraph | UniGCN | 42 |
| Simplicial | Scone | 27 |

My observations:

  1. Individual test times are not outrageous.
  2. The workflow takes a long time because all the tests run sequentially.

Deep Dive (DHGCN Tutorial)

Every step takes a reasonable amount of time (< 5 sec) except the last one, which is a 5-epoch training run for the DHGCN hypergraph TNN.

image
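
For anyone who wants to reproduce this kind of per-step breakdown, here is a minimal sketch of timing individual steps with the standard library. The step names and the work inside each block are hypothetical stand-ins, not the actual DHGCN tutorial code.

```python
import time
from contextlib import contextmanager

timings = {}  # step name -> elapsed wall-clock seconds


@contextmanager
def timed(step_name):
    """Record how long the enclosed block takes under step_name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[step_name] = time.perf_counter() - start


# Hypothetical stand-ins for tutorial steps (data loading, training).
with timed("load_data"):
    data = list(range(1000))

with timed("train"):
    total = sum(x * x for x in data)

# Report the steps from slowest to fastest.
slowest = max(timings, key=timings.get)
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {seconds:.4f} s")
```

Wrapping each notebook step in `timed(...)` makes it easy to see which one dominates the run time.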

Observations

  1. Individual training times seem reasonable to me (please correct me if I am wrong). They might speed up with GPU access.
  2. The environment is built repeatedly, and each tutorial imports the libraries again.
  3. All tests (until recently) were run sequentially.

Recommendations/Solutions

  1. We can arrange a GPU for the test suite. I don't think GitHub Actions provides a runner with a GPU, so we would need to set up and configure our own runner. However, configuring it might be time-consuming, and hosting a GPU instance might be costly.
  2. We can aggressively cache the environment and the imported libraries.
  3. We can run tests in parallel. Libraries such as pytest-xdist and pytest-split enable this. Since standard GitHub-hosted runners have very few cores, we can use the matrix strategy to parallelize across jobs. The tests can also be split by duration into 5-7 (or as many as required) roughly equal-time partitions. Since the longest test takes just over 3 minutes, that is the shortest total time parallelization can achieve without changing the tutorials themselves.
  4. A naive solution: reduce the number of epochs in the tutorials. Reducing the number of epochs in DHGCN from 5 to 1 cuts the time to roughly a fifth, and the DHGCN tutorial finishes within a minute.
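
To illustrate the duration-based splitting in point 3, here is a minimal sketch of a greedy split (longest test first, always into the lightest partition). It is not how pytest-split works internally; the function name is hypothetical, and the durations are only the five from the table above, not the whole suite.

```python
import heapq


def partition_by_duration(durations, n_partitions):
    """Greedily assign tests, longest first, to the currently lightest partition.

    Returns (groups, loads): test names per partition and total seconds per partition.
    """
    heap = [(0, idx) for idx in range(n_partitions)]  # (current load, partition index)
    heapq.heapify(heap)
    groups = [[] for _ in range(n_partitions)]
    loads = [0] * n_partitions
    for name, seconds in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        load, idx = heapq.heappop(heap)  # lightest partition so far
        groups[idx].append(name)
        loads[idx] = load + seconds
        heapq.heappush(heap, (loads[idx], idx))
    return groups, loads


# Run times (sec) of the five slowest tests reported in this thread.
durations = {"DHGCN": 208, "Hypersage": 176, "UniGCNII": 81, "UniGCN": 42, "Scone": 27}
groups, loads = partition_by_duration(durations, 3)
for group, load in zip(groups, loads):
    print(f"{load:>4} s  {group}")
```

With three partitions, the heaviest one contains only DHGCN (208 sec), which shows why the longest single test bounds what parallelization alone can achieve.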

@ninamiolane
Collaborator Author

Excellent, thanks for the very detailed diagnosis. I agree with all your points and the solutions.

iv. I like the naive solution of reducing the number of epochs from 5 to 1, together with a comment in the text explaining that in real applications that number should be increased.
@devendragovil could you do this?

i-iii. These are awesome solutions, but would take more time. Maybe we can deprioritize them for now? (there are a lot of other tasks remaining).
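
As a rough sketch of what option iv could look like in a tutorial, the epoch count can default to 1 with a comment telling readers to raise it. The `TUTORIAL_NUM_EPOCHS` environment variable and the `train_one_epoch` placeholder are assumptions for illustration only; they are not part of the existing tutorials.

```python
import os

# Kept at 1 so the tutorial runs quickly in CI.
# In real applications, this number should be increased.
DEFAULT_NUM_EPOCHS = 1


def get_num_epochs(env=os.environ):
    """Return the epoch count, allowing a hypothetical TUTORIAL_NUM_EPOCHS override."""
    return int(env.get("TUTORIAL_NUM_EPOCHS", DEFAULT_NUM_EPOCHS))


def train_one_epoch(epoch):
    """Placeholder for the tutorial's real training step; returns a fake loss."""
    return 1.0 / (epoch + 1)


num_epochs = get_num_epochs()
losses = [train_one_epoch(epoch) for epoch in range(num_epochs)]
print(f"ran {num_epochs} epoch(s), final loss {losses[-1]:.3f}")
```

An override like this would let the CI run stay short while readers can still train for longer locally.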

@devendragovil
Contributor

@ninamiolane
Yes, I can do this. I can also implement the 3rd solution; I have been working on it independently for some time and should hopefully be able to finish it by Sunday. Would that work?

@devendragovil
Contributor

Independently of this issue, I also wanted to ask whether Sunday is a reasonable target for resolving all of the issues assigned to me (or most of them, in case I get completely stuck on one).

@ninamiolane
Collaborator Author

Even better if you can do iii as well, thanks for offering!

Sunday is a perfect target deadline 💯 Thanks for your great and fast work.

@devendragovil
Contributor

Thanks a lot!

@devendragovil
Contributor

@ninamiolane I fell ill after my travel back from India last week, so couldn't meet the timeline that I gave earlier. Sorry for that! I will try to complete all the issues asap. Thanks a lot for your consideration.

@ninamiolane
Collaborator Author

Thanks for the heads-up, and sorry to hear that you fell ill. Stay safe!

@ninamiolane
Collaborator Author

@devendragovil any update on this?

@devendragovil
Contributor

@ninamiolane Oh, I am really sorry for the late response. I have raised a PR for this issue; run times are now around 5.5-6 minutes. I have been stuck on one remaining change for a long time. It would reduce the overall run time by another 1-1.5 minutes, but this PR already accounts for most of the reduction.
