Digits notebook #149
Conversation
Currently invoked with:
It's taking a very long time to run (over half an hour), even with a small number of epochs.
Codecov Report
@@           Coverage Diff           @@
##             main     #149   +/-   ##
=======================================
  Coverage   88.08%   88.08%
=======================================
  Files          44       44
  Lines        4156     4156
=======================================
  Hits         3661     3661
  Misses        495      495

Continue to review full report at Codecov.
To reduce further
Is it possible to do some profiling to understand the most costly part? We may brainstorm more ideas on Thursday too.
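A minimal sketch of how such profiling could be done with the standard library's cProfile, wrapping whatever call is expensive (the training entry point here is stood in by a cheap function; the helper name is hypothetical):

```python
import cProfile
import io
import pstats


def profile(func, *args, **kwargs):
    """Run func under cProfile and return (result, report of top 10 entries by cumulative time)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args, **kwargs)
    profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


# Example with a cheap stand-in for the expensive training call
total, report = profile(sum, range(1_000_000))
print(report)
```

The report's top cumulative-time entries usually point straight at the costly part, e.g. data loading versus the training loop itself.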
Profiling indicates this takes about an hour to run on my PC as is, so I'll look at reducing the number of classes.
Is the envisioned approach to extract train and test sets from source and target, subset them, then rebuild the DatasetAccess objects for subsequent use, e.g. at
Maybe I'm misunderstanding or not explaining myself well!
Yes, the line above gets the
Oh, this question helps me spot another way to reduce the computational cost, i.e. setting
examples/digits_dann_lightn/main.py
Outdated
target,
config_weight_type=cfg.DATASET.WEIGHT_TYPE,
config_size_type=cfg.DATASET.SIZE_TYPE,
val_split_ratio=0.5,
Better to change it via the yaml config rather than hardcoding it, e.g., cfg.DATASET.VAL_RATIO.
To get it running first, you can then set cfg.DATASET.VAL_RATIO=0.9 in the fast version, while still keeping 0.1 as the default in config.py.
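The default-plus-override pattern being suggested can be sketched as below. The key names mirror the repo's yacs-style cfg, but the helper itself is purely illustrative (the real config.py would use yacs CfgNode and a yaml file):

```python
# Defaults as they might live in config.py (illustrative stand-in for yacs)
DEFAULTS = {"DATASET": {"VAL_RATIO": 0.1}}


def get_cfg(overrides=None):
    """Return a copy of the defaults with dotted-key overrides applied (hypothetical helper)."""
    cfg = {section: dict(values) for section, values in DEFAULTS.items()}
    for dotted_key, value in (overrides or {}).items():
        section, name = dotted_key.split(".")
        cfg[section][name] = value
    return cfg


default_cfg = get_cfg()                          # VAL_RATIO stays at the 0.1 default
fast_cfg = get_cfg({"DATASET.VAL_RATIO": 0.9})   # "fast" notebook override
```

This keeps a conservative default in one place while letting the notebook opt into the faster split without touching main.py.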
Sure. I was confused by it having a different name.
Would DATASET.NUM_REPEAT be another target to reduce?
Sure. I was confused by it having a different name.
What does your "it" refer to? VAL_SPLIT_RATIO? I was hoping to make it more compact, but if that causes confusion, just use cfg.DATASET.VAL_SPLIT_RATIO.
Would DATASET.NUM_REPEAT be another target to reduce?
Oh, I forgot to mention that. That's right, and we are almost there!
That should be the first thing to reduce. Just change it to 1 and you will get a 10x improvement. There is no need to repeat runs for the notebook. Then the notebook should take only minutes to run for 10 classes, and maybe 1 minute for 2 classes.
If you think a flexible subset is good, we can also make the number of classes in the subset a configurable variable.
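A class-subsetting step like the one discussed could be sketched as follows. The helper name and signature are hypothetical, not from the repo; the essential detail is remapping the kept labels to a contiguous 0..K-1 range so downstream loss functions still work:

```python
def subset_by_class(samples, labels, keep_classes):
    """Keep only (sample, label) pairs whose label is in keep_classes,
    remapping labels to 0..len(keep_classes)-1 (hypothetical helper)."""
    remap = {c: i for i, c in enumerate(sorted(keep_classes))}
    kept = [(x, remap[y]) for x, y in zip(samples, labels) if y in remap]
    new_samples = [x for x, _ in kept]
    new_labels = [y for _, y in kept]
    return new_samples, new_labels


# Keep only classes 3 and 7 from a toy dataset; labels become 0 and 1
xs, ys = subset_by_class(["a", "b", "c", "d"], [0, 3, 7, 3], keep_classes={3, 7})
```

The number of kept classes could then be exposed as a config variable (e.g. something like cfg.DATASET.NUM_CLASSES, an assumed name) so the notebook can dial speed against difficulty.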
Got it down to about 5 minutes on my machine. I'll put a notebook together and see how long it takes in myBinder without further modification.
Currently getting this error locally and on myBinder:
Related: Lightning-AI/pytorch-lightning#2547. Colab needs work so that it clones the whole repo.
Did it work previously (when you said it was down to 5 min)?
" progress_bar_refresh_rate=cfg.OUTPUT.PB_FRESH, # in steps\n", | ||
" min_epochs=cfg.SOLVER.MIN_EPOCHS,\n", | ||
" max_epochs=cfg.SOLVER.MAX_EPOCHS,\n", | ||
" checkpoint_callback=checkpoint_callback,\n", |
Is your PyTorch Lightning >= 1.3.0? Could you try replacing checkpoint_callback=checkpoint_callback with callbacks=[checkpoint_callback]? It probably can fix the above AttributeError. Reference: https://pytorch-lightning.readthedocs.io/en/stable/common/weights_loading.html
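A sketch of the change being suggested, assuming PyTorch Lightning >= 1.3 where the checkpoint callback goes into the callbacks list. The import is guarded so the snippet stays illustrative on machines without Lightning installed:

```python
# Assumes PyTorch Lightning >= 1.3; epoch values are placeholders.
try:
    import pytorch_lightning as pl
    from pytorch_lightning.callbacks import ModelCheckpoint

    checkpoint_callback = ModelCheckpoint(monitor="val_loss")
    trainer = pl.Trainer(
        min_epochs=1,
        max_epochs=2,
        callbacks=[checkpoint_callback],  # was: checkpoint_callback=checkpoint_callback
    )
except ImportError:
    trainer = None  # PyTorch Lightning not available in this environment
```

Only the Trainer construction changes; the ModelCheckpoint callback itself is configured the same way as before.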
Thanks @sz144 - I'll check this out.
This has helped and the notebook now runs, with PyTorch Lightning>=1.3.0 specified in setup.py for extras.
@haipinglu we need to check our previous examples for compatibility with the latest PyTorch Lightning.
Or we could pin a specific earlier version.
Ideally, this should be covered by tests rather than manual checks. Of course, the examples are not covered by tests, but I think good tests should do this. The priority should be the API; then, if possible, we could cover the examples. But I do not think that this is the priority now.
If we find any compatibility issues, we can add respective tests to cover them, or discuss how to prevent that from happening in future.
Force-pushed from 8e99097 to 613f46d.
I suspect the tests are currently failing due to data download issues, but I'm not 100% sure.
Resolves #148
I've linked it on the right to automate. See the above.
Thanks @haipinglu:
I'll prioritise cutting down the data and increasing the epochs to get myBinder to perform better, unless I hear from you that you'd prefer me to look again at Colab to deliver a speedup without altering the data, perhaps returning to myBinder afterwards.
Yes, let's get myBinder running first. For Colab, we can deal with it later, maybe learning from some other successful examples to see how they did it. Thanks!
Can't currently get it to work.
I did not see it used by other ML/PyTorch packages, so I think we should stick to Colab and/or Binder, given the two weeks left till the deadline.
@bobturneruk: @mustafa1728 has got the 2-class version done, pending integration. We can discuss later today.
Saw the notebook - looks good @mustafa1728!
I've learned a bit about PyTorch from reading it!
Thanks @bobturneruk. Glad to be of help!
Now runs (at least partially) on both Colab and myBinder without the user needing to comment anything in or out. No progress bar in either (but it works locally). Maybe related: Lightning-AI/pytorch-lightning#1112
Force-pushed from 97960a1 to 87c763f.
@haipinglu - I think this is now pretty close. Remaining issues from my perspective are:
Adding a link to the wiki will be good. Then follow the top summary of the wiki article to describe it: https://en.wikipedia.org/wiki/Domain_adaptation
Much more bearable with Colab. Yes, the progress bar is helpful. No need to profile now.
OK.
I saw the 3-class version got 44% accuracy, above the 33% random baseline. I tried 2 classes: just 55%, only marginally above the 50% random baseline. I tried 2 classes with a 0.1 split: still just 55.7%. We'd better tweak it to get a good accuracy.
Many thanks. We can target merging in today's meeting.
@bobturneruk It will be useful (for us as well as users) to record and display the time taken for training, testing, and overall.
Can you share an example, please, @haipinglu?
https://stackoverflow.com/questions/1557571/how-do-i-get-time-of-a-python-programs-execution
The button, not the timing. Timing is probably a bit different in IPython. Speak soon!
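The timing part requested above could look like the following minimal pattern, based on the linked Stack Overflow approach (time.perf_counter is monotonic and high-resolution); the stage placeholders stand in for the actual training and testing calls:

```python
import time

start = time.perf_counter()
# model training would run here
train_time = time.perf_counter() - start

start = time.perf_counter()
# model testing would run here
test_time = time.perf_counter() - start

total_time = train_time + test_time
print(f"train: {train_time:.2f}s  test: {test_time:.2f}s  total: {total_time:.2f}s")
```

In a notebook cell, the same numbers can also be surfaced with IPython's %time / %%time magics, which is likely what the "a bit different in IPython" remark refers to.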
Associated with #147
Description
The idea is to test the feasibility of adding an interactive notebook, e.g. with myBinder or Google Colab (or both).
Status
Work in progress