-
Notifications
You must be signed in to change notification settings - Fork 7.1k
CORe50 Dataset + reporthook for download in Utils #340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @vlomonaco,
I haven't had time to check the PR fully yet, but instead of adding a report hook function to utils why not use tqdm (if available)?
i.e.
try:
from tqdm import tqdm
except ImportError:
tqdm = lambda x:x
Then you can just wrap the loop in tqdm
You can also add something to the README/docstring about optional packages for torchvision (to enhance usability but not required to use it)
Hi @alykhantejani! Thank you for the feedback! Maybe I misunderstood your suggestion but using Moreover, with just a 10-lines function we can provide download details also to users who don't have |
Will this PR ever get merged? It's been pending for more than a year now... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the long delay in reviewing!
I've made some comments, let me know what you think.
Also, do you think it would be possible to add some tests to the dataset loading, similar to #976 ?
root = self.root | ||
|
||
# Downloading the dataset and filelists | ||
for name in (self.img_size, 'filelists'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have recently added functionality to extract files in datasets/utils.py
, can you use those instead?
'batches_filelists.zip': 'e3297508a8998ba0c99a83d6b36bde62' | ||
} | ||
|
||
def __init__(self, root, check_integrity=True, scenario='ni', train=True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need a check_integrity
argument in the constructor, and it should by default do the integrity check
@@ -18,6 +20,20 @@ def check_integrity(fpath, md5): | |||
return True | |||
|
|||
|
|||
def reporthook(count, block_size, total_size): | |||
global start_time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think start_time
is defined anywhere in this file.
Let's remove this functional altogether for now, and maybe send another PR adding it if necessary?
the test set). | ||
|
||
""" | ||
ntrain_batch = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit confusing, as it's not used (with this name) anywhere in the code, and is referenced as max-batch
in the documentation.
If you want to keep this around, I think you should add some asserts in the __init__
checking that batch
is within ntrain_batch
|
||
if train: | ||
self.fpath = os.path.join( | ||
scenario.upper() + '_' + suffix, 'run' + str(run), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe something like
'{}_{}'.format(scenario.upper(), suffix), 'run{}'.format(run),
'train_batch_{:02d}_filelist.txt'.format(batch)
path = os.path.join(self.root, self.filenames['filelists'][:-4], | ||
self.fpath) | ||
with open(path, 'r') as f: | ||
for i, line in enumerate(f): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the i
is not used. Maybe you could use f.readlines()
instead?
* Update README.md * Update README.md * Update README.md
Hi! This PR would add support to the CORe50 dataset, recently published @ CoRL2017.
I've also added a simple report hook function (credit to Shichao) in the Utils.py file for showing up the status of the download which can be useful for big Datasets.
Note: This implementation it's not super efficient since similar to the ImageFolder class but good for small RAM devices. I plan to update the code with a more efficient pre-loading strategy in the near future.