Skip to content

Conversation

vlomonaco
Copy link

@vlomonaco vlomonaco commented Nov 23, 2017

Hi! This PR would add support to the CORe50 dataset, recently published @ CoRL2017.

I've also added a simple report hook function (credit to Shichao) in the Utils.py file for showing up the status of the download which can be useful for big Datasets.

Note: This implementation it's not super efficient since similar to the ImageFolder class but good for small RAM devices. I plan to update the code with a more efficient pre-loading strategy in the near future.

Copy link
Contributor

@alykhantejani alykhantejani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @vlomonaco,

I haven't had time to check the PR fully yet, but instead of adding a report hook function to utils why not use tqdm (if available)?

i.e.

try:
    from tqdm import tqdm
except ImportError:
    tqdm = lambda x:x

Then you can just wrap the loop in tqdm

You can also add something to the README/docstring about optional packages for torchvision (to enhance usability but not required to use it)

@vlomonaco
Copy link
Author

vlomonaco commented Dec 1, 2017

Hi @alykhantejani! Thank you for the feedback!

Maybe I misunderstood your suggestion but using tqdm with urllib is not that straightforward. You still have to create a report hook function since there's no external loop in a urllib request (see an example here).

Moreover, with just a 10-lines function we can provide download details also to users who don't have tqdm installed. Your call!

@vlomonaco
Copy link
Author

vlomonaco commented May 30, 2019

Will this PR ever get merged? It's been pending for more than a year now...

Copy link
Member

@fmassa fmassa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the long delay in reviewing!

I've made some comments, let me know what you think.

Also, do you think it would be possible to add some tests to the dataset loading, similar to #976 ?

root = self.root

# Downloading the dataset and filelists
for name in (self.img_size, 'filelists'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have recently added functionality to extract files in datasets/utils.py, can you use those instead?

'batches_filelists.zip': 'e3297508a8998ba0c99a83d6b36bde62'
}

def __init__(self, root, check_integrity=True, scenario='ni', train=True,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need a check_integrity argument in the constructor, and it should by default do the integrity check

@@ -18,6 +20,20 @@ def check_integrity(fpath, md5):
return True


def reporthook(count, block_size, total_size):
global start_time
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think start_time is defined anywhere in this file.

Let's remove this functional altogether for now, and maybe send another PR adding it if necessary?

the test set).

"""
ntrain_batch = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit confusing, as it's not used (with this name) anywhere in the code, and is referenced as max-batch in the documentation.

If you want to keep this around, I think you should add some asserts in the __init__ checking that batch is within ntrain_batch


if train:
self.fpath = os.path.join(
scenario.upper() + '_' + suffix, 'run' + str(run),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe something like

'{}_{}'.format(scenario.upper(), suffix), 'run{}'.format(run), 
'train_batch_{:02d}_filelist.txt'.format(batch)

path = os.path.join(self.root, self.filenames['filelists'][:-4],
self.fpath)
with open(path, 'r') as f:
for i, line in enumerate(f):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the i is not used. Maybe you could use f.readlines() instead?

@pmeier pmeier self-assigned this Apr 8, 2022
rajveerb pushed a commit to rajveerb/vision that referenced this pull request Nov 30, 2023
* Update README.md

* Update README.md

* Update README.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants