WIP: ILSVRC 2010 (a.k.a. ImageNet) #68
Is ImageNet colloquial enough that we could use it as the name of the dataset without causing too much confusion?
Well, as test data for later year competitions become available, we'd like [...] My hope is that a lot of the same code will be applicable to other [...]
@bartvm @vdumoulin I've got this converter into a near-workable state now, and could use some eyes on it to suggest refactorings before I go and try to add unit tests. I've adopted this kwargs-based approach to debug logging that I quite like, and think we could use a similar thing (or maybe a virtually identical thing) for passing information to a progress bar via logging calls.
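For readers who have not opened the diff, here is a minimal sketch of what a kwargs-based debug logging helper could look like. The name `make_debug_logging_function` and the `debug(status='START')` call style appear in the PR; the body below, the `extra` mechanism, and the `ListHandler` class are assumptions for illustration, not the PR's actual implementation.

```python
import logging


def make_debug_logging_function(logger, process_type):
    """Return a `debug(**kwargs)` callable bound to `logger`.

    Hypothetical sketch: keyword arguments are attached to the log record
    through `extra`, so a handler can read them back as attributes.
    """
    def debug(**kwargs):
        message = ' '.join('{}={}'.format(key, value)
                           for key, value in sorted(kwargs.items()))
        logger.debug('%s: %s', process_type, message,
                     extra=dict(kwargs, process_type=process_type))
    return debug


class ListHandler(logging.Handler):
    """Collects records in memory (a progress bar could sit here instead)."""
    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


log = logging.getLogger('ilsvrc2010_sketch')
log.setLevel(logging.DEBUG)
handler = ListHandler()
log.addHandler(handler)

debug = make_debug_logging_function(log, 'MAIN')
debug(status='START')
debug(status='WRITTEN', images=128)
```

Because the keyword arguments travel on the record itself, a handler that drives a progress bar could read something like `record.images` directly rather than parsing the message string.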
You should eventually think of moving the code to [...] In order to support kwargs in the converter itself, we should eventually refactor converter functions to use subparsers like downloader functions do. I can review that if you choose to make it part of your PR.
Oh, never mind, I just saw that this was already on your TODO list.
Yes, though I have a slight refactor I'll propose for the dispatched function, something like

    from argparse import Namespace
    from functools import wraps

    def accepts_namespace(func):
        @wraps(func)
        def wrapped(*args, **kwargs):
            # A lone Namespace argument means we were called as a dispatch
            # function: unpack it into keyword arguments.
            if len(kwargs) == 0 and len(args) == 1 and isinstance(args[0], Namespace):
                return func(**vars(args[0]))
            else:
                return func(*args, **kwargs)
        return wrapped

That way you can write functions like this

    @accepts_namespace
    def foo(a, b, c):
        ...

and use it as a dispatch function for a subparser with matching argument names, while still having a function that can be used programmatically without awkwardly instantiating a namespace and that can be documented more naturally.
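To make the dispatch idea concrete, here is a small self-contained sketch wiring such a decorator to an `argparse` subparser. The `convert` function, the `ilsvrc2010` subcommand, the `--output-filename` flag and the paths are hypothetical placeholders, not the project's actual command line.

```python
import argparse
from argparse import Namespace
from functools import wraps


def accepts_namespace(func):
    @wraps(func)
    def wrapped(*args, **kwargs):
        # A lone Namespace is unpacked into keyword arguments.
        if len(kwargs) == 0 and len(args) == 1 and isinstance(args[0], Namespace):
            return func(**vars(args[0]))
        return func(*args, **kwargs)
    return wrapped


@accepts_namespace
def convert(directory, output_filename):
    """Hypothetical converter entry point."""
    return 'converting {} into {}'.format(directory, output_filename)


parser = argparse.ArgumentParser()
subparsers = parser.add_subparsers()
sub = subparsers.add_parser('ilsvrc2010')
sub.add_argument('directory')
sub.add_argument('--output-filename', default='ilsvrc2010.hdf5')
sub.set_defaults(func=convert)

args = parser.parse_args(['ilsvrc2010', '/data/imagenet'])
func = args.func
del args.func  # the namespace must hold only the function's own arguments
from_cli = func(args)

# The same function still works as a plain Python call.
from_code = convert('/data/imagenet', output_filename='out.hdf5')
```

One wrinkle this exposes: if the dispatch function is stored in the namespace via `set_defaults`, it has to be removed before unpacking, otherwise `func` is passed along as an unexpected keyword argument.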
I like that idea!
4e7023d to 7c9765f
@vdumoulin I've rewritten the history to put this file in fuel/converters/ (git filter-branch is neat). I also realized I hadn't included an entire file.
                        xrange(1000)))

    # Mapping to take ILSVRC2010 (integer) IDs to our internal 0-999 encoding.
    # label_map = dict(zip(synsets['ILSVRC2010_ID'], xrange(1000)))
Is it still useful?
Yep, I haven't pushed in a while but it is necessary for the valid and test
sets (yes, they distribute the ground truth for the valid and test sets
differently).
On Apr 20, 2015 1:39 PM, "vdumoulin" notifications@github.com wrote:
> In fuel/converters/ilsvrc2010.py, #68 (comment):
>
>         The number of images the workers should send to the sink at a
>         time.
>     """
>     debug = make_debug_logging_function(log, 'MAIN')
>     # Read what's necessary from the development kit.
>     devkit_path = os.path.join(directory, DEVKIT_ARCHIVE)
>     synsets, cost_matrix, raw_valid_groundtruth = read_devkit(devkit_path)
>     # Mapping to take WordNet IDs to our internal 0-999 encoding.
>     wnid_map = dict(zip((s.decode('utf8') for s in synsets['WNID']),
>                         xrange(1000)))
>     # Mapping to take ILSVRC2010 (integer) IDs to our internal 0-999 encoding.
>     # label_map = dict(zip(synsets['ILSVRC2010_ID'], xrange(1000)))
>
> Is it still useful?
But it looks like you're building that later on (ilsvrc_id_to_zero_based, lines 123-124).
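As a rough illustration of the two mappings under discussion, the sketch below builds both from a toy synset table. The WordNet IDs, integer IDs, and ground-truth values are made up, and the table has three rows instead of 1000; it assumes, as the thread suggests, that the validation ground truth arrives as integer ILSVRC2010 IDs. Python 3's `range` stands in for `xrange`.

```python
# Toy stand-ins for the devkit's synset columns (hypothetical values).
wnids = [b'n00000003', b'n00000001', b'n00000002']
ilsvrc2010_ids = [2, 3, 1]
num_classes = len(wnids)

# WordNet ID (decoded to text) -> internal zero-based label.
wnid_map = dict(zip((s.decode('utf8') for s in wnids), range(num_classes)))

# ILSVRC2010 integer ID -> internal zero-based label.
label_map = dict(zip(ilsvrc2010_ids, range(num_classes)))

# Ground truth given as ILSVRC2010 integer IDs is translated with label_map.
raw_valid_groundtruth = [1, 1, 3, 2]
valid_labels = [label_map[i] for i in raw_valid_groundtruth]
```

Both dictionaries index into the same row order of the synset table, so a label obtained through either route refers to the same class.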
@vdumoulin A very rough guide to ZeroMQ:
    debug = make_debug_logging_function(log, 'VENTILATOR')
    debug(status='START')
    sender = context.socket(zmq.PUSH)
    sender.hwm = high_water_mark
Should we use the sender.set_hwm setter instead?
hwm is a property which invokes set_hwm anyway, so no real difference.
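The equivalence claimed here is ordinary Python property behaviour. The sketch below uses a stand-in class, not a real ZeroMQ socket, purely to show that assigning to a property and calling its setter take the same code path; `SketchSocket` and its `setter_calls` counter are invented for the illustration.

```python
class SketchSocket(object):
    """Stand-in object for illustration only; not a ZeroMQ socket."""

    def __init__(self):
        self._hwm = 0
        self.setter_calls = 0

    def get_hwm(self):
        return self._hwm

    def set_hwm(self, value):
        self.setter_calls += 1
        self._hwm = value

    # Attribute assignment is routed through set_hwm.
    hwm = property(get_hwm, set_hwm)


sender = SketchSocket()
sender.hwm = 10      # goes through set_hwm
sender.set_hwm(20)   # the explicit form of the same call
```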
Good to know!
@dwf I've gone through a superficial review of docstrings. I'm still reading the code to give it an in-depth review.
        connected on `logging_port`.

    """
    logger = logging.getLogger(__name__)
Why do you need logger as an argument when you immediately assign it here?
Good catch. 37e5f2b.
@vdumoulin I just pushed a pretty major update. Lots of new docstrings, and better factored.
        The number of worker processes to deploy.
    worker_batch_size : int, optional
        The number of images the workers should send to the sink at a
        time.
Add documentation for output_filename.
c99dcaa to c05e7c2
@dwf and I did an offline test converting the ImageNet dataset, and it works! Coverage is down 1.2%, but considering the importance of this PR, I'm ready to merge right away.
WIP: ILSVRC 2010 (a.k.a. ImageNet)
Putting this up so that the interested can have a look. In particular looking for some feedback from @vdumoulin, as well as @bartvm for the right place we could split this up in order to speed it up with multiple processes.