-
Notifications
You must be signed in to change notification settings - Fork 352
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
added sort stream #336
added sort stream #336
Changes from 8 commits
4c3536d
5b9dbe1
1e971d3
2672efc
919bede
8140900
0c3e61f
95fbcd8
dcf8b22
a96dc6c
9f2f5b6
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -314,6 +314,33 @@ def _cache(self): | |
cache.extend(data) | ||
|
||
|
||
class SortMapping(object): | ||
"""Callable class for creating sorting mappings. | ||
|
||
This class can be used to create a callable that can be used by the | ||
:class:`DataStreamMapping` constructor. | ||
|
||
Parameters | ||
---------- | ||
key : callable | ||
The mapping that returns the value to sort on. Its input will be | ||
a tuple that contains a single data point for each source. | ||
reverse : boolean value that indicates whether the sort order should | ||
be reversed. | ||
|
||
""" | ||
def __init__(self, key, reverse=False): | ||
self.key = key | ||
self.reverse = reverse | ||
|
||
def __call__(self, x): | ||
values = [self.key(i) for i in zip(*x)] | ||
indices = [i for (v, i) in | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry of being picky, but sorting indices is not really Pythonic. In Python copying a reference to an object, i.e. def __call__(self, batch):
with_keys = [(example, self.key(example)) for example in zip(*batch)]
with_keys.sort(key=lambda _, key : key)
return [example for example, _ in with_keys] Another thing is precomputing keys. In general it is a nice thing to do, but I quickly checked and seems like def __call__(self, batch):
return list(sorted(zip(*batch), key=self.key, reverse=self.reverse)) Did I miss anything? |
||
sorted(((v, i) for (i, v) in enumerate(values)), | ||
reverse=self.reverse)] | ||
return tuple([[i[j] for j in indices] for i in x]) | ||
|
||
|
||
class BatchDataStream(DataStreamWrapper): | ||
"""Creates minibatches from data streams providing single examples. | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you still need these two assignments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it might be handy for debugging to see which arguments were used to construct the stream but they are not necessary. I could also make the
SortMapping
instance an attribute instead because it contains the same information.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A user can always access it through e.g.
stream.mapping.key
right? Becausestream.mapping
is a class instance in this case, so I'd just get rid of it, but your call.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I think about it, all this class now does is apply the mapper. Perhaps it's cleaner to use the
DataStreamMapping
directly and have a couple of common mappings available in the module. What do you think?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Less SLOCs sounds good to me. We'll just need to make sure to add the mappings to the
See also
section ofDataStreamMapping
or something so that people will find them, but that can go on the lengthy doc to-do list if you want.