-
Notifications
You must be signed in to change notification settings - Fork 28
Add support for batched Iteration #102
Conversation
93edac5 to
4b968ac
Compare
aac3f46 to
dda9c6c
Compare
* iter_batches function in Dataset class returns a BatchLoader object * BatchLoader class added * Moved utils.py * Renamed utils.py * Created test_data.py * Cleanup * Fix Typo
dda9c6c to
57325b4
Compare
bfineran
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice refactor @rahul-tuli
going to sync offline about the single input case
Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>
Address:PR review comments
bfineran
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, just one comment for the code. @rahul-tuli for tests/sparsezoo/utils.py can we rename it to helpers.py and then we need to change every instance of from tests.sparsezoo.utils import ... to from tests.sparsezoo.helpers import ...
Fix:Unwrapping Single Input Errors
36e6fc6 to
477191e
Compare
bfineran
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM pending tests!
Fix:Unwrapping Single Input Errors Update:tests_data.py
1c4114b to
f7f3857
Compare
src/sparsezoo/utils/data.py
Outdated
| iterations: int, | ||
| ): | ||
| self._data = data | ||
| self._single_input = type(self._data[0]) is numpy.ndarray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this would still be true if someone passed in a List[numpy.ndarray]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want it to be True in that case, maybe inverting the condition and renaming the variable will make more sense, check latest push.
caveat: this is to distinguish b/w [[np.ndarray]] and [np.ndarray]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we wrap inputs of the form List[np.array] in an extra outer list in the initializer, then unwrap it just before yielding a batch
src/sparsezoo/utils/data.py
Outdated
| if self._single_input: | ||
| batch = batch[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does starting with a single input produce a single batch? this is ignoring the batch_size parameter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It produces more batches, we are unwrapping an extra outer list here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, if the dataset is of numpy arrays, we want to return batches of numpy arrays. this unwraps the array batch from a list as part of other shared logic
mgoin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm thanks Rahul
This pull request adds support for batched iteration in sparsezoo's Dataset class