-
Notifications
You must be signed in to change notification settings - Fork 269
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rebase for checking the coverage tests #45
Conversation
Just FYI, no need to open a new PR after rebasing. After you've rebased you can force push to your remote by using |
@bartvm I have checked the coverage, the method get_example in Transformer was never tested because Batch was calling get_data all the time instead of get_data_from_example. |
else: | ||
return self.get_data_from_example(request) | ||
|
||
def get_data_from_example(self, request=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bartvm , are the methods get_data_from_example
and get_data_from_batch
a part of the plan? I do not remember those being discussed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this is something I discussed with @mducoffe by e-mail. Let me just copy-paste the rationale. They're really just renames of the methods we had for these reasons:
I've made some comments, but I realised that the naming is awkward though... Right now get_example and get_batch refer to what the output is (as their name suggests) but the difference between get_example and get_batch should really depend on what the input is. Technically speaking you could have a transformer that can take either an example or a batch as an input, but will output a batch in both cases. The methods will still need to be different though.
A very artificial example, but consider a transformer which outputs batches of size N. Its behaviour changes depending on the input, but the output is always the same:
class BatchOfNExamples(Transformer): def get_data_from_batch(self): batch = [] while len(batch) < N: batch += next(self.child_epoch_iterator)[:N - len(batch)] return batch def get_data_from_example(self): return [next(self.child_epoch_iterator) for _ in range(N)]Maybe a good idea would then be to rename get_batch to get_data_from_batch and get_example to get_data_from_example (means you'll have to swap them around for Unpack and Batch), and change the keyword argument from
batch
tobatch_input
to make clear that it refers to what the input is, not to what the output is. (I hope you're familiar with your text editor's renaming function...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should say this is getting quite involved, which names with 3 underscores kind of suggest. I have to run now, but I will keep thinking how to make things simpler here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's really exactly the same as it was! It's just that get_batch
has been renamed to get_data_from_batch
to make clear that get_batch
doesn't return a batch, but that it expects a batch as input. Another naming suggestion is welcome, but I find things like get_from_batch
, from_batch
unclear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is tempting to propose transform_batch
and transform_example
that look nice with the Transformer
name, but those do not reflect that multiple input entries can make one output entry and vice versa.
I do not mind merging it as it to keep workflow fast, but I will create an issue because such names do not really invite to quickly write your own Transformer
.
@bartvm done =) |
|
||
def get_data_from_example(self, request=None): | ||
raise NotImplementedError( | ||
"`{}` does not support examples as inputs, '\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python actually automatically concatenates strings within parentheses, so you can do:
raise NotImplementedError(
"`{}` does not support examples as inputs, "
"but `batch_input` was set to `False`".format(type(self))
)
@bartvm Is it okay for merging ? |
Sorry, I got really confused at one point because I made comments in the old PR and didn't think you addressed them, but looks like you did in this PR... There's some other stuff that seems to have reverted to an old version though :/ |
batch_input : boolean | ||
Specification whether the input stream is on example or batch | ||
|
||
batch : boolean |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how this got reintroduced, but we now have batch_input
and batch
in the docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After that, ping me and I'll press the merge button straight away, I promise!
@bartvm =) |
rebase for checking the coverage tests
No description provided.