Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rebase for checking the coverage tests #45

Merged
merged 9 commits into from
Mar 25, 2015

Conversation

mducoffe
Copy link

No description provided.

@bartvm
Copy link
Member

bartvm commented Mar 16, 2015

Just FYI, no need to open a new PR after rebasing. After you've rebased you can force push to your remote by using push -f origin CCW_Transformer.

@mducoffe
Copy link
Author

@bartvm I have checked the coverage, the method get_example in Transformer was never tested because Batch was calling get_data all the time instead of get_data_from_example.
I have a question also : is Cache use example stream only or batch stream is okay ?

else:
return self.get_data_from_example(request)

def get_data_from_example(self, request=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bartvm , are the methods get_data_from_example and get_data_from_batch a part of the plan? I do not remember those being discussed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is something I discussed with @mducoffe by e-mail. Let me just copy-paste the rationale. They're really just renames of the methods we had for these reasons:

I've made some comments, but I realised that the naming is awkward though... Right now get_example and get_batch refer to what the output is (as their name suggests) but the difference between get_example and get_batch should really depend on what the input is. Technically speaking you could have a transformer that can take either an example or a batch as an input, but will output a batch in both cases. The methods will still need to be different though.

A very artificial example, but consider a transformer which outputs batches of size N. Its behaviour changes depending on the input, but the output is always the same:

class BatchOfNExamples(Transformer):
    def get_data_from_batch(self):
        batch = []
        while len(batch) < N:
            batch += next(self.child_epoch_iterator)[:N - len(batch)]
        return batch

    def get_data_from_example(self):
        return [next(self.child_epoch_iterator) for _ in range(N)]

Maybe a good idea would then be to rename get_batch to get_data_from_batch and get_example to get_data_from_example (means you'll have to swap them around for Unpack and Batch), and change the keyword argument from batch to batch_input to make clear that it refers to what the input is, not to what the output is. (I hope you're familiar with your text editor's renaming function...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should say this is getting quite involved, which names with 3 underscores kind of suggest. I have to run now, but I will keep thinking how to make things simpler here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's really exactly the same as it was! It's just that get_batch has been renamed to get_data_from_batch to make clear that get_batch doesn't return a batch, but that it expects a batch as input. Another naming suggestion is welcome, but I find things like get_from_batch, from_batch unclear.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is tempting to propose transform_batch and transform_example that look nice with the Transformer name, but those do not reflect that multiple input entries can make one output entry and vice versa.

I do not mind merging it as it to keep workflow fast, but I will create an issue because such names do not really invite to quickly write your own Transformer.

@mducoffe
Copy link
Author

@bartvm done =)


def get_data_from_example(self, request=None):
raise NotImplementedError(
"`{}` does not support examples as inputs, '\
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python actually automatically concatenates strings within parentheses, so you can do:

raise NotImplementedError(
    "`{}` does not support examples as inputs, "
    "but `batch_input` was set to `False`".format(type(self))
)

@mducoffe
Copy link
Author

@bartvm Is it okay for merging ?

@bartvm
Copy link
Member

bartvm commented Mar 25, 2015

Sorry, I got really confused at one point because I made comments in the old PR and didn't think you addressed them, but looks like you did in this PR... There's some other stuff that seems to have reverted to an old version though :/

batch_input : boolean
Specification whether the input stream is on example or batch

batch : boolean
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how this got reintroduced, but we now have batch_input and batch in the docstring.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After that, ping me and I'll press the merge button straight away, I promise!

@mducoffe
Copy link
Author

@bartvm =)

bartvm added a commit that referenced this pull request Mar 25, 2015
rebase for checking the coverage tests
@bartvm bartvm merged commit 52b3be2 into mila-iqia:master Mar 25, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants