
Script to use classifier to predict against test #641

Merged
merged 1 commit into from Jul 27, 2018

Conversation

@nlothian

@sebastianruder

Hi Nick, thanks! This is great! I'm just on the way back from a conference. Will test the code once I'm back.

@Varal7

Varal7 commented Jul 26, 2018

Good idea!
I think you can just use argmax instead of softmax for inference.

@nlothian
Author

@Varal7 this is true.

There's a whole set of questions on the forum along the lines of "what are these numbers that come back as classifier scores" and "why don't they add up to 1". The answer is always softmax, and my thought was to include it to avoid all that confusion.
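A minimal sketch of the point in PyTorch (the logit values below are made up, not output from the real model): argmax of the raw scores picks the same class either way, but softmax makes the scores readable as probabilities.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([-0.74961, 0.71659])  # raw classifier scores

probs = F.softmax(logits, dim=0)  # ~tensor([0.1875, 0.8125]), sums to 1
pred = probs.argmax()             # tensor(1); softmax is monotonic, so
                                  # logits.argmax() gives the same class
```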

@jph00 jph00 merged commit 75e8ae0 into fastai:master Jul 27, 2018
@linbojin

linbojin commented Sep 13, 2018

I think this code doesn't prepend any padding index, so the result will be different from the one computed by constructing a DataLoader.

For my own test:

encoded text: [3, 4, 5, 6, 303, 10, 11, 2, 73, 24, 72, 27, 15, 32, 46, 22, 16, 2, 25, 7]
result: [-0.74961, 0.71659]

encoded text from the DataLoader (prepended with the padding index 1):
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 4, 5, 6, 303, 10, 11, 2, 73, 24, 72, 27, 15, 32, 46, 22, 16, 2, 25, 7]
result: [0.31941, -0.15691]

@nlothian @sebastianruder @jph00 Any idea?
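A minimal sketch of what's likely happening, assuming the encoder is an RNN that doesn't mask padding (the module sizes and names here are illustrative, not the actual fastai model). The pad embedding is a zero vector, but the RNN still steps through the pad positions and updates its hidden state, so the amount of front-padding changes the final state and hence the logits:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
emb = nn.Embedding(400, 20, padding_idx=1)  # pad index 1, as in the batch above
rnn = nn.GRU(20, 30, batch_first=True)

ids = torch.tensor([[3, 4, 5, 6, 303]])
padded = torch.cat([torch.ones(1, 35, dtype=torch.long), ids], dim=1)

_, h_plain = rnn(emb(ids))
_, h_padded = rnn(emb(padded))

# The pad embeddings are zeros, but the RNN still consumes them,
# so the two final hidden states differ:
print(torch.allclose(h_plain, h_padded))  # False
```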

@nlothian
Author

You could be right.

I remember looking into the padding issue a bit, but when I was testing I don't think I found any difference. There's no difference in behavior here between the scripts and the notebook, is there?

I don't have the code to retest that, but if you are seeing a difference then you are likely correct. I'm sure a patch would be welcomed!

@linbojin

The problem is that I can't determine how many 1s I should pad with, and different amounts of padding give different results. It's confusing.

@sebastianruder

Yeah, this is quite weird. I didn't look into this before, as I thought we were already ignoring padding using pack_padded_sequence (see here for an explanation).
In the code, the padding is done here, and the number of padding tokens is the difference between the current document's length and the length of the largest document in the batch (see here).
As a short-term measure for padding individual examples at test time, you could thus use something like the difference to the average document length in the training data.
In the mid-term, we should use pack_padded_sequence and pad_packed_sequence as described here.
cc @jph00 @sgugger
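A minimal sketch of that pattern, with the caveat that pack_padded_sequence expects padding at the end of each sequence, whereas the batches discussed above are padded at the front; the sizes and names here are illustrative, not the fastai implementation:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

PAD_IDX = 1  # the padding token index used above

emb = nn.Embedding(1000, 50, padding_idx=PAD_IDX)
rnn = nn.LSTM(50, 100, batch_first=True)

# Two documents of different lengths, padded (at the end) to the batch maximum.
batch = torch.tensor([
    [3, 4, 5, 6, 303],
    [3, 4, 5, PAD_IDX, PAD_IDX],
])
lengths = torch.tensor([5, 3])  # true lengths, sorted in descending order

packed = pack_padded_sequence(emb(batch), lengths, batch_first=True)
packed_out, (h_n, c_n) = rnn(packed)

# The RNN skips the padding positions entirely, so each document's final
# hidden state no longer depends on how much padding the batch required.
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
```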

@ranihorev

I created a PR to fix this issue.
#882 (comment)

@nlothian
Author

Just bumping this because I think it's pretty important. @ranihorev's proposed changes broke a lot of the language model, so there should be a better way.

@sebastianruder's suggested short-term fix of padding by the difference to the average document length in the training data is difficult too, because that value isn't actually recorded anywhere at the moment.

I follow the pack_padded_sequence post mentioned above up to trick 3, but I need to think about that some more.

What is the approach in FastAI 1.0 here? Is it fixed somehow?
