This repository has been archived by the owner on Nov 22, 2022. It is now read-only.
Dense Feature Normalization Pre-Processing #859
Closed
rohanpritchard wants to merge 1 commit into facebookresearch:master from rohanpritchard:export-D16357113
Conversation
facebook-github-bot added the CLA Signed label on Jul 31, 2019
rohanpritchard force-pushed the export-D16357113 branch from 45e3efc to 89fb5e5 on July 31, 2019 13:47
rohanpritchard pushed a commit to rohanpritchard/pytext that referenced this pull request on Jul 31, 2019
rohanpritchard force-pushed the export-D16357113 branch from 89fb5e5 to 0eed49e on July 31, 2019 14:32
rohanpritchard pushed a commit to rohanpritchard/pytext that referenced this pull request on Jul 31, 2019
rohanpritchard force-pushed the export-D16357113 branch from 0eed49e to fc048a7 on August 1, 2019 12:38
rohanpritchard pushed a commit to rohanpritchard/pytext that referenced this pull request on Aug 1, 2019
rohanpritchard force-pushed the export-D16357113 branch from fc048a7 to 8b8939c on August 3, 2019 08:39
rohanpritchard pushed a commit to rohanpritchard/pytext that referenced this pull request on Aug 3, 2019
rohanpritchard force-pushed the export-D16357113 branch from 8b8939c to 4e7e184 on August 5, 2019 12:59
rohanpritchard pushed a commit to rohanpritchard/pytext that referenced this pull request on Aug 5, 2019
rohanpritchard force-pushed the export-D16357113 branch from 4e7e184 to ecc9d17 on August 5, 2019 13:16
This pull request has been merged in 47a6843.
Summary:
Often, normalizing vector inputs can dramatically improve the performance of your model; [this video](https://www.youtube.com/watch?v=UIp2CMI0748) explains why. I have also found that when training my models on un-normalized data, the confidence scores of my labels end up being exactly 1 or 0, with no values in between; normalizing the data usually fixes this.
This diff adds a config option to perform vector normalization via *(x - mean)/stddev* for the FloatListTensorizer, and exports the avgs/stddevs metadata through to the TorchScript forward function for `DocModel`, so that fresh data at inference time can also be normalized. The default config option is False, so no current model configs should be affected.
The test plan below uses two different text + dense_feature models (an unnormalized/default model and a normalized model); the latter performs considerably better in this context.
Differential Revision: D16357113
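
As a rough illustration of what the summary describes (not the actual PyText implementation; `NormalizingModel` and its fields are hypothetical names), here is a minimal sketch of the *(x - mean)/stddev* transform baked into a TorchScript-scriptable forward function, with the per-dimension training statistics stored as buffers so an exported model can normalize fresh data at inference time:

```python
import torch
from torch import nn


class NormalizingModel(nn.Module):
    # Hypothetical sketch, not PyText's actual DocModel: per-dimension
    # z-score normalization (x - mean) / stddev applied inside forward,
    # so the exported TorchScript model normalizes fresh inference data.

    def __init__(self, means: torch.Tensor, stddevs: torch.Tensor,
                 normalize: bool = False):
        super().__init__()
        # Default of False mirrors the PR, so existing configs are unaffected.
        self.normalize = normalize
        # Buffers are saved with the scripted model, carrying the training
        # statistics through to inference time.
        self.register_buffer("means", means)
        self.register_buffer("stddevs", stddevs)

    def forward(self, dense: torch.Tensor) -> torch.Tensor:
        if self.normalize:
            dense = (dense - self.means) / self.stddevs
        return dense


# The tensorizer would accumulate these statistics from the training data.
rows = torch.tensor([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
means = rows.mean(dim=0)
stddevs = rows.std(dim=0).clamp(min=1e-12)  # guard against zero variance

model = torch.jit.script(NormalizingModel(means, stddevs, normalize=True))
print(model(torch.tensor([[3.0, 30.0]])))  # -> tensor([[0., 0.]])
```

Registering the statistics as buffers (rather than plain attributes) means they are serialized with the scripted module, which matches the summary's point that the avgs/stddevs metadata must travel with the exported model.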