
Dense Feature Normalization Pre-Processing #859

Conversation

rohanpritchard

Summary:
Often, normalising vector inputs can dramatically improve the performance of your model; [this video](https://www.youtube.com/watch?v=UIp2CMI0748) explains why. I have also found that when training my models on un-normalised data, the confidence scores of my labels end up being exactly 1 or 0 with no values in between; normalising my data usually fixes this.

This diff adds a config option to perform vector normalisation via *(x - mean) / stddev* for the `FloatListTensorizer`, and exports the avgs/stddevs metadata through to the TorchScript forward function for `DocModel`, so that fresh data at inference time can also be normalised. The default config option is False, so no current model configs should be affected.
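
For concreteness, here is a minimal sketch of the transform itself: per-dimension standardisation with statistics collected from the training data. The helper names are hypothetical illustrations, not PyText's actual `FloatListTensorizer` API.

```python
import torch

# Hypothetical helpers illustrating (x - mean) / stddev standardisation;
# not PyText's actual FloatListTensorizer implementation.

def fit_normalizer(train_features: torch.Tensor):
    """Collect per-dimension mean and stddev from the training data."""
    mean = train_features.mean(dim=0)
    stddev = train_features.std(dim=0)
    # Clamp so constant columns cannot cause division by zero.
    stddev = stddev.clamp(min=1e-6)
    return mean, stddev

def normalize(x: torch.Tensor, mean: torch.Tensor, stddev: torch.Tensor) -> torch.Tensor:
    """Apply (x - mean) / stddev element-wise."""
    return (x - mean) / stddev

# Example: two dense features on very different scales.
train = torch.tensor([[1.0, 200.0], [3.0, 400.0], [5.0, 600.0]])
mean, stddev = fit_normalizer(train)
print(normalize(train, mean, stddev))  # columns become roughly zero-mean, unit-variance
```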

The test plan below uses two different text + dense_feature models (an unnormalised/default model and a normalised model); the latter performs considerably better in this context.
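
To illustrate the inference-time half, here is a hedged sketch of one way statistics can travel with an exported model: registering them as buffers so they survive `torch.jit.script` and are applied inside `forward`. This is a stand-alone illustration of the idea, not the actual `DocModel` export path.

```python
import torch

class NormalizingWrapper(torch.nn.Module):
    """Illustrative module: bake training-set stats into the exported model."""

    def __init__(self, mean: torch.Tensor, stddev: torch.Tensor):
        super().__init__()
        # Buffers are serialised with the module and survive scripting,
        # so the training statistics are available at inference time.
        self.register_buffer("mean", mean)
        self.register_buffer("stddev", stddev)

    def forward(self, dense_feats: torch.Tensor) -> torch.Tensor:
        return (dense_feats - self.mean) / self.stddev

scripted = torch.jit.script(
    NormalizingWrapper(mean=torch.tensor([3.0, 400.0]), stddev=torch.tensor([2.0, 200.0]))
)
print(scripted(torch.tensor([[5.0, 600.0]])))  # tensor([[1., 1.]])
```

Keeping the statistics inside the model, rather than in a side config, means fresh data at inference time is normalised exactly as the training data was.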

Differential Revision: D16357113

@facebook-github-bot added the CLA Signed label Jul 31, 2019
rohanpritchard pushed a commit to rohanpritchard/pytext that referenced this pull request Jul 31, 2019
rohanpritchard pushed a commit to rohanpritchard/pytext that referenced this pull request Jul 31, 2019
rohanpritchard pushed a commit to rohanpritchard/pytext that referenced this pull request Aug 1, 2019
rohanpritchard pushed a commit to rohanpritchard/pytext that referenced this pull request Aug 3, 2019
rohanpritchard pushed a commit to rohanpritchard/pytext that referenced this pull request Aug 5, 2019
@facebook-github-bot
Contributor

This pull request has been merged in 47a6843.
