
Dense Feature Normalization Pre-Processing #859

Conversation

rohanpritchard

Summary:
Often, normalising vector inputs can dramatically improve the performance of your model; [this video](https://www.youtube.com/watch?v=UIp2CMI0748) explains why. I have also found that when training my models on un-normalised data, the confidence scores of my labels end up being exactly 1 or 0 with no values in between; normalising my data usually fixes this.

This diff adds a config option to perform vector normalisation via *(x - mean) / stddev* for the `FloatListTensorizer`, and exports the avgs/stddevs metadata through to the TorchScript forward function for `DocModel`, so that fresh data at inference time can also be normalised. The default config option is False, so no current model configs should be affected.
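
For concreteness, here is a minimal sketch of the transform itself: per-dimension standardisation with statistics collected from the training data. The helper names are hypothetical illustrations, not PyText's actual `FloatListTensorizer` API.

```python
import torch

# Hypothetical helpers illustrating (x - mean) / stddev standardisation;
# not PyText's actual FloatListTensorizer implementation.

def fit_normalizer(train_features: torch.Tensor):
    """Collect per-dimension mean and stddev from the training data."""
    mean = train_features.mean(dim=0)
    stddev = train_features.std(dim=0)
    # Clamp so constant columns cannot cause division by zero.
    stddev = stddev.clamp(min=1e-6)
    return mean, stddev

def normalize(x: torch.Tensor, mean: torch.Tensor, stddev: torch.Tensor) -> torch.Tensor:
    """Apply (x - mean) / stddev element-wise."""
    return (x - mean) / stddev

# Example: two dense features on very different scales.
train = torch.tensor([[1.0, 200.0], [3.0, 400.0], [5.0, 600.0]])
mean, stddev = fit_normalizer(train)
print(normalize(train, mean, stddev))  # columns become roughly zero-mean, unit-variance
```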

The test plan below uses two different text + dense_feature models (an unnormalised/default model and a normalised model); the latter performs considerably better in this context.
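
To illustrate the inference-time half, here is a hedged sketch of one way statistics can travel with an exported model: registering them as buffers so they survive `torch.jit.script` and are applied inside `forward`. This is a stand-alone illustration of the idea, not the actual `DocModel` export path.

```python
import torch

class NormalizingWrapper(torch.nn.Module):
    """Illustrative module: bake training-set stats into the exported model."""

    def __init__(self, mean: torch.Tensor, stddev: torch.Tensor):
        super().__init__()
        # Buffers are serialised with the module and survive scripting,
        # so the training statistics are available at inference time.
        self.register_buffer("mean", mean)
        self.register_buffer("stddev", stddev)

    def forward(self, dense_feats: torch.Tensor) -> torch.Tensor:
        return (dense_feats - self.mean) / self.stddev

scripted = torch.jit.script(
    NormalizingWrapper(mean=torch.tensor([3.0, 400.0]), stddev=torch.tensor([2.0, 200.0]))
)
print(scripted(torch.tensor([[5.0, 600.0]])))  # tensor([[1., 1.]])
```

Keeping the statistics inside the model, rather than in a side config, means fresh data at inference time is normalised exactly as the training data was.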

Differential Revision: D16357113

@facebook-github-bot added the CLA Signed label Jul 31, 2019
rohanpritchard pushed a commit to rohanpritchard/pytext that referenced this pull request Jul 31, 2019
rohanpritchard pushed a commit to rohanpritchard/pytext that referenced this pull request Jul 31, 2019
rohanpritchard pushed a commit to rohanpritchard/pytext that referenced this pull request Aug 1, 2019
rohanpritchard pushed a commit to rohanpritchard/pytext that referenced this pull request Aug 3, 2019
rohanpritchard pushed a commit to rohanpritchard/pytext that referenced this pull request Aug 5, 2019
@facebook-github-bot
Contributor

This pull request has been merged in 47a6843.
