@s4ayub s4ayub commented Feb 8, 2022

Summary:
Pull Request resolved: #10

The original fb dlrm implementation shuffled batches to get their final results.
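One possible reading of "shuffled batches" is that the order in which batches are visited is randomized while the rows inside each batch stay contiguous. The sketch below illustrates that reading only; the function name `iter_shuffled_batches`, its parameters, and the use of in-memory NumPy arrays are assumptions made for illustration, not the code in this PR.

```python
import numpy as np


def iter_shuffled_batches(dense, sparse, labels, batch_size, seed=0):
    """Yield (dense, sparse, labels) batches in a random order.

    Hypothetical helper: rows inside a batch stay contiguous, only the
    order in which the batches are visited is shuffled.
    """
    n = len(labels)
    # Start offset of every batch; the last batch may be shorter.
    starts = np.arange(0, n, batch_size)
    np.random.default_rng(seed).shuffle(starts)
    for s in starts:
        e = s + batch_size
        yield dense[s:e], sparse[s:e], labels[s:e]
```

Passing a different `seed` per epoch would give a different batch order on each pass over the data.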

Reviewed By: colin2328

Differential Revision: D34008000

…ng [1/n]

Summary:
**Preproc for dlrm inspired by NVIDIA DLRM Preproc:** https://catalog.ngc.nvidia.com/orgs/nvidia/resources/dlrm_for_pytorch/advanced (under dataset guidelines)

- Remap sparse ids to contiguous integers, so that each embedding table can have size `num_categories x emb_dim`.
- Frequency thresholding: if an id shows up fewer than T times, remap it to a value of 1. This helps the model fit on a particular GPU, and it collects all rarely occurring categories into one, since the model would otherwise overfit on them.

Full details of the benefits of this preprocessing:
- NVIDIA/DeepLearningExamples#1062 (comment)
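The two preprocessing steps above can be sketched for a single sparse column as follows. This is a minimal illustration, not the script added in this PR: the function name `remap_sparse_ids` is hypothetical, and leaving index 0 unused (so that frequent ids start at 2) is an assumption made to keep the rare-id value at 1 as described.

```python
from typing import Tuple

import numpy as np


def remap_sparse_ids(ids: np.ndarray, threshold: int) -> Tuple[np.ndarray, int]:
    """Remap one 1-D column of raw ids to contiguous integers.

    Ids seen fewer than `threshold` times all map to 1. Frequent ids map
    to 2, 3, ... in sorted order of the raw id. Index 0 is left unused
    (assumption). Returns the remapped column and the number of
    embedding rows the column needs.
    """
    unique_ids, inverse, counts = np.unique(
        ids, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)  # keep 1-D regardless of NumPy version
    frequent = counts >= threshold
    num_frequent = int(frequent.sum())
    # Default every unique id to the shared "rare" bucket, then give each
    # frequent id its own contiguous slot.
    new_index = np.ones(len(unique_ids), dtype=np.int64)
    new_index[frequent] = np.arange(2, 2 + num_frequent)
    return new_index[inverse], 2 + num_frequent
```

With this layout the embedding table for the column needs `2 + num_frequent` rows instead of one row per raw id, which is where the memory saving comes from.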

Differential Revision: D33998505

fbshipit-source-id: a7a2fb6bcbfbb4ffa347cd3663f3f1a87a56b9aa
Summary: Shuffles the dataset by building the full dataset from the split .npy files, then writes the shuffled dataset back out as split (labels, sparse, dense) .npy files.
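A rough sketch of this kind of shuffle is shown below, assuming the whole dataset fits in memory. The function name `shuffle_split_npy`, the output file naming, and writing a single output file per component are all assumptions for illustration; the actual script in this PR may split its output differently.

```python
import numpy as np


def shuffle_split_npy(dense_paths, sparse_paths, label_paths, out_prefix, seed=0):
    """Concatenate split .npy files, shuffle rows, and save the result.

    Hypothetical helper. The same permutation is applied to dense,
    sparse, and labels so that rows stay aligned across the three files.
    Returns a dict mapping component name to the written file path.
    """
    # Rebuild the full dataset by concatenating the splits in order.
    dense = np.concatenate([np.load(p) for p in dense_paths])
    sparse = np.concatenate([np.load(p) for p in sparse_paths])
    labels = np.concatenate([np.load(p) for p in label_paths])
    if not (len(dense) == len(sparse) == len(labels)):
        raise ValueError("dense, sparse, and labels must have the same row count")

    perm = np.random.default_rng(seed).permutation(len(labels))

    out = {}
    for name, arr in (("dense", dense), ("sparse", sparse), ("labels", labels)):
        out[name] = f"{out_prefix}_{name}.npy"
        np.save(out[name], arr[perm])
    return out
```

Using one shared permutation is the important design point: shuffling each component independently would break the pairing between features and labels.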

Differential Revision: D34007646

fbshipit-source-id: e6c265bf5572619e7b6c9647dc095919c70f76d1
Summary:
Pull Request resolved: meta-pytorch/torchrec#10

The original fb dlrm implementation shuffled batches to get their final results.

Reviewed By: colin2328

Differential Revision: D34008000

fbshipit-source-id: b008c79841d75590f709150156455d1e9a68805a
@facebook-github-bot added the CLA Signed and fb-exported labels Feb 8, 2022
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D34008000

@s4ayub changed the title from "Shuffle training batches [3/n] (#10)" to "Shuffling scripts for preproc and shuffle during training" Feb 8, 2022
@s4ayub changed the title from "Shuffling scripts for preproc and shuffle during training" to "Shuffling scripts for preproc and shuffle during training [dlrm/criteo]" Feb 8, 2022
@s4ayub
Contributor Author

s4ayub commented Feb 8, 2022

The linter warning is unrelated to the code changes.

facebook-github-bot pushed a commit that referenced this pull request Feb 9, 2022
Summary:
Pull Request resolved: #54

Pull Request resolved: #10

The original fb dlrm implementation shuffled batches to get their final results.

Reviewed By: colin2328

Differential Revision: D34008000

fbshipit-source-id: 811405e02bf36436afcb0fab4873d0947396be6b