Shuffling scripts for preproc and shuffle during training [dlrm/criteo] #54

s4ayub · 2022-02-08T22:32:13Z

Summary:
Pull Request resolved: #10

The original fb dlrm implementation shuffled batches to get their final results.

Reviewed By: colin2328

Differential Revision: D34008000

…ng [1/n]

Summary: **Preproc for dlrm inspired by NVIDIA DLRM Preproc:** https://catalog.ngc.nvidia.com/orgs/nvidia/resources/dlrm_for_pytorch/advanced (under dataset guidelines)

- Re-map sparse ids to contiguous integers (`with this you can have an embedding table of size num_categories x emb_dim`)
- Frequency thresholding; if an id shows up less than T times, remap it to a value of 1 (`Fit model on particular GPU`, `Capture all rarely occurring categories into one because otherwise for these categories you would overfit`)

full details of benefits of this preprocessing:

- NVIDIA/DeepLearningExamples#1062 (comment)

Differential Revision: D33998505
fbshipit-source-id: a7a2fb6bcbfbb4ffa347cd3663f3f1a87a56b9aa
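For readers who want to see the two preprocessing steps concretely, here is a minimal NumPy sketch for a single sparse feature column. It is not the code from this commit: the function name `remap_sparse_ids` is hypothetical, and the choice to start frequent ids at 2 (leaving 0 unused) is an assumption made for illustration. Only the "rare ids go to 1" rule comes from the summary above.

```python
import numpy as np


def remap_sparse_ids(ids: np.ndarray, threshold: int) -> np.ndarray:
    """Remap raw ids of one sparse feature to small contiguous integers.

    Ids seen fewer than `threshold` times all map to 1 (shared rare bucket).
    Remaining ids map to 2, 3, ... in sorted raw-id order (an assumption).
    """
    uniques, inverse, counts = np.unique(
        ids, return_inverse=True, return_counts=True
    )
    frequent = counts >= threshold

    # Start with every unique id in the rare bucket, then overwrite the
    # frequent ones with their own contiguous index.
    new_ids = np.ones(len(uniques), dtype=np.int64)
    new_ids[frequent] = np.arange(2, 2 + int(frequent.sum()), dtype=np.int64)

    # `inverse` maps each input position to its slot in `uniques`.
    return new_ids[inverse.reshape(-1)].reshape(ids.shape)


raw = np.array([10, 500, 10, 7, 500, 10, 42])
remapped = remap_sparse_ids(raw, threshold=2)
```

With this scheme the embedding table for the feature needs only as many rows as the largest remapped id plus one, rather than one row per raw id value.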

Summary: Shuffles the dataset by creating the full dataset from the split .npy files. Outputs the shuffled dataset in split format (labels, sparse, dense) .npy files. Differential Revision: D34007646 fbshipit-source-id: e6c265bf5572619e7b6c9647dc095919c70f76d1
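The shuffle described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script from this commit: the function name `shuffle_split_npy` and the `day_{i}_{kind}.npy` file naming are hypothetical, and the sketch loads the whole dataset into memory, which may not be practical for the full Criteo data.

```python
import os

import numpy as np

KINDS = ("dense", "sparse", "labels")


def shuffle_split_npy(in_dir, out_dir, num_splits, seed=0):
    """Concatenate split .npy files, shuffle rows, and write splits back out."""
    os.makedirs(out_dir, exist_ok=True)

    # Rebuild the full dataset, one array per kind.
    full = {
        kind: np.concatenate(
            [
                np.load(os.path.join(in_dir, f"day_{i}_{kind}.npy"))
                for i in range(num_splits)
            ]
        )
        for kind in KINDS
    }

    # One permutation shared by all three arrays keeps rows aligned.
    num_rows = len(full["labels"])
    perm = np.random.default_rng(seed).permutation(num_rows)

    for kind in KINDS:
        shuffled = full[kind][perm]
        for i, part in enumerate(np.array_split(shuffled, num_splits)):
            np.save(os.path.join(out_dir, f"day_{i}_{kind}.npy"), part)
```

The key design point is the single shared permutation: shuffling labels, sparse, and dense arrays independently would break the correspondence between a row's features and its label.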

Summary: Pull Request resolved: meta-pytorch/torchrec#10 The original fb dlrm implementation shuffled batches to get their final results. Reviewed By: colin2328 Differential Revision: D34008000 fbshipit-source-id: b008c79841d75590f709150156455d1e9a68805a
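As a rough illustration of shuffling batches during training, the sketch below visits fixed-size batches in a random order on each pass. It assumes "shuffled batches" means randomizing the order of batches rather than the rows inside them, which the summary does not spell out; the generator name `iter_shuffled_batches` is hypothetical and this is not the code from this commit.

```python
import numpy as np


def iter_shuffled_batches(dense, sparse, labels, batch_size, rng):
    """Yield (dense, sparse, labels) batches in a random batch order.

    Rows inside each batch stay contiguous; only the order in which
    batches are visited changes between calls.
    """
    num_rows = len(labels)
    starts = np.arange(0, num_rows, batch_size)
    rng.shuffle(starts)  # in-place shuffle of the batch start offsets
    for start in starts:
        stop = min(start + batch_size, num_rows)
        yield dense[start:stop], sparse[start:stop], labels[start:stop]


rng = np.random.default_rng(0)
labels = np.arange(10)
dense = labels.reshape(-1, 1).astype(np.float32)
sparse = labels.reshape(-1, 1)
batches = list(iter_shuffled_batches(dense, sparse, labels, batch_size=4, rng=rng))
```

Calling the generator once per epoch with the same `rng` gives a different batch order each time while remaining reproducible from the seed.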

facebook-github-bot · 2022-02-08T22:32:33Z

This pull request was exported from Phabricator. Differential Revision: D34008000

s4ayub · 2022-02-08T22:38:21Z

linter warning is unrelated to code changes

Summary: Pull Request resolved: #54 Pull Request resolved: #10 The original fb dlrm implementation shuffled batches to get their final results. Reviewed By: colin2328 Differential Revision: D34008000 fbshipit-source-id: 811405e02bf36436afcb0fab4873d0947396be6b

s4ayub added 3 commits February 8, 2022 14:32

facebook-github-bot added the CLA Signed and fb-exported labels Feb 8, 2022

s4ayub changed the title ~~Shuffle training batches [3/n] (#10)~~ Shuffling scripts for preproc and shuffle during training Feb 8, 2022

s4ayub changed the title ~~Shuffling scripts for preproc and shuffle during training~~ Shuffling scripts for preproc and shuffle during training [dlrm/criteo] Feb 8, 2022

facebook-github-bot closed this Feb 9, 2022
