Add pyre support to torchrec repo #10
Conversation
fbshipit-source-id: 18eb1fdcd8d0571b03cc80ffe3ac866486871ba3
Summary:
The current torchrec lint has issues, which are exposed by the recent diff D30597438.
For example, the current BaseEmbedding in fbcode/torchrec/distributed/embedding_lookup.py has no __init__ function, which causes a KeyError exception. This failure is not caught by the recent fix in D30597438, so the lint fails with:
Error (TORCHRECDOCSTRING) lint-command-failure
Command `python3 fbcode/torchrec/linter/module_linter.py @{{PATHSFILE}}`
failed.
Run `arc lint --trace --take TORCHRECDOCSTRING` for the full command
output.
Oncall: torchrec
The Sandcastle test also shows the following error:
{F659615618}
We submit two changes (sketched below):
1. Resolve the function names "__init__" and "forward" when they are not in the function list.
2. Catch all remaining exceptions, except SyntaxError.
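For illustration, a minimal sketch of the kind of handling the two changes describe; the function names and structure here are assumptions, not the actual module_linter.py code:
```python
import ast
from typing import Optional


def find_function(tree: ast.AST, function_name: str) -> Optional[ast.FunctionDef]:
    """Return the FunctionDef named `function_name`, or None when a class
    simply does not define it (e.g. BaseEmbedding without __init__)."""
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == function_name:
            return node
    return None


def lint_source(source: str) -> None:
    # A genuine SyntaxError should still fail loudly, so it is not caught here.
    tree = ast.parse(source)
    for name in ("__init__", "forward"):
        try:
            node = find_function(tree, name)
            if node is None:
                continue  # nothing to check for a function the module doesn't define
            # ... docstring checks on `node` would run here ...
        except Exception as exc:  # everything except SyntaxError is reported, not raised
            print(f"docstring check for {name} failed: {exc}")
```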
Reviewed By: zertosh
Differential Revision: D30722426
fbshipit-source-id: 5f11110af039f2fa7bc3f63902739f0e6ea5e287
Summary:
* Added the ads 2021h1 model.
* Added a random-data launcher unit test (run locally).
* Refactored the torchrec DDP init for modules with no parameters that need gradients (we have a pure buffer-based module for ads calibration).
* Source model: https://www.internalfb.com/code/fbsource/[9f3e1042dd2d]/fbcode/hpc/models/ads/ads_1x_2021h1.py

Reviewed By: xing-liu
Differential Revision: D30076019
fbshipit-source-id: 568205e0c4fa6e60eaf4c9e94946acad5d8578e5
Summary: The current design exposes the invoke_on_rank_and_broadcast_result(...) call to users, which is not very user friendly. Reviewed By: divchenko Differential Revision: D30733086 fbshipit-source-id: 6824d2ecfb9fc149c3cb7fc095d7f9ac96ba4ed1
Summary: WarmupOptimizer needs to update underlying optimizer states; bug was introduced in refactor of CombinedOptimizer D30176405 Reviewed By: divchenko Differential Revision: D30755047 fbshipit-source-id: 122038e6a4c7bc73cc859ed8cffa68e2b9841a63
Summary: Generalize the regroup method introduced in D30044807. Will switch out the usage in ig_clips_tab_pt.py in a follow-up diff. Reviewed By: divchenko Differential Revision: D30375713 fbshipit-source-id: 6eb37c4f547db04d0048134187bb7aa0657bb9cf
Summary: Use torch SiLU instead Reviewed By: colin2328 Differential Revision: D30700094 fbshipit-source-id: b4a92e971769b9f7be739264869cee176f55f5e9
Summary: These modules promote a non-functional style of modeling. Reviewed By: wx1988 Differential Revision: D30701381 fbshipit-source-id: fedc510366e5a10e87b6ab71ac12204c5b91b45d
Summary: Pull Request resolved: #1
* Move it to the ml_foundation folder before further performance testing.
* Make it non-lazy.
* Add numerical testing.

Reviewed By: divchenko
Differential Revision: D30756661
fbshipit-source-id: e2c50848bec12943951476d23991a6f586916487
Summary: Prior to this diff, we used a fixed param key "embedding_bags" in embedding_lookup.py. This diff moves the code into the embedding module sharders so we can use different keys for different embedding modules. Reviewed By: divchenko Differential Revision: D30801536 fbshipit-source-id: ef04bd0b727139829bc6879555dfe819422b3884
Summary: This diff integrates with ShardedTensor in PyTorch distributed according to the plan/discussions in https://fb.quip.com/fwucARGO5SeO We are doing the first part of the integration, in which we replace the ShardedTensor/Metadata definitions in torchrec with the ones defined in PyTorch distributed. A second part of the integration might be more involved: we need to accommodate fbgemm kernels so they can take a sharded tensor and do the computation, then switch to a mode where a ShardedModule contains a sharded weight/tensor directly, instead of multiple small nn.EmbeddingBags. Reviewed By: YazhiGao Differential Revision: D29403713 fbshipit-source-id: 279643bd01261ae564238b9dea9d2af5597342c2
Summary: Support Copies of Data Reviewed By: YazhiGao Differential Revision: D30262094 fbshipit-source-id: 33a32245afbc419436c1902ba32020ebb4c133e7
… on AWS cluster.
Summary:
# Context
* Inside fbcode, we don't need to worry much about how to use torchrec. It's as simple as running `import torchrec` and letting autodeps figure out how to add the relevant buck target.
* In OSS, where there is no buck, we still need to be able to run `import torchrec`, and we want to do this in a way that is independent of where we call our Python script (`python3 ~/example_folder/example_folder/.../my_torchrec_script.py`). That is, we don't want to have to keep `my_torchrec_script.py` at the same level as the torchrec repo just so we can call `import torchrec`, since that won't work when `my_torchrec_script.py` cannot easily be co-located with the torchrec repo (e.g. torchrec STL app scripts):
```
random_folder
|_______________repos/
|_______torchrec/
|_______my_torchrec_script.py
```
# This Diff
The way to allow us to run `import torchrec` anywhere is to make a `setup.py` for torchrec which allows us to install torchrec with `python setup.py install`. This diff adds a **minimum viable version** of the setup.py that is **just good enough to unblock TorchRec external scale validation work on AWS clusters**. If you look at the setup.py for other domain libraries, they are way more complicated (e.g. [torchvision setup.py](https://fburl.com/zqef7peu)) and we will eventually upgrade this setup.py so it is more sophisticated for the official OSS release.
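A bare-bones setup.py of the kind described could look roughly like the sketch below; the version string, excludes, and metadata are placeholders, not the actual contents of this diff:
```python
# setup.py -- minimal sketch only; the real file in the diff may differ.
from setuptools import find_packages, setup

setup(
    name="torchrec",
    version="0.0.1",  # placeholder version
    description="PyTorch domain library for recommendation systems",
    packages=find_packages(exclude=("*tests",)),  # picks up the torchrec package
    python_requires=">=3.7",
)
```
After `python setup.py install`, `import torchrec` resolves from site-packages regardless of where the calling script lives.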
Reviewed By: colin2328
Differential Revision: D30839689
fbshipit-source-id: 9ac7722eaf8685e5d7a6b7f422ae3c91991d49c6
Summary: Assert integer types for JT & KJT lengths and offsets; the tensor data type is now checked in the JT class, and the KJT class was already covered. Reviewed By: dstaay-fb Differential Revision: D30842080 fbshipit-source-id: cf78edfffabb30f664951bfe35cf7b665df18e7c
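A minimal sketch of the kind of integer-dtype check described, with an illustrative helper name rather than the actual JaggedTensor code:
```python
import torch


def _assert_int_dtype(tensor: torch.Tensor, name: str) -> None:
    # Lengths and offsets index into the values tensor, so they must be integers.
    assert tensor.dtype in (torch.int32, torch.int64), (
        f"{name} must be int32 or int64, got {tensor.dtype}"
    )


lengths = torch.tensor([2, 0, 1], dtype=torch.int64)
_assert_int_dtype(lengths, "lengths")            # passes
# _assert_int_dtype(lengths.float(), "lengths")  # would raise AssertionError
```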
…ollection, and GroupedPooledEmbeddingsLookup

Summary: All nn.Modules should be able to self.load_state_dict(self.state_dict()). The current EmbeddingBag modules cannot, and DMP itself cannot. This diff mirrors the state_dict() customization in load_state_dict() to undo it and keep the two consistent. It also adds a DMP test for this.

Reviewed By: divchenko, rkindi
Differential Revision: D30820466
fbshipit-source-id: 181ee3484aac6c348b6bb15dc59494c188b2e89c
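The property being enforced can be illustrated with a stand-in module; this is a sketch of the invariant, not the DMP test added in the diff:
```python
import torch
import torch.nn as nn


class TinyModel(nn.Module):
    """Stand-in for DMP / an EmbeddingBag module, used only to show the check."""

    def __init__(self) -> None:
        super().__init__()
        self.emb = nn.EmbeddingBag(num_embeddings=10, embedding_dim=4)

    def forward(self, ids: torch.Tensor, offsets: torch.Tensor) -> torch.Tensor:
        return self.emb(ids, offsets)


model = TinyModel()
# A module's own state_dict() must load back into the same module cleanly.
missing, unexpected = model.load_state_dict(model.state_dict())
assert not missing and not unexpected
```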
Summary: Add LayerNorm's non-lazy version. Keep the current usage of the lazy LayerNorm in Video and IG as is.
* Add a non-lazy version of LayerNorm.
* Rename the TorchRec LayerNorm to MCLayerNorm and LazyMCLayerNorm.
* Move MCLayerNorm and LazyMCLayerNorm into the torchrec/fb/module folder.
* Add a numerical unit test.
* Add a lazy vs. non-lazy numerical unit test.
* Fix the adopting call sites.

Reviewed By: divchenko
Differential Revision: D30828204
fbshipit-source-id: db722abef965622829489c60a7e5866178343814
Summary: update KJTA2A docstring, provide _recat example Reviewed By: colin2328 Differential Revision: D30877670 fbshipit-source-id: 50eca883d0c49df0738837d682c7179332c88627
…le workers.

Summary:
# Context
DataLoader can be used with multiple workers/processes to increase throughput. Map-style datasets (due to having a length property and keyed samples) automatically ensure that samples from the dataset are not duplicated across the multiple workers. However, for IterDataPipes (stream-style datasets), we must manually coordinate the workers so they don't duplicate samples ([see relevant PyTorch docs here](https://pytorch.org/docs/stable/data.html#torch.utils.data.IterableDataset)). Criteo is a torchrec IterDataPipe that does not currently have logic to prevent duplicate samples.
# This Diff
* Adds support for Criteo to handle multiple workers without duplicating samples across workers, following the PyTorch [docs](https://pytorch.org/docs/stable/data.html#torch.utils.data.IterableDataset)' suggestion on how to do this (see the sketch after this commit message).
* Adds some unit tests wrapping the Criteo dataset in DataLoader, showing that multiple workers now work without duplicating data.
# Implementation Details
*How do we split up the input Criteo TSV files across the different workers?* There are a few options I considered. **tldr:** Option 1, used in this diff, is simple and performant. If you want to squeeze additional utilization out of the workers, you can subdivide the TSVs into smaller ones. Option 2 is too wasteful. Option 3 is too complicated and is not as performant as Option 1.
* Option 1 (what this diff does): Each TSV file is assigned to one worker.
  * Pros:
    * Straightforward implementation. Works best when the number of TSV files evenly divides num_workers.
    * All data is read only once.
  * Cons:
    * During validation, if you have just 1 TSV file, only one worker gets to process that file while all other workers are idle.
* Option 2: Every TSV file is read across all the workers, and we drop rows on each worker to prevent duplication.
  * Pros:
    * All workers are utilized even for a single TSV.
  * Cons:
    * Terribly wasteful: each worker reads all of the rows and drops a (num_workers - 1) / num_workers portion of them. Each worker essentially reads in all the data.
* Option 3: Every TSV file is sharded across all the workers. Instead of naively reading all the data as in Option 2, we somehow use IOBase `seek` to chunk the TSV up and assign the chunks to different workers.
  * Pros:
    * All data is only read once (in theory; see cons below).
    * All workers are utilized even for a single TSV.
  * Cons:
    * **Very complicated.** Because each row of the TSV does not use the same number of bytes, when you seek in a TSV file you might end up somewhere in the middle of a row. You might need to drop that row, or do an additional seek to jump back and collect the rest of the row. You may get a performance hit due to the seeking.
    * You can achieve the same effect with better performance (due to the lack of seeks) by subdividing the TSV files into smaller files and using Option 1.

Reviewed By: colin2328
Differential Revision: D30872755
fbshipit-source-id: 85396e8db28f79ed83d62f70fcf991cfd6108216
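A minimal sketch of the Option 1 file-per-worker assignment via torch.utils.data.get_worker_info(); it uses IterableDataset for brevity, whereas the actual Criteo pipe is an IterDataPipe, and the class and argument names here are illustrative:
```python
from typing import Iterator, List

from torch.utils.data import IterableDataset, get_worker_info


class TsvRows(IterableDataset):
    """Yield rows from a list of TSV paths, assigning whole files to workers."""

    def __init__(self, paths: List[str]) -> None:
        super().__init__()
        self.paths = paths

    def __iter__(self) -> Iterator[str]:
        info = get_worker_info()
        if info is None:
            # Single-process loading: this process reads every file.
            paths = self.paths
        else:
            # Option 1: worker k reads files k, k + num_workers, k + 2*num_workers, ...
            paths = self.paths[info.id :: info.num_workers]
        for path in paths:
            with open(path) as f:
                for line in f:
                    yield line.rstrip("\n")
```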
Summary: This diff refactors the MLP-related modules:
* Make the perceptron, MLP, MCMLP, and MCPerceptron modules non-lazy.
* Make MLP an apex.mlp wrapper if apex is available.
* Move MLP (which calls the perceptron) to torchrec/fb/modules.
* Move the MC versions to torchrec/fb/ml_foundation/modules.
* Update the unit tests.
* Update the related call sites.

Reviewed By: wx1988
Differential Revision: D30874769
fbshipit-source-id: 59b0d4d0fcd456ce528de141d1074374f2bde4fd
Summary: This covers a somewhat common data-transform use case where we need to convert from the unpacked format back to the packed format (in particular, this is a dependency for cross-batch sampling). Reviewed By: divchenko Differential Revision: D30890351 fbshipit-source-id: b387f9f67b58c7e7b021fc6fc67bcc9f9be432de
Summary:
1. Lengths in KJTAll2all can be int64.
2. Use the external all_to_all_single(...) API instead of alltoall_base (see the sketch below).

Reviewed By: colin2328, jiaqizhai
Differential Revision: D30925298
fbshipit-source-id: f835454f6dbaec60c8a0bbeceaba2efe25e8ab18
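A rough sketch of what switching to the public collective looks like; it assumes an initialized default process group and evenly divisible splits, and is not the actual KJTAll2all code:
```python
import torch
import torch.distributed as dist


def exchange_lengths(lengths: torch.Tensor) -> torch.Tensor:
    """Exchange per-rank lengths using the public all_to_all_single(...) API.

    int64 lengths are passed through directly; with no split sizes given,
    the tensor is split evenly across ranks.
    """
    assert lengths.dtype == torch.int64
    output = torch.empty_like(lengths)
    dist.all_to_all_single(output, lengths)
    return output
```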
Summary: Pull Request resolved: #2
* Add shard indices, ranks, and sizes in the related config for metadata passing.
* Add CW sharding for per-rank table allocation.
* Many design decisions are captured in https://fb.quip.com/byvkAZGpK1o0

Reviewed By: dstaay-fb
Differential Revision: D30437562
fbshipit-source-id: 0570e431d1ebb128d3d0871681093f95fe56d5f8
Summary: Added unit tests for GradientClippingOptimizer Reviewed By: dstaay-fb Differential Revision: D30876265 fbshipit-source-id: 762567572b712bd9dd40820f07ec21843fe014df
…ules

Summary:
1. Override named_parameters(); the optimizer will use named_parameters() instead (a sketch follows below).
2. Simplify state_dict().

Differential Revision: D30944159
fbshipit-source-id: 7240f5e6188a3ee014f025ec4947032043bb086b
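A minimal sketch of overriding named_parameters() so that an optimizer built from it sees the module's fused weights under a chosen name; the module and attribute names here are illustrative, not the fused embedding module in this diff:
```python
from typing import Iterator, Tuple

import torch
import torch.nn as nn


class FusedEmbeddingModule(nn.Module):
    """Illustrative only: report fused weights through named_parameters()."""

    def __init__(self) -> None:
        super().__init__()
        self._fused_weight = nn.Parameter(torch.randn(10, 4))

    def named_parameters(
        self, prefix: str = "", recurse: bool = True, remove_duplicate: bool = True
    ) -> Iterator[Tuple[str, nn.Parameter]]:
        # Expose the fused weight under an explicit, stable name.
        yield prefix + "embedding_bags.weight", self._fused_weight


module = FusedEmbeddingModule()
opt = torch.optim.SGD((p for _, p in module.named_parameters()), lr=0.1)
```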
Summary: Ensure that rank and device match in ShardMetadata (we cannot assume the device is the same as the device the planner runs on; before this change that could lead to rank:1/cuda:0). Reviewed By: YazhiGao Differential Revision: D31030367 fbshipit-source-id: 54f9de2611170d1a529afe74a4452388b057f818
Differential Revision: D31042728 fbshipit-source-id: 14799576da39297674ad302ca3fb035c436d82cc
Summary: This diff contains the following items:
* Refactor DCN to a non-lazy version (a cross-layer sketch follows below).
* Move DCN to torchrec/modules.
* Add a unit test with numerical testing.

The reasons for not keeping the lazy version are:
* Passing in_features explicitly is a minor change, so the lazy module doesn't save much complexity.
* torchrec/modules is a non-lazy environment.

Reviewed By: yiq-liu
Differential Revision: D31028571
fbshipit-source-id: dececb85889471aad642404d83a5b6faec32d975
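For context, a non-lazy cross layer simply takes in_features explicitly instead of inferring it on the first forward() call; a minimal sketch of the idea, not the torchrec DCN implementation:
```python
import torch
import torch.nn as nn


class CrossLayer(nn.Module):
    """Non-lazy: in_features is an explicit constructor argument."""

    def __init__(self, in_features: int) -> None:
        super().__init__()
        self.linear = nn.Linear(in_features, in_features)

    def forward(self, x0: torch.Tensor, x_l: torch.Tensor) -> torch.Tensor:
        # DCN cross interaction: x_{l+1} = x0 * (W x_l + b) + x_l
        return x0 * self.linear(x_l) + x_l


x0 = torch.randn(8, 16)
layer = CrossLayer(in_features=16)
out = layer(x0, x0)  # the first cross layer uses x_l = x0
```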
Summary: Pull Request resolved: #3. Fix tensor placement, where the remote device should receive {rank, local_rank}. Reviewed By: dstaay-fb Differential Revision: D31072120 fbshipit-source-id: b884afce691cac48a74524ca69e55c90e1308b39
Summary: As titled; TWRW doesn't really make sense for gloo/CPU. Reviewed By: rkindi Differential Revision: D31092150 fbshipit-source-id: 0d43c0f68ea049d085c105375c61995285a58f35
Summary: Implement DMP.named_buffers() Differential Revision: D31104124 fbshipit-source-id: 984baf747c3c89b1d0f5ccf4da5d45b57bdf4754
This pull request was exported from Phabricator. Differential Revision: D31941381
Summary: As titled Reviewed By: JadeNie Differential Revision: D31568073 fbshipit-source-id: f31788013c20deae54dab7e7582c39d95ec0ae5c
Summary:
- Calculator: max device cost.
- Placer: support CW; swap the backtrack algo around.
- Enumerator: switch to using name vs. fqn (which is more common in hpc/torchrec configs).

Reviewed By: YLGH
Differential Revision: D32165540
fbshipit-source-id: ad143c176676ddc5145a90698d13689d7b7ba803
Summary: For some reason, in D32055570 (70d9684) the contents of the symlinked files did not show up on GitHub (https://github.com/fairinternal/torchrec/blob/main/third_party/fbgemm_gpu). rkindi, is this the right way to symlink a file from the internal repo? Does this need more config changes? EDIT: Looks like symlinking doesn't work well with GitHub exports. Going to delete the symlink and do this via git submodules. Reviewed By: colin2328 Differential Revision: D32073541 fbshipit-source-id: ef0a4d36cca7259a84279e54884c16b1e2674426
Summary: Currently, verifying whether the installation was successful involves installing STL tasks and Lightning, and is fairly complicated. Apart from the complexity, we shouldn't need a second library to verify that TorchRec is installed. I see a lot of value in having a simpler script to verify that installation succeeded (a sketch follows below).

Reviewed By: rkindi
Differential Revision: D32122481
fbshipit-source-id: b4431433be5a799840d6cf6d4c1a5570f8c15afd
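A smoke-test script of the kind described could be as small as the sketch below; this is an assumption about its shape, not the script added in the diff:
```python
# install_check.py -- minimal sketch: succeeds only if torch and torchrec import.
import torch
import torchrec  # noqa: F401  (importing is the test)

print(f"torch {torch.__version__} imported OK")
print("torchrec imported OK")
```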
Summary: Adding:
- DDR memory in GB
- sorting of sharding types in the shards column
- unused sharding types per rank, for readability across all ranks
- a legend to explain the output
- all logging on >100 lines

Differential Revision: D32215263
fbshipit-source-id: 61b54afa6fef96a3ba4935532c529dc2acc44e9c
Summary: Fix warmup for fused optimizers Reviewed By: divchenko, jiaqizhai, zhangruiskyline Differential Revision: D32229152 fbshipit-source-id: 10c924e5b58a6b29cbfde260e7e8677cfe4e3e90
Differential Revision: D32228418 fbshipit-source-id: 3d995813dc93bc05c1277824cdc8a0e750e2cee9
Summary: kjt.bucketize is only called in row-wise sharding before sparse feature all2all. Move it to embedding_sharding.py so we don't need trim_keys(...) for KJT. Reviewed By: divchenko Differential Revision: D32234942 fbshipit-source-id: 6e921c3aa4676573bfdedbaf3449c8d75f2bb1f8
Summary: tsia Pull Request resolved: #10 Differential Revision: D31941381 Pulled By: joshuadeng fbshipit-source-id: 7a073a67dfbbb1050bbae43e702fa2df42bfdffb
Summary: Pull Request resolved: meta-pytorch/torchrec#54 Pull Request resolved: meta-pytorch/torchrec#10 The original fb dlrm implementation shuffled batches to get their final results. Reviewed By: colin2328 Differential Revision: D34008000 fbshipit-source-id: 811405e02bf36436afcb0fab4873d0947396be6b
Summary: tsia
Differential Revision: D31941381