per-group and per-channel quantization #25
Closed
Conversation
Branch force-pushed from 24815db to ba364c6 (Compare)
jspark1105 added a commit to jspark1105/pytorch that referenced this pull request on Nov 26, 2018
Summary:
Pull Request resolved: pytorch#14340
Pull Request resolved: pytorch/FBGEMM#25

Per-group and per-channel quantization in fbgemm. This diff also cleans up explicit template instantiation using macro expansion, and changes the randFill interface, which previously made it easy to mistakenly generate integer random numbers for floating-point vectors. Using this in DNNLOWP operators will be done in a separate diff.

Differential Revision: D13176386
fbshipit-source-id: 3137039d2822e42a16881638d54897d9c8bc75f4
Summary:
Pull Request resolved: pytorch/pytorch#14340
Pull Request resolved: pytorch#25

Per-group and per-channel quantization in fbgemm. This diff also cleans up explicit template instantiation using macro expansion, and changes the randFill interface, which previously made it easy to mistakenly generate integer random numbers for floating-point vectors. Using this in DNNLOWP operators will be done in a separate diff.

Differential Revision: D13176386
fbshipit-source-id: e08c676b6b9cf301f76b87cdb901ecc51c4cc8a4
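For readers unfamiliar with the terminology: per-channel (and per-group) quantization picks a separate scale and zero point for each output channel (or group of channels) of the weight matrix, instead of a single pair for the whole tensor, which tightens the quantization range per channel. The snippet below is a minimal, self-contained sketch of that parameter selection in plain C++; it is not the fbgemm API, and the names `QuantParams` and `choose_quant_params_per_channel` are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for asymmetric uint8 quantization parameters.
struct QuantParams {
  float scale;
  std::int32_t zero_point;
};

// Choose one (scale, zero_point) pair per output channel of a row-major
// [num_channels x K] weight matrix, mapping each channel's value range
// onto the uint8 range [0, 255].
std::vector<QuantParams> choose_quant_params_per_channel(
    const std::vector<float>& weights, int num_channels, int K) {
  std::vector<QuantParams> params(num_channels);
  for (int c = 0; c < num_channels; ++c) {
    const float* row = weights.data() + static_cast<std::size_t>(c) * K;
    float lo = *std::min_element(row, row + K);
    float hi = *std::max_element(row, row + K);
    // Make sure the range contains zero so that zero is exactly representable.
    lo = std::min(lo, 0.0f);
    hi = std::max(hi, 0.0f);
    float scale = (hi - lo) / 255.0f;
    if (scale == 0.0f) {
      scale = 1.0f;  // all-zero channel: any scale works
    }
    std::int32_t zero_point =
        static_cast<std::int32_t>(std::lround(-lo / scale));
    params[c] = {scale, zero_point};
  }
  return params;
}
```

Per-group quantization is the same computation with the channels partitioned into groups that share one (scale, zero_point) pair.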
Branch force-pushed from ba364c6 to 86109d3 (Compare)
facebook-github-bot pushed a commit to pytorch/pytorch that referenced this pull request on Nov 27, 2018
Summary:
Pull Request resolved: #14340
Pull Request resolved: pytorch/FBGEMM#25

Per-group and per-channel quantization in fbgemm. This diff also cleans up explicit template instantiation using macro expansion, and changes the randFill interface, which previously made it easy to mistakenly generate integer random numbers for floating-point vectors. Using this in DNNLOWP operators will be done in a separate diff.

Reviewed By: dskhudia
Differential Revision: D13176386
fbshipit-source-id: e46c53e31e21520bded71b8ed86e8b19e010e2dd
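The randFill change mentioned in the summary addresses a usability trap: a fill routine that always draws integers makes it easy to populate a floating-point test vector with only whole numbers by accident. As a rough illustration of a type-aware interface (a sketch, not fbgemm's actual `randFill` signature):

```cpp
#include <random>
#include <type_traits>
#include <vector>

// Hypothetical type-aware fill: integral element types get an integer
// distribution, floating-point element types get a real-valued one, so a
// std::vector<float> can no longer end up holding only whole numbers by
// accident.
template <typename T>
void randFill(std::vector<T>& vec, T low, T high, std::mt19937& gen) {
  if constexpr (std::is_integral_v<T>) {
    std::uniform_int_distribution<T> dist(low, high);
    for (auto& v : vec) {
      v = dist(gen);
    }
  } else {
    std::uniform_real_distribution<T> dist(low, high);
    for (auto& v : vec) {
      v = dist(gen);
    }
  }
}
```

With an interface shaped like this, `randFill(float_vec, 0.0f, 1.0f, gen)` draws real values in [0, 1) while `randFill(int_vec, 0, 255, gen)` draws integers.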
pruthvistony referenced this pull request in ROCm/FBGEMM on Apr 22, 2022
* Aligning with upstream on merge_pooled_embeddings_test.py and enabling cuda.
* Disabling use_cpu in split_table_batched_embeddings_test since it's still unstable.

Co-authored-by: root <root@ixt-rack-61.local.lan>
liligwu referenced this pull request in ROCm/FBGEMM on May 2, 2022
* Aligning with upstream on merge_pooled_embeddings_test.py and enabling cuda.
* Disabling use_cpu in split_table_batched_embeddings_test since it's still unstable.

Co-authored-by: root <root@ixt-rack-61.local.lan>
liligwu referenced this pull request in ROCm/FBGEMM on May 2, 2022
* Make WeightDecayMode consistent (pytorch#1063)
  Summary: Pull Request resolved: pytorch#1063
  Currently in FE we define `L2=1` and `DECOUPLE=2`, but in FBGEMM we use `L2=0` and `DECOUPLE=1` (https://fburl.com/code/65u4a608). While function-wise it is OK since the interface is converted, it may introduce unnecessary confusion about the numbering. Here we make them consistent across FE/BE by using `L2=1` and `DECOUPLE=2` for both (see the enum sketch after this commit list).
  Differential Revision: D35763365
  fbshipit-source-id: c61041f38844b02fdecac0fb1182a3184711d3bd

* Add default values for func args in FBGEMM codegen (pytorch#1066)
  Summary: Pull Request resolved: pytorch#1066
  We mandate default values for float/int function args (usually hyper-parameters for optimizers) when generating FBGEMM code using codegen. This makes backward compatibility easier, as we can add more parameters without breaking compatibility. Note: developers need to be cautious when adding new args with default values; the behavior should remain the same with default values. If no default values are provided for float/int parameters, they'll be set to 0.0/0 by default.
  Reviewed By: jianyuh
  Differential Revision: D35795294
  fbshipit-source-id: 2632e1452c164d2ae7f999e9b17033ea77fe3864

* Enabling cuda (#25)
  * Aligning with upstream on merge_pooled_embeddings_test.py and enabling cuda.
  * Disabling use_cpu in split_table_batched_embeddings_test since it's still unstable.
  Co-authored-by: root <root@ixt-rack-61.local.lan>

* enable merge_pooled_embeddings in oss (pytorch#1064)
  Summary: Pull Request resolved: pytorch#1064
  In inference OSS we need to build fbgemm from source and we need the `merge_pooled_embeddings` operator. This is not available in fbgemm oss because of this: https://www.internalfb.com/diff/D30037992 (pytorch@41ab9713cb1c083414bd9759ebb95d47609101b7)?dst_version_fbid=1066324687448445&transaction_fbid=198310519085547, a dependency on nvml.h. However, generally nvml.h is present on systems and can be located at `${CUDA_TOOLKIT_ROOT_DIR}/lib64/stubs/libnvidia-ml.so`, as detailed here: https://tianyuliukingcrimson.wordpress.com/2018/07/23/findnvml-cmake-done-correctly-how-to-have-cmake-find-nvidia-management-library-nvml-on-windows-and-linux/. **However**, sometimes systems don't have it preinstalled with cuda for whatever reason, in which case you can get it by installing cudatoolkit-dev: `conda install -c conda-forge cudatoolkit-dev` (as I had to for my system). This changes the path that `libnvidia-ml.so` exists on, so we give people the option to specify where this library lives: `nvml_lib_path`. Post: https://fb.workplace.com/groups/2126278550786248/posts/5357069087707162
  Reviewed By: jspark1105
  Differential Revision: D35785768
  fbshipit-source-id: a2cb10fb54d5d97cbb6ecadfbbcb0c37bce7043b

* Add GLIBCXX_USE_CXX11_ABI compile option (pytorch#1073)
  Summary: Pull Request resolved: pytorch#1073
  Reviewed By: s4ayub
  Differential Revision: D35682606
  fbshipit-source-id: 58c78ec52a9b5caebbded97f836e658c59fb0d51

* Add even division checker for offsets in boundary checker (pytorch#1071)
  Summary: Pull Request resolved: pytorch#1071
  As title. This might be helpful to detect and check the issues for s268163. Enforce the following checks (see the bounds-check sketch after this commit list):
  1. the size of offsets needs to be exactly B * T + 1;
  2. the last element of offsets should be equal to indices.numel();
  3. the max pooling size should be less than or equal to the indices weight size.
  Reviewed By: zhenqin95
  Differential Revision: D35768276
  fbshipit-source-id: d942dfc7b01bfdbcf5b3d3fb76a50f1abe2da325

* Make variable type consistent in CPU code (pytorch#1076)
  Summary: Pull Request resolved: pytorch#1076
  Variable types got mixed up in code versions for CPU code. Here we clean it up and make variable types consistent.
  Reviewed By: shintaro-iwasaki
  Differential Revision: D35817968
  fbshipit-source-id: 4de43cbac3388896d1ae81c2eafd0d154dda6fca

* Follow up on throw errors directly on host code for CUDA bounds check op (pytorch#1075)
  Summary: Pull Request resolved: pytorch#1075
  Follow-up for D35768276 (pytorch@7be1fcb): throw errors directly on host code.
  Reviewed By: yinghai
  Differential Revision: D35905891
  fbshipit-source-id: f97047ff9cb27f7f169dc0223fa0295cc14a8fe8

* Add dtype <-> SparseType conversion util function (pytorch#1057)
  Summary: Pull Request resolved: pytorch#1057
  As title.
  Reviewed By: geyyer
  Differential Revision: D35532366
  fbshipit-source-id: 73891dd0eadcb0c79d6d0a06d7e0da911bd2519a

* Implement kernel for counter based weight decay and learning rate adjustment in rowwise_adagrad (pytorch#1068)
  Summary: Pull Request resolved: pytorch#1068
  Implemented the kernel for counter-based weight decay and learning rate adjustment in rowwise_adagrad.
  Reviewed By: csmiler
  Differential Revision: D35758762
  fbshipit-source-id: 1953ca950c8ebd3f45c0e5c343a5c2214393b487

* add bf16 support in jagged tensor ops (pytorch#1079)
  Summary: Pull Request resolved: pytorch#1079
  To support bf16 training.
  Reviewed By: ajtulloch
  Differential Revision: D35955466
  fbshipit-source-id: 0f740f29074576c026005362c78f872fec80bbcc

* allow FP16-type grad_t (pytorch#1072)
  Summary: Pull Request resolved: pytorch#1072
  This Diff partially revives D31432199 (pytorch@127f813), but only enables `grad_t = FP16` (no `BF16` support) to reduce the adverse side effects (e.g., the increase in binary size and compilation time). Specifically, D31432199 (pytorch@127f813) provided FP32, FP16, and BF16 for `grad_t`. This Diff removes the BF16 option for `grad_t` (so only FP32 and FP16 for `grad_t`).
  Reviewed By: jianyuh
  Differential Revision: D35120293
  fbshipit-source-id: b9a1d35f901b26277a220360a2a68583c65c8554

* use shfl_sync instead of __shfl_sync (pytorch#1080)
  Summary: Pull Request resolved: pytorch#1080
  This patch replaces the CUDA-specific `__shfl_sync` used in D35758762 (pytorch@dfb36cd) with `shfl_sync`, which is a wrapper that supports both NVIDIA and AMD GPUs (like D33231489 (pytorch@c6df576)).
  Reviewed By: dvksabin
  Differential Revision: D35980472
  fbshipit-source-id: f77c9e9dce31d55e80a201f80f98e44bbe8dce9e

* allow specify output_dtype for split no_bag embedding forward (pytorch#1067)
  Summary: "split_embedding_nobag_forward" did not accept the "output_dtype" parameter when "{% if not dense and not nobag %}". So when a user created "SplitTableBatchedEmbeddingBagsCodegen" with "output_dtype" set to some needed type, it was not passed into split_embedding_nobag_forward, so the real output data type did not match the output_dtype the user specified, and no warning or error was raised either. This PR adds "output_dtype" support for "split_embedding_nobag_forward".
  Pull Request resolved: pytorch#1067
  Reviewed By: brad-mengchi, shintaro-iwasaki
  Differential Revision: D35866293
  Pulled By: jianyuh
  fbshipit-source-id: 4cf95c649dcd25408668644788f3817561d35c20

* Fix the OSS nightly build; Release FBGEMM v0.1.0 for TorchRec OSS release (pytorch#1088)
  Summary: Pull Request resolved: pytorch#1088
  The OSS nightly build for CPU and GPU is broken due to package name configuration conflicts between pyproject.toml and setup.py. This Diff removes pyproject.toml and keeps only setup.py as the ground truth.
  Reviewed By: geyyer, brad-mengchi
  Differential Revision: D36040950
  fbshipit-source-id: 2ca5a6f1da6cc4e8e1fecdf98c6ef6921cbce4ae

* Fix the OSS CUDA GPG key CI test failure (pytorch#1089)
  Summary: Pull Request resolved: pytorch#1089
  Check https://forums.developer.nvidia.com/t/notice-cuda-linux-repository-key-rotation/212772. This Diff fixes the OSS test failure in https://github.com/pytorch/FBGEMM/runs/6242695168?check_suite_focus=true.
  Reviewed By: brad-mengchi
  Differential Revision: D36048995
  fbshipit-source-id: 13fd7fc24c41f4042392849b22e29b8659b782b8

* Add permute_pooled_embedding_ops_split for cpu_only and gpu (pytorch#1082)
  Summary: Pull Request resolved: pytorch#1082
  Following up on the post https://fb.workplace.com/groups/2126278550786248/permalink/5353232054757532/.
  Reviewed By: jianyuh
  Differential Revision: D35971699
  fbshipit-source-id: a3c8a9d8ce453abb732bd0774cb4f95ef10240f9

* clean up output_dtype tensor allocation branches (pytorch#1086)
  Summary: Pull Request resolved: pytorch#1086
  As title.
  Reviewed By: brad-mengchi
  Differential Revision: D36018114
  fbshipit-source-id: 9d8d6b5af53a4a75b917dee673629e6feeaa7ba3

* Fix build for embedding_inplace_update/embedding_inplace_update_cpu (pytorch#1081)
  Summary: Pull Request resolved: pytorch#1081
  Reviewed By: jasonjk-park, jianyuh, houseroad
  Differential Revision: D35984814
  fbshipit-source-id: 40a4d3dd5cfffb4240b517abb70e723abb396dff

Co-authored-by: Wang Zhou <wangzhou@fb.com>
Co-authored-by: root <root@ixt-rack-61.local.lan>
Co-authored-by: Shabab Ayub <shababayub@fb.com>
Co-authored-by: Jianyu Huang <jianyuhuang@fb.com>
Co-authored-by: Sabin Devkota <devkotasabin@fb.com>
Co-authored-by: Jongsoo Park <jongsoo@fb.com>
Co-authored-by: Shintaro Iwasaki <siwasaki@fb.com>
Co-authored-by: pengwa@microsoft.com <pengwa@microsoft.com>
Co-authored-by: Rostyslav Geyyer <grostyslav@fb.com>
Co-authored-by: Mengchi Zhang <mengchi@fb.com>
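Two of the commits above lend themselves to short illustrations. First, the WeightDecayMode change (pytorch#1063) is purely about keeping the enum numbering identical on both sides of the frontend/backend interface. A minimal C++ sketch of the agreed numbering follows; only `L2 = 1` and `DECOUPLE = 2` come from the commit message, while the `NONE = 0` member and the comments are assumptions added for illustration.

```cpp
// Keeping the same numeric values on both sides of the interface avoids
// silently reinterpreting the raw integer when it crosses the FE/BE boundary.
enum class WeightDecayMode : int {
  NONE = 0,      // assumed default member, not stated in the commit message
  L2 = 1,        // L2 regularization folded into the gradient
  DECOUPLE = 2,  // decoupled (AdamW-style) weight decay
};
```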
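Second, the boundary-checker commit (pytorch#1071) enumerates three invariants on the offsets tensor. A hypothetical host-side helper expressing one reading of those checks might look like the following; `check_offsets` and its parameters are illustrative, not the actual FBGEMM op.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical host-side validation of an offsets tensor for B samples and
// T tables: exactly B * T + 1 entries, last entry equal to the number of
// indices, and no pooling segment longer than the allowed weight size.
void check_offsets(const std::vector<std::int64_t>& offsets,
                   std::int64_t B, std::int64_t T,
                   std::int64_t num_indices,
                   std::int64_t max_weight_size) {
  if (static_cast<std::int64_t>(offsets.size()) != B * T + 1) {
    throw std::invalid_argument("offsets must have exactly B * T + 1 elements");
  }
  if (offsets.back() != num_indices) {
    throw std::invalid_argument("offsets.back() must equal indices.numel()");
  }
  for (std::size_t i = 1; i < offsets.size(); ++i) {
    if (offsets[i] - offsets[i - 1] > max_weight_size) {
      throw std::invalid_argument("pooling segment exceeds the weight size");
    }
  }
}
```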
liligwu added a commit to liligwu/FBGEMM that referenced this pull request on Nov 30, 2022
…granu Enable arbitrary embedding dimensions for ROCm
Summary:
Per-group and per-channel quantization in fbgemm.
This diff also cleans up explicit template instantiation using macro expansion.
Using this in DNNLOWP operators will be done in a separate diff.
Differential Revision: D13176386
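The note about cleaning up explicit template instantiation with macro expansion refers to a common C++ pattern: list the supported element types once and let a macro stamp out the `template void ...;` instantiation lines, instead of repeating each full signature by hand. A generic sketch of the pattern follows; the `Quantize` template and the macro name are illustrative, not fbgemm's actual code.

```cpp
#include <cstdint>

// A function template whose instantiations the library ships precompiled.
// Simplified: real quantization also rounds and saturates to the target range.
template <typename T>
void Quantize(const float* src, T* dst, int len, float scale,
              std::int32_t zero_point) {
  for (int i = 0; i < len; ++i) {
    dst[i] = static_cast<T>(src[i] / scale + zero_point);
  }
}

// One macro lists the supported element types; adding a new type means adding
// one line instead of repeating the full explicit-instantiation signature.
#define INSTANTIATE_QUANTIZE(T) \
  template void Quantize<T>(const float*, T*, int, float, std::int32_t);

INSTANTIATE_QUANTIZE(std::uint8_t)
INSTANTIATE_QUANTIZE(std::int8_t)
INSTANTIATE_QUANTIZE(std::int32_t)

#undef INSTANTIATE_QUANTIZE
```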