
Conversation

jspark1105
Contributor

Summary:
Per-group and per-channel quantization in fbgemm
This diff also cleans up explicit template instantiation using macro expansion
Using this in DNNLOWP operators will be done in a separate diff.

Differential Revision: D13176386
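For readers skimming the summary: per-group / per-channel quantization keeps a separate scale and zero point for each channel (or group of channels) instead of one pair for the whole tensor. The sketch below only illustrates that idea and the macro-expansion cleanup mentioned above; the struct, function, and macro names are hypothetical and are not fbgemm's actual API.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical per-channel quantization parameters: one (scale, zero_point)
// pair per output channel rather than a single pair for the whole tensor.
struct ChannelQuantParams {
  std::vector<float> scales;
  std::vector<int32_t> zero_points;
};

// Quantize a row-major [channels x len] float matrix, applying each channel's
// own scale and zero point. T is the quantized storage type.
template <typename T>
void QuantizePerChannel(const float* src, T* dst, int channels, int len,
                        const ChannelQuantParams& qparams) {
  for (int c = 0; c < channels; ++c) {
    const float inv_scale = 1.0f / qparams.scales[c];
    for (int i = 0; i < len; ++i) {
      float v = std::nearbyint(src[c * len + i] * inv_scale) +
          static_cast<float>(qparams.zero_points[c]);
      // Clamp to the representable range of T before the narrowing cast.
      v = std::min(v, static_cast<float>(std::numeric_limits<T>::max()));
      v = std::max(v, static_cast<float>(std::numeric_limits<T>::min()));
      dst[c * len + i] = static_cast<T>(v);
    }
  }
}

// The "explicit template instantiation using macro expansion" cleanup refers
// to generating the instantiation list from one macro instead of repeating
// the boilerplate by hand for every supported type.
#define INSTANTIATE_QUANTIZE_PER_CHANNEL(T)                        \
  template void QuantizePerChannel<T>(const float*, T*, int, int, \
                                      const ChannelQuantParams&);
INSTANTIATE_QUANTIZE_PER_CHANNEL(uint8_t)
INSTANTIATE_QUANTIZE_PER_CHANNEL(int8_t)
#undef INSTANTIATE_QUANTIZE_PER_CHANNEL
```

Per-group quantization is the same idea at coarser granularity: channels in the same group share one (scale, zero_point) pair.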

jspark1105 added a commit to jspark1105/pytorch that referenced this pull request Nov 26, 2018
Summary:
Pull Request resolved: pytorch#14340

Pull Request resolved: pytorch/FBGEMM#25

Per-group and per-channel quantization in fbgemm
This diff also cleans up explicit template instantiation using macro expansion
This diff also changes the randFill interface, which previously made it easy to mistakenly generate integer random numbers for floating-point vectors.

Using this in DNNLOWP operators will be done in a separate diff.

Differential Revision: D13176386

fbshipit-source-id: 3137039d2822e42a16881638d54897d9c8bc75f4
Differential Revision: D13166591
fbshipit-source-id: 749815ff7efbb17a1853381c42b5dc6b32d71919

Differential Revision: D13167073
fbshipit-source-id: 6749e2df85d64572b0d0e261b0beff0b206a52f9

Differential Revision: D13176477
fbshipit-source-id: 670a43fd691ef2840262bb0b839794278d3656d7

Summary:
Pull Request resolved: pytorch/pytorch#14340

Pull Request resolved: pytorch#25

Per-group and per-channel quantization in fbgemm
This diff also cleans up explicit template instantiation using macro expansion
This diff also changes the randFill interface, which previously made it easy to mistakenly generate integer random numbers for floating-point vectors (see the sketch below).

Using this in DNNLOWP operators will be done in a separate diff.

Differential Revision: D13176386

fbshipit-source-id: e08c676b6b9cf301f76b87cdb901ecc51c4cc8a4
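On the randFill change mentioned in the summary above: the old interface made it easy to fill a floating-point vector with samples from an integer distribution by accident. Below is a hedged sketch of a type-aware fill; fbgemm's actual randFill signature may differ.

```cpp
#include <random>
#include <type_traits>
#include <vector>

// Pick the distribution from the element type, so a float vector can no
// longer be filled with integer samples by mistake. Illustrative only.
template <typename T>
void randFill(std::vector<T>& vec, T low, T high, std::mt19937& gen) {
  using Dist = typename std::conditional<
      std::is_integral<T>::value,
      std::uniform_int_distribution<T>,
      std::uniform_real_distribution<T>>::type;
  Dist dist(low, high);
  for (auto& v : vec) {
    v = dist(gen);
  }
}
```

The point is simply that the element type, not the call site, selects the distribution, so the class of mistake described above cannot compile silently into wrong test data.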
facebook-github-bot pushed a commit to pytorch/pytorch that referenced this pull request Nov 27, 2018
Summary:
Pull Request resolved: #14340

Pull Request resolved: pytorch/FBGEMM#25

Per-group and per-channel quantization in fbgemm
This diff also cleans up explicit template instantiation using macro expansion
This diff also changes the randFill interface, which previously made it easy to mistakenly generate integer random numbers for floating-point vectors.

Using this in DNNLOWP operators will be done in a separate diff.

Reviewed By: dskhudia

Differential Revision: D13176386

fbshipit-source-id: e46c53e31e21520bded71b8ed86e8b19e010e2dd
pruthvistony referenced this pull request in ROCm/FBGEMM Apr 22, 2022
* Aligning with upstream on merge_pooled_embeddings_test.py and enabling CUDA.

* Disabling use_cpu in split_table_batched_embeddings_test since it's still unstable.

Co-authored-by: root <root@ixt-rack-61.local.lan>
liligwu referenced this pull request in ROCm/FBGEMM May 2, 2022
* Aligning with upstream on merge_pooled_embeddings_test.py and enabling CUDA.

* Disabling use_cpu in split_table_batched_embeddings_test since it's still unstable.

Co-authored-by: root <root@ixt-rack-61.local.lan>
liligwu referenced this pull request in ROCm/FBGEMM May 2, 2022
* Make WeightDecayMode consistent (pytorch#1063)

Summary:
Pull Request resolved: pytorch#1063

Currently in FE we define `L2=1` and `DECOUPLE=2`, but in FBGEMM we use `L2=0` and `DECOUPLE=1` (https://fburl.com/code/65u4a608). While this is functionally OK since the interface is converted, it may introduce unnecessary confusion about the numbering. Here we make them consistent across FE/BE by using `L2=1` and `DECOUPLE=2` for both.

Differential Revision: D35763365

fbshipit-source-id: c61041f38844b02fdecac0fb1182a3184711d3bd
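To make the agreed-upon numbering concrete, here it is spelled out as a C++ enum. This is only an illustration: the real definitions live in the Python frontend and the FBGEMM backend/codegen, and the NONE value is an assumption not stated in the summary above.

```cpp
// Illustrative only: the consistent FE/BE numbering adopted by this change.
enum class WeightDecayMode : int {
  NONE = 0,      // assumed; the summary only fixes the two values below
  L2 = 1,        // L2 regularization
  DECOUPLE = 2,  // decoupled weight decay
};
```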

* Add default values for func args in FBGEMM codegen (pytorch#1066)

Summary:
Pull Request resolved: pytorch#1066

We mandate default values for float/int function args (usually hyper-parameters for optimizers) when generating FBGEMM code using codegen. This makes backward compatibility easier, as we can add more parameters without breaking existing callers.

Note: developers need to be cautious when adding new args with default values. The behavior should remain the same when the default values are used. If no default values are provided for float/int parameters, they'll be set to 0.0/0 by default.

Reviewed By: jianyuh

Differential Revision: D35795294

fbshipit-source-id: 2632e1452c164d2ae7f999e9b17033ea77fe3864
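As a quick illustration of why defaulted trailing arguments preserve backward compatibility (hypothetical function and parameter names, not the generated FBGEMM signatures):

```cpp
#include <cstdint>
#include <cstdio>

// Adding a new hyper-parameter with a default value (0.0f here, matching the
// "0.0/0 by default" rule above) keeps existing call sites compiling and
// behaving exactly as before.
void optimizer_step(float* weight, const float* grad, int64_t n,
                    float learning_rate,
                    float weight_decay = 0.0f /* new arg, defaulted */) {
  for (int64_t i = 0; i < n; ++i) {
    weight[i] -= learning_rate * (grad[i] + weight_decay * weight[i]);
  }
}

int main() {
  float w[2] = {1.0f, 2.0f};
  const float g[2] = {0.1f, 0.1f};
  optimizer_step(w, g, 2, 0.01f);        // old call site, unchanged behavior
  optimizer_step(w, g, 2, 0.01f, 0.1f);  // new call site opts into the new arg
  std::printf("%f %f\n", w[0], w[1]);
  return 0;
}
```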

* Enabling cuda (#25)

* Aligning with upstream on merge_pooled_embeddings_test.py and enabling CUDA.

* Disabling use_cpu in split_table_batched_embeddings_test since it's still unstable.

Co-authored-by: root <root@ixt-rack-61.local.lan>

* enable merge_pooled_embeddings in oss (pytorch#1064)

Summary:
Pull Request resolved: pytorch#1064

In inference OSS we need to build fbgemm from source and we need the `merge_pooled_embeddings` operator.

This is not available in fbgemm OSS because of a dependency on nvml.h introduced here: https://www.internalfb.com/diff/D30037992 (pytorch@41ab9713cb1c083414bd9759ebb95d47609101b7)?dst_version_fbid=1066324687448445&transaction_fbid=198310519085547.

However, NVML is generally present on systems, and the stub library can be located at `${CUDA_TOOLKIT_ROOT_DIR}/lib64/stubs/libnvidia-ml.so`, as detailed here: https://tianyuliukingcrimson.wordpress.com/2018/07/23/findnvml-cmake-done-correctly-how-to-have-cmake-find-nvidia-management-library-nvml-on-windows-and-linux/.

**However**, sometimes systems don't have it preinstalled with CUDA for whatever reason, in which case you can get it by installing cudatoolkit-dev:

`conda install -c conda-forge cudatoolkit-dev` (as I had to for my system)

This changes the path where `libnvidia-ml.so` lives, so we add an option, `nvml_lib_path`, that lets people specify where the library is located.

post: https://fb.workplace.com/groups/2126278550786248/posts/5357069087707162

Reviewed By: jspark1105

Differential Revision: D35785768

fbshipit-source-id: a2cb10fb54d5d97cbb6ecadfbbcb0c37bce7043b

* Add GLIBCXX_USE_CXX11_ABI compile option (pytorch#1073)

Summary: Pull Request resolved: pytorch#1073

Reviewed By: s4ayub

Differential Revision: D35682606

fbshipit-source-id: 58c78ec52a9b5caebbded97f836e658c59fb0d51

* Add even division checker for offsets in boundary checker (pytorch#1071)

Summary:
Pull Request resolved: pytorch#1071

As title. This might help detect and diagnose the issues for s268163.

Enforce the following checks (see the sketch below):
1. the size of offsets needs to be exactly B * T + 1
2. the last element of offsets should be equal to indices.numel()
3. the max pooling size should be less than or equal to the indice_weights size

Reviewed By: zhenqin95

Differential Revision: D35768276

fbshipit-source-id: d942dfc7b01bfdbcf5b3d3fb76a50f1abe2da325
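A plain host-side sketch of the three invariants listed above; this is not fbgemm_gpu's bounds-check kernel, just the checks written out on std::vector inputs (names such as `indice_weights_size` and `max_L` are assumptions for illustration):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// B samples over T embedding tables; max_L stands in for the max pooling size
// and indice_weights_size for the size of the per-sample weight tensor.
void check_offsets(const std::vector<int64_t>& offsets,
                   const std::vector<int64_t>& indices,
                   int64_t B, int64_t T,
                   int64_t max_L, int64_t indice_weights_size) {
  // 1. offsets must have exactly B * T + 1 entries.
  assert(static_cast<int64_t>(offsets.size()) == B * T + 1);
  // 2. the last element of offsets must equal the total number of indices.
  assert(offsets.back() == static_cast<int64_t>(indices.size()));
  // 3. the max pooling size must not exceed the per-sample weight size.
  assert(max_L <= indice_weights_size);
}
```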

* Make variable type consistent in CPU code (pytorch#1076)

Summary:
Pull Request resolved: pytorch#1076

Variable types got mixed up across versions of the CPU code. Here we clean them up and make the variable types consistent.

Reviewed By: shintaro-iwasaki

Differential Revision: D35817968

fbshipit-source-id: 4de43cbac3388896d1ae81c2eafd0d154dda6fca

* Follow up on throw errors directly on host code for CUDA bounds check op (pytorch#1075)

Summary:
Pull Request resolved: pytorch#1075

Follow-up for D35768276 (pytorch@7be1fcb): throw errors directly in host code.

Reviewed By: yinghai

Differential Revision: D35905891

fbshipit-source-id: f97047ff9cb27f7f169dc0223fa0295cc14a8fe8

* Add dtype <-> SparseType conversion util function (pytorch#1057)

Summary:
Pull Request resolved: pytorch#1057

As title

Reviewed By: geyyer

Differential Revision: D35532366

fbshipit-source-id: 73891dd0eadcb0c79d6d0a06d7e0da911bd2519a

* Implement kernel for counter based weight decay and learning rate adjustment in rowwise_adagrad (pytorch#1068)

Summary:
Pull Request resolved: pytorch#1068

Implemented the kernel for counter based weight decay and learning rate adjustment in rowwise_adagrad

Reviewed By: csmiler

Differential Revision: D35758762

fbshipit-source-id: 1953ca950c8ebd3f45c0e5c343a5c2214393b487

* add bf16 support in jagged tensor ops (pytorch#1079)

Summary:
Pull Request resolved: pytorch#1079

To support bf16 training

Reviewed By: ajtulloch

Differential Revision: D35955466

fbshipit-source-id: 0f740f29074576c026005362c78f872fec80bbcc

* allow FP16-type grad_t (pytorch#1072)

Summary:
Pull Request resolved: pytorch#1072

This Diff partially revives D31432199 (pytorch@127f813), but only enables `grad_t = FP16` (no `BF16` support) to reduce the adverse side effects (e.g., increased binary size and compilation time).

Specifically, D31432199 (pytorch@127f813) provides FP32, FP16, and BF16 for `grad_t`.
This Diff removes BF16 options for `grad_t` (so only FP32 and FP16 for `grad_t`).

Reviewed By: jianyuh

Differential Revision: D35120293

fbshipit-source-id: b9a1d35f901b26277a220360a2a68583c65c8554

* use shfl_sync instead of __shfl_sync (pytorch#1080)

Summary:
Pull Request resolved: pytorch#1080

This patch replaces CUDA-specific `__shfl_sync` used in D35758762 (pytorch@dfb36cd) with `shfl_sync`, which is a wrapper that supports both NVIDIA and AMD GPUs (like D33231489 (pytorch@c6df576)).

Reviewed By: dvksabin

Differential Revision: D35980472

fbshipit-source-id: f77c9e9dce31d55e80a201f80f98e44bbe8dce9e

* allow specify output_dtype for split no_bag embedding forward (pytorch#1067)

Summary:
"split_embedding_nobag_forward" did not accept "output_dtype" parameters when "{% if not dense and not nobag %}".

So when user created "SplitTableBatchedEmbeddingBagsCodegen" with the "output_dtype" to some type needed, it is not passed into split_embedding_nobag_forward, so the real output data type is not aligned with output_dtype user specified. And also there is no warning or error happens as well.

This PR added the "output_dtype" support for "split_embedding_nobag_forward".

Pull Request resolved: pytorch#1067

Reviewed By: brad-mengchi, shintaro-iwasaki

Differential Revision: D35866293

Pulled By: jianyuh

fbshipit-source-id: 4cf95c649dcd25408668644788f3817561d35c20

* Fix the OSS nightly build; Release FBGEMM v0.1.0 for TorchRec OSS release (pytorch#1088)

Summary:
Pull Request resolved: pytorch#1088

The OSS nightly build for CPU and GPU is broken due to package name configuration conflicts between pyproject.toml and setup.py. This Diff removes pyproject.toml and keeps only setup.py as the ground truth.

Reviewed By: geyyer, brad-mengchi

Differential Revision: D36040950

fbshipit-source-id: 2ca5a6f1da6cc4e8e1fecdf98c6ef6921cbce4ae

* Fix the OSS CUDA GPG key CI test failure (pytorch#1089)

Summary:
Pull Request resolved: pytorch#1089

Check https://forums.developer.nvidia.com/t/notice-cuda-linux-repository-key-rotation/212772

This Diff fixes the OSS test failure in https://github.com/pytorch/FBGEMM/runs/6242695168?check_suite_focus=true

Reviewed By: brad-mengchi

Differential Revision: D36048995

fbshipit-source-id: 13fd7fc24c41f4042392849b22e29b8659b782b8

* Add permute_pooled_embedding_ops_split for cpu_only and gpu (pytorch#1082)

Summary:
Pull Request resolved: pytorch#1082

Following up on the post https://fb.workplace.com/groups/2126278550786248/permalink/5353232054757532/

Reviewed By: jianyuh

Differential Revision: D35971699

fbshipit-source-id: a3c8a9d8ce453abb732bd0774cb4f95ef10240f9

* clean up output_dtype tensor allocation branches (pytorch#1086)

Summary:
Pull Request resolved: pytorch#1086

As title

Reviewed By: brad-mengchi

Differential Revision: D36018114

fbshipit-source-id: 9d8d6b5af53a4a75b917dee673629e6feeaa7ba3

* Fix build for embedding_inplace_update/embedding_inplace_update_cpu (pytorch#1081)

Summary: Pull Request resolved: pytorch#1081

Reviewed By: jasonjk-park, jianyuh, houseroad

Differential Revision: D35984814

fbshipit-source-id: 40a4d3dd5cfffb4240b517abb70e723abb396dff

Co-authored-by: Wang Zhou <wangzhou@fb.com>
Co-authored-by: root <root@ixt-rack-61.local.lan>
Co-authored-by: Shabab Ayub <shababayub@fb.com>
Co-authored-by: Jianyu Huang <jianyuhuang@fb.com>
Co-authored-by: Sabin Devkota <devkotasabin@fb.com>
Co-authored-by: Jongsoo Park <jongsoo@fb.com>
Co-authored-by: Shintaro Iwasaki <siwasaki@fb.com>
Co-authored-by: pengwa@microsoft.com <pengwa@microsoft.com>
Co-authored-by: Rostyslav Geyyer <grostyslav@fb.com>
Co-authored-by: Mengchi Zhang <mengchi@fb.com>
liligwu added a commit to liligwu/FBGEMM that referenced this pull request Nov 30, 2022
…granu

Enable arbitrary embedding dimensions for ROCm