[Feature] Support SigmoidFocalLoss with Cambricon MLU backend #1346

yushinliu · 2021-09-18T08:58:28Z

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help it get feedback more easily. If you do not understand some items, don't worry: just make the pull request and seek help from the maintainers.

Motivation

This PR adds support for compiling Cambricon BANG source code and for running SigmoidFocalLoss with the Cambricon MLU backend.

It includes four parts:

Refactor setup.py to support compiling Cambricon BANG source code.
Refactor focal_loss.cpp and pytorch_cpp_helper.hpp to support dispatching to MLU.
Add focal_loss_sigmoid_mlu.cpp (C++) and focal_loss_sigmoid_internal.mlu (BANG) source files.
Refactor test_focal_loss.py to support testing sigmoid_focal_loss with the MLU backend.

Modification

Setup

Add MLUExtension and BuildExtension for compiling .cpp and .mlu code if Cambricon Neuware and MLU device are available.

Dispatch

Add dispatching to SigmoidFocalLossForwardMLUKernelLauncher in focal_loss.cpp.
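As a rough illustration of the dispatch described above, the following self-contained sketch routes a call by device type. It is not the code from focal_loss.cpp: the `Device` and `Backend` enums, the function `DispatchSigmoidFocalLossForward`, and the stubbed launcher body are hypothetical stand-ins, and the real launcher takes tensors and loss hyperparameters. Only the name SigmoidFocalLossForwardMLUKernelLauncher and the MMCV_WITH_MLU macro come from this PR.

```cpp
// Sketch only: simplified stand-ins for the tensor device and for the kernel
// that ends up being launched. None of these types exist in MMCV.
enum class Device { kCPU, kCUDA, kMLU };
enum class Backend { kUnsupported, kCUDA, kMLU };

// Assumption: the build defines MMCV_WITH_MLU when the Cambricon toolchain is
// found. It is defined here only so that the sketch compiles the MLU branch.
#ifndef MMCV_WITH_MLU
#define MMCV_WITH_MLU
#endif

#ifdef MMCV_WITH_MLU
// Stub with an invented signature. The real launcher receives tensors and the
// loss hyperparameters and starts the MLU kernel.
constexpr Backend SigmoidFocalLossForwardMLUKernelLauncher() {
  return Backend::kMLU;
}
#endif

// Hypothetical dispatcher: pick the implementation from the device type.
// Requires C++14 (several statements in a constexpr function).
constexpr Backend DispatchSigmoidFocalLossForward(Device device) {
  if (device == Device::kCUDA) {
    return Backend::kCUDA;  // stand-in for the existing CUDA path
  }
#ifdef MMCV_WITH_MLU
  if (device == Device::kMLU) {
    return SigmoidFocalLossForwardMLUKernelLauncher();
  }
#endif
  // A real dispatcher would typically report an error for other devices.
  return Backend::kUnsupported;
}

// Compile-time check that MLU inputs reach the MLU launcher.
static_assert(DispatchSigmoidFocalLossForward(Device::kMLU) == Backend::kMLU,
              "MLU inputs should reach the MLU launcher");
```

Guarding the MLU branch with the macro keeps builds without the Cambricon toolchain free of any reference to MLU symbols.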

MLU src code

Create a new directory, mmcv/ops/csrc/pytorch/mlu, to hold the MLU source code.

BC-breaking (Optional)

Does the modification introduce changes that break the backward-compatibility of the downstream repositories?
If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.

Use cases (Optional)

Please refer to mmcv/tests/test_ops/test_focal_loss.py.

Checklist

Before PR:

I have read and followed the workflow indicated in the CONTRIBUTING.md to create this PR.
Pre-commit or linting tools indicated in CONTRIBUTING.md are used to fix the potential lint issues.
Bug fixes are covered by unit tests, the case that causes the bug should be added in the unit tests.
New functionalities are covered by complete unit tests. If not, please add more unit tests to ensure correctness.
The documentation has been modified accordingly, including docstring or example tutorials.

After PR:

If the modification has potential influence on downstream or other related projects, this PR should be tested with some of those projects, like MMDet or MMCls.
CLA has been signed and all committers have signed the CLA in this PR.

CLAassistant · 2021-09-18T08:58:32Z

All committers have signed the CLA.

zhouzaida · 2021-09-20T08:32:20Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

liuyuxin@cambricon.com seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Hi @yushinliu, please sign the CLA.

mmcv/ops/csrc/pytorch/focal_loss.cpp

mmcv/ops/csrc/common/pytorch_cpp_helper.hpp

mmcv/ops/csrc/pytorch/focal_loss.cpp

mmcv/ops/csrc/pytorch/mlu/focal_loss_sigmoid_mlu.cpp

codecov · 2021-09-28T18:29:35Z

Codecov Report

Merging #1346 (5b325be) into mlu (1216e5f) will decrease coverage by 0.54%.
The diff coverage is 0.00%.

❗ Current head 5b325be differs from pull request most recent head 2b4c227. Consider uploading reports for the commit 2b4c227 to get more accurate results

@@            Coverage Diff             @@
##              mlu    #1346      +/-   ##
==========================================
- Coverage   69.14%   68.59%   -0.55%     
==========================================
  Files         162      164       +2     
  Lines       10746    10891     +145     
  Branches     1978     1991      +13     
==========================================
+ Hits         7430     7471      +41     
- Misses       2927     3030     +103     
- Partials      389      390       +1

Flag	Coverage Δ
unittests	`68.59% <0.00%> (-0.55%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
mmcv/ops/focal_loss.py	`23.89% <0.00%> (ø)`
mmcv/ops/pixel_group.py	`72.72% <0.00%> (-27.28%)`	⬇️
mmcv/ops/contour_expand.py	`75.00% <0.00%> (-25.00%)`	⬇️
mmcv/ops/sync_bn.py	`15.49% <0.00%> (-4.15%)`	⬇️
mmcv/video/optflow.py	`96.07% <0.00%> (-1.60%)`	⬇️
mmcv/ops/__init__.py	`100.00% <0.00%> (ø)`
mmcv/utils/ext_loader.py	`35.89% <0.00%> (ø)`
mmcv/ops/correlation.py	`24.63% <0.00%> (ø)`
mmcv/ops/ball_query.py	`50.00% <0.00%> (ø)`
mmcv/runner/dist_utils.py	`51.06% <0.00%> (+1.06%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1216e5f...2b4c227. Read the comment docs.

grimoire · 2021-10-14T12:07:50Z

mmcv/ops/csrc/pytorch/mlu/focal_loss_sigmoid_backward_mlu.cpp

+    threshold_c = (nram_pingpong_size / nram_split_pingpong - target_data_bytes) /
+                  (nram_split_num * compute_data_bytes);
+  }
+  // deal_n * compute_c * nram_split_pingpong * compute_data_bytes * nram_split_num +


Why do we need this deal_n?

deal_n is the maximum amount of data that the MLU device can process in one round.
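To make the answer concrete, here is a minimal sketch of how such a per-round limit could be derived from an on-chip memory budget. The variable names follow the snippet quoted above, but the functions `ThresholdC` and `DealN`, the assumed per-sample footprint, and all numeric values are illustrative assumptions; the actual kernel code may size its buffers differently.

```cpp
// Assumed memory model (not taken from the PR): each sample needs
// nram_split_num buffers of c values plus one target value, and the NRAM
// region is divided into nram_split_pingpong halves for ping-pong buffering.

// Largest channel count for which one sample still fits. This mirrors the
// threshold_c expression in the quoted diff.
constexpr int ThresholdC(int nram_pingpong_size, int nram_split_pingpong,
                         int nram_split_num, int compute_data_bytes,
                         int target_data_bytes) {
  return (nram_pingpong_size / nram_split_pingpong - target_data_bytes) /
         (nram_split_num * compute_data_bytes);
}

// Hypothetical: number of samples that fit into one ping-pong half per round.
constexpr int DealN(int nram_pingpong_size, int nram_split_pingpong,
                    int nram_split_num, int compute_data_bytes,
                    int target_data_bytes, int c) {
  return (nram_pingpong_size / nram_split_pingpong) /
         (nram_split_num * c * compute_data_bytes + target_data_bytes);
}

// Made-up budget: 64 KiB of NRAM, 2 ping-pong halves, 5 buffers, 4-byte data.
static_assert(ThresholdC(65536, 2, 5, 4, 4) == 1638,
              "one sample fits up to c = 1638");
static_assert(DealN(65536, 2, 5, 4, 4, 80) == 20,
              "20 samples fit per round at c = 80");
static_assert(DealN(65536, 2, 5, 4, 4, 1639) == 0,
              "beyond the threshold not even one sample fits");
```

Under this model, deal_n drops to zero exactly when c exceeds threshold_c, which is the case where a kernel would have to split a single sample along the channel dimension instead.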

grimoire

LGTM

tests/test_ops/test_focal_loss.py

grimoire

LGTM

ZwwWayne · 2021-11-05T02:52:42Z

mmcv/ops/csrc/common/pytorch_mlu_helper.hpp

+
+#define NFU_ALIGN_SIZE 128
+
+#define PAD_UP(x, y) (((x) / (y) + (int)((x) % (y) > 0)) * (y))


This seems to duplicate the PAD_UP in common_mlu_helper.hpp. Can we keep just one of them?

Is '(x)' necessary? Can we simply write it as ((x / y + (int)(x % y > 0)) * y)? If not, we may want to add a comment explaining what the parentheses prevent, so that readers and future developers can easily understand the motivation.

common_mlu_helper.hpp holds device-only functions, and more of them will be added in the future. We therefore have to keep common_mlu_helper.hpp and pytorch_mlu_helper.hpp isolated from each other at compile time.

The parentheses cannot be removed. For example, if x is the expression 6 + 2, then without them 2 / y would be calculated first.
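The precedence problem can be checked directly. In the sketch below, `PAD_UP` and `NFU_ALIGN_SIZE` are copied from the diff, while `PAD_UP_UNSAFE` is a hypothetical variant without the inner parentheses, added only for comparison.

```cpp
#define NFU_ALIGN_SIZE 128

// Fully parenthesized macro, as in the PR: rounds x up to a multiple of y.
#define PAD_UP(x, y) (((x) / (y) + (int)((x) % (y) > 0)) * (y))

// Hypothetical variant without the inner parentheses, for comparison only.
#define PAD_UP_UNSAFE(x, y) ((x / y + (int)(x % y > 0)) * y)

// Plain arguments: both forms agree.
static_assert(PAD_UP(130, NFU_ALIGN_SIZE) == 256, "130 pads up to 256");
static_assert(PAD_UP_UNSAFE(130, NFU_ALIGN_SIZE) == 256, "same result");

// Expression argument: 6 + 2 is 8, already a multiple of 4.
static_assert(PAD_UP(6 + 2, 4) == 8, "8 stays 8");

// The unsafe form expands to ((6 + 2 / 4 + (int)(6 + 2 % 4 > 0)) * 4),
// which is (6 + 0 + 1) * 4 because / and % bind tighter than +.
static_assert(PAD_UP_UNSAFE(6 + 2, 4) == 28, "wrong result: 28 instead of 8");
```

Because macro arguments are substituted as text, every use of a parameter has to be wrapped in parentheses for the macro to behave like a function call.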


[Feature] Support SigmoidFocalLoss with Cambricon MLU backend

6e63c4f

zhouzaida requested a review from grimoire September 18, 2021 09:30

zhouzaida mentioned this pull request Sep 20, 2021

MMCV support Cambricon MLU computing #1345

Closed

zhouzaida added the Waiting for signing CLA label Sep 21, 2021

grimoire reviewed Sep 22, 2021

mmcv/ops/csrc/pytorch/focal_loss.cpp (outdated)

grimoire reviewed Sep 22, 2021

mmcv/ops/csrc/common/pytorch_cpp_helper.hpp

mmcv/ops/csrc/pytorch/focal_loss.cpp

zhouzaida added awaiting response and removed Waiting for signing CLA labels Sep 23, 2021

refactor MMCV_WITH_MLU macro define

51eaf31

zhouzaida added MLU and removed awaiting response labels Sep 24, 2021

grimoire reviewed Sep 24, 2021

mmcv/ops/csrc/pytorch/mlu/focal_loss_sigmoid_mlu.cpp (outdated)

grimoire reviewed Sep 24, 2021

mmcv/ops/csrc/pytorch/mlu/focal_loss_sigmoid_mlu.cpp (outdated)

grimoire reviewed Sep 24, 2021

mmcv/ops/csrc/pytorch/mlu/focal_loss_sigmoid_mlu.cpp (outdated)

liuyuxin@cambricon.com added 2 commits September 28, 2021 11:13

refactor NFU_ALIGN_SIZE, PAD_DOWN and split_pipeline_num

5b325be

delete extra fool proofing in cpp

c323ebe

[Feature] Support SigmoidFocalLossBackward with Cambricon MLU backend

ac616c3

grimoire reviewed Oct 14, 2021

grimoire approved these changes Oct 15, 2021

fix macro definition in SigmoidFocalLoss

eafcf62

zhouzaida reviewed Oct 20, 2021

tests/test_ops/test_focal_loss.py (outdated)

yushinliu added 2 commits October 20, 2021 18:07

refactor mlu files into clang-format

2a8a5f0

refactor sigmoid focal loss test

22252df

zhouzaida approved these changes Oct 20, 2021

grimoire approved these changes Oct 20, 2021

zcyKTH mentioned this pull request Oct 21, 2021

[Feature] Support BBoxOverlaps with cambricon MLU backend #1421

Closed

7 tasks

zhouzaida changed the base branch from master to mlu October 21, 2021 14:58

ChaojifeixiaDazhuang mentioned this pull request Oct 27, 2021

[Feature] Support NMS with cambricon MLU backend #1437

Closed

7 tasks

refactor Sigmoid Focal Loss file structure.

ec3cd11

zhouzaida approved these changes Nov 2, 2021

grimoire approved these changes Nov 2, 2021

zhouzaida requested a review from ZwwWayne November 2, 2021 13:57

yushinliu and others added 4 commits November 3, 2021 18:51

fix python lint error

3f8a09e

fix import torch_mlu error type

78a56e3

fix lint

ccd7f32

refactor clang format style to google

2b4c227

ZwwWayne reviewed Nov 5, 2021

ChaojifeixiaDazhuang mentioned this pull request Nov 8, 2021

[Feature] Support NMS with cambricon MLU backend #1467

Merged

7 tasks

zhouzaida merged commit 682dba5 into open-mmlab:mlu Nov 8, 2021
