[MRCNN] allow lr 0.01 for when gbs < 16 #237

yuanzhedong · 2022-05-19T01:41:57Z

This goes together with this PR to change the training policy for small batch sizes for MRCNN.

github-actions · 2022-05-19T01:42:18Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

emizan76 · 2022-05-19T04:36:53Z

How did you test that? At this point we need to make sure nothing breaks. Please attach test logs.

yuanzhedong · 2022-05-19T05:38:30Z

I tested it locally with three log files and three different learning rates: 0.01, 0.12, and 0.24. With this PR all three pass; without this PR the 0.01 one fails. I also showed the results to @johntran-nv and @shangw-nvidia.

success output:

INFO - Running compliance on file: /home/yudong/github/logs/mrcnn_48.log
INFO -  Compliance checks: training_2.0.0/closed_maskrcnn.yaml
** Logging output also at compliance_checker.log
INFO - SUCCESS

Failed output:

INFO -  Compliance checks: training_2.0.0/closed_maskrcnn.yaml
WARNING -  CHECK for 'opt_base_learning_rate' failed in line 41:
:::MLLOG {"namespace": "", "time_ms": 1652781407082, "event_type": "POINT_IN_TIME", "key": "opt_base_learning_rate", "value": 0.01, "metadata": {"file": "maskrcnn/tools/train_mlperf.py", "lineno": 491}}
failed test:  is_integer(v['value'] / 0.02)
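For readers unfamiliar with the log format, here is a minimal sketch of how the `:::MLLOG` entry quoted in the failed output can be read back into its key and value. The helper name `parse_mllog_line` is hypothetical and is not part of the compliance checker; the sketch only assumes that everything after the `:::MLLOG` prefix is a JSON object, as it is in the line shown.

```python
import json

MLLOG_PREFIX = ":::MLLOG"


def parse_mllog_line(line):
    """Return the JSON payload of an MLLOG line as a dict.

    Hypothetical helper for illustration only. Returns None when the
    line does not contain the MLLOG prefix.
    """
    idx = line.find(MLLOG_PREFIX)
    if idx == -1:
        return None
    # Everything after the prefix is assumed to be one JSON object.
    return json.loads(line[idx + len(MLLOG_PREFIX):])


# The entry reported by the failed check above.
sample = (
    ':::MLLOG {"namespace": "", "time_ms": 1652781407082, '
    '"event_type": "POINT_IN_TIME", "key": "opt_base_learning_rate", '
    '"value": 0.01, "metadata": {"file": "maskrcnn/tools/train_mlperf.py", '
    '"lineno": 491}}'
)

entry = parse_mllog_line(sample)
print(entry["key"], entry["value"])
```

The `value` field is what the failing rule receives as `v['value']`.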

yuanzhedong · 2022-05-19T15:53:53Z

Just tried lr 0.03 as @shangw-nvidia suggested and it will fail too.

failed test:  is_integer(v['value'] / 0.02)
current context[s]={
  "initialized_tensors": [
    "FPN_inner_block1",
    "FPN_layer_block1",
    "FPN_inner_block2",
    "FPN_layer_block2",
    "FPN_inner_block3",
    "FPN_layer_block3",
    "FPN_inner_block4",
    "FPN_layer_block4",
    "RPNHead_conv",
    "RPNHead_cls",
    "RPNHead_bbox",
    "ROI_BOX_FEATURE_EXTRACTOR_fc6",
    "ROI_BOX_FEATURE_EXTRACTOR_fc7",
    "ROI_BOX_PREDICTOR_cls",
    "ROI_BOX_PREDICTOR_bbox",
    "ROI_MASK_FEATURE_EXTRACTOR_fcn1",
    "ROI_MASK_FEATURE_EXTRACTOR_fcn2",
    "ROI_MASK_FEATURE_EXTRACTOR_fcn3",
    "ROI_MASK_FEATURE_EXTRACTOR_fcn4",
    "ROI_MASK_PREDICTOR_fcn5",
    "ROI_MASK_PREDICTOR_fcn_logits"
  ]
}
current line[v]={
  "metadata": {
    "file": "maskrcnn/tools/train_mlperf.py",
    "lineno": 491
  },
  "value": 0.03
}
ERROR - FAILED
** Logging output also at compliance_checker.log
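The failing rule, `is_integer(v['value'] / 0.02)`, requires the base learning rate to be a whole multiple of 0.02. The sketch below illustrates why 0.01 and 0.03 are rejected while 0.12 and 0.24 are accepted. The function `is_multiple_of_step` and its tolerance are assumptions made for illustration; this is not the checker's own `is_integer`, and it does not reproduce the change made in this PR.

```python
def is_multiple_of_step(value, step=0.02, tol=1e-9):
    """Check whether value is (approximately) a whole multiple of step.

    Hypothetical stand-in for the rule quoted in the logs. The tolerance
    is an assumption that absorbs floating-point rounding in the division.
    """
    quotient = value / step
    return abs(quotient - round(quotient)) < tol


# Learning rates mentioned in this thread.
for lr in (0.01, 0.03, 0.12, 0.24):
    print(lr, is_multiple_of_step(lr))
```

Dividing 0.01 by 0.02 gives 0.5 and dividing 0.03 by 0.02 gives about 1.5, so neither is a whole number, whereas 0.12 and 0.24 give 6 and 12.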

johntran-nv · 2022-05-19T16:13:28Z

Training WG approved. @emizan76 can you approve and merge, please?

[MRCNN] allow lr 0.01 for when gbs < 16

f89c380

emizan76 self-requested a review May 19, 2022 04:35

shangw-nvidia approved these changes May 19, 2022

shangw-nvidia merged commit 065e4ce into mlcommons:master May 19, 2022

github-actions bot locked and limited conversation to collaborators May 19, 2022
