yuanzhedong
Contributor

@yuanzhedong yuanzhedong commented May 19, 2022

This goes together with this PR to change the training policy for small batch sizes for mrcnn.

@github-actions

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@emizan76 emizan76 self-requested a review May 19, 2022 04:35
@emizan76
Contributor

How did you test that? At this point we need to make sure nothing breaks. Please attach test logs.

@yuanzhedong
Contributor Author

yuanzhedong commented May 19, 2022

I tested it locally with three log files and three different learning rates: 0.01, 0.12, and 0.24. With this PR all three pass; without this PR the 0.01 one fails. I also showed the results to @johntran-nv and @shangw-nvidia.

success output:

INFO - Running compliance on file: /home/yudong/github/logs/mrcnn_48.log
INFO -  Compliance checks: training_2.0.0/closed_maskrcnn.yaml
** Logging output also at compliance_checker.log
INFO - SUCCESS

Failed output:

INFO -  Compliance checks: training_2.0.0/closed_maskrcnn.yaml
WARNING -  CHECK for 'opt_base_learning_rate' failed in line 41:
:::MLLOG {"namespace": "", "time_ms": 1652781407082, "event_type": "POINT_IN_TIME", "key": "opt_base_learning_rate", "value": 0.01, "metadata": {"file": "maskrcnn/tools/train_mlperf.py", "lineno": 491}}
failed test:  is_integer(v['value'] / 0.02)
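
For anyone who wants to inspect a single entry like the one above outside the checker, here is a minimal sketch of pulling the JSON payload out of a `:::MLLOG` line. The helper name `parse_mllog_line` is hypothetical and is not taken from the compliance checker; the sample line is copied from the failed output above.

```python
import json

MLLOG_PREFIX = ":::MLLOG"

def parse_mllog_line(line):
    """Hypothetical helper: return the JSON payload of an MLLOG line, or None."""
    idx = line.find(MLLOG_PREFIX)
    if idx == -1:
        return None  # not an MLLOG entry (e.g. an INFO/WARNING line)
    payload = line[idx + len(MLLOG_PREFIX):].strip()
    return json.loads(payload)

# Sample entry copied from the failed output above.
sample = (
    ':::MLLOG {"namespace": "", "time_ms": 1652781407082, '
    '"event_type": "POINT_IN_TIME", "key": "opt_base_learning_rate", '
    '"value": 0.01, "metadata": {"file": "maskrcnn/tools/train_mlperf.py", '
    '"lineno": 491}}'
)

entry = parse_mllog_line(sample)
print(entry["key"], entry["value"], entry["metadata"]["lineno"])
```

This only extracts the logged key and value; it does not reproduce any of the checker's rules.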

@yuanzhedong
Contributor Author

Just tried lr 0.03 as @shangw-nvidia suggested, and it fails too.

failed test:  is_integer(v['value'] / 0.02)
current context[s]={
  "initialized_tensors": [
    "FPN_inner_block1",
    "FPN_layer_block1",
    "FPN_inner_block2",
    "FPN_layer_block2",
    "FPN_inner_block3",
    "FPN_layer_block3",
    "FPN_inner_block4",
    "FPN_layer_block4",
    "RPNHead_conv",
    "RPNHead_cls",
    "RPNHead_bbox",
    "ROI_BOX_FEATURE_EXTRACTOR_fc6",
    "ROI_BOX_FEATURE_EXTRACTOR_fc7",
    "ROI_BOX_PREDICTOR_cls",
    "ROI_BOX_PREDICTOR_bbox",
    "ROI_MASK_FEATURE_EXTRACTOR_fcn1",
    "ROI_MASK_FEATURE_EXTRACTOR_fcn2",
    "ROI_MASK_FEATURE_EXTRACTOR_fcn3",
    "ROI_MASK_FEATURE_EXTRACTOR_fcn4",
    "ROI_MASK_PREDICTOR_fcn5",
    "ROI_MASK_PREDICTOR_fcn_logits"
  ]
}
current line[v]={
  "metadata": {
    "file": "maskrcnn/tools/train_mlperf.py",
    "lineno": 491
  },
  "value": 0.03
}
ERROR - FAILED
** Logging output also at compliance_checker.log
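
To illustrate why the `is_integer(v['value'] / 0.02)` expression rejects 0.01 and 0.03, here is a small sketch of a "multiple of 0.02" test. The `is_integer` below is an assumed stand-in with a floating-point tolerance, not the checker's actual implementation, and `lr_is_multiple_of` is a hypothetical name.

```python
def is_integer(x, tol=1e-9):
    """Assumed stand-in: True when x is within tol of a whole number."""
    return abs(x - round(x)) < tol

def lr_is_multiple_of(lr, step=0.02):
    """Hypothetical wrapper mirroring is_integer(v['value'] / 0.02)."""
    return is_integer(lr / step)

# Learning rates mentioned in this conversation.
for lr in (0.01, 0.03, 0.12, 0.24):
    print(lr, lr_is_multiple_of(lr))
```

Under this assumption, 0.12 and 0.24 satisfy the expression, while 0.01 and 0.03 do not because they divide to roughly 0.5 and 1.5.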

@johntran-nv

Training WG approved. @emizan76 can you approve and merge, please?

@shangw-nvidia shangw-nvidia merged commit 065e4ce into mlcommons:master May 19, 2022
@github-actions github-actions bot locked and limited conversation to collaborators May 19, 2022