[Feature] add auto_step flag in WandbVisBackend #1072

i-aki-y · 2023-04-12T06:32:54Z

Motivation

The wandb backend uses a mechanism of the auto-increment built in the wandb.log.
But this step information will be reset to zero after the resumption.
So when I resume the training, I get the following discontinuous graph.

This behavior is a little confusing, I think the graphs should be continuous before and after the resumption.

Modification

To fix this, I added an auto_step flag that allows the user to select the increment method.

auto_step == True: Keep the current behavior (Default)
auto_step == False: Use the step given by an argument. Note that other loggers also use this argument as the step.

BC-breaking (Optional)

No. Default auto_step is set to True which is consistent with the current behavior.

Use cases (Optional)

This makes wandb graph natural before and after the resumption (like the second image above).

Checklist

Pre-commit or other linting tools are used to fix the potential lint issues.
The modification is covered by complete unit tests. If not, please add more unit test to ensure the correctness.
If the modification has potential influence on downstream projects, this PR should be tested with downstream projects, like MMDet or MMCls.
The documentation has been modified accordingly, like docstring or example tutorials.

CLAassistant · 2023-04-12T06:33:00Z

All committers have signed the CLA.

HAOCHENYE

Thanks for your contribution. The overall modification LGTM 😄 .

mmengine/visualization/vis_backend.py

Co-authored-by: Mashiro <57566630+HAOCHENYE@users.noreply.github.com>

i-aki-y · 2023-04-12T10:32:06Z

@HAOCHENYE Thank you for your review.

HAOCHENYE · 2023-04-12T11:04:57Z

mmengine/visualization/vis_backend.py

+            New in version 0.7.3
+
+    Note:
+        If ``auto_step`` is False, ``log_metric_by_epoch`` in :cls:`mmengine.hook.LoggerHook` needs to be set to ``False`` too.


Needs a line break here.

mmengine/visualization/vis_backend.py

i-aki-y · 2023-04-14T04:54:43Z

Sorry, I am reconsidering this PR because my first version is not user-friendly. In my first version, the auto_step interacts with LoggerHook's log_metric_by_epoch, but I think this interaction is non-trivial for many users. And if users misconfigure them, the wandb will ignore the validation metrics without any errors. This misconfiguration could often happen in a complicated system like mmdet and confuse users.

zhouzaida · 2023-04-14T11:36:49Z

Sorry, I am reconsidering this PR because my first version is not user-friendly. In my first version, the auto_step interacts with LoggerHook's log_metric_by_epoch, but I think this interaction is non-trivial for many users. And if users misconfigure them, the wandb will ignore the validation metrics without any errors. This misconfiguration could often happen in a complicated system like mmdet and confuse users.

Hi @i-aki-y , the idea your proposed above looks good to me.

i-aki-y · 2023-04-14T12:00:42Z

@zhouzaida @HAOCHENYE I revised the PR, I would appreciate it if you review it.

As I answered @zhouzaida's comment, I introduced the self._last_step (+ 1 is needed) to avoid an invalid step being passed to wandb.log. As a result, the implementation becomes a bit complicated, but I think this is worth doing. Now I could remove the note for the log_metric_by_epoch.

I still left the auto_step flag to keep the previous behavior. But switching the auto_flag changes only the unit of step (from the "count of wandb.log calls" to the runner.iter) and resumption behavior.
If such a change in the step's unit is acceptable, let me know, and I will remove the auto_flag to always use wandb.log(...,step=step).

HAOCHENYE · 2023-04-17T06:13:56Z

Hi! The modification makes sense to me. Refers to wandb/wandb#2642 and https://docs.wandb.ai/ref/python/run#define_metric, I found that WandB can support logging data on different x axis like that:

import wandb

# 1. Start a W&B run
wandb.init(project='test')

# 2. Save model inputs and hyperparameters
config = wandb.config
config.learning_rate = 0.01
wandb.define_metric('loss1', step_metric='iter', step_sync=False)
wandb.define_metric('loss2', step_metric='epoch', step_sync=False)
# Model training code here ...

# 3. Log metrics over time to visualize performance
for i in range(100):
    wandb.log({"loss1": i, 'iter': i})
    if i % 10 == 0:
        wandb.log({"loss2": i, 'epoch': i // 10})
        wandb.log({"loss3": i * 10, 'epoch': i // 10})

Will this make it possible to record epoch-based data?

i-aki-y · 2023-04-22T07:40:34Z

@HAOCHENYE Thank you for pointing interesting feature.

It looks to work as expected!

Unfortunately, the current define_metric_cfg option of the WandbVisBackend only supports a case of define_metric(metric, summary=summary), but if we can fix this and makes include the "iter" (and “epoch" when by_epoch==True), to the tag of the log output, we can control the behavior of the wandb plot more flexibly.

HAOCHENYE · 2023-04-23T03:43:55Z

I think this kind of modification define_metric_cfg={'coco/bbox_mAP': 'max', step_metric: 'epoch'} is OK.

The epoch has been saved to tag here:

mmengine/mmengine/runner/log_processor.py

Line 182 in be347df

tag['epoch'] = cur_epoch

Maybe the iter could also be saved

i-aki-y · 2023-04-23T06:46:32Z

@HAOCHENYE thank you for the advice.

I think we should allow setting multiple define_metrics in the cfg.
Can I re-define the existing define_metric_cfg as a list of dicts from a dict, or should I introduce a new argument like define_metric_cfg"s”?

I think the following usage:

vis_backends = [
    dict(type='LocalVisBackend'),
    dict(
        type='WandbVisBackend',        
        init_kwargs={...},
        define_metric_cfgs=[
            dict(name="lr", step_metric='iter’),
            ...
            dict(name="coco/bbox_mAP", step_metric='epoch'),
            dict(name="coco/bbox_mAP_50", step_metric='epoch'),            
        ],
    )
]

i-aki-y · 2023-04-23T14:51:18Z

@HAOCHENYE @zhouzaida Since the solution that uses define_metric is based on a different idea and the implementation is also different, I made a separate PR (#1099).

If the new solution is better I will close this issue.

zhouzaida · 2023-04-23T16:29:41Z

@HAOCHENYE thank you for the advice.

I think we should allow setting multiple define_metrics in the cfg. Can I re-define the existing define_metric_cfg as a list of dicts from a dict, or should I introduce a new argument like define_metric_cfg"s”?

I think the following usage:
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(
        type='WandbVisBackend',        
        init_kwargs={...},
        define_metric_cfgs=[
            dict(name="lr", step_metric='iter’),
            ...
            dict(name="coco/bbox_mAP", step_metric='epoch'),
            dict(name="coco/bbox_mAP_50", step_metric='epoch'),            
        ],
    )
]

Hi, we can re-define the existing define_metric_cfg which can be a dict or list of dict.

zhouzaida · 2023-04-23T16:30:39Z

@HAOCHENYE @zhouzaida Since the solution that uses define_metric is based on a different idea and the implementation is also different, I made a separate PR (#1099).

If the new solution is better I will close this issue.

The solution in #1099 seems like a more elegant solution so we can close this PR. Thanks for your contribution very much.

i-aki-y · 2023-04-26T04:17:17Z

I think PR (#1099) is a better solution, so I am closing this PR

Feature: add auto_step flag

ea9751a

i-aki-y requested review from zhouzaida, HAOCHENYE and Harold-lkk as code owners April 12, 2023 06:32

i-aki-y added 3 commits April 12, 2023 15:43

fix lint error

b27f207

fix doc

91505a5

fix test and doc

09f21a4

HAOCHENYE reviewed Apr 12, 2023

View reviewed changes

mmengine/visualization/vis_backend.py Outdated Show resolved Hide resolved

mmengine/visualization/vis_backend.py Show resolved Hide resolved

i-aki-y and others added 2 commits April 12, 2023 19:31

Update mmengine/visualization/vis_backend.py

ec7cbc5

Co-authored-by: Mashiro <57566630+HAOCHENYE@users.noreply.github.com>

Update mmengine/visualization/vis_backend.py

208be15

Co-authored-by: Mashiro <57566630+HAOCHENYE@users.noreply.github.com>

HAOCHENYE reviewed Apr 12, 2023

View reviewed changes

i-aki-y added 2 commits April 12, 2023 20:11

Merge branch 'open-mmlab:main' into add-auto-step-flag

33e158a

fix doc format

280932b

HAOCHENYE added the awating-CI-pass label Apr 12, 2023

zhouzaida reviewed Apr 12, 2023

View reviewed changes

mmengine/visualization/vis_backend.py Outdated Show resolved Hide resolved

i-aki-y changed the title ~~[Feature] add auto_step flag in WandbVisBackend~~ [WIP] [Feature] add auto_step flag in WandbVisBackend Apr 14, 2023

add self._last_step

0bfd58c

i-aki-y changed the title ~~[WIP] [Feature] add auto_step flag in WandbVisBackend~~ [Feature] add auto_step flag in WandbVisBackend Apr 14, 2023

HAOCHENYE added this to the 0.7.4 milestone Apr 23, 2023

i-aki-y mentioned this pull request Apr 23, 2023

[Feature] Add support for full wandb's define_metric arguments #1099

Merged

i-aki-y closed this Apr 26, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature] add auto_step flag in WandbVisBackend #1072

[Feature] add auto_step flag in WandbVisBackend #1072

i-aki-y commented Apr 12, 2023

CLAassistant commented Apr 12, 2023 •

edited

HAOCHENYE left a comment

i-aki-y commented Apr 12, 2023

HAOCHENYE Apr 12, 2023

i-aki-y commented Apr 14, 2023 •

edited

zhouzaida commented Apr 14, 2023

i-aki-y commented Apr 14, 2023

HAOCHENYE commented Apr 17, 2023

i-aki-y commented Apr 22, 2023

HAOCHENYE commented Apr 23, 2023 •

edited

i-aki-y commented Apr 23, 2023

i-aki-y commented Apr 23, 2023

zhouzaida commented Apr 23, 2023

zhouzaida commented Apr 23, 2023 •

edited

i-aki-y commented Apr 26, 2023

[Feature] add auto_step flag in WandbVisBackend #1072

[Feature] add auto_step flag in WandbVisBackend #1072

Conversation

i-aki-y commented Apr 12, 2023

Motivation

Modification

BC-breaking (Optional)

Use cases (Optional)

Checklist

CLAassistant commented Apr 12, 2023 • edited

HAOCHENYE left a comment

Choose a reason for hiding this comment

i-aki-y commented Apr 12, 2023

HAOCHENYE Apr 12, 2023

Choose a reason for hiding this comment

i-aki-y commented Apr 14, 2023 • edited

zhouzaida commented Apr 14, 2023

i-aki-y commented Apr 14, 2023

HAOCHENYE commented Apr 17, 2023

i-aki-y commented Apr 22, 2023

HAOCHENYE commented Apr 23, 2023 • edited

i-aki-y commented Apr 23, 2023

i-aki-y commented Apr 23, 2023

zhouzaida commented Apr 23, 2023

zhouzaida commented Apr 23, 2023 • edited

i-aki-y commented Apr 26, 2023

CLAassistant commented Apr 12, 2023 •

edited

i-aki-y commented Apr 14, 2023 •

edited

HAOCHENYE commented Apr 23, 2023 •

edited

zhouzaida commented Apr 23, 2023 •

edited