Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] add auto_step flag in WandbVisBackend #1072

Closed
wants to merge 9 commits into from

Conversation

i-aki-y
Copy link
Contributor

@i-aki-y i-aki-y commented Apr 12, 2023

Motivation

The wandb backend uses a mechanism of the auto-increment built in the wandb.log.
But this step information will be reset to zero after the resumption.
So when I resume the training, I get the following discontinuous graph.

resume_auto_step

This behavior is a little confusing, I think the graphs should be continuous before and after the resumption.

resume_manual_step

Modification

To fix this, I added an auto_step flag that allows the user to select the increment method.

  • auto_step == True: Keep the current behavior (Default)
  • auto_step == False: Use the step given by an argument. Note that other loggers also use this argument as the step.

BC-breaking (Optional)

No. Default auto_step is set to True which is consistent with the current behavior.

Use cases (Optional)

This makes wandb graph natural before and after the resumption (like the second image above).

Checklist

  1. Pre-commit or other linting tools are used to fix the potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit test to ensure the correctness.
  3. If the modification has potential influence on downstream projects, this PR should be tested with downstream projects, like MMDet or MMCls.
  4. The documentation has been modified accordingly, like docstring or example tutorials.

@CLAassistant
Copy link

CLAassistant commented Apr 12, 2023

CLA assistant check
All committers have signed the CLA.

Copy link
Collaborator

@HAOCHENYE HAOCHENYE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution. The overall modification LGTM 😄 .

mmengine/visualization/vis_backend.py Outdated Show resolved Hide resolved
mmengine/visualization/vis_backend.py Show resolved Hide resolved
i-aki-y and others added 2 commits April 12, 2023 19:31
Co-authored-by: Mashiro <57566630+HAOCHENYE@users.noreply.github.com>
Co-authored-by: Mashiro <57566630+HAOCHENYE@users.noreply.github.com>
@i-aki-y
Copy link
Contributor Author

i-aki-y commented Apr 12, 2023

@HAOCHENYE Thank you for your review.

New in version 0.7.3

Note:
If ``auto_step`` is False, ``log_metric_by_epoch`` in :cls:`mmengine.hook.LoggerHook` needs to be set to ``False`` too.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a line break here.

@i-aki-y i-aki-y changed the title [Feature] add auto_step flag in WandbVisBackend [WIP] [Feature] add auto_step flag in WandbVisBackend Apr 14, 2023
@i-aki-y
Copy link
Contributor Author

i-aki-y commented Apr 14, 2023

Sorry, I am reconsidering this PR because my first version is not user-friendly. In my first version, the auto_step interacts with LoggerHook's log_metric_by_epoch, but I think this interaction is non-trivial for many users. And if users misconfigure them, the wandb will ignore the validation metrics without any errors. This misconfiguration could often happen in a complicated system like mmdet and confuse users.

@zhouzaida
Copy link
Member

Sorry, I am reconsidering this PR because my first version is not user-friendly. In my first version, the auto_step interacts with LoggerHook's log_metric_by_epoch, but I think this interaction is non-trivial for many users. And if users misconfigure them, the wandb will ignore the validation metrics without any errors. This misconfiguration could often happen in a complicated system like mmdet and confuse users.

Hi @i-aki-y , the idea your proposed above looks good to me.

@i-aki-y
Copy link
Contributor Author

i-aki-y commented Apr 14, 2023

@zhouzaida @HAOCHENYE I revised the PR, I would appreciate it if you review it.

As I answered @zhouzaida's comment, I introduced the self._last_step (+ 1 is needed) to avoid an invalid step being passed to wandb.log. As a result, the implementation becomes a bit complicated, but I think this is worth doing. Now I could remove the note for the log_metric_by_epoch.

I still left the auto_step flag to keep the previous behavior. But switching the auto_flag changes only the unit of step (from the "count of wandb.log calls" to the runner.iter) and resumption behavior.
If such a change in the step's unit is acceptable, let me know, and I will remove the auto_flag to always use wandb.log(...,step=step).

@i-aki-y i-aki-y changed the title [WIP] [Feature] add auto_step flag in WandbVisBackend [Feature] add auto_step flag in WandbVisBackend Apr 14, 2023
@HAOCHENYE
Copy link
Collaborator

Hi! The modification makes sense to me. Refers to wandb/wandb#2642 and https://docs.wandb.ai/ref/python/run#define_metric, I found that WandB can support logging data on different x axis like that:

import wandb

# 1. Start a W&B run
wandb.init(project='test')

# 2. Save model inputs and hyperparameters
config = wandb.config
config.learning_rate = 0.01
wandb.define_metric('loss1', step_metric='iter', step_sync=False)
wandb.define_metric('loss2', step_metric='epoch', step_sync=False)
# Model training code here ...

# 3. Log metrics over time to visualize performance
for i in range(100):
    wandb.log({"loss1": i, 'iter': i})
    if i % 10 == 0:
        wandb.log({"loss2": i, 'epoch': i // 10})
        wandb.log({"loss3": i * 10, 'epoch': i // 10})

Will this make it possible to record epoch-based data?

@i-aki-y
Copy link
Contributor Author

i-aki-y commented Apr 22, 2023

@HAOCHENYE Thank you for pointing interesting feature.

It looks to work as expected!

Unfortunately, the current define_metric_cfg option of the WandbVisBackend only supports a case of define_metric(metric, summary=summary), but if we can fix this and makes include the "iter" (and “epoch" when by_epoch==True), to the tag of the log output, we can control the behavior of the wandb plot more flexibly.

@HAOCHENYE
Copy link
Collaborator

HAOCHENYE commented Apr 23, 2023

I think this kind of modification define_metric_cfg={'coco/bbox_mAP': 'max', step_metric: 'epoch'} is OK.

The epoch has been saved to tag here:

tag['epoch'] = cur_epoch

Maybe the iter could also be saved

@HAOCHENYE HAOCHENYE added this to the 0.7.4 milestone Apr 23, 2023
@i-aki-y
Copy link
Contributor Author

i-aki-y commented Apr 23, 2023

@HAOCHENYE thank you for the advice.

I think we should allow setting multiple define_metrics in the cfg.
Can I re-define the existing define_metric_cfg as a list of dicts from a dict, or should I introduce a new argument like define_metric_cfg"s”?

I think the following usage:

vis_backends = [
    dict(type='LocalVisBackend'),
    dict(
        type='WandbVisBackend',        
        init_kwargs={...},
        define_metric_cfgs=[
            dict(name="lr", step_metric='iter’),
            ...
            dict(name="coco/bbox_mAP", step_metric='epoch'),
            dict(name="coco/bbox_mAP_50", step_metric='epoch'),            
        ],
    )
]

@i-aki-y
Copy link
Contributor Author

i-aki-y commented Apr 23, 2023

@HAOCHENYE @zhouzaida Since the solution that uses define_metric is based on a different idea and the implementation is also different, I made a separate PR (#1099).

If the new solution is better I will close this issue.

@zhouzaida
Copy link
Member

@HAOCHENYE thank you for the advice.

I think we should allow setting multiple define_metrics in the cfg. Can I re-define the existing define_metric_cfg as a list of dicts from a dict, or should I introduce a new argument like define_metric_cfg"s”?

I think the following usage:

vis_backends = [
    dict(type='LocalVisBackend'),
    dict(
        type='WandbVisBackend',        
        init_kwargs={...},
        define_metric_cfgs=[
            dict(name="lr", step_metric='iter’),
            ...
            dict(name="coco/bbox_mAP", step_metric='epoch'),
            dict(name="coco/bbox_mAP_50", step_metric='epoch'),            
        ],
    )
]

Hi, we can re-define the existing define_metric_cfg which can be a dict or list of dict.

@zhouzaida
Copy link
Member

zhouzaida commented Apr 23, 2023

@HAOCHENYE @zhouzaida Since the solution that uses define_metric is based on a different idea and the implementation is also different, I made a separate PR (#1099).

If the new solution is better I will close this issue.

The solution in #1099 seems like a more elegant solution so we can close this PR. Thanks for your contribution very much.

@i-aki-y i-aki-y closed this Apr 26, 2023
@i-aki-y
Copy link
Contributor Author

i-aki-y commented Apr 26, 2023

I think PR (#1099) is a better solution, so I am closing this PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants