how to use logger #715

Closed
5 of 11 tasks
Tracked by #548
zhixiongzh opened this issue Aug 25, 2023 · 2 comments
Labels
good first issue Good for newcomers

Comments


zhixiongzh commented Aug 25, 2023

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • system worker bug
    • system utils bug
    • code design/refactor
    • documentation request
    • new feature request
  • I have visited the readme and doc
  • I have searched through the issue tracker and pr tracker
  • I have mentioned version numbers, operating system and environment, where applicable:
    import ding, torch, sys
    print(ding.__version__, torch.__version__, sys.version, sys.platform)
    v0.4.9 2.0.1 3.10.11 (main, Apr 20 2023, 19:02:41) [GCC 11.2.0] linux

I am running the code with ding.bonus.ppof. I do not want to use wandb_online_logger, just TensorBoard, so I replaced wandb_online_logger with online_logger(train_show_freq=1000) as follows:

        with task.start(ctx=OnlineRLContext()):
            task.use(interaction_evaluator_ttorch(self.seed, self.policy, evaluator_env))
            task.use(PPOFStepCollector(self.seed, self.policy, collector_env, self.cfg.n_sample))
            task.use(ppof_adv_estimator(self.policy))
            task.use(multistep_trainer(self.policy, log_freq=n_iter_log_show))
            task.use(CkptSaver(self.policy, save_dir=self.exp_name, train_freq=n_iter_save_ckpt))
            task.use(online_logger(train_show_freq=1000)
                # wandb_online_logger(
                #     metric_list=self.policy.monitor_vars(),
                #     model=self.policy._model,
                #     anonymous=True,
                #     project_name=self.exp_name
                # )
            )
            task.use(termination_checker(max_env_step=step))
            task.run()

Then I got this error:

  File "/opt/conda/lib/python3.10/site-packages/ding/framework/middleware/functional/logger.py", line 68, in _logger
    writer.add_scalar('basic/eval_episode_return_mean', ctx.eval_value, ctx.env_step)
AttributeError: 'NoneType' object has no attribute 'add_scalar'

It seems that I am using online_logger in the wrong way. Is there any example of how to use online_logger from ding.framework.middleware? BTW, does online_logger correspond to online RL algorithms?


zhixiongzh commented Aug 25, 2023

I found the reason.
Before using online_logger, I have to call ding_init(cfg). That function currently does only one thing: it calls DistributedWriter.get_instance(cfg.exp_name). This is a bit too opaque, because to use the logger I first have to know that DistributedWriter.get_instance(cfg.exp_name) must be called beforehand, which is not documented anywhere. Why not just add another argument exp_name to online_logger, so that:

def online_logger(record_train_iter: bool = False, train_show_freq: int = 100, exp_name: str = None) -> Callable:
    """
    Create an online logger for recording training and evaluation metrics.
    
    Arguments:
        - record_train_iter (bool): Whether to record training iteration. Default is False.
        - train_show_freq (int): Frequency of showing training logs. Default is 100.
        - exp_name (str): Experiment name, should not be None.
        
    Returns:
        - _logger (Callable): A logger function that takes an OnlineRLContext object as input.
        
    Raises:
        - ValueError: If exp_name is None.
        
    Example:
        task.use(online_logger(record_train_iter=False, train_show_freq=1000, exp_name=cfg.exp_name))
    """
    if task.router.is_active and not task.has_role(task.role.LEARNER):
        return task.void()
    if exp_name is None:
        raise ValueError("exp_name cannot be None")
    writer = DistributedWriter.get_instance(exp_name)
    last_train_show_iter = -1

    def _logger(ctx: "OnlineRLContext"):
        # ... (rest of the code)
        
    return _logger

In this case, it is clear to everyone that they need to pass an exp_name to the logger, and that it is required. Maybe ChatGPT could help write the description of every function and provide an example of how to use it, since the docs do not yet cover every function.
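To make the proposal above concrete, here is a minimal, self-contained sketch of the suggested guard. This is not ding's actual implementation; FakeWriter and the simplified online_logger below are hypothetical stand-ins that only mimic the names in the thread, to show how failing fast on a missing exp_name avoids the later 'NoneType' object has no attribute 'add_scalar' crash:

```python
from typing import Callable, Optional


class FakeWriter:
    """Hypothetical stand-in for ding's DistributedWriter singleton."""
    _instance = None

    @classmethod
    def get_instance(cls, name: Optional[str] = None):
        # Creates the instance only when a name is supplied;
        # otherwise returns whatever exists (possibly None).
        if cls._instance is None and name is not None:
            cls._instance = cls()
        return cls._instance


def online_logger(train_show_freq: int = 100, exp_name: Optional[str] = None) -> Callable:
    # The proposed guard: raise immediately instead of returning a
    # logger that crashes later when writer is None.
    if exp_name is None:
        raise ValueError("exp_name cannot be None")
    writer = FakeWriter.get_instance(exp_name)

    def _logger(ctx) -> None:
        # The real middleware would call writer.add_scalar(...) here.
        assert writer is not None

    return _logger
```

With this shape, a bad call fails at construction time with a clear message rather than deep inside the middleware loop.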

@PaParaZz1 PaParaZz1 added the good first issue Good for newcomers label Aug 27, 2023
@PaParaZz1 (Member) commented:

Thanks for your feedback; we will add a hint for this NoneType problem when the online_logger middleware is called.
As for DistributedWriter, we implement this module with the singleton pattern, so it must be initialized at the beginning of the whole training program (e.g., in the ding_init function). We will add more comments and documentation to point out the necessary information here.
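The singleton constraint described above can be sketched in plain Python. The class below is a hypothetical illustration, not ding's actual DistributedWriter: the first get_instance call with a name creates the object, every later call returns that same object, and calling it before initialization yields None, which is exactly the failure mode reported in this issue:

```python
from typing import Optional


class SingletonWriter:
    """Hypothetical sketch of the singleton pattern described above."""
    _instance = None

    def __init__(self, exp_name: str):
        self.exp_name = exp_name

    @classmethod
    def get_instance(cls, exp_name: Optional[str] = None) -> Optional["SingletonWriter"]:
        # The first call with a name (e.g. inside something like
        # ding_init) creates the instance; later calls return that
        # same object. Without prior initialization, this returns
        # None, and any writer.add_scalar(...) call then raises
        # AttributeError.
        if cls._instance is None and exp_name is not None:
            cls._instance = cls(exp_name)
        return cls._instance
```

This is why the initialization has to happen once, up front, before any middleware asks for the writer.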

BTW, it is good practice to learn function usage from the unit tests, such as this file for online_logger.
