
Should TensorboardWriter close its tf.summary.FileWriter? #855

Open
shwang opened this issue May 14, 2020 · 4 comments
Labels: bug, help wanted

Comments


shwang commented May 14, 2020

PPO2 uses a with TensorboardWriter(...) as writer: context that flushes but never closes its tf.summary.FileWriter. In combination with another problem on my side, this led to a "too many files are opened by this process" error in one of my runs when I called PPO2.learn() repeatedly.

Maybe the intention here is to let us access the same FileWriter later, but a second call to PPO2.learn() in fact opens a new events file and creates a new FileWriter, which again is not closed by the time learn() exits.
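
For context, here is a rough sketch of how the build-up happens (untested; the environment, policy, and timestep counts are placeholders):

from stable_baselines import PPO2

model = PPO2("MlpPolicy", "CartPole-v1", tensorboard_log="./sb_tb/")
for _ in range(10):
    # Each call enters a fresh TensorboardWriter context, which opens a new
    # tf.summary.FileWriter (and a new events file) that is flushed but never closed.
    model.learn(total_timesteps=1000, reset_num_timesteps=False)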

Relevant lines in TensorboardWriter:

def __enter__(self):
    if self.tensorboard_log_path is not None:
        latest_run_id = self._get_latest_run_id()
        if self.new_tb_log:
            latest_run_id = latest_run_id + 1
        save_path = os.path.join(self.tensorboard_log_path,
                                 "{}_{}".format(self.tb_log_name, latest_run_id))
        self.writer = tf.summary.FileWriter(save_path, graph=self.graph)
    return self.writer

def __exit__(self, exc_type, exc_val, exc_tb):
    if self.writer is not None:
        self.writer.add_graph(self.graph)
        self.writer.flush()
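
One possible fix, sketched below (not what stable-baselines currently does, and it assumes the writer is not meant to be reused after the context exits), would be to close the writer in __exit__:

def __exit__(self, exc_type, exc_val, exc_tb):
    if self.writer is not None:
        self.writer.add_graph(self.graph)
        self.writer.flush()
        # close() releases the underlying events-file handle; flush() alone keeps it open
        self.writer.close()
        self.writer = None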


shwang commented May 14, 2020

Maybe the context flushes instead of closing because the intent is to reuse the old TensorBoard FileWriter when possible.

That way we wouldn't create a new FileWriter, and therefore a new events file, every time we call PPO2.learn(reset_num_timesteps=False). (A sketch of what I mean is at the end of this comment.)

I'm ending up with a long and growing list of files like:

├── sb_tb
│   └── PPO2_1
│       ├── events.out.tfevents.1589433242.spinach
│       ├── events.out.tfevents.1589433245.spinach
│       ├── events.out.tfevents.1589433248.spinach
│       ├── events.out.tfevents.1589433250.spinach
│       ├── events.out.tfevents.1589433253.spinach
│       ├── events.out.tfevents.1589433255.spinach
│       ├── events.out.tfevents.1589433257.spinach
│       ├── events.out.tfevents.1589433260.spinach
│       ├── events.out.tfevents.1589433262.spinach
│       └── events.out.tfevents.1589433265.spinach

Granted, I can just rely on the episode reward mean logs from Monitor and logger.logkv(), which don't use this TensorboardWriter context, so it's not at all critical for me to use it.
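
For illustration, the kind of reuse I have in mind could look roughly like this (only a sketch; the class-level _writer_cache attribute is hypothetical and not part of stable-baselines):

class TensorboardWriter(object):
    # Hypothetical class-level cache mapping run directory -> FileWriter, so that
    # repeated learn() calls resolving to the same run reuse one events file.
    _writer_cache = {}

    def __enter__(self):
        if self.tensorboard_log_path is not None:
            latest_run_id = self._get_latest_run_id()
            if self.new_tb_log:
                latest_run_id = latest_run_id + 1
            save_path = os.path.join(self.tensorboard_log_path,
                                     "{}_{}".format(self.tb_log_name, latest_run_id))
            if save_path not in TensorboardWriter._writer_cache:
                TensorboardWriter._writer_cache[save_path] = tf.summary.FileWriter(
                    save_path, graph=self.graph)
            self.writer = TensorboardWriter._writer_cache[save_path]
        return self.writer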

araffin added the bug label on May 14, 2020

araffin commented May 14, 2020

Hello,

Maybe a duplicate of #501, but it really sounds like a bug.

@Jiankai-Sun

Does setting new_tb_log=False here not work?

araffin added the help wanted label on May 20, 2020

araffin commented May 20, 2020

Does setting new_tb_log=False here not work?

There is an issue about that: #599 (comment)
