SummaryWriter deletes data automatically when there are too many #46907
Comments
cc @orionr
I'm surprised by that, but there might be some bad buffering going on. cc @cryptopic, @nataliakliushkina, @edward-io to help investigate.
I encountered the same (similar?) problem with only 20 epochs, but with epoch-wise image records. I simplified my original code and made a minimal example that reproduces the error. Please check whether the two problems are related or whether I'm making a mistake somewhere.
After this, open Part of my pip freeze for your reference:
I hadn't installed full TensorFlow. I did install it to check whether that was the source of the problem, but it didn't help.
I think I solved this issue (at least mine). It is not about Based on three observations, I suspect it is about displaying: 1) the images were there until a certain epoch. For example, the images of epoch 13 disappeared after those of epoch 15 were recorded (not the exact numbers, just an example of what I observed). And 2) the size of log file (using Then I found this flag in tensorboard:
The document itself is not accurate. Setting
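To illustrate why a display-side limit can make recorded steps appear to vanish, here is a minimal sketch of reservoir sampling, which, as far as I understand, is the general technique TensorBoard uses to cap how many samples per tag it keeps in memory. This is not TensorBoard's actual code: the function name `reservoir_sample`, the capacity of 100, and the fixed seed are all hypothetical choices made for the example.

```python
import random


def reservoir_sample(steps, capacity, seed=0):
    """Keep at most `capacity` items from `steps`, chosen uniformly (Algorithm R).

    Illustrative only: a generic reservoir sampler, not TensorBoard's implementation.
    """
    rng = random.Random(seed)
    kept = []
    for seen, step in enumerate(steps, start=1):
        if len(kept) < capacity:
            # Reservoir not full yet: every step is kept, so nothing looks missing.
            kept.append(step)
        else:
            # Reservoir full: the new step replaces a random old one
            # with probability capacity / seen.
            j = rng.randrange(seen)
            if j < capacity:
                kept[j] = step
    return sorted(kept)


if __name__ == "__main__":
    # Early in training: fewer steps than the capacity, so all of them are shown.
    print(reservoir_sample(range(1, 31), capacity=100))
    # Later: more steps than the capacity, so the displayed steps have gaps,
    # even though every step is still present in the event file on disk.
    print(reservoir_sample(range(1, 301), capacity=100)[:10])
```

If this is what is happening, the event file itself is untouched and only the viewer is downsampling. TensorBoard's `--samples_per_plugin` option (for example `scalars=500,images=0`, where 0 is documented as "keep all samples") is the setting I would check first, though I have not verified that it is the flag meant in the comment above.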
❓ Questions and Help
I use SummaryWriter from torch.utils.tensorboard to record data every epoch (1k global timesteps). At first SummaryWriter worked fine, but after around 300 epochs (event file about 1M), the summary started to lose data. For example, when I look at the TensorBoard visualizer at around 30 epochs, all data points are there and the steps go 1, 2, 3, 4, 5, 6, 7, ... When I look again at around 300 epochs, the steps have become 1, 5, 9, 12, 13, 15, ...
Is this normal? Any idea what might have caused it?
I use Python 3.8.3, tensorboard 2.3.0, and torch 1.4.0.
I use
from torch.utils.tensorboard import SummaryWriter
in my code, and use
tensorboard --logdir=log
in the server's bash. I use an SSH tunnel to see the results on my PC.