
Conversation

@jakubczakon
Contributor

Description:

  • Added NeptuneLogger that lets you log experiment metadata to neptune.ai
  • Logging is implemented via handlers for Output, Grads (scalars), Optimizer Params, and Model Weights (scalars), plus logging of the best model checkpoints to the server (see the usage sketch below).
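
For illustration, here is a minimal sketch of how such a logger would be attached to an ignite trainer. The handler names mirror ignite's other contrib loggers, and the constructor arguments (api_token, project_name) are assumptions about the final API rather than its definitive form:

```python
# Minimal sketch -- handler names and constructor arguments are assumptions
# based on ignite's other contrib loggers, not the final API of this PR.
import torch
from ignite.engine import Events, create_supervised_trainer
from ignite.contrib.handlers.neptune_logger import (
    NeptuneLogger,
    OutputHandler,
    OptimizerParamsHandler,
    WeightsScalarHandler,
    GradsScalarHandler,
)

# Tiny stand-in model/optimizer/trainer so the sketch is self-contained.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
trainer = create_supervised_trainer(model, optimizer, torch.nn.MSELoss())

npt_logger = NeptuneLogger(
    api_token="ANONYMOUS",                         # assumption: anonymous/sandbox mode
    project_name="shared/pytorch-ignite-integration",
)

# Log the training loss after every iteration.
npt_logger.attach(
    trainer,
    log_handler=OutputHandler(tag="training", output_transform=lambda loss: {"loss": loss}),
    event_name=Events.ITERATION_COMPLETED,
)

# Log learning rate, weight norms and gradient norms after every epoch.
npt_logger.attach(trainer, log_handler=OptimizerParamsHandler(optimizer), event_name=Events.EPOCH_COMPLETED)
npt_logger.attach(trainer, log_handler=WeightsScalarHandler(model), event_name=Events.EPOCH_COMPLETED)
npt_logger.attach(trainer, log_handler=GradsScalarHandler(model), event_name=Events.EPOCH_COMPLETED)
```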

Check list:

  • New tests are added (if a new feature is added)
  • New doc strings: description and/or example code are in RST format
  • Documentation is updated (if required)

@vfdev-5
Collaborator

vfdev-5 commented Jan 26, 2020

@jakubczakon thanks a lot for the PR. Looks good!

A question on the link provided in the docs:
https://ui.neptune.ai/o/neptune-ai/org/pytorch-ignite-integration/e/PYTOR-20/charts
Should it work as is, or do we need to do something first? When I open it I get

Something went wrong :(
We're fixing the problem so please try again in a few minutes.

on your site.

About ModelCheckpointHandler: I agree with this implementation, we haven't settled on that yet; however, I thought about reusing the ignite.handlers.Checkpoint class with a specific save_handler like DiskSaver.

@jakubczakon
Contributor Author

I fixed that, @vfdev-5.
I had made the project private by accident.

@jakubczakon
Contributor Author

jakubczakon commented Jan 27, 2020

Thanks for pointing that out @vfdev-5 .

I initially started with ModelCheckpointHandler and wrote my own NeptuneServerSaver handler.
The problem was that ModelCheckpointHandler would save all the checkpoints to the server.
As of now, we don't have an option to remove things from the artifacts section (only to overwrite them).

On second thought, I can talk to the team and see whether adding this is actually a problem for them.
Wouldn't I need to override the ModelCheckpoint class anyway to change the os.listdir logic and such?
I could extend the Checkpoint class instead, though.

@vfdev-5
Collaborator

vfdev-5 commented Jan 27, 2020

@jakubczakon I see. Yes, a similar situation exists with MLflow, for example. A solution could be to log to a temp folder during training and, after training, send the best n models to the server (a rough sketch follows)...
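
For concreteness, a rough sketch of that temp-folder idea, assuming a standard Checkpoint + DiskSaver setup; `send_to_server` is a hypothetical placeholder for the actual Neptune upload call, and `trainer`/`model` are assumed to be defined as usual:

```python
# Rough sketch of "checkpoint locally, upload after training".
# `send_to_server` is a hypothetical placeholder for the Neptune upload call;
# `trainer` and `model` are assumed to exist (an ignite Engine and a torch module).
import glob
import os
import tempfile

from ignite.engine import Events
from ignite.handlers import Checkpoint, DiskSaver

tmp_dir = tempfile.mkdtemp()

# Keep only the n best checkpoints on local disk while training runs.
checkpoint_handler = Checkpoint(
    {"model": model},
    DiskSaver(tmp_dir),
    n_saved=2,
    score_function=lambda engine: -engine.state.output,  # assumption: scalar loss, lower is better
    score_name="neg_loss",
)
trainer.add_event_handler(Events.EPOCH_COMPLETED, checkpoint_handler)


@trainer.on(Events.COMPLETED)
def upload_best_checkpoints(engine):
    # After training, push whatever survived the n_saved pruning to the server.
    for path in glob.glob(os.path.join(tmp_dir, "*.pt*")):
        send_to_server(path)
```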

Wouldn't I need to override the ModelCheckpoint class anyway to change the os.listdir logic and such? I could extend the Checkpoint class instead, though.

I would suggest using Checkpoint directly with a specific save_handler which sends (and optionally removes) the best models to the server.
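
A rough sketch of what such a save_handler could look like. Only the (checkpoint, filename) calling convention and the optional remove() hook are taken from Checkpoint; the NeptuneServerSaver name and the upload/remove calls on the experiment object are hypothetical placeholders, not real Neptune client methods:

```python
# Sketch of a server-backed save_handler for ignite.handlers.Checkpoint.
# `experiment.upload_artifact` / `experiment.remove_artifact` are placeholders.
import os
import tempfile

import torch
from ignite.engine import Events
from ignite.handlers import Checkpoint


class NeptuneServerSaver:
    """Sends serialized checkpoints to the server instead of keeping them on disk."""

    def __init__(self, experiment):
        self.experiment = experiment  # experiment-like object with upload/remove (assumed)

    def __call__(self, checkpoint, filename):
        # Serialize the checkpoint dict to a temporary file, upload it, clean up.
        fd, tmp_path = tempfile.mkstemp(suffix=".pt")
        os.close(fd)
        torch.save(checkpoint, tmp_path)
        self.experiment.upload_artifact(tmp_path, filename)  # placeholder upload call
        os.remove(tmp_path)

    def remove(self, filename):
        # Intended for Checkpoint's n_saved pruning; needs server-side removal support.
        self.experiment.remove_artifact(filename)  # placeholder remove call


# `model`, `evaluator` and `npt_logger` are assumed to be defined elsewhere.
best_model_saver = Checkpoint(
    {"model": model},
    NeptuneServerSaver(npt_logger.experiment),   # assumption: the logger exposes its experiment
    n_saved=2,
    score_function=lambda engine: engine.state.metrics["accuracy"],
    score_name="accuracy",
)
evaluator.add_event_handler(Events.COMPLETED, best_model_saver)
```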

@jakubczakon
Contributor Author

I would suggest using Checkpoint directly with a specific save_handler which sends (and optionally removes) the best models to the server.

I will talk to our folks and come back on that, as I think it is the cleaner solution.

Side note: I really like the handlers-and-events system.
It gives you a lot of freedom to choose when to do what.
It takes a moment to see how it differs from callbacks, but it is clear to me now :)

@vfdev-5
Collaborator

vfdev-5 commented Jan 27, 2020

To merge this PR, I propose removing ModelCheckpointHandler for now. We can add something back later once we settle on the send/remove or MLflow-like solution...

@jakubczakon
Contributor Author

jakubczakon commented Jan 27, 2020

It is possible that the neptune.remove_artifact() functionality can be implemented quite quickly on our side, so I will come back today with info on that.
If it's not going to be done quickly, I will remove ModelCheckpointHandler.
Is that OK?

@vfdev-5
Collaborator

vfdev-5 commented Jan 27, 2020

@jakubczakon perfect! We can wait for the feedback from your side and do as you proposed.

@jakubczakon
Contributor Author

jakubczakon commented Jan 27, 2020

OK, I spoke to my team and we should have that in the next few days, but I am not sure about the ETA. I think it is a better idea to drop ModelCheckpointHandler from this PR and open another PR once we have that feature implemented.
Fair, @vfdev-5?

@vfdev-5
Collaborator

vfdev-5 commented Jan 27, 2020

@jakubczakon sounds good, let's do it like that!

@vfdev-5
Collaborator

vfdev-5 commented Jan 27, 2020

@jakubczakon yes, I see. Could you please just drop `from __future__ import print_function`? Then we are OK and I can merge it.

@jakubczakon
Contributor Author

Sorry @vfdev-5, I forgot that. It's on the way.

@vfdev-5
Collaborator

LGTM! @jakubczakon thanks!

vfdev-5 merged commit feff57f into pytorch:master on Jan 27, 2020
@jakubczakon
Contributor Author

Perfect, thank you @vfdev-5!
