
Initial pass at wandb Ludwig integration #514

Merged
merged 20 commits into ludwig-ai:master on Feb 2, 2020

Conversation

vanpelt
Contributor

vanpelt commented Sep 6, 2019

This is an initial pass at a basic integration of wandb with Ludwig. It currently mirrors tensorboard events, stores hyperparameters, syncs artifacts, and stores evaluation results.
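
Roughly, the integration boils down to the following kinds of wandb calls (a simplified sketch with placeholder values, not the exact code in this PR):

import wandb

# Start a run and mirror TensorBoard events written during training.
wandb.init(sync_tensorboard=True)

# Store hyperparameters / the model definition.
wandb.config.update({"learning_rate": 0.001, "epochs": 10})

# Sync artifacts such as saved model files.
wandb.save("results/model/*")

# Store evaluation results.
wandb.log({"test_accuracy": 0.92})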

I'm using the Python "black" code formatter, which seems to be clashing with some of the existing code. Is there a preferred formatter to use?

@CLAassistant

CLAassistant commented Sep 6, 2019

CLA assistant check
All committers have signed the CLA.

@vanpelt
Contributor Author

vanpelt commented Sep 6, 2019

Just tried to sign the CLA and it's blank...

@w4nderlust
Collaborator

Just tried to sign the CLA and it's blank...

@briankhsieh can you please look into the CLA issue?

@w4nderlust
Collaborator

This is an initial pass at a basic integration of wandb with Ludwig. It currently mirrors tensorboard events, stores hyperparameters, syncs artifacts, and stores evaluation results.

I'm using the Python "black" code formatter, which seems to be clashing with some of the existing code. Is there a preferred formatter to use?

Thanks for this contribution, will look into it shortly.
As for the formatter, we usually use PyCharm's, with the line length set to 79.

@bhenhsi

bhenhsi commented Sep 6, 2019

It looks like cla-assistant.io is having intermittent issues. It worked for me this morning. Can you try again later?

@vanpelt
Contributor Author

vanpelt commented Sep 8, 2019

@briankhsieh got it this time, just took a little while to load.

@vanpelt
Contributor Author

vanpelt commented Sep 10, 2019

@w4nderlust I'm using VSCode, if you have any ideas on settings I'm all ears. I'll do some googling as well. Other than formatting, any thoughts on the pull?

@w4nderlust
Collaborator

@vanpelt sorry for the delay on this, was out for conferences. Will look into it by end of day Friday.

Comment on lines 213 to 214
save_csv(csv_filename.format(
output_field, output_type), values)
Collaborator

Please format this on 4 lines:

save_csv(
    csv_filename.format(output_field, output_type),
    values
)

del config["output_features"]
wandb.config.update(config)

def train_init(self, experiment_directory, experiment_name, model_name, resume, output_directory):
Collaborator

This looks longer than 80 columns, please split accordingly
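
For example, something like this stays under 80 columns (just a formatting sketch of the same signature):

def train_init(
        self,
        experiment_directory,
        experiment_name,
        model_name,
        resume,
        output_directory
):
    ...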

@w4nderlust
Collaborator

I'm really sorry for the long delay @vanpelt, I had some personal issues that kept me away from working on Ludwig; now I'm back. The PR looks good to me as a starting point, I just added a couple of formatting suggestions. Regarding VSCode, I don't use it, but you can likely set the linter to 80 columns somehow (not sure this is still valid, but https://stackoverflow.com/questions/29968499/vertical-rulers-in-visual-studio-code ). Do you prefer that I merge it now, or do you want to keep working on it and add the missing features?

@vanpelt
Contributor Author

vanpelt commented Oct 8, 2019

Hey @w4nderlust, no worries. Let me take a pass at getting formatting set up properly and address your comments. I'll have something shortly.

@vanpelt
Contributor Author

vanpelt commented Oct 8, 2019

OK @w4nderlust, I addressed your feedback. Let me know if I missed anything; I'm still seeing DeepSource issues. I'd love to get my linter/formatter set up correctly in VSCode but I'm not sure how best to do it.

@w4nderlust
Collaborator

I looked at the DeepSource complaints. They are mostly related to global and import.
Would doing the import more than once, in multiple threads, be a problem?
If not, maybe you could do the same thing I'm doing with the horovod import here: https://github.com/uber/ludwig/blob/master/ludwig/models/model.py#L91-L94 (assigning the imported library to an attribute on self, so that in the rest of the class you just call self.wandb).
It would look like:

try:
    import wandb
    wandb_obj = Wandb()
    setattr(wandb_obj, 'wandb', wandb)
    return wandb_obj
except ImportError:
    # wandb is not installed, skip the integration
    return None

What do you think?
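
For reference, here is a minimal sketch of that lazy-import pattern (the factory function and method names are hypothetical, not the PR's actual code):

class Wandb:
    def __init__(self, wandb_module):
        # Keep a reference so the rest of the class just calls self.wandb.
        self.wandb = wandb_module

    def train_init(self, experiment_name):
        self.wandb.init(project=experiment_name)


def create_wandb_callback():
    # Return the integration only when wandb is importable; otherwise skip it.
    try:
        import wandb
    except ImportError:
        return None
    return Wandb(wandb)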

@vanpelt
Contributor Author

vanpelt commented Oct 22, 2019

Sorry for the delay on this, I'll hopefully get to it this week.

@vanpelt
Contributor Author

vanpelt commented Nov 1, 2019

Ok @w4nderlust finally got to this. Let me know if you need any other changes!

@vanpelt
Contributor Author

vanpelt commented Nov 11, 2019

@w4nderlust Ping

@vanpelt
Contributor Author

vanpelt commented Nov 18, 2019

Hey @w4nderlust any updates here?

@w4nderlust
Collaborator

Checked it out, LGTM. I will test it later this week and, if everything works as expected, will merge. Please give me some instructions for testing, like an example command or something like that ;)

@vanpelt
Contributor Author

vanpelt commented Nov 19, 2019

Awesome! To test it, create a wandb account here: https://app.wandb.ai, then run wandb login on your machine. Then just run a Ludwig job with the --wandb flag.

@vanpelt
Contributor Author

vanpelt commented Dec 9, 2019

@w4nderlust any updates here?

@vanpelt
Contributor Author

vanpelt commented Dec 17, 2019

@w4nderlust I'm sure the holidays are hectic. Would love to get this in, do you have an ETA?

@w4nderlust
Collaborator

Hey @w4nderlust I'm having a colleague try the TF2 integration and add a test. Hopefully we'll have something next week.

Thank you, sounds good! Consider that the TF2 integration is not there yet; the test would be there to avoid problems when we do it.

dsblank added a commit to dsblank/ludwig that referenced this pull request Jan 17, 2020
I missed an injection point added in ludwig-ai#514
@borisdayma
Contributor

The comments should now be addressed.
I just need to do a few manual tests and add an integration test for wandb.

Quick questions:

  • We currently rely on syncing tensorboard data but it seems to be missing validation/test logs. Is that correct? If so we will just use the train_epoch_end method to log them to W&B
  • It seems our predict_end is never called. I guess it is not used during training and only for inference on trained models. If so then we will remove the call to predict_end and you don't need to add it to the docs @dsblank
  • DeepSource does not like a few things and also wants to convert back the methods to static since we don't use self. Can we ignore it?

@w4nderlust
Collaborator

The comments should now be addressed.
I just need to do a few manual tests and add an integration test for wandb.

Thank you!

Quick questions:

  • We currently rely on syncing tensorboard data but it seems to be missing validation/test logs. Is that correct? If so we will just use the train_epoch_end method to log them to W&B

Yes. Currently the TensorBoard output contains only the batch-level measures (which cover only the training set) and none of the epoch-level measures we display in the CLI. There is an open PR for that, #571, which will be merged soon.

  • It seems our predict_end is never called. I guess it is not used during training and only for inference on trained models. If so then we will remove the call to predict_end and you don't need to add it to the docs @dsblank

That's fine. If other integrations need it we will add it, but if you don't need it we can do without it for now.

  • DeepSource does not like a few things and also wants to convert back the methods to static since we don't use self. Can we ignore it?

It's fine to ignore them, in particular if they are warnings. I would say try to address as many as you can, but only do it if it makes sense. The static-method complaint, for instance, can be ignored.

@borisdayma
Contributor

Except for formatting issues everything should be working.

Here is an example run: https://app.wandb.ai/borisd13/ludwig_mnist/runs/qzu4b5qq?workspace=user-borisd13

Whenever the other metrics are added to TensorBoard, they should automatically be synced with W&B.

Let me know if I need to change anything!

resume, output_directory):
import wandb
logger.info("wandb.train_init() called...")
wandb.init(project=os.getenv("WANDB_PROJECT", experiment_name),
Collaborator

I would add name=model_name to the init parameters.

Contributor

Good idea. I was not aware initially of what model_name was used for, but now it makes sense to use it as the W&B run name.
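
With name=model_name added, the init call would look roughly like this (a sketch based on the snippet above, with any other existing arguments left unchanged):

wandb.init(
    project=os.getenv("WANDB_PROJECT", experiment_name),
    name=model_name
)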

@w4nderlust
Collaborator

@borisdayma I tested it and everything looks fine. I added just one little comment about the model name, as at the moment one gets names that are auto-generated by wandb, I believe, like "dandy-donkey-2". Once that is done, I'll merge.

@borisdayma
Contributor

I added it and the runs are now named per model_name, see example:
https://app.wandb.ai/borisd13/ludwig_mnist/runs/vsf46v66?workspace=user-borisd13

Once PR #571 is merged, the visualization will be more useful as validation metrics should also be logged.

@w4nderlust
Collaborator

@borisdayma thank you, I will merge that one first, test again, and then merge this. I don't expect any problems, but I just want to be sure.

w4nderlust merged commit e8af86c into ludwig-ai:master Feb 2, 2020
@w4nderlust
Collaborator

Thanks a lot for your work and your patience. Sorry it took me longer than I expected to get to this, but in the end we merged it :)

@borisdayma
Contributor

Thanks, I'll be playing with it!

w4nderlust changed the title from "Initial pass at ludwig integration" to "Initial pass atwandb ludwig integration" Apr 2, 2020
w4nderlust changed the title from "Initial pass atwandb ludwig integration" to "Initial pass at wandb Ludwig integration" Apr 7, 2020
@CharlyWargnier

Thanks for that great integration guys! Is it now fully working?

@w4nderlust
Collaborator

Yes!
