
Set the metadata only during first training run #3684

Merged: 7 commits from keep_training into master on Oct 11, 2023

Conversation

Infernaught (Contributor)

This PR allows users to call model.train multiple times by setting the metadata only during the first training run. The first run should use a dataset that contains all possible outputs; subsequent runs can then use datasets whose outputs are subsets of the first's.
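
For example, the usage pattern being enabled looks roughly like this (a minimal sketch; the config and DataFrames are illustrative and not taken from the PR):

```python
import pandas as pd
from ludwig.api import LudwigModel

config = {
    "input_features": [{"name": "text", "type": "text"}],
    "output_features": [{"name": "label", "type": "category"}],
    "trainer": {"epochs": 1},
}

# First run: the dataset contains every possible output class.
df_full = pd.DataFrame({
    "text": ["great movie", "terrible movie", "it was okay"],
    "label": ["positive", "negative", "neutral"],
})

# Later run: the output classes are a subset of the first run's.
df_subset = pd.DataFrame({
    "text": ["awful", "fine I guess"],
    "label": ["negative", "neutral"],
})

model = LudwigModel(config)
model.train(dataset=df_full)    # metadata (e.g. the label vocabulary) is set here
model.train(dataset=df_subset)  # reuses the metadata from the first run
```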

github-actions bot commented Oct 2, 2023

Unit Test Results

  6 files  ±0    6 suites  ±0   19m 13s ⏱️ - 3m 0s
12 tests ±0    9 ✔️ ±0    3 💤 ±0  0 ±0 
60 runs  ±0  42 ✔️ ±0  18 💤 ±0  0 ±0 

Results for commit 4571558. ± Comparison against base commit 2772e9a.

♻️ This comment has been updated with latest results.

Review thread on ludwig/api.py, lines 449 to 450 (Outdated):
"Previous metadata has been detected. Overriding `training_set_metadata` with metadata from previous "
"training run."
Collaborator

Not a huge fan of this warning. Training set metadata is an internal concept of LudwigModel; the user neither needs to know about it nor knows about it, so the warning is not very useful to them.
A better warning would sound like: "This model has been trained before and its architecture has been defined based on the original training set properties (e.g. the number of output classes for a category output). The new data provided will be mapped into the previous architecture, and it is not possible to modify the architecture based on the new training data; if you want to achieve that, you should concatenate the new data with the previous data and train a new model from scratch." Or something along those lines.

justinxzhao (Contributor) left a comment

Let's see if there's a test that we can write for this in test_api.py!
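
Something along these lines might work (a sketch only; the data, the `output_directory` handling, and the `training_set_metadata` attribute check are assumptions rather than the PR's actual test):

```python
import pandas as pd
from ludwig.api import LudwigModel


def test_train_twice_reuses_first_run_metadata(tmpdir):
    config = {
        "input_features": [{"name": "text", "type": "text"}],
        "output_features": [{"name": "label", "type": "category"}],
        "trainer": {"epochs": 1},
    }

    # First run covers all output classes; second run covers a subset.
    df_full = pd.DataFrame({
        "text": ["great", "terrible", "okay", "fine"],
        "label": ["positive", "negative", "neutral", "neutral"],
    })
    df_subset = pd.DataFrame({
        "text": ["awful", "meh"],
        "label": ["negative", "neutral"],
    })

    model = LudwigModel(config)
    model.train(dataset=df_full, output_directory=str(tmpdir))
    metadata_after_first_run = model.training_set_metadata

    # The second run should not redefine the metadata set by the first run.
    model.train(dataset=df_subset, output_directory=str(tmpdir))
    assert model.training_set_metadata == metadata_after_first_run
```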

justinxzhao self-requested a review on October 3, 2023 at 14:59
justinxzhao (Contributor) left a comment

Hi @Infernaught, the changes LGTM. However, it looks like there was a bad rebase, since changes from previous commits appear in this PR. Could you reconcile these diffs?

Infernaught merged commit 626d9fc into master on Oct 11, 2023.
17 checks passed.
Infernaught deleted the keep_training branch on October 11, 2023 at 13:58.