Set the metadata only during first training run #3684

Infernaught · 2023-10-02T20:57:23Z

This PR allows users to call model.train multiple times by setting the metadata only during the first training run. The first training run must be on a dataset that contains all possible outputs, and every subsequent training run must be on a dataset whose outputs are a subset of the first's.
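
For illustration only, here is a minimal sketch of the idea described above. `ToyModel` and its label-to-index dictionary are hypothetical stand-ins, not Ludwig's actual classes or metadata format; the only thing borrowed from the PR is the attribute name `training_set_metadata` and the rule that it is set on the first call to `train` and reused afterwards.

```python
from typing import Dict, List, Optional


class ToyModel:
    """Hypothetical stand-in for a model; not Ludwig's implementation."""

    def __init__(self) -> None:
        # Maps each output label to an integer index; None until the first run.
        self.training_set_metadata: Optional[Dict[str, int]] = None

    def train(self, labels: List[str]) -> List[int]:
        if self.training_set_metadata is None:
            # First run only: derive the label vocabulary from the data.
            self.training_set_metadata = {
                label: idx for idx, label in enumerate(sorted(set(labels)))
            }
        # Every run: encode labels with the stored vocabulary.
        return [self.training_set_metadata[label] for label in labels]


model = ToyModel()
first = model.train(["cat", "dog", "bird", "dog"])  # defines the vocabulary
second = model.train(["dog", "cat"])                # reuses it unchanged
```

In this sketch a later dataset containing a label that the first run never saw would raise a `KeyError`, which is why the first dataset has to cover all possible outputs.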

github-actions · 2023-10-02T21:47:45Z

Unit Test Results

  6 files ±0   6 suites ±0 19m 13s ⏱️ - 3m 0s
12 tests ±0   9 ✔️ ±0   3 💤 ±0 0 ❌ ±0
60 runs ±0 42 ✔️ ±0 18 💤 ±0 0 ❌ ±0

Results for commit 4571558. ± Comparison against base commit 2772e9a.

♻️ This comment has been updated with latest results.

w4nderlust · 2023-10-02T23:07:01Z

ludwig/api.py

+                "Previous metadata has been detected. Overriding `training_set_metadata` with metadata from previous "
+                "training run."


Not a huge fan of this warning. Training set metadata is a concept internal to LudwigModel; the user doesn't know about it and doesn't need to, so the warning isn't very useful to them.
A better warning would sound like: "This model has been trained before and its architecture has been defined based on the original training set properties (e.g. the number of output classes for a category output). The new data provided will be mapped into the previous architecture, and it is not possible to modify the architecture based on the new training data. If you want to achieve that, you should concatenate the new data with the previous data and train a new model from scratch." Or something along those lines.
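
As a rough sketch of how a user-facing warning along these lines could be emitted, the snippet below uses Python's standard `logging` module. The logger name, the helper `warn_if_previously_trained`, and the exact message are assumptions made for illustration; the message is adapted from the suggestion above and is not necessarily the wording that was merged.

```python
import logging
from typing import Optional

# Hypothetical logger name, chosen only for this example.
logger = logging.getLogger("retrain_example")

# Adapted from the reviewer's suggestion; not the merged wording.
RETRAIN_WARNING = (
    "This model has been trained before and its architecture has been defined "
    "based on the original training set properties. The new data provided will "
    "be mapped into the previous architecture. To change the architecture, "
    "concatenate the new data with the previous data and train a new model "
    "from scratch."
)


def warn_if_previously_trained(previous_metadata: Optional[dict]) -> bool:
    """Log the warning and return True when metadata from an earlier run exists."""
    if previous_metadata is None:
        return False
    logger.warning(RETRAIN_WARNING)
    return True
```

The message describes the consequence for the user (the architecture is fixed) rather than naming the internal metadata object.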

justinxzhao

Let's see if there's a test that we can write for this in test_api.py!
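
One possible shape for such a test, sketched against a stub rather than the real API: train twice and check that the metadata captured after the first run is unchanged after the second. `StubModel` and `check_metadata_constant` are hypothetical names; a real test in test_api.py would use LudwigModel and real datasets instead.

```python
import copy
from typing import Dict, List, Optional


class StubModel:
    """Hypothetical stand-in for the model under test; not LudwigModel."""

    def __init__(self) -> None:
        self.training_set_metadata: Optional[Dict[str, int]] = None

    def train(self, labels: List[str]) -> None:
        # Metadata is built on the first call and left alone afterwards.
        if self.training_set_metadata is None:
            self.training_set_metadata = {
                label: idx for idx, label in enumerate(sorted(set(labels)))
            }


def check_metadata_constant(model, first: List[str], later: List[str]) -> bool:
    """Train twice and report whether the metadata survived the second run."""
    model.train(first)
    snapshot = copy.deepcopy(model.training_set_metadata)
    model.train(later)
    return model.training_set_metadata == snapshot


def test_metadata_constant_across_runs() -> None:
    assert check_metadata_constant(StubModel(), ["a", "b", "c"], ["a", "b"])
```

Taking a deep copy of the snapshot guards against the second run mutating the metadata object in place.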

justinxzhao

Hi @Infernaught, the changes LGTM. However, it looks like there was a bad rebase, since changes from previous commits appear in this PR. Could you reconcile these diffs?

Set the metadata only during first training run

7557df1

Infernaught requested review from w4nderlust and justinxzhao October 2, 2023 21:03

w4nderlust reviewed Oct 2, 2023

Change warning

9c2fd7d

justinxzhao approved these changes Oct 3, 2023

justinxzhao reviewed Oct 3, 2023

justinxzhao self-requested a review October 3, 2023 14:59

Infernaught added 4 commits October 4, 2023 09:52

Set the metadata only during first training run

cc3ae51

Change warning

00e76c4

Add test to verify metadata stays constant

8037e87

Merge branch 'keep_training' of https://github.com/ludwig-ai/ludwig into keep_training

4817bf8

justinxzhao reviewed Oct 5, 2023

Merge branch 'master' of github.com:ludwig-ai/ludwig into keep_training

4571558

justinxzhao approved these changes Oct 10, 2023

Infernaught merged commit 626d9fc into master Oct 11, 2023
17 checks passed

Infernaught deleted the keep_training branch October 11, 2023 13:58
