Fix for 'upload_to_hf_hub()' path mismatch with 'save()' #3977

sanjaydasgupta · 2024-03-22T05:54:06Z

This PR fixes the issue "upload_to_hf_hub model path mismatch with model.save" (#3925) while remaining backward compatible with the previous behavior of upload_to_hf_hub.

With this change, upload_to_hf_hub accepts either of the following kinds of path as its model_path argument:

- a path to the experiment folder (existing implementation).
- a path to the model folder (added by this PR).

The names experiment and model refer to places in the folder hierarchy created when a model is trained:

results / experiment / model / model_weights

The model folder is additionally distinguished by containing the model_hyperparameters.json file. The model_weights folder contains one or more of these files: pytorch_model.bin, adapter_model.bin, adapter_model.safetensors.

The changes to upload_to_hf_hub() cause it to test the model_path argument to determine which kind of path was passed. If the path is a model folder, the parent path (the corresponding experiment path) is obtained and used instead, so the logic of upload_to_hf_hub and its supporting functions does not have to change at all. ~~It uses that knowledge for its own purposes and also passes it down to all supporting functions (as an additional argument model_path_is_experiment_path: bool = True) for their use. The default value of model_path_is_experiment_path has been chosen to ensure backward-compatible behavior.~~
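The path test described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names `is_model_dir` and `to_experiment_path` and the three local constants are hypothetical, and the check only assumes the folder layout described in this description (a model folder contains model_hyperparameters.json and a model_weights folder).

```python
import os

# Hypothetical local names, used only for this sketch.
MODEL_DIR_NAME = "model"
MODEL_WEIGHTS_DIR_NAME = "model_weights"
HYPERPARAMETERS_FILE_NAME = "model_hyperparameters.json"


def is_model_dir(path: str) -> bool:
    """Return True if `path` looks like a model folder.

    A model folder is assumed to contain the hyperparameters file
    and a `model_weights` sub-folder.
    """
    has_hyperparameters = os.path.isfile(os.path.join(path, HYPERPARAMETERS_FILE_NAME))
    has_weights_dir = os.path.isdir(os.path.join(path, MODEL_WEIGHTS_DIR_NAME))
    return has_hyperparameters and has_weights_dir


def to_experiment_path(model_path: str) -> str:
    """Normalize a user-provided path to the experiment folder.

    If `model_path` is a model folder, its parent (the experiment folder)
    is returned; otherwise the normalized path is returned unchanged.
    """
    model_path = os.path.normpath(model_path)  # also drops a trailing separator
    if is_model_dir(model_path):
        return os.path.dirname(model_path) or os.curdir
    return model_path
```

With a helper like this, everything downstream can keep assuming it receives the experiment path, whichever of the two kinds of path the caller supplied.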

~~A unit test has been added (test_upload_to_hf_hub__validate_upload_parameters2()) for the new use-case.~~ And the feature has been integration tested with custom code on a local build.

github-actions · 2024-03-22T06:22:38Z

Unit Test Results

6 files ±0, 6 suites ±0, duration 13m 40s ⏱️ (-45s)
12 tests ±0: 7 ✔️ passed (-2), 5 💤 skipped (+2), 0 ❌ failed (±0)
60 runs ±0: 30 ✔️ passed (-12), 30 💤 skipped (+12), 0 ❌ failed (±0)

Results for commit aa1750e. ± Comparison against base commit b2cb8b0.

This pull request skips 2 tests.

tests.regression_tests.benchmark.test_model_performance ‑ test_performance[ames_housing.gbm.yaml]
tests.regression_tests.benchmark.test_model_performance ‑ test_performance[mercedes_benz_greener.gbm.yaml]

♻️ This comment has been updated with latest results.

sanjaydasgupta · 2024-03-22T08:02:34Z

Hi @alexsherstinsky, please take a look. Thanks.

alexsherstinsky · 2024-03-22T15:37:18Z

ludwig/api.py

            The description of the generated commit

        # Returns

        :return: (bool) True for success, False for failure.
        """
+        if os.path.exists(os.path.join(model_path, "model", "model_weights")) and os.path.exists(


@sanjaydasgupta The idea/direction seems great! I just have a couple of minor suggestions for you to consider (because if they work, they might improve the code quality slightly):

1. Can we create constants for the two directory names, "model" and "model_weights", and use them codebase-wide?

2. Instead of passing the boolean model_path_is_experiment_path, is it possible to determine the "experiment" directory once and standardize all methods on it, so that the "experiment" directory is always what gets passed to them? Even if a function is passed the "model" directory, and we then have to assume that the "experiment" directory is "model/..", I think this would still be reasonable and acceptable. What do you think?

Thank you very much!

@alexsherstinsky, responding to your (2) above - sounds like a great idea! Let me take a closer look at the code.

Hi @alexsherstinsky, both your suggestions have been implemented.

There was already a label MODEL_WEIGHTS_FILE_NAME defined (and used in some places) for the literal string "model_weights". So the choice of a label for "model" fell naturally on MODEL_FILE_NAME -- not the best, but there it is for the time being.
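For readers following along, here is a small sketch of what using the two constants instead of the literal strings looks like. The constants are redefined locally with the values quoted in this thread so that the snippet is self-contained, and the helper `weights_dir` is hypothetical, not a function from the codebase.

```python
import os

# Values as quoted in this thread; redefined here only to keep the sketch runnable.
MODEL_FILE_NAME = "model"
MODEL_WEIGHTS_FILE_NAME = "model_weights"


def weights_dir(experiment_path: str) -> str:
    """Hypothetical helper: build the model_weights path under an experiment folder."""
    return os.path.join(experiment_path, MODEL_FILE_NAME, MODEL_WEIGHTS_FILE_NAME)
```

Building paths this way keeps the folder names in one place, which is the point of replacing the literals.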

ludwig/api.py

alexsherstinsky

@sanjaydasgupta This is great! I left some comments/suggestions to see if we can make it even better -- thank you!

alexsherstinsky

@sanjaydasgupta I worry that it may not be safe to change a global constant as it may adversely impact backward compatibility (as in -- a user might rely on it). Is there any way for us in the present PR to constrain the changes in such a way that we do not modify the existing constants but reuse them and define a new one, if needed? Thank you!

sanjaydasgupta · 2024-03-23T16:49:11Z

> @sanjaydasgupta I worry that it may not be safe to change a global constant as it may adversely impact backward compatibility (as in -- a user might rely on it). Is there any way for us in the present PR to constrain the changes in such a way that we do not modify the existing constants but reuse them and define a new one, if needed? Thank you!

I think there is a misunderstanding. Let me clarify what I meant:

The global constant MODEL_WEIGHTS_FILE_NAME (for the literal string “model_weights”) existed previously, and continues to be used to denote “model_weights”. I have not changed that. In fact I have replaced “model_weights” with the global constant in some more places in the code.

I added the (previously unused) global constant MODEL_FILE_NAME for the literal string “model”. Most occurrences of “model” in the code have been replaced with the global constant. A few instances of “model” remain — their context was different (e.g. repository type etc.)

My comment about their being poor choices was due to the use of “FILE” instead of “DIRECTORY” or “FOLDER”. Both constants refer to folders, not file-names.

Please let me know if I misunderstood your comment. Thanks.

alexsherstinsky

@sanjaydasgupta You are correct (I just got concerned about a 38-module change!). Please sync with the latest "master" branch, and I will approve. Thank you very much!

sanjaydasgupta · 2024-03-24T03:40:16Z

> @sanjaydasgupta You are correct (I just got concerned about a 38-module change!). Please sync with the latest "master" branch, and I will approve. Thank you very much!

Hi @alexsherstinsky this PR is now ready for merge.

alexsherstinsky

LGTM! Beautiful enhancement! Thank you very much, @sanjaydasgupta!

sanjaydasgupta and others added 2 commits March 22, 2024 10:55

fix for 'upload_to_hf_hub()' path mismatch with 'save()'

a4b9b3d

[pre-commit.ci] auto fixes from pre-commit.com hooks

2f80119

for more information, see https://pre-commit.ci

sanjaydasgupta and others added 4 commits March 22, 2024 11:59

added one unit test 'test_upload_to_hf_hub__validate_upload_parameters2'

e11a52c

Merge remote-tracking branch 'origin/upload-hf-issue-3925' into upload-hf-issue-3925

b9f3819

[pre-commit.ci] auto fixes from pre-commit.com hooks

7257808

for more information, see https://pre-commit.ci

passed additional argument 'model_path_is_experiment_path=False' to 'test_upload_to_hf_hub__validate_upload_parameters2()'

f749c0c

sanjaydasgupta marked this pull request as ready for review March 22, 2024 08:01

sanjaydasgupta requested review from w4nderlust, tgaddair, justinxzhao, arnavgarg1, geoffreyangus, jeffkinnison, Infernaught and alexsherstinsky as code owners March 22, 2024 08:01

alexsherstinsky reviewed Mar 22, 2024

alexsherstinsky requested changes Mar 22, 2024

sanjaydasgupta and others added 8 commits March 23, 2024 09:19

always normalize any user-provided path to the experiment-path, modify all support functions to assume "model_path" is experiment-path

5c58b4c

[pre-commit.ci] auto fixes from pre-commit.com hooks

fa92d13

for more information, see https://pre-commit.ci

The unit test added earlier is no longer correct or relevant

c56c42f

Merge remote-tracking branch 'origin/upload-hf-issue-3925' into upload-hf-issue-3925

a62b1f4

minor docstring edit - just to kick off CI

8cea3e2

updated the docstrings for 'model_weight' and other minor code changes

c68132e

used global literals 'MODEL_FILE_NAME' and 'MODEL_WEIGHTS_FILE_NAME' instead of literal strings "model" and "model_weights" respectively

8b072d2

[pre-commit.ci] auto fixes from pre-commit.com hooks

c632b52

for more information, see https://pre-commit.ci

alexsherstinsky requested changes Mar 23, 2024

sanjaydasgupta and others added 2 commits March 24, 2024 07:56

Merge branch 'ludwig-ai:master' into upload-hf-issue-3925

43656be

fixed a few formatting errors within docstrings

aa1750e

alexsherstinsky approved these changes Mar 24, 2024

alexsherstinsky merged commit 4b07ce4 into ludwig-ai:master Mar 24, 2024
18 checks passed

alexsherstinsky mentioned this pull request Mar 24, 2024

upload_to_hf_hub model path mismatch with model.save #3925

Closed
