Add Huggingface Integration #916

pranayasinghcsmpl · 2024-08-14T12:03:19Z

Fixes #727

Proposed Changes

Added Huggingface Upload & Download functionality in a subcommand.
Added library name, version & git hash in huggingface tags for huggingface uploads.
Added functionality to save a copy of config.yaml during training.
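
As a rough illustration of the tagging described above, the sketch below assembles a tag list from a version string and an optional git hash. The helper name `build_hub_tags` and the `v` prefix on the version are assumptions made for this example; the PR does not show how the tags are actually formatted.

```python
from typing import List, Optional


def build_hub_tags(version: str, git_hash: Optional[str] = None) -> List[str]:
    """Collect tags for an upload (hypothetical helper, not taken from the PR)."""
    # The "v" prefix is an assumption made for readability on the Hub.
    tags = [f"v{version}"]
    # The hash may be unavailable (e.g. an install without git metadata),
    # so it is only appended when present.
    if git_hash:
        tags += [git_hash]
    return tags
```

The resulting list would then be passed as `tags=` alongside `library_name="GaNDLF"` when building the model card data.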

Checklist

github-actions · 2024-08-14T12:03:32Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

sarthakpati

Before I start reviewing this in earnest, I would need at least the following 2 pieces of information to be added to the PR:

Documentation: (it is absolutely fine to have a bullet point list of items that link to the main HF docs)
Tests

I believe both of these were present in the previous PR.

setup.py

…into hf_cli4

sarthakpati · 2024-09-06T18:48:03Z

Hi @Wauplin and @NielsRogge - this PR looks good from my end. Do you have any feedback?

Wauplin

Hey there 👋 Thanks for the ping!

I left a few comments from an outsider's point of view. I do think that the CLI should be more opinionated (that is, "have fewer options and decide things for the user"); otherwise we pretty much end up with a CLI close to what huggingface-cli upload and huggingface-cli download do.

GANDLF/cli/huggingface_hub_handler.py

Wauplin · 2024-09-10T13:08:50Z

GANDLF/cli/huggingface_hub_handler.py

+from pathlib import Path
+from GANDLF.utils import get_git_hash
+
+readme_template = """


Is this a simple copy of the model card template found here? If so, I would suggest either:

directly reusing the template from huggingface_hub (i.e. ModelCard.from_template(card_data) without the template_str),

or defining your own template, but in that case you should only include the fields and descriptions relevant to your library (instead of leaving all fields empty).

Thanks @Wauplin for making me aware of this. I will definitely go through it and make the required changes you mentioned.

Hi @Wauplin,

We had an internal discussion on the best way to present potential model uploaders with a specific set of required options for the model card. Thus far, we have landed on using a custom model card. The reason to have all the fields present is to give users the ability to put in more information than what we require.

Here, we have put the string "REQUIRED_FOR_GANDLF" for the fields that are explicitly needed for the user to populate, and the rest have been left as present in the template.

In the code, we plan to add 2 checks:

If "REQUIRED_FOR_GANDLF" is found, we present an error to the user saying that this field needs to be populated with appropriate information.

The Repository key should always be https://github.com/mlcommons/GaNDLF.

Thoughts?

This seems a sensible idea to me yes!

Brilliant, thanks for the confirmation! We'll get on it right away. 😄
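
The two checks agreed on in this thread could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name `validate_model_card` is hypothetical, and the regular expression assumes the card contains a line shaped like `- **Repository:** <url>`, which the thread does not confirm.

```python
import re

REQUIRED_PLACEHOLDER = "REQUIRED_FOR_GANDLF"
EXPECTED_REPOSITORY = "https://github.com/mlcommons/GaNDLF"

# Assumed layout: "Repository:" optionally followed by markdown bold markers,
# then whitespace, then the URL.
_REPOSITORY_LINE = re.compile(r"Repository:\**\s*(\S+)")


def validate_model_card(card_text: str) -> None:
    """Raise ValueError if the card is not ready for upload (hypothetical helper)."""
    # Check 1: no required field may still hold the placeholder string.
    unfilled = [
        line.strip()
        for line in card_text.splitlines()
        if REQUIRED_PLACEHOLDER in line
    ]
    if unfilled:
        raise ValueError(
            f"Fields still marked {REQUIRED_PLACEHOLDER} must be filled in: {unfilled}"
        )

    # Check 2: the Repository key must point at the GaNDLF repository.
    match = _REPOSITORY_LINE.search(card_text)
    if match is None or match.group(1).rstrip("/") != EXPECTED_REPOSITORY:
        raise ValueError(f"The Repository key must be {EXPECTED_REPOSITORY}")
```

Running such a check before anything is uploaded would let the user see the unfilled fields without a partially populated repository being created.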

@sarthakpati @Wauplin how can we test this file now that we are proposing the upload functionality? We only have entry-point tests; do we need to specify a particular directory there?

Perhaps you can leverage one of the existing training tests to test the upload. I would recommend this one, since this would only upload a single model.

Ensure you put an appropriate description for it (such as Unit testing model or something) to make it clear for anyone viewing it. Is there a way to update an existing model, @Wauplin?

GANDLF/cli/huggingface_hub_handler.py

Wauplin · 2024-09-10T13:14:02Z

GANDLF/cli/huggingface_hub_handler.py

+        tags += [git_hash]
+
+    card_data = ModelCardData(library_name="GaNDLF", tags=tags)
+    card = ModelCard.from_template(card_data, template_str=readme_template)


See comment above about template_str

Wauplin · 2024-09-10T13:15:38Z

GANDLF/cli/huggingface_hub_handler.py

+def download_from_hub(
+    repo_id: str,
+    revision: Union[str, None] = None,
+    cache_dir: Union[str, None] = None,
+    local_dir: Union[str, None] = None,
+    force_download: bool = False,
+    token: Union[str, None] = None,
+):
+    snapshot_download(
+        repo_id=repo_id,
+        revision=revision,
+        cache_dir=cache_dir,
+        local_dir=local_dir,
+        force_download=force_download,
+        token=token,
+    )


I am not sure this alias is really needed. I would simply call snapshot_download in other places in the code.

I still think the alias is not needed and that snapshot_download could be used by default

GANDLF/cli/huggingface_hub_handler.py

Co-authored-by: Lucain <lucainp@gmail.com>

GANDLF/cli/huggingface_hub_handler.py

sarthakpati · 2024-09-10T17:03:08Z

@pranayasinghcsmpl some lint fixes (unused variables and whatnot) will be needed for this PR. Thanks for taking care of it!

Thanks for your comments and suggestions, @Wauplin!

codecov · 2024-09-13T00:04:34Z

Codecov Report

Attention: Patch coverage is 97.05882% with 4 lines in your changes missing coverage. Please review.

Project coverage is 94.61%. Comparing base (e066e88) to head (5e8a97b).
Report is 15 commits behind head on master.

Files with missing lines	Patch %	Lines
testing/test_full.py	94.23%	3 Missing ⚠️
GANDLF/entrypoints/hf_hub_integration.py	96.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #916      +/-   ##
==========================================
+ Coverage   94.58%   94.61%   +0.03%     
==========================================
  Files         161      164       +3     
  Lines        9567     9701     +134     
==========================================
+ Hits         9049     9179     +130     
- Misses        518      522       +4

Flag	Coverage Δ
unittests	`94.61% <97.05%> (+0.03%)`	⬆️

testing/test_full.py

…into hf_cli4

sarthakpati · 2024-09-25T15:03:28Z

Support ticket generated with Codacy to explore the coverage issue.

sarthakpati · 2024-10-01T08:38:52Z

Codacy folks suggested not to use coverage reporter for anything coming in from other forks 🙄

Anyway, we should be good to go from my end. @Wauplin is this PR good to merge for you?

Wauplin

Integration looks good yes :) I left a few comments but nothing blocking on my side.
Thanks for the iterations!

Wauplin · 2024-10-01T08:54:02Z

GANDLF/cli/huggingface_hub_handler.py

+    api = HfApi(token=token)
+
+    try:
+        api.create_repo(repo_id)
+    except Exception as e:
+        print(f"Error: {e}")
+
+    api = HfApi(token=token)
+
+    repo_id = api.create_repo(repo_id, exist_ok=True).repo_id


Suggested change

-    api = HfApi(token=token)
-
-    try:
-        api.create_repo(repo_id)
-    except Exception as e:
-        print(f"Error: {e}")
-
-    api = HfApi(token=token)
-
-    repo_id = api.create_repo(repo_id, exist_ok=True).repo_id
+    api = HfApi(token=token)
+
+    try:
+        repo_id = api.create_repo(repo_id).repo_id
+    except Exception as e:
+        print(f"Error: {e}")

no need to create the repo twice

GANDLF/cli/huggingface_hub_handler.py

Wauplin · 2024-10-01T08:55:28Z

GANDLF/cli/huggingface_hub_handler.py

+
+    api.upload_folder(
+        repo_id=repo_id,
+        token=token,


Suggested change

-        token=token,

no need for this since already provided in HfApi
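
Taken together, the two review comments on the upload path (create the repository once, and rely on the token already held by the client) might reduce to something like the sketch below. The function name `push_folder` and the `HubApi` protocol are hypothetical and exist only so the example runs without network access; the sketch uses a single `create_repo(..., exist_ok=True)` call, as in the original diff, rather than the try/except form from the suggestion.

```python
from typing import Any, Protocol


class HubApi(Protocol):
    """The two client methods this sketch relies on (mirrors HfApi usage in the diff)."""

    def create_repo(self, repo_id: str, *, exist_ok: bool = False) -> Any: ...

    def upload_folder(self, *, repo_id: str, folder_path: str) -> Any: ...


def push_folder(api: HubApi, repo_id: str, folder_path: str) -> str:
    """Create the repo if needed, upload a folder, return the resolved repo id."""
    # A single call is enough: with exist_ok=True no error is raised
    # when the repository already exists.
    repo_id = api.create_repo(repo_id, exist_ok=True).repo_id
    # No token argument here: the client is assumed to have been built with it.
    api.upload_folder(repo_id=repo_id, folder_path=folder_path)
    return repo_id
```

With the real client this would presumably be called as `push_folder(HfApi(token=token), repo_id, folder_path)`.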

Wauplin · 2024-10-01T08:56:13Z

GANDLF/cli/huggingface_hub_handler.py

+def download_from_hub(
+    repo_id: str,
+    revision: Union[str, None] = None,
+    cache_dir: Union[str, None] = None,
+    local_dir: Union[str, None] = None,
+    force_download: bool = False,
+    token: Union[str, None] = None,
+):
+    snapshot_download(
+        repo_id=repo_id,
+        revision=revision,
+        cache_dir=cache_dir,
+        local_dir=local_dir,
+        force_download=force_download,
+        token=token,
+    )


I still think the alias is not needed and that snapshot_download could be used by default

Wauplin · 2024-10-01T08:57:05Z

GANDLF/entrypoints/hf_hub_integration.py

+@click.option(
+    "--allow-patterns",
+    "-ap",
+    help="Uploading: If provided, only files matching at least one pattern are uploaded.",
+)
+@click.option(
+    "--ignore-patterns",
+    "-ip",
+    help="Uploading: If provided, files matching any of the patterns are not uploaded.",
+)
+@click.option(
+    "--delete-patterns",
+    "-dp",
+    help="Uploading: If provided, remote files matching any of the patterns will be deleted from the repo while committing new files. This is useful if you don't know which files have already been uploaded.",
+)


Suggested change

-@click.option(
-    "--allow-patterns",
-    "-ap",
-    help="Uploading: If provided, only files matching at least one pattern are uploaded.",
-)
-@click.option(
-    "--ignore-patterns",
-    "-ip",
-    help="Uploading: If provided, files matching any of the patterns are not uploaded.",
-)
-@click.option(
-    "--delete-patterns",
-    "-dp",
-    help="Uploading: If provided, remote files matching any of the patterns will be deleted from the repo while committing new files. This is useful if you don't know which files have already been uploaded.",
-)

I don't think this is needed since you are in control of what needs to be uploaded, right?
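
For readers unfamiliar with what the allow/ignore options in the diff would do, the sketch below reproduces the selection rule stated in their help texts. It assumes shell-style patterns matched with the standard library's `fnmatch`; the function name `select_files` is hypothetical and this is not the code path the Hub client itself runs.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional


def select_files(
    paths: Iterable[str],
    allow_patterns: Optional[List[str]] = None,
    ignore_patterns: Optional[List[str]] = None,
) -> List[str]:
    """Keep paths matching at least one allow pattern and no ignore pattern."""
    selected = []
    for path in paths:
        # With allow patterns given, a file must match at least one of them.
        if allow_patterns is not None and not any(
            fnmatchcase(path, pattern) for pattern in allow_patterns
        ):
            continue
        # A file matching any ignore pattern is skipped.
        if ignore_patterns is not None and any(
            fnmatchcase(path, pattern) for pattern in ignore_patterns
        ):
            continue
        selected.append(path)
    return selected
```

If the upload command itself decides which files belong in a model folder, as the comment above suggests, none of this filtering needs to be exposed to the user.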

Wauplin · 2024-10-01T08:58:34Z

GANDLF/entrypoints/hf_hub_integration.py

+@click.option(
+    "--hf-template",
+    "-hft",
+    help="Adding the template path for the model card it is Required during Uploaing a model",
+    type=click.Path(exists=True, file_okay=True, dir_okay=False),
+)


Maybe default it to hugging_face.md to reduce friction? Users are free to provide another template if they want but having one by default should reduce friction and help grow usage.
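
The fallback suggested here can be sketched in plain Python as below. With click the same effect would normally come from a `default=` on the option; the helper name `resolve_template` and the location of `hugging_face.md` relative to the working directory are assumptions for this example only.

```python
from pathlib import Path
from typing import Optional

# Assumed default location; the real path inside the package is not shown in the PR.
DEFAULT_TEMPLATE = Path("hugging_face.md")


def resolve_template(
    hf_template: Optional[str], default: Path = DEFAULT_TEMPLATE
) -> Path:
    """Return the user-supplied template path, or the bundled default if none is given."""
    path = Path(hf_template) if hf_template else default
    # Fail early with a clear message instead of at upload time.
    if not path.is_file():
        raise FileNotFoundError(f"Model card template not found: {path}")
    return path
```

Users who want a custom card still pass their own path, while everyone else can upload without first locating a template.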

Wauplin · 2024-10-01T09:01:01Z

setup.py

@@ -82,6 +82,7 @@
    "typer==0.9.0",
    "colorlog",
    "opacus==1.5.2",
+    "huggingface-hub==0.23.4",


(nit) latest is 0.25.1

Suggested change

-    "huggingface-hub==0.23.4",
+    "huggingface-hub==0.25.1",

pranayasinghcsmpl added 2 commits August 14, 2024 16:55

added hf cli

56dacad

updated setup.py

32a206e

pranayasinghcsmpl requested a review from a team as a code owner August 14, 2024 12:03

sarthakpati requested changes Aug 14, 2024

setup.py (outdated)

sarthakpati and others added 8 commits August 19, 2024 14:25

Merge branch 'master' into hf_cli4

f8c3e6a

Merge branch 'master' into hf_cli4

4651320

added hf cli tests & documentation

5a8a7f1

Merge branch 'master' into hf_cli4

350906f

added colorlog

26bdd11

Merge branch 'hf_cli4' of https://github.com/pranayasinghcsmpl/GaNDLF …

178b5ab

…into hf_cli4

added colorlog

0e6a297

added colorlog

97585b8

sarthakpati mentioned this pull request Sep 6, 2024

Getting a weird (and random) error with the cryptography libary securefederatedai/openfl#1015

Closed

Merge branch 'master' into hf_cli4

99d91bd

Wauplin reviewed Sep 10, 2024

Update GANDLF/cli/huggingface_hub_handler.py

3fdf152

Co-authored-by: Lucain <lucainp@gmail.com>

sarthakpati reviewed Sep 10, 2024

GANDLF/cli/huggingface_hub_handler.py (outdated)

Update GANDLF/cli/huggingface_hub_handler.py

9c414d9

sarthakpati reviewed Sep 10, 2024

GANDLF/cli/huggingface_hub_handler.py (outdated)

sarthakpati reviewed Sep 10, 2024

GANDLF/cli/huggingface_hub_handler.py

sarthakpati added 2 commits September 10, 2024 11:45

Update GANDLF/cli/huggingface_hub_handler.py

2c70b01

Update GANDLF/cli/huggingface_hub_handler.py

57f22fc

sarthakpati reviewed Sep 10, 2024

GANDLF/cli/huggingface_hub_handler.py (outdated)

sarthakpati added 3 commits September 10, 2024 11:47

Update GANDLF/cli/huggingface_hub_handler.py

8ac070a

Update GANDLF/cli/huggingface_hub_handler.py

c4fc455

Update GANDLF/cli/huggingface_hub_handler.py

be4a3c9

Merge branch 'master' into hf_cli4

0d6b998

sarthakpati added 2 commits September 11, 2024 12:48

Merge branch 'master' into hf_cli4

5511c17

Merge branch 'master' into hf_cli4

4c3804f

pranayasinghcsmpl added 7 commits September 16, 2024 13:10

hf-template-added

f51d7a4

hf-template

8a7ad4c

hf-template

acd6bdf

resolved conflit

38984d0

resolved

912b7f8

resolved_2issue

a7f6335

resolved-lint

06564b5

sarthakpati reviewed Sep 19, 2024

testing/test_full.py (outdated)

sarthakpati and others added 7 commits September 19, 2024 14:44

Update testing/test_full.py

7ac6b94

huggingface_test updated

a955473

huggingface_test updated

1a9d6e9

Merge branch 'hf_cli4' of https://github.com/pranayasinghcsmpl/GaNDLF …

5e8a97b

…into hf_cli4

change coding style

a0c7e2d

Merge branch 'master' into hf_cli4

5e9374b

Merge branch 'master' into hf_cli4

8c788ee

sarthakpati added 3 commits September 30, 2024 18:27

Merge branch 'master' into hf_cli4

202230a

Merge branch 'master' into hf_cli4

d82e473

Merge branch 'master' into hf_cli4

3cadbfd

Wauplin approved these changes Oct 1, 2024
