Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] HF Hub Integration #171

Closed
wants to merge 5 commits into from

Conversation

natolambert
Copy link
Contributor

Working towards closing #169

Things to do (roughly):

  • Verify base functionality,
  • Colab example for loading / saving / visualizing models,
  • Upload pretrained models to hub from @luisenp.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 1, 2022
Copy link

@Wauplin Wauplin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @natolambert , really nice implementation ! :)
I left a comment about the log directory. I think it should be deleted from the Hub before uploading otherwise you can end up with a messy foler. Other than that, looks good 👍

Comment on lines +163 to +164
if repo_logdir.exists():
shutil.rmtree(repo_logdir)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here repo_logdir is a tmp dir (at least when called from package_to_hub.
So deleting it locally will not delete it on the Hub. A possibility is to use delete_folder to delete the logs folder on the Hub before uploading the new one.

)
print(msg)

repo_url = HfApi().create_repo(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) for consistency with upload_folder calls, I would import create_repo directly from huggingface_hub.
If not called with any parameter, using HfApi(). is not needed.

:param token
"""

repo_url = HfApi().create_repo(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) same as above about HfApi

@natolambert
Copy link
Contributor Author

WIP Colab to showcase this workflow.

Copy link
Contributor

@luisenp luisenp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @natolambert! I left a bunch of comments, but mostly pedantic stuff about formatting and docstrings. I'm not too familiar with HF Hub, so most of the rest LGTM. However, I noticed that package_hub only receives a Model, even though we also have algorithms that require a trained Agent (e.g., MBPO, and eventually Dreamer). This seems like a potential issue?

Also, is the expectation that should be able to download saved model/agents from the hub and run them on our example scripts?

from ..models import Model


def get_model_id(org_name: str, env_id: str, tag: str = None, algo_name: str = None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing type annotation for return value.

Construct a name for the model on the hub.

Args:
org_name (str): HF organization to upload the model to.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you line break the docstrings so that they are always less than 88 chars long? These seem longer that that, not sure. Do note that black won't edit comments.


Args:
org_name (str): HF organization to upload the model to.
env_id: environment name for the data (often from Gym registration).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

env_id (str)
tag (str, optional)
algo_name (str, optional)


# Step 2: Get data (JSON containing infos) and read it
# TODO update this for cfg / yaml
with open(Path.joinpath(local_path, unzipped_model_folder, "data")) as json_file:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it not possible to do local_path / unzipped_model_folder / "data"?

with open(Path.joinpath(local_path, unzipped_model_folder, "data")) as json_file:
data = json.load(json_file)
# Add system_info elements to our JSON
# data["system_info"] = stable_baselines3.get_system_info(print_info=False)[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete commented out line.

This method does the complete pipeline:
- It generates the model card TODO(NOL)
- It pushes everything to the hub
:param model: trained model
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing type info docstrings argument.

commit_message: str,
is_deterministic: bool = True,
token: Optional[str] = None,
logs=None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing type annotations for logs, mean_reward, and std_reward (e.g., mean_reward: Optional[float] = None).

# Step 1: Save the model
model.save(tmpdirname / model_name)

# Retrieve VecNormalize wrapper if it exists
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this commented out section using during debugging? Should we remove?

token: Optional[str] = None,
):
"""
Upload a model to Hugging Face Hub (this is for updating an existing repo!).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing type info and periods in docstring.

def load_from_hub(repo_id: str, filename: str) -> str:
"""
Download a model from Hugging Face Hub.
:param repo_id: id of the model repository from the Hugging Face Hub
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing type info and periods in docstring.

@natolambert natolambert marked this pull request as draft February 12, 2023 20:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants