# Sharing Models and Tokenizers on HuggingFace Hub

Learning how to save and share trained models on HuggingFace Hub

| Date | User | Change Type | Remarks |  
| ---- | ---- | ----------- | ------- |
| 08/12/2025   | Martin | Created   | Notebook created for model sharing on HF Hub | 
| 09/12/2025   | Martin | Update   | Completed Ch4. Creating repos, uploading model files, and model cards | 

# Content

* [Introduction](#introduction)
* [Sharing Models](#sharing-models)
* [Saving Model Files](#saving-model-files)
* [Model Cards](#model-cards)

In [1]:
%load_ext watermark

# Introduction

- Each model is hosted as a Git repository
- Sharing a model on the Hub automatically deploys a hosted Inference API for it, i.e. anyone in the community can test and use it

In [None]:
# Importing using pipeline - Ensure the right pipeline task as stated on model card
from transformers import pipeline

camembert_fill_mask = pipeline("fill-mask", model="camembert-base")
results = camembert_fill_mask("Le camembert est <mask> :)")

In [None]:
# Importing using the model-specific classes
from transformers import CamembertTokenizer, CamembertForMaskedLM

tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
model = CamembertForMaskedLM.from_pretrained("camembert-base")

In [None]:
# Import using Auto* class - Recommended
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("camembert-base")
model = AutoModelForMaskedLM.from_pretrained("camembert-base")

---

# Sharing Models

- With `push_to_hub=True` in `TrainingArguments`, the repository name defaults to the name of the output directory
  - Can be changed with the `hub_model_id` argument
- Upload frequency follows the `save_strategy` argument (the model is pushed each time it is saved)
- Call `trainer.push_to_hub()` at the end to upload the final version of the model
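
The naming rule in the first two bullets can be sketched in plain Python. `resolve_repo_name` is a hypothetical helper, not part of `transformers`; it assumes the repository name is the last path component of the output directory unless `hub_model_id` is set.

```python
from pathlib import Path
from typing import Optional


def resolve_repo_name(output_dir: str, hub_model_id: Optional[str] = None) -> str:
    """Hypothetical helper mirroring the naming rule described above."""
    if hub_model_id is not None:
        # An explicit hub_model_id always wins over the directory name.
        return hub_model_id
    # Otherwise fall back to the last component of the output directory.
    return Path(output_dir).absolute().name


print(resolve_repo_name("bert-finetuned-mrpc"))
print(resolve_repo_name("outputs/run-1", hub_model_id="my-org/bert-finetuned-mrpc"))
```

The second call shows why `hub_model_id` is useful: the local directory can keep a generic name while the Hub repository gets a descriptive one.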

<u>3 methods of sharing</u>

1. `push_to_hub` API
2. `huggingface_hub` Python library
3. Web interface

## 1. Push to hub

- Requires an authentication token to identify the user

In [1]:
# Login to HF hub
from huggingface_hub import notebook_login

notebook_login()


In [2]:
from transformers import TrainingArguments

training_args = TrainingArguments(
  "bert-finetuned-mrpc",
  save_strategy="epoch",
  push_to_hub=True # >>: Automatically pushes to a new repository in the Hub
)

In [None]:
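# Push the model to a repo under your own namespace
model.push_to_hub("dummy-model")
# To push to an organization, put it in the repo id (the older `organization` and `use_auth_token` arguments are deprecated)
tokenizer.push_to_hub("huggingface/dummy-model", token="<TOKEN>")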
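
Since a model and its tokenizer usually belong in the same repository, the two calls can be wrapped in one step. The sketch below is only an illustration: `push_all` and `FakeArtifact` are hypothetical names, and the helper assumes each object exposes `push_to_hub(repo_id, token=...)`. The fake class stands in for real objects so the cell runs offline.

```python
from typing import Any, List, Optional


def push_all(repo_id: str, *artifacts: Any, token: Optional[str] = None) -> List[Any]:
    """Push every artifact to the same repo (hypothetical helper).

    Each artifact is assumed to expose `push_to_hub(repo_id, token=...)`.
    """
    return [artifact.push_to_hub(repo_id, token=token) for artifact in artifacts]


class FakeArtifact:
    """Offline stand-in for a model or tokenizer; nothing is uploaded."""

    def __init__(self, name: str) -> None:
        self.name = name

    def push_to_hub(self, repo_id: str, token: Optional[str] = None) -> str:
        return f"{self.name} -> {repo_id}"


print(push_all("my-org/dummy-model", FakeArtifact("model"), FakeArtifact("tokenizer")))
```

With real objects and a logged-in session, the call would look like `push_all("dummy-model", model, tokenizer)`.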

## 2. huggingface_hub

Python library that offers tools to interact with the Hub.

- Also requires an API token saved in the local cache (e.g. via `notebook_login()`)
- Retrieves information about repositories on the Hub and manages them

In [None]:
from huggingface_hub import create_repo

create_repo(
  "dummy-model",  # use "<organization>/dummy-model" to create it under an organization
  # private=False,
  # token="<HF User Token>",
  # repo_type="dataset",  # or "space"; defaults to a model repo
)

RepoUrl('https://huggingface.co/Minimartzz/dummy-model', endpoint='https://huggingface.co', repo_type='model', repo_id='Minimartzz/dummy-model')

In [12]:
# List of common tasks performable through the API
from huggingface_hub import (
  # User management
  login,
  logout,
  whoami,

  # Repository creation and management
  create_repo,
  delete_repo,
  update_repo_visibility,

  # And some methods to retrieve/change information about the content
  list_models,
  list_datasets,
  # list_metrics, # >>: Deprecated
  list_repo_files,
  upload_file,
  delete_file,
)

## 3. Web interface

Most features here are also available on the web interface. Creating a repo, updating README, adding model cards, etc.

---

# Saving Model Files

The system for managing files on the Hugging Face Hub is based on __git__ for regular files and __git-lfs__ (Git Large File Storage) for larger files.

1. `upload_file` - Does not require git or git-lfs, but has a size limit of 5GB
2. `Repository` class - Abstracts away the git commands into a Pythonic class
3. `git` based - Use the git CLI
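
A rough way to choose between these options is to look at the file size. The sketch below is an assumption-laden illustration: `pick_upload_method` is a hypothetical helper, and the 5GB figure from the list above is interpreted here as `5 * 1024**3` bytes.

```python
# Limit quoted in the list above, interpreted as 5 GiB (an assumption).
UPLOAD_FILE_LIMIT_BYTES = 5 * 1024**3


def pick_upload_method(size_bytes: int) -> str:
    """Hypothetical helper: suggest an upload route from a file size in bytes."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes <= UPLOAD_FILE_LIMIT_BYTES:
        # Small enough for the HTTP-based upload_file route.
        return "upload_file"
    # Larger files go through a git + git-lfs workflow.
    return "git-lfs"


print(pick_upload_method(440 * 1024**2))  # a checkpoint of a few hundred MB
print(pick_upload_method(7 * 1024**3))    # a multi-GB checkpoint
```

For a real file, the size can be read with `os.path.getsize(path)` before calling the helper.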

In [None]:
# upload_file function
from huggingface_hub import upload_file

upload_file(
  path_or_fileobj='<path_to_file>/config.json',  # passed by keyword, as recent versions expect
  path_in_repo='config.json',
  repo_id="<namespace>/dummy-model"
)
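
A model usually consists of several files, so the single-file call above is often repeated for a whole folder. The following is a minimal sketch of that pattern: `upload_folder_files` is a hypothetical helper, and the uploader is passed in as `upload_fn` so the demo can run offline with a recording stub instead of a real upload.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def upload_folder_files(folder: str, repo_id: str, upload_fn: Callable[..., object]) -> List[str]:
    """Call `upload_fn` once per file in `folder`, keeping relative paths.

    Hypothetical helper: `upload_fn` is expected to accept the keyword
    arguments `path_or_fileobj`, `path_in_repo` and `repo_id`.
    """
    root = Path(folder)
    uploaded = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # Use forward slashes so nested files keep their layout in the repo.
        path_in_repo = path.relative_to(root).as_posix()
        upload_fn(path_or_fileobj=str(path), path_in_repo=path_in_repo, repo_id=repo_id)
        uploaded.append(path_in_repo)
    return uploaded


# Offline demo: the lambda only records the arguments it receives.
calls = []
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "config.json").write_text("{}")
    Path(tmp, "tokenizer").mkdir()
    Path(tmp, "tokenizer", "vocab.txt").write_text("[PAD]\n")
    print(upload_folder_files(tmp, "<namespace>/dummy-model", lambda **kw: calls.append(kw)))
```

When logged in, `huggingface_hub.upload_file` could be passed as `upload_fn`. The library also ships its own `upload_folder` function; this sketch only illustrates the per-file pattern.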

In [None]:
# Repository class
from huggingface_hub import Repository

repo = Repository("<path_to_dummy_folder>", clone_from="<namespace>/dummy-model")

# Available git helpers (listed for reference, not meant to be run in sequence)
repo.git_pull()
repo.git_add()
repo.git_commit()
repo.git_push()
repo.git_tag()

In [None]:
# Example of a push
repo.git_add()
repo.git_commit("Add model and tokenizer files")
repo.git_push()

---

# Model Cards

The central definition of a model, created to ensure reusability and reproducibility of results.

- Document training and evaluation process
- Provide sufficient information about the data, including preprocessing and postprocessing
- State any limitations, biases, and contexts the model cannot cover

<u>Recommended Sections</u>

- High-level overview of model use
- Model description
- Intended uses & limitations
- How to use
- Limitations and bias
- Training data
- Training procedure
- Evaluation results
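
The recommended sections can be turned into a starting template. The sketch below is one possible way to do it: `model_card_skeleton` and the `TODO` placeholders are hypothetical, and the section titles are copied from the list above.

```python
# Section titles taken from the recommended list above.
SECTIONS = [
    "Model description",
    "Intended uses & limitations",
    "How to use",
    "Limitations and bias",
    "Training data",
    "Training procedure",
    "Evaluation results",
]


def model_card_skeleton(model_name: str, overview: str) -> str:
    """Hypothetical helper: build a markdown skeleton for a model card."""
    # Title and high-level overview come first, then one heading per section.
    lines = [f"# {model_name}", "", overview, ""]
    for section in SECTIONS:
        lines += [f"## {section}", "", "TODO", ""]
    return "\n".join(lines)


card = model_card_skeleton("dummy-model", "One-paragraph overview of what the model is for.")
print(card)
```

The resulting text would be saved as `README.md` in the model repository, which is the file the Hub displays as the model card.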

## Model description

- Architecture
- Version
- Original paper (if based on one)
- Author
- Copyright
- General info about training procedure, parameters and disclaimers

## Intended uses & limitations

- Languages, fields and domains it's applicable to
- Areas that are out-of-scope for the model

## How to use

- Usage of the model
- Tokenizer
- Other code

## Training data

- Which dataset(s) the model was trained on + description

## Training procedure

- Relevant aspects of training that are useful from a reproducibility perspective
- Preprocessing and postprocessing

## Variable and metrics

- Metrics you use for evaluation
- Metrics should be based on the intended users and use cases

## Evaluation results

- How well the model performs on the evaluation dataset
- Provide decision threshold (if applicable)

In [None]:
%watermark