# Creating checkpoints on the Hugging Face Hub

This short notebook explains how to create a model checkpoint on the [Hugging Face Hub](https://huggingface.co/docs/hub/repositories).

## Imports

In [1]:
import os

In [2]:
import numpy as np
import torch
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from torch import nn

In [3]:
from skorch import NeuralNetClassifier
from skorch.callbacks import TrainEndCheckpoint
from skorch.hf import HfHubWriter

In [4]:
from huggingface_hub import HfApi, create_repo

If not installed already, please install the [Hugging Face Hub](https://huggingface.co/docs/huggingface_hub/index) library:

`$ python -m pip install huggingface_hub`

You also need `skorch>=0.12`, or skorch installed from the master branch on GitHub.

<table align="left"><td>
<a target="_blank" href="https://colab.research.google.com/github/skorch-dev/skorch/blob/master/notebooks/Hugging_Face_Checkpoint.ipynb">
    <img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>  
</td><td>
<a target="_blank" href="https://github.com/skorch-dev/skorch/blob/master/notebooks/Hugging_Face_Checkpoint.ipynb"><img width=32px src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a></td></table>

In [5]:
! [ ! -z "$COLAB_GPU" ] && pip install torch "skorch>=0.12" huggingface_hub
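To confirm that the installed packages satisfy the requirement above, a small check along the following lines could be used. This is only a sketch: the helper names `release_tuple`, `meets_minimum`, and `installed_version` are made up for this example, and the version parsing is deliberately simplistic (it only looks at the leading numeric components).

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(version_string):
    """Turn '0.12.1' or '0.13.dev0' into a tuple of leading integers."""
    parts = []
    for piece in version_string.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings component-wise, padding with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(package):
    """Return the installed version string, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Example usage (hypothetical):
# found = installed_version("skorch")
# print(found is not None and meets_minimum(found, "0.12"))
```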

## Settings

In [6]:
# set the token as an environment variable called HF_TOKEN, e.g. `HF_TOKEN=hf_...`
# the token can be found at: https://huggingface.co/settings/tokens
TOKEN = os.environ['HF_TOKEN']
# choose name for the whole model and for the model weights
# typically, you only need one of the two, we use both for demonstration purposes
MODEL_NAME = 'skorch-model.pkl'
WEIGHTS_NAME = 'weights.pt'
# choose a repo name within your user account or organization
REPO_NAME = 'BenjaminB/test-skorch'
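If `HF_TOKEN` is not set, the lookup above fails with a bare `KeyError`. A slightly friendlier variant might look like the sketch below; `read_hf_token` is a hypothetical helper introduced for this example, not part of skorch or `huggingface_hub`.

```python
import os


def read_hf_token(env_var="HF_TOKEN"):
    """Return the token stored in ``env_var`` or raise a descriptive error."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(
            f"Environment variable {env_var} is not set. Create a token at "
            "https://huggingface.co/settings/tokens and export it first."
        )
    return token


# Example usage (hypothetical):
# TOKEN = read_hf_token()
```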

In [7]:
torch.manual_seed(0)
np.random.seed(0)

## Create data

We use a toy dataset for this demo.

In [8]:
X, y = make_classification(10000, 20, n_informative=10, random_state=0)
X, y = X.astype(np.float32), y.astype(np.int64)

In [9]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

## Define model

### The module

In [10]:
class ClassifierModule(nn.Module):
    def __init__(
            self,
            num_units=10,
            nonlin=nn.ReLU(),
            dropout=0.5,
    ):
        super().__init__()
        self.num_units = num_units
        self.nonlin = nonlin

        # two hidden layers with dropout after the first one
        self.dense0 = nn.Linear(20, num_units)
        self.dropout = nn.Dropout(dropout)
        self.dense1 = nn.Linear(num_units, num_units)
        # two output classes, converted to probabilities by the softmax
        self.output = nn.Linear(num_units, 2)
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, X, **kwargs):
        X = self.nonlin(self.dense0(X))
        X = self.dropout(X)
        X = self.nonlin(self.dense1(X))
        X = self.softmax(self.output(X))
        return X

### Create a repository on Hugging Face Hub

If the repository does not exist yet, `create_repo` creates it. Because we pass `exist_ok=True`, the call also succeeds when the repository already exists:

In [11]:
skorch_repo = create_repo(
    REPO_NAME,
    private=True,  # set to False if it should be public
    token=TOKEN,
    exist_ok=True,
)

In [12]:
skorch_repo

'https://huggingface.co/BenjaminB/test-skorch'

### Create a `HfHubWriter` instance to use with the `TrainEndCheckpoint` callback

The key ingredient for saving models on the Hub is `skorch.hf.HfHubWriter`. This writer can be passed instead of a filename to `skorch.callbacks.TrainEndCheckpoint` (or `skorch.callbacks.Checkpoint`, but more on that later). You can therefore keep using your existing checkpoint callbacks; the only difference is that models are stored on the Hugging Face Hub instead of locally.

As a first step, we need to create a `HfApi` instance, which is used by the `HfHubWriter` to perform the upload.

In [13]:
hf_api = HfApi()

Then, we create a `hub_pickle_writer`, which is used by the checkpoint callback to write the whole skorch model as a pickle file to the indicated repository. We indicate the file path, repository name, and the Hugging Face token. Optionally, we can also set `verbose=1` to print a message when a file has been uploaded.

In [14]:
hub_pickle_writer = HfHubWriter(
    hf_api,
    path_in_repo=MODEL_NAME,
    repo_id=REPO_NAME,
    token=TOKEN,
    verbose=1,
)
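To illustrate why a writer object can stand in for a filename, here is a toy sketch of a file-like object that collects the bytes written to it and "uploads" them on `flush`. This is not skorch's implementation: `InMemoryWriter` and its `uploaded` dict (which stands in for the remote repository) are hypothetical names used only for this illustration.

```python
import io
import pickle


class InMemoryWriter:
    """Toy file-like object: buffers writes and 'uploads' them on flush."""

    def __init__(self, path_in_repo):
        self.path_in_repo = path_in_repo
        self._buffer = io.BytesIO()
        self.uploaded = {}  # stands in for the remote repository

    def write(self, data):
        # pickle.dump only needs a write() method that accepts bytes
        return self._buffer.write(data)

    def flush(self):
        content = self._buffer.getvalue()
        if content:  # nothing to upload if nothing was written
            self.uploaded[self.path_in_repo] = content
            self._buffer = io.BytesIO()


writer = InMemoryWriter("skorch-model.pkl")
pickle.dump({"lr": 0.1, "max_epochs": 20}, writer)
writer.flush()

restored = pickle.loads(writer.uploaded["skorch-model.pkl"])
print(restored)
```

Any code that only calls `write` and `flush` on its target can use such an object in place of an open file, which is the general idea behind passing a writer to the checkpoint callback.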

Instead of writing the whole skorch model to the Hub, we can also decide to only write specific components, e.g. the `module`. This saves the `state_dict` of the module to the Hub using `torch.save` under the hood.

In [15]:
hub_params_writer = HfHubWriter(
    hf_api,
    path_in_repo=WEIGHTS_NAME,
    repo_id=REPO_NAME,
    token=TOKEN,
    verbose=1,
)

The other attributes (optimizer, criterion, training history) are not saved for this demo.

In [16]:
checkpoint = TrainEndCheckpoint(
    f_pickle=hub_pickle_writer,
    f_params=hub_params_writer,
    f_optimizer=None,
    f_criterion=None,
    f_history=None,
)

In [17]:
net = NeuralNetClassifier(
    ClassifierModule,
    max_epochs=20,
    lr=0.1,
    device='cpu',
    iterator_train__shuffle=True,
    callbacks=[checkpoint],
)

In [18]:
net.fit(X_train, y_train)

  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        0.6513       0.7887        0.5560  0.1043
      2        0.5736       0.8527        0.4521  0.0820
      3        0.5219       0.8633        0.4045  0.0806
      4        0.4940       0.8820        0.3720  0.0853
      5        0.4728       0.8787        0.3509  0.0808
      6        0.4558       0.8867        0.3443  0.1058
      7        0.4378       0.8907        0.3293  0.1181
      8        0.4335       0.8993        0.3242  0.1283
      9        0.4289       0.9053        0.3084  0.1084
     10        0.4070       0.9067        0.3053  0.1229
     11        0.4092       0.9140        ...

<class 'skorch.classifier.NeuralNetClassifier'>[initialized](
  module_=ClassifierModule(
    (nonlin): ReLU()
    (dense0): Linear(in_features=20, out_features=10, bias=True)
    (dropout): Dropout(p=0.5, inplace=False)
    (dense1): Linear(in_features=10, out_features=10, bias=True)
    (output): Linear(in_features=10, out_features=2, bias=True)
    (softmax): Softmax(dim=-1)
  ),
)

As you can see, both the weights of the PyTorch module and the whole skorch model were saved on the Hub. Visit the printed URLs to see them there.

As a next step, think about adding a [Model Card](https://huggingface.co/docs/hub/models-cards) to your repository to provide further information about the model.

<div class="alert alert-block alert-info">
    <b>Info: Using the HfHubWriter with Checkpoint:</b><br>


Right now, we use `TrainEndCheckpoint`, which uploads the model only once, at the end of training. Instead, we could use `Checkpoint`, which uploads the model each time that the monitored metric improves. You should note, however, that at the moment, the upload is _synchronous_, i.e. we wait for the upload to finish. So if uploading the model takes a long time compared to training the model, your training process could be slowed down considerably, depending on how often the model improves.

If you still decide to use `Checkpoint`, you might want to keep a version of each uploaded file, instead of having the latest one overwrite the previous one. This is possible by choosing a templated model name, e.g. `'skorch-model-{}.pkl'`. This way, the first upload creates the file `'skorch-model-0.pkl'`, the second one creates the file `'skorch-model-1.pkl'`, etc.
</div>
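The templated name follows Python's ordinary `str.format` placeholder syntax. The sketch below only shows how such a template expands for an upload counter starting at 0, as described in the info box; how the counter is tracked internally is not shown here.

```python
# Template with a single positional placeholder for the upload counter
template = "skorch-model-{}.pkl"

# Names that the first three uploads would produce
names = [template.format(i) for i in range(3)]
print(names)
```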

## Loading

In [19]:
import pickle
from huggingface_hub import hf_hub_download
from sklearn.metrics import accuracy_score

### Loading the whole model

The skorch model is just a normal pickle file and can be loaded like this:

In [20]:
hub_pickle_writer.latest_url_

'https://huggingface.co/BenjaminB/test-skorch/blob/main/skorch-model.pkl'

In [21]:
# recent versions of huggingface_hub expect `token` instead of the deprecated `use_auth_token`
path = hf_hub_download(REPO_NAME, MODEL_NAME, token=TOKEN)


In [22]:
with open(path, 'rb') as f:
    net_loaded = pickle.load(f)

In [23]:
accuracy_score(y, net_loaded.predict(X))

0.8822

### Loading the model weights

The model weights are stored as a PyTorch `state_dict`.

In [24]:
hub_params_writer.latest_url_

'https://huggingface.co/BenjaminB/test-skorch/blob/main/weights.pt'

In [25]:
# recent versions of huggingface_hub expect `token` instead of the deprecated `use_auth_token`
path = hf_hub_download(REPO_NAME, WEIGHTS_NAME, token=TOKEN)

In [26]:
with open(path, 'rb') as f:
    weights_loaded = torch.load(f)

In [27]:
for key, val in weights_loaded.items():
    print(f"Parameter name '{key}' and shape {val.shape}")

Parameter name 'dense0.weight' and shape torch.Size([10, 20])
Parameter name 'dense0.bias' and shape torch.Size([10])
Parameter name 'dense1.weight' and shape torch.Size([10, 10])
Parameter name 'dense1.bias' and shape torch.Size([10])
Parameter name 'output.weight' and shape torch.Size([2, 10])
Parameter name 'output.bias' and shape torch.Size([2])


Typically, when you store the whole skorch model, you don't need to store the weights separately, as they are already part of the whole model:

In [28]:
for key, val in net_loaded.module_.state_dict().items():
    print(f"Parameter name '{key}' and shape {val.shape}")

Parameter name 'dense0.weight' and shape torch.Size([10, 20])
Parameter name 'dense0.bias' and shape torch.Size([10])
Parameter name 'dense1.weight' and shape torch.Size([10, 10])
Parameter name 'dense1.bias' and shape torch.Size([10])
Parameter name 'output.weight' and shape torch.Size([2, 10])
Parameter name 'output.bias' and shape torch.Size([2])


However, there can be situations where you don't need the whole skorch model, in which case you can store only the model weights.
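When only the weights are stored, the module has to be re-created before the weights can be used. The following self-contained sketch shows the general PyTorch pattern with a reduced stand-in module; `TinyModule` is hypothetical, has no dropout, and uses an in-memory buffer in place of the file downloaded from the Hub.

```python
import io

import torch
from torch import nn


class TinyModule(nn.Module):
    """Reduced stand-in for the classifier module (assumption: no dropout)."""

    def __init__(self, num_units=10):
        super().__init__()
        self.dense0 = nn.Linear(20, num_units)
        self.output = nn.Linear(num_units, 2)

    def forward(self, X):
        X = torch.relu(self.dense0(X))
        return torch.softmax(self.output(X), dim=-1)


# Save the state_dict of a "trained" module; the buffer stands in for weights.pt
trained = TinyModule()
buffer = io.BytesIO()
torch.save(trained.state_dict(), buffer)
buffer.seek(0)

# Re-create the module with the same architecture and load the stored weights
fresh = TinyModule()
fresh.load_state_dict(torch.load(buffer))

# Both modules should now produce identical predictions
X = torch.randn(4, 20)
with torch.no_grad():
    same = torch.allclose(trained(X), fresh(X))
print(same)
```

With the objects from this notebook, the analogous step would presumably be to create a new `NeuralNetClassifier(ClassifierModule)`, call `initialize()` on it, and then call `load_state_dict(weights_loaded)` on its `module_`.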