<a href="https://colab.research.google.com/github/mihaimaruseac/model-transparency/blob/main/Model_signing_with_%60model_signing%60_v1_0.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Model signing with `model_signing`

This notebook showcases the `model_signing` library, which can be used to sign and verify the integrity of ML models, regardless of size or format.

We provide both an API and a CLI. The API can be used to integrate the library with training pipelines, inference pipelines, model hub libraries, and ML frameworks. The CLI can be used for batch scripting or for signing and validating the integrity of models on a case-by-case basis.

Use the following table of contents to navigate through the notebook, or run the entire notebook at once:

>[Model signing with model_signing](#scrollTo=1h_VBBxh2Vpx)

>>[Setup](#scrollTo=47cutIDjyu6U)

>>>[Obtaining some models](#scrollTo=_gEUv3j_zIqy)

>>[Model signing and verification using the CLI](#scrollTo=y2G3rDDv7T7e)

>>>[Signing and verification with Sigstore](#scrollTo=k1mS5MdJDkRk)

>>>[Signing and verification using key-based cryptography](#scrollTo=WWOqJzLpDv36)

>>>[When integrity is compromised](#scrollTo=DSozctBbFZ6E)

>>[Powerful integrations via the API](#scrollTo=0Yzsh29JHAEt)

>>>[Signing and verification for a single model](#scrollTo=Jx15Q-yEHvay)

>>>[Signing and verification with an explicit configuration](#scrollTo=85_wuHMXNjW-)

>>>[Signing multiple models with the same configuration](#scrollTo=QDEOTIUxRkvq)



## Setup

We begin with a little setup for the Colab environment.

First, we need to install the `jq` utility which we can use later to inspect the signature.

In [1]:
!apt-get install jq > /dev/null

Next, we install the `model_signing` library and its Python dependencies:

In [2]:
!pip install model_signing

Collecting model_signing
  Downloading model_signing-1.1.1-py3-none-any.whl.metadata (17 kB)
Collecting asn1crypto (from model_signing)
  Downloading asn1crypto-1.5.1-py2.py3-none-any.whl.metadata (13 kB)
Collecting blake3 (from model_signing)
  Downloading blake3-1.0.7-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (217 bytes)
Collecting in-toto-attestation (from model_signing)
  Downloading in_toto_attestation-0.9.3-py3-none-any.whl.metadata (2.6 kB)
Collecting sigstore-models>=0.0.5 (from model_signing)
  Downloading sigstore_models-0.0.5-py3-none-any.whl.metadata (1.3 kB)
Collecting sigstore>=4.0 (from model_signing)
  Downloading sigstore-4.0.0-py3-none-any.whl.metadata (16 kB)
Collecting id>=1.1.0 (from sigstore>=4.0->model_signing)
  Downloading id-1.5.0-py3-none-any.whl.metadata (5.2 kB)
Collecting rfc8785~=0.1.2 (from sigstore>=4.0->model_signing)
  Downloading rfc8785-0.1.4-py3-none-any.whl.metadata (3.4 kB)
Collecting rfc3161-client<1.1.0,>=1.0.3 (from s

### Obtaining some models

To showcase the library, we will download a few models that we can use throughout the demo.

First, to allow re-running the notebook without restarting the kernel, we delete the models in case they have already been downloaded.

In [3]:
!rm -rf bert-base-uncased

In [4]:
!rm -rf finbert

In [5]:
!rm -rf resnet-50

In [6]:
!rm -rf vision-perceiver-learned

Next, we obtain the models from the internet.

In [7]:
!git clone --depth=1 "https://huggingface.co/bert-base-uncased"

Cloning into 'bert-base-uncased'...
remote: Enumerating objects: 24, done.
remote: Counting objects: 100% (24/24), done.
remote: Compressing objects: 100% (20/20), done.
remote: Total 24 (delta 0), reused 24 (delta 0), pack-reused 0 (from 0)
Unpacking objects: 100% (24/24), 324.36 KiB | 3.21 MiB/s, done.
Filtering content: 100% (7/7), 3.21 GiB | 46.94 MiB/s, done.


In [8]:
!git clone --depth=1 "https://huggingface.co/ProsusAI/finbert"

Cloning into 'finbert'...
remote: Enumerating objects: 11, done.
remote: Counting objects: 100% (11/11), done.
remote: Compressing objects: 100% (11/11), done.
remote: Total 11 (delta 0), reused 10 (delta 0), pack-reused 0 (from 0)
Unpacking objects: 100% (11/11), 110.48 KiB | 6.90 MiB/s, done.
Filtering content: 100% (3/3), 1.22 GiB | 38.97 MiB/s, done.


In [9]:
!git clone --depth=1 "https://huggingface.co/microsoft/resnet-50"

Cloning into 'resnet-50'...
remote: Enumerating objects: 10, done.
remote: Counting objects: 100% (10/10), done.
remote: Compressing objects: 100% (10/10), done.
remote: Total 10 (delta 0), reused 7 (delta 0), pack-reused 0 (from 0)
Unpacking objects: 100% (10/10), 26.25 KiB | 5.25 MiB/s, done.
Filtering content: 100% (4/4), 391.25 MiB | 65.04 MiB/s, done.


In [10]:
!git clone --depth=1 "https://huggingface.co/deepmind/vision-perceiver-learned"

Cloning into 'vision-perceiver-learned'...
remote: Enumerating objects: 7, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 7 (delta 0), reused 6 (delta 0), pack-reused 0 (from 0)
Unpacking objects: 100% (7/7), 26.88 KiB | 6.72 MiB/s, done.


Since we used `git clone` to download the models, each one also contains the `.git` directory of its repository. This directory should not be covered by the signature, so we can delete it.

In [11]:
!rm -rf bert-base-uncased/.git

In [12]:
!rm -rf finbert/.git

In [13]:
!rm -rf resnet-50/.git

In [14]:
# NOTE: We don't remove the git directory from here! (no !rm -rf vision-perceiver-learned/.git)

Finally, we can compare the sizes of the downloaded models:

In [15]:
!du -sh bert-base-uncased/ finbert/ resnet-50/ vision-perceiver-learned/

3.3G	bert-base-uncased/
1.3G	finbert/
392M	resnet-50/
476M	vision-perceiver-learned/


## Model signing and verification using the CLI

We can now use the `model_signing` package to sign and verify the integrity of models. We start with the CLI in this section and cover the API in the next one.

### Signing and verification with Sigstore

Let's sign the `bert-base-uncased` model using the default arguments:

In [16]:
!model_signing sign bert-base-uncased

Key a687e5bf4fab82b0ee58d46e05c9535145a2c9afb458f43d42b45ca0fdce2a70 failed to verify targets
Go to the following link in a browser:

	https://oauth2.sigstore.dev/auth/auth?response_type=code&client_id=sigstore&client_secret=&scope=openid+email&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&code_challenge=DuQYD8ynfHyvVTULWbqTXaf9uw6z_7ZnROE7f9gs2FQ&code_challenge_method=S256&state=04001c44-64e2-4c8c-ac05-5a1447f23f6f&nonce=0608e473-6ca5-448c-822c-64234e372a55
Enter verification code: sqb3yw6bcrazibhvimyyw7jgw
Signing succeeded


During signing we were presented with an OIDC flow to obtain a token that represents the identity used during signing. The identity and the identity provider are arguments that we need to pass when we verify the signature.

In [17]:
identity = "mihaimaruseac@google.com" # @param {type:"string"}
oidc_provider = "https://github.com/login/oauth" # @param {type:"string"}

By default, the signature is in `model.sig`. First, we can look at its size:

In [18]:
!ls -l model.sig

-rw-r--r-- 1 root root 11345 Oct 10 18:00 model.sig


Next, before looking at the signature contents, let's verify the model. Here we have to pass the identity and the identity provider that were used during signing:

In [19]:
!model_signing verify bert-base-uncased --signature model.sig --identity "$identity" --identity_provider "$oidc_provider"

Key a687e5bf4fab82b0ee58d46e05c9535145a2c9afb458f43d42b45ca0fdce2a70 failed to verify targets
Verification succeeded
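
The signature file is JSON, so its layout can also be explored from Python before reaching for `jq`. The helper below is a minimal sketch: `outline` and `outline_signature` are hypothetical names, not part of `model_signing`, and the code only assumes that the file parses as JSON.

```python
import json
from pathlib import Path


def outline(value, depth=2, prefix=""):
    """Return lines naming the nested keys of a parsed JSON value."""
    lines = []
    if isinstance(value, dict) and depth > 0:
        for key, child in value.items():
            # Show each key with the JSON type of its value.
            lines.append(f"{prefix}{key}: {type(child).__name__}")
            lines.extend(outline(child, depth - 1, prefix + "  "))
    return lines


def outline_signature(path, depth=2):
    """Parse a signature file as JSON and outline its first `depth` levels."""
    return outline(json.loads(Path(path).read_text()), depth)


# Only print when the signature from the previous cells is present.
if Path("model.sig").exists():
    print("\n".join(outline_signature("model.sig")))
```

Raising `depth` shows more of the nesting; large values such as encoded payloads are never printed, only their types.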


### Signing and verification using key-based cryptography

The package also supports signing using traditional methods. We support signing with a private key (and verifying with the matching public key) or with a signing certificate. For this demo, let's try the example of a private key.

We support a family of elliptic-curve keys, so let's generate a private key:

In [20]:
!openssl ecparam -name prime256v1 -genkey -noout -out key

Then we extract the public key of the pair:

In [21]:
!openssl ec -in key -pubout > key.pub

read EC key
writing EC key


Now we can sign a different model using the keys. We also save the signature in a different file.

In [22]:
!model_signing sign key resnet-50 --private_key key --signature resnet.sig

Signing succeeded


Signing succeeded, and we can look at the signature file:

In [23]:
!ls -l resnet.sig

-rw-r--r-- 1 root root 2707 Oct 10 18:00 resnet.sig


Next, we can verify the integrity of the model, using the paired public key:

In [24]:
!model_signing verify key resnet-50 --signature resnet.sig --public_key key.pub

Verification succeeded


### When integrity is compromised

To conclude, let's look at the behavior when the integrity cannot be verified. For this demo, we will "mistakenly" use the signature for one model to verify the integrity of the other.

In [25]:
!model_signing verify resnet-50 --signature model.sig --identity "$identity" --identity_provider "$oidc_provider"

Key a687e5bf4fab82b0ee58d46e05c9535145a2c9afb458f43d42b45ca0fdce2a70 failed to verify targets
Verification failed with error: Signature mismatch: ["Extra files found in model 'resnet-50': preprocessor_config.json", "Missing files in model 'resnet-50': LICENSE, coreml/fill-mask/float32_model.mlpackage/Data/com.apple.CoreML/model.mlmodel, coreml/fill-mask/float32_model.mlpackage/Data/com.apple.CoreML/weights/weight.bin, coreml/fill-mask/float32_model.mlpackage/Manifest.json, model.onnx, rust_model.ot, tokenizer.json, tokenizer_config.json, vocab.txt", 'Hash mismatch for \'README.md\': Expected \'Digest(algorithm=\'sha256\', digest_value=b\'\\x91\\x87\\xb6\\x01\\x8e\\xa0\\x01\\r\\x88Nx\\xe0\\x98\\xe3(\\xfa\\xa1\\xb8\\x8b0\\x15p\\xd0\\xcc\\xe6\\x06\\xbb5\\xe4\\x06~\\x17\')\', Actual \'Digest(algorithm=\'sha256\', digest_value=b\'\\x04\\xd0a\\x88\\x07\\x08\\xf0h9\\x99\\x82\\xdf&\\x8d\\xcb\\xf2l"\\x18G\\xa3\\xfe\\\\ \\x05; \\x9a\\xf5\\x94\\xbd\\x04\')\'', "Hash mismatch for 'config.json': Ex

Here we see that verification failed because the manifest recorded in the signature does not match the model on disk: the error lists extra files, missing files, and hash mismatches. We will discuss what this means in a later section. For now, let's see what happens when the identity itself is wrong.

In [26]:
!model_signing verify bert-base-uncased --signature model.sig --identity "FAKE_IDENTITY" --identity_provider "$oidc_provider"

Key a687e5bf4fab82b0ee58d46e05c9535145a2c9afb458f43d42b45ca0fdce2a70 failed to verify targets
Verification failed with error: Certificate's SANs do not match FAKE_IDENTITY; actual SANs: {'mihaimaruseac@google.com'}


The error message clearly specifies that we have passed the wrong identity.

A similar error can occur when the identity provider is wrong:

In [27]:
!model_signing verify bert-base-uncased --signature model.sig --identity "$identity" --identity_provider "FAKE_PROVIDER"

Key a687e5bf4fab82b0ee58d46e05c9535145a2c9afb458f43d42b45ca0fdce2a70 failed to verify targets
Verification failed with error: Certificate's OIDCIssuer does not match (got 'https://github.com/login/oauth', expected 'FAKE_PROVIDER')


Finally, we can look at what happens when a signature generated by one method is verified with a different verification method:

In [28]:
!model_signing verify bert-base-uncased --signature resnet.sig --identity "$identity" --identity_provider "$oidc_provider"

Key a687e5bf4fab82b0ee58d46e05c9535145a2c9afb458f43d42b45ca0fdce2a70 failed to verify targets
Verification failed with error: expected certificate in bundle


The signature cannot be parsed correctly, so nothing about it can be verified.
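
The two identity failures shown earlier in this section both come down to comparing what the signing certificate records against what the verifier was told to expect. The following is a rough sketch of that comparison only, not the actual Sigstore or `model_signing` implementation; `check_signer` and its parameters are hypothetical.

```python
def check_signer(cert_identities, cert_issuer, expected_identity, expected_issuer):
    """Raise ValueError unless the certificate matches the expected signer.

    cert_identities: identities recorded in the certificate (hypothetical input).
    cert_issuer: OIDC issuer recorded in the certificate (hypothetical input).
    """
    if expected_identity not in cert_identities:
        raise ValueError(
            f"identity {expected_identity!r} not in {sorted(cert_identities)}"
        )
    if cert_issuer != expected_issuer:
        raise ValueError(f"issuer is {cert_issuer!r}, expected {expected_issuer!r}")


# A matching identity and issuer pass silently.
check_signer({"user@example.com"}, "https://issuer.example", "user@example.com",
             "https://issuer.example")
```

Both values must match for verification to continue, which is why the CLI asks for the identity and the identity provider together.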

## Powerful integrations via the API

For more powerful integrations, we provide an API that can be used from any library. First, let's do some imports:

In [29]:
import model_signing

The API is split into 3 main components:

- `model_signing.hashing`: responsible for generating a list of hashes for every component of the model. A component could be a file, a file shard, a tensor, etc., depending on the method used. We currently support only files and file shards. The result of hashing is a manifest, a listing of hashes for every object in the model.
- `model_signing.signing`: responsible for taking the manifest and generating a signature, based on a signing configuration. The signing configuration selects the signing method as well as its parameters.
- `model_signing.verifying`: responsible for taking a signature and verifying it. If the cryptographic parts of the signature can be validated, the verification layer returns an expanded manifest, which can then be compared against a manifest obtained from hashing the existing model. If the two manifests don't match, then the model integrity was compromised and the `model_signing` package detected that.

The first two of these components allow configuration but can also be used directly, with a default configuration. The exception is the verification component, where we always need to configure the verification method, since there are no sensible defaults that can be used.
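
To make the idea of a manifest concrete, here is a minimal standard-library sketch of file-by-file hashing and manifest comparison. It is not the library's implementation: `sha256_file`, `build_manifest`, and `compare_manifests` are hypothetical helpers, and the use of SHA-256 over whole files is an assumption made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so that large model files are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(model_dir):
    """Map each file path (relative to model_dir) to its SHA-256 hex digest."""
    root = Path(model_dir)
    return {
        path.relative_to(root).as_posix(): sha256_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(expected, actual):
    """Return a list of differences; an empty list means the manifests match."""
    problems = []
    extra = sorted(set(actual) - set(expected))
    missing = sorted(set(expected) - set(actual))
    if extra:
        problems.append("Extra files: " + ", ".join(extra))
    if missing:
        problems.append("Missing files: " + ", ".join(missing))
    for name in sorted(set(expected) & set(actual)):
        if expected[name] != actual[name]:
            problems.append(f"Hash mismatch for {name}")
    return problems
```

In this picture, `expected` plays the role of the manifest recovered from a verified signature and `actual` the manifest computed from the model on disk. Comparing the manifests of two different models produces the same three kinds of differences that appear in the failed CLI verification earlier.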

### Signing and verification for a single model

The default signing configuration hashes the model file by file and signs the manifest using Sigstore. Here, we sign the `finbert` model to generate the `finbert.sig` signature:

In [30]:
model_signing.signing.sign("finbert", "finbert.sig")

Go to the following link in a browser:

	https://oauth2.sigstore.dev/auth/auth?response_type=code&client_id=sigstore&client_secret=&scope=openid+email&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&code_challenge=FvFSw_KmXVezMBLTWcZC1RwcYctbyHwaATZrFQP9RyI&code_challenge_method=S256&state=6e5c7955-a48b-43dc-9752-233c6422163e&nonce=0c1e1ffb-8f31-47af-9fa5-fd1b3ac1f26a


Enter verification code: wwjgduj4imr6esduwz2yx4mja


This does the OIDC flow again, so we need to pass the same parameters during verification.

In [31]:
identity = "mihaimaruseac@google.com" # @param {type:"string"}
oidc_provider = "https://github.com/login/oauth" # @param {type:"string"}

For verification there is no default configuration, so we configure the Sigstore verifier explicitly:

In [32]:
model_signing.verifying.Config().use_sigstore_verifier(
    identity=identity,
    oidc_issuer=oidc_provider,
).verify("finbert", "finbert.sig")

Verification passed, as expected.

### Signing and verification with an explicit configuration

The API of `model_signing` allows us to explicitly set configurations for hashing, signing and verifying. In this section we look at how these can be used to control which paths are excluded during serialization, and to set up signing using private keys.

Of the 4 models we downloaded, we did not delete the `.git` directory of the `vision-perceiver-learned` model. Instead, we will use the configuration to explicitly exclude it. We can also exclude other files from the model:

In [33]:
!ls -a vision-perceiver-learned/

.   config.json  .gitattributes		   pytorch_model.bin
..  .git	 preprocessor_config.json  README.md


First, we set up the hashing configuration, to exclude Git related files and the `README.md` file:

In [34]:
hashing_config = model_signing.hashing.Config().set_ignored_paths(
    paths=["README.md"], ignore_git_paths=True
)

Next, we set up a signing configuration that uses the private key we generated above as well as the hashing configuration we just created.

In [35]:
signing_config = model_signing.signing.Config().use_elliptic_key_signer(
    private_key="key"
).set_hashing_config(hashing_config)

And now, we can use this configuration to sign the model:

In [36]:
signing_config.sign("vision-perceiver-learned", "vision-perceiver-learned.sig")

Similarly, we can construct the matching verification configuration and verify the signature:

In [37]:
verification_config = model_signing.verifying.Config().use_elliptic_key_verifier(
    public_key="key.pub"
).set_hashing_config(hashing_config)

verification_config.verify("vision-perceiver-learned", "vision-perceiver-learned.sig")

Verification passed.
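
To preview which files an ignore rule like the one in this section would leave in the manifest, the filtering can be approximated with the standard library. This is only a sketch under stated assumptions: `files_to_hash` and `GIT_NAMES` are hypothetical, and the exact set of Git-related paths that the library skips may differ from the one assumed here.

```python
from pathlib import Path

# Assumed set of Git-related top-level names; the library's list may differ.
GIT_NAMES = {".git", ".gitattributes", ".gitignore", ".github"}


def files_to_hash(model_dir, ignored=(), ignore_git=True):
    """List files under model_dir (relative paths) after applying ignore rules."""
    root = Path(model_dir)
    ignored = {Path(p) for p in ignored}
    kept = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        # Skip Git metadata found at the top level of the model directory.
        if ignore_git and rel.parts[0] in GIT_NAMES:
            continue
        # Skip a path if it, or any directory containing it, is ignored.
        if rel in ignored or any(parent in ignored for parent in rel.parents):
            continue
        kept.append(rel.as_posix())
    return kept


# Example call mirroring the configuration used in this section.
if Path("vision-perceiver-learned").is_dir():
    print(files_to_hash("vision-perceiver-learned", ignored=["README.md"]))
```

Signing and verification must agree on the ignore rules. Otherwise the verifier would hash files that the signer skipped and report them as extra.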

### Signing multiple models with the same configuration

The power of explicit configurations is that a pipeline could set the configurations for hashing and signing once and then sign multiple models:

In [38]:
config = model_signing.signing.Config().set_hashing_config(
    model_signing.hashing.Config().use_shard_serialization()
).use_elliptic_key_signer(private_key="key")

Here we set up a hashing configuration to use file shards, and a signing configuration to sign using private keys. We can now use this configuration to sign all the models in a loop:

In [39]:
all_models = ["bert-base-uncased", "finbert", "resnet-50", "vision-perceiver-learned"]
for model in all_models:
    config.sign(model, f"{model}_sharded.sig")
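
Shard serialization can be pictured as hashing fixed-size byte ranges of each file instead of the file as a whole. The sketch below illustrates that idea with the standard library only. `hash_shards` is a hypothetical helper, and both the default shard size and the use of SHA-256 are assumptions, not the library's actual values.

```python
import hashlib


def hash_shards(path, shard_size=1_000_000):
    """Yield (start, end, sha256 hex digest) for consecutive shards of a file."""
    offset = 0
    with open(path, "rb") as f:
        while True:
            data = f.read(shard_size)
            if not data:
                break
            # Each shard is identified by its half-open byte range [start, end).
            yield offset, offset + len(data), hashlib.sha256(data).hexdigest()
            offset += len(data)
```

Because each shard is hashed independently, the work for a single large file can in principle be spread across several workers, and a mismatch can be narrowed down to a byte range rather than a whole file.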