Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Secure persistence: Idea: Add fingerprint #144

Open
BenjaminBossan opened this issue Sep 19, 2022 · 11 comments
Open

Secure persistence: Idea: Add fingerprint #144

BenjaminBossan opened this issue Sep 19, 2022 · 11 comments
Labels
persistence Secure persistence feature

Comments

@BenjaminBossan
Copy link
Collaborator

Maybe it's possible to add a cryptographically secure fingerprint of the whole saved object. Then, during loading, we can compare whether the loaded object's fingerprint is identical. This could prevent certain types of attacks where the object that's being loaded is not identical to the one being advertised.

@BenjaminBossan BenjaminBossan added the persistence Secure persistence feature label Sep 19, 2022
@adrinjalali
Copy link
Member

Are you talking about a hash of the stored object, or a fingerprint created by a private key of the person/entity creating the object?

We can add the capability to sign the stored object, if that's what you mean.

@BenjaminBossan
Copy link
Collaborator Author

I'm not an expert in cryptography, so maybe I'm not always using the correct words.

a hash of the stored object

That's what I meant. I'm not exactly sure how to do that, if we just want a checksum and then the platform can provide some code to verify the checksum (which would be outside of the scope of skops).

Maybe we can also do something within skops, where skops checks that the checksum/fingerprint is as expected. My initial idea is that each attribute we store creates a hash of itself and then on the object containing these attributes creates a hash of these hashes, recursively up to the root (which would be the schema), like a Merkle tree. Then the user only needs to verify this top level hash against a secure source and skops can check the hashes of all nodes when loading.

We can add the capability to sign the stored object

Not sure how useful that is, maybe?

@adrinjalali
Copy link
Member

I don't think that's very useful, since whoever can tamper the artifact, can tamper with the checksum as well.

Signing would be nicer since we can trust certain people who have signed artifacts, the same way we can do for git commits.

@BenjaminBossan
Copy link
Collaborator Author

I don't think that's very useful, since whoever can tamper the artifact, can tamper with the checksum as well.

That's true, but say, I download the model from a trusted website but get MITMed. Then I can still check that the final checksum is identical to the one on the website and decide not to open it if it differs.

Whether there is an additional benefit in skops checking internally, I'm not really sure.

Signing would be nicer since we can trust certain people who have signed artifacts, the same way we can do for git commits.

Could be, I would consider this to be a separate issue though.

@adrinjalali
Copy link
Member

That's true, but say, I download the model from a trusted website but get MITMed. Then I can still check that the final checksum is identical to the one on the website and decide not to open it if it differs.

If you get a MITM for the file, you can also get a MITM for the checksum.

Generally I'm not sure how much checksums add to the security, I see them more like "check if a file is corrupted" kinda thing.

Maybe @Narsil or @McPatate would have other ideas.

@BenjaminBossan
Copy link
Collaborator Author

If you get a MITM for the file, you can also get a MITM for the checksum.

I think there can be situations where the model file was tampered with but I can still trust the website. But maybe that's too narrow, I don't know.

@McPatate
Copy link

McPatate commented Sep 29, 2022

Usually if the HTTPS certificate is valid, it's pretty unlikely that you got MiTM-ed.

As Adrin says, imo signing is the best way to do this, but I don't see added value from signing commits - do you want to do something separate from the Hub ?

EDIT: though I guess signing and checksum are basically the same thing, you need a trusted authority to compare the result to.

@BenjaminBossan
Copy link
Collaborator Author

A possible exploit could be if there is a website that mirrors the content. Of course, those could have been tampered with, but if the fingerprint matches with the one from the original source, I could still trust the file. But maybe that situation is too narrow?

@McPatate
Copy link

McPatate commented Sep 29, 2022

Since github uses SHA-1 to create ids for files, I you could always compute it when you download a file from a mirror and then compare that to the blob's id like this :

$ git ls-tree HEAD
100644 blob 4c850c0da7bec718eb41dec4e96a8980ec1b3ea9	.gitignore
from hashlib import sha1

with open('.gitignore', 'r') as f:
    gitignore = f.read()
content = "blob " + f"{len(gitignore)}" + "\0" + gitignore
print(sha1(content.encode("utf8")).hexdigest())
# prints 4c850c0da7bec718eb41dec4e96a8980ec1b3ea9

I don't see why people would need to go through mirrors when interacting with the hub, but I may be mistaken :)

@BenjaminBossan
Copy link
Collaborator Author

I don't see why people would need to go through mirrors when interacting with the hub, but I may be mistaken :)

If skops persistence finds adoption, it could be used completely independently of the hub. Though of course we can wait and see before implementing a feature that's not needed.

@Narsil
Copy link

Narsil commented Oct 5, 2022

@adrinjalali is on point.

Checksum is for verifying validity. It could be just flaky network random flips, or actual MiTM attacks.
In order to check against MiTM you need two different channels for the file and the checksum (so that the attacker needs to handle BOTH channels to change things under you).

If you really want to protect against MiTM attacks, then signing files is more efficient. That's because usually the certificate/public key distribution mecanism is by design on a different channel. There is a 3rd trusted party that handles the management of these certificates, usually hardcoded in your browser/package manager. For iOS apps, Apple is responsible for handling the signed binaries.

It's still a pretty big endeavor if you have ever done iOS/MacOS dev you would know that the signing/registering of apps is a bit of a pain.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
persistence Secure persistence feature
Projects
None yet
Development

No branches or pull requests

4 participants