Bincode serialization #352
Comments
Or actually, it's not only in the key, but also in the credential id object. I get that the credential id sometimes needs to be encoded as base64 (e.g. when sent to the client), but I don't think it should be when stored on the server. Besides being incompatible with low-memory-footprint libraries like bincode, it also introduces storage overhead from the base64 encoding.
We store them as base64 on the server side, so that they are more compact in json dumps. We could probably change it from `deserialize_any` though to improve this - I think we were doing this to handle both vec and string inputs. So I think this is an area where we can improve `Base64UrlSafeData` so that it works with other formats outside of cbor/json.
Yes, I agree that when sending things to the client, serializing as json and wrapping things in base64 makes sense. It does introduce a lot of overhead though if you want to store things on the server. On our side, I was able to reduce the memory footprint of a passkey by 2.6x (from 370 bytes per passkey to 140 bytes per passkey) by copying the implementation of the passkey data structures into our code base and serializing them without base64. We were able to do this for the types we store long-term. Is there a way we can change the webauthn-rs library so it only uses base64 when necessary, e.g. when sending things to the client? Anything serialized on the server would be better off not using base64.
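For illustration, a minimal sketch of that kind of measurement (hypothetical struct and field names, not the real webauthn-rs types), comparing one credential id stored as raw bytes under bincode against a Base64 string in JSON:

```rust
// Hypothetical measurement sketch - not the real webauthn-rs types.
use base64::Engine;
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct RawCred {
    // serde_bytes makes bincode emit a length-prefixed raw byte string.
    #[serde(with = "serde_bytes")]
    cred_id: Vec<u8>,
}

#[derive(Serialize, Deserialize)]
struct B64Cred {
    cred_id: String, // Base64-encoded bytes, as stored today
}

fn main() {
    let id = vec![0u8; 64];
    let raw = bincode::serialize(&RawCred { cred_id: id.clone() }).unwrap();
    let b64 = serde_json::to_vec(&B64Cred {
        cred_id: base64::engine::general_purpose::URL_SAFE_NO_PAD.encode(&id),
    })
    .unwrap();
    // bincode: 8-byte length prefix + 64 raw bytes = 72 bytes.
    // JSON: ~86 Base64 characters plus quotes, braces and the field name.
    println!("bincode: {} bytes, json+base64: {} bytes", raw.len(), b64.len());
}
```

The exact numbers depend on each format's framing, but this is the flavour of comparison behind the 2.6x figure above; the gap widens further once field names are stored as strings, as JSON and CBOR do by default.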
There actually is a way to access the internals - there is a feature called "danger passkey internals", I think. But I don't recommend it for obvious reasons. Messing with the credential internals can break things in amazing ways, and it may prevent future credential upgrades working between versions. I'll need to think about that some more though.

For us the opposite is true - we have a number of applications that all store data in json, and for them base64 is a huge space saving over vec due to how json represents u8 arrays.

I'm also curious about what you are doing where you are so constrained on byte sizes / memory availability. Are you using webauthn-rs on some embedded system or similar?

Perhaps the way to approach this is a feature flag or similar in webauthn-rs-core that changes how we store the data. I'm not sure how it'd work cleanly though; it feels like it could be an invasive change.
We're using it server-side, not on an embedded system. Let me check what I can share about our use case. For now, let's just assume that we have a huge number of users, need to store passkeys in replicated in-memory databases, and storage is costly. Json itself has a lot of overhead as well, and our database is not json-based but can store binary blobs.
I believe your 2.6x comparison is misleading: you're comparing an optimally-packed, TLV-style format with field IDs against JSON with Base64. Using Base64 encoding in a similar TLV-style format should result in around a 1.33x size increase – and that's the number we should be discussing. 😄

For better or worse, JSON is the primary serialisation format of the library right now, and the difference in using Base64 there is stark: storing a byte array as a JSON list of integers costs far more than the equivalent Base64 string.

Unfortunately, serde's API doesn't allow us to special-case JSON (or any other format) for those fields, so we will need rather invasive modifications on our side to accommodate both, lest we cause a (best-case!) 2.7x storage cost increase for JSON.

The other challenge with allowing non-Base64 serialisation is going to be migration – all existing users are going to be using the Base64 format, and so we'll need to at least support both for deserialisation for some time (though I suspect we already do to a degree). A user of the library would also need to have everything migrated to a "multi-format" reader before they can change over any writer.

I would posit that there is an extremely small set of organisations with credential databases large enough that any of this has a meaningful impact. While I agree it's not as efficient as it could be, it's sometimes cheaper for an organisation to accept an increased resource cost (in this case, storage and network) than paying (a) software engineer(s) to build a more efficient solution.

Longer term, I think we should replace Base64UrlSafeData.
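As a rough sanity check on those figures (my arithmetic, assuming uniformly distributed byte values):

* Base64 encodes 3 bytes as 4 characters: 4/3 ≈ 1.33 characters per byte.
* A JSON array of integers costs the digits plus a separator per byte: (10·2 + 90·3 + 156·4)/256 ≈ 3.57 characters per byte on average.
* 3.57 / 1.33 ≈ 2.7 – the best-case storage increase quoted above for dropping Base64 in JSON.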
Correct. My concern was around serialized at-rest storage (which we happen to use an in-memory database for). But we do need to serialize it into a database (including PasskeyRegistration and PasskeyAuthentication) instead of just keeping them in-memory in the server process, because a user might hit different computing nodes on our end, even during the same session.
The 2.6x is comparing the best of what is possible with today's implementation of the webauthn-rs library (which is cbor with base64) against what I was able to achieve by duplicating the data structures into our code base (which is bincode without base64). But you're correct that those are two independent optimizations.
Yes this is the heart of the issue I believe.
Also correct. We will be running into this issue as well, since we already have a few beta users with existing passkeys using the old serialization, and we don't want to break the Passkey, PasskeyAuthentication or PasskeyRegistration objects we have already stored for them. Enabling a feature that changes serialization behavior would be painful if there is no way to get the old behavior back to read those existing ones.
True. We are one of those few organizations. The PR attached by @Firstyear, doing it via a feature flag, seems fragile. As @Firstyear correctly pointed out, features taint: if application A uses webauthn-rs and depends on its default serialization, and then transitively a library it depends on, B, depends on webauthn-rs with the feature enabled, things break for A in unexpected ways. Or maybe they work today, but then B changes later, adds the feature, and breaks things for A. It doesn't sound great. This is why features are usually meant to be purely additive, not changes to existing behavior. Is it possible to instead have it in the API somehow? E.g. wrap the affected types in something that controls how they serialize?
Hm, another potential approach could be this: we could use serde's is_human_readable flag to pick the representation per format – Base64 for human-readable formats, raw bytes for binary ones.
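A minimal sketch of what that could look like (hypothetical wrapper type, not the library's actual code) – serde's `Serializer` exposes an `is_human_readable()` flag that each format sets for itself:

```rust
// Hypothetical wrapper: Base64 for human-readable formats (e.g. JSON),
// raw bytes for compact binary formats (e.g. bincode).
use base64::Engine;
use serde::{Serialize, Serializer};

pub struct BinaryData(pub Vec<u8>);

impl Serialize for BinaryData {
    fn serialize<S: Serializer>(&self, s: S) -> Result<S::Ok, S::Error> {
        if s.is_human_readable() {
            // JSON dumps stay compact and copy-pastable.
            let b64 = base64::engine::general_purpose::URL_SAFE_NO_PAD.encode(&self.0);
            s.serialize_str(&b64)
        } else {
            // bincode/postcard get the bytes with no encoding overhead.
            s.serialize_bytes(&self.0)
        }
    }
}
```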
@smessmer @micolous and I had a big talk about it yesterday evening, and we do agree we should move away from Base64UrlSafeData throughout the codebase, but it will take some time to achieve. There are lots of possible ways to make this happen, but as we mentioned, the priority for us is making sure we can upgrade for users and making sure that we do this in a really reliable and safe way. I think he is working on a few possible ideas today that could help here.
There are some tricks which can be done here on a routing level, like pinning that authentication "journey" to a single serving task. While this makes things less reliable if a task goes away (as the challenge would be unusable by other tasks), it avoids the need to serialise quite so much data during an authentication – but all of this is a trade-off. Once the authentication is done, I presume the Passkey stuff doesn't matter at all – you've issued a session cookie, and the Webauthn stuff is no longer in the serving path of all authenticated requests.
Note that serde's default CBOR serialization for structs stores field names as strings, not numeric IDs.
Doing some back-of-the-envelope maths here: assume you have 10^10 records. With a 230 byte increase per record (using your comparison of CBOR-with-string-field-names + Base64 vs. bincode + raw bytes), that makes the dataset 2.09 TiB larger. I suspect the actual difference with bincode + base64 is around 50 bytes per record (rounded up to allow for an increase in a length field's integer size), which makes the dataset 0.45 TiB larger.

While registrations are low-churn, authentications can update the counter and/or backup state flags. Within the limits of the serde API, I don't think there are any fancy things we could do to avoid entirely rewriting that serialised record, unless we allow clients to treat those mutable fields separately from the rest of the record.

This isn't the true amount of disk and network usage because of redundancy and replication requirements, but at the scale of 10^10 records, I don't think it makes a whole lot of difference when the total size is still less than 5 TiB. 😄
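Writing that arithmetic out (1 TiB = 2^40 bytes):

230 B × 10^10 = 2.3 × 10^12 B ≈ 2.09 TiB
50 B × 10^10 = 5 × 10^11 B ≈ 0.45 TiB
370 B × 10^10 = 3.7 × 10^12 B ≈ 3.36 TiB (the "less than 5 TiB" total, if we take the ~370 bytes-per-passkey figure from earlier)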
Yup, I agree. Yesterday (before knowing about …)
That API looks like exactly what we want. It still has change-over risks, but there may be ways to manage that. We could also make deserialisation accept both the old and new formats during a transition.
But at least now we know what not to do :)
As mentioned externally - I think we can "fix" Base64UrlSafeData to accept deserialising both vec and base64 so that "any representation" works, and then for the serialisation we choose based on the is_human_readable flag. That way we maintain compatibility with anything that already exists, but new or re-serialised credentials will get the correct output for that type.

Separately, we likely should have Base64UrlSafeData removed from places around the codebase that aren't serialised. We also could consider renaming it to "SerialisableBinaryData" to better show that it's just a wrapper for serialising/deserialising.

All of this probably needs to be done in some smaller steps to make sure we don't muck anything up in the process.
#354 is a first step towards making this better from a serialisation perspective – it'll use Base64 encoding when the serialiser reports is_human_readable, and raw bytes otherwise. Deserialisation will still accept multiple input types, but that still relies on using a self-describing format.
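A sketch of what that multi-input deserialisation can look like (hypothetical code in the same vein as the wrapper above, not the actual #354 implementation):

```rust
// Hypothetical tolerant deserialiser: accepts a Base64 string, a byte
// array, or a sequence of integers. deserialize_any only works when the
// format is self-describing (JSON, CBOR), not with bincode or postcard.
use base64::Engine;
use serde::de::{self, SeqAccess, Visitor};
use serde::{Deserialize, Deserializer};

pub struct BinaryData(pub Vec<u8>);

impl<'de> Deserialize<'de> for BinaryData {
    fn deserialize<D: Deserializer<'de>>(d: D) -> Result<Self, D::Error> {
        struct V;
        impl<'de> Visitor<'de> for V {
            type Value = BinaryData;
            fn expecting(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
                f.write_str("a Base64 string or a byte array")
            }
            fn visit_str<E: de::Error>(self, v: &str) -> Result<BinaryData, E> {
                base64::engine::general_purpose::URL_SAFE_NO_PAD
                    .decode(v)
                    .map(BinaryData)
                    .map_err(E::custom)
            }
            fn visit_bytes<E: de::Error>(self, v: &[u8]) -> Result<BinaryData, E> {
                Ok(BinaryData(v.to_vec()))
            }
            fn visit_seq<A: SeqAccess<'de>>(self, mut seq: A) -> Result<BinaryData, A::Error> {
                // Old JSON dumps that stored u8 arrays land here.
                let mut out = Vec::new();
                while let Some(b) = seq.next_element::<u8>()? {
                    out.push(b);
                }
                Ok(BinaryData(out))
            }
        }
        // Only works when the input format records what type it wrote.
        d.deserialize_any(V)
    }
}
```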
Also, it looks like …
* Add "HumanBinaryData" as alternative to "Base64UrlSafeData" (#352) * Add bytes support to Base64UrlSafeData, and copy across the tests * Rework the docs * dedupe functionality into macros, make tests consistent * more tests and conversions * move Borrow impl into common * add some more vec features * fix clippy
@micolous Why do you believe this won't work with bincode? Afaik, bincode supports serialize_bytes / deserialize_bytes just fine.
The problem is deserialisation: if we never attempted string/Base64 deserialisation for these types, we would break reading every record that existing users have already stored in the Base64 form.
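One possible way out – my sketch, not something the maintainers have committed to – is to branch on `Deserializer::is_human_readable()` as well, mirroring the serialisation side. This reuses the `BinaryData` type and the visitor `V` from the sketch above:

```rust
use serde::{Deserialize, Deserializer};

impl<'de> Deserialize<'de> for BinaryData {
    fn deserialize<D: Deserializer<'de>>(d: D) -> Result<Self, D::Error> {
        if d.is_human_readable() {
            // JSON etc.: self-describing, so accept strings, byte arrays
            // and integer sequences for backwards compatibility.
            d.deserialize_any(V)
        } else {
            // bincode/postcard: nothing on the wire says what type was
            // written, so the representation must be fixed up front.
            d.deserialize_bytes(V)
        }
    }
}
```

The caveat is exactly the migration problem above: records that were previously written to a binary format in the Base64-string form would not round-trip through this path without an explicit migration.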
Oh I see. For our use case, we have a version number in the serialization format, so we could tell the library which format to deserialize. Other library users would likely know as well whether they're deserializing an old-format or new-format passkey, but it would mean they have to go through an explicit migration process, which we're willing to do but not every library user might be happy about. Also, I'm not sure what the best way would be to pass that configuration to the deserializer.
There's a potentially additional use case to consider that came up in one of my side projects (a much smaller project just for fun, not what I talked about above). There, I am experimenting with the leptos web framework, which allows building full-stack apps, using Rust on both the server and client side with wasm. In WASM, I tried to serialize the webauthn-rs challenge objects using serde_json directly into the JavaScript-side WebAuthn browser API, and then deserialize the client responses back into webauthn-rs objects. It mostly worked, but the base64 serialization broke it, because the browser API expected an ArrayBuffer of raw bytes rather than a base64 string.
Compatibility is one of the considerations when changing over to HumanBinaryData. I feel like versioning is something that's better handled at the application layer, though. As for browser APIs, from memory, some of the types are different because the browser API specs don't agree with the CTAP specs. But that sounds like a separate issue that'd need further investigation.
#433 was about improving the way it uses …

As a side note, using bincode specifically is only really useful to you if you have an extremely large number of credentials (as in, literally billions) or extreme storage constraints (eg: embedded systems). Otherwise, this is a bunch of complexity that you do not need, especially when prototyping your application.
I did this
I tried to serialize and then deserialize a passkey using bincode.

I expected the following
Getting the original passkey back.

What actually happened
A runtime error from bincode, `DeserializeAnyNotSupported`.

Version (and git commit)
0.5.0-dev, 840a6f5cc7f65bd3c585ab00293addf76f70d973

Operating System / Version
Linux
Any other comments
This happens because `Base64UrlSafeData` uses `Deserializer::deserialize_any` in its deserializer, but that isn't supported by bincode. I see that the serializer uses `serialize_str`. Is it possible to also just use `deserialize_str` instead of `deserialize_any`? Or, maybe even better, use `{serialize,deserialize}_bytes`?

Bincode is an important serialization format because it has the lowest memory footprint of all formats I know. If you want to store a lot of passkeys for a large user base, the memory footprint matters.
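A minimal, self-contained sketch of the failure mode (hypothetical wrapper, not the actual `Base64UrlSafeData` implementation – hex stands in for Base64 to keep it dependency-light):

```rust
// A wrapper whose Deserialize calls deserialize_any round-trips through
// JSON but fails under bincode 1.x with DeserializeAnyNotSupported.
use serde::de::{self, Visitor};
use serde::{Deserialize, Deserializer, Serialize, Serializer};

#[derive(Debug, PartialEq)]
struct AnyBytes(Vec<u8>);

impl Serialize for AnyBytes {
    fn serialize<S: Serializer>(&self, s: S) -> Result<S::Ok, S::Error> {
        // Stand-in for the Base64 string encoding.
        let hex: String = self.0.iter().map(|b| format!("{b:02x}")).collect();
        s.serialize_str(&hex)
    }
}

impl<'de> Deserialize<'de> for AnyBytes {
    fn deserialize<D: Deserializer<'de>>(d: D) -> Result<Self, D::Error> {
        struct V;
        impl<'de> Visitor<'de> for V {
            type Value = AnyBytes;
            fn expecting(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
                f.write_str("a hex string")
            }
            fn visit_str<E: de::Error>(self, v: &str) -> Result<AnyBytes, E> {
                (0..v.len())
                    .step_by(2)
                    .map(|i| u8::from_str_radix(&v[i..i + 2], 16))
                    .collect::<Result<Vec<u8>, _>>()
                    .map(AnyBytes)
                    .map_err(E::custom)
            }
        }
        // deserialize_any asks the format what type comes next; bincode
        // can't answer and fails with DeserializeAnyNotSupported.
        d.deserialize_any(V)
    }
}

fn main() {
    let v = AnyBytes(vec![1, 2, 3]);
    // Round-trips through JSON, which is self-describing...
    let j = serde_json::to_string(&v).unwrap();
    assert_eq!(serde_json::from_str::<AnyBytes>(&j).unwrap(), v);
    // ...but deserialisation fails under bincode 1.x.
    let b = bincode::serialize(&v).unwrap();
    assert!(bincode::deserialize::<AnyBytes>(&b).is_err());
}
```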
This issue applies not just to bincode, but to all serialization formats that don't serialize type information, e.g. postcard and others.
`deserialize_any` only works for verbose serialization formats that store the type alongside the data (see the serde documentation). That doesn't seem necessary for something like `Base64UrlSafeData`?