Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add JsonSchema Support for Several Types #501

Closed
wants to merge 5 commits into from

Conversation

JeremyRubin
Copy link
Contributor

This patch adds JsonSchema support for types which don't require JsonSchema patched bitcoin_hashes or secp256k1. Those types can be added as follow up work once those jsonschema patches are merged & released, but a broad enough set of types can be implemented without any version bump that it's reasonable to do in two parts.

A new type, CoinAmount, is introduced for contexts where one might want to specify either a floating point Bitcoin or integral satoshi amount of coin. This is useful in contexts where you might be specifying a type for user input and are able to take either a Btc or Sats amount, but don't know ahead of time what the user might send; it generates the following schema.

  {
    "CoinAmount": {
      "description": "Type for dealing with unknown amounts",
      "anyOf": [
        {
          "description": "Sats type",
          "type": "object",
          "required": [
            "Sats"
          ],
          "properties": {
            "Sats": {
              "type": "integer",
              "format": "uint64",
              "minimum": 0.0
            }
          }
        },
        {
          "description": "Btc type",
          "type": "object",
          "required": [
            "Btc"
          ],
          "properties": {
            "Btc": {
              "type": "number",
              "format": "double"
            }
          }
        }
      ]
    }
  }

@@ -29,8 +29,8 @@ bitcoinconsensus = { version = "0.19.0-1", optional = true }
serde = { version = "1", optional = true }

[dev-dependencies]
serde_derive = "<1.0.99"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't understand these. Under what circumstances would cargo use a different major version than the one that is specified? Does <1.0.99 really work this way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://jubianchi.github.io/semver-check/#/%3C1.0.99/0.1.2

This constraint does not provide a lower bound which means you will probably get unexpected breaking changes.

Comment on lines 1008 to 1016
/// Derived externally using serde_derive
#[cfg(feature = "serde")]
#[doc(hidden)]
#[allow(non_upper_case_globals, unused_attributes, unused_qualifications)]
const _: () = {
#[allow(rust_2018_idioms, clippy::useless_attribute)]
extern crate serde as _serde;
impl _serde::Serialize for CoinAmount {
fn serialize<__S>(&self, __serializer: __S) -> _serde::export::Result<__S::Ok, __S::Error>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is auto generated by serde_derive I'd recommend cleaning it manually since if later needs to be updated one would need to go through the process of regenerating the serde code for the enum instead of modifying the code directly.

Something along the lines of https://github.com/rust-bitcoin/rust-bitcoin/blob/master/src/blockdata/script.rs#L757-L767 could be done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually to me (maybe incorrectly) it seems simpler to only minimally modify the generated code for that exact reason -- one can just regenerate it and patch as needed.

In any case the example you shared is also simpler, since that is a struct and this is an enum.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah I'd prefer not to have these code-generated impls in here :/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure where or what you want exactly then?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO renaming the variables for reading them and formatting the code to fit the rest of the file would be enough.

src/util/amount.rs Outdated Show resolved Hide resolved
@stevenroose
Copy link
Collaborator

#508 might be relevant for the serde_derive issue.

Box<[u8]>);

#[cfg(feature = "schemars")]
mod schemas {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it makes sense to have an special module for all the schemas. Like json_schemas.rs and then refer with schema_with = "::json_schemas::script"? I think that'd be cleaner.


#[cfg(all(feature = "schemars", feature = "serde"))]
#[test]
fn schema_correct_outpoint() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if those tests could also fit in the schema module :)

serde_struct_human_string_impl!(OutPoint, "an OutPoint", txid, vout);

#[cfg(feature = "schemars")]
impl JsonSchema for OutPoint {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those can't be moved into a module, I suppose. Hmm, a bit unfortunate IMO, but I suppose it's fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends; in this case we need to do a custom impl like this because the representation is weird. If we have per-field representations then we don't need to, but because it becomes one string we need this approach.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually in some crates I have seen impl on types within the crate moved into separate modules. I thought that was not possible? Can this block be moved to a file like f.e. json_schemas.rs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's cross crate impls that are banned, cross module is fine actually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more specifically, either the trait or the struct has to be defined in the crate for impl to be allowed.

Comment on lines 1008 to 1016
/// Derived externally using serde_derive
#[cfg(feature = "serde")]
#[doc(hidden)]
#[allow(non_upper_case_globals, unused_attributes, unused_qualifications)]
const _: () = {
#[allow(rust_2018_idioms, clippy::useless_attribute)]
extern crate serde as _serde;
impl _serde::Serialize for CoinAmount {
fn serialize<__S>(&self, __serializer: __S) -> _serde::export::Result<__S::Ok, __S::Error>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah I'd prefer not to have these code-generated impls in here :/

/// Type for dealing with unknown amounts
#[cfg_attr(feature = "schemars", derive(schemars::JsonSchema))]
#[derive(Clone, Copy, PartialEq, PartialOrd, Debug)]
pub enum CoinAmount {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, is roundtripping for this important? There is already a serde-related module in util::amount that does flexible serialization with sats or btc. It's possible to add a deserializer that can have both, but it's harder to then roundtrip that, so you'd have to chose one to serialize with.

How about something like this: #509

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idk what roundtripping means in this cirumstance. Do you mean the From impls?

The issue with the flexible serialization is that it isn't actually flexible and requires you to annotate with a serializer anywhere you use that type, and then the selection of schemars schema is also not type safe.

A type like CoinAmount can be used in a publicly facing API that can accept either a Bitcoin or a Satoshis amount.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yeah I don't know how the different serde deserializers interact with the schema mechanics.

I'm personally not the biggest fan to catering our internal types to the likes of very specific serialization methods, but always building the serialization support around the existing types. That's what has been done with the Amount type. It makes sense internally and there has been provided some infrastructure to use it with serde. Ideally if there are difficulties with schemas, they'd be solved by providing functionality around the Amount and SignedAmount types instead of given a new type.

There's various reasons for this, one is that we want to have a single type per concept, that all serializations can use, not a type per serialization. I can imagine that the CoinAmount doesn't work with other serialization methods like protobuf or so. Also, we shouldn't have types reflect serialization-specific information. Meaning that for the sake of the type itself, it's not useful to keep track on whether the value was derived from serializing 0.01 BTC or 1000000 sat, which the enum seems to do.

What I meant with roundtripping and #509 is that using the flexible deserializer in #509, when re-serializing, the original serialization will be lost.

Ultimately, I think if a specific implementation needs a specific serialization of amounts, it's quite trivial to quickly (1) wrote a serde serialization module around Amount or (2) write a newtype for it and use that for the API of that implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just seems remarkably easy to, for a programmer, mess up an incoming type as sats or btc. I think an explicitly tagged enum is a much better API for all users as it is completely self-documenting and unambiguous. This isn't just an issue generating the schemas, it's just odd to me that we don't have a type-level way of disambiguatiing the amounts.

Another alternative would be to make Amounts split into AmountsSats and AmountsBtc and then you can use Either<AmountsSats, AmountsBtc> (or result...) to express an api taking either.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, there is only 1 amount "concept", right? "an amount", aka CAmount in Core.. BTC is just a denomination. And denominations only matter when you're representing the amount. And representation is an application-specific thing, so I think providing flexible tools to help represent our amount type easily in various ways is the way to go instead of creating different types for different presentations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've figured out the right API here -- what we probably want is to have the type be Amount and use try_from for Amount and serialize_with that defaults to CoinAmount::Sats wrapper. That way the type in the structs is an amount, but we can handle inbound of either.

Then if you need the specific encoding of Amount as a plain u64 or float (without tag), you need to ask for it with an override to the default behavior.

(I also think we should unify Signed and Unsigned amount as CAmount is signed, but that's another question).

In short: at the type level, have a single simple newtype. At the serialization level, have explicit tagged representations with a tagged representation and a fallback mechanism. I'll need to think through the deduction rules and type them up

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah I'm not a fan of that approach, but if others disagree, that's fine.

As for Amount and SignedAmount, I think it's useful to have amounts of which you know they cannot be negative.

@JeremyRubin
Copy link
Contributor Author

I pushed changes which implement the concept based on #501 (comment).

Amount is now default Signed, Amount/UnsignedAmount default serialize/deserialize through a tagged enum; you can still override to have your preferred representation.

@stevenroose
Copy link
Collaborator

Can you perhaps repeat what exact problem you're trying to solve? This seems like a hugely invasive change and also seems to add a ton of new code.

If the only feature this gets us is "the ability to parse amounts in various formats from JSON", then I'd certainly NACK it. Any API should specify in what format amounts are formatted and can use one of the specified serializers for that (or write his own). For any other freeform user input parsing, we have str::FromStr. If FromStr is not powerful enough, feel free to extend it or perhaps implement a custom parse_amount in your application.

To bridge the gap between FromStr and JSON formatting, #509 implements a serde deserializer using FromStr.

@JeremyRubin
Copy link
Contributor Author

I would suggest this changes irrespective of if you want the JSON API stuff or not. But that is the context in which I first came across the choice made for the Amount type.

The primary issue is that the Amount type is an impedance mismatch for the underlying type used in Bitcoin, which is a signed integer. I think even though it was a poor design choice by Satoshi, it makes sense to not represent that field differently/perform mathematical operations on it differently. For example, why would we care to represent a value > max i64 as an amount?

A second issue is that floating points and integers are not unambiguous. E.g., JSON 1 and 1.0 are valid as both number and integer types (JSON exists with or without JSONSchema mind you). Then it would be possible to -- without indication to a user/api consumer, present a value field where Sats and BTC might ambiguously be accepted and misinterpreted, the worst case of which would lead to destruction/overspending of funds. If there's a concern that various consumers of rust bitcoin APIs might encode values of the same type differently and incompatibly, then it is preferable to use an unambiguous serialization/deserialization with a tagged enum rather than risking incompatibility.

A final point is that suppose we do have a series of floats/btc and ints/sats. We should not make the users perform conversions on their own, it's much preferable to make any conversions happen automatically with rust-bitcoin's correct code.

In my patch it is also possible to not implement Serialize/De at all for Amount/UnsignedAmount therefore forcing the user to always decide between BTC/Sats/Tagged serialization. This would also be OK, I just figured that a tagged representation is a sane default unless you know you want something else.

@JeremyRubin
Copy link
Contributor Author

I know it needs some rebasing, but I would love to find a path forward for integrating this functionality.

As for a context where I use it, you can see https://github.com/sapio-lang/sapio.

I compile bitcoin smart contract plugins into WASM bundles. Being able to output a JSONSchema allows the contract plugins to communicate to other software what parameters the API expects.

Currently I'm maintaining forks of many of the rust crates mainly for this (e.g. https://crates.io/crates/sapio-bitcoin/0.26.0). I can continue doing so, but it's a bit annoying because it means my crates end up incompatible with the main rust-bitcoin crates. Cargo's patching functionality isn't a full fix for this either, because the patches cannot currently be pinned.

If I rebase this (and prs for other crates) can it be merged? Or is this a permanent no-go?

Use for User Generated Forms

In https://github.com/sapio-lang/tux, these JSONSchemas are used to automatically generate user intelligible forms to create contracts.

Use for Dynamic Linking Modules

I'm still working on implementing functionality for processing these schemas and reifying them into types, but this would permit for contract modules to inherit from a common message type. E.g., suppose I define a struct:

#[derive(JsonSchema)
struct HotStorageModule {
    a: PublicKey,
    b: PublicKey
}

I would be able to define

#[derive(JsonSchema)
struct PluginA(HotStorageModule);
#[derive(JsonSchema)
struct PluginB(HotStorageModule);

And a third party module could be defined such that:

struct GiveMeHotStorage {
  module: ImplsInterfaceFor<HotStorageModule>
}

where the ImplsInterfaceFor dynamically loads a contract module.

@sgeisler
Copy link
Contributor

I'm fine with it as long as it is feature-gated, I think our general policy around features is that they don't necessarily have to follow our MSRV. While I'd like to reduce our features to keep combinatorial complexity low I think this is probably worth it because you need it for sapio. Currently nobody tests the feature-powerset anyway 😦

I remember the very unfortunate incident in bitcoin_hashes, which was caused by dev-deps not being feature-gateable, so I guess you'd need a way to test it that doesn't break the other tests' MSRV. Maybe the tests can be put into an extra crate which is only tested for stable and nightly (if the same problem exists here)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants