Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DO NOT MERGE: Public version of FixedDecimal proposal #7039

Closed
wants to merge 1 commit into from

Conversation

jtattermusch
Copy link
Contributor

This is basically a copy of the internal proposal (as we currently don't have a open source proposal process in place for protobuf).

See #4406

@jtattermusch
Copy link
Contributor Author

@JamesNK @JunTaoLuo @shirhatti FYI

@mgravell
Copy link

The units being int64 and nanos being sfixed32 could perhaps make negative values unnecessarily large? Any reason not to go sint64 for units? (given the overhead of a message header, the potential extra byte in some zig-zag edge cases of positive numbers seems marginal)

nanos says "Must be same sign as units" - probably needs a "when not zero"? i.e. -12345.0 would be a negative and non-negative (integers don't usually have both positive and negative zero).

@jtattermusch
Copy link
Contributor Author

The units being int64 and nanos being sfixed32 could perhaps make negative values unnecessarily large? Any reason not to go sint64 for units? (given the overhead of a message header, the potential extra byte in some zig-zag edge cases of positive numbers seems marginal)

this is discussed in the proposal:
The type of "units" can be either int64 or sint64. sint64 is more efficient if negative values are common, but money.proto uses int64. So the choice is either staying more consistent with money.proto or aiming for the highest efficiency."

I understand you're leaning more towards using sint64. How much would of an overhead is the zigzag encoding itself in your opinion?
Also, it is hard to tell in advance how often will users want negative values - for financial applications, I assume messages with positive numbers only could be quite common?

nanos says "Must be same sign as units" - probably needs a "when not zero"? i.e. -12345.0 would be a negative and non-negative (integers don't usually have both positive and negative zero).

You're right but this probably won't lead to any confusion (intuitively integer 0 has "no sign" and it's fair to use it for -123.0)

@mgravell
Copy link

mgravell commented Jan 27, 2020

How much would of an overhead is the zigzag encoding itself in your opinion?

My understanding without checking the spec is that initially this costs at most one bit, which will in 1/7th of the cases be enough to cost us an extra byte (with small magnitude numbers still being cheaper than large magnitude). Compared to negative without zig-zag, which costs a hard 10 bytes regardless of size. Is that the question you were asking?

@jtattermusch
Copy link
Contributor Author

How much would of an overhead is the zigzag encoding itself in your opinion?

My understanding without checking the spec is that initially this costs at most one bit, which will in 1/7th of the cases be enough to cost us an extra byte (with small magnitude numbers still being cheaper than large magnitude). Compared to negative without zig-zag, which costs a hard 10 bytes regardless of size. Is that the question you were asking?

I meant the computation overhead of performing the zigzag transformation.
I looked it up and it's very likely negligible: DecodeZigZag64 does this return (long)(n >> 1) ^ -(long)(n & 1);

@mgravell
Copy link

oh, indeed: computationally it is very CPU-friendly - no branching, so it should just pipeline beautifully

@vivainio
Copy link

vivainio commented Feb 3, 2020

Since there is no particular need to pessimize negative numbers, and negative numbers are not rare in financial applications - can the proposal be updated to have type (sint64, sfixed32)?

I'm not sure how much consistence with money.proto matters here. This is generic decimal number support, not specific to money, and money type may not have much need for negative numbers.

Also @jtattermusch are there any updates where the discussion is going within Google?

@Fleshgrinder
Copy link
Contributor

Fleshgrinder commented Feb 12, 2020

100% compatibility with google.type.Money is very important (speaking as someone who is currently working in the banking sector). The reason is that there are often definitions that do not use the money definition because they would basically repeat the currency multiple times. Think of a simple account balance message:

message AccountBalance {
    string currency = 1;
    google.protobuf.FixedDecimal available = 2;
    google.protobuf.FixedDecimal usable = 3;
}

Here available represents the money in the account and usable how much the owner is actually able to use (which can be more than is available in case the owner has overdraft). Using money in this case makes little sense but being able to switch between the decimal and money is very important for writing abstractions and what not.


It‘s unfortunate that Money has currency_code in position 1, otherwise we would even get wire compatibility. Maybe FixedDecimal could be defined with field 2 and 3 and leave out 1.

@vivainio
Copy link

@Fleshgrinder this makes sense. I forgot that someone may want to read Money wire protocol to FixedDecimal

@flozano
Copy link

flozano commented Mar 2, 2020

Whatever is the outcome of this, it should be supported in Value.proto so that it can be used inside Structs and Lists and conversion of untyped messages to/from JSON is actually possible without loss of information. A related issue was closed (wrongly I think) with regard to this.

@jufemaiz
Copy link
Contributor

Interested to hear the likely timeline on acceptance and publication for this PR.

@jtattermusch
Copy link
Contributor Author

This is still under internal review and we're considering alternative representations of the type.

@jufemaiz
Copy link
Contributor

Thanks @jtattermusch !

@vivainio
Copy link

Montly "ping" for the status :)

@jufemaiz
Copy link
Contributor

jufemaiz commented Jun 11, 2020

Dumb question perhaps, but the following claim is made:

nanos must be same sign as units

How much support is there for a "negative zero" value in the units or the nanos (depending upon the number)? Or is zero considered to have "no sign" so to speak?

Or should the proposal be updated to reflect the wording in money.proto? For example,

    // Number of nano (10^-9) units of the amount.
    // The value must be between -999,999,999 and +999,999,999 inclusive.
    // If `units` is positive, `nanos` must be positive or zero.
    // If `units` is zero, `nanos` can be positive, zero, or negative.
    // If `units` is negative, `nanos` must be negative or zero.
    // For example:
    // 1.75 is represented as `units`=1 and `nanos`=750,000,000.
    // -1.75 is represented as `units`=-1 and `nanos`=-750,000,000.
    // 0.75 is represented as `units`=0 and `nanos`=750,000,000.
    // -0.75 is represented as `units`=0 and `nanos`=-750,000,000.

@jtattermusch
Copy link
Contributor Author

After a lot of internal discussions, there are some new type defintions that we want to evaluate.
Basically there are a few layouts modeled after IEEE754-2008 decimal types (Decimal64 and Decimal128 - these two types seem to be the future standards for decimal types so it make sense to try to reuse them for protobuf's decimal WKT).
and there is also a "DecimalString" message that is perhaps little harder to (de)serialize, but the representation is very human-friendly and it has an arbitrary range of representable values (but the concern is (de)serialization speed and storage efficiency).

/*
 * A decimal number represented by a string.
 *
 * The string format must match this syntax:
 * number    = [ "+" | "-" ] float .
 * number    = "nan" | "+inf" | "-inf" .
 * float     = digits [ "." digits ] [ exponent ] .
 * float     = "." digits [ exponent ] .
 * exponent  = "e" [ "+" | "-" ] digits .
 * digits    = digit { digit } .
 * digit     = "0" ... "9" .
 *
 *
 * TODO: add documentation about which value range is representable
 * by all protobuf languages.
 */
message DecimalString {
  string value = 1;
}

message Decimal64 {
  // matches IEEE 754-2008 decimal64

  // interpreting the values:
  // both significand and exponent within allowed range of IEEE decimal64,
  // exponent in range -> the value is significand * 10^exponent. significand
  // exactly [-10^17, 10^17], exponent is 0 (i.e. not present) -> +/-Infinity
  // significand out of range OR exponent out of range -> NaN

  // Allowed range is [-10^17, 10^17]. Exceeding these bounds is an error.
  // A significand of exactly -10^17/10^17 represent +/- Infinity.
  sint64 significand = 1;

  // Allowed range is [−383, 384].
  sint32 exponent = 2;
}

message Decimal128 {
  // matches IEEE 754-2008 decimal128

  // interpreting the values:
  // both significand and exponent within allowed range of IEEE decimal128,
  // exponent in range -> the value is sign * significand * 10^exponent.
  // significand exactly 10^34, exponent is 0 (i.e. not present) -> +/-Infinity
  // significand out of range OR exponent out of range OR sign out of range ->
  // NaN

  sint32 sign = 1;  // -1 for negative, +1 for positive, 0 for zero. If sign is
                    // zero, the significand must also be zero.

  // low and high 64 bits of the significand
  // allowed range is [0, 10^34]. Exceeding these bounds is an error.
  // A significand of exactly 10^34 represents +/- Infinity (based on sign)
  uint64 significand_low = 2;
  uint64 significand_high = 3;

  // Allowed range is [−6143,+6144].
  sint32 exponent = 4;
}

message Decimal64Version2 {
  // matches IEEE 754-2008 decimal64
  // Allowed range is (-10^17, 10^17). Exceeding these bounds is an error.
  sint64 significand = 1;

  // TODO: should we use oneof? the issue is that if special_value is used, both
  // significand and exponent are supposed to be 0 (not just the exponent), at
  // which point there's not that much value in having a oneof.

  // Interpretation.  If none of these are set then it will be
  // interpreted as if "exponent = 0".
  oneof mode {
    sint32 exponent = 2;  // Allowed range is [−383, 384].

    // if special_value is set, both significand and exponent must be zero (not
    // present)
    SpecialEncoding special_value = 3;
  }
}

message Decimal128Version2 {
  // matches IEEE 754-2008 decimal128

  bool negative =
      1;  // true for negative numbers, false for positive numbers and zero

  // low and high 64 bits of the significand
  // allowed range is [0, 10^34). Exceeding these bounds is an error.
  uint64 significand_low = 2;
  uint64 significand_high = 3;

  // Interpretation.  If none of these are set then it will be
  // interpreted as if "exponent = 0"
  oneof mode {
    // Must be zero when special_value is set
    sint32 exponent = 4;  // Allowed range is [−6143,+6144] -- TBD

    // if special_value is set, all of significand, negative and exponent must
    // have their default value (not present)
    SpecialEncoding special_value = 5;
  }
}

enum SpecialEncoding {
  SPECIAL_VALUE_UNSPECIFIED = 0;
  NAN = 1;
  POSITIVE_INFINITY = 2;
  NEGATIVE_INFINITY = 3;
}

@JamesNK would you be interested in trying to prototype the ToDecimal/FromDecimal C# language bindings and benchmarking their performance / memory efficiency using BenchmarkDotNet?

@JamesNK
Copy link
Contributor

JamesNK commented Aug 25, 2020

Sure. I'll try to do it sometime this week.

@TrayanZapryanov
Copy link
Contributor

I am sorry if I miss something in the discussion, but currently we are using following format and functions to encode/decode decimals in our software. I didn't invented them - just copied somewhere over the net and they are working till now.
So I am putting here just as an information:
proto schema:
message Decimal
{
uint64 lo = 1; // the first 64 bits of the underlying value
uint32 hi = 2; // the last 32 bis of the underlying value
uint32 signScale = 3; // the number of decimal digits (bits 1-16), and the sign (bit 0)
}

converters:

public static Protobuf.WellKnownTypes.Decimal ToProtoDecimal(this decimal value)
{
var bits = decimal.GetBits(value);
ulong a = ((ulong) bits[1]) << 32, b = ((ulong) bits[0]) & 0xFFFFFFFFL;
var low = a | b;
var high = (uint) bits[2];
var signScale = (uint) (((bits[3] >> 15) & 0x01FE) | ((bits[3] >> 31) & 0x0001));

        return new Protobuf.WellKnownTypes.Decimal
        {
            Lo = low,
            Hi = high,
            SignScale = signScale
        };
    }

    public static decimal ToDecimal(this Protobuf.WellKnownTypes.Decimal dec)
    {
        var low = dec.Lo;
        var high = dec.Hi;
        var signScale = dec.SignScale;

        if (low == 0 && high == 0) return decimal.Zero;

        int lo = (int) (low & 0xFFFFFFFFL),
            mid = (int) ((low >> 32) & 0xFFFFFFFFL),
            hi = (int) high;

        var isNeg = (signScale & 0x0001) == 0x0001;
        var scale = (byte) ((signScale & 0x01FE) >> 1);

        return new decimal(lo, mid, hi, isNeg, scale);
    }

@mgravell
Copy link

@TrayanZapryanov that is from protobuf-net, and as the person who wrote that, I say officially "oh gawd, burn it with fire, it hurts, it hurts". Or rather: "that is a decision that I regret, and it is horrible". More recently, protobuf-net uses a "compatibility level" concept that allows me to change conventions over time (without breaking existing code), and at the more recent compatibility levels: it does not use that layout.

@TrayanZapryanov
Copy link
Contributor

@mgravell Thank you for pointing me this. Yes I can remember now and will check how things developed in protobuf-net repo.
Sorry for extending discussion here - just wanted to help somehow.

@mgravell
Copy link

@D3MaxT
Copy link

D3MaxT commented Sep 6, 2020

Impatiently waiting on this in C# 👍

@akulich
Copy link

akulich commented Feb 5, 2021

Hello from 2021. Did you give up on this?

@St333p
Copy link

St333p commented Mar 16, 2021

Any updates on this? I'd love to have support for python's decimals. If you're not planning to add it soon I'll go for a workaround, probably casting to strings. If at all possible I'd like to avoid many backwards incompatible changes to my API.

@jufemaiz
Copy link
Contributor

@St333p in the meantime we have opted to approach this with a units + nanos approach. It's not perfect but it's reasonable.

@fowles fowles closed this Apr 8, 2021
@fowles fowles deleted the proposal_decimal_wkt branch April 8, 2021 14:32
@flozano
Copy link

flozano commented Apr 8, 2021

Is this discarded? There will be no standardized decimal number support?

@fowles
Copy link
Member

fowles commented Apr 8, 2021

I was cleaning up old branches and this one hasn't seen any motion for over a year.

We definitely don't have short term plans to build this and are already stretched a bit thin on other large efforts.

@flozano
Copy link

flozano commented Apr 8, 2021

Thanks for the insight. Handling "money" values is just not possible with existing protobuf well-known types, it's a little sad. #4406 is important and the lack of this causes many people either to (wrongly) use float types for this kind of magnitude, or use custom non-interoperable messages to properly represent decimal values.

@fowles
Copy link
Member

fowles commented Apr 8, 2021

I am sympathetic to where you are coming from here having worked on fintech in the past. The best advice I can give here is to have strong internal style guidelines. You should either create opaque types to handle money and have dedicated message for them with custom adaptors or store everything as an integral number of pips.

The closing of this is not a statement that the suggestion is a bad idea. It is a simple statement of scare resources and trying to set expectations for what we are/are not planning to pursue right now.

@flozano
Copy link

flozano commented Apr 8, 2021

Thank you for your answer. It feels a little underwhelming and I was just expressing some frustration with how the protobuf well-known types are (not) evolving. I understand this is just because of lack of resources. This PR by @jtattermusch seemed a very good direction to me.

The problem here is everyone will be defining their own "opaque", non-interoperable message types. Handling numbers is not easy and the lack of this leaves open the door for bad implementations and/or bad decisions.

This also leaks to the entire ecosystem built around protobuf... for example, automated grpc-gateway translation of those types for JSON exposure won't happen as there's no standard type to support... so there will be no straightforward "money" support for any grpc-defined API exposed as HTTP in a way that is easy to understand by developers consuming that HTTP api.

@Fleshgrinder
Copy link
Contributor

Hmmm… we have Money, what's wrong with it? This PR was about a fixed decimal without currency. As I stated above, a fixed decimal definition ideally is simply compatible with Money.

The easiest workaround for the things you mentioned really is a string. JSON can handle it, every language can handle it.

That said, I'm also sad that this gets closed. 😿

@flozano
Copy link

flozano commented Apr 8, 2021

I didn't know this type existed 🤦 my use-case doesn't involve actual money but other type of fixed-decimal magnitudes, I was arguing for the "money" case because it seems easiest to understand. My bad.

@Fleshgrinder
Copy link
Contributor

I think that there are plenty of use cases for it, and nobody questions that, but especially the one you mentioned wouldn't be covered by this. JS(ON) has no support for 64 bit numbers, so anyone trying to do that there will end up with strings or funny surprises. The safest route really is a string here.

@flozano
Copy link

flozano commented Apr 8, 2021

Sorry for the offtopic. I understand it's JS that doesn't support 64-bit integer, but not JSON itself, both JSON document format and JSON schema are fine with arbitrary precision numbers.

@Fleshgrinder
Copy link
Contributor

Absolutely, but when you mention gateway I think of JS. If JS isn't involved, why the gateway and not gRPC? But yeah, getting too off topic. 😉

@jufemaiz
Copy link
Contributor

Hmmm… we have Money, what's wrong with it? This PR was about a fixed decimal without currency. As I stated above, a fixed decimal definition ideally is simply compatible with Money.

The easiest workaround for the things you mentioned really is a string. JSON can handle it, every language can handle it.

That said, I'm also sad that this gets closed. 😿

Sadly three decimal places is insufficient. Working in electricity sector, with normalised pricing in the 6-7 decimal places space.

Further, floating point issues for values that need decimal support is sorely needed.

:sadpanda: (I missed the closure of this issue).

ericsampson added a commit to ericsampson/kdl that referenced this pull request Sep 9, 2021
I think it would be useful to get these few more common data types into 
For the decimal floating types, I referenced [this protobuf discussion](protocolbuffers/protobuf#7039 (comment)) and [this](https://github.com/googleapis/googleapis/blob/master/google/type/decimal.proto)

I'm not sure what to call out for the currency format, I can't find a standard for that yet. There's a [protobuf money type](https://github.com/googleapis/googleapis/blob/master/google/type/money.proto) that's basically a ISO 4217 currency code plus a decimal number, but I don't really want to invent a suggested money format if there's a real one out there somewhere.
For this PR, can we just leave it TBD in order to reserve the type keyword?
zkat pushed a commit to kdl-org/kdl that referenced this pull request Sep 12, 2021
I think it would be useful to get these few more common data types into 
For the decimal floating types, I referenced [this protobuf discussion](protocolbuffers/protobuf#7039 (comment)) and [this](https://github.com/googleapis/googleapis/blob/master/google/type/decimal.proto)

I'm not sure what to call out for the currency format, I can't find a standard for that yet. There's a [protobuf money type](https://github.com/googleapis/googleapis/blob/master/google/type/money.proto) that's basically a ISO 4217 currency code plus a decimal number, but I don't really want to invent a suggested money format if there's a real one out there somewhere.
For this PR, can we just leave it TBD in order to reserve the type keyword?
@JasonPolisAdmin
Copy link

Need support for Decimal128, so new fixed width wire type.

Workaround is to use with additional validation, but would prefer native scalar Decimal128.

Big Decimal can also be represented with bytes, or else string.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet