Numeric type and decimal handling: harmonization, casts, config defaults #845
Replies: 2 comments
-
Full design rationale (the why behind each decision)The opening post is the map. This comment is the reasoning behind the calls, for Why two axes"What type is a numeric result" and "how does a decimal render" are independent, Why
|
| Driver | actor_id=1 | actor_id=58 | Semantics |
|---|---|---|---|
| Postgres | 0 | 7 | integer, truncating |
| SQLite | 0 | 7 | integer, truncating |
| SQL Server | 0 | 7 | integer, truncating |
| rqlite | 0 | 7 | integer, truncating |
| MySQL | 0.1250 | 7.2500 | decimal |
| ClickHouse | 0.125 | 7.25 | float, non-truncating |
| DuckDB | 0.125 | 7.25 | float, non-truncating |
| Oracle | 0.125 | 7.25 | NUMBER, non-truncating (but sq currently errors at scan, #844) |
Beta Was this translation helpful? Give feedback.
-
|
Filed #853 to track one concrete outlier from this discussion: The issue includes an implementation sketch: the fix is the kind-pin path already used by Closing it would make the UPDATE: #853 has been merged. |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
What this is
sq has started harmonizing how numeric results behave across SQL drivers, so the
same SLQ means the same thing on every source. Two pieces have already landed
(unreleased), and several follow-ups are in design. This thread lays out the
whole shape and asks for feedback on the open decisions before they are locked,
since some of them change output (breaking). Depth lives in the linked issues;
this is the map and the questions.
Already shipped (unreleased, breaking)
avg()tofloaton every driver (avg() returns inconsistent types across drivers (no portable result type) #594). Previously a float on somedrivers, a lossless decimal on Postgres/MySQL, and a truncated int on SQL Server
(an actual bug). Now uniformly float64, matching jq's numeric model. Lossy past
~15-16 significant digits.
sum()todecimalon every driver (sum() returns inconsistent types across drivers (int64 / decimal / float); float is the wrong target #839), for integer and decimalcolumns. A sum of exact values is itself exact, so float would regress
precision. In JSON a decimal renders as a quoted string, so
sum(.actor_id)isnow
"20100"rather than20100.So
avgandsumdeliberately default to different types: avg to float (alwaysfractional/approximate), sum to decimal (exact). That asymmetry is load-bearing
below.
The two axes
Numeric handling splits into two independent questions that earlier got
conflated:
float64vsdecimal.Decimal. Affects precision andthe Go type that flows through.
(
100.5) vs a quoted string ("100.5").Proposed: one config option per axis, plus a per-query lever.
result.numeric.type=auto | float | decimal(defaultauto): picks thetype wherever sq has a float-vs-decimal choice (avg, sum, division, future
stddev/sqrt).
autokeeps the per-construct defaults;float/decimalforceone everywhere. (Config option for numeric result type (result.numeric.type), global or per-source #837)
format.decimal=string | number: picks how a decimal renders in JSON/YAML.(No ticket yet.)
decimal(x),int(x),float(x)in SLQ, to optinto a type per query without leaving SLQ:
avg(decimal(.price))averages indecimal (lossless). (SLQ: add portable cast functions (decimal(), int(), float(), ...) #836)
A correctness bug in the same family
/returns different values per driver (SLQ division operator: integer vs decimal division differs across drivers (non-portable) #838)..actor_id / 8for
actor_id = 58is7on Postgres/SQLite/SQL Server/rqlite (integertruncation) and
7.25on MySQL/ClickHouse/DuckDB/Oracle (verified live). SameSLQ, different answer. The fix normalizes the truncating drivers.
(oracle: non-integer NUMBER result from integer operands misclassified as int (scan crash) #844).
Open decisions (input wanted here)
format.decimaldefault:stringornumber?stringis precision-safe(preserves sum() returns inconsistent types across drivers (int64 / decimal / float); float is the wrong target #839, safe beyond float64 range) but jq users dislike quoted
numbers.
numberis jq-ergonomic but lossy on read for large values andreverts sum() returns inconsistent types across drivers (int64 / decimal / float); float is the wrong target #839's
"20100". Which should be the default?/always benon-truncating (so
58/8is7.25everywhere)? And should the normalizedresult be float (jq-style) or decimal (lossless)? Current leaning: float.
result.numeric.typeis the current recommendation (theresult.*namespace signals "changes the type," vsformat.*which isdisplay-only). Reactions? Terser alternative:
result.number.decimal(x)/int(x)/float(x)(jq-idiomatic, and they compose for operand-vs-result casting). The SQL-style
cast(x, type)and a:typesuffix were rejected (the suffix collides withSLQ's alias syntax). Agree?
sum() returns inconsistent types across drivers (int64 / decimal / float); float is the wrong target #839) plus the Oracle crash fix (oracle: non-integer NUMBER result from integer operands misclassified as int (scan crash) #844) next; do division (SLQ division operator: integer vs decimal division differs across drivers (non-portable) #838) either with it
(one breaking wave) or as a fast follow; then a "controls" release for
format.decimal, casts (SLQ: add portable cast functions (decimal(), int(), float(), ...) #836), and config (Config option for numeric result type (result.numeric.type), global or per-source #837). Should division ride withthe aggregates or wait?
Heads up: these change output
#594 and #839 already change numeric output and are breaking;
format.decimaland the division normalization would too. If you consume sq's numeric output
(especially JSON), this is the moment to flag concerns.
Links
Beta Was this translation helpful? Give feedback.
All reactions