Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(common): implement logical type for int256 #9028

Merged
merged 32 commits into from
Apr 12, 2023
Merged

feat(common): implement logical type for int256 #9028

merged 32 commits into from
Apr 12, 2023

Conversation

shanicky
Copy link
Contributor

@shanicky shanicky commented Apr 6, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

  • introduce rw_int256 (DataType::Int256)
  • support casting int256 from and into string
  • support implicit casting from smallint / int / bigint to rw_int256
  • support comparison
  • support binary + / - / * / / / % and unary -, abs
  • fix memcomparable encoding

A new type should also be working automatically for is null, case when, coalesce, nullif, in.

Limitation:

  • HashKeySerDe cannot be implemented, so HashKeySize::Variable is used.

Not implemented until necessary:

  • unary + (useless)
  • bitwise

Checklist For Contributors

- [ ] I have written necessary rustdoc comments
- [ ] I have added necessary unit tests and integration tests
- [ ] I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features #7934).
- [ ] I have demonstrated that backward compatibility is not broken by breaking changes and created issues to track deprecated features to be removed in the future. (Please refer to the issue)
- [] All checks passed in ./risedev check (or alias, ./risedev c)

Checklist For Reviewers

  • I have requested macro/micro-benchmarks as this PR can affect performance substantially, and the results are shown.

Documentation

  • My PR DOES NOT contain user-facing changes.
Click here for Documentation

Types of user-facing changes

Please keep the types that apply to your changes, and remove the others.

  • Installation and deployment
  • Connector (sources & sinks)
  • SQL commands, functions, and operators
  • RisingWave cluster configuration changes
  • Other (please specify in the release note below)

Release note

@shanicky shanicky marked this pull request as draft April 6, 2023 09:07
Base automatically changed from peng/num256-1 to main April 6, 2023 10:31
@shanicky shanicky force-pushed the peng/num256-2 branch 2 times, most recently from de8bfa6 to 7102e8f Compare April 6, 2023 14:02
@shanicky
Copy link
Contributor Author

shanicky commented Apr 7, 2023

It seems like int256 should be sufficient enough and uint256 may not be necessary? cc @lmatz @neverchanje

@neverchanje
Copy link
Contributor

neverchanje commented Apr 7, 2023

AFAIK int256 (equivalent to uint128 when all integers are positive) is enough. uint256 can be reconsidered when necessary. I think we had a discussion here: #7410. @lmatz Please confirm if the uint256 requirement is still valid.

@lmatz
Copy link
Contributor

lmatz commented Apr 7, 2023

By skimming through some smart contract implementations, uint256 seems to be the default choice, I suppose it is because:

  1. the balance most of the time cannot be negative.
  2. even if we want to represent a negative thing (e.g. transaction, one address owed/borrowed some crypto), we can still use uint256 to represent it and give it a special meaning (in this case owe/borrow)
  3. uint256 has a larger range.

Besides, looking into the two most popular implementations of ETH nodes, i.e. https://github.com/ethereum/go-ethereum and https://github.com/ledgerwatch/erigon, both of them seem to use https://pkg.go.dev/github.com/holiman/uint256#section-readme library and this library only provides uint256 support instead of int256.

But the current user who requested uint256/int256 seems to have no strong opinion on this.
And I suppose any serious and reasonable smart contract implementation does not want to have such a large number that can only be represented by uint256 but not int256, which is error-prone. Does anyone use uint256::MAX to represent any special thing? I guess even if there is, it will stay as some internal implementation details.

Therefore, I have no strong opinion on this. We can support int256 first, and support the other if the user specifically requests and it cannot be solved by int256.

@shanicky shanicky requested a review from lmatz April 7, 2023 08:33
@shanicky shanicky self-assigned this Apr 7, 2023
@shanicky shanicky marked this pull request as ready for review April 7, 2023 08:33
src/connector/src/parser/common.rs Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Apr 7, 2023

Codecov Report

Merging #9028 (13ed03a) into main (a002e05) will decrease coverage by 0.03%.
The diff coverage is 40.43%.

@@            Coverage Diff             @@
##             main    #9028      +/-   ##
==========================================
- Coverage   70.87%   70.85%   -0.03%     
==========================================
  Files        1197     1197              
  Lines      199031   199250     +219     
==========================================
+ Hits       141062   141173     +111     
- Misses      57969    58077     +108     
Flag Coverage Δ
rust 70.85% <40.43%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/common/src/array/mod.rs 72.77% <ø> (ø)
src/common/src/hash/dispatcher.rs 83.17% <0.00%> (-0.79%) ⬇️
src/common/src/test_utils/rand_chunk.rs 79.16% <0.00%> (-3.45%) ⬇️
src/common/src/types/postgres_type.rs 14.28% <0.00%> (-0.26%) ⬇️
src/common/src/types/to_text.rs 70.83% <ø> (ø)
src/common/src/util/memcmp_encoding.rs 93.52% <0.00%> (-0.23%) ⬇️
src/common/src/util/value_encoding/mod.rs 91.23% <0.00%> (-1.53%) ⬇️
src/connector/src/parser/common.rs 70.21% <0.00%> (-0.76%) ⬇️
src/connector/src/sink/kafka.rs 37.97% <0.00%> (-0.08%) ⬇️
src/frontend/src/expr/literal.rs 91.58% <ø> (ø)
... and 9 more

... and 4 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@xiangjinwu
Copy link
Contributor

It seems this PR now does at least 2 things:

  • remove uint256 scalar and array
  • add int256 data type

Can we spilt it into 2 parts focusing on each item?

@shanicky
Copy link
Contributor Author

It seems this PR now does at least 2 things:

  • remove uint256 scalar and array

  • add int256 data type

Can we spilt it into 2 parts focusing on each item?

Sure, i'll do it

@shanicky shanicky force-pushed the peng/num256-2 branch 3 times, most recently from 82e2714 to bbfe860 Compare April 10, 2023 08:29
Copy link
Contributor

@xiangjinwu xiangjinwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still reading some parts ... but posting existing comments first

src/common/src/hash/dispatcher.rs Outdated Show resolved Hide resolved
src/common/src/types/mod.rs Outdated Show resolved Hide resolved
src/common/src/types/mod.rs Outdated Show resolved Hide resolved
src/common/src/hash/dispatcher.rs Outdated Show resolved Hide resolved
src/connector/src/parser/common.rs Outdated Show resolved Hide resolved
src/connector/src/sink/kafka.rs Outdated Show resolved Hide resolved
src/frontend/src/binder/expr/mod.rs Outdated Show resolved Hide resolved
src/tests/sqlsmith/src/sql_gen/types.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@xiangjinwu xiangjinwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly done, except for type conversions and arithmetic...

src/common/src/types/postgres_type.rs Show resolved Hide resolved
src/expr/macro/src/types.rs Outdated Show resolved Hide resolved
src/common/src/types/mod.rs Outdated Show resolved Hide resolved
src/expr/src/vector_op/cast.rs Show resolved Hide resolved
src/frontend/src/expr/literal.rs Outdated Show resolved Hide resolved
src/common/src/types/num256.rs Outdated Show resolved Hide resolved
src/common/src/types/num256.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@xiangjinwu xiangjinwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The auto-generated PR description is meaningless and even misleading. To summarize:

  • introduce rw_int256 (DataType::Int256)
  • support casting int256 from and into string
  • support implicit casting from smallint / int / bigint to rw_int256
  • support comparison
  • support binary + / - / * / / / % and unary -
  • fix memcomparable encoding

Limitation:

  • HashKeySerDe cannot be implemented, so HashKeySize::Variable is used.

In addition, let's add e2e tests to demonstrate the basic usage (interoperate with int, no interoperate with decimal / float, quoted for numbers larger than bigint), as well as the panic / wrong queries we fixed. A new type should also be working automatically for is null, case when, coalesce, nullif, in.

FYI there also also other polymorphic arithmetic operations for integral types:

  • unary + handled in frontend binder
  • abs
  • bitwise (maybe until necessary)

src/common/src/types/num256.rs Outdated Show resolved Hide resolved
src/common/src/types/num256.rs Outdated Show resolved Hide resolved
src/common/src/types/num256.rs Outdated Show resolved Hide resolved
src/common/src/util/memcmp_encoding.rs Outdated Show resolved Hide resolved
@shanicky shanicky added the user-facing-changes Contains changes that are visible to users label Apr 11, 2023
@github-actions github-actions bot removed the user-facing-changes Contains changes that are visible to users label Apr 11, 2023
Comment on lines +321 to +324
impl Signed for Int256 {
fn abs(&self) -> Self {
self.0.abs().into()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately num-traits does not have CheckedAbs. We can expose checked_abs from ethnum directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we can use the same neg method that's used for general_abs to implement the abs method directly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes. I just want to avoid introducing too many useless methods that panics when misused, for example abs could panic for MIN. For the same reason I do not like num-traits requires CheckedAdd: Add.

But to reuse general_abs, let's keep this impl Signed

@@ -165,6 +166,7 @@ fn calculate_encoded_size_inner(
DataType::Jsonb => deserializer.skip_bytes()?,
DataType::Varchar => deserializer.skip_bytes()?,
DataType::Bytea => deserializer.skip_bytes()?,
DataType::Int256 => Int256::MEMCMP_ENCODED_SIZE,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still incorrect. Although deserializer.skip_bytes() returns Int256::MEMCMP_ENCODED_SIZE here, it also has a side effect of advancing the offset. For fixed size types, the offset advancing is done on line 175 below.

Since we encoded int256 as bytes, it has extra marks and advancing 32 bytes is not enough.. But I am not familiar with how this code is used and how to construct a query to trigger this error or panic.

Now talking about the fix:

  • Easiest is just to use skip_bytes, because the encoding is bytes.
  • But actually we could avoid the extra marks in bytes encoding, and use exactly 32 bytes. There is i256::into_words. It returns (i128, i128) but logically it should be (i128, u128). The memcomparable ser/de is not implemented for i128/u128 yet, but we can play the same trick again to split i128 to (i64, u64) and u128 to (u64, u64). Let me enhance memcomparable for i128/u128 at the same time, which I should have done weeks earlier ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure about code here, so let's change it to skip_bytes for now.

Copy link
Contributor

@xiangjinwu xiangjinwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Add a distinct / group-by test to prevent someone "fix" HashKeySize accidentally.

select distinct v::int256 from generate_series(1, 1, 1) as t(v);

@shanicky shanicky enabled auto-merge April 12, 2023 04:25
@shanicky shanicky added this pull request to the merge queue Apr 12, 2023
@xiangjinwu
Copy link
Contributor

  • My PR DOES NOT contain user-facing changes.

@lmatz What is the recommended way of documenting such experimental features?

@BugenZhao BugenZhao added the user-facing-changes Contains changes that are visible to users label Apr 12, 2023
Merged via the queue into main with commit 1cee797 Apr 12, 2023
10 of 11 checks passed
@shanicky shanicky deleted the peng/num256-2 branch April 12, 2023 05:06
@CharlieSYH CharlieSYH added 📖✓ Covered or will be covered in the user docs. 📖✗ No user documentation is needed. and removed 📖✓ Covered or will be covered in the user docs. labels Apr 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
experimental type/feature user-facing-changes Contains changes that are visible to users 📖✗ No user documentation is needed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants