New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed decimal serialization and deserialization #361
Conversation
Codecov Report
@@ Coverage Diff @@
## master #361 +/- ##
==========================================
- Coverage 97.98% 97.05% -0.94%
==========================================
Files 26 26
Lines 1740 1732 -8
==========================================
- Hits 1705 1681 -24
- Misses 35 51 +16
Continue to review full report at Codecov.
|
tests/test_logical_types.py
Outdated
(Decimal("-123.456"), b'\x06\xfe\x1d\xc0'), | ||
(Decimal("3245.234"), b'\x061\x84\xb2'), | ||
(Decimal("-3245.234"), b'\x06\xce{N'), | ||
(Decimal("9999999999999999.456"), b'\x12\x00\x8a\xc7#\x04\x89\xe7\xfd\xe0'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests are complaining because this line is too long. Can you add a # noqa
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reformated the lines. It's not complaining now ;)
Thanks! It's definitely much simpler, but I'm not sure if we can switch to this as it seems like the |
98e67a4
to
5af3c62
Compare
Your welcome :) I implemented I can add some benchmarks for the original version and the new.
I suggest adding the tests to test suite but not run them in CI. What do you think? It should be different PR because we want to run it to the original (not fixed) code. |
16b5342
to
5570470
Compare
It looks like the tests are failing on python 2 so maybe something needs to be tweaked there. As for benchmarks, we currently don't run any benchmarks during CI (though that probably would be a good thing to do but it just hasn't happened). If you have a simple script you've been using to do the benchmarks and can just paste that here I would love to run it locally as well. |
Hi, I hope it is finally fixed :) The benchmarks I used (with pytest-benchmark plugin): @pytest.fixture(name='schema_bytes_decimal')
def schema_bytes_decimal_fixture(scale):
return {
"name": "n",
"namespace": "namespace",
"type": "bytes",
"logicalType": "decimal",
"precision": 20,
"scale": scale,
}
@pytest.mark.benchmark(
group = 'serialize',
disable_gc = True,
warmup = True,
)
@pytest.mark.parametrize('scale', [3, 20])
@pytest.mark.parametrize(
'input_data',
[
Decimal("0.456"),
Decimal("-0.456"),
Decimal("9999999999999999.456"),
Decimal("-999999999999999.456"),
],
ids=['0.456', '-0.456', '9999999999999999.456', '-9999999999999999.456']
)
def test_bytes_decimal_serialize_benchmark(schema_bytes_decimal, input_data, benchmark):
benchmark(serialize, schema_bytes_decimal, input_data)
@pytest.mark.benchmark(
group = 'deserialize',
disable_gc = True,
warmup = True,
)
@pytest.mark.parametrize('scale', [3, 20])
@pytest.mark.parametrize(
'input_data',
[
b'\x04\x01\xc8',
b'\x04\xfe8',
b'\x12\x00\x8a\xc7#\x04\x89\xe7\xfd\xe0',
b'\x10\xf2\x1fILX\x9c\x02 ',
],
ids=['0.456', '-0.456', '9999999999999999.456', '-9999999999999999.456']
)
def test_bytes_decimal_deserialize_benchmark(schema_bytes_decimal, input_data, benchmark):
benchmark(deserialize, schema_bytes_decimal, input_data) |
I run the benchmarks on the original and new version of code in Python 3.7 and 2.7: Python 3.7:
New version (fix/decimal):
So in Python 3.7 deserialization is almost the same fast in both versions. Serialization is faster in the new version. It is more significant with bigger scale and more digits in a number. It is mainly because of elimination two for loops which are related to scale and digits. Python 2.7:
New version (fix/decimal):
In Python 2.7 is the situation similar. Deserialization is almost the same fast (btw it is faster than in Python 3.7). Performance gain during serialization is significant only with bigger scale or/and more digits. There is almost no performance loss :) |
@scottbelden Hi, is there plan to publish the new version? We would like to use it :) |
Should be up now as |
Hi,
this is a fix of decimal serialization and deserialization. It is also about 2x faster than the original implementation.
It is related to #360.
I think the serialization of fixed bytes in
prepare_fixed_decimal
can be also simplified. It will follow in another PR.Best regards