regression v1 v2 by id2 id4: results has changed in 0.6.2 #357
Comments
I believe I warned future me about this. https://github.com/ritchie46/polars/blob/65184670c07efc6a5b891bd6cbf24e03a18187ed/polars/polars-core/src/functions.rs#L5 😅 Thanks for reporting, I will look into this.
I am now checking |
I see. It starts to make sense now. The summations don't do checked addition. For now the quickest solution would be changing the type of the data we read. x = pl.read_csv(src_grp, dtype={"id4":pl.Int32, "id5":pl.Int32, "id6":pl.Int32, "v1":pl.Int32, "v2":pl.Int32, "v3":pl.Float64})
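To illustrate what "don't do checked addition" means in practice, here is a minimal pure-Python sketch, not the polars implementation. It assumes a 32-bit signed accumulator purely for illustration; the function names `wrapping_sum_i32` and `checked_sum_i32` are hypothetical.

```python
I32_MIN, I32_MAX = -2**31, 2**31 - 1


def wrapping_sum_i32(values):
    """Sum like an unchecked 32-bit signed accumulator (two's complement wraparound)."""
    total = 0
    for v in values:
        # Keep only the low 32 bits after every addition.
        total = (total + v) & 0xFFFFFFFF
    # Reinterpret the 32-bit pattern as a signed integer.
    return total - 2**32 if total >= 2**31 else total


def checked_sum_i32(values):
    """Sum with an explicit range check, raising instead of silently wrapping."""
    total = 0
    for v in values:
        total += v
        if not I32_MIN <= total <= I32_MAX:
            raise OverflowError(f"sum left the int32 range: {total}")
    return total


values = [2_000_000_000, 2_000_000_000]
print(wrapping_sum_i32(values))  # silently wrong: -294967296
try:
    checked_sum_i32(values)
except OverflowError as exc:
    print("checked sum refused:", exc)
```

The unchecked version returns a plausible-looking number, which is why such a problem tends to surface only as a mismatched checksum rather than as an error.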
Then there are still some |
q10 is OOM-killed; I don't think it has anything to do with NAs.
@jangorecki, do you know if this is still relevant now that the overflow has been fixed?
Overflow was fixed. Mismatched checksums are still an issue.
Updated comparison of all checks using polars 0.7.9
I just spotted that casts to Int64 have been applied too aggressively in h2oai/db-benchmark@c7ad6b0#diff-b9f18f8b66c6e7e35cdae2fc80bb752351481552f8f4004e1e311fd92a77fb0d
So the differences in the checksum seem to be mostly by I think I've fixed the regression behavior, but it is not yet released. I plan to release this afternoon, together with a fix for that |
I amended types for |
@jangorecki it is on pypi. |
0.7.11
I looked through those checksums briefly. Most of the issues got fixed, but there still seem to be issues for data with NAs:
Great, we're tuning in. I will investigate the last ones. Thanks for the feedback.
@jangorecki Ok, I think all checksums are ok now:
I also made a new patch release, because this is important to do correctly. |
AFAIU this patch release was 0.7.11, right? If so, then it doesn't seem to be fixed.
@jangorecki No, the patch release was 0.7.12. And of course the latest version is also patched.
@jangorecki I found another null handling issue which only came up on |
@jangorecki https://pypi.org/project/polars/0.7.16/ is the patch. |
I can confirm it is fixed using 0.7.16. If possible, please ensure you have unit tests for those so we can avoid debugging the same issues in the future.
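A regression test for null handling in a grouped sum could look roughly like the sketch below. It is pure Python and does not use the polars API: `group_sum` is a hypothetical stand-in for the query under test, and the convention that nulls are skipped (so an all-null group sums to 0) is an assumption made for the example.

```python
from collections import defaultdict


def group_sum(keys, values):
    """Hypothetical stand-in for the grouped sum under test; None marks a missing value."""
    totals = defaultdict(int)
    for key, value in zip(keys, values):
        # Assumption: missing values are skipped rather than poisoning the group total.
        totals[key] += 0 if value is None else value
    return dict(totals)


def test_group_sum_skips_nulls():
    keys = ["a", "b", "a", "b", "c"]
    values = [1, None, 2, 5, None]
    assert group_sum(keys, values) == {"a": 3, "b": 5, "c": 0}


def test_group_sum_without_nulls_is_unchanged():
    assert group_sum(["a", "a", "b"], [1, 2, 3]) == {"a": 3, "b": 3}


test_group_sum_skips_nulls()
test_group_sum_without_nulls_is_unchanged()
```

Pinning the expected totals for a small input that contains nulls means a change in null semantics fails a test right away instead of showing up later as a checksum difference.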
Yes! |
Hi Ritchie,
I noticed that the results of one of the queries, for one data case only, have changed in a recent version.
This might be due to a bug fix, meaning that previously computed results were considered incorrect.
Ultimately, compared to other solutions, it still looks incorrect.
Could you please have a look at this question just to ensure it should produce matching results to other tools?
Note that I am looking at the chk value, which is ans["r2"].sum(), so it is not really a result but a "signature" of a result. This has been observed only on the G1_1e8_1e2_5_0 data case.
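As a rough illustration of comparing such a signature between two tools, here is a minimal pure-Python sketch. The names `chk` and `same_signature`, the choice to skip missing values, and the relative tolerance are all assumptions for the example, not the benchmark's actual code.

```python
import math


def chk(column):
    """Checksum of a result column: the sum of its non-missing values."""
    # Assumption: None marks a missing value and is skipped.
    return math.fsum(v for v in column if v is not None)


def same_signature(col_a, col_b, rel_tol=1e-9):
    """Compare two checksums with a tolerance to absorb floating-point noise."""
    return math.isclose(chk(col_a), chk(col_b), rel_tol=rel_tol)


tool_a = [0.25, 0.5, 0.75]
tool_b = [0.75, 0.5, 0.25]    # same values, different row order
tool_c = [0.25, 0.5, None]    # one value lost to null handling

print(same_signature(tool_a, tool_b))  # True
print(same_signature(tool_a, tool_c))  # False
```

A matching checksum does not prove that two results are identical (different values can sum to the same total), but a mismatch is a cheap and reliable signal that something differs.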