pl.UInt32 subtraction leading to weird results #15415

Open
2 tasks done
miroslaavi opened this issue Apr 1, 2024 · 3 comments
Labels
invalid (a bug report that is not actually a bug), python (related to Python Polars)

Comments

@miroslaavi

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

pl.DataFrame(
    {
        "A": [21506, 1719, 1983, 5736],
        "B": [21546, 1711, 1918, 5861],
    },
    schema={"A": pl.UInt32, "B": pl.UInt32}
).with_columns(
    (pl.col("A") - pl.col("B")).alias("diff")
)

Out:

┌───────┬───────┬────────────┐
│ A     ┆ B     ┆ diff       │
│ ---   ┆ ---   ┆ ---        │
│ u32   ┆ u32   ┆ u32        │
╞═══════╪═══════╪════════════╡
│ 21506 ┆ 21546 ┆ 4294967256 │
│ 1719  ┆ 1711  ┆ 8          │
│ 1983  ┆ 1918  ┆ 65         │
│ 5736  ┆ 5861  ┆ 4294967171 │
└───────┴───────┴────────────┘

Log output

No response

Issue description

I have a LazyFrame workflow that casts some values to pl.UInt32, and I noticed that subtraction leads to strange results, as shown in the example above.

Expected behavior

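Subtraction should work normally and return the signed difference (for example, 21506 - 21546 = -40) instead of a very large positive number.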

Installed versions

shape: (6, 12)
Category | revenue | units | price | Category_LY | revenue_LY | units_LY | price_LY | volume impact | mix impact | price impact | total impact
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
str | f64 | u32 | f64 | str | f64 | u32 | f64 | f64 | f64 | f64 | f64
"Bracelets" | 332726.735364 | 5736 | 58.006753 | "Bracelets" | 344314.626605 | 5861 | 58.746737 | 1.7974e11 | 7.2579e10 | -4244.549093 | 2.5232e11
"Rings" | 238523.950732 | 4255 | 56.057333 | "Rings" | 235330.836392 | 4190 | 56.164877 | 2720.1274 | 930.589632 | -457.602693 | 3193.114339
"Other" | 24464.686408 | 2186 | 11.191531 | "Other" | 24799.855995 | 2157 | 11.497383 | 1213.595302 | -880.171183 | -668.593705 | -335.169587
"Charms" | 737751.979534 | 21506 | 34.304472 | "Charms" | 740521.83544 | 21546 | 34.369342 | 1.7974e11 | -3.2121e10 | -1395.08224 | 1.4762e11
"Necklaces & Pendants" | 119051.138037 | 1719 | 69.256043 | "Necklaces & Pendants" | 117704.075248 | 1711 | 68.792563 | 334.784911 | 215.555593 | 796.722285 | 1347.062789
"Earrings" | 104449.774471 | 1983 | 52.672604 | "Earrings" | 101736.810394 | 1918 | 53.043175 | 2720.1274 | 727.679 | -734.842324 | 2712.964077

--------Version info---------
Polars:               0.20.17
Index type:           UInt32
Platform:             Windows-10-10.0.22621-SP0
Python:               3.11.4 (tags/v3.11.4:d2340ef, Jun  7 2023, 05:45:37) [MSC v.1934 64 bit (AMD64)]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          3.0.0
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            0.8.0
fsspec:               2024.2.0
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           3.8.2
nest_asyncio:         1.6.0
numpy:                1.26.3
openpyxl:             3.1.2
pandas:               2.2.0
pyarrow:              15.0.0
pydantic:             <not installed>
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           <not installed>
xlsx2csv:             0.8.2
xlsxwriter:           3.2.0

miroslaavi added the bug, needs triage, and python labels on Apr 1, 2024
miroslaavi reopened this on Apr 1, 2024

@ritchie46
Member

That's underflow. We don't check for over/underflow for performance reasons. This is similar to what numpy does:

import numpy as np

np.array([1, 2, 3], dtype=np.uint32) - 10
# array([4294967287, 4294967288, 4294967289], dtype=uint32)
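
To see where values such as 4294967256 come from, the wraparound can be reproduced with plain Python integers. This is only a sketch of the arithmetic, assuming unsigned 32-bit subtraction wraps modulo 2**32; the helper names `wrapping_sub_u32` and `as_signed_i32` are hypothetical and not part of Polars or NumPy.

```python
# Unsigned 32-bit values live in the range [0, 2**32 - 1].
MODULUS = 2**32


def wrapping_sub_u32(a: int, b: int) -> int:
    """Subtract b from a the way a wrapping uint32 subtraction would."""
    return (a - b) % MODULUS


def as_signed_i32(x: int) -> int:
    """Reinterpret a wrapped uint32 result as a signed 32-bit value."""
    return x - MODULUS if x >= 2**31 else x


# The (A, B) pairs from the reproducible example in this issue.
pairs = [(21506, 21546), (1719, 1711), (1983, 1918), (5736, 5861)]

wrapped = [wrapping_sub_u32(a, b) for a, b in pairs]
print(wrapped)  # [4294967256, 8, 65, 4294967171]

recovered = [as_signed_i32(x) for x in wrapped]
print(recovered)  # [-40, 8, 65, -125]
```

So 4294967256 is 2**32 - 40: the true difference of -40 reinterpreted as an unsigned value.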

So cast to signed integers before doing arithmetic such as subtraction.

ritchie46 added the invalid label and removed the bug and needs triage labels on Apr 1, 2024

@miroslaavi
Author

Got it, thank you. Just some general feedback from a non-technical user: this can be confusing at times, especially if you are not familiar with the differences between pl.Int32 and pl.UInt32.

For instance, in my case group_by("category").agg(pl.col("Item ID").count()) returns the type pl.UInt32. Would it be better to default to pl.Int32 to avoid such cases?

@reswqa
Collaborator

reswqa commented Apr 1, 2024

> For instance, in my case group_by("category").agg(pl.col("Item ID").count()) returns the type pl.UInt32. Would it be better to default to pl.Int32 to avoid such cases?

A count returning a signed type would be even more mind-boggling to me, and it would shrink the range of values the result can represent.
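
The range argument can be made concrete with the standard bounds of 32-bit integers. This is plain arithmetic rather than anything Polars-specific, and the constant names are chosen here for illustration.

```python
# Largest value each 32-bit integer type can hold.
UINT32_MAX = 2**32 - 1  # unsigned: all 32 bits store magnitude
INT32_MAX = 2**31 - 1   # signed: one bit is used for the sign

print(UINT32_MAX)  # 4294967295
print(INT32_MAX)   # 2147483647

# A signed 32-bit count would overflow at roughly half the row count
# that an unsigned 32-bit count can handle.
print(UINT32_MAX - INT32_MAX)  # 2147483648
```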
