Decimals: string to decimal fails if more decimal places than "scale" #13987

Open
2 tasks done
Julian-J-S opened this issue Jan 25, 2024 · 5 comments
Labels
A-dtype-decimal Area: decimal data type bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars

Comments

@Julian-J-S
Contributor

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

pl.DataFrame({
    'x': ['1', '1.23', '1.2345'],
}).with_columns(
    d=pl.col("x").cast(pl.Decimal(scale=2)),
)

Log output

ComputeError: conversion from `str` to `decimal[*,2]` failed in column 'x' for 1 out of 3 values: ["1.2345"]

Issue description

Converting from str to decimal fails if the str values contain more decimal places than the specified "scale" of the Decimal.
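
To make the failing condition concrete, here is a small sketch that counts the fractional digits of each input string with Python's standard `decimal` module and flags the values that exceed a given scale. The helper name `fractional_digits` is made up for illustration, and this is not how Polars performs the check internally.

```python
from decimal import Decimal


def fractional_digits(text: str) -> int:
    """Return the number of digits after the decimal point in `text`."""
    exponent = Decimal(text).as_tuple().exponent
    # For finite values the exponent is an int; a negative exponent
    # means that many fractional digits.
    return max(0, -exponent)


values = ["1", "1.23", "1.2345"]
scale = 2

# Values that carry more fractional digits than the target scale.
too_precise = [v for v in values if fractional_digits(v) > scale]
print(too_precise)
```

With the three inputs from the example, only `"1.2345"` is flagged, which is the same value named in the error message.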

Expected behavior

This works in other libraries (e.g. pyspark) and should also work in polars. Polars already supports casting the f64 value 1.2345 to the Decimal 1.23.

pyspark:
image


Note: I am opening a lot of Decimal issues (Decimals are required for many financial calculations, where I currently use pyspark). I heard a bigger Decimal update is coming soon? Not sure about the roadmap, but happy to help.

Installed versions

0.20.5
@Julian-J-S Julian-J-S added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Jan 25, 2024
@Julian-J-S Julian-J-S changed the title Decimals: string to decimal fails if too many decimal places Decimals: string to decimal fails if more decimal places than "scale" Jan 25, 2024
@alexander-beedie alexander-beedie added the A-dtype-decimal Area: decimal data type label Jan 26, 2024
@petrosbar
Contributor

Not all libraries handle this the same way. For example, in pyarrow this would raise:

import pyarrow as pa
import pyarrow.compute as pc

pc.cast(pa.array(["1.2345"]), pa.decimal128(5, 2))
# ...
# ArrowInvalid: Rescaling Decimal128 value would cause data loss

In any case, it seems that this has been fixed since polars==0.20.6.

import polars as pl

pl.__version__
# '0.20.6'

pl.DataFrame({
    'x': ['1', '1.23', '1.2345'],
}).with_columns(
    d=pl.col("x").cast(pl.Decimal(scale=2)),
)

Output:

shape: (3, 2)
┌────────┬──────────────┐
│ x      ┆ d            │
│ ---    ┆ ---          │
│ str    ┆ decimal[*,2] │
╞════════╪══════════════╡
│ 1      ┆ 1            │
│ 1.23   ┆ 1.23         │
│ 1.2345 ┆ 1.23         │
└────────┴──────────────┘
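
The output above shows `1.2345` becoming `1.23`, which is consistent with both truncation and rounding, so it does not tell the two apart. As a rough illustration using Python's standard `decimal` module (not Polars itself), the sketch below shows where the two policies diverge. The probe value `1.2367` is chosen here for illustration and does not come from the issue.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN

two_places = Decimal("0.01")  # quantum corresponding to scale=2

results = {}
for text in ["1.2345", "1.2367"]:
    value = Decimal(text)
    truncated = value.quantize(two_places, rounding=ROUND_DOWN)
    rounded = value.quantize(two_places, rounding=ROUND_HALF_EVEN)
    results[text] = (truncated, rounded)
    print(text, truncated, rounded)
```

Both policies agree on `1.2345`, but only rounding turns `1.2367` into `1.24`. Passing a similar probe value through the Polars cast would show which policy a given version applies.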

@Julian-J-S
Contributor Author

@petrosbar thanks for the info! You are right, it works now.

Maybe we should add a dedicated to_decimal with an allow_data_loss / allow_decimal_truncation parameter that lets users explicitly opt in to or out of this behavior 🤔
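
A minimal sketch of what such an opt-in could look like, written in plain Python with the standard `decimal` module. The function name `to_decimal` and the `allow_data_loss` flag simply mirror the suggestion above; they are hypothetical and not an existing Polars API. Truncation (`ROUND_DOWN`) is an arbitrary choice for the lossy path.

```python
from decimal import Decimal, ROUND_DOWN


def to_decimal(text: str, scale: int, allow_data_loss: bool = False) -> Decimal:
    """Parse `text` into a Decimal with exactly `scale` fractional digits.

    Hypothetical helper: raises unless the caller opts in to losing digits.
    """
    value = Decimal(text)
    quantum = Decimal(1).scaleb(-scale)  # scale=2 -> Decimal("0.01")
    reduced = value.quantize(quantum, rounding=ROUND_DOWN)
    # If dropping digits changed the value, data was lost.
    if reduced != value and not allow_data_loss:
        raise ValueError(f"{text!r} has more than {scale} decimal places")
    return reduced


print(to_decimal("1.23", 2))
print(to_decimal("1.2345", 2, allow_data_loss=True))
```

By default the helper behaves like the strict pyarrow cast (it raises on `"1.2345"`), and the flag switches it to the lenient behavior.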

@matquant14

Any update on this? I'm trying to cast a string value to a decimal, but it returns null values.

image

I'm on polars 1.81
image

@cmdlineluser
Contributor

@matquant14 It's a bit easier to help if you post an example as runnable code.

It seems commas need to be stripped from strings in order for .cast / .str.to_decimal to work properly.

>>> pl.select(pl.lit("1,000.12").str.to_decimal())
shape: (1, 1)
┌──────────────┐
│ literal      │
│ ---          │
│ decimal[*,2] │
╞══════════════╡
│ null         │
└──────────────┘
>>> pl.select(pl.lit("1,000.12").str.replace_all(r",", "").str.to_decimal())
shape: (1, 1)
┌──────────────┐
│ literal      │
│ ---          │
│ decimal[*,2] │
╞══════════════╡
│ 1000.12      │
└──────────────┘

(It may be useful for the docs to explain this?)
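
A minimal sketch of the same idea in plain Python, for checking inputs before they reach Polars: strip thousands separators, then parse, returning `None` where parsing still fails (analogous to the `null` above). It uses the standard `decimal` module as a stand-in parser, so the set of strings it accepts may differ from what `.str.to_decimal` accepts. The helper name `parse_amount` is made up for this example.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_amount(text: str) -> Optional[Decimal]:
    """Parse a numeric string, ignoring comma thousands separators."""
    # Assumption: commas are thousands separators, never decimal marks.
    cleaned = text.replace(",", "")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        # Still not a valid number after cleaning.
        return None


print(parse_amount("1,000.12"))
print(parse_amount("not a number"))
```

Blindly removing commas is only safe when the data uses `.` as the decimal mark; locale-formatted input such as `1.000,12` would need different handling.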

The original example in this issue also seems to work now.

pl.DataFrame({
    'x': ['1', '1.23', '1.2345'],
}).with_columns(
    d=pl.col("x").cast(pl.Decimal(scale=2)),
)

# shape: (3, 2)
# ┌────────┬──────────────┐
# │ x      ┆ d            │
# │ ---    ┆ ---          │
# │ str    ┆ decimal[*,2] │
# ╞════════╪══════════════╡
# │ 1      ┆ 1.00         │
# │ 1.23   ┆ 1.23         │
# │ 1.2345 ┆ 1.23         │
# └────────┴──────────────┘

@matquant14

Ahh, thanks @cmdlineluser. I thought that was being done under the hood, but that was my oversight. I got it to work now. Appreciate the feedback.


5 participants