Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] zfill needs to align with python standard library in some cases #11632

Closed
galipremsagar opened this issue Aug 31, 2022 · 3 comments · Fixed by #11634
Closed

[BUG] zfill needs to align with python standard library in some cases #11632

galipremsagar opened this issue Aug 31, 2022 · 3 comments · Fixed by #11634
Assignees
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. strings strings issues (C++ and Python)

Comments

@galipremsagar
Copy link
Contributor

galipremsagar commented Aug 31, 2022

Describe the bug
Similar to pandas-dev/pandas#20868

When the string column has data which has +/- prefix signs, they will aswell need to be moved during the zfill operation.

Steps/Code to reproduce bug

In [9]: import cudf

In [10]: s = cudf.Series(["-10", "+100", "+abc", "-/bcd"])

In [11]: s.str.zfill(10)
Out[11]: 
0    0000000-10
1    000000+100
2    000000+abc
3    00000-/bcd
dtype: object

In [12]: "-10".zfill(10)
Out[12]: '-000000010'

In [13]: "+100".zfill(10)
Out[13]: '+000000100'

In [14]: "+abc".zfill(10)
Out[14]: '+000000abc'

In [15]: "-/bcd".zfill(10)
Out[15]: '-00000/bcd'

In [16]: "a+b".zfill(10)
Out[16]: '0000000a+b'

In [17]: "100-".zfill(10)
Out[17]: '000000100-'

Expected behavior
Match python standard library behavior shown above for some special cases.

Environment overview (please complete the following information)

  • Environment location: [Bare-metal]
  • Method of cuDF install: [from source]
@galipremsagar galipremsagar added bug Something isn't working Needs Triage Need team to review and classify libcudf Affects libcudf (C++/CUDA) code. strings strings issues (C++ and Python) labels Aug 31, 2022
@github-actions github-actions bot added this to Needs prioritizing in Bug Squashing Aug 31, 2022
@galipremsagar
Copy link
Contributor Author

cc: @davidwendt for visibility

@davidwendt davidwendt self-assigned this Aug 31, 2022
@davidwendt
Copy link
Contributor

This was implemented to match Pandas

>>> import pandas as pd
>>> s = pd.Series(["-10", "+100", "+abc", "-/bcd"])
>>> s.str.zfill(10)
0    0000000-10
1    000000+100
2    000000+abc
3    00000-/bcd
dtype: object

Certainly seems the Python zfill is more correct.
@galipremsagar Let me know what is preferred for cuDF.

@galipremsagar
Copy link
Contributor Author

This was implemented to match Pandas


>>> import pandas as pd

>>> s = pd.Series(["-10", "+100", "+abc", "-/bcd"])

>>> s.str.zfill(10)

0    0000000-10

1    000000+100

2    000000+abc

3    00000-/bcd

dtype: object

Certainly seems the Python zfill is more correct.

@galipremsagar Let me know what is preferred for cuDF.

Yup. The current version of zfill was implemented to match pandas. But pandas fixed this issue in 1.5.0 because python standard library zfill is correct. So we aswell need to align similarly.

@rapids-bot rapids-bot bot closed this as completed in dc0d8d1 Sep 2, 2022
Bug Squashing automation moved this from Needs prioritizing to Closed Sep 2, 2022
@bdice bdice removed the Needs Triage Need team to review and classify label Mar 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. strings strings issues (C++ and Python)
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

3 participants