BUG/CLN: Clean float / complex string formatting #36799

dsaxton · 2020-10-02T01:24:54Z

Noticed while working on another bug. The _is_number helper here is wrong and can cause incorrect results given that this code path is hit by arbitrary strings (e.g., it thinks "foo" is a number). Also the _trim_zeros_complex helper apparently does nothing:

[ins] In [3]: _trim_zeros_float(["0.00000"])
Out[3]: ['0.0']

[ins] In [4]: _trim_zeros_complex(["1.000+1.000000j"])
Out[4]: ['1.000+1.000000j']
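To make the first point concrete, here is a minimal standalone sketch of the old predicate (the removed line in the diff), lifted out of pandas. The name `is_number_old` and the default `na_rep="NaN"` are assumptions for illustration only. It shows that any string other than `na_rep` or something ending in "inf" is accepted as a number:

```python
def is_number_old(x: str, na_rep: str = "NaN") -> bool:
    # Same logic as the removed line in the diff: everything that is not
    # the NA representation and does not end in "inf" counts as a number.
    return x != na_rep and not x.endswith("inf")


samples = ["1.500", "foo", "NaN", "-inf", ""]
results = {s: is_number_old(s) for s in samples}
print(results)
```

Note that "foo" and the empty string are both classified as numbers here, which is the behaviour the description objects to.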

simonjayhawkins · 2020-10-02T09:03:56Z

> Also the _trim_zeros_complex helper apparently does nothing:

Something is wrong then.

_trim_zeros_complex was 'fixed' in #25745, so why is test_to_string_complex_float_formatting not failing? Has there been some other refactor since?

dsaxton · 2020-10-02T14:05:21Z

> _trim_zeros_complex was 'fixed' in #25745, why is test_to_string_complex_float_formatting not failing? has there been some other refactor since?

My guess is it's passing because there are no zeros to trim in the test case.

Instead of trimming zeros it's as though zeros are added; I would expect the formatting behavior instead to be similar to float for both the real and imaginary parts (below is master):

[ins] In [4]: s = pd.Series([0.000, 1.000])
         ...: print(s)
         ...:
         ...: s = pd.Series([0.000+1.000j, 1.000+1.000j])
         ...: print(s)
         ...:
0    0.0
1    1.0
dtype: float64
0    0.000000+1.000000j
1    1.000000+1.000000j
dtype: complex128

Should I open an issue about this, and either close this PR or turn it into a bug fix?

jreback

Can you run some asvs? I am not sure this is really hit much, but maybe.

jreback · 2020-10-02T21:43:41Z

pandas/io/formats/format.py

 ) -> List[str]:
    """
    Trims zeros, leaving just one before the decimal points if need be.
    """
    trimmed = str_floats

    def _is_number(x):
-        return x != na_rep and not x.endswith("inf")
+        return re.match(fr"\s*-?[0-9]+(\{decimal}[0-9]*)?", x) is not None


can you compile this and put it on the class / variable
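One way to read this request, as a sketch rather than what the PR ended up doing: compile the pattern once per `decimal` value and reuse it. The helper names `number_pattern` and `is_number` are hypothetical, and the pattern is a variant of the one in the diff with an end anchor, an optional leading "+", and `re.escape` added as assumptions.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=None)
def number_pattern(decimal: str = ".") -> re.Pattern:
    # Compiled once per decimal separator and cached afterwards.
    # re.escape keeps separators such as "." from acting as wildcards.
    return re.compile(rf"^\s*[+-]?[0-9]+({re.escape(decimal)}[0-9]*)?$")


def is_number(x: str, decimal: str = ".") -> bool:
    return number_pattern(decimal).match(x) is not None


print(is_number("foo"), is_number(" -1.50"), is_number("1,5", decimal=","))
```

Caching on `decimal` avoids recompiling inside the hot loop while still supporting non-default separators.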

dsaxton · 2020-10-04T00:38:35Z

> can you run some asv's i am not sure this is really hit much, but maybe

       before           after         ratio
     [a975a754]       [229b5fc1]
     <master>         <fix-is-numeric-helper>
+      1.84±0.1ms       2.37±0.2ms     1.28  io.csv.ReadCSVCachedParseDates.time_read_csv_cached(False)
+     4.16±0.09ms       4.86±0.5ms     1.17  io.csv.ReadCSVFloatPrecision.time_read_csv_python_engine(',', '.', 'high')
-     6.88±0.07ms      6.18±0.09ms     0.90  io.hdf.HDFStoreDataFrame.time_store_info

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

-----

       before           after         ratio
     [a975a754]       [229b5fc1]
     <master>         <fix-is-numeric-helper>
-     6.88±0.03ms      6.07±0.04ms     0.88  io.hdf.HDFStoreDataFrame.time_store_info
-        24.4±1ms       16.2±0.2ms     0.66  io.hdf.HDFStoreDataFrame.time_read_store_table_wide

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

-----

BENCHMARKS NOT SIGNIFICANTLY CHANGED.

-----

       before           after         ratio
     [a975a754]       [229b5fc1]
     <master>         <fix-is-numeric-helper>
+      18.5±0.4ms       24.4±0.7ms     1.32  io.csv.ReadCSVThousands.time_thousands(',', None)
+      17.7±0.8ms       22.4±0.6ms     1.26  io.csv.ReadCSVThousands.time_thousands(',', ',')
+      17.9±0.6ms         21.8±1ms     1.22  io.csv.ReadCSVThousands.time_thousands('|', ',')
-     6.94±0.08ms      6.14±0.04ms     0.89  io.hdf.HDFStoreDataFrame.time_store_info
-     2.09±0.08ms      1.78±0.03ms     0.85  io.csv.ReadCSVFloatPrecision.time_read_csv(';', '_', 'high')
-        5.63±1ms      3.95±0.03ms     0.70  io.csv.ReadCSVFloatPrecision.time_read_csv_python_engine(',', '.', 'high')
-      23.6±0.7ms       16.1±0.6ms     0.68  io.hdf.HDFStoreDataFrame.time_read_store_table_wide

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

-----

BENCHMARKS NOT SIGNIFICANTLY CHANGED.

Ran the IO asvs a few times and results aren't consistent. Maybe there's something going on with io.csv / io.hdf but hard to say.
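Since the asv runs are noisy, a microbenchmark that isolates the predicate itself may be easier to interpret. The sketch below is not an asv benchmark and says nothing about end-to-end pandas performance; the function names, the sample values, and the simplified "." pattern are all assumptions for illustration.

```python
import re
import timeit

# Simplified pattern for the "." decimal separator only (assumption).
NUMBER_RE = re.compile(r"^\s*[+-]?[0-9]+(\.[0-9]*)?$")


def is_number_old(x: str, na_rep: str = "NaN") -> bool:
    return x != na_rep and not x.endswith("inf")


def is_number_regex(x: str) -> bool:
    return NUMBER_RE.match(x) is not None


values = ["1.500000", "-0.250000", "NaN", "inf", "foo"] * 200

# Time each predicate over the same list of strings.
old_time = timeit.timeit(lambda: [is_number_old(v) for v in values], number=50)
new_time = timeit.timeit(lambda: [is_number_regex(v) for v in values], number=50)
print(f"old predicate: {old_time:.4f}s, regex predicate: {new_time:.4f}s")
```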

ivanovmg · 2020-10-08T16:50:57Z

pandas/io/formats/format.py

+    max_length = max(lengths)
+    padded = [
+        s[: -((k - 1) // 2 + 1)]  # real part
+        + (max_length - k) // 2 * "0"
+        + s[-((k - 1) // 2 + 1) : -((k - 1) // 2)]  # + / -
+        + s[-((k - 1) // 2) : -1]  # imaginary part
+        + (max_length - k) // 2 * "0"
+        + s[-1]
+        for s, k in zip(complex_strings, lengths)
+    ]


Would it be safer to split real and imaginary parts via +- and then process decimal and fractional parts by splitting via the dot? This way you would not need to rely on the symmetry of the original string provided.
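A rough sketch of the split-based approach suggested here, for comparison only. It is not the PR's implementation: the names `COMPLEX_RE`, `trim_part`, and `trim_complex` are hypothetical, it assumes plain decimal notation (no exponents), and it trims each number independently without aligning a column of values.

```python
import re

# real part (optional sign), the +/- between the parts, imaginary part, "j"
COMPLEX_RE = re.compile(r"([+-]?[^+-]+)([+-])([^+-]+)j")


def trim_part(part: str, decimal: str = ".") -> str:
    # Split on the decimal separator and trim only the fractional digits.
    whole, sep, frac = part.partition(decimal)
    if not sep:
        return part  # no decimal separator, nothing to trim
    frac = frac.rstrip("0") or "0"  # keep one digit after the separator
    return whole + sep + frac


def trim_complex(s: str, decimal: str = ".") -> str:
    match = COMPLEX_RE.fullmatch(s.strip())
    if match is None:
        return s  # not a complex string, leave untouched
    real, sign, imag = match.groups()
    return trim_part(real, decimal) + sign + trim_part(imag, decimal) + "j"


print(trim_complex("1.000+1.000000j"))
```

Because the split does not depend on string lengths, the real and imaginary parts may have different widths on input.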

dsaxton · 2020-10-08

You mean trim zeros after splitting into fractional non-fractional parts? I think the trimming has to be done with the decimal there. (I realize this helper is very confusing, and there's likely a better way to do this.)

ivanovmg · 2020-10-08

Right, I mean trimming zeros after splitting into fractional and non-fractional parts. Since a dot char would always split a float number, there is no risk of introducing a bug IMHO (even if there is no dot at all).

dsaxton · 2020-10-08

I think then you have to keep track of which parts are fractional and then only trim those?

However this part is not doing any actual trimming, it's correcting for the fact that the previous function is now trimming "too much." (It trims the real and imaginary parts of each complex number independently, so they aren't aligned afterwards. Rather than rewrite the other function I found it easier to do this post-processing.)
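To see what the padding step does, the list comprehension from the diff can be run in isolation. This is a sketch under a stated assumption: each input string has the shape real + sign + imaginary + "j" with odd total length, i.e. the real part is exactly one character longer than the imaginary digits (here because of a leading "-"). The wrapper name `pad_complex` is hypothetical.

```python
from typing import List


def pad_complex(trimmed: List[str]) -> List[str]:
    # Same slicing as the diff hunk above, wrapped in a function.
    lengths = [len(s) for s in trimmed]
    max_length = max(lengths)
    return [
        s[: -((k - 1) // 2 + 1)]  # real part
        + (max_length - k) // 2 * "0"  # zeros appended to the real part
        + s[-((k - 1) // 2 + 1) : -((k - 1) // 2)]  # + / -
        + s[-((k - 1) // 2) : -1]  # imaginary part
        + (max_length - k) // 2 * "0"  # zeros appended to the imaginary part
        + s[-1]  # j
        for s, k in zip(trimmed, lengths)
    ]


print(pad_complex(["-1.0+1.0j", "-1.05+1.00j"]))
# ['-1.00+1.00j', '-1.05+1.00j']
```

The slice positions are derived purely from the string length, which is the reliance on symmetry raised earlier in this thread.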

jreback · 2020-10-10T22:57:50Z

LGTM. Can you add a whatsnew note and ping on green?

dsaxton · 2020-10-11T21:01:01Z

> lgtm can you add a whatsnew note and ping on green.

@jreback Added note + green

jreback · 2020-10-11T21:31:09Z

pandas/tests/io/formats/test_format.py

+
+
+def test_to_string_complex_number_trims_zeros():
+    s = pd.Series([1.000000 + 1.000000j, 1.0 + 1.0j, 1.05 + 1.0j])


Why should these have 2 decimal zeros and not 1, like ordinary floats? Where did you get the expected output from?

dsaxton · 2020-10-11

There's a 1.05 in the last element, the expected output is similar to what happens for floats:

[ins] In [2]: s = pd.Series([1.0, 1.0000, 1.05])

[ins] In [3]: s
Out[3]:
0    1.00
1    1.00
2    1.05
dtype: float64
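The float behaviour described here, where every value keeps as many decimals as the "longest" one needs, can be sketched as joint trimming: strip one trailing zero from all strings at a time, and stop as soon as any of them no longer ends in "0". This is an illustration of the idea, not pandas' actual helper; the name `trim_zeros_together` is hypothetical.

```python
from typing import List


def trim_zeros_together(strs: List[str], decimal: str = ".") -> List[str]:
    trimmed = list(strs)

    def should_trim(values: List[str]) -> bool:
        # Only strings containing the decimal separator take part.
        numbers = [v for v in values if decimal in v]
        return len(numbers) > 0 and all(v.endswith("0") for v in numbers)

    while should_trim(trimmed):
        trimmed = [v[:-1] if decimal in v else v for v in trimmed]

    # Leave one zero after a bare trailing separator, e.g. "0." -> "0.0".
    return [v + "0" if v.endswith(decimal) else v for v in trimmed]


print(trim_zeros_together(["1.000000", "1.000000", "1.050000"]))
# ['1.00', '1.00', '1.05']
```

With a single value such as "0.00000" the same procedure yields "0.0", matching the `_trim_zeros_float` output shown in the description.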

jreback · 2020-10-14T12:52:02Z

thought we had merged this, thanks @dsaxton

dsaxton added 2 commits October 1, 2020 19:56

CLN: Clean float / complex string formatting

406ed4d

Fix

a1228c9

Merge remote-tracking branch 'upstream/master' into fix-is-numeric-helper

eabef62

jbrockmendel reviewed Oct 2, 2020

dsaxton changed the title ~~CLN: Clean float / complex string formatting~~ BUG/CLN: Clean float / complex string formatting Oct 2, 2020

dsaxton added 14 commits October 2, 2020 12:16

Fix and test

72be97f

Fix

e1d1ba3

Fix

d38839c

Nit

0da1c46

Nit

4ea7dcf

Change doc

376b06e

Merge remote-tracking branch 'upstream/master' into fix-is-numeric-helper

473d674

Escape

e564fee

Add failing test

6cec317

Edit

50ccbc0

Maybe

e725cb2

Merge remote-tracking branch 'upstream/master' into fix-is-numeric-helper

3a94daa

Comment

03e8cab

Remove

46949fd

jreback requested changes Oct 2, 2020

jreback added the Output-Formatting __repr__ of pandas objects, to_string label Oct 2, 2020

dsaxton added 3 commits October 3, 2020 01:05

Compile and comment

11f20d0

Merge remote-tracking branch 'upstream/master' into fix-is-numeric-helper

44f0792

Merge remote-tracking branch 'upstream/master' into fix-is-numeric-helper

229b5fc

ivanovmg reviewed Oct 8, 2020

jreback requested changes Oct 8, 2020

Merge remote-tracking branch 'upstream/master' into fix-is-numeric-helper

eec9d89

dsaxton added 2 commits October 8, 2020 16:11

Move into other function

0e0dd92

Merge remote-tracking branch 'upstream/master' into fix-is-numeric-helper

956ecf3

jreback added this to the 1.2 milestone Oct 10, 2020

dsaxton added 3 commits October 10, 2020 18:03

Merge remote-tracking branch 'upstream/master' into fix-is-numeric-helper

25dd8f1

Note

eca0413

Merge remote-tracking branch 'upstream/master' into fix-is-numeric-helper

00b71b2

jreback requested changes Oct 11, 2020

jreback approved these changes Oct 14, 2020

jreback merged commit 44ab7a1 into pandas-dev:master Oct 14, 2020

dsaxton deleted the fix-is-numeric-helper branch October 14, 2020 15:06

JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Oct 26, 2020

BUG/CLN: Clean float / complex string formatting (pandas-dev#36799)

90a8e7a

kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020

BUG/CLN: Clean float / complex string formatting (pandas-dev#36799)

2cd8632
