DOC: .to_string float_format behavior inconsistent with documentation and different for extension types #53675

Open
zmoon opened this issue Jun 14, 2023 · 7 comments
Labels: Bug, Docs, ExtensionArray

Comments

@zmoon
Contributor

zmoon commented Jun 14, 2023

Pandas version checks

  • I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_string.html

Documentation problem

DataFrame.to_string docstring says float_format should be "one-parameter function, optional, default None", but passing a format string (e.g. '%.3e') works too, like in .to_csv, for NumPy float type columns/series.

However, unlike .to_csv, .to_string with float_format string fails with TypeError: 'str' object is not callable for float extension type columns/series.

import pandas as pd

s = pd.Series([1.1, 2.2, 3.3])

# works
s.to_csv(float_format="%.3e")
s.convert_dtypes().to_csv(float_format="%.3e")
s.to_string(float_format="%.3e")

# fails with TypeError: 'str' object is not callable
s.convert_dtypes().to_string(float_format="%.3e")

xref: #9448 pandas-dev/pandas-stubs#730

Suggested fix for documentation

It would be nice for the documentation to reflect that a format string can be used, but probably the issue with extension types should be fixed first.

Note also that FloatFormatType, used in the internal typing of DataFrame.to_string,

float_format: fmt.FloatFormatType | None = ...,

does include str as an option.

FloatFormatType = Union[str, Callable, "EngFormatter"]

zmoon added the Docs and Needs Triage labels on Jun 14, 2023
@topper-123
Contributor

topper-123 commented Jun 27, 2023

This does accept a format string, so the docs need some updating here, and the type hint for float_format in Series.to_string should be the same as in DataFrame.to_string.

This failure you're showing looks like a bug, e.g. if we do s.convert_dtypes().to_string(float_format="xxx") we get the same error, while it should give ValueError: Invalid format specifier or similar.

A PR for both the doc update and the bug fix is welcome.
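The expectation above, that an unusable spec such as "xxx" should surface as a ValueError rather than "'str' object is not callable", could be sketched as follows. This is only an illustration: validate_float_format is a hypothetical helper, not a pandas function, and it assumes the string is a printf-style spec applied with the % operator.

```python
def validate_float_format(spec: str) -> str:
    """Hypothetical helper: reject printf-style specs that cannot format a float."""
    try:
        # Probe the spec with a sample float; a bad spec raises here.
        spec % 1.0
    except (TypeError, ValueError) as err:
        raise ValueError(f"Invalid format specifier: {spec!r}") from err
    return spec


validate_float_format("%.3e")  # accepted and returned unchanged

try:
    validate_float_format("xxx")
except ValueError as err:
    print(err)
```

Probing with a sample value keeps the check independent of which formatter class later consumes the spec.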

topper-123 added the Bug and ExtensionArray labels and removed the Needs Triage label on Jun 27, 2023
@rsm-23
Contributor

rsm-23 commented Jul 5, 2023

@topper-123 for the bug, are we trying to add this functionality for Series.to_string as well?

@topper-123
Contributor

Yes, Series.to_string should accept the same input for float_format as DataFrame.to_string. It may already do that, in which case it's just a question of updating the type hint in Series.to_string.

@otitoU

otitoU commented Jul 24, 2023

take

@rsm-23
Contributor

rsm-23 commented Jul 29, 2023

hey @otitoU how is it going? Would you mind if I take a look as well?

@otitoU

otitoU commented Jul 31, 2023

Hey @rsm-23, it is going well! Sorry, I have already spent some time on it. I hope you don't mind, but I would like to finish this.

@otitoU

otitoU commented Aug 19, 2023

Hey @phofl just to give you a bit of context.

What is the cause of the bug

The error stack shows that the problem arises in pandas/core/series.py at line 1765, when
result = formatter.to_string() is called. formatter is an instance of the SeriesFormatter class from the internal format.py module (fmt).
Different formatters are used depending on the values.dtype of the array being formatted, such as FloatArrayFormatter, GenericArrayFormatter, etc. All of them can be found in format.py. The formatter that ends up being used for this series is GenericArrayFormatter.
GenericArrayFormatter iterates over vals, the array of values to be formatted. If a value v in vals is a float, it does fmt_values.append(float_format(v)), which attempts to append the float-formatted value to fmt_values.
This is where the real problem occurs. According to the type hint FloatFormatType = Union[str, Callable, "EngFormatter"], float_format can be a str, a Callable, or an EngFormatter.
But the float_format passed here is a str, and the code attempts to call it as a Callable, which is why we get the error:
/format.py", line 1404, in _format_strings
fmt_values.append(float_format(v))
TypeError: 'str' object is not callable

The two potential fixes are:

  • Editing the GenericArrayFormatter - FloatArrayFormatter doesn't have this issue, so I am thinking of applying the way FloatArrayFormatter formats its values with float_format inside the GenericArrayFormatter class.
    FloatArrayFormatter does this with the function format_values_with(float_format).
  • Editing the ExtensionArrayFormatter - I want to convert the np array to float so that it uses FloatArrayFormatter instead of GenericArrayFormatter.

Either way, the result would be a formatted value.
At line 1402, in the is_float_type[i] condition, I want to check whether float_format is callable:
if it is, do fmt_values.append(float_format(v));
otherwise, append the value returned by format_values_with(float_format) to fmt_values.

My proposed solution is to edit the GenericArrayFormatter.
I just wanted to know if you agree or have suggestions/comments.
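The callable check proposed above can be sketched outside pandas roughly like this. It is a minimal illustration, not the pandas implementation: normalize_float_format and format_values are hypothetical names, and the sketch assumes that a string float_format is a printf-style spec (such as "%.3e") applied with the % operator.

```python
from typing import Callable, List, Sequence, Union

FloatFormat = Union[str, Callable[[float], str]]


def normalize_float_format(float_format: FloatFormat) -> Callable[[float], str]:
    """Hypothetical helper: turn either accepted form into a one-argument callable."""
    if callable(float_format):
        return float_format
    if isinstance(float_format, str):
        # Assumption: the string is a printf-style spec such as "%.3e".
        return lambda v: float_format % v
    raise TypeError(f"unsupported float_format: {float_format!r}")


def format_values(vals: Sequence[object], float_format: FloatFormat) -> List[str]:
    """Format floats with float_format and everything else with str()."""
    fmt = normalize_float_format(float_format)
    fmt_values = []
    for v in vals:
        if isinstance(v, float):
            fmt_values.append(fmt(v))
        else:
            fmt_values.append(str(v))
    return fmt_values


print(format_values([1.1, 2.2, "x"], "%.3e"))
print(format_values([1.1, 2.2], lambda v: f"{v:.1f}"))
```

Normalizing once before the loop keeps the per-value branch unchanged, so the string and callable cases share the same code path.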

4 participants