ENH: DataFrame.to_expr() method #41837

kerrickstaley · 2021-06-06T04:48:43Z

Is your feature request related to a problem?

I would like to write some unit tests for my Pandas code. I want to test that some DataFrame is equal to an expected value. The expected value is complicated and I would like an easy way to get the Python code to construct it. My program already computes the expected DataFrame value but I need a way to serialize/deserialize it for use in my test code.

Here is a StackOverflow question with more detail.

Describe the solution you'd like

I would like there to be a DataFrame.to_expr() method. It should return a str containing valid Python code that can be used to re-construct the DataFrame.

To the greatest extent possible, pd.testing.assert_frame_equal(df1, eval(df2.to_expr())) should raise an AssertionError if and only if pd.testing.assert_frame_equal(df1, df2) raises one. I am using assert_frame_equal because it checks column dtypes, whereas DataFrame.equals() does not.

Concretely, I think the return value of .to_expr() should be something like

pandas.DataFrame({'column_1': pandas.Series([1, 2, 3], dtype='int64'), 'column_2': pandas.Series([1.0, 2.0, 3.0], dtype='float64')})
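A rough sketch of how such a method might behave, written as a free function because DataFrame.to_expr() does not exist in pandas. The name to_expr and the output format are assumptions taken from the example above. The sketch only covers frames with a default integer index, unique column names, and values whose repr() is valid Python (so no NaN, timestamps, or categoricals).

```python
import pandas


def to_expr(df):
    """Hypothetical helper: build a Python expression that recreates df.

    Assumes a default RangeIndex, unique column names, and plain
    int/float/str/bool values whose repr() can be passed back to eval().
    """
    columns = ", ".join(
        f"{name!r}: pandas.Series({df[name].tolist()!r}, dtype={str(df[name].dtype)!r})"
        for name in df.columns
    )
    return f"pandas.DataFrame({{{columns}}})"


df = pandas.DataFrame(
    {
        "column_1": pandas.Series([1, 2, 3], dtype="int64"),
        "column_2": pandas.Series([1.0, 2.0, 3.0], dtype="float64"),
    }
)
expr = to_expr(df)
print(expr)

# Round trip: the evaluated expression compares equal, dtypes included.
pandas.testing.assert_frame_equal(df, eval(expr))

# Because the dtype is written out explicitly, an empty frame survives too.
empty = pandas.DataFrame({"a": pandas.Series([], dtype="int64")})
pandas.testing.assert_frame_equal(empty, eval(to_expr(empty)))
```

Writing the dtype out for every column is what lets the empty case round-trip. A real implementation would also need to handle the index and values without an eval-able repr.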

Note that on many Python objects, this .to_expr() method is called __repr__(). The Python docs state:

For many types, [__repr__] makes an attempt to return a string that would yield an object with the same value when passed to eval()...

However, DataFrame.__repr__ is already defined to print a different representation (which is arguably more useful in an interactive environment).

API breaking implications

This is a backwards-compatible change.

Describe alternatives you've considered

I've used DataFrame.to_dict() and DataFrame.from_dict() for this purpose in the past. However, this doesn't preserve the column dtypes, so it doesn't work with an empty DataFrame. I also worry that from_dict will sometimes fail to infer the original dtype even for non-empty DataFrames.
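A small illustration of the empty-DataFrame problem described above. The column name "a" and the int64 dtype are made up for the example, and the dtype that comes back may differ between pandas versions. With no values left to infer from, it is not expected to be int64.

```python
import pandas as pd

# An empty frame whose single column has an explicit int64 dtype.
empty = pd.DataFrame({"a": pd.Series([], dtype="int64")})

# to_dict() keeps only the (absent) values, so the dtype is not carried over.
round_tripped = pd.DataFrame.from_dict(empty.to_dict())

print(empty["a"].dtype)          # the original dtype
print(round_tripped["a"].dtype)  # whatever pandas falls back to for no data
```

Comparing the two frames with pd.testing.assert_frame_equal would therefore fail on the dtype check, even though both are empty and have the same column.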

mzeitlin11 · 2021-06-07T17:37:43Z

Thanks for the request @kerrickstaley! For an existing round-tripping option, does to_pickle work for your purposes? (or functions like to_csv if you want something more readable)

kerrickstaley · 2021-06-07T22:44:11Z

to_pickle works, but its downside is that it doesn't produce a human-readable test.
to_csv mostly works, but it doesn't work in the case where you're testing an empty dataframe, and maybe there are other cases where round-tripping through CSV doesn't preserve column types; I'm not sure.

So I think there is still a use-case for a to_expr method.
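The CSV limitation mentioned in this comment can be reproduced with a short sketch. The frame below is invented for illustration. CSV stores no dtype information, so a header-only file gives read_csv nothing to infer the original dtype from.

```python
import io

import pandas as pd

# An empty frame with an explicit int64 column.
empty = pd.DataFrame({"a": pd.Series([], dtype="int64")})

# Serialize without the index, then read the text straight back.
csv_text = empty.to_csv(index=False)
round_tripped = pd.read_csv(io.StringIO(csv_text))

print(repr(csv_text))            # only the header line survives
print(empty["a"].dtype)          # the original dtype
print(round_tripped["a"].dtype)  # the dtype read_csv chose with no rows to inspect
```

Passing dtype= to read_csv would restore the type, but then the test has to carry the schema separately from the data.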

mzeitlin11 · 2021-07-01T21:03:46Z

Thanks for explaining your reasoning @kerrickstaley. I personally don't find this use case compelling enough to add a new to_* method (the API is huge already :)). It sounds better suited to its own package, where a bunch of different formats could be supported. But I'm curious whether others have thoughts on or are interested in this feature.

timlod · 2021-07-21T14:16:55Z

+1

I would find this very useful as well, for generating small test cases!
Our test cases usually use very small, representative dataframes that we write manually. Right now I mainly print df.to_numpy() and copy the output into the pd.DataFrame constructor along with the column names, but the process is a little tedious. to_expr() would greatly speed this up: we could craft the test cases interactively and persist the input/expected dataframes directly into the test suite.

If anyone else has a better solution, I'd also be interested to hear it.

kerrickstaley added the Enhancement and Needs Triage labels · Jun 6, 2021

mzeitlin11 added the DataFrame, IO Data, and Needs Discussion labels and removed the Needs Triage label · Jul 1, 2021
