Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return pandas.DataFrame #274

Open
xinbinhuang opened this issue Jul 21, 2020 · 21 comments
Open

Return pandas.DataFrame #274

xinbinhuang opened this issue Jul 21, 2020 · 21 comments
Labels

Comments

@xinbinhuang
Copy link

xinbinhuang commented Jul 21, 2020

Hi team,

It seems that currently fire cannot return dataframe from function or method calls. I would to like make a contribution to this and create a PR if this would be good feature to fire.

Let me know how you thinks

Cheers
Bin

@nfultz
Copy link
Contributor

nfultz commented Jul 21, 2020

That's odd, this worked in ~2018, I demoed it in a lightning talk at a pydata meetup - https://docs.google.com/presentation/d/1NHAYGN4Fx4YBVcaGAiJ4U3gEkyhhgE643r7Ka2MF2wI/edit?usp=sharing

EDIT:

My script still works, you just need to manually call to_string at the end

$ pd read_csv drive/aws_costs.csv - mean - round - to_string

@xinbinhuang
Copy link
Author

Hmm I see. I didn't know about the to_string method for Dataframe.

My Fire component is actually a class which utilize pandas under the hood and it returns a dataframe for most of the methods.

I think it's not intuitive for end-user (i.e. my client and colleague) to know that they need to chain command with - to_string to render the output properly. Given that dataframe is such a popular object right now, do you think it's worth adding the - to_string as default behavior to Fire?

@nfultz
Copy link
Contributor

nfultz commented Jul 21, 2020

If you look at the stack trace, it's because DataFrames fail on inspect, which is used by fire to generate the list of subcommands.

You reproduce it ouside of fire using:

>>> import pandas
>>> import inspect
>>> df = pandas.read_csv("drive/aws_costs.csv")
>>> inspect.getmembers(df)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/inspect.py", line 342, in getmembers
    value = getattr(object, key)
  File "/home/nfultz/.local/lib/python3.6/site-packages/pandas/core/frame.py", line 409, in _constructor_expanddim
    raise NotImplementedError("Not supported for DataFrames!")
NotImplementedError: Not supported for DataFrames!

So if you make pandas work with inspect, fire will be compatible accordingly, or at least not crash.

OTOH, I'm not sure generating a list of the data frame API is very helpful since it's quite large.

@dbieber
Copy link
Member

dbieber commented Jul 21, 2020

Python Fire should work even when inspect.getmembers doesn't. I'll mark this as a bug.

@dbieber dbieber added the bug label Jul 21, 2020
@nfultz
Copy link
Contributor

nfultz commented Jul 21, 2020

Fair enough. See also pandas-dev/pandas#31474 , they fixed it but haven't released it yet.

It's also been filed against inspect https://bugs.python.org/issue35108#msg361208

@xinbinhuang
Copy link
Author

@nfultz I am not familiar with inspect.getmembers in general, but I do get impressed by how quickly you gather all these information! These information is pretty good for me probably other people as a reference.

Thank you so much for the help!

@dbieber
Copy link
Member

dbieber commented Oct 3, 2020

Would you be able to provide a minimal example reproducing the issue?

@genos
Copy link

genos commented Oct 13, 2020

Would you be able to provide a minimal example reproducing the issue?

Please pardon me for barging in, but I use Pandas a lot and have really enjoyed using fire for CLIs.

Script

#!/usr/bin/env python3
import fire
import pandas as pd

def test(n: int) -> pd.DataFrame:
    return pd.DataFrame(data=range(n))

if __name__ == "__main__":
    fire.Fire(test)

Environment

Setting up a Python 3.8 virtual environment, activating it, installing dependencies:

~/tmp ∃ python3 -m venv env
~/tmp ∃ source ./env/bin/activate
(env) ~/tmp ∃ pip install -U pip fire jinja2 pandas -qqq
(env) ~/tmp ∃ python --version
Python 3.8.6
(env) ~/tmp ∃ pip freeze
fire==0.3.1
Jinja2==2.11.2
MarkupSafe==1.1.1
numpy==1.19.2
pandas==1.1.3
python-dateutil==2.8.1
pytz==2020.1
six==1.15.0
termcolor==1.1.0

Running

It looks like our version of Pandas fixes the inspect-related NotImplementedError mentioned above; when running the script inside our environment, we get a ValueError

(env) ~/tmp ∃ ./w.py
ERROR: The function received no value for the required argument: n
Usage: w.py N

For detailed information on this command, run:
  w.py --help
(env) ~/tmp ∃ ./w.py 10
/usr/local/Cellar/python@3.8/3.8.6/Frameworks/Python.framework/Versions/3.8/lib/python3.8/inspect.py:350: FutureWarning: _AXIS_NAMES has been deprecated.
  value = getattr(object, key)
/usr/local/Cellar/python@3.8/3.8.6/Frameworks/Python.framework/Versions/3.8/lib/python3.8/inspect.py:350: FutureWarning: _AXIS_NUMBERS has been deprecated.
  value = getattr(object, key)
Traceback (most recent call last):
  File "./w.py", line 9, in <module>
    fire.Fire(test)
  File "/Users/graham/tmp/env/lib/python3.8/site-packages/fire/core.py", line 164, in Fire
    _PrintResult(component_trace, verbose=component_trace.verbose)
  File "/Users/graham/tmp/env/lib/python3.8/site-packages/fire/core.py", line 266, in _PrintResult
    help_text = helptext.HelpText(
  File "/Users/graham/tmp/env/lib/python3.8/site-packages/fire/helptext.py", line 63, in HelpText
    actions_grouped_by_kind = _GetActionsGroupedByKind(component, verbose=verbose)
  File "/Users/graham/tmp/env/lib/python3.8/site-packages/fire/helptext.py", line 332, in _GetActionsGroupedByKind
    members = completion.VisibleMembers(component, verbose=verbose)
  File "/Users/graham/tmp/env/lib/python3.8/site-packages/fire/completion.py", line 365, in VisibleMembers
    return [
  File "/Users/graham/tmp/env/lib/python3.8/site-packages/fire/completion.py", line 367, in <listcomp>
    if MemberVisible(component, member_name, member, class_attrs=class_attrs,
  File "/Users/graham/tmp/env/lib/python3.8/site-packages/fire/completion.py", line 311, in MemberVisible
    if member in (absolute_import, division, print_function):
  File "/Users/graham/tmp/env/lib/python3.8/site-packages/pandas/core/generic.py", line 1329, in __nonzero__
    raise ValueError(
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

@dbieber
Copy link
Member

dbieber commented Oct 13, 2020

Thanks!

@genos
Copy link

genos commented Oct 13, 2020

Happy to contribute!

If we uninstall the released version of fire and instead install from the current master commit from GitHub:

(env) ~/tmp ∃ pip uninstall fire
Found existing installation: fire 0.3.1
Uninstalling fire-0.3.1:
  Would remove:
    /Users/graham/tmp/env/lib/python3.8/site-packages/fire-0.3.1-py3.8.egg-info
    /Users/graham/tmp/env/lib/python3.8/site-packages/fire/*
Proceed (y/n)? y
  Successfully uninstalled fire-0.3.1
(env) ~/tmp ∃ pip install git+https://github.com/google/python-fire.git@878b8d86f488ef2606cffdf58297dd2781708316 -qqq

it looks like we get dropped into the help information for pd.DataFrame:

(env) ~/tmp ∃ ./w.py 10
/usr/local/Cellar/python@3.8/3.8.6/Frameworks/Python.framework/Versions/3.8/lib/python3.8/inspect.py:350: FutureWarning: _AXIS_NAMES has been deprecated.
  value = getattr(object, key)
/usr/local/Cellar/python@3.8/3.8.6/Frameworks/Python.framework/Versions/3.8/lib/python3.8/inspect.py:350: FutureWarning: _AXIS_NUMBERS has been deprecated.
  value = getattr(object, key)
NAME
    w.py 10 - Two-dimensional, size-mutable, potentially heterogeneous tabular data.

SYNOPSIS
    w.py 10 GROUP | COMMAND | VALUE

DESCRIPTION
    Data structure also contains labeled axes (rows and columns).
    Arithmetic operations align on both row and column labels. Can be
    thought of as a dict-like container for Series objects. The primary
    pandas data structure.

GROUPS
    GROUP is one of the following:

     T
       Two-dimensional, size-mutable, potentially heterogeneous tabular data.

     at
       Access a single value for a row/column label pair.

     attrs

     axes

     columns
       Immutable Index implementing a monotonic integer range.

     dtypes
       One-dimensional ndarray with axis labels (including time series).

     iat
       Access a single value for a row/column pair by integer position.

     iloc
       Purely integer-location based indexing for selection by position.

     index
       Immutable Index implementing a monotonic integer range.

     loc
       Access a group of rows and columns by label(s) or a boolean array.

     plot
       Make plots of Series or DataFrame.

     shape

     style
       Helps style a DataFrame or Series according to the data with HTML and CSS.

COMMANDS
    COMMAND is one of the following:
     abs
       Return a Series/DataFrame with absolute numeric value of each element.

     add
       Get Addition of dataframe and other, element-wise (binary operator `add`).


# Several lines omitted

@genos
Copy link

genos commented Nov 24, 2020

Just tested against the latest commit (3be260e), still getting the same behavior.

@genos
Copy link

genos commented Dec 17, 2020

[ping] Just checked against the latest commit (bc43dca), still getting the same behavior.

@dbieber
Copy link
Member

dbieber commented Dec 17, 2020

Thanks for the pings.
I can confirm the behavior you're seeing.

The issue now seems to stem from our implementation of "HasCustomStr".

def HasCustomStr(component):

HasCustomStr should return whether an object defines a custom "__str__" function. DataFrames clearly do, since str(df) prints the dataframe nicely. However, HasCustomStr(df) returns False rather than True.

Next step is to figure out why.

@dbieber
Copy link
Member

dbieber commented Dec 17, 2020

Ah,

So if I'm understanding things correctly, here's what's happening.

DataFrames don't redefine __str__, they just redefine __repr__. The default implementation of __str__ is simply to call __repr__.

The fix will be to update our HasCustomStr method to also check for custom a __repr__ implementation. If either is present, we'll return True.

This will switch Fire's behavior from dropping you into the help screen to instead just printing out the DataFrame.

@dbieber
Copy link
Member

dbieber commented Dec 17, 2020

Well, my approach to detecting custom __str__ methods doesn't seem to work quite as simply for detecting custom __repr__. Depending on the type of the object, the default __repr__ implementation may come from "type" or "method" (where do I find this "method" object"?). This is going to require more poking around.

Also, this heuristic might not be good enough. E.g. OrderedDict redefines __repr__ but shouldn't be treated as a value. Will have to be more thorough to see what other places the heuristic breaks down in.

@jamesowers-ch
Copy link

jamesowers-ch commented Feb 26, 2021

Extending @genos's example above, here's a quick fix for anyone who lands here prior to a fix

#!/usr/bin/env python3
import functools

import fire
import pandas as pd

def test(n: int) -> pd.DataFrame:
    return pd.DataFrame(data=range(n))

if __name__ == "__main__":
    @functools.wraps(test)
    def _test(*args, **kwargs):
        return str(test(*args, **kwargs))
    fire.Fire(_test)

(functools.wraps means that you can still run test -h)

@mwort
Copy link

mwort commented Sep 4, 2021

I am still seeing this rather odd behaviour, also when returning dicts of pd.DataFrames, i.e. the help is shown, no error though after quitting the help. Returning tuples/lists of DataFrames works. A fix for this would be much appreciated!

@szeitlin
Copy link

We're also running into this problem. It would be great if this could get fixed (maybe we can do it if no one else has time).

@szeitlin
Copy link

@jamesowers-ch thanks for the hint! I ended up having to modify your example slightly in order to get it to work with automated tests:

def df_to_string(func):
    """
    See https://github.com/google/python-fire/issues/274
    for why we need this (for now)
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return str(func(*args, **kwargs))
    return wrapper

then I can just @df_to_string all of my methods without having to modify anything about the fire.Fire() call in __main__ or unittest.main()

jfaldanam added a commit to KhaosResearch/greensenti that referenced this issue Nov 8, 2022
@fredcallaway
Copy link

Running into this myself. The problem with the solution above is that it now prevents the function from being used programmatically (e.g. you actually want to operate on the dataframe).

I think the best solution here is to have a list of types that are automatically rendered as strings when they are the output of a function. It seems that dict and list already have that behavior; so I think something like this list might already exist in the code. We just need to make it user-modifiable.

@fredcallaway
Copy link

fredcallaway commented Nov 19, 2022

For now, here is a generalization of @jamesowers-ch's solution to classes. Although note that this doesn't allow the CLI to call it's own methods, which is probably one of the most frequent cases where you'd want to have programmatic access...

#!/usr/bin/env python3
import fire
import pandas as pd
from functools import wraps
import pprint

class CLI(object):
    """Command Line Interface"""
    def foo(self):
        return pd.DataFrame({'x': [1,2,3], 'y': [4,5,6]})

def str_output_wrapper(method):
    @wraps(method)
    def wrapped(*args, **kwargs):
        return pprint.pformat(method(*args, **kwargs))
    return wrapped

if __name__ == '__main__':
    for key, value in CLI.__dict__.items( ):
        if not key.startswith('_') and hasattr(value, '__call__' ):
            setattr(CLI, key, str_output_wrapper(value))

    fire.Fire(CLI)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

8 participants