COMPAT: show warning when using numexpr not installed and defaulted #12749

Closed
sciunto opened this Issue Mar 30, 2016 · 10 comments

@sciunto
sciunto commented Mar 30, 2016

cc @mrocklin

Hi,

Context:

Right now, pandas treats numexpr as an optional dependency. I package for Arch Linux, and I recently received feedback on the dask package saying that numexpr was missing from its dependencies: https://aur.archlinux.org/packages/python-dask/
However, dask does not explicitly depend on numexpr. The reason is detailed below.

Analysis:

In pandas.computation, eval() takes an optional argument engine='numexpr'.
If numexpr is not installed, any call with the default arguments raises an ImportError from the function _check_engine in pandas/computation/eval.py.

eval() is called (at least) from query, which is why we are in trouble when we run dask's tests without numexpr.
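
Until the default changes, a caller can sidestep the ImportError by choosing the engine explicitly. A minimal sketch, assuming a hypothetical helper `pick_engine` (it is not part of pandas or dask); it only checks that `numexpr` is importable, not that its version is one pandas supports:

```python
import importlib.util


def pick_engine():
    """Return 'numexpr' when the module can be found, else 'python'.

    Hypothetical helper, not a pandas API. It does not check the
    numexpr version, which pandas may also care about.
    """
    if importlib.util.find_spec("numexpr") is not None:
        return "numexpr"
    return "python"


# Intended use with pandas (left as a comment so the sketch itself
# has no hard dependency on pandas):
#   df.query("A > 0", engine=pick_engine())
```

This keeps the decision on the caller's side, which is exactly the burden the question below is about: should every downstream project have to do this?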

RFC:

Here is the question: is numexpr really an optional dependency, given that it is the default engine?
I would say no, but comments are open :)
From the dask devs' point of view, they do not have to mark numexpr as a dependency because they do not use it explicitly. To me, they can expect default arguments from pandas to work out of the box.
From the pandas packager's point of view (not me), pandas says that numexpr is optional, so it is packaged as optional. No problem here either; he followed the guidelines.

I see two options:

  • pandas changes the default backend
  • or pandas adds numexpr as a true dependency.
@jreback
Contributor
jreback commented Mar 30, 2016

this was fixed in 0.18.0. previously if you tried to use .eval and numexpr wasn't installed it would bork.

@jreback jreback closed this Mar 30, 2016
@jreback jreback added the Compat label Mar 30, 2016
@jreback
Contributor
jreback commented Mar 30, 2016

xref #12511

@sciunto
sciunto commented Mar 30, 2016

The problem is obviously not fixed (or not correctly) for the case I describe because it happens with pandas 0.18.0. It's annoying to see a post closed without discussion. :(

@sciunto
sciunto commented Mar 30, 2016

Note that I also read the code in master before opening this issue.

@jreback
Contributor
jreback commented Mar 30, 2016

@sciunto so what is the reproducible example then? Are you sure you are using 0.18.0?

@jreback
Contributor
jreback commented Mar 30, 2016

also you didn't read our issue submission guidelines. I don't see pd.show_versions()

@jreback
Contributor
jreback commented Mar 30, 2016

I suppose you are doing this?

In [1]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 14.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8

pandas: 0.18.0
nose: 1.3.7
pip: 8.0.3
setuptools: 20.2.2
Cython: 0.23.4
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: 0.7.2
IPython: 4.1.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.5.2
pytz: 2016.1
blosc: 1.2.8
bottleneck: 1.0.0
tables: None
numexpr: None
matplotlib: 1.5.0
openpyxl: 2.2.6
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.4.4
bs4: 4.3.2
html5lib: 0.999
httplib2: 0.9.2
apiclient: 1.4.1
sqlalchemy: 1.0.11
pymysql: 0.6.6.None
psycopg2: 2.6.1 (dt dec pq3 ext)
jinja2: 2.8
boto: 2.39.0
In [2]: df = DataFrame({'A' : [1,2,3]})

In [3]: df.query('A>0')
ImportError: 'numexpr' is not installed or an unsupported version. Cannot use engine='numexpr' for query/eval if 'numexpr' is not installed
In [4]: df.query('A>0',engine='python')
Out[4]: 
   A
0  1
1  2
2  3
@jreback
Contributor
jreback commented Mar 30, 2016

I guess that could be more friendly. Though maybe it should show a PerformanceWarning saying that you are trying to use a function that is intended for use with a highly recommended dep.
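
A rough sketch of what a friendlier default could look like, assuming the engine argument defaults to `None` and is resolved at call time. `resolve_engine` and its `numexpr_installed` parameter are hypothetical names for illustration, and the `PerformanceWarning` class here is a stand-in so the sketch runs without pandas; this is not the actual pandas implementation:

```python
import warnings


class PerformanceWarning(Warning):
    """Stand-in for pandas' PerformanceWarning (assumption for this sketch)."""


def resolve_engine(engine=None, numexpr_installed=False):
    """Pick the engine for query/eval (hypothetical helper).

    engine=None means the caller did not choose: use numexpr when it is
    available, otherwise fall back to 'python' and warn. An explicit
    engine='numexpr' without numexpr installed still raises ImportError.
    """
    if engine is None:
        if numexpr_installed:
            return "numexpr"
        warnings.warn(
            "numexpr is not installed; falling back to engine='python'",
            PerformanceWarning,
            stacklevel=2,
        )
        return "python"
    if engine == "numexpr" and not numexpr_installed:
        raise ImportError(
            "'numexpr' is not installed; cannot use engine='numexpr'"
        )
    return engine
```

With this shape, `df.query('A>0')` would work without numexpr (with a warning), while an explicit `engine='numexpr'` keeps the current error.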

@jreback jreback reopened this Mar 30, 2016
@sciunto
sciunto commented Mar 30, 2016

Yes I'm sure...

yaourt -Ss python-pandas
community/python-pandas 0.18.0-1 [installed]
    Cross-section and time series data analysis toolkit
=================================================================================== FAILURES ====================================================================================
__________________________________________________________________________________ test_query ___________________________________________________________________________________

    def test_query():
        df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [5, 6, 7, 8]})
        a = dd.from_pandas(df, npartitions=2)
>       q = a.query('x**2 > y')

dask/dataframe/tests/test_dataframe.py:1280:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
dask/dataframe/core.py:1467: in query
    dummy = self._pd.query(expr, **kwargs)
/usr/lib/python3.5/site-packages/pandas/core/frame.py:2140: in query
    res = self.eval(expr, **kwargs)
/usr/lib/python3.5/site-packages/pandas/core/frame.py:2209: in eval
    return _eval(expr, inplace=inplace, **kwargs)
/usr/lib/python3.5/site-packages/pandas/computation/eval.py:233: in eval
    _check_engine(engine)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

engine = 'numexpr'

    def _check_engine(engine):
        """Make sure a valid engine is passed.

        Parameters
        ----------
        engine : str

        Raises
        ------
        KeyError
          * If an invalid engine is passed
        ImportError
          * If numexpr was requested but doesn't exist
        """
        if engine not in _engines:
            raise KeyError('Invalid engine {0!r} passed, valid engines are'
                           ' {1}'.format(engine, list(_engines.keys())))

        # TODO: validate this in a more general way (thinking of future engines
        # that won't necessarily be import-able)
        # Could potentially be done on engine instantiation
        if engine == 'numexpr':
            if not _NUMEXPR_INSTALLED:
>               raise ImportError("'numexpr' is not installed or an "
                                  "unsupported version. Cannot use "
                                  "engine='numexpr' for query/eval "
                                  "if 'numexpr' is not installed")
E               ImportError: 'numexpr' is not installed or an unsupported version. Cannot use engine='numexpr' for query/eval if 'numexpr' is not installed

/usr/lib/python3.5/site-packages/pandas/computation/eval.py:39: ImportError
______________________________________________________________________________ test_to_hdf_kwargs _______________________________________________________________________________

    def test_to_hdf_kwargs():
        df = pd.DataFrame({'A': ['a', 'aaaa']})
        ddf = dd.from_pandas(df, npartitions=2)
>       ddf.to_hdf('tst.h5', 'foo4', format='table', min_itemsize=4)

dask/dataframe/tests/test_io.py:1028:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
dask/dataframe/core.py:532: in to_hdf
    fletcher32, get=get, **kwargs)
dask/dataframe/io.py:655: in to_hdf
    get=get, **dask_kwargs)
dask/base.py:43: in _get
    return get(dsk2, keys, **kwargs)
dask/async.py:516: in get_sync
    raise_on_exception=True, **kwargs)
dask/async.py:462: in get_async
    fire_task()
dask/async.py:458: in fire_task
    get_id, raise_on_exception])
dask/async.py:508: in apply_sync
    return func(*args, **kwds)
dask/async.py:264: in execute_task
    result = _execute_task(task, data)
dask/async.py:245: in _execute_task
    args2 = [_execute_task(a, cache) for a in args]
dask/async.py:245: in <listcomp>
    args2 = [_execute_task(a, cache) for a in args]
dask/async.py:246: in _execute_task
    return func(*args2)
dask/compatibility.py:26: in apply
    return func(*args, **kwargs)
/usr/lib/python3.5/site-packages/pandas/core/generic.py:1096: in to_hdf
    return pytables.to_hdf(path_or_buf, key, self, **kwargs)
/usr/lib/python3.5/site-packages/pandas/io/pytables.py:259: in to_hdf
    complib=complib) as store:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <[RecursionError("maximum recursion depth exceeded") raised in repr()] HDFStore object at 0x7f7298752198>, path = 'tst.h5', mode = 'a', complevel = 0, complib = None
fletcher32 = False, kwargs = {}

    def __init__(self, path, mode=None, complevel=None, complib=None,
                 fletcher32=False, **kwargs):
        try:
            import tables  # noqa
        except ImportError as ex:  # pragma: no cover
            raise ImportError('HDFStore requires PyTables, "{ex}" problem '
>                             'importing'.format(ex=str(ex)))
E           ImportError: HDFStore requires PyTables, "No module named 'tables'" problem importing

/usr/lib/python3.5/site-packages/pandas/io/pytables.py:389: ImportError
========================================================= 2 failed, 846 passed, 21 skipped, 5 xfailed in 375.39 seconds =========================================================
==> ERROR: A failure occurred in check().
    Aborting...

Please read my first post carefully.

@jreback
Contributor
jreback commented Mar 30, 2016

well, you can submit a pull-request if you would like.

I DID read your first post. we have 1600 issues.

@jreback jreback added this to the Next Major Release milestone Mar 30, 2016
@jreback jreback changed the title from [RFC] is numexpr really an optional dependency? to COMPAT: show warning when using numexpr not installed and defaulted Mar 30, 2016
@jreback jreback modified the milestone: 0.18.1, Next Major Release Apr 10, 2016
@jreback jreback added a commit to jreback/pandas that referenced this issue Apr 11, 2016
@jreback jreback COMPAT: .query/.eval should work w/o numexpr being installed if possible
closes #12749
closes #12864
d49c263
@jreback jreback closed this in 504ad46 Apr 11, 2016