pd.eval() discards imaginary part in division "/" #21374

Open
fillipe-gsm opened this issue Jun 8, 2018 · 7 comments
Labels
Bug Complex Complex Numbers expressions pd.eval, query

Comments

@fillipe-gsm

Code Sample

import pandas as pd

data = {"a": [1 + 2j], "b": [1 + 1j]}
df = pd.DataFrame(data=data)
df.eval("a/b")

/usr/local/lib64/python3.6/site-packages/pandas/core/dtypes/cast.py:730: ComplexWarning: Casting complex values to real discards the imaginary part
  return arr.astype(dtype, copy=True)

0    1.0
dtype: float64

Problem description

The result is coerced to float64, which discards the imaginary part. The same happens when the result is assigned to an existing column:

data = {"a": [1 + 2j], "b": [1 + 1j], "c": [1j]}
df = pd.DataFrame(data = data)
df.eval("c = a/b")

/usr/local/lib64/python3.6/site-packages/pandas/core/dtypes/cast.py:730: ComplexWarning: Casting complex values to real discards the imaginary part
  return arr.astype(dtype, copy=True)
Out[82]: 
        a       b    c
0  (1+2j)  (1+1j)  1.0

It also happens when the operation is performed in place:

data = {"a": [1 + 2j], "b": [1 + 1j], "c": [1j]}
df = pd.DataFrame(data = data)
df.eval("c = a/b", inplace = True)
/usr/local/lib64/python3.6/site-packages/pandas/core/dtypes/cast.py:730: ComplexWarning: Casting complex values to real discards the imaginary part
  return arr.astype(dtype, copy=True)

df
        a       b    c
0  (1+2j)  (1+1j)  1.0

Expected Output

The expected output is

df["a"]/df["b"]

0    (1.5+0.5j)
dtype: complex128

The problem seems to happen only with the "/" operator. In fact, the correct result can be obtained by replacing the division with a multiplication and a negative exponent:

df.eval("a*b**(-1)")

0    (1.5+0.5j)
dtype: complex128
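
As a quick hand-check of the numbers above, here is a minimal sketch in plain Python (no pandas involved). It confirms that the division and the multiply-by-inverse workaround agree on 1.5+0.5j. The `real_only` line is an assumption about where the imaginary part gets lost (operands reduced to their real parts before dividing), added only because it reproduces the reported 1.0; the variable names are illustrative.

```python
import cmath

a = 1 + 2j
b = 1 + 1j

expected = a / b             # true complex division
workaround = a * b ** (-1)   # same value via multiplication by the inverse
real_only = a.real / b.real  # assumption: imaginary parts dropped before dividing

print(expected, workaround, real_only)

# The two complex results agree up to floating-point tolerance.
print(cmath.isclose(expected, workaround))
```

If that assumption holds, the 1.0 in the output is real(a)/real(b) rather than the real part of the correct quotient, which would be 1.5.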

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.16.12-300.fc28.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.utf8
LOCALE: en_US.UTF-8

pandas: 0.23.0
pytest: None
pip: 9.0.3
setuptools: 39.2.0
Cython: None
numpy: 1.14.4
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: None
patsy: 0.4.1
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.1
feather: None
matplotlib: 2.2.2
openpyxl: None
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: None
lxml: 4.1.1
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@gfyoung gfyoung added Bug Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Compat pandas objects compatability with Numpy or Python functions labels Jun 8, 2018
@gfyoung
Member

gfyoung commented Jun 8, 2018

@fillipe-gsm : How odd! Investigation and patch are welcome!

@uds5501
Contributor

uds5501 commented Jun 8, 2018

@gfyoung @fillipe-gsm Maybe the problem lies in a cast via numpy.imag (if we are using it). Have a look at its documentation here: https://docs.scipy.org/doc/numpy/reference/generated/numpy.imag.html

It also mentions: "If val is real, the type of val is used for the output. If val has complex elements, the returned type is float."
Could it be that the "/" operator triggers the type cast and "**(-1)" doesn't?

@fillipe-gsm
Author

@gfyoung @uds5501
I did a thorough line-by-line investigation starting in the pandas.core.computation.eval module, and I may have found a lead in the Div class inside the pandas.core.computation.ops module, which seems to exist specifically to handle divisions. Given that the odd behavior happens only with the "/" operator, this makes some sense. Since the code is short, here it is:

class Div(BinOp):

    """Div operator to special case casting.

    Parameters
    ----------
    lhs, rhs : Term or Op
        The Terms or Ops in the ``/`` expression.
    truediv : bool
        Whether or not to use true division. With Python 3 this happens
        regardless of the value of ``truediv``.
    """

    def __init__(self, lhs, rhs, truediv, *args, **kwargs):
        super(Div, self).__init__('/', lhs, rhs, *args, **kwargs)

        if not isnumeric(lhs.return_type) or not isnumeric(rhs.return_type):
            raise TypeError("unsupported operand type(s) for {0}:"
                            " '{1}' and '{2}'".format(self.op,
                                                      lhs.return_type,
                                                      rhs.return_type))

        if truediv or PY3:
            # do not upcast float32s to float64 un-necessarily
            acceptable_dtypes = [np.float32, np.float_]
            _cast_inplace(com.flatten(self), acceptable_dtypes, np.float_)

It seems that this class is simply not prepared to handle complex numbers, given the line acceptable_dtypes = [np.float32, np.float_]: any operand whose dtype is not in that list gets cast to np.float_. Thus, I tried replacing this line with acceptable_dtypes = [np.float32, np.float_, np.complex_, np.complex64]. Apparently, this solves the problem:

data = {"a": [1 + 2j], "b": [1 + 1j]}
df = pd.DataFrame(data = data)
df.eval("a/b")

0    (1.5+0.5j)
dtype: complex128
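
The effect of the allowlist can be illustrated outside pandas with a small NumPy sketch. `cast_operands` is a hypothetical stand-in written for this example, not pandas' `_cast_inplace`; it only mimics the idea of casting every operand whose dtype is not in `acceptable_dtypes`. It uses `np.float64` and `np.complex128` in place of the `np.float_` and `np.complex_` aliases, which were removed in NumPy 2.0.

```python
import warnings

import numpy as np


def cast_operands(operands, acceptable_dtypes, target=np.float64):
    """Hypothetical helper: keep operands whose dtype is acceptable,
    cast every other operand to ``target``."""
    result = []
    for arr in operands:
        if arr.dtype.type in acceptable_dtypes:
            result.append(arr)
        else:
            with warnings.catch_warnings():
                # Silence the "discards the imaginary part" warning.
                warnings.simplefilter("ignore")
                result.append(arr.astype(target))
    return result


a = np.array([1 + 2j])
b = np.array([1 + 1j])

float_only = [np.float32, np.float64]
with_complex = float_only + [np.complex64, np.complex128]

# Float-only allowlist: both operands lose their imaginary parts first.
lhs, rhs = cast_operands([a, b], float_only)
broken = lhs / rhs

# Extended allowlist: complex operands are left untouched.
lhs, rhs = cast_operands([a, b], with_complex)
fixed = lhs / rhs

print(broken, broken.dtype)
print(fixed, fixed.dtype)
```

With the float-only list the quotient comes out as a float64 array, and with the extended list it stays complex128, which matches the before and after behaviour reported here.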

What do you guys think?

@gfyoung
Member

gfyoung commented Jun 13, 2018

@fillipe-gsm : That seems reasonable to me. A PR for this would be great!

@jreback jreback added this to the 0.23.2 milestone Jun 15, 2018
@jorisvandenbossche jorisvandenbossche removed this from the 0.23.2 milestone Jun 21, 2018
@jbrockmendel jbrockmendel added the Numeric Operations Arithmetic, Comparison, and Logical operations label Aug 3, 2019
@jbrockmendel jbrockmendel added the expressions pd.eval, query label Oct 22, 2019
@mroeschke mroeschke removed Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Compat pandas objects compatability with Numpy or Python functions labels Apr 10, 2020
@mroeschke mroeschke removed the Numeric Operations Arithmetic, Comparison, and Logical operations label Jun 20, 2021
@jbrockmendel jbrockmendel added the Complex Complex Numbers label Oct 29, 2021
@mutricyl
Contributor

This is quite an old issue, but it was never solved, and it looks similar to issues I encountered while working on hgrecco/pint-pandas#137 and #58748:

This makes me question the use of the _cast_inplace function, which was introduced by cc1025a to resolve #12388. It looks like it was designed to handle only floats at the time. Now complex, ExtensionArray, and other types need to be handled as well.

A specific test was introduced at the time. It is now here:

def test_binop_typecasting(self, engine, parser, op, float_numpy_dtype, left_right):

I tried running this test with the use of _cast_inplace removed, and the test passed. Changes in numpy or in the expression backend seem to have made _cast_inplace unnecessary. When it is removed, complex and ExtensionArray operands are computed properly. @jreback @jennolsen84 do you have any opinion on the matter?

Since TypeError: unsupported operand type(s) (...) is also raised somewhere else, the Div class of pandas.core.computation.ops would no longer be necessary and we could remove it.

class Div(BinOp):

mutricyl pushed a commit to mutricyl/pandas that referenced this issue May 24, 2024
mutricyl pushed a commit to mutricyl/pandas that referenced this issue May 24, 2024
@mutricyl
Contributor

Note that the test pandas.tests.frame.test_query_eval.test_extension_array_eval, which may be added by #58793, would then need to be modified to pass (the resulting dtype is no longer float but that of a pandas.arrays.FloatingArray):

-    expected = Series([0.25, 0.40, 0.50])
+    expected = Series(pd.array([0.25, 0.40, 0.50]))

@mutricyl
Contributor

mutricyl commented Jun 4, 2024

I worked a bit on this and I am facing an issue when updating test_binop_typecasting to cover complex (it currently covers only floats, although it could be extended to int and complex): complex64 values are upcast to complex128 when using the numexpr engine.
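
For comparison, a minimal NumPy-only sketch of the same division on complex64 inputs. It does not exercise numexpr or pd.eval at all; it only shows the baseline dtype that plain NumPy produces, which is presumably what a dtype-preserving assertion in the test would expect.

```python
import numpy as np

a = np.array([1 + 2j], dtype=np.complex64)
b = np.array([1 + 1j], dtype=np.complex64)

quotient = a / b

# Plain NumPy keeps the single-precision complex dtype of its operands.
print(quotient.dtype)
```

As far as I know, numexpr documents a single double-precision complex type, which would explain the upcast; treat that as an assumption to verify rather than a confirmed cause.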
