Support for pandas.DataFrame.eval()? #42

Closed
gipert opened this issue Jan 2, 2024 · 3 comments

Comments

@gipert

gipert commented Jan 2, 2024

I've just run a quick test and it seems that pandas.DataFrame.eval() is not supported. Is that correct? Is there a way to evaluate string expressions on dataframes containing Awkward arrays?

@jpivarski
Collaborator

Is it an error message saying that it's not supported?

If Pandas is just running Python eval with column names loaded into the namespace (as the documentation suggests), then I don't see why those strings couldn't operate directly on Awkward Arrays.

Maybe the AwkwardSeries objects need to be unwrapped before passing to eval and the result needs to be re-wrapped? (This is a question for @douglasdavis.)

@gipert
Author

gipert commented Jan 2, 2024

> I don't see why those strings couldn't operate directly on Awkward Arrays.

Indeed that was my thinking. This is what happens:

>>> import awkward_pandas as akpd
>>> import pandas as pd
>>> import awkward as ak
>>> df = pd.DataFrame(
...     {
...         "a": [1, 2, 3, 4],
...         "b": akpd.from_awkward(ak.Array([[1, 2], [], [3], [4, 5, 6]]))
...     }
... )
>>> df.eval("b*2")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[10], line 1
----> 1 df.eval("b*2")

File ~/.virtualenvs/legend/lib/python3.11/site-packages/pandas/core/frame.py:4566, in DataFrame.eval(self, expr, inplace, **kwargs)
   4563     kwargs["target"] = self
   4564 kwargs["resolvers"] = tuple(kwargs.get("resolvers", ())) + resolvers
-> 4566 return _eval(expr, inplace=inplace, **kwargs)

File ~/.virtualenvs/legend/lib/python3.11/site-packages/pandas/core/computation/eval.py:336, in eval(expr, parser, engine, local_dict, global_dict, resolvers, level, target, inplace)
    327 # get our (possibly passed-in) scope
    328 env = ensure_scope(
    329     level + 1,
    330     global_dict=global_dict,
   (...)
    333     target=target,
    334 )
--> 336 parsed_expr = Expr(expr, engine=engine, parser=parser, env=env)
    338 if engine == "numexpr" and (
    339     is_extension_array_dtype(parsed_expr.terms.return_type)
    340     or getattr(parsed_expr.terms, "operand_types", None) is not None
   (...)
    344     )
    345 ):
    346     warnings.warn(
    347         "Engine has switched to 'python' because numexpr does not support "
    348         "extension array dtypes. Please set your engine to python manually.",
    349         RuntimeWarning,
    350         stacklevel=find_stack_level(),
    351     )

File ~/.virtualenvs/legend/lib/python3.11/site-packages/pandas/core/computation/expr.py:809, in Expr.__init__(self, expr, engine, parser, env, level)
    807 self.parser = parser
    808 self._visitor = PARSERS[parser](self.env, self.engine, self.parser)
--> 809 self.terms = self.parse()

File ~/.virtualenvs/legend/lib/python3.11/site-packages/pandas/core/computation/expr.py:828, in Expr.parse(self)
    824 def parse(self):
    825     """
    826     Parse an expression.
    827     """
--> 828     return self._visitor.visit(self.expr)

File ~/.virtualenvs/legend/lib/python3.11/site-packages/pandas/core/computation/expr.py:415, in BaseExprVisitor.visit(self, node, **kwargs)
    413 method = f"visit_{type(node).__name__}"
    414 visitor = getattr(self, method)
--> 415 return visitor(node, **kwargs)

File ~/.virtualenvs/legend/lib/python3.11/site-packages/pandas/core/computation/expr.py:421, in BaseExprVisitor.visit_Module(self, node, **kwargs)
    419     raise SyntaxError("only a single expression is allowed")
    420 expr = node.body[0]
--> 421 return self.visit(expr, **kwargs)

File ~/.virtualenvs/legend/lib/python3.11/site-packages/pandas/core/computation/expr.py:415, in BaseExprVisitor.visit(self, node, **kwargs)
    413 method = f"visit_{type(node).__name__}"
    414 visitor = getattr(self, method)
--> 415 return visitor(node, **kwargs)

File ~/.virtualenvs/legend/lib/python3.11/site-packages/pandas/core/computation/expr.py:424, in BaseExprVisitor.visit_Expr(self, node, **kwargs)
    423 def visit_Expr(self, node, **kwargs):
--> 424     return self.visit(node.value, **kwargs)

File ~/.virtualenvs/legend/lib/python3.11/site-packages/pandas/core/computation/expr.py:415, in BaseExprVisitor.visit(self, node, **kwargs)
    413 method = f"visit_{type(node).__name__}"
    414 visitor = getattr(self, method)
--> 415 return visitor(node, **kwargs)

File ~/.virtualenvs/legend/lib/python3.11/site-packages/pandas/core/computation/expr.py:537, in BaseExprVisitor.visit_BinOp(self, node, **kwargs)
    535 op, op_class, left, right = self._maybe_transform_eq_ne(node)
    536 left, right = self._maybe_downcast_constants(left, right)
--> 537 return self._maybe_evaluate_binop(op, op_class, left, right)

File ~/.virtualenvs/legend/lib/python3.11/site-packages/pandas/core/computation/expr.py:507, in BaseExprVisitor._maybe_evaluate_binop(self, op, op_class, lhs, rhs, eval_in_python, maybe_eval_in_python)
    504 res = op(lhs, rhs)
    506 if res.has_invalid_return_type:
--> 507     raise TypeError(
    508         f"unsupported operand type(s) for {res.op}: "
    509         f"'{lhs.type}' and '{rhs.type}'"
    510     )
    512 if self.engine != "pytables" and (
    513     res.op in CMP_OPS_SYMS
    514     and getattr(lhs, "is_datetime", False)
   (...)
    517     # all date ops must be done in python bc numexpr doesn't work
    518     # well with NaT
    519     return self._maybe_eval(res, self.binary_ops)

TypeError: unsupported operand type(s) for *: 'awkward' and '<class 'int'>'

@gipert
Author

gipert commented Jan 2, 2024

For context: I'm writing some code that evaluates algebraic expressions, read from a config file, on tables made of jagged and rectangular columns. I was hoping to write almost no code by using pandas.eval()...
