### High-Performance Pandas: eval() and query()

• Pandas includes some experimental tools that let you directly access C-speed operations without the costly allocation of intermediate arrays.

• These are the eval() and query() functions, which rely on the Numexpr package.
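Here is a minimal sketch of what the two tools look like in use. The small DataFrame and its column names `a`, `b`, `c` are illustrative, not from the text. `pd.eval()` evaluates an expression string. `DataFrame.eval()` evaluates one in terms of a frame's own columns. `DataFrame.query()` keeps the rows where a boolean expression string is true. In current pandas versions these use Numexpr when it is installed and otherwise fall back to a slower pure-Python engine.

```python
import numpy as np
import pandas as pd

# Small illustrative DataFrame; the column names are arbitrary
df = pd.DataFrame({'a': [1.0, 2.0, 3.0, 4.0],
                   'b': [10.0, 20.0, 30.0, 40.0],
                   'c': [0.5, 0.5, 2.0, 2.0]})

# pd.eval(): names in the string resolve to variables in the calling scope
total = pd.eval('df.a + df.b')

# DataFrame.eval(): column names can be used directly in the expression
ratio = df.eval('(a + b) / c')

# DataFrame.query(): keep only rows where the boolean expression is True
subset = df.query('a > 1 and c < 1')

print(total.tolist())   # [11.0, 22.0, 33.0, 44.0]
print(ratio.tolist())   # [22.0, 44.0, 16.5, 22.0]
print(subset)           # only the row with index 1 (a=2.0, c=0.5)
```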

### pandas.eval() for Efficient Operations

• The pd.eval() function takes an expression written as a string and evaluates it efficiently on DataFrames.

• For example, consider the following DataFrames:

In [1]:
import numpy as np
import pandas as pd

In [5]:
nrows, ncols = 100000, 100
rng = np.random.RandomState(42)
# four large DataFrames of uniform random floats, each nrows x ncols
df1, df2, df3, df4 = (pd.DataFrame(rng.rand(nrows, ncols)) for _ in range(4))

• To compute the sum of all four DataFrames using the typical Pandas approach, we can simply write the sum:

In [7]:
%timeit df1 + df2 + df3 + df4

101 ms ± 4.72 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
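Part of this cost comes from evaluating the plain expression one `+` at a time, left to right. Pandas materializes a full-size intermediate DataFrame at each step. The sketch below spells out those equivalent steps on smaller stand-in frames and reports how much memory a temporary occupies. The 1,000 × 10 shape and the names `a`, `b`, `c`, `d`, `tmp1`, `tmp2` are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(42)
# Smaller stand-ins for df1..df4 so the sketch runs quickly
a, b, c, d = (pd.DataFrame(rng.rand(1000, 10)) for _ in range(4))

# `a + b + c + d` is evaluated as ((a + b) + c) + d:
tmp1 = a + b          # first full-size temporary
tmp2 = tmp1 + c       # second full-size temporary
result = tmp2 + d     # final result

# Each float64 frame holds rows * cols * 8 bytes of data
print(tmp1.values.nbytes)            # 80000
print(result.equals(a + b + c + d))  # True
```

At the sizes used above (100,000 × 100 float64 values), each such temporary occupies 100,000 × 100 × 8 bytes = 80 MB.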


• We can compute the same result via pd.eval by constructing the expression as a string:

In [8]:
%timeit pd.eval('df1 + df2 + df3 + df4')

53.1 ms ± 717 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

• The eval() version of this expression runs in roughly half the time (53.1 ms versus 101 ms in these runs) and uses much less memory, while giving the same result.
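To check that both forms give the same result, one can compare them with `np.allclose`. This is a tolerance-based comparison, which is the safer choice because a different evaluation engine is not guaranteed to produce bit-identical floating-point output. The sketch below is self-contained and uses small stand-in frames. The names `f1` to `f4` are illustrative and chosen so they do not overwrite the large `df1` to `df4`.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(42)
# Small stand-in frames with the same structure as df1..df4
f1, f2, f3, f4 = (pd.DataFrame(rng.rand(1000, 10)) for _ in range(4))

plain = f1 + f2 + f3 + f4                  # ordinary Pandas arithmetic
evaluated = pd.eval('f1 + f2 + f3 + f4')   # same expression as a string

print(np.allclose(plain, evaluated))  # True
```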
