In [3]:
import pandas as pd
import numpy as np

## Motivating query() and eval(): Compound Expressions

NumPy's universal functions (ufuncs) are much faster than equivalent Python loops or list comprehensions, because the loop over the elements runs in compiled code.

In [7]:
rng = np.random.RandomState(42)
x = rng.rand(int(1E6))
y = rng.rand(int(1E6))
%timeit x + y

1.76 ms ± 56.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
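To make the speed difference concrete outside IPython (where `%timeit` is unavailable), the sketch below computes the same element-wise sum with a list comprehension and with the `+` ufunc, then times both with the standard-library `timeit` module. The helper names `loop_sum` and `ufunc_sum`, the array size and the repeat count are arbitrary choices for illustration; absolute timings depend on the machine.

```python
import timeit
import numpy as np

rng = np.random.RandomState(42)
x = rng.rand(100_000)
y = rng.rand(100_000)

def loop_sum(a, b):
    # Pure-Python version: one interpreted iteration per element
    return np.array([ai + bi for ai, bi in zip(a, b)])

def ufunc_sum(a, b):
    # Vectorized version: the element loop runs in compiled code inside np.add
    return a + b

# Both approaches produce the same values
assert np.allclose(loop_sum(x, y), ufunc_sum(x, y))

t_loop = timeit.timeit(lambda: loop_sum(x, y), number=5)
t_ufunc = timeit.timeit(lambda: ufunc_sum(x, y), number=5)
print(f"loop:  {t_loop / 5 * 1e3:.2f} ms per call")
print(f"ufunc: {t_ufunc / 5 * 1e3:.2f} ms per call")
```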


However, ufuncs become less efficient when computing compound expressions such as the following:

In [8]:
mask = (x > 0.5) & (y < 0.5)

Because NumPy evaluates each subexpression, this is roughly equivalent to the
following:

In [9]:
tmp1 = (x > 0.5)
tmp2 = (y < 0.5)
mask = tmp1 & tmp2

In other words, every intermediate step is explicitly allocated in memory. If the x and y arrays are very large, this can lead to significant memory and computational overhead.
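As a rough worked example of that overhead, assuming the one-million-element arrays used above: each comparison produces a boolean array with one byte per element, so the two temporaries alone hold about 2 MB before the final `&` allocates a third array. The `nbytes` attribute reports these sizes directly.

```python
import numpy as np

rng = np.random.RandomState(42)
x = rng.rand(int(1E6))
y = rng.rand(int(1E6))

tmp1 = (x > 0.5)    # boolean temporary, 1 byte per element
tmp2 = (y < 0.5)    # second boolean temporary
mask = tmp1 & tmp2  # third allocation for the final result

print(x.nbytes)                   # 8000000 (float64: 8 bytes per element)
print(tmp1.nbytes + tmp2.nbytes)  # 2000000 held by the two temporaries
print(mask.nbytes)                # 1000000 for the result itself
```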

- The Numexpr library can compute this type of compound expression element by element, without allocating full-sized intermediate arrays.

In [13]:
import numexpr
mask_numexpr = numexpr.evaluate('(x > 0.5) & (y < 0.5)')
np.allclose(mask, mask_numexpr)

True

- The benefit here is that Numexpr evaluates the expression in a way that does not use full-sized temporary arrays, and thus can be much more efficient than NumPy, especially for large arrays.
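The idea of avoiding full-sized temporaries can be sketched in plain NumPy by evaluating the expression one block at a time, so that every intermediate array is only block-sized. This is a simplified illustration of the strategy, not how Numexpr itself is implemented (it compiles the expression into a program for its own virtual machine); the helper `blockwise_mask` and the block size of 4096 are hypothetical choices for this example, and the sketch assumes 1-D inputs.

```python
import numpy as np

def blockwise_mask(x, y, block=4096):
    """Hypothetical helper: compute (x > 0.5) & (y < 0.5) block by block.

    Assumes 1-D arrays of equal length. Only the output is full-sized;
    the temporaries created inside the loop hold at most `block` elements.
    """
    out = np.empty(x.shape, dtype=bool)
    for start in range(0, x.size, block):
        stop = start + block
        xs, ys = x[start:stop], y[start:stop]  # slices are views, no copy
        # Block-sized temporaries, written straight into the output slice
        np.logical_and(xs > 0.5, ys < 0.5, out=out[start:stop])
    return out

rng = np.random.RandomState(42)
x = rng.rand(int(1E6))
y = rng.rand(int(1E6))

mask = (x > 0.5) & (y < 0.5)
print(np.array_equal(mask, blockwise_mask(x, y)))  # True
```

The block size trades per-iteration Python overhead against the size of each temporary; larger blocks mean fewer loop iterations but bigger intermediate arrays.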

## pandas.eval() for Efficient Operations

- The eval() function in pandas uses string expressions to efficiently compute operations on DataFrames.

In [14]:
rng = np.random.RandomState(42)
nrows, ncols = 10000, 100
df1, df2, df3, df4 = (pd.DataFrame(rng.rand(nrows, ncols)) for i in range(4))

In [15]:
%timeit df1 + df2 + df3 + df4

8.93 ms ± 289 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [16]:
%timeit pd.eval('df1 + df2 + df3 + df4')

6.22 ms ± 190 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


- The result is the same (verified in the next cell), and in this run eval() took about 30% less time (6.22 ms versus 8.93 ms).

In [18]:
np.allclose(df1 + df2 + df3 + df4, pd.eval('df1 + df2 + df3 + df4'))

True
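The speedup comes from the evaluation backend: when the `engine` argument of pd.eval() is left at its default, pandas uses Numexpr if it is installed and otherwise falls back to plain Python evaluation. A minimal check, using smaller frames than above, is that forcing `engine='python'` gives the same values; only the speed should differ.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(42)
df1, df2, df3, df4 = (pd.DataFrame(rng.rand(1000, 10)) for _ in range(4))

# Default engine: Numexpr when available, otherwise Python
default_result = pd.eval('df1 + df2 + df3 + df4')
# Force the pure-Python engine, which evaluates like ordinary pandas code
python_result = pd.eval('df1 + df2 + df3 + df4', engine='python')

print(np.allclose(default_result, python_result))  # True
print(type(default_result).__name__)               # DataFrame
```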

### Operations supported by pd.eval()

In [22]:
df1, df2, df3, df4, df5 = (pd.DataFrame(rng.randint(0, 1000, (100, 3))) for i in range(5))

- Arithmetic operators. pd.eval() supports all arithmetic operators.

In [24]:
result1 = -df1 * df2 / (df3 + df4) + df5
result2 = pd.eval('-df1 * df2 / (df3 + df4) + df5')
np.allclose(result1, result2)

True
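The expression string is not limited to the operators used in the cell above. As a small sketch on hypothetical float frames `a` and `b`, exponentiation and scalar constants can be mixed in, and the result matches the same expression written with ordinary pandas operators:

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
a = pd.DataFrame(rng.rand(5, 3))
b = pd.DataFrame(rng.rand(5, 3))

# Exponentiation and scalar literals inside the expression string
via_eval = pd.eval('a ** 2 - 3 * b / 2')
# The same computation written with ordinary pandas operators
direct = a ** 2 - 3 * b / 2

print(np.allclose(via_eval, direct))  # True
```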

- Comparison operators. pd.eval() supports all comparison operators, including chained expressions:


In [30]:
result1 = (df1 < df2) & (df2 <= df3) & (df3 != df4)
result2 = pd.eval("df1 < df2 <= df3 != df4")
np.allclose(result1, result2)

True
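To see what the chained form means, here is a small example with hand-picked values (the frames `a`, `b` and `c` are hypothetical): `a < b <= c` follows Python's chained-comparison semantics, so it is evaluated element by element as `(a < b) & (b <= c)`.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'v': [1, 5, 2]})
b = pd.DataFrame({'v': [2, 4, 3]})
c = pd.DataFrame({'v': [3, 6, 3]})

chained = pd.eval('a < b <= c')
explicit = (a < b) & (b <= c)

print(chained['v'].tolist())                             # [True, False, True]
print(np.array_equal(chained.values, explicit.values))  # True
```

Row 1 is False because `5 < 4` fails, even though `4 <= 6` holds.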

- Bitwise operators. pd.eval() supports the & and | bitwise operators:

In [31]:
result1 = (df1 < 0.5) & (df2 < 0.5) | (df3 < df4)
result2 = pd.eval('(df1 < 0.5) & (df2 < 0.5) | (df3 < df4)')
np.allclose(result1, result2)

True

In addition, it supports the literal keywords "and" and "or" in Boolean expressions:

In [32]:
result3 = pd.eval('(df1 < 0.5) and (df2 < 0.5) or (df3 < df4)')
np.allclose(result1, result3)

True
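One consequence worth knowing: with the default `parser='pandas'`, pd.eval() gives `&` and `|` the same precedence as `and` and `or`, so comparisons bind tighter and the parentheses used above are optional. Under plain Python precedence, `a < 0.5 & b < 0.5` would instead group `0.5 & b` first. The sketch below checks this on small hypothetical frames.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'v': [0.1, 0.7, 0.2]})
b = pd.DataFrame({'v': [0.3, 0.1, 0.9]})

with_parens = pd.eval('(a < 0.5) & (b < 0.5)')
# The default pandas parser treats & like `and`, so the comparisons run first
without_parens = pd.eval('a < 0.5 & b < 0.5')

print(with_parens['v'].tolist())                                  # [True, False, False]
print(np.array_equal(with_parens.values, without_parens.values))  # True
```

Passing `parser='python'` instead keeps strict Python precedence, in which case the parentheses are required.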

- Object attributes and indices. pd.eval() supports access to object attributes via the obj.attr syntax and to indices via the obj[index] syntax:

In [38]:
result1 = df2.T[0] + df3.iloc[1]
result2 = pd.eval('df2.T[0] + df3.iloc[1]')
np.allclose(result1, result2)

True
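To spell out what that expression computes, here is a hand-checkable version on a hypothetical 2-by-3 frame `df`: `df.T[0]` selects column 0 of the transpose (that is, row 0 of `df`), `df.iloc[1]` selects row 1, and the result is their element-wise sum.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3],
                   [10, 20, 30]])

# Attribute access (.T) and indexing ([0], .iloc[1]) inside the expression
via_eval = pd.eval('df.T[0] + df.iloc[1]')
direct = df.T[0] + df.iloc[1]

print(via_eval.tolist())  # [11, 22, 33]
```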

## DataFrame.eval() for Column-Wise Operations