The built-in `eval()` function evaluates a Python expression written as a string.

In [None]:
eval("5*2*3/393")

# eval() is mainly used to evaluate expressions that arrive as strings,
# which lets you build and run Python expressions dynamically.

0.07633587786259542
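As a small aside on the built-in `eval()`, the sketch below evaluates the same expression and then passes an explicit namespace so the string can refer to named values. The names `width` and `height` are made up for illustration. Emptying `__builtins__` only narrows what the string can reach and should not be treated as a sandbox; for untrusted text that should contain only literals, `ast.literal_eval` is the safer standard-library option.

```python
import ast

# Same expression as in the cell above
result = eval("5*2*3/393")
print(result)

# Supply the names the expression may use (hypothetical values for illustration)
namespace = {"width": 4, "height": 2.5}
area = eval("width * height", {"__builtins__": {}}, namespace)
print(area)

# literal_eval accepts only Python literals (numbers, strings, lists, dicts, ...)
values = ast.literal_eval("[1, 2, 3]")
print(values)
```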

In [1]:
# 1. Import required libraries
import pandas as pd
import numpy as np

# 2. Set number of rows and columns
nrows, ncols = 100000, 100  # 100,000 rows and 100 columns

# 3. Create a random number generator for reproducibility
rng = np.random.RandomState(42)

# 4. Create 4 DataFrames filled with random values (same shape)
df1, df2, df3, df4 = (pd.DataFrame(rng.rand(nrows, ncols)) for _ in range(4))

# ----------------------------- #
#        NORMAL ADDITION        #
# ----------------------------- #

# 5. Add all 4 DataFrames using the regular '+' operator
# This is easy to read, but each '+' allocates a full intermediate DataFrame
# %timeit is an IPython magic that measures the execution time of a statement
%timeit df1 + df2 + df3 + df4
# Example output (timings vary by machine): 10 loops, best of 3: 87.1 ms per loop

# ----------------------------- #
#         EVAL ADDITION         #
# ----------------------------- #

# 6. Add all 4 DataFrames using pd.eval() with a string expression
# This can be faster and use less memory, because the whole expression is evaluated
# without materializing every intermediate DataFrame (the gain depends on NumExpr being installed)
%timeit pd.eval('df1 + df2 + df3 + df4')
# Example output (timings vary by machine): 10 loops, best of 3: 42.2 ms per loop

# ----------------------------- #
#      VALIDATION CHECK         #
# ----------------------------- #

# 7. Check if both methods give same result (should return True)
# np.allclose() checks if the two DataFrames have nearly the same values
print(np.allclose(df1 + df2 + df3 + df4, pd.eval('df1 + df2 + df3 + df4')))
# Output: True


89.2 ms ± 3.28 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
84.7 ms ± 774 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
True
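`%timeit` only works inside IPython or Jupyter. For a plain Python script, a rough equivalent can be sketched with the standard-library `timeit` module. The frame size here is an assumption, chosen smaller than the 100,000 x 100 frames above so the script finishes quickly, and the helper names `plain_sum` and `eval_sum` are illustrative. The names are passed to `pd.eval` through its `local_dict` argument so the lookup does not depend on the calling scope.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.RandomState(42)
# Four same-shaped frames, keyed by the names used in the expression string
frames = {f"df{i}": pd.DataFrame(rng.rand(10_000, 20)) for i in range(1, 5)}


def plain_sum():
    # Regular operators: evaluated pairwise, left to right
    return frames["df1"] + frames["df2"] + frames["df3"] + frames["df4"]


def eval_sum():
    # Same sum as a string; local_dict hands the names to pd.eval explicitly
    return pd.eval("df1 + df2 + df3 + df4", local_dict=frames)


# Best of 3 repeats, 5 calls each, reported per call
plain_best = min(timeit.repeat(plain_sum, number=5, repeat=3)) / 5
eval_best = min(timeit.repeat(eval_sum, number=5, repeat=3)) / 5
print(f"plain: {plain_best * 1e3:.2f} ms, pd.eval: {eval_best * 1e3:.2f} ms")
```

Which variant wins, and by how much, depends on the frame size, the machine, and whether NumExpr is installed, so treat the printed numbers as a local measurement only.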




### 🧠 What is `pd.eval()` doing?

* **`pd.eval()`** evaluates a string expression involving pandas objects.
* It is **faster** than regular operations in some cases because:

  * It **avoids temporary objects** in memory.
  * It uses **NumExpr**, a fast numerical expression evaluator, under the hood when that library is installed.
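The two points above can be sketched as follows. The first part spells out the temporary that plain operators create; the second evaluates the same expression with `engine="python"`, which is always available, and with the default engine, which uses NumExpr when pandas can import it. The `find_spec` check is only an approximation of that decision, since pandas may also require a minimum NumExpr version.

```python
import importlib.util

import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
a, b, c = (pd.DataFrame(rng.rand(1000, 5)) for _ in range(3))
frames = {"a": a, "b": b, "c": c}

# What `a + b + c` does with plain operators: one full-size temporary per '+'
tmp = a + b
plain = tmp + c

# engine="python" evaluates the string like ordinary Python (no NumExpr speedup)
via_python = pd.eval("a + b + c", engine="python", local_dict=frames)

# The default engine prefers NumExpr when it is importable
has_numexpr = importlib.util.find_spec("numexpr") is not None
via_default = pd.eval("a + b + c", local_dict=frames)

print("NumExpr installed:", has_numexpr)
print(np.allclose(plain, via_python), np.allclose(plain, via_default))
```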

---

### ✅ Use Cases for `pd.eval()`

1. **Big DataFrames**: When you’re working with large datasets and want to **speed up** arithmetic or logical operations.
2. **Memory Efficiency**: Useful if you want to **save RAM** during computations.
3. **Cleaner Code**: You can perform multiple operations in one line using a string.

---

### ⚠️ Note

* Use `pd.eval()` mostly for **arithmetic or logical** operations.
* Avoid for complex operations that require Python functions or custom logic.
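One way to follow this note, sketched under the assumption that only part of a formula needs a Python function: compute that part in ordinary pandas/NumPy code first, then leave only arithmetic inside the string. The variable name `sin_df1` is illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(1)
df1, df2 = (pd.DataFrame(rng.rand(100, 3)) for _ in range(2))

# Do the function call outside the string, in regular NumPy/pandas code
sin_df1 = np.sin(df1)

# Keep only arithmetic inside the expression passed to pd.eval
names = {"sin_df1": sin_df1, "df2": df2}
combined = pd.eval("sin_df1 * 2 + df2", local_dict=names)

# Cross-check against the all-pandas version
expected = np.sin(df1) * 2 + df2
print(np.allclose(combined, expected))
```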



In [2]:
import pandas as pd
import numpy as np

# 1. Create a random number generator
rng = np.random.RandomState(42)

# 2. Create 5 DataFrames with random integers between 0 and 1000
df1, df2, df3, df4, df5 = (pd.DataFrame(rng.randint(0, 1000, (100, 3))) for _ in range(5))

# ------------------------ Arithmetic Operators ------------------------

# 3. Normal way to do arithmetic operations on DataFrames
result_arith_normal = -df1 * df2 / (df3 + df4) - df5

# 4. Using pd.eval() to evaluate the same operation from a string (any speed or memory gain shows up only on large frames)
result_arith_eval = pd.eval('-df1 * df2 / (df3 + df4) - df5')

# 5. Check if results are the same
print("Arithmetic check:", np.allclose(result_arith_normal, result_arith_eval))  # True

# ------------------------ Comparison Operators ------------------------

# 6. Regular way to compare values in DataFrames
result_comp_normal = (df1 < df2) & (df2 <= df3) & (df3 != df4)

# 7. Using pd.eval() with chained comparisons
result_comp_eval = pd.eval('df1 < df2 <= df3 != df4')

# 8. Check if results are the same
print("Comparison check:", np.allclose(result_comp_normal, result_comp_eval))  # True

# ------------------------ Bitwise Operators ------------------------

# 9. Regular bitwise logic
result_bitwise_normal = (df1 < 0.5) & (df2 < 0.5) | (df3 < df4)

# 10. Same expression using pd.eval()
result_bitwise_eval = pd.eval('(df1 < 0.5) & (df2 < 0.5) | (df3 < df4)')

# 11. Check if results are the same
print("Bitwise check:", np.allclose(result_bitwise_normal, result_bitwise_eval))  # True

# ------------------------ Boolean keywords ------------------------

# 12. Using `and`/`or` inside the eval string (with plain DataFrames these keywords raise a ValueError)
result_boolean_eval = pd.eval('(df1 < 0.5) and (df2 < 0.5) or (df3 < df4)')

# 13. Check if it matches previous result
print("Boolean check:", np.allclose(result_bitwise_normal, result_boolean_eval))  # True

# ------------------------ Attribute and Index Access ------------------------

# 14. Access attributes and index values normally
result_index_normal = df2.T[0] + df3.iloc[1]

# 15. Same using pd.eval
result_index_eval = pd.eval('df2.T[0] + df3.iloc[1]')

# 16. Check if results are the same
print("Attribute/Index access check:", np.allclose(result_index_normal, result_index_eval))  # True

# ------------------------ Unsupported Operations ------------------------

# 17. Examples outside what pd.eval() is documented to support (do not rely on these)
# result_error = pd.eval('np.sin(df1)')  # ❌ General function calls are documented as unsupported
# result_error = pd.eval('for i in range(3): df1 + i')  # ❌ Loops not allowed
# result_error = pd.eval('if df1[0][0] > 10: df1')  # ❌ Conditionals not allowed

print("\nAll pd.eval() tests passed successfully!")


Arithmetic check: True
Comparison check: True
Bitwise check: True
Boolean check: True
Attribute/Index access check: True

All pd.eval() tests passed successfully!


What This Shows:

✅ pd.eval() handles arithmetic, comparisons, bitwise logic, boolean keywords, and attribute/index access.

🚫 It cannot run loops or conditional statements, and general function calls are documented as unsupported.

🛠 Useful for speeding up complex expressions involving large DataFrames.
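To see the statement restriction in practice, the sketch below passes a `for` loop to `pd.eval()` and catches whatever error comes back, then does the same work with an ordinary Python loop. The exact exception type may differ between pandas versions, so the code catches `Exception` and only reports the type name.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(6).reshape(3, 2))

# pd.eval() accepts expressions, not statements such as loops
try:
    pd.eval("for i in range(3): df1 + i", local_dict={"df1": df1})
    rejected = False
except Exception as exc:
    rejected = True
    print(f"pd.eval rejected the loop: {type(exc).__name__}")

# Keep control flow in Python and use plain operators (or pd.eval) per step
shifted = [df1 + i for i in range(3)]
print(len(shifted))
```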

In [3]:
import pandas as pd
import numpy as np

# 1. Create a random number generator with a fixed seed for reproducibility
rng = np.random.RandomState(42)

# 2. Create a DataFrame with 1000 rows and 3 columns named 'A', 'B', and 'C'
df = pd.DataFrame(rng.rand(1000, 3), columns=['A', 'B', 'C'])

# 3. Traditional way: Calculate a new result using values from columns A, B, and C
#    Formula: (A + B) / (C - 1)
result1 = (df['A'] + df['B']) / (df['C'] - 1)

# 4. Same expression using pd.eval(), accessing columns with dot notation (df.A)
#    This works, but the df. prefix must be repeated for every column
result2 = pd.eval("(df.A + df.B) / (df.C - 1)")

# 5. Even cleaner and shorter: Use DataFrame.eval() — directly use column names as variables
#    This avoids writing df['A'] or df.A repeatedly
result3 = df.eval("(A + B) / (C - 1)")

# 6. Confirm all three approaches give the same result using np.allclose() — checks if values are almost equal
print("Column-wise expression check:", np.allclose(result1, result2) and np.allclose(result1, result3))  # True

# 7. Now let's assign a new column 'D' in the DataFrame using df.eval()
#    This adds a new column: D = (A + B) / C
#    Set inplace=True to apply the result directly to the original DataFrame
df.eval("D = (A + B) / C", inplace=True)
print("After creating column D:\n", df.head(1))  # Check first row to see new column D

# 8. Modify the existing column 'D' using a new formula
#    Here, we overwrite the previous 'D' with: D = (A - B) / C
df.eval("D = (A - B) / C", inplace=True)
print("After modifying column D:\n", df.head(1))  # Check updated values in column D

# 9. Now define a local Python variable (outside the DataFrame)
#    Despite its name, column_mean holds the mean across columns for each row
column_mean = df.mean(axis=1)  # axis=1 means row-wise mean

# 10. Add this local variable to column A using traditional syntax
result4 = df['A'] + column_mean

# 11. Use df.eval() to do the same operation — but now refer to the Python variable using '@'
#     The '@' symbol tells Pandas to look for a Python variable, not a column
result5 = df.eval("A + @column_mean")

# 12. Confirm that both methods produce the same result
print("Local variable with @ check:", np.allclose(result4, result5))  # True


Column-wise expression check: True
After creating column D:
          A         B         C         D
0  0.37454  0.950714  0.731994  1.810472
After modifying column D:
          A         B         C        D
0  0.37454  0.950714  0.731994 -0.78713
Local variable with @ check: True
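The cell above always passes `inplace=True`. As a complementary sketch, the default behaviour of `df.eval()` with an assignment is to return a new DataFrame and leave the original untouched. The small frame and the local variable `scale` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0], "C": [2.0, 4.0]})
scale = 10.0  # hypothetical local variable, referenced with '@' in the string

# Without inplace=True, df.eval returns a copy that includes the new column
with_d = df.eval("D = (A + B) / C * @scale")

print(list(df.columns))       # original frame still has only A, B, C
print(with_d["D"].tolist())   # [20.0, 15.0]
```

Returning a copy is convenient when the original frame should stay unchanged for later steps.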
