In [3]:
%load_ext nb_black

## Tools to Speed Up Code

This section covers some tools to speed up your code.

### Fastai's df_shrink: Shrink DataFrame's Memory Usage in One Line of Code

Converting DataFrame columns to smaller data types can significantly reduce the DataFrame's memory usage. Instead of choosing smaller data types by hand, can you convert them automatically in one line of code?

That is when fastai's `df_shrink` function comes in handy. In the code below, `col1` is converted from `int64` to `int8` and `col2` from `float64` to `float32`, so the memory usage of the DataFrame decreases from 176 bytes to 143 bytes.

In [4]:
from fastai.tabular.core import df_shrink
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [1.0, 2.0, 3.0]})
print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   col1    3 non-null      int64  
 1   col2    3 non-null      float64
dtypes: float64(1), int64(1)
memory usage: 176.0 bytes
None


In [5]:
new_df = df_shrink(df)
print(new_df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   col1    3 non-null      int8   
 1   col2    3 non-null      float32
dtypes: float32(1), int8(1)
memory usage: 143.0 bytes
None


[Link to Fastai](https://docs.fast.ai/).
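
To make the idea of "smaller data types" concrete, here is a minimal sketch of the same kind of conversion using only pandas' `pd.to_numeric` with its `downcast` argument. It is not how `df_shrink` is implemented, and the helper name `downcast_numeric` is hypothetical, introduced only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [1.0, 2.0, 3.0]})


def downcast_numeric(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper (not part of fastai or pandas).

    Returns a copy in which each integer or float column is stored
    in the smallest dtype that pd.to_numeric can downcast it to.
    """
    out = frame.copy()
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_float_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="float")
    return out


small_df = downcast_numeric(df)
print(small_df.dtypes)

# Compare column memory only (index excluded) before and after downcasting.
print(df.memory_usage(index=False).sum(), "->", small_df.memory_usage(index=False).sum())
```

Smaller floats trade precision for memory, so check that `float32` is acceptable for your data before applying a conversion like this.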

### Swifter: Add One Word to Speed Up Your Pandas Apply

If you want a faster pandas `apply` when working with large data, try swifter. To use it, import swifter and add `.swifter` before `.apply`. Everything else stays the same.

In the code below, I compare the speed of pandas' `apply` with the speed of swifter's `apply` on the California housing dataset, which has 20,640 rows.

In [7]:
import timeit

import swifter  # importing swifter makes the .swifter accessor available
from scipy.special import boxcox1p
from sklearn.datasets import fetch_california_housing

X, y = fetch_california_housing(return_X_y=True, as_frame=True)


def pandas_apply():
    X["AveRooms"].apply(lambda x: boxcox1p(x, 0.25))


def swifter_apply():
    X["AveRooms"].swifter.apply(lambda x: boxcox1p(x, 0.25))


num_experiments = 100
pandas_time = timeit.timeit(pandas_apply, number=num_experiments)
swifter_time = timeit.timeit(swifter_apply, number=num_experiments)

pandas_vs_swifter = round(pandas_time / swifter_time, 2)
print(f"Swifter apply is {pandas_vs_swifter} times faster than Pandas apply")

Swifter apply is 16.82 times faster than Pandas apply


In this run, swifter's `apply` is 16.82 times faster than pandas' `apply`. The ratio compares the total run time of each method over 100 runs, which is equivalent to comparing their average run times. The exact number varies between runs and machines.
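
Part of a speedup like this can come from vectorization: swifter's documentation describes trying to run the function on the whole column at once before falling back to other strategies. The sketch below illustrates that idea with pandas and NumPy only, without swifter. The synthetic data and the `boxcox1p_numpy` helper are assumptions made for this example; the helper stands in for `scipy.special.boxcox1p` with a nonzero lambda.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for X["AveRooms"]; the values are made up for illustration.
rng = np.random.default_rng(0)
s = pd.Series(rng.uniform(1, 10, size=20_640))


def boxcox1p_numpy(x, lmbda=0.25):
    """Box-Cox transform of 1 + x for lmbda != 0, written with plain arithmetic."""
    return ((1 + x) ** lmbda - 1) / lmbda


def elementwise():
    # One Python-level function call per row.
    return s.apply(lambda x: boxcox1p_numpy(x))


def vectorized():
    # A single call that operates on the whole column at once.
    return boxcox1p_numpy(s)


# Both approaches should produce the same values.
assert np.allclose(elementwise(), vectorized())

elementwise_time = timeit.timeit(elementwise, number=10)
vectorized_time = timeit.timeit(vectorized, number=10)
print(f"Vectorized call is {elementwise_time / vectorized_time:.2f} times faster")
```

When a function cannot be applied to a whole column like this, a vectorized call is not an option, and any gain has to come from another strategy such as parallel processing.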