## Numerical dtypes and precision to improve performance

By the end of this lecture you will be able to:
- get the upper and lower bounds you can represent at a given precision
- estimate the size of a `DataFrame` in memory
- compare the effect of working with 32-bit and 64-bit representations

In this lecture we examine the effect of varying the numerical precision on computational speed, memory usage and accuracy. In some use cases lowering the precision can be a simple way of improving performance and reducing memory usage.

In [1]:
import polars as pl
import numpy as np

We create a simple `DataFrame` to see the default dtypes for integers and floats

In [2]:
df = pl.DataFrame(
    {
        "ints":[0,1,2],
        "floats":[0.0,1,2]
    }
)
df

ints,floats
i64,f64
0,0.0
1,1.0
2,2.0


Polars defaults to 64-bit representations for both integers and floats. In this notebook we examine the effect of varying the numerical precision.

## Integers

Polars has the following integer types:
| dtype | Precision (bits) | Signed |
|-----------|------------------|--------|
| Int8      | 8                | Yes    |
| Int16     | 16               | Yes    |
| Int32     | 32               | Yes    |
| Int64     | 64               | Yes    |
| UInt8     | 8                | No     |
| UInt16    | 16               | No     |
| UInt32    | 32               | No     |
| UInt64    | 64               | No     |


Unsigned integers can only hold `0` and positive values. Polars uses them for things like row indexes.

Polars generates an `Exception` if we try to cast a negative integer to an unsigned integer dtype.

## Constraints of lower precision
With a lower precision the range of values we can represent is smaller.
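
The range follows directly from the number of bits. The sketch below computes it in plain Python, assuming the usual two's complement layout for signed integers; `int_bounds` is a hypothetical helper for illustration and not part of Polars.

```python
def int_bounds(bits: int, signed: bool = True) -> tuple[int, int]:
    """Return (lower, upper) for an integer stored in `bits` bits."""
    if signed:
        # One bit encodes the sign, leaving bits - 1 for the magnitude
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    # All bits encode the magnitude, so the range starts at 0
    return 0, 2**bits - 1


for bits in (8, 16, 32, 64):
    print(bits, int_bounds(bits), int_bounds(bits, signed=False))
```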

The `upper_bound` and `lower_bound` expressions show the maximum and minimum values that can be represented at a given precision.

In [3]:
pl.Config.set_fmt_str_lengths(100)
df_ints = pl.DataFrame({"ints": [1, 2, 3]})
(
    df_ints
    .select(
        [
            pl.col("ints").upper_bound().alias("pl.Int64_upper"),
            pl.col("ints").cast(pl.Int32).upper_bound().alias("pl.Int32_upper"),
            pl.col("ints").cast(pl.Int16).upper_bound().alias("pl.Int16_upper"),
            pl.col("ints").cast(pl.Int8).upper_bound().alias("pl.Int8_upper"),
            
            pl.col("ints").lower_bound().alias("pl.Int64_lower"),
            pl.col("ints").cast(pl.Int32).lower_bound().alias("pl.Int32_lower"),
            pl.col("ints").cast(pl.Int16).lower_bound().alias("pl.Int16_lower"),
            pl.col("ints").cast(pl.Int8).lower_bound().alias("pl.Int8_lower"),
        ]
    )
    .melt()
    .sort("variable")
)

variable,value
str,i64
"""pl.Int16_lower""",-32768
"""pl.Int16_upper""",32767
"""pl.Int32_lower""",-2147483648
"""pl.Int32_upper""",2147483647
"""pl.Int64_lower""",-9223372036854775808
"""pl.Int64_upper""",9223372036854775807
"""pl.Int8_lower""",-128
"""pl.Int8_upper""",127


If we try to cast a value outside of the valid range Polars raises an `Exception` - uncomment the following code to test this

In [None]:
# (
#     pl.DataFrame(
#         {'values':[126,127,128]}
#     )
#     .with_columns(
#         pl.col("values").cast(pl.Int8).alias("values_Int8")
#     )
# )

## Floats
Polars has the following floating point types:

`Float32`: 32-bit floating point

`Float64`: 64-bit floating point

`Decimal`: 128-bit decimal with a fixed precision and scale (a fixed-point type rather than a binary floating point type)

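The `Decimal` dtype is still experimental and may not work with all expressions or methods. To use `Decimal` you must first activate it. Note that the cast in the following cell gives the dtype `decimal[38,0]`, which has a scale of 0 and so keeps no digits after the decimal point: `2.578` is stored as `2`.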

In [5]:
pl.Config.activate_decimals()
df_floats = (
    pl.DataFrame(
        {
            "floats_64":[0.0,1,2.578]
        }
    )
    .with_columns(
        floats_32 = pl.col("floats_64").cast(pl.Float32),
        decimal = pl.col("floats_64").cast(pl.Decimal),
    )
)
df_floats

floats_64,floats_32,decimal
f64,f32,"decimal[38,0]"
0.0,0.0,0
1.0,1.0,1
2.578,2.578,2
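
The `floats_32` column displays the same values as `floats_64`, but the stored numbers can differ slightly because a 32-bit float keeps fewer significant digits. The sketch below illustrates this with the standard library `struct` module rather than Polars; `to_float32` is a hypothetical helper that rounds a Python float (64-bit) to the nearest 32-bit float.

```python
import struct


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


x64 = 2.578
x32 = to_float32(x64)

print(repr(x32))       # the value actually stored at 32-bit precision
print(abs(x32 - x64))  # the rounding error introduced by the cast
```

Values such as `0.0` and `1.0` are represented exactly at both precisions, so only some values pick up a rounding error.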


## A dtype diet
Polars creates integer and float columns as 64-bit by default. Polars can check if the actual data in a column can fit in a lower precision dtype and cast the column to that dtype with the `shrink_dtype` expression.

Here we create a `DataFrame` with columns that could potentially be cast to a lower-precision dtype and then call `shrink_dtype`

In [6]:
(
    pl.DataFrame(
         {
             "a": [1, 2, 3],
             "b": [1, 2, 2**31],
             "c": [-1, 2, 2**30],
             "d": [-112, 2, 112],
             "e": [-112, 2, 129],
             "f": [0.1, 1.32, 0.12],
         }
     )
    .select(
        pl.all().shrink_dtype()
    )
)

a,b,c,d,e,f
i8,i64,i32,i8,i16,f32
1,1,-1,-112,-112,0.1
2,2,2,2,2,1.32
3,2147483648,1073741824,112,129,0.12


We see that:
- the small numbers in `a` can go to 8-bits
- the last value in `b` means it must stay 64-bit
- the last value in `c` is within range for 32-bit
- the positive and negative values in `d` are in range for 8-bits
- the last value in `e` is too large for 8-bit so is 16-bit
- the values in `f` can be cast to 32-bit

Note that floats are always cast to 32-bits by this expression
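
The integer results above can be reproduced with a simple range check. This is a plain Python sketch of the idea and not how Polars implements `shrink_dtype`; `smallest_signed_bits` is a hypothetical helper and it only considers the signed dtypes that appear in the output.

```python
SIGNED_BITS = (8, 16, 32, 64)


def smallest_signed_bits(values: list[int]) -> int:
    """Return the smallest signed width whose range covers all the values."""
    lo, hi = min(values), max(values)
    for bits in SIGNED_BITS:
        if -(2 ** (bits - 1)) <= lo and hi <= 2 ** (bits - 1) - 1:
            return bits
    raise OverflowError("values do not fit in 64 bits")


columns = {
    "a": [1, 2, 3],
    "b": [1, 2, 2**31],
    "c": [-1, 2, 2**30],
    "d": [-112, 2, 112],
    "e": [-112, 2, 129],
}
result = {name: smallest_signed_bits(vals) for name, vals in columns.items()}
print(result)
```

Column `b` shows why the check uses the exact bounds: `2**31` is one more than the largest 32-bit signed integer, so it needs 64 bits.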

## Effects of moving to lower precision

### Size in memory
We get the estimated size in bytes of a `DataFrame` with `estimated_size`. We can pass the `unit` argument to change from e.g. bytes to kilobytes

In [7]:
df = pl.DataFrame(
    {
        "ints":[0,1,2],
        "floats":[0.0,1,2]
    }
)
df.estimated_size(unit="b")

48

We compare this size with a `DataFrame` where both columns are cast to 32-bit representations

In [8]:
(
    df
    .with_columns(
        [
            pl.col("ints").cast(pl.Int32),
            pl.col("floats").cast(pl.Float32),
        ]
    )
    .estimated_size(unit="b")
)

24

Moving from 64-bit to 32-bit representations halves memory usage. Moving to 16-bit or 8-bit integer representations reduces memory usage in the same proportion.
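
The sizes reported above match a simple count of rows times bytes per value. The sketch below is plain arithmetic rather than a Polars call, and `column_bytes` is a hypothetical helper; it ignores any extra bookkeeping such as a validity mask for missing values.

```python
def column_bytes(n_rows: int, bits: int) -> int:
    """Bytes needed for the values of one numeric column."""
    return n_rows * bits // 8


n_rows = 3

# Two columns (ints and floats) at 64-bit and at 32-bit
size_64 = 2 * column_bytes(n_rows, 64)
size_32 = 2 * column_bytes(n_rows, 32)
print(size_64, size_32)  # 48 24

# The integer column alone at each available width
int_sizes = {bits: column_bytes(n_rows, bits) for bits in (64, 32, 16, 8)}
print(int_sizes)
```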

### Computational speed
The effect of lower precision on computational speed is not as simple.

We explore the effect of reduced precision by creating a larger `DataFrame` of random values

In [9]:
N_rows = 1_000_000
N_columns = 10
df_num = pl.DataFrame(np.random.standard_normal((N_rows,N_columns)))
df_num.head(2)

column_0,column_1,column_2,column_3,column_4,column_5,column_6,column_7,column_8,column_9
f64,f64,f64,f64,f64,f64,f64,f64,f64,f64
-0.308592,0.269884,-0.231066,0.65407,-1.052772,-1.28532,-1.770431,-1.610772,0.347842,1.39213
-0.741942,-0.507648,2.177605,0.884115,-0.245917,0.473476,-1.178441,-0.073685,-0.353683,-0.516298


These columns all have dtype `pl.Float64`

For comparison we create a new `DataFrame` where we cast all values to 32-bit

In [10]:
df_num_32 = (
        df_num
        .select(
            pl.all().cast(pl.Float32)
        )
)
df_num_32.head(2)

column_0,column_1,column_2,column_3,column_4,column_5,column_6,column_7,column_8,column_9
f32,f32,f32,f32,f32,f32,f32,f32,f32,f32
-0.308592,0.269884,-0.231066,0.65407,-1.052772,-1.28532,-1.770431,-1.610772,0.347842,1.39213
-0.741942,-0.507648,2.177605,0.884115,-0.245917,0.473476,-1.178441,-0.073685,-0.353683,-0.516298


### Memory usage at lower precision
The 32-bit `DataFrame` uses half as much memory

In [11]:
print(f"64-bit DataFrame: {round(df_num.estimated_size(unit='mb'))} Mb")
print(f"32-bit DataFrame: {round(df_num_32.estimated_size(unit='mb'))} Mb")

64-bit DataFrame: 76 Mb
32-bit DataFrame: 38 Mb
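
We can sanity-check these figures by hand. The sketch below assumes that one Mb means `1024**2` bytes; this is inferred from the reported values, since `1000**2` would give 80 and 40 instead.

```python
N_rows = 1_000_000
N_columns = 10


def size_mb(bytes_per_value: int, bytes_per_mb: int = 1024**2) -> float:
    """Size of N_rows x N_columns values in Mb (assumed 1024**2 bytes)."""
    return N_rows * N_columns * bytes_per_value / bytes_per_mb


mb_64 = round(size_mb(8))  # 8 bytes per 64-bit float
mb_32 = round(size_mb(4))  # 4 bytes per 32-bit float
print(mb_64, mb_32)  # 76 38
```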


### Computational speed at lower precision

Some calculations are faster with 32-bit data. To time the computation we use the IPython `timeit` magic.

We start a cell we want to time with `%%timeit`. By default `timeit` does multiple runs of multiple loops where it runs the computation each time to produce an estimate of mean time taken and the standard deviation of time taken. 

In many cases, however, the default number of runs and loops is more than we really need. We control the number of iterations with `-r` for the number of runs and `-n` for the number of loops in each run.

In [12]:
%%timeit -n1 -r3
(
    2 + 4
)

133 ns ± 47.1 ns per loop (mean ± std. dev. of 3 runs, 1 loop each)
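
Outside a notebook, a similar measurement can be sketched with the standard library `timeit` module, where `repeat` plays the role of the number of runs and `number` the loops per run. The summary statistics here are computed by hand and may not match exactly how IPython reports them.

```python
import statistics
import timeit

# 3 runs of 1 loop each, mirroring `%%timeit -n1 -r3`
run_times = timeit.repeat(stmt="2 + 4", repeat=3, number=1)

mean_s = statistics.mean(run_times)
std_s = statistics.pstdev(run_times)
print(f"{mean_s * 1e9:.0f} ns ± {std_s * 1e9:.0f} ns per loop")
```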


In this example we compare performance of 64-bit and 32-bit data where we:
- subtract the mean of each column and 
- divide by the standard deviation
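
To make the two steps concrete, here is a small sketch in plain Python on a single list of values. It uses the sample standard deviation (`statistics.stdev`), which matches the default `ddof=1` of the Polars `std` expression; `standardise` is a hypothetical helper.

```python
import statistics


def standardise(values: list[float]) -> list[float]:
    """Subtract the mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (ddof=1)
    return [(v - mean) / std for v in values]


z = standardise([2.0, 4.0, 6.0])
print(z)
```

After this transformation the values have a mean of 0 and a standard deviation of 1.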

In [13]:
%%timeit -n1
(
    df_num
    .select( 
        (pl.all()-pl.all().mean())/(pl.all().std())
    )
)

48.3 ms ± 7.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [14]:
%%timeit -n1 
(
    df_num_32
    .select( 
        (pl.all()-pl.all().mean())/(pl.all().std())
    )
)

28.7 ms ± 3.07 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


On my machine the 32-bit operation takes about 40% less time than the 64-bit operation. Operations at 32-bit are not always much faster, however: the difference depends on the transformations applied. Try it on your own data and transformations.

### Difference in outputs?
We can check the size of the differences between the outputs

In [15]:
output64 = (
    df_num
    .select( 
        (pl.all()-pl.all().mean())/(pl.all().std())
    )
)
output32 = (
    df_num_32
    .select( 
        (pl.all()-pl.all().mean())/(pl.all().std())
    )
)

We can see the size of the differences in the first two rows

In [16]:
(output64 - output32).head(2)

column_0,column_1,column_2,column_3,column_4,column_5,column_6,column_7,column_8,column_9
f64,f64,f64,f64,f64,f64,f64,f64,f64,f64
-2.1903e-08,4.4616e-09,8.9332e-09,8.8894e-08,2.7257e-10,4.0757e-08,-7.2966e-08,5.4628e-09,3.456e-09,-1.4043e-07
-2.4988e-08,4.2169e-08,1.2214e-09,7.145e-08,5.164e-09,-5.7038e-09,-2.5651e-08,-1.2623e-09,2.8966e-08,2.4521e-08


The overall maximum difference in this case is of order `10^-6` or smaller

In [17]:
(output64 - output32).max_horizontal().max()

8.144334255888452e-07

Before moving to a lower precision always **check that the size of the difference between outputs is negligible** for your analysis!
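
One way to make this check explicit is to compare the largest absolute difference against a tolerance chosen for the analysis. The sketch below uses NumPy rather than Polars, and the `tolerance` value is a hypothetical choice, not a recommendation. Taking the absolute value matters because a large negative difference would be missed by a plain maximum.

```python
import numpy as np

rng = np.random.default_rng(0)
x64 = rng.standard_normal(100_000)
x32 = x64.astype(np.float32)


def standardise(x: np.ndarray) -> np.ndarray:
    """Subtract the mean and divide by the sample standard deviation."""
    return (x - x.mean()) / x.std(ddof=1)


z64 = standardise(x64)
z32 = standardise(x32)  # stays at 32-bit precision throughout

max_abs_diff = float(np.abs(z64 - z32).max())
tolerance = 1e-4  # hypothetical threshold, set this for your own analysis
print(max_abs_diff, max_abs_diff < tolerance)
```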

Moving to a lower precision than 32-bit does not always lead to faster performance. Many CPUs do not have native support for 8-bit and 16-bit operations, so they emulate them with 32-bit operations and the performance gains are lost. See the exercises for an example of lowering precision below 32-bit for integers.

## Exercises

In the exercises you will develop your understanding of:
- getting the upper and lower bounds for a dtype
- getting the estimated size of a `DataFrame`
- comparing performance between different precisions 

### Exercise 1
We create a `DataFrame` with 10 columns of random integers between 1 and 9 (the upper limit passed to `np.random.randint` is exclusive)

In [18]:
N_rows = 1_000_000
N_columns = 10
df_ints_64 = pl.DataFrame(np.random.randint(1,10,(N_rows,N_columns)))
df_ints_64.head(2)

column_0,column_1,column_2,column_3,column_4,column_5,column_6,column_7,column_8,column_9
i32,i32,i32,i32,i32,i32,i32,i32,i32,i32
4,3,7,2,3,4,7,8,1,3
4,2,2,3,6,4,8,5,1,3


Create a `DataFrame` called `df_ints_8` where all the values in `df_ints_64` are cast to `pl.Int8`

In [None]:
df_ints_8 = (
    <blank>
)


Compare the size of these `DataFrames` in memory in Mb

In [None]:
print(f"64-bit DataFrame: {} Mb")
print(f"8-bit DataFrame: {} Mb")

Compare how long it takes to do a cumulative sum on all the columns of the `DataFrames`

In [None]:
%%timeit -n1
(
    df_ints_64
)

In [None]:
%%timeit -n1
(
    df_ints_8
)

Compare how long it takes at 16- and 32-bit precision.

Which precision is fastest?

## Solutions

### Solution to exercise 1
We create a `DataFrame` with 10 columns of random integers between 1 and 9 (the upper limit passed to `np.random.randint` is exclusive)

In [None]:
N_rows = 1_000_000
N_columns = 10
df_ints_64 = pl.DataFrame(np.random.randint(1,10,(N_rows,N_columns)))
df_ints_64.head(2)

Create a `DataFrame` called `df_ints_8` where all the values in `df_ints_64` are cast to `pl.Int8`

In [None]:
df_ints_8 = (
    df_ints_64
    .select(
        pl.all().cast(pl.Int8)
    )
)


Compare the size of these `DataFrames` in memory in Mb

In [None]:
print(f"64-bit DataFrame: {round(df_ints_64.estimated_size(unit='mb'))} Mb")
print(f"8-bit DataFrame: {round(df_ints_8.estimated_size(unit='mb'))} Mb")

Compare how long it takes to do a cumulative sum on all the columns of the `DataFrames`

In [None]:
%%timeit -n1
(
    df_ints_64
    .select( 
        pl.all().cum_sum()
    )
)

In [None]:
%%timeit -n1
(
    df_ints_8
    .select( 
        pl.all().cum_sum()
    )
)

Compare how long it takes at 16- and 32-bit precision.

Which precision is fastest?

In [None]:
df_ints_16 = (
    df_ints_64
    .select(
        pl.all().cast(pl.Int16)
    )
)
df_ints32 = (
    df_ints_64
    .select(
        pl.all().cast(pl.Int32)
    )
)


In [None]:
%%timeit -n1
(
    df_ints_16
    .select( 
        pl.all().cum_sum()
    )
)

In [None]:
%%timeit -n1
(
    df_ints32
    .select( 
        pl.all().cum_sum()
    )
)

Many CPUs do not have native support for 8-bit and 16-bit calculations and so calculations at these precisions may not be faster than at 32-bit.