### Estimating memory footprint for Random Forest Models

We break down the memory usage $M$ of a process running a Random Forest model as $M = B + T$, where $B$ is the baseline memory overhead of any process running in a Jupyter Notebook and $T$ is the memory consumed by the Random Forest tree structure. In this analysis, $T$ is the memory used by a Random Forest of 100 trees with an average depth of 21 branches.

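Using the `top` command to observe running processes, we find that $M$ is about $0.5$ GB for a 100-tree model and about $1.5$ GB for a 500-tree model. Assuming the tree memory scales linearly with the number of trees, the 500-tree model uses $5T$, which leads to the following system of equations.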

\begin{align}
B + T &= 0.5 \\
B + 5T &= 1.5
\end{align}

Subtracting the first equation from the second eliminates $B$ and gives:
\begin{align}
4T &= 1 \\
T &= \frac{1}{4} = 0.25
\end{align}

This indicates that a 100-tree forest with an average depth of 21 takes about $0.25$ GB of memory, and the baseline memory $B$ is

\begin{equation}
B = 0.5 - T = 0.5 - 0.25 = 0.25 \text{ GB}
\end{equation}
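
The elimination above can be reproduced in a few lines of Python. This is a minimal sketch rather than part of the original analysis: the function name `fit_memory_model` and its parameters are hypothetical, and it assumes that memory grows linearly with the number of trees at a fixed depth.

```python
def fit_memory_model(trees_a, mem_a_gb, trees_b, mem_b_gb, trees_per_unit=100):
    """Solve B + (n / trees_per_unit) * T = M from two (trees, memory) observations.

    Returns (B, T): baseline memory and memory per `trees_per_unit` trees, in GB.
    Hypothetical helper; assumes linear scaling with the number of trees.
    """
    units_a = trees_a / trees_per_unit
    units_b = trees_b / trees_per_unit
    if units_a == units_b:
        raise ValueError("the two observations must use different tree counts")
    # Subtracting one equation from the other eliminates B.
    t = (mem_b_gb - mem_a_gb) / (units_b - units_a)
    # Back-substitute into the first equation to recover B.
    b = mem_a_gb - units_a * t
    return b, t


# The two `top` readings from the text: 0.5 GB at 100 trees, 1.5 GB at 500 trees.
baseline_gb, per_100_trees_gb = fit_memory_model(100, 0.5, 500, 1.5)
print(baseline_gb, per_100_trees_gb)  # 0.25 0.25
```

With only two observations the fit is exact, so any measurement noise in the `top` readings carries straight into $B$ and $T$.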

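From this we can estimate the memory footprint of other Random Forest model structures. Assuming memory grows linearly with the number of trees at this depth, a forest of $n$ trees needs about $M(n) = 0.25 + 0.25 \cdot \frac{n}{100}$ GB.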

In [12]:
import pandas as pd
# M = B + T * (trees / 100), with B = 0.25 GB and T = 0.25 GB per 100 trees
pd.DataFrame([{'trees': x*100, 'memory_size_gb': 0.25 + 0.25*x} for x in range(1,11)])

,trees,memory_size_gb
0,100,0.5
1,200,0.75
2,300,1.0
3,400,1.25
4,500,1.5
5,600,1.75
6,700,2.0
7,800,2.25
8,900,2.5
9,1000,2.75
