# Memory profiling: Identify RAM bottlenecks

Memory is harder to profile than CPU usage because of garbage collection: an object that is no longer needed may not be freed until some later, unpredictable point. It is therefore better to look at the overall trend than at line-by-line numbers.
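As a rough illustration of looking at the overall trend, the sketch below uses the standard-library `tracemalloc` module (not `memory_profiler`, which is used further down) to read the peak and the still-allocated memory around a whole function call. The function `build_and_drop` and the item count are made up for this example.

```python
import gc
import tracemalloc


def build_and_drop(n_items):
    """Hypothetical workload: allocate a large temporary list, then drop it."""
    data = [float(i) for i in range(n_items)]
    total = sum(data)
    del data  # the list is unreachable from here on
    return total


tracemalloc.start()
build_and_drop(200_000)
gc.collect()  # force a collection so leftover garbage does not skew the reading
current_bytes, peak_bytes = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"still allocated: {current_bytes / 1024**2:.2f} MiB")
print(f"peak:            {peak_bytes / 1024**2:.2f} MiB")
```

The peak figure is usually the more useful one when hunting for RAM bottlenecks, since it does not depend on when exactly the temporary objects were released.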

Two scenarios:

1. You use too much memory and need to rewrite your code more carefully, e.g. by not storing unused intermediate variables.
2. You use very little memory and can speed up your code by caching.
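Both scenarios can be sketched with the standard library alone. The functions below are hypothetical toy examples, not taken from `prepare_dataset.py`: the first pair contrasts keeping an intermediate list with streaming values through a generator, and the last one trades a little memory for speed with `functools.lru_cache`.

```python
import functools
import tracemalloc


def sum_of_squares_list(n):
    squares = [i * i for i in range(n)]  # intermediate list is held in memory
    return sum(squares)


def sum_of_squares_stream(n):
    return sum(i * i for i in range(n))  # generator yields one value at a time


def peak_bytes_of(func, *args):
    """Return the peak traced memory (in bytes) while running func(*args)."""
    tracemalloc.start()
    func(*args)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


# Scenario 1: same result, lower peak memory without the intermediate list.
peak_list = peak_bytes_of(sum_of_squares_list, 100_000)
peak_stream = peak_bytes_of(sum_of_squares_stream, 100_000)
print(f"list: {peak_list} bytes, stream: {peak_stream} bytes")


# Scenario 2: spend spare memory on a cache to avoid repeated work.
@functools.lru_cache(maxsize=128)
def expensive_square(x):
    return x * x  # stand-in for a slow computation


expensive_square(12)  # computed and stored
expensive_square(12)  # served from the cache
cache_info = expensive_square.cache_info()
print(cache_info)
```

The `maxsize` argument bounds how much memory the cache can take, which keeps the second scenario from turning into the first.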


## Example of profiling prepare_dataset.py

In [4]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [5]:
from pathlib import Path
path = Path("../../../anomaly_detection/scripts/config/config.json")
absolute_path_config = path.absolute()

In [7]:
!python -m memory_profiler ../../../anomaly_detection/scripts/prepare_dataset.py --config {absolute_path_config}

2024-04-04 14:09:22,576 [INFO] __main__ pid=1089: Load labels from: 
	s3://sl-datascience-development/research/auto_diagnosis/failure_mode_labels/20240404.csv
2024-04-04 14:09:27,505 [INFO] __main__ pid=1089: Load scorers from: s3://sl-rds-export-production/pivoted_scores/sl-sensor-prod-db-kms-slsensor-export-2024-04-01-06-00-30
2024-04-04 14:17:53,143 [INFO] __main__ pid=1089: Build dataset
Filename: ../../../anomaly_detection/scripts/prepare_dataset.py

Line #    Mem usage    Increment  Occurrences   Line Contents
    44    286.1 MiB    286.1 MiB           1   @mem_profile
    45                                         def main(config):
    46                                         
    47                                         
    48    286.1 MiB      0.0 MiB           1       scorers_path = config["scorers_path"]
    49    286.1 MiB      0.0 MiB           1       failure_mode_labels_path = config["failure_mode_labels_path"]
    50                                         
    51 