
Memory profiling #152

Closed
TomTranter opened this issue Apr 21, 2022 · 41 comments

@TomTranter
Collaborator

@wigging could you give this a try: https://github.com/bloomberg/memray

@wigging
Collaborator

wigging commented Apr 21, 2022

Already working on it. I ran memory-profiler and need to collect the results. I'll try memray too. Any particular pack configurations or parameters you want me to run?

@TomTranter
Collaborator Author

Not really, just see how it scales with the number of processors.

@wigging
Collaborator

wigging commented Apr 27, 2022

Below are some memory profiling results. I used the Memray and memory_profiler tools to log the memory usage during various pack simulations. Information about the system I ran the tests on is here:

```
OS: Ubuntu 20.04.4 LTS x86_64
CPU: Intel Xeon E5-2680 v3 (20) @ 2.499GHz 
GPU: NVIDIA Tesla K40m 
Memory: 582MiB / 58269MiB
```

Small pack tests, 16p2s, 1 cycle

Memory tests for the pack_16p2s.py example using the Casadi and Ray solvers with 2, 5, 10, and 20 CPU cores and 58 GB of RAM. Experiment is a single charge/discharge cycle with a period of 10 seconds. Results are from the Memray profiler.

| Run | Cores | Solver | Total memory |
| --- | --- | --- | --- |
| 1 | 2 | casadi | 309 MB |
| 2 | 5 | casadi | 278 MB |
| 3 | 10 | casadi | 247 MB |
| 4 | 20 | casadi | 248 MB |
| 5 | 2 | ray | 16.9 GB |
| 6 | 5 | ray | 17.0 GB |
| 7 | 10 | ray | 16.9 GB |
| 8 | 20 | ray | 16.9 GB |

Memory tests for the pack_16p2s.py example using the Casadi and Ray solvers with 20 CPU cores and 58 GB of RAM. Experiment is a single charge/discharge cycle with a period of 10 seconds. Results are from the Memray profiler.

| Run | Cores | Solver | Total memory |
| --- | --- | --- | --- |
| 1 | 20 | casadi | 280 MB |
| 2 | 20 | casadi | 312 MB |
| 3 | 20 | casadi | 280 MB |
| 4 | 20 | ray | 16.9 GB |
| 5 | 20 | ray | 17.1 GB |
| 6 | 20 | ray | 17.0 GB |

Small pack tests, 16p2s, 10 cycles

Memory tests for the pack_16p2s.py example using the Casadi and Ray solvers with 20 CPU cores and 58 GB of RAM. Experiment is a single charge/discharge cycle with a period of 10 seconds. The experiment is repeated ten times (10 cycles). Results are from the Memray profiler.

| Run | Cores | Solver | Total memory |
| --- | --- | --- | --- |
| 1 | 20 | casadi | 316 MB |
| 2 | 20 | casadi | 284 MB |
| 3 | 20 | casadi | 316 MB |
| 4 | 20 | ray | 16.9 GB |
| 5 | 20 | ray | 17.0 GB |
| 6 | 20 | ray | 16.9 GB |

Small pack tests, 16p2s, 10 cycles, SEI degradation

Memory tests for the pack_16p2s_sei.py example using the Casadi and Ray solvers with 20 CPU cores and 58 GB of RAM. Experiment is a single charge/discharge cycle with a period of 10 seconds. The experiment is repeated ten times (10 cycles). SEI degradation is implemented for capacity loss. Results are from the Memray profiler.

| Run | Cores | Solver | Total memory |
| --- | --- | --- | --- |
| 1 | 20 | casadi | 329 MB |
| 2 | 20 | casadi | 329 MB |
| 3 | 20 | casadi | 330 MB |
| 4 | 20 | ray | 17.0 GB |
| 5 | 20 | ray | 16.9 GB |
| 6 | 20 | ray | 17.0 GB |

Medium pack tests, 32p20s, 1 cycle

Memory tests for the pack_32p20s.py example using the Casadi and Ray solvers with 2, 5, 10, and 20 CPU cores and 58 GB of RAM. Experiment is a single charge/discharge cycle with a period of 10 seconds. Results are from the Memray profiler.

| Run | Cores | Solver | Total memory |
| --- | --- | --- | --- |
| 2 | 5 | casadi | 353 MB |
| 3 | 10 | casadi | 352 MB |
| 4 | 20 | casadi | 320 MB |
| 6 | 5 | ray | 16.9 GB |
| 7 | 10 | ray | 16.9 GB |
| 8 | 20 | ray | 17.0 GB |

Results from the memory_profiler tool.

| Run | Cores | Solver | Total memory |
| --- | --- | --- | --- |
| 2 | 5 | casadi | 372 MiB |
| 3 | 10 | casadi | 371 MiB |
| 4 | 20 | casadi | 371 MiB |
| 6 | 5 | ray | 258 MiB |
| 7 | 10 | ray | 258 MiB |
| 8 | 20 | ray | 259 MiB |

Megapack tests, 1 cycle

Memory tests for the megapack.py example using the Casadi and Ray solvers with 20 CPU cores and 58 GB of RAM. Experiment is a single charge/discharge cycle and period is set to 30 seconds. Results are from the Memray profiler. Memray crashed on the Casadi runs.

| Run | Cores | Solver | Total memory |
| --- | --- | --- | --- |
| 1 | 20 | casadi | x |
| 2 | 20 | casadi | x |
| 3 | 20 | casadi | x |
| 4 | 20 | ray | 16.9 GB |
| 5 | 20 | ray | 16.8 GB |
| 6 | 20 | ray | 705 MB |

@wigging
Collaborator

wigging commented Apr 27, 2022

Memray does not give reliable results for the ray simulations. It seems to be stuck at 16-17 GB of memory usage. Not sure why. I checked htop during the simulations and it was in line with the memory_profiler results. So I should redo the Memray tests with the memory_profiler tool, or figure out how to properly use Memray with ray.
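
As a follow-up idea, Memray also has a library API (a `Tracker` context manager) that can capture just a chosen block of the driver script instead of the whole `memray run` invocation. A minimal sketch, assuming the installed Memray version exposes `memray.Tracker`; the output file name is illustrative, and allocations made inside separate Ray worker processes would still not be captured:

```python
import memray
import numpy as np

# Capture allocations made by this block only; the .bin file can then be
# turned into a report with `memray flamegraph driver_solve.bin`.
with memray.Tracker("driver_solve.bin"):
    # Stand-in workload: replace with the lp.solve(...) call being profiled.
    data = [np.random.random((640, 100)) for _ in range(10)]
```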

@TomTranter
Collaborator Author

Thanks @wigging, can you plot memory over time for a medium pack for 1000 cycles?

@wigging
Collaborator

wigging commented Apr 27, 2022

Memray produces a huge log file and will probably use up all my storage space for 1000 cycles. Might have a similar problem with memory_profiler. If I can adjust the logging frequency of these tools then it will be feasible for a 1000 cycle simulation. Otherwise, I'll have to use something like psutil to log memory usage during a simulation.
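
If the logging frequency of those tools can't be reduced enough, a lightweight fallback would be sampling total memory with psutil from a background thread. A rough sketch (not liionpack code; the one-second interval and CSV file name are arbitrary, and only the parent process is sampled):

```python
import csv
import threading
import time

import psutil


def log_memory(path="memory_log.csv", interval=1.0, stop_event=None):
    """Append (elapsed seconds, RSS in MiB) rows until stop_event is set."""
    proc = psutil.Process()
    start = time.time()
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["time_s", "rss_mib"])
        while stop_event is None or not stop_event.is_set():
            writer.writerow([round(time.time() - start, 1),
                             round(proc.memory_info().rss / 2**20, 1)])
            f.flush()
            time.sleep(interval)


stop = threading.Event()
logger = threading.Thread(target=log_memory, kwargs={"stop_event": stop}, daemon=True)
logger.start()
# ... run the pack simulation here ...
time.sleep(5)  # stand-in for the simulation
stop.set()
logger.join()
```

For the ray runs, summing the RSS of `proc.children(recursive=True)` as well would also capture the worker processes.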

@TomTranter
Collaborator Author

We would probably see a trend with fewer cycles. Going from 1 to 10 cycles seems to have increased memory by about 20 MB for the small pack. It would be interesting to see the increase for the medium pack.

@wigging
Collaborator

wigging commented Apr 29, 2022

Did more tests with the memory_profiler tool for the small pack. I also have some medium pack results which I'll share in the next comment.

Small pack tests, 16p2s, 1 cycle

Memory_profiler results for pack_16p2s.py. Experiment is a single charge/discharge cycle. Period is 10 seconds. Top plot is casadi and bottom plot is ray.

| Run | Cores | Solver | Total memory |
| --- | --- | --- | --- |
| 1 | 20 | casadi | 285 MiB |
| 2 | 20 | ray | 245 MiB |

[Screenshot: memory_profiler plot, casadi solver]

[Screenshot: memory_profiler plot, ray solver]

Small pack tests, 16p2s, 10 cycles

Memory_profiler results for pack_16p2s.py. Experiment is a single charge/discharge for 10 cycles. Period is 10 seconds. Top plot is casadi and bottom plot is ray.

| Run | Cores | Solver | Total memory |
| --- | --- | --- | --- |
| 1 | 20 | casadi | 325 MiB |
| 2 | 20 | ray | 253 MiB |

[Screenshot: memory_profiler plot, casadi solver]

[Screenshot: memory_profiler plot, ray solver]

@wigging
Collaborator

wigging commented Apr 29, 2022

Here are the medium pack results.

Medium pack test, 32p20s, 100 cycles

Memory_profiler results for pack_32p20s.py at 20 second intervals. Experiment is a single charge/discharge for 100 cycles. Period is 30 seconds. Top plot is casadi and bottom plot is ray.

| Run | Cores | Solver | Total memory |
| --- | --- | --- | --- |
| 1 | 20 | casadi | 872 MiB |
| 2 | 20 | ray | 758 MiB |

[Screenshot: memory_profiler plot, casadi solver]

[Screenshot: memory_profiler plot, ray solver]

Medium pack test, 32p20s, 1000 cycles

Memory_profiler results for pack_32p20s.py at 20 second intervals. Experiment is a single charge/discharge for 1000 cycles. Period is 30 seconds. Top plot is casadi and bottom plot is ray.

| Run | Cores | Solver | Total memory |
| --- | --- | --- | --- |
| 1 | 20 | casadi | 5.78 GB |
| 2 | 20 | ray | 5.66 GB |

[Screenshot: memory_profiler plot, casadi solver]

[Screenshot: memory_profiler plot, ray solver]

@TomTranter
Collaborator Author

Thanks @wigging, does it break down which variables are contributing at all?

@wigging
Collaborator

wigging commented Apr 30, 2022

The memory_profiler tool just gives total memory over time and it works with the Casadi and Ray liionpack solvers. Memray shows how the memory is allocated but I can't get it to work properly with the Ray solver. So my next test is to use Memray with the Casadi solver and see if I can tell what is increasing the memory.

@wigging
Collaborator

wigging commented May 2, 2022

Here are more results using the memory_profiler tool. Compare these to the Memray results in the next comment.

Medium pack test, 32p20s, 10 cycles

Memory_profiler results for pack_32p20s.py at 10 second intervals. Experiment is a single charge/discharge for 10 cycles. Period is 30 seconds. Plot is casadi results.

| Run | Cores | Solver | Total memory |
| --- | --- | --- | --- |
| 1 | 20 | casadi | 405 MiB |

[Screenshot: memory_profiler plot, casadi solver]

@wigging
Collaborator

wigging commented May 2, 2022

Here are results from the Memray profiler.

Medium pack test, 32p20s, 10 cycles

Memray results for pack_32p20s.py. Experiment is a single charge/discharge for 10 cycles. Period is 30 seconds. Memray ran out of storage space for experiments with more than 10 cycles.

| Run | Cores | Solver | Total memory |
| --- | --- | --- | --- |
| 1 | 20 | casadi | 357 MB |

Memray stats results.

```
📏 Total allocations:
        146343

📦 Total memory allocated:
        356.778MB

📊 Histogram of allocation size:
        min: 1.000B
        -------------------------------------------
        < 5.000B   :   49 ▇
        < 33.000B  : 1864 ▇▇▇▇▇▇
        < 191.000B : 2395 ▇▇▇▇▇▇▇
        < 1.076KB  : 8954 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 6.206KB  : 6295 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 35.766KB :  751 ▇▇▇
        < 206.112KB:   88 ▇
        < 1.160MB  :  398 ▇▇
        < 6.684MB  :    7 ▇
        <=38.521MB :    6 ▇
        -------------------------------------------
        max: 38.521MB

📂 Allocator type distribution:
         MALLOC: 19830
         REALLOC: 731
         MMAP: 177
         CALLOC: 70

🥇 Top 5 largest allocating locations (by size):
        - get:/home/cades/miniconda3/envs/lipack/lib/python3.9/site-packages/casadi/casadi.py:6354 -> 38.521MB
        - _horzcat:/home/cades/miniconda3/envs/lipack/lib/python3.9/site-packages/casadi/casadi.py:20190 -> 33.018MB
        - spsolve:/home/cades/miniconda3/envs/lipack/lib/python3.9/site-packages/scipy/sparse/linalg/_dsolve/linsolve.py:203 -> 32.000MB
        - dot:<__array_function__ internals>:180 -> 32.000MB
        - solve:/home/cades/liionpack/liionpack/solvers.py:185 -> 30.777MB

🥇 Top 5 largest allocating locations (by number of allocations):
        - __init__:/home/cades/miniconda3/envs/lipack/lib/python3.9/site-packages/pybamm/solvers/solution.py:93 -> 17822
        - integrator:/home/cades/miniconda3/envs/lipack/lib/python3.9/site-packages/casadi/casadi.py:14436 -> 12788
        - load_default_certs:/home/cades/miniconda3/envs/lipack/lib/python3.9/ssl.py:575 -> 10094
        - get:/home/cades/miniconda3/envs/lipack/lib/python3.9/site-packages/casadi/casadi.py:6354 -> 8962
        - _horzcat:/home/cades/miniconda3/envs/lipack/lib/python3.9/site-packages/casadi/casadi.py:20190 -> 7680
```

Memray tree results.

```
Allocation metadata
-------------------
Command line arguments: 'pack_32p20s.py'
Peak memory size: 356.778MB
Number of allocations: 528176676

Biggest 10 allocations:
-----------------------
📂 203.743MB (100.00 %) <ROOT>                                                                                                                             
└── [[3 frames hidden in 2 file(s)]]                                                                                                                       
    └── 📂 203.743MB (100.00 %) _run_code  /home/cades/miniconda3/envs/lipack/lib/python3.9/runpy.py:87                                                    
        ├── [[1 frames hidden in 1 file(s)]]                                                                                                               
        │   └── 📂 198.737MB (97.54 %) solve  /home/cades/liionpack/liionpack/solver_utils.py:425                                                          
        │       ├── [[2 frames hidden in 1 file(s)]]                                                                                                       
        │       │   └── 📂 77.018MB (37.80 %) step  /home/cades/liionpack/liionpack/solvers.py:81                                                          
        │       │       ├── [[2 frames hidden in 2 file(s)]]                                                                                               
        │       │       │   └── 📄 38.521MB (18.91 %) get  /home/cades/miniconda3/envs/lipack/lib/python3.9/site-packages/casadi/casadi.py:6354            
        │       │       ├── [[5 frames hidden in 3 file(s)]]                                                                                               
        │       │       │   └── 📄 33.018MB (16.21 %) _horzcat  /home/cades/miniconda3/envs/lipack/lib/python3.9/site-packages/casadi/casadi.py:20190      
        │       │       └── [[2 frames hidden in 2 file(s)]]                                                                                               
        │       │           └── 📄 5.480MB (2.69 %) call  /home/cades/miniconda3/envs/lipack/lib/python3.9/site-packages/casadi/casadi.py:12324            
        │       ├── [[2 frames hidden in 2 file(s)]]                                                                                                       
        │       │   └── 📄 32.000MB (15.71 %) spsolve                                                                                                      
        │       │       /home/cades/miniconda3/envs/lipack/lib/python3.9/site-packages/scipy/sparse/linalg/_dsolve/linsolve.py:203                         
        │       ├── [[1 frames hidden in 1 file(s)]]                                                                                                       
        │       │   └── 📂 38.424MB (18.86 %) setup_actors  /home/cades/liionpack/liionpack/solvers.py:453                                                 
        │       │       ├── [[9 frames hidden in 7 file(s)]]                                                                                               
        │       │       │   └── 📄 32.000MB (15.71 %) dot  <__array_function__ internals>:180                                                              
        │       │       └── [[2 frames hidden in 2 file(s)]]                                                                                               
        │       │           └── 📄 6.424MB (3.15 %) map  /home/cades/miniconda3/envs/lipack/lib/python3.9/site-packages/casadi/casadi.py:12565             
        │       ├── 📄 30.777MB (15.11 %) solve  /home/cades/liionpack/liionpack/solvers.py:185                                                            
        │       ├── 📄 10.259MB (5.04 %) solve  /home/cades/liionpack/liionpack/solvers.py:183                                                             
        │       └── 📄 10.259MB (5.04 %) solve  /home/cades/liionpack/liionpack/solvers.py:184                                                             
        └── [[57 frames hidden in 11 file(s)]]                                                                                                             
            └── 📄 5.006MB (2.46 %) _compile_bytecode  <frozen importlib._bootstrap_external>:647
```

Here is the HTML file for the flame graph generated from Memray.
flamegraph-32p20s-10cyc-casadi.html.zip

@wigging
Collaborator

wigging commented May 2, 2022

The flame graph represents allocations contributing to peak memory usage. According to the flame graph for the 32p20s 10 cycles test using the Casadi solver, most of the memory usage is associated with the following liionpack functions, methods, and properties:

  • self.setup_actors()
  • solve_circuit_vectorized()
  • self.step_actors()
  • self.output

@TomTranter
Collaborator Author

Would it be easy to run the same script for twice as long and compare the output to see which objects grew? We could also just try throwing in a bunch of del statements for objects that could be the issue and see what works. I've never really understood how garbage collection works, especially for parallel processes, but since the processes are being reused continuously rather than destroyed, that might be doing something gc is not expecting. Maybe @martinjrobins has some insights?
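
One cheap way to check "which objects grew", independent of the profilers, is to diff live-object counts from the gc module before and after a batch of cycles. A rough sketch; the list comprehension is a stand-in for running some simulation cycles, and note this only sees objects tracked by the garbage collector:

```python
import gc
from collections import Counter


def object_type_counts():
    """Count live objects by type; diffing two snapshots shows what grew."""
    return Counter(type(o).__name__ for o in gc.get_objects())


before = object_type_counts()
workload = [list(range(1000)) for _ in range(100)]  # stand-in for running N cycles
after = object_type_counts()

# Report the ten object types whose live count grew the most.
growth = {name: after[name] - before[name] for name in after if after[name] > before[name]}
for name, delta in sorted(growth.items(), key=lambda kv: -kv[1])[:10]:
    print(f"{name:20s} +{delta}")
```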

@wigging
Collaborator

wigging commented May 3, 2022

If I run it longer, more than 10 cycles, I run out of storage space and Memray crashes. I could run 5 cycles and compare that to the memory usage for 10 cycles.

@wigging
Collaborator

wigging commented May 18, 2022

The latest Memray update (v1.1.0) fixed the large bin file size. The table below compares the storage size of the bin file generated from Memray v1.0.3 (Run 1) and v1.1.0 (Run 2). The bin size from v1.1.0 is much smaller at 1.7 GB compared to the previous version which generated a 24 GB bin file.

| Run | Cores | Solver | Total memory | Bin size |
| --- | --- | --- | --- | --- |
| 1 | 20 | casadi | 374 MB | 24.0 GB |
| 2 | 20 | casadi | 356 MB | 1.7 GB |

@TomTranter
Collaborator Author

@wigging great, so are you able to compare which objects grow the most over different numbers of cycles? If you share the scripts you are running for Memray, I can help fix the issues.

@wigging
Collaborator

wigging commented May 20, 2022

Here are more results from Memray using the latest version, which reduces the size of the log file (bin file). Results are based on the flamegraph generated for each pack simulation. More details are given in the flamegraph HTML files, but the tables below summarize the objects allocating the most memory.

Medium pack tests, 32p20s, flamegraph comparison

Memray results for pack_32p20s.py based on the flamegraph report. Experiment is a single charge/discharge for multiple cycles. Period is 30 seconds and the simulation uses the Casadi solver. All methods/functions/objects in the first table are in the solvers.py module.

| Cycles | setup_actors | step_actors | V_node, I_batt | shm_Ri | shm_i_app | output |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 47 MiB | 87 MiB | 6 MiB | 1 MiB | 1 MiB | 3 MiB |
| 5 | 47 MiB | 86 MiB | 7 MiB | 5 MiB | 5 MiB | 15 MiB |
| 10 | 47 MiB | 87 MiB | 7 MiB | 10 MiB | 10 MiB | 30 MiB |
| 20 | 47 MiB | 87 MiB | 7 MiB | 20 MiB | 20 MiB | 61 MiB |
| 40 | 47 MiB | 88 MiB | 33 MiB | 41 MiB | 41 MiB | 123 MiB |
| 80 | 47 MiB | 21 MiB | 33 MiB | 82 MiB | 82 MiB | 246 MiB |

| Cycles | Cores | Solver | Peak memory |
| --- | --- | --- | --- |
| 1 | 20 | casadi | 325 MB |
| 5 | 20 | casadi | 347 MB |
| 10 | 20 | casadi | 407 MB |
| 20 | 20 | casadi | 428 MB |
| 40 | 20 | casadi | 536 MB |
| 80 | 20 | casadi | 800 MB |

@TomTranter
Collaborator Author

@wigging I just ran 16p4s charge and discharge cycling for 10 cycles and saw no increase in memory over time, just watching my memory usage in the task monitor. I even saw it go up a little and then down a little. This was on Windows 10. I wonder if the problem is a Linux issue. It doesn't make sense that output should grow in memory over time, as the arrays are created on the first step with a fixed size and filled with zeros. Do you have any intuition on the differences between memory management or multiprocessing in Windows and Linux? Here's the code:

```python
import liionpack as lp
import pybamm
import numpy as np
import os

lp.set_logging_level('NOTICE')

# Define parameters
Np = 16
Ns = 4
Iapp = 20

# Generate the netlist
netlist = lp.setup_circuit(Np=Np, Ns=Ns)

# Define additional output variables
output_variables = [
    'Volume-averaged cell temperature [K]']

# Define a cycling experiment using PyBaMM
experiment = pybamm.Experiment([
    f'Charge at {Iapp} A for 30 minutes',
    'Rest for 15 minutes',
    f'Discharge at {Iapp} A for 30 minutes',
    'Rest for 30 minutes']*10,
    period='10 seconds')

# Define the PyBaMM parameters
parameter_values = pybamm.ParameterValues("Chen2020")
inputs = {"Total heat transfer coefficient [W.m-2.K-1]": np.ones(Np * Ns) * 10}

# Solve the pack
output = lp.solve(netlist=netlist,
                  sim_func=lp.thermal_simulation,
                  parameter_values=parameter_values,
                  experiment=experiment,
                  output_variables=output_variables,
                  initial_soc=0.5,
                  inputs=inputs,
                  nproc=os.cpu_count(),
                  manager='casadi')

# Plot the pack and individual cell results
lp.plot_pack(output)
lp.plot_cells(output)
lp.show_plots()
```

@wigging
Collaborator

wigging commented May 24, 2022

I also saw no increase in memory for the 10 cycle tests. Memory usage seemed to increase when using 30 cycles or more. Run your example for 100 cycles, not 10 cycles. Use the memory_profiler or Memray package to log your memory use, not the Windows Task Manager.

@wigging
Collaborator

wigging commented May 24, 2022

Here is the file I'm running for the memory_profiler and Memray tests.

"""
Medium size pack with 640 cells in a 32p20s configuration.

Run with memory_profiler and log at intervals of 20 s
$ mprof run --interval 20 pack_32p20s.py

Plot memory_profiler results to a PDF file
$ mprof plot --output file.pdf file.dat

Run Memray profiler
$ memray run pack_32p20s.py
"""

import liionpack as lp
import pybamm
import numpy as np

# Define parameters
Np = 32
Ns = 20
cycles = 80
nproc = 20
manager = 'casadi'

# Generate the netlist
netlist = lp.setup_circuit(Np=Np, Ns=Ns)

# Define additional output variables
output_variables = ['Volume-averaged cell temperature [K]']

# Define a cycling experiment using PyBaMM
experiment = pybamm.Experiment([
    'Charge at 20 A for 30 minutes',
    'Rest for 15 minutes',
    'Discharge at 20 A for 30 minutes',
    'Rest for 30 minutes'] * cycles,
    period='30 seconds')

# Define the PyBaMM parameters
chemistry = pybamm.parameter_sets.Chen2020
parameter_values = pybamm.ParameterValues(chemistry=chemistry)
inputs = {"Total heat transfer coefficient [W.m-2.K-1]": np.ones(Np * Ns) * 10}

# Solve the pack
output = lp.solve(netlist=netlist,
                  sim_func=lp.thermal_simulation,
                  parameter_values=parameter_values,
                  experiment=experiment,
                  output_variables=output_variables,
                  initial_soc=0.5,
                  inputs=inputs,
                  nproc=nproc,
                  manager=manager)
```

@TomTranter
Collaborator Author

TomTranter commented May 24, 2022

This is what I got for 32p4s 40 cycles using your script:
[Plot: memory usage over time, 32p4s, 40 cycles]

The growth rate seems faster after 400 s, when there are fewer resets. I couldn't see this happening in the plots you showed above, as the total memory was much higher than the amount periodically recovered. I wonder what determines this memory release, as it should just be the same process repeating over and over... not sure if that's just a trick of the eye, to be honest.

@wigging
Collaborator

wigging commented May 24, 2022

What are the specs of the machine you're running on? Can you run a test for 32p20s for 100 cycles using the casadi and ray solvers? And run the profiler using `mprof run --interval 20 pack_32p20s.py`, where pack_32p20s.py is the name of the file. This would provide a direct comparison between your Windows results and the Ubuntu results shown above.

@TomTranter
Collaborator Author

[Plot: memory usage over time, 32p20s, 80 cycles]

@TomTranter
Collaborator Author

I think something that might fix it would be periodically saving the state of the system and restarting the processes. I need to implement saving and loading states anyway, so that might be a good workaround. It seems like it does do periodic recovery, but not enough, which is quite strange. Have you tried decorating any functions for the profiler, or do you know if that would reveal anything extra?
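
For reference, memory_profiler's @profile decorator gives a line-by-line report for whatever function it wraps, which might show which lines inside the step/solve loop account for the increments. A minimal sketch with a stand-in function rather than liionpack code:

```python
from memory_profiler import profile
import numpy as np


@profile  # prints a line-by-line memory report when the function returns
def build_outputs(nvar=5, nspm=640, nsteps=1800):
    # Stand-in for a solver step loop: allocate an output array, then fill it.
    out = np.zeros((nvar, nspm, nsteps))
    for t in range(nsteps):
        out[:, :, t] = np.random.random((nvar, nspm))
    return out


if __name__ == "__main__":
    build_outputs()
```

Running the script normally (or under `mprof run`) prints the memory usage and increment for each line of the decorated function.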

@wigging
Collaborator

wigging commented May 25, 2022

I put the results from Memray in the tables above. We can go through the Memray results if you think that would help. The shm_Ri, shm_i_app, and output arrays are where the memory is growing. I can look further into this later in the week.

@TomTranter
Collaborator Author

TomTranter commented May 26, 2022

The size of the output doesn't make sense. It should be much larger to start with.

```python
import sys
import numpy as np
Nspm = 32 * 20
Nt = 90 * 2 * 100
Nvar = 5
output = np.zeros([Nvar, Nspm, Nt])
print(np.around(sys.getsizeof(output)/1e6, 3), 'mb')
```

This is about 460 MB.

@TomTranter
Collaborator Author

I think this might not be a memory leak at all, but a slow filling up of the arrays with real numbers instead of pointers. Something different happens when you do sys.getsizeof, which gives you the full size. I profiled the code and added a new array filled with random numbers that should get fully populated with real numbers. I'm guessing the zeros array all points to a single zero number in memory, and then each step in the algorithm fills a new column of data, hence it looks like a leak, but the actual arrays should be pretty huge for a 32p20s system. Here is the difference in profiled memory:

[Screenshot: line-by-line memory profile showing running total and increment columns]

The first column is the running total and the second is the increment. I ran a simulation with the same steps as yours but changed the step loop condition to finish after 10 cycles. Here self.Nsteps is as above and self.Nvar is 5 or 6.
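
To check that hypothesis outside liionpack, here is a small standalone experiment (assuming psutil is available) showing that on Linux an array created with np.zeros reports its full nbytes immediately, but resident memory only grows as the columns are actually written:

```python
import numpy as np
import psutil

proc = psutil.Process()


def rss_mb():
    return proc.memory_info().rss / 1e6


nvar, nspm, nsteps = 5, 640, 18000
print(f"start:          {rss_mb():7.1f} MB RSS")

out = np.zeros((nvar, nspm, nsteps))  # out.nbytes is ~461 MB, but the pages are not resident yet
print(f"after np.zeros: {rss_mb():7.1f} MB RSS (out.nbytes = {out.nbytes / 1e6:.0f} MB)")

# Fill one "time step" column at a time, as the solver loop would.
for t in range(nsteps):
    out[:, :, t] = np.random.random((nvar, nspm))
    if t in (nsteps // 4, nsteps // 2, nsteps - 1):
        print(f"filled {t + 1:5d} steps: {rss_mb():7.1f} MB RSS")
```

If the zero pages only become resident when written, resident memory grows roughly linearly with the number of completed steps, which would look like a slow leak in the profiles above.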

@wigging
Collaborator

wigging commented May 27, 2022

Looks like the main solver function changed in the latest liionpack release. All of the memory profiling I did was with version 0.3. The latest version is 0.3.1 which has the old solver code commented out. So I don't know if any of the results I've shown here are still valid.

@TomTranter
Collaborator Author

It's more or less the same code but easier to step and solve. All the memory allocation for the above arrays is the same

@wigging
Collaborator

wigging commented Jun 3, 2022

Here's a comparison of using a 30 second period and a 60 second period for a 32p20s simulation. Peak memory results are from the memory_profiler tool. The plot below is from using the 60 second period. As expected, the simulation with a 60 second period has lower memory usage than with a 30 second period, probably because there are fewer time steps and thus fewer results stored in the arrays.

So for these large pack simulations with many cycles, the memory usage could be flattened by not storing results at each time step. It looks like the liionpack code only relies on the previous time step, so there's no need to store all the previous time steps in memory. Results from previous time steps could be written to file or deleted, depending on how the results should be viewed/analyzed. I need to look at the liionpack code in more detail, but I think this might be a good way to reduce the memory usage for large/long-running simulations.

| Period | Cycles | Solver | Peak memory |
| --- | --- | --- | --- |
| 30 sec | 100 | casadi | 872 MiB |
| 60 sec | 100 | casadi | 610 MiB |

[Plot: memory_profiler results, 32p20s, 100 cycles, 60 second period]

@wigging
Collaborator

wigging commented Jun 6, 2022

I ran some NumPy examples that build a large array to investigate what causes the memory usage to increase. I won't go into all the detail, but the examples and summary of the results can be read here. Based on the examples, the following should be considered for liionpack:

  • Changing the array data type from float64 to float32 can cut memory usage in half.
  • When an array is initialized with np.zeros, the memory is lazily allocated as values are added to the array. The total amount of memory allocated for the array is based on the data type of the values stored in the array. This is why the memory usage is increasing during large liionpack simulations.
  • Writing the array to disk using an hdf5 file allows substantial memory reduction as long as disk space is available for storing the array. Writing the liionpack results to an hdf5 file at each time step could reduce memory usage for large simulations with a negligible effect on performance.

A key finding from the examples I ran is shown in the plot below. It shows the memory usage of building a 500x2000x2000 array. The array is created on disk using an hdf5 file. Notice the memory used is about 100 MiB. If this array were built completely in memory it would use 8 GB of RAM. However, since this approach writes the array to disk, it requires 8 GB of storage space for the hdf5 file.

[Plot: memory usage while building a 500x2000x2000 array in an HDF5 file]
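
For liionpack this could look roughly like the sketch below, assuming h5py; the file name, dataset name, and fill loop are illustrative stand-ins for the solver writing one time step at a time:

```python
import h5py
import numpy as np

nvar, nspm, nsteps = 5, 640, 18000

# Create the output dataset on disk instead of holding it all in RAM;
# float32 also halves the size relative to the default float64.
with h5py.File("pack_output.h5", "w") as f:
    dset = f.create_dataset("output", shape=(nvar, nspm, nsteps), dtype="float32")
    for t in range(nsteps):
        # One step's results: only this slice needs to live in memory.
        dset[:, :, t] = np.random.random((nvar, nspm)).astype("float32")
```

The results can later be read back, in full or in slices, from the same file for plotting and analysis.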

@TomTranter
Collaborator Author

Great, are you able to work out a prototype for saving the output while it solves?

@wigging
Collaborator

wigging commented Jun 8, 2022

@TomTranter Just want to check with you and see if you want this to be a part of liionpack. Writing the output to hdf5 is only needed for large, long-running simulations. Small to medium pack simulations don't need this feature. Also, this would introduce more package dependencies to the project.

@TomTranter
Collaborator Author

I think it would be a good feature and I'm happy to have the hdf5 dependency. It's a very widely used package.

@srikanthallu
Collaborator

If needed, we could make it optional, but hdf5 is recommended if we ever want to do parallel IO.

@TomTranter
Collaborator Author

Maybe Dask arrays would also work but then we'd need to get Dask actors working

@TomTranter
Collaborator Author

I think saving the output should be optional, but I don't see why the format should be optional too if it's simpler to just use hdf5.

@TomTranter
Collaborator Author

@wigging did you get anywhere with hdf5? Shall we close this issue and open a new one for that?

@wigging
Collaborator

wigging commented Aug 2, 2022

Changing the NumPy array type from float64 to float32 was the first step to saving results to an HDF5 file. This will reduce the file size by half when results are saved. But I haven't done anything beyond that. I will close this issue since the memory profiling is done and we plan to move forward with writing the results to file to avoid memory issues for long running simulations.
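
For example, with the array shape used in the earlier 32p20s discussion, the dtype change alone halves the in-memory footprint (and the saved file size):

```python
import numpy as np

shape = (5, 640, 18000)  # Nvar x Nspm x Nsteps from the 32p20s example above
print(np.zeros(shape, dtype=np.float64).nbytes / 1e6, "MB")  # 460.8 MB
print(np.zeros(shape, dtype=np.float32).nbytes / 1e6, "MB")  # 230.4 MB
```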

@wigging closed this as completed Aug 2, 2022