Chunk size implementation for DIT Algorithm loops for Broadening #489

sagarchotalia · 2022-06-09T15:37:23Z

Objectives

This pull request implements chunksize in the DIT algorithm (optimized) loops of the _broaden_lines method.
With this PR, RADIS users can make full use of the chunksize feature. Instead of processing the entire DataFrame at once, the loops process it one chunk at a time, which keeps peak memory use much lower.
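
As a rough illustration of the idea (not the actual _broaden_lines code), the sketch below broadens lines chunk by chunk and accumulates each chunk's contribution on a shared spectral grid. The names broaden_chunk and broaden_in_chunks, the Lorentzian profile and the fixed half-width are assumptions made for this example only.

```python
import numpy as np


def broaden_chunk(wavenumber, centers, strengths, hwhm=0.1):
    """Sum toy Lorentzian profiles of one chunk of lines on the grid."""
    # Temporary of shape (lines_in_chunk, grid_points): this is the array
    # whose size chunking is meant to keep under control.
    delta = wavenumber[np.newaxis, :] - centers[:, np.newaxis]
    profiles = (hwhm / np.pi) / (delta**2 + hwhm**2)
    return (strengths[:, np.newaxis] * profiles).sum(axis=0)


def broaden_in_chunks(wavenumber, centers, strengths, chunksize=None):
    """Accumulate the spectrum chunk by chunk; None means all lines at once."""
    n_lines = len(centers)
    if chunksize is None:
        chunksize = max(n_lines, 1)
    abscoeff = np.zeros_like(wavenumber)
    for start in range(0, n_lines, chunksize):
        stop = start + chunksize
        abscoeff += broaden_chunk(
            wavenumber, centers[start:stop], strengths[start:stop]
        )
    return abscoeff


rng = np.random.default_rng(0)
wavenumber = np.linspace(1900.0, 2300.0, 2001)
centers = rng.uniform(1900.0, 2300.0, size=500)
strengths = rng.uniform(0.0, 1.0, size=500)

full = broaden_in_chunks(wavenumber, centers, strengths)
chunked = broaden_in_chunks(wavenumber, centers, strengths, chunksize=100)
print(np.allclose(full, chunked))
```

Because each line's contribution is simply summed, the chunked and unchunked results should agree up to floating-point rounding; only the size of the temporary array changes.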

Changes Made

I restructured the code so that the outer loops check the chunksize variable, which I think improves readability. The time prediction block appears only twice, once at the start of each loop.
Arrays are updated according to the chunksize.
chunksize is now a parameter of calc_spectrum() and _calc_spectrum_one_molecule().

Future Objectives

Support for non-equilibrium loops as well (see _broaden_lines_noneq)
Add an "auto" option for chunksize, which will calculate the best chunksize based on the user's available RAM.

Fixes #488


erwanp · 2022-06-09T22:35:06Z

Hello, good first work!
Can you provide an example, for instance by running the basic RADIS example radis.test_spectrum() with different optimization and chunksize values as parameters?

sagarchotalia · 2022-06-26T08:55:17Z

Hello, sorry for getting back so late.
I tested the example after commenting out the DeprecationWarning lines in calc.py:

# if "chunksize" in kwargs:
#     raise DeprecationWarning("use optimization= instead of chunksize=")

Then, I tested the basic example test_spectrum() as such:

s = radis.test_spectrum(optimization = "min-RMS", chunksize = 100)

The calculation starts, but I receive an error related to the chunksize parameter:

Calculating Equilibrium Spectrum
Physical Conditions
----------------------------------------
   Tgas                 700 K
   Trot                 700 K
   Tvib                 700 K
   isotope              1,2,3
   mole_fraction        0.1
   molecule             CO
   overpopulation       None
   path_length          1 cm
   pressure_mbar        1013.25 mbar
   rot_distribution     boltzmann
   self_absorption      True
   state                X
   vib_distribution     boltzmann
   wavenum_max          2300.0000 cm-1
   wavenum_min          1900.0000 cm-1
Computation Parameters
----------------------------------------
   Tref                 296 K
   add_at_used          
   broadening_method    voigt
   cutoff               1e-27 cm-1/(#.cm-2)
   dbformat             hitran
   dbpath               /Users/sagarchotalia/.radisdb/hitran/CO.hdf5
   folding_thresh       1e-06
   include_neighbouring_lines  True
   memory_mapping_engine  auto
   neighbour_lines      0 cm-1
   optimization         min-RMS
   parfuncfmt           hapi
   parsum_mode          full summation
   pseudo_continuum_threshold  0
   sparse_ldm           auto
   truncation           50 cm-1
   waveunit             cm-1
   wstep                0.01 cm-1
   zero_padding         -1
----------------------------------------
Traceback (most recent call last):
  File "/Users/sagarchotalia/Desktop/radis temp files/example.py", line 24, in <module>
    s = radis.test_spectrum(optimization = "min-RMS", chunksize = 100)
  File "/Users/sagarchotalia/radis/radis/test/utils.py", line 187, in test_spectrum
    s = calc_spectrum(**conditions)
  File "/Users/sagarchotalia/radis/radis/lbl/calc.py", line 514, in calc_spectrum
    generated_spectrum = _calc_spectrum_one_molecule(
  File "/Users/sagarchotalia/radis/radis/lbl/calc.py", line 838, in _calc_spectrum_one_molecule
    s = sf.eq_spectrum(
  File "/Users/sagarchotalia/radis/radis/lbl/factory.py", line 799, in eq_spectrum
    wavenumber, abscoeff_v = self._calc_broadening()
  File "/Users/sagarchotalia/radis/radis/lbl/broadening.py", line 2449, in _calc_broadening
    (wavenumber, abscoeff) = self._broaden_lines(df)
  File "/Users/sagarchotalia/radis/radis/lbl/broadening.py", line 2220, in _broaden_lines
    (wavenumber, absorption) = self._apply_lineshape_LDM(
  File "/Users/sagarchotalia/radis/radis/lbl/broadening.py", line 1934, in _apply_lineshape_LDM
    df = pd.DataFrame(
  File "/Users/sagarchotalia/opt/anaconda3/envs/radis-env/lib/python3.8/site-packages/pandas/core/frame.py", line 636, in __init__
    mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
  File "/Users/sagarchotalia/opt/anaconda3/envs/radis-env/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 502, in dict_to_mgr
    return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
  File "/Users/sagarchotalia/opt/anaconda3/envs/radis-env/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 120, in arrays_to_mgr
    index = _extract_index(arrays)
  File "/Users/sagarchotalia/opt/anaconda3/envs/radis-env/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 674, in _extract_index
    raise ValueError("All arrays must be of the same length")
ValueError: All arrays must be of the same length

Of course, this error doesn't occur if chunksize=None is passed to the function. I tested other values of chunksize and optimization, but I get the error every time except when chunksize is None.

To test this example, clone my RADIS fork and run the test on your local machine.

sagarchotalia · 2022-06-29T18:47:32Z

Update: the errors were being generated due to path errors in my ~/radis.json. I fixed them, and now it's working properly.
The spectrum is now calculated when a chunksize is given:
s = radis.test_spectrum(optimization = "simple", chunksize = 1000)

Output:

Calculating Equilibrium Spectrum
Physical Conditions
----------------------------------------
   Tgas                 700 K
   Trot                 700 K
   Tvib                 700 K
   isotope              1,2,3
   mole_fraction        0.1
   molecule             CO
   overpopulation       None
   path_length          1 cm
   pressure_mbar        1013.25 mbar
   rot_distribution     boltzmann
   self_absorption      True
   state                X
   vib_distribution     boltzmann
   wavenum_max          2300.0000 cm-1
   wavenum_min          1900.0000 cm-1
Computation Parameters
----------------------------------------
   Tref                 296 K
   add_at_used          
   broadening_method    voigt
   cutoff               1e-27 cm-1/(#.cm-2)
   dbformat             hitran
   dbpath               /Users/sagarchotalia/.radisdb/hitran/CO.hdf5
   folding_thresh       1e-06
   include_neighbouring_lines  True
   memory_mapping_engine  auto
   neighbour_lines      0 cm-1
   optimization         simple
   parfuncfmt           hapi
   parsum_mode          full summation
   pseudo_continuum_threshold  0
   sparse_ldm           auto
   truncation           50 cm-1
   waveunit             cm-1
   wstep                0.01 cm-1
   zero_padding         -1
----------------------------------------
0.50s - Spectrum calculated

The example is working for both optimization = "simple" and "min-RMS", and all chunksize values.

erwanp · 2022-06-29T22:44:21Z

Nice! Can you check that you get the same spectrum with and without chunksize?

Then can you try large-scale spectra (e.g. CO2 HITEMP, full range) and confirm that you can run them on limited RAM?

sagarchotalia · 2022-06-30T14:12:56Z

Hello, I've confirmed that both the spectra are the same, with and without chunksize!
Also, I've tested the RADIS examples calc_hitran_full_range.py, plot_hitemp_spectrum.py and plot_SpecDatabase.py, both with and without chunksize. They yield the same results as well.
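
One way to make the "same spectrum" check quantitative is to compute a normalized residual between the two results. The sketch below is only an illustration: relative_residual is a hypothetical helper written for this example (it is not RADIS's own comparison function), and the two arrays are stand-ins for the absorption coefficients obtained with and without chunksize.

```python
import numpy as np


def relative_residual(reference, candidate):
    """L2 norm of the difference, normalized by the L2 norm of the reference."""
    reference = np.asarray(reference, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    if reference.shape != candidate.shape:
        raise ValueError("spectra must be sampled on the same grid")
    norm = np.linalg.norm(reference)
    if norm == 0.0:
        return float(np.linalg.norm(candidate))
    return float(np.linalg.norm(candidate - reference) / norm)


# Stand-in data: in practice these would come from the two calculations,
# one with chunksize=None and one with a finite chunksize.
grid = np.linspace(1900.0, 2300.0, 4001)
without_chunks = np.exp(-((grid - 2100.0) / 20.0) ** 2)
with_chunks = without_chunks + 1e-12 * np.sin(grid)

residual = relative_residual(without_chunks, with_chunks)
print(f"relative residual: {residual:.2e}")
```

A residual close to machine precision suggests that the two code paths differ only by floating-point rounding.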

P.S. Another thing, I'm having issues plotting through matplotlib due to a publib warning:

/Users/sagarchotalia/opt/anaconda3/lib/python3.8/site-packages/publib/main.py:230: UserWarning: 
Glyph 8315 (\N{SUPERSCRIPT MINUS}) missing from current font.
  plt.tight_layout()

Because of this, I'm not able to plot some spectra. But I'll try to get it resolved asap!

erwanp · 2022-06-30T20:35:21Z

Hello! You can change the default plotting style and library of RADIS: drop publib and use something else. There is a Gallery Example showing how to customize plots.

sagarchotalia · 2022-07-09T08:09:48Z

----------------------------------------
Traceback (most recent call last):
  File "/Users/sagarchotalia/Desktop/radis temp files/example.py", line 24, in <module>
    s = radis.test_spectrum(optimization = "min-RMS", chunksize = 100)
  File "/Users/sagarchotalia/radis/radis/test/utils.py", line 187, in test_spectrum
    s = calc_spectrum(**conditions)
  File "/Users/sagarchotalia/radis/radis/lbl/calc.py", line 514, in calc_spectrum
    generated_spectrum = _calc_spectrum_one_molecule(
  File "/Users/sagarchotalia/radis/radis/lbl/calc.py", line 838, in _calc_spectrum_one_molecule
    s = sf.eq_spectrum(
  File "/Users/sagarchotalia/radis/radis/lbl/factory.py", line 799, in eq_spectrum
    wavenumber, abscoeff_v = self._calc_broadening()
  File "/Users/sagarchotalia/radis/radis/lbl/broadening.py", line 2449, in _calc_broadening
    (wavenumber, abscoeff) = self._broaden_lines(df)
  File "/Users/sagarchotalia/radis/radis/lbl/broadening.py", line 2220, in _broaden_lines
    (wavenumber, absorption) = self._apply_lineshape_LDM(
  File "/Users/sagarchotalia/radis/radis/lbl/broadening.py", line 1934, in _apply_lineshape_LDM
    df = pd.DataFrame(
  File "/Users/sagarchotalia/opt/anaconda3/envs/radis-env/lib/python3.8/site-packages/pandas/core/frame.py", line 636, in __init__
    mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
  File "/Users/sagarchotalia/opt/anaconda3/envs/radis-env/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 502, in dict_to_mgr
    return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
  File "/Users/sagarchotalia/opt/anaconda3/envs/radis-env/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 120, in arrays_to_mgr
    index = _extract_index(arrays)
  File "/Users/sagarchotalia/opt/anaconda3/envs/radis-env/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 674, in _extract_index
    raise ValueError("All arrays must be of the same length")
ValueError: All arrays must be of the same length

It seems that this error is still occurring. It is raised by the pd.DataFrame(...) call in _apply_lineshape_LDM(), as a ValueError saying that all arrays must be of the same length.

Also, here are the Pytest results after chunksize implementation:

erwanp

See comments in code

also

Make sure your PR objectives also include making the changes for nonequilibrium mode (AFAIK, there is a "broaden_lines_noneq" function somewhere). Do this only after equilibrium works.

radis/lbl/broadening.py


sagarchotalia · 2022-07-13T07:51:46Z

Here are the plot_diff() results for a HITEMP CO spectrum computed with the same parameters (optimization="simple"), the only difference being the chunksize.

What I understand is happening:

The values of line_profile_LDM, wL and wG are not the same when calculated over one chunk of the database as when calculated over the entire database. These values are later passed to _apply_lineshape_LDM() to compute coefficients such as li0, li1, mi0 and mi1, which affect the overall spectrum, so the final result differs. These values therefore need to be the same with or without chunksize.

sagarchotalia · 2022-07-19T17:53:03Z

Hello! It turns out my thinking was going in the wrong direction. I finally fixed the error caused by my implementation by taking shifted_wavenum over each chunk instead of over the entire df.
Here are some spectrum outputs:

N2O Spectrum

CO2 Spectrum

CO Spectrum

If these are alright, I can go ahead with the chunksize implementation for non-equilibrium spectra.
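
To illustrate the kind of mismatch behind the ValueError (this is an interpretation based on the fix described above, not the actual RADIS code): if some per-line arrays come from one chunk while shifted_wavenum is still taken from the whole DataFrame, the columns no longer have the same length. In the sketch below, check_same_length is a hypothetical guard, and every array name other than shifted_wavenum is made up for the example.

```python
import numpy as np


def check_same_length(columns):
    """Raise ValueError if the arrays in `columns` differ in length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"All arrays must be of the same length, got {lengths}")


n_lines, chunksize = 1000, 100
shifted_wavenum = np.linspace(1900.0, 2300.0, n_lines)  # one value per line
strength = np.ones(n_lines)

mismatch_detected = False
for start in range(0, n_lines, chunksize):
    chunk = slice(start, start + chunksize)
    # Buggy pattern: chunked strengths mixed with the full-length array.
    try:
        check_same_length(
            {"S": strength[chunk], "shifted_wavenum": shifted_wavenum}
        )
    except ValueError:
        mismatch_detected = True
    # Fixed pattern: every per-line array is sliced with the same chunk.
    check_same_length(
        {"S": strength[chunk], "shifted_wavenum": shifted_wavenum[chunk]}
    )

print(mismatch_detected)
```

When the chunk covers all lines the two patterns coincide, which would be consistent with the error not appearing for chunksize=None.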

radis/lbl/broadening.py

erwanp · 2022-07-20T16:56:21Z

Hello @sagarchotalia, these results seem great!

Before moving to nonequilibrium we'll have to:

Implement tests in radis/test/lbl, with one function computing a spectrum with and without the chunksize optimization, and making sure the residual (get_residual) is very small (which it seems to be right now) and remains small forever.

We also have to make sure it is more efficient. Computing all lines together is more efficient if all lines fit in RAM. It becomes problematic when you saturate RAM. With chunksize, you have an implementation that ensures you never saturate RAM.

Compute a large spectrum with lots of lines, for instance HITEMP H2O or HITEMP CH4. Make sure it fills your RAM. If it does not, find a larger molecule (ExoMol has many extremely large molecules), or find a weaker computer :) Record the calculation times.
Then run the same calculation using chunksize and compare the calculation times. You may have to adjust the chunksize parameter to optimize it.
@anandxkumar had created a performance graph showing calculation time vs number of lines. You'll want to produce the same kind of graph: without chunksize, I expect that beyond a certain threshold, calculation time skyrockets because of RAM saturation. With chunksize, it may be slightly slower at first, but never skyrockets.
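
The benchmark described above could be organized along these lines. This is only a sketch under stated assumptions: toy_broadening is a made-up stand-in for the real calculation, best_time is a hypothetical timing helper, and the line counts are far too small to saturate RAM; a real benchmark would call the actual spectrum calculation on a large database.

```python
import time

import numpy as np


def toy_broadening(centers, grid, chunksize=None):
    """Toy stand-in: sums Gaussian profiles, optionally chunk by chunk."""
    step = len(centers) if chunksize is None else chunksize
    step = max(step, 1)
    total = np.zeros_like(grid)
    for start in range(0, len(centers), step):
        delta = grid[np.newaxis, :] - centers[start:start + step, np.newaxis]
        total += np.exp(-delta**2).sum(axis=0)
    return total


def best_time(func, *args, repeats=3, **kwargs):
    """Return the fastest wall-clock time of `repeats` calls, in seconds."""
    timings = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        func(*args, **kwargs)
        timings.append(time.perf_counter() - t0)
    return min(timings)


grid = np.linspace(0.0, 100.0, 2001)
rng = np.random.default_rng(1)
results = []  # (number of lines, time without chunks, time with chunks)
for n_lines in (100, 400, 1600):
    centers = rng.uniform(0.0, 100.0, size=n_lines)
    t_full = best_time(toy_broadening, centers, grid)
    t_chunk = best_time(toy_broadening, centers, grid, chunksize=200)
    results.append((n_lines, t_full, t_chunk))
    print(f"{n_lines:>6d} lines: full {t_full:.4f} s, chunked {t_chunk:.4f} s")
```

Plotting the two timing columns against the number of lines gives the kind of graph described above; the interesting regime is the one where the unchunked temporary no longer fits in memory.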

After implementation you'll also be able to:

Add an "auto" mode which automatically computes the chunksize based on the user's available RAM (setting it to None if all lines seem to fit in RAM). This requires some kind of correlation to estimate the memory size of a line chunk.
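
A minimal sketch of what such an "auto" mode might compute, assuming a very simple cost model of one 8-byte float per (line, grid point) pair. The function auto_chunksize, its parameters and the safety factor are hypothetical; the real memory footprint per line would have to be measured. The available memory is passed in explicitly here; psutil.virtual_memory().available is one possible way to obtain it.

```python
def auto_chunksize(n_lines, n_grid_points, available_bytes,
                   bytes_per_value=8, safety_factor=0.5):
    """Return a chunksize fitting the memory budget, or None if all lines fit."""
    if n_lines <= 0 or n_grid_points <= 0:
        raise ValueError("n_lines and n_grid_points must be positive")
    # Assumed cost model: one value per (line, grid point) pair.
    bytes_per_line = n_grid_points * bytes_per_value
    budget = int(available_bytes * safety_factor)
    max_lines = budget // bytes_per_line
    if max_lines >= n_lines:
        return None  # everything fits at once: no chunking needed
    return max(max_lines, 1)  # always process at least one line per chunk


GiB = 1024**3
small = auto_chunksize(n_lines=10_000, n_grid_points=40_001,
                       available_bytes=8 * GiB)
large = auto_chunksize(n_lines=5_000_000, n_grid_points=40_001,
                       available_bytes=8 * GiB)
print(small, large)
```

Returning None when everything fits keeps the unchunked path for small calculations, where it is expected to be the faster one.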


erwanp

See my Warning. The rest are minor comments

radis/test/lbl/test_broadening.py


erwanp · 2022-08-09T15:42:18Z

About the errors

For https://app.travis-ci.com/github/radis/radis/jobs/578898407#L2079, pinv2 is deprecated and was removed in SciPy 1.9; replace it with scipy.linalg.pinv.
For https://app.travis-ci.com/github/radis/radis/jobs/578898407#L1920, I think this was caused by your changes. Have a look. predict_time was written by @anandxkumar, I think.
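
For the first error, the replacement could look like the sketch below. The matrix is an arbitrary example, and the exact call site in RADIS is not shown in this thread, so treat this as an illustration of the import change only.

```python
import numpy as np
from scipy.linalg import pinv  # used in place of the removed scipy.linalg.pinv2

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
A_pinv = pinv(A)

# Moore-Penrose condition: A @ A_pinv @ A reproduces A.
print(np.allclose(A @ A_pinv @ A, A))
```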

codecov-commenter · 2022-08-10T13:41:49Z

Codecov Report

Merging #489 (4669631) into develop (b853a26) will increase coverage by 0.24%.
The diff coverage is 92.53%.

@@             Coverage Diff             @@
##           develop     #489      +/-   ##
===========================================
+ Coverage    73.10%   73.35%   +0.24%     
===========================================
  Files          137      137              
  Lines        18870    18924      +54     
===========================================
+ Hits         13795    13881      +86     
+ Misses        5075     5043      -32

erwanp · 2022-08-10T14:36:09Z

Looks good to me, merging this first major part of the GSoC project!

arunavabasucom · 2022-08-10T14:37:17Z

great @sagarchotalia 🎉

anandxkumar · 2022-08-10T20:35:21Z

Congrats @sagarchotalia on the first major PR merge!

sagarchotalia added 2 commits June 9, 2022 20:55

Added chunk size implementation for DIT Algorithm loops for broadenin…

e263a7f

…g of lines.

Replaced absorption with abscoeff for non-optimized, no chunk loop

b8aff13

Merge branch 'radis:develop' into develop

e478467

sagarchotalia added this to In progress in GSoC 2022: Performance Tweaks in RADIS Jun 23, 2022

sagarchotalia removed this from In progress in GSoC 2022: Performance Tweaks in RADIS Jun 27, 2022

sagarchotalia requested a review from erwanp June 29, 2022 09:17

erwanp requested changes Jul 13, 2022

radis/lbl/broadening.py (resolved)

radis/lbl/broadening.py (outdated, resolved)

Added loops for updating array values and feeding to _apply_lineshape…

e34e245

…_LDM

sagarchotalia requested a review from erwanp July 13, 2022 07:38

Fixed the chunksize error

4bb56ab

erwanp reviewed Jul 20, 2022

radis/lbl/broadening.py (resolved)

Renamed local variables

0a9788d

sagarchotalia requested a review from erwanp July 20, 2022 15:46

Added test for the chunksize implementation

00dde95

sagarchotalia requested a review from encrypted-soul July 22, 2022 16:38

sagarchotalia added 3 commits July 24, 2022 21:49

Added chunksize for non-equilibrium, DIT algorithm loops

0621068

Revert "Added chunksize for non-equilibrium, DIT algorithm loops"

288f61d

This reverts commit 0621068.

Added usage example in function docs

006309f

erwanp requested changes Jul 31, 2022

radis/test/lbl/test_broadening.py (resolved)

radis/test/lbl/test_broadening.py (outdated, resolved)

radis/test/lbl/test_broadening.py (outdated, resolved)

radis/test/lbl/test_broadening.py (outdated, resolved)

Fixed the test function

dda65b8

sagarchotalia mentioned this pull request Aug 1, 2022

Chunksize manual benchmarking notebook for equilibrium spectra added radis/radis-benchmark#12

Open

erwanp requested changes Aug 1, 2022

radis/test/lbl/test_broadening.py (outdated, resolved)

sagarchotalia and others added 2 commits August 1, 2022 13:57

Update radis/test/lbl/test_broadening.py

ee6624b

Making sure chunksize is taken into account in Spectrum Conditions, as per the suggestion by Erwan in radis#489 (comment) Co-authored-by: Erwan Pannier <erwan.pannier@gmail.com>

Fixed chunksize variable name

712cf71

sagarchotalia requested a review from erwanp August 8, 2022 17:49

sagarchotalia added 2 commits August 10, 2022 18:14

Fixed error caused by improper predicting of time

faeaa58

Added baseline function from PeakUtils to fix the error in operations.py

4669631

erwanp approved these changes Aug 10, 2022


erwanp merged commit 2707fbe into radis:develop Aug 10, 2022

erwanp added this to In progress in GSoC 2022: Performance Tweaks in RADIS via automation Aug 10, 2022

erwanp added the performance label Aug 10, 2022

erwanp added this to the 0.14 milestone Aug 10, 2022

sagarchotalia mentioned this pull request Aug 13, 2022

Fixing the case when the chunksize is too small #500

Merged

erwanp modified the milestones: 0.14, 0.13.1 Aug 14, 2022

erwanp mentioned this pull request Aug 28, 2022

0.13.1 #517

Merged

erwanp mentioned this pull request Oct 12, 2022

🚀 Feature: I want to calculate total emissivity using radis someday #528

Open
