Chunk size implementation for DIT Algorithm loops for Broadening #489

Merged
merged 15 commits into from Aug 10, 2022

Conversation

sagarchotalia
Collaborator

@sagarchotalia sagarchotalia commented Jun 9, 2022

Objectives

This pull request implements chunksize in the DIT algorithm (optimized) loops of the _broaden_lines method.
With this PR, RADIS users will be able to make full use of the chunksize feature. Instead of processing the entire DataFrame at once, we process one chunk of it at a time, which allows much better memory management!

Changes Made

  • I restructured the code so that the outer loops check the chunksize variable, which I think improves readability. The only duplicated code is the time prediction block, which now appears at the start of each loop.
  • Arrays are now updated chunk by chunk, according to chunksize.
  • chunksize is now a parameter of calc_spectrum() and calc_spectrum_one_molecule()
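
To illustrate the idea behind these changes, here is a minimal sketch of chunked accumulation. It is not the RADIS implementation: `iter_chunks`, `add_lines_to_grid` and the toy data are hypothetical names introduced only for this example, and the "broadening" is reduced to adding each line's intensity to a grid point.

```python
import numpy as np


def iter_chunks(n_lines, chunksize=None):
    """Yield (start, stop) bounds covering range(n_lines).

    chunksize=None means a single chunk holding every line (no chunking).
    """
    if chunksize is None:
        yield 0, n_lines
        return
    for start in range(0, n_lines, chunksize):
        yield start, min(start + chunksize, n_lines)


def add_lines_to_grid(grid_index, intensity, grid_size, chunksize=None):
    """Accumulate line intensities on a grid, one chunk of lines at a time."""
    total = np.zeros(grid_size)
    for start, stop in iter_chunks(len(grid_index), chunksize):
        # Every per-line array is sliced with the same bounds.
        # np.add.at performs an unbuffered add, so repeated indices are summed.
        np.add.at(total, grid_index[start:stop], intensity[start:stop])
    return total


# Toy data (assumption: 1000 lines falling on a 50-point grid)
rng = np.random.default_rng(0)
grid_index = rng.integers(0, 50, size=1000)
intensity = rng.random(1000)

full = add_lines_to_grid(grid_index, intensity, 50)
chunked = add_lines_to_grid(grid_index, intensity, 50, chunksize=128)
```

Only the current chunk's temporary arrays are alive at any time, while the accumulated result keeps the size of the output grid.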

Future Objectives

  • Support for non-equilibrium loops as well (see _broaden_lines_noneq)
  • Add an "auto" option for chunksize, which will calculate the best chunksize based on the user's available RAM.

Fixes #488

@erwanp
Member

erwanp commented Jun 9, 2022

Hello, good first work!
Can you provide an example, for instance by testing the basic RADIS example radis.test_spectrum() with different optimization and chunksize values?

@sagarchotalia
Collaborator Author

sagarchotalia commented Jun 26, 2022

Hello, sorry for getting back so late.
I tested the example after commenting out the DeprecationWarning lines in calc.py:

# if "chunksize" in kwargs:
#     raise DeprecationWarning("use optimization= instead of chunksize=")

Then, I tested the basic example test_spectrum() as such:

s = radis.test_spectrum(optimization = "min-RMS", chunksize = 100)

The spectrum is calculated; however, I do receive an error for the chunksize parameter:

Calculating Equilibrium Spectrum
Physical Conditions
----------------------------------------
   Tgas                 700 K
   Trot                 700 K
   Tvib                 700 K
   isotope              1,2,3
   mole_fraction        0.1
   molecule             CO
   overpopulation       None
   path_length          1 cm
   pressure_mbar        1013.25 mbar
   rot_distribution     boltzmann
   self_absorption      True
   state                X
   vib_distribution     boltzmann
   wavenum_max          2300.0000 cm-1
   wavenum_min          1900.0000 cm-1
Computation Parameters
----------------------------------------
   Tref                 296 K
   add_at_used          
   broadening_method    voigt
   cutoff               1e-27 cm-1/(#.cm-2)
   dbformat             hitran
   dbpath               /Users/sagarchotalia/.radisdb/hitran/CO.hdf5
   folding_thresh       1e-06
   include_neighbouring_lines  True
   memory_mapping_engine  auto
   neighbour_lines      0 cm-1
   optimization         min-RMS
   parfuncfmt           hapi
   parsum_mode          full summation
   pseudo_continuum_threshold  0
   sparse_ldm           auto
   truncation           50 cm-1
   waveunit             cm-1
   wstep                0.01 cm-1
   zero_padding         -1
----------------------------------------
Traceback (most recent call last):
  File "/Users/sagarchotalia/Desktop/radis temp files/example.py", line 24, in <module>
    s = radis.test_spectrum(optimization = "min-RMS", chunksize = 100)
  File "/Users/sagarchotalia/radis/radis/test/utils.py", line 187, in test_spectrum
    s = calc_spectrum(**conditions)
  File "/Users/sagarchotalia/radis/radis/lbl/calc.py", line 514, in calc_spectrum
    generated_spectrum = _calc_spectrum_one_molecule(
  File "/Users/sagarchotalia/radis/radis/lbl/calc.py", line 838, in _calc_spectrum_one_molecule
    s = sf.eq_spectrum(
  File "/Users/sagarchotalia/radis/radis/lbl/factory.py", line 799, in eq_spectrum
    wavenumber, abscoeff_v = self._calc_broadening()
  File "/Users/sagarchotalia/radis/radis/lbl/broadening.py", line 2449, in _calc_broadening
    (wavenumber, abscoeff) = self._broaden_lines(df)
  File "/Users/sagarchotalia/radis/radis/lbl/broadening.py", line 2220, in _broaden_lines
    (wavenumber, absorption) = self._apply_lineshape_LDM(
  File "/Users/sagarchotalia/radis/radis/lbl/broadening.py", line 1934, in _apply_lineshape_LDM
    df = pd.DataFrame(
  File "/Users/sagarchotalia/opt/anaconda3/envs/radis-env/lib/python3.8/site-packages/pandas/core/frame.py", line 636, in __init__
    mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
  File "/Users/sagarchotalia/opt/anaconda3/envs/radis-env/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 502, in dict_to_mgr
    return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
  File "/Users/sagarchotalia/opt/anaconda3/envs/radis-env/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 120, in arrays_to_mgr
    index = _extract_index(arrays)
  File "/Users/sagarchotalia/opt/anaconda3/envs/radis-env/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 674, in _extract_index
    raise ValueError("All arrays must be of the same length")
ValueError: All arrays must be of the same length

Of course, this error doesn't occur if chunksize=None is passed to the function. I tested other values of chunksize and optimization; however, I get the error every time except when chunksize is None.

To test this example, clone my RADIS fork and run the test on your local machine.

@sagarchotalia
Collaborator Author

sagarchotalia commented Jun 29, 2022

Update: the errors were being generated due to path errors in my ~/radis.json. I fixed them, and now it's working properly.
The spectrum is being calculated upon being given a chunksize:
s = radis.test_spectrum(optimization = "simple", chunksize = 1000)

Output:

Calculating Equilibrium Spectrum
Physical Conditions
----------------------------------------
   Tgas                 700 K
   Trot                 700 K
   Tvib                 700 K
   isotope              1,2,3
   mole_fraction        0.1
   molecule             CO
   overpopulation       None
   path_length          1 cm
   pressure_mbar        1013.25 mbar
   rot_distribution     boltzmann
   self_absorption      True
   state                X
   vib_distribution     boltzmann
   wavenum_max          2300.0000 cm-1
   wavenum_min          1900.0000 cm-1
Computation Parameters
----------------------------------------
   Tref                 296 K
   add_at_used          
   broadening_method    voigt
   cutoff               1e-27 cm-1/(#.cm-2)
   dbformat             hitran
   dbpath               /Users/sagarchotalia/.radisdb/hitran/CO.hdf5
   folding_thresh       1e-06
   include_neighbouring_lines  True
   memory_mapping_engine  auto
   neighbour_lines      0 cm-1
   optimization         simple
   parfuncfmt           hapi
   parsum_mode          full summation
   pseudo_continuum_threshold  0
   sparse_ldm           auto
   truncation           50 cm-1
   waveunit             cm-1
   wstep                0.01 cm-1
   zero_padding         -1
----------------------------------------
0.50s - Spectrum calculated

The example is working for both optimization = "simple" and "min-RMS", and all chunksize values.

@erwanp
Member

erwanp commented Jun 29, 2022

Nice! Can you compare that you get the same spectrum with / without chunksize?

Then can you try to push to large scale spectra (ex : CO2 HiTEMP, full range) and confirm that you can run it on limited RAM?

@sagarchotalia
Collaborator Author

Hello, I've confirmed that both the spectra are the same, with and without chunksize!
Also, I've tested the RADIS examples calc_hitran_full_range.py, plot_hitemp_spectrum.py and plot_SpecDatabase.py, both with and without chunksize. They yield the same results as well.

P.S. Another thing, I'm having issues plotting through matplotlib due to a publib warning:

/Users/sagarchotalia/opt/anaconda3/lib/python3.8/site-packages/publib/main.py:230: UserWarning: 
Glyph 8315 (\N{SUPERSCRIPT MINUS}) missing from current font.
  plt.tight_layout()

Because of this, I'm not able to plot some spectra. But I'll try to get it resolved asap!

@erwanp
Member

erwanp commented Jun 30, 2022

Hello! You can change the default plotting style and library of RADIS: drop publib and use something else. There is a Gallery Example showing how to customize plots.

@sagarchotalia
Collaborator Author

----------------------------------------
Traceback (most recent call last):
  File "/Users/sagarchotalia/Desktop/radis temp files/example.py", line 24, in <module>
    s = radis.test_spectrum(optimization = "min-RMS", chunksize = 100)
  File "/Users/sagarchotalia/radis/radis/test/utils.py", line 187, in test_spectrum
    s = calc_spectrum(**conditions)
  File "/Users/sagarchotalia/radis/radis/lbl/calc.py", line 514, in calc_spectrum
    generated_spectrum = _calc_spectrum_one_molecule(
  File "/Users/sagarchotalia/radis/radis/lbl/calc.py", line 838, in _calc_spectrum_one_molecule
    s = sf.eq_spectrum(
  File "/Users/sagarchotalia/radis/radis/lbl/factory.py", line 799, in eq_spectrum
    wavenumber, abscoeff_v = self._calc_broadening()
  File "/Users/sagarchotalia/radis/radis/lbl/broadening.py", line 2449, in _calc_broadening
    (wavenumber, abscoeff) = self._broaden_lines(df)
  File "/Users/sagarchotalia/radis/radis/lbl/broadening.py", line 2220, in _broaden_lines
    (wavenumber, absorption) = self._apply_lineshape_LDM(
  File "/Users/sagarchotalia/radis/radis/lbl/broadening.py", line 1934, in _apply_lineshape_LDM
    df = pd.DataFrame(
  File "/Users/sagarchotalia/opt/anaconda3/envs/radis-env/lib/python3.8/site-packages/pandas/core/frame.py", line 636, in __init__
    mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
  File "/Users/sagarchotalia/opt/anaconda3/envs/radis-env/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 502, in dict_to_mgr
    return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
  File "/Users/sagarchotalia/opt/anaconda3/envs/radis-env/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 120, in arrays_to_mgr
    index = _extract_index(arrays)
  File "/Users/sagarchotalia/opt/anaconda3/envs/radis-env/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 674, in _extract_index
    raise ValueError("All arrays must be of the same length")
ValueError: All arrays must be of the same length

It seems that this error still persists. It occurs in _apply_lineshape_LDM(), which raises a ValueError saying that all arrays must be of the same length.

Also, here are the Pytest results after chunksize implementation:
Screenshot 2022-07-09 at 1 26 00 PM

Member

@erwanp erwanp left a comment

See comments in code

also

  • make sure your PR objectives also include making these changes for nonequilibrium mode (AFAIK, there is a "broaden_lines_noneq" function somewhere). Do this only after equilibrium works.

radis/lbl/broadening.py Show resolved Hide resolved
radis/lbl/broadening.py Outdated Show resolved Hide resolved
@sagarchotalia
Collaborator Author

Here are the plot_diff() results for a HITEMP CO spectrum computed with the same parameters (optimization="simple"), the only difference being the chunksize.
[image: plot_diff() comparison]

What I understand is happening:

  • The values of line_profile_LDM, wL and wG are not the same when calculated over one chunk of the database as when calculated over the entire database. These values are later passed to _apply_lineshape_LDM() to compute coefficients such as li0, li1, mi0 and mi1, which affect the overall spectrum, so the final result differs. These values therefore need to be the same with or without chunksize.

@sagarchotalia
Collaborator Author

Hello! It turns out my thinking was going in the wrong direction. I finally fixed the error caused by my implementation by taking shifted_wavenum over each chunk instead of over the entire df.
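
As an illustration of how such a length mismatch produces the pandas error seen in the tracebacks, here is a minimal sketch. It does not reproduce the RADIS code: `build_chunk_frame`, the column name `S` and the toy values are assumptions made for this example; only `shifted_wavenum` is a name taken from the discussion.

```python
import numpy as np
import pandas as pd

n_lines, chunksize = 10, 4

# Hypothetical per-line arrays, standing in for columns of the line database
wavenum = np.linspace(2000.0, 2100.0, n_lines)
shift = np.full(n_lines, -0.01)
intensity = np.ones(n_lines)


def build_chunk_frame(start, stop, slice_shifted=True):
    """Build a DataFrame for the lines in [start, stop)."""
    shifted_wavenum = wavenum + shift  # computed over the entire database
    if slice_shifted:
        # Restrict to the current chunk, like every other column
        shifted_wavenum = shifted_wavenum[start:stop]
    return pd.DataFrame(
        {"shifted_wavenum": shifted_wavenum, "S": intensity[start:stop]}
    )


# Mixing a full-length array with a chunk-length array fails in pandas
try:
    build_chunk_frame(0, chunksize, slice_shifted=False)
    error_message = None
except ValueError as err:
    error_message = str(err)

# Slicing every array with the same chunk bounds works
df_chunk = build_chunk_frame(0, chunksize)
```

The failing call mixes one array of length `n_lines` with one of length `chunksize`, which the DataFrame constructor rejects.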
Here are some output spectra:
[image: N2O Spectrum]
[image: CO2 Spectrum]
[image: CO Spectrum]

If these are alright, I can go ahead with the chunksize implementation for non-equilibrium spectra.

@erwanp
Member

erwanp commented Jul 20, 2022

Hello @sagarchotalia, these results seem great!

Before moving to nonequilibrium we'll have to :

  • implement tests in radis/test/lbl, with one function that computes a spectrum with and without the chunksize optimization and checks that the residual (get_residual) is very small (which it seems to be right now) and remains small in the future.
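
A pytest-style sketch of what such a test could look like is given below. It is only an outline under stated assumptions: `toy_spectrum` is a stand-in (a sum of Lorentzian profiles) for the real RADIS calculation, and `relative_residual` is a hypothetical helper, not RADIS's get_residual. A real test in radis/test/lbl would call the RADIS functions instead.

```python
import numpy as np


def toy_spectrum(wavenumber, centers, strengths, hwhm, chunksize=None):
    """Sum Lorentzian line profiles on a grid, optionally chunk by chunk."""
    n_lines = len(centers)
    step = max(n_lines, 1) if chunksize is None else chunksize
    spectrum = np.zeros_like(wavenumber)
    for start in range(0, n_lines, step):
        c = centers[start:start + step, None]     # shape (chunk, 1)
        s = strengths[start:start + step, None]   # shape (chunk, 1)
        # Temporary array of shape (chunk, n_grid): this is what chunking bounds
        profile = (hwhm / np.pi) / ((wavenumber[None, :] - c) ** 2 + hwhm**2)
        spectrum += (s * profile).sum(axis=0)
    return spectrum


def relative_residual(reference, other):
    """Largest absolute difference, relative to the reference maximum."""
    return float(np.max(np.abs(reference - other)) / np.max(np.abs(reference)))


def test_chunksize_gives_same_spectrum():
    rng = np.random.default_rng(42)
    wavenumber = np.linspace(1900.0, 2300.0, 2001)
    centers = rng.uniform(1900.0, 2300.0, size=500)
    strengths = rng.random(500)

    reference = toy_spectrum(wavenumber, centers, strengths, hwhm=0.1)
    # Include chunk sizes that do not divide the number of lines,
    # and one that is larger than the number of lines.
    for chunksize in (1, 7, 100, 500, 10_000):
        chunked = toy_spectrum(
            wavenumber, centers, strengths, hwhm=0.1, chunksize=chunksize
        )
        assert relative_residual(reference, chunked) < 1e-10
```

Testing chunk sizes that do not divide the number of lines evenly is a deliberate choice here, since the last, shorter chunk is where indexing mistakes tend to show up.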

We also have to make sure it is more efficient. Computing all lines together is more efficient as long as all lines fit in RAM; it becomes problematic once RAM saturates. With chunksize, you have an implementation that ensures RAM is never saturated.

  • Compute a large spectrum with many lines, for instance HITEMP H2O or HITEMP CH4. Make sure it fills your RAM. If it does not, find a larger molecule (ExoMol has many extremely large molecules), or find a weaker computer :) Record the calculation times.
  • Then run the same calculation using chunksize and compare the calculation times. You may have to adjust the chunksize parameter to optimize it.
  • @anandxkumar had created a performance graph showing calculation time vs number of lines. You'll want to produce the same kind of graph: without chunksize, I expect that beyond a certain threshold the calculation time skyrockets because of RAM saturation. With chunksize, it may be slightly slower at first, but it never skyrockets.
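
The measurement loop behind such a graph could be sketched as follows. This is a toy harness, not a RADIS benchmark: `toy_spectrum` and `benchmark` are hypothetical names, the line counts are far too small to saturate RAM, and the numbers it produces say nothing about real RADIS performance. It only shows how time and peak memory can be recorded per number of lines for both modes.

```python
import time
import tracemalloc

import numpy as np


def toy_spectrum(wavenumber, centers, strengths, hwhm, chunksize=None):
    """Stand-in calculation: sum of Lorentzian profiles, optionally chunked."""
    n_lines = len(centers)
    step = max(n_lines, 1) if chunksize is None else chunksize
    spectrum = np.zeros_like(wavenumber)
    for start in range(0, n_lines, step):
        c = centers[start:start + step, None]
        s = strengths[start:start + step, None]
        profile = (hwhm / np.pi) / ((wavenumber[None, :] - c) ** 2 + hwhm**2)
        spectrum += (s * profile).sum(axis=0)
    return spectrum


def benchmark(line_counts, chunksize, n_grid=1000, seed=0):
    """Record wall time and peak traced memory for full and chunked runs."""
    rng = np.random.default_rng(seed)
    wavenumber = np.linspace(1900.0, 2300.0, n_grid)
    rows = []
    for n_lines in line_counts:
        centers = rng.uniform(1900.0, 2300.0, size=n_lines)
        strengths = rng.random(n_lines)
        for mode, size in (("full", None), ("chunked", chunksize)):
            tracemalloc.start()
            t0 = time.perf_counter()
            toy_spectrum(wavenumber, centers, strengths, 0.1, size)
            elapsed = time.perf_counter() - t0
            _, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            rows.append(
                {"n_lines": n_lines, "mode": mode,
                 "time_s": elapsed, "peak_bytes": peak}
            )
    return rows


rows = benchmark([100, 200, 400], chunksize=50)
for row in rows:
    print(row)
```

Plotting `time_s` against `n_lines` for each mode gives the kind of curve described above. Note that tracemalloc adds overhead of its own, so timings for a real comparison are better taken in a separate run without it.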

After implementation you'll also be able to :

  • add an "auto" mode that automatically computes the chunksize from the user's available RAM (and sets it to None if all lines appear to fit in RAM). This requires some kind of correlation to estimate the memory size of a line chunk.
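
One possible shape for such an "auto" mode is sketched below. Everything in it is an assumption made for illustration: the function name `auto_chunksize`, the `bytes_per_line` estimate (the correlation mentioned above, which would have to be measured) and the `safety_factor` are hypothetical. The available memory is passed in as a number so that the sketch stays independent of how it is obtained; the third-party psutil package, for example, exposes it as `psutil.virtual_memory().available`.

```python
def auto_chunksize(n_lines, bytes_per_line, available_bytes, safety_factor=0.5):
    """Return a chunk size that should fit in memory, or None if no chunking is needed.

    n_lines         : number of lines to broaden
    bytes_per_line  : estimated memory needed per line (to be calibrated)
    available_bytes : memory currently available
    safety_factor   : fraction of the available memory we allow ourselves to use
    """
    if n_lines <= 0:
        return None
    budget = available_bytes * safety_factor
    if n_lines * bytes_per_line <= budget:
        # Everything fits: computing all lines together is the efficient choice
        return None
    # Otherwise use the largest chunk that stays within the budget (at least 1 line)
    return max(1, int(budget // bytes_per_line))


# Small calculation: fits easily, so no chunking
small = auto_chunksize(n_lines=1_000, bytes_per_line=8, available_bytes=1e9)

# Large calculation: 1e9 lines at ~1 kB each with 8 GB available
large = auto_chunksize(n_lines=10**9, bytes_per_line=1_000, available_bytes=8e9)
```

Returning None when everything fits keeps the faster single-pass path as the default.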

Member

@erwanp erwanp left a comment

See my Warning. The rest are minor comments

radis/test/lbl/test_broadening.py Show resolved Hide resolved
radis/test/lbl/test_broadening.py Outdated Show resolved Hide resolved
radis/test/lbl/test_broadening.py Outdated Show resolved Hide resolved
radis/test/lbl/test_broadening.py Outdated Show resolved Hide resolved
radis/test/lbl/test_broadening.py Outdated Show resolved Hide resolved
sagarchotalia and others added 2 commits August 1, 2022 13:57
Making sure chunksize is taken into account in Spectrum Conditions, as per the suggestion by Erwan in radis#489 (comment)

Co-authored-by: Erwan Pannier <erwan.pannier@gmail.com>
@erwanp
Member

erwanp commented Aug 9, 2022

About the errors

@codecov-commenter

Codecov Report

Merging #489 (4669631) into develop (b853a26) will increase coverage by 0.24%.
The diff coverage is 92.53%.

@@             Coverage Diff             @@
##           develop     #489      +/-   ##
===========================================
+ Coverage    73.10%   73.35%   +0.24%     
===========================================
  Files          137      137              
  Lines        18870    18924      +54     
===========================================
+ Hits         13795    13881      +86     
+ Misses        5075     5043      -32     

@erwanp
Member

erwanp commented Aug 10, 2022

Looks good to me, merging this first major part of the GSoC project!

@erwanp erwanp merged commit 2707fbe into radis:develop Aug 10, 2022
@erwanp erwanp added this to In progress in GSoC 2022: Performance Tweaks in RADIS via automation Aug 10, 2022
@erwanp erwanp added this to the 0.14 milestone Aug 10, 2022
@arunavabasucom
Collaborator

great @sagarchotalia 🎉

@anandxkumar
Collaborator

Congrats @sagarchotalia on the first major PR merge!

Development

Successfully merging this pull request may close these issues.

Adding a Chunk size method for broadening in Optimized loops
5 participants