Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove memory leaks in HEMCO core and standalone routines (closes #190) #194

Merged
merged 10 commits into from
Feb 28, 2023

Conversation

yantosca
Copy link
Contributor

This is the companion PR to #190. We have rewritten code in several HEMCO modules to remove memory leaks (which were diagnosed with -DSANITIZE=y).

NOTE: The memory leak in hco_config_mod.F90 (due to the FileData object) has not been solved, despite what the comments say.

Closes #190

src/Extensions/hcox_dustdead_mod.F
src/Extensions/hcox_tomas_dustdead_mod.F
- Allocate and zero fields of Inst% individually
- Add missing deallocation statement for Inst%MSS_FRC_CLY.  This field
  was allocated but not deallocated, thus causing the memory leak.

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
src/Core/hco_readlist_mod.F90
- Add an error trap if ReadLists cannot be allocated
- Do not allocate individual fields of ReadLists, nullify instead.
- Cosmetic changes

See: https://www.personal.psu.edu/jhm/f90/statements/nullify.html

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
src/Interfaces/hcoi)standalone_mod.F90
- Replace error messages such as "ERROR 1" with more descriptive
  messages showing the name of the routine where the failure happened.

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
src/Core/hco_arr_mod.F90
- Updated argument headers to make it clear which are input,
  input/output, and output-only
- Replaced placeholder error messages with actual error messages,
  which are passed to HCO_Error.  Added errMsg and thisLoc variables.
- In Hco_ValInit* routines, return a null pointer if input dimensions
  are all zero.  Otherwise allocate the Val array.
- In Hco_ArrInit* routines, do not nullify before allocating. This
  can cause memory leaks if the Arr array is already allocated.
- In ArrVecInit* routines, return a null pointer if nn = 0

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
src/Core/hco_geotools_mod.F90
- Now call HCO_ArrCleanup to finalize the HcoState%Grid%BXHEIGHT_M
  array.  The prior code was just setting this to NULL, which caused
  large memory leaks.  Nullifying a pointer with memory allocated to
  it leaves that memory unreachable.  The proper method is to deallocate
  rather than nullify.

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
src/Core/hco_arr_mod.F90
- In HCO_ArrInit* routines:
  - Remove IF ( ASSOCIATED( Val ) ) blocks and also set Val => NULL()
    before allocating.  This replicates the prior behavior.  However, the
    nullifcation may lead to memory leaks if Arr enters the routine
    already allocated.
  - Change .and. statements to .or statements when checking if
    the dimensions are equal to zero.

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
src/Core/hco_config_mod.F90
- No longer call FileData_Init; this is now done from DataCont_Init
- Do not define local FileData object Dta.  We now use Lct%Dct%Dta,
  which will avoid creating a memory leak.

src/Core/hco_datacont_mod.F90
- DataCont_Init now calls FileData_Init to initialize the FileData
  object that is stored as a subfield of the DataCont object
- Remove IF block where FileData_Cleanup is called
- Remove nullifcation of Dct%Dta in DataCont_Cleanup

src/Core/hco_filedata_mod.F90
- Remove FileDta => NULL() statement, this was generating a memory leak
- Now allocate FileDta if it is not already allocated

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
src/Core/hco_datacont_mod.F90
- In Cleanup_DataCont, added a missing ")" character in the IF statement
  where DeepClean is defined.

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
src/Core/hco_config_mod.F90
- Add top-of-file header splash comments
- Replace local Dta variable with Lct%Dct%Dta in subroutine calls etc.
- Define a local pointer PrevDta to point Lct%Dct%Dta before cycling
  to the next iteration.  This will make sure that PrevDta always points
  to the last FileData object for which we will read data from a netCDF
  file (as opposed to reusing a previous FileData object).
- If srcFile == '-', reuse the FileData object pointed to by PrevDta.
- Update comments to better describe the flow of the code.
- Cosmetic and whitespace changes.

src/Core/hco_datacont_mod.F90
- Remove previously-placed call to FileData_Init.  This will be called
  from routine Config_ReadCont of hco_config_mod.F90.
- Restore Dct%Dta => NULL() statement.  The Dta (aka FileData) object
  will be initialized by calling FileData_Init from hco_config_mod.F90.

src/Core/hco_filedata_mod.F90
- Remove "IF ( .not. ASSOCIATED( FileDta ) )".  We will always call
  ALLOCATE( FileDta ) to initialize a new FileData object here.

CHANGELOG.md
- Updated accordingly

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
@yantosca yantosca added category: Feature Request New feature or request topic: Performance Related to HEMCO performance, parallelization, or memory issues labels Feb 27, 2023
@yantosca yantosca added this to the 3.6.1 milestone Feb 27, 2023
@yantosca yantosca self-assigned this Feb 27, 2023
Copy link
Contributor

@msulprizio msulprizio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These updates look good to merge.

@yantosca
Copy link
Contributor Author

Integration tests revealed an error:

-------------------------------------------------------------------------------
Using HEMCO 3.6.0       

HEMCO precision (hp) is set to is 8-byte real (aka REAL*8)
-------------------------------------------------------------------------------
  
Reading fields of HEMCO configuration file: HEMCO_Config.rc

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x2b4b02f633ff in ???
#1  0x11eaf02 in addzeroscal
	at /n/holyscratch01/jacob_lab/ryantosca/tests/memleak/GCC_memleak/CodeDir/src/HEMCO/src/Core/hco_config_mod.F90:1758
#2  0x11ec15d in addshadowfields
	at /n/holyscratch01/jacob_lab/ryantosca/tests/memleak/GCC_memleak/CodeDir/src/HEMCO/src/Core/hco_config_mod.F90:1677
#3  0x11f1d7e in config_readcont
	at /n/holyscratch01/jacob_lab/ryantosca/tests/memleak/GCC_memleak/CodeDir/src/HEMCO/src/Core/hco_config_mod.F90:1275
#4  0x11f3bb1 in __hco_config_mod_MOD_config_readfile
	at /n/holyscratch01/jacob_lab/ryantosca/tests/memleak/GCC_memleak/CodeDir/src/HEMCO/src/Core/hco_config_mod.F90:367
#5  0x5dfed2 in __hco_interface_gc_mod_MOD_hcoi_gc_init
	at /n/holyscratch01/jacob_lab/ryantosca/tests/memleak/GCC_memleak/CodeDir/src/GEOS-Chem/GeosCore/hco_interface_gc_mod.F90:469
#6  0x4e0c48 in __emissions_mod_MOD_emissions_init
	at /n/holyscratch01/jacob_lab/ryantosca/tests/memleak/GCC_memleak/CodeDir/src/GEOS-Chem/GeosCore/emissions_mod.F90:117
#7  0x406280 in geos_chem
	at /n/holyscratch01/jacob_lab/ryantosca/tests/memleak/GCC_memleak/CodeDir/src/GEOS-Chem/Interfaces/GCClassic/main.F90:691
#8  0x40bb72 in main
	at /n/holyscratch01/jacob_lab/ryantosca/tests/memleak/GCC_memleak/CodeDir/src/GEOS-Chem/Interfaces/GCClassic/main.F90:32
srun: error: holy2c01209: task 0: Segmentation fault (core dumped)

I am investigating further.

src/Core/hco_config_mod.F90
- This has been reverted to the prior state as in commit 52beff6.
  This undoes the switch from Dta to PrevDta.
- Newer comments (which are more thorough) have been manually added.

Signed-off-by: Bob Yantosca <yantosca@seas.harvard.edu>
@yantosca
Copy link
Contributor Author

After reverting the hco_config_mod.F90 in commit 78d230d, GEOS-Chem Classic integration tests now pass.

==============================================================================
GEOS-Chem Classic: Execution Test Results

GCClassic #76f9571 GEOS-Chem submod update: Merge PRs #1663, #1669 into version 14.1.1
GEOS-Chem #22c5c0278 Merge PR #1669 (RRTMG diagnostic indexing fix) into 14.1.1
HEMCO     #78d230d Revert hco_config_mod.F90 to prior state (52beff6) but keep new comments

Using 24 OpenMP threads
Number of execution tests: 26

Submitted as SLURM job: 44164630
==============================================================================
 
Execution tests:
------------------------------------------------------------------------------
gc_05x0625_NA_47L_merra2_CH4........................Execute Simulation....PASS
gc_05x0625_NA_47L_merra2_fullchem...................Execute Simulation....PASS
gc_4x5_47L_merra2_fullchem..........................Execute Simulation....PASS
gc_4x5_47L_merra2_fullchem_TOMAS15..................Execute Simulation....PASS
gc_4x5_47L_merra2_fullchem_TOMAS40..................Execute Simulation....PASS
gc_4x5_merra2_aerosol...............................Execute Simulation....PASS
gc_4x5_merra2_carbon................................Execute Simulation....PASS
gc_4x5_merra2_CH4...................................Execute Simulation....PASS
gc_4x5_merra2_CO2...................................Execute Simulation....PASS
gc_4x5_merra2_fullchem..............................Execute Simulation....PASS
gc_4x5_merra2_fullchem_aciduptake...................Execute Simulation....PASS
gc_4x5_merra2_fullchem_APM..........................Execute Simulation....PASS
gc_4x5_merra2_fullchem_benchmark....................Execute Simulation....PASS
gc_4x5_merra2_fullchem_complexSOA...................Execute Simulation....PASS
gc_4x5_merra2_fullchem_complexSOA_SVPOA.............Execute Simulation....PASS
gc_4x5_merra2_fullchem_LuoWd........................Execute Simulation....PASS
gc_4x5_merra2_fullchem_marinePOA....................Execute Simulation....PASS
gc_4x5_merra2_fullchem_RRTMG........................Execute Simulation....PASS
gc_4x5_merra2_Hg....................................Execute Simulation....PASS
gc_4x5_merra2_metals................................Execute Simulation....PASS
gc_4x5_merra2_POPs_BaP..............................Execute Simulation....PASS
gc_4x5_merra2_tagCH4................................Execute Simulation....PASS
gc_4x5_merra2_tagCO.................................Execute Simulation....PASS
gc_4x5_merra2_tagO3.................................Execute Simulation....PASS
gc_4x5_merra2_TransportTracers......................Execute Simulation....PASS
gc_4x5_merra2_TransportTracers_LuoWd................Execute Simulation....PASS
 
Summary of test results:
------------------------------------------------------------------------------
Execution tests passed: 26
Execution tests failed: 0
Execution tests not yet completed: 0

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%  All execution tests passed!  %%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

and GCHP integration tests also pass except tagO3, which is a known issue and which will be fixed later.

==============================================================================
GCHP: Execution Test Results

GCClassic #8d3db1a GEOS-Chem submod update: Merge PRs #1663, #1669 into version 14.1.1
GEOS-Chem #22c5c0278 Merge PR #1669 (RRTMG diagnostic indexing fix) into 14.1.1
HEMCO     #78d230d Revert hco_config_mod.F90 to prior state (52beff6) but keep new comments

Number of execution tests: 5

Submitted as SLURM job: 44165605
==============================================================================
 
Execution tests:
------------------------------------------------------------------------------
gchp_merra2_fullchem................................Execute Simulation....PASS
gchp_merra2_fullchem_benchmark......................Execute Simulation....PASS
gchp_merra2_fullchem_RRTMG..........................Execute Simulation....PASS
gchp_merra2_tagO3...................................Execute Simulation....FAIL
gchp_merra2_TransportTracers........................Execute Simulation....PASS
 
Summary of test results:
------------------------------------------------------------------------------
Execution tests passed: 4
Execution tests failed: 1
Execution tests not yet completed: 0

@yantosca yantosca merged commit 317827d into dev/3.6.1 Feb 28, 2023
@yantosca yantosca deleted the bugfix/cleanup-memory-leaks branch February 28, 2023 14:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: Feature Request New feature or request topic: Performance Related to HEMCO performance, parallelization, or memory issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants