The datasets conditioned/gsim_XXX are using too much disk space #9387

Closed
micheles opened this issue Jan 26, 2024 · 5 comments · Fixed by #9540

@micheles
Contributor

micheles commented Jan 26, 2024

As reported by @CatalinaYepes. A solution could be to store them in the .tmp.hdf5 file. Otherwise, we could revert #9094.

@micheles micheles added this to the Engine 3.19.0 milestone Jan 26, 2024
@micheles micheles self-assigned this Jan 26, 2024
@micheles micheles modified the milestones: Engine 3.19.0, Engine 3.20.0 Mar 1, 2024
@micheles
Contributor Author

Actually, the only solution is to reduce the number of sites, since the memory/disk space usage is quadratic in the number of sites.
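
For a rough sense of the scaling (illustrative arithmetic only, assuming the conditioned data includes an N×N covariance matrix of 64-bit floats per GSIM and per IMT; the helper name below is hypothetical):

```python
# Back-of-the-envelope estimate: storage for an N x N float64 matrix.
def covariance_size_gb(num_sites, bytes_per_value=8):
    return num_sites ** 2 * bytes_per_value / 1024 ** 3

for n in (1_000, 10_000, 50_000, 100_000):
    print(f"{n:>7,} sites -> ~{covariance_size_gb(n):.2f} GB")
# ~0.01 GB at 1k sites, ~0.75 GB at 10k, ~18.6 GB at 50k, ~74.5 GB at 100k,
# and that is per GSIM and per IMT, before any copies made while reading.
```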

@raoanirudh
Member

Why doesn't storing them in the .tmp.hdf5 file work? This data is needed only during the calculation and doesn't need to be stored in the final calc.hdf5.

@micheles
Contributor Author

Because you will soon run out of disk space; that is how Cata discovered the issue. Also, once you start storing 100+ GB, reading the data back will kill your calculation (out of memory, or so slow that it is impossible to run). No matter how big your machine is, a quadratic calculation will run out of resources fairly soon. You would need an algorithm that is not quadratic in the number of sites.

@raoanirudh
Member

Reopening this issue, as the problem persists.

The issue is not related to having too many sites in the calculation. It is that the conditioned/mean_covs data now stored in the calc_xxx.hdf5 file is useful only while the calculation is running and can safely be deleted from the datastore once the calculation is completed. The other option would be to store it in the calc_xxx.tmp.hdf5 file instead, which gets deleted at the end of the calculation, since this interim data is of no use to the user after the calculation is over. If the conditioned/mean_covs data is deleted from the datastore, the hdf5 file sizes in oqdata should go back to the regular sizes for scenario calculations that do not involve conditioning.
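
As an illustration of that pattern with plain h5py (hypothetical paths and shapes; this is not the engine's actual datastore API, just the general idea of keeping the interim arrays in a throwaway file):

```python
import os
import h5py
import numpy as np

# Sketch: write the interim conditioned data to a temporary HDF5 file that
# is removed once the calculation is over, so it never inflates the final
# calc_xxx.hdf5 in oqdata.
tmp_path = 'calc_xxx.tmp.hdf5'          # hypothetical path
mean_covs = np.zeros((4, 1000, 1000))   # placeholder interim data

with h5py.File(tmp_path, 'w') as tmp:
    tmp.create_dataset('conditioned/mean_covs', data=mean_covs)

# ... during the calculation, read back only what is needed ...
with h5py.File(tmp_path, 'r') as tmp:
    covs = tmp['conditioned/mean_covs'][:]

os.remove(tmp_path)  # nothing conditioning-specific is left on disk
```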

@raoanirudh raoanirudh reopened this May 15, 2024
@micheles
Contributor Author

micheles commented May 16, 2024

You are partially right @raoanirudh, but my point still stands that calculations with too many sites will be impossible. The only solution I see for Aristotle calculations is to use a large enough region_grid_spacing so that the calculations can run. Then, to avoid wasting too much disk space, we can store the temporary data in _tmp.hdf5 or, even better, only keep it in memory as it was originally, before #9094 (in retrospect, that was a bad idea, trading a decent but not impressive speedup for too much disk space).
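
To make the region_grid_spacing trade-off concrete (illustrative numbers only; the helper names are made up): the number of grid sites scales with the inverse square of the spacing, and the conditioned data scales with the square of the number of sites, so coarsening the grid pays off twice.

```python
# Illustrative only: doubling region_grid_spacing roughly quarters the number
# of sites and shrinks the quadratic conditioned data by a factor of ~16.
def approx_num_sites(region_area_km2, spacing_km):
    return region_area_km2 / spacing_km ** 2

def quadratic_storage_gb(num_sites, bytes_per_value=8):
    return num_sites ** 2 * bytes_per_value / 1024 ** 3

area_km2 = 100_000  # hypothetical region
for spacing in (2.0, 5.0, 10.0):
    n = approx_num_sites(area_km2, spacing)
    print(f"spacing {spacing:4.1f} km -> ~{n:,.0f} sites, "
          f"~{quadratic_storage_gb(n):.2f} GB")
```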
