Skip to content

negative binomial tom for classical area/point sources #7870

Merged
micheles merged 16 commits intogem:masterfrom
pabloitu:negbinom_3.15
Jun 29, 2022
Merged

negative binomial tom for classical area/point sources #7870
micheles merged 16 commits intogem:masterfrom
pabloitu:negbinom_3.15

Conversation

@pabloitu
Copy link
Contributor

(Draft) Created new tom implementation for uncorrelated sources, with mu/alpha parametrization of NB (Kagan, 2010)). Modified sourceconverter/writer to be able to handle NB-Tom XML-Nodes from source_model files. And modified context manager to handle a parametric_source with multiple rupture occurrence.

@micheles micheles added this to the Engine 3.15.0 milestone May 31, 2022
@pabloitu pabloitu force-pushed the negbinom_3.15 branch 3 times, most recently from c88f7b5 to 5fd3019 Compare June 3, 2022 23:10
@micheles
Copy link
Contributor

micheles commented Jun 9, 2022

This is the most advanced contribution (in terms of impact on the engine core) we ever received in 10 years. You clearly did an impressive amount of work.

However, when working at this level of sophistication, the devil is in the details: the new feature causes a significant slowdown for code not using the negbin distribution.

For instance do the following in current master:

$ oq run demos/hazard/LogicTreeCase2ClassicalPSHA/job.ini && oq show performance
| calc_31715, maxmem=2.4 GB  | time_sec  | memory_mb | counts |
|----------------------------+-----------+-----------+--------|
| total classical            | 12.3      | 4.52344   | 27     |
| make_contexts              | 6.30520   | 0.0       | 14_388 |
| ClassicalCalculator.run    | 3.89814   | 22.2      | 1      |

In your branch the performance is terrible since all the time in spent in saving the ruptures:

$ oq run demos/hazard/LogicTreeCase2ClassicalPSHA/job.ini && oq show performance
| calc_31714, maxmem=2.3 GB  | time_sec  | memory_mb | counts |
|----------------------------+-----------+-----------+--------|
| ClassicalCalculator.run    | 46.6      | 23.4      | 1      |
| saving rup_data            | 42.8      | 0.25781   | 27     |
| total classical            | 16.4      | 4.46094   | 27     |

My suggestion would be to split this PR in many small PRs adding various pieces, checking at every moment that you do not lose performance in calculations not using the feature. When you find a PR with a performance hit, please ask here for help.

@pabloitu
Copy link
Contributor Author

pabloitu commented Jun 9, 2022

Michele, thank you for your reply! I was finger crossing that I would not cause a performance issue, so thank you for pinpointing what could be causing that. I will take a look step by step what is causing the performance break, and when I have identified it, we could discuss an elegant solution for it.

@mmpagani
Copy link
Member

mmpagani commented Jun 9, 2022

Pablo, I gave a look at the PR and did not manage to find at first sight a component of the code where what you are adding might impact the other sources. I will give a more thorough look. Note moreover that we have a test broken that we need to figure out why it's not passing.

@pabloitu
Copy link
Contributor Author

I fixed what was slowing the overall process. Apparently I was unfolding the context rates everytime I wanted to check in concat() whether if a ctx was Poisson, Nonparam or Negative Binomial. But, to check the failing tests: Could you point me to a doc or tell me how can I run them myself?

@micheles
Copy link
Contributor

Update your branch from the latest master, install pytest and then

$ pytest -vsx openquake/hazardlib/tests/sourcewriter_test.py 

pabloitu added 2 commits June 17, 2022 15:12
 Negative Binomial implementation with uncorrelated sources.
- Added NegativeBinomialTOM class in hazardlib.tom. Calculate probabilities of no-exceedance approximating the infinite series.
- Modified sourcewriter.py to write a source group, where some of its pointsources could be NB.
- Modified sourceconverter.py to read source models with a temporal_occurrence_model node in the definition of Point Sources.
- Modified contexts.py to use the temporal model of a point source (which has the parameters for each NB point, instead of the MFD)

- For rupture_contexts that have both occurrence_rate (mean_rate) and probs_occur (e.g. negbinomial or other parametric non-poisson distributions),  bypassed the concateniation between context. It is due to numpy forcing for probs_occur to have the same shape.

fixed typo

fixed typo

Added NB doc and set a n_max fixed for NB pdf (7), so contexts can handle the multiple_shape array of context.probs_occur

Added NB tom test

- Fixed NB writer to write properly a negbinom tom
- Modified NegativeBinomial.get_pmf() to handle mean_rates of type int and np.ndarray
- Modified contexts to handle properly a NB TOM, when multiple rupture planes and hypocentral depths are present.

 Negative Binomial implementation with uncorrelated sources.
- Added NegativeBinomialTOM class in hazardlib.tom. Calculate probabilities of no-exceedance approximating the infinite series.
- Modified sourcewriter.py to write a source group, where some of its pointsources could be NB.
- Modified sourceconverter.py to read source models with a temporal_occurrence_model node in the definition of Point Sources.
- Modified contexts.py to use the temporal model of a point source (which has the parameters for each NB point, instead of the MFD)
Fixed sourcewriter to ignore sourcegroup tom.

- modified tom name in sourcewriter

- Modified concat to not access full occurrence_rate array while checking for NB
- Fixed NB qa_tests
pabloitu added 7 commits June 21, 2022 13:24
 Negative Binomial implementation with uncorrelated sources.
- Added NegativeBinomialTOM class in hazardlib.tom. Calculate probabilities of no-exceedance approximating the infinite series.
- Modified sourcewriter.py to write a source group, where some of its pointsources could be NB.
- Modified sourceconverter.py to read source models with a temporal_occurrence_model node in the definition of Point Sources.
- Modified contexts.py to use the temporal model of a point source (which has the parameters for each NB point, instead of the MFD)

- For rupture_contexts that have both occurrence_rate (mean_rate) and probs_occur (e.g. negbinomial or other parametric non-poisson distributions),  bypassed the concateniation between context. It is due to numpy forcing for probs_occur to have the same shape.

fixed typo

fixed typo

Added NB doc and set a n_max fixed for NB pdf (7), so contexts can handle the multiple_shape array of context.probs_occur

Added NB tom test

- Fixed NB writer to write properly a negbinom tom
- Modified NegativeBinomial.get_pmf() to handle mean_rates of type int and np.ndarray
- Modified contexts to handle properly a NB TOM, when multiple rupture planes and hypocentral depths are present.

 Negative Binomial implementation with uncorrelated sources.
- Added NegativeBinomialTOM class in hazardlib.tom. Calculate probabilities of no-exceedance approximating the infinite series.
- Modified sourcewriter.py to write a source group, where some of its pointsources could be NB.
- Modified sourceconverter.py to read source models with a temporal_occurrence_model node in the definition of Point Sources.
- Modified contexts.py to use the temporal model of a point source (which has the parameters for each NB point, instead of the MFD)
Fixed sourcewriter to ignore sourcegroup tom.

- modified tom name in sourcewriter

- Modified concat to not access full occurrence_rate array while checking for NB
- Fixed NB qa_tests
…ntext, and then concatenates by identical shape of probs_occur
…ative binomial case_78 test to classical_test
@pabloitu
Copy link
Contributor Author

I've managed to run all the tests, find some details in my modifications that made them fail.

@micheles
Copy link
Contributor

case_78 is still red

@pabloitu
Copy link
Contributor Author

sorry, missed an init in that the case_78 for negbinom

if parametric_np:
for shp in set(ctx.probs_occur.shape[1] for ctx in parametric_np):
p_array = [p for p in parametric_np if p.probs_occur.shape[1] == shp]
out.append(numpy.concatenate(p_array).view(numpy.recarray))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment explaining the logic here?

:class:`openquake.hazardlib.mfd.TruncatedGRMFD` instance
"""

if node.tag.endswith('pointSource'): # Check if there a tom specified at the point level
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not clear to me: what happens if you have a non-pointsource? How do you specify the TOM? Also, if you have 1 million sources specifying the TOM for each one makes little sense. My understanding is the TOM should be set at the SourceGroup level, not at the source level, i.e. there are multiple sources with the same TOM. Let me check with @mmpagani and we will implement that feature, including a general way of passing extra parameters to the TOM subclass.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out it is already implemented, see this example: https://github.com/gem/oq-engine/blob/master/openquake/qa_tests_data/classical/case_35/source_model.xml
The place to touch is the method get_tom in the SourceConverter.

"""
Negative Binomial temporal occurrence model.
"""
def __init__(self, time_span, occurrence_rate=None, parameters=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like to use a generic parameters. Just use mu and alpha, as mandatory parameters. Also the occurrence_rate is not needed, see #7931

pabloitu added 5 commits June 23, 2022 13:49
# Conflicts:
#	openquake/hazardlib/sourceconverter.py
…hanged tom to become an attribute of a Source node, in consistency to nonparametric or cluster, rather than an making it an extra subnode.

Modified NegativeBinomialTOM class to explicitly give mu and alpha as param.
# if tom is negbinom, sets mu and alpha attr to tom_class
if node['tom'] == 'NegativeBinomialTOM':
kwargs = {'alpha': eval(node['alpha']),
'mu': eval(node['mu'])}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use ast.literal_eval here, which is safe

name="point00000"
tom="NegativeBinomialTOM"
mu="0.1"
alpha="2.0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tom, mu and alpha should go inside the sourceGroup, not inside pointSource.

if isinstance(tom, NegativeBinomialTOM):
attrs['tom'] = 'NegativeBinomialTOM'
attrs['mu'] = tom.mu
attrs['alpha'] = tom.alpha
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not the right place for this, since the tom instance should not be an attribute of PointSource, only of SourceGroup.

Copy link
Contributor

@micheles micheles Jun 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussion, it became clear that for the New Zealand model they really need by source TOMs, so I will retract my objection. Still the sourcewriter will not work for non-point-sources with NegativeBinomialTOM.

@micheles
Copy link
Contributor

LGTM

@mmpagani
Copy link
Member

mmpagani commented Jun 29, 2022

LGTM

@pabloitu further checks to be performed and not covered by this PR:

  • event-based calculation
  • disaggregation calculation

Both are important since the former is needed for risk analyses while disaggregation provides important information for engineering applications.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants