
use inference data in end of sampling report #3883

Merged: 2 commits merged into pymc-devs:master on Apr 20, 2020

Conversation

OriolAbril (Member)

Current code calls az.ess and az.rhat with trace objects instead of InferenceData objects, so ArviZ converts the trace to an InferenceData object twice. This is generally not an issue and is barely noticeable unless the number of observations is large, in which case the conversion (and the consequent retrieval of pointwise log likelihood data) can be quite memory expensive. Below is the memory usage in an example (courtesy of @nitishp25) where the effect is quite noticeable:

[Image: memory usage profile of the example before this PR]

This PR makes the conversion to InferenceData explicit so that it happens only once and, where possible, skips retrieving the pointwise log likelihood data (which is needed for neither the ess nor the rhat calculation).
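The double-conversion problem can be illustrated with a small stand-in model (all names below are hypothetical stand-ins, not the real pymc3/ArviZ objects): diagnostics that each convert their own input perform the expensive step once per call, while a single explicit up-front conversion performs it once in total.

```python
# Illustrative sketch only: why passing a raw trace to two diagnostics
# converts twice, while converting explicitly up front converts once.
# to_inference_data stands in for arviz's trace-to-InferenceData conversion.

conversions = 0

class InferenceData:
    """Stand-in for arviz.InferenceData."""

def to_inference_data(obj):
    """Pass InferenceData through untouched; convert anything else."""
    global conversions
    if isinstance(obj, InferenceData):
        return obj
    conversions += 1          # the expensive step in the real library
    return InferenceData()

def ess(data):
    to_inference_data(data)   # each diagnostic converts its own input

def rhat(data):
    to_inference_data(data)

trace = object()              # stand-in for a pymc3 MultiTrace
ess(trace); rhat(trace)       # implicit: two separate conversions
print(conversions)            # 2

idata = to_inference_data(trace)  # the PR's approach: convert once up front
ess(idata); rhat(idata)           # InferenceData passes through untouched
print(conversions)                # 3 in total: only one more conversion
```

With the explicit conversion, adding further diagnostics costs no additional conversions, which is what keeps the memory peak flat in the profile above.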

@@ -107,7 +107,7 @@ def _run_convergence_checks(self, trace, model):
             self._add_warnings([warn])
             return

-        from pymc3 import rhat, ess
+        from pymc3 import rhat, ess, _to_arviz
OriolAbril (Member, Author):

I have used this _to_arviz approach for readability, but I can change it to from arviz import from_pymc3 if you prefer that.

Member:

Could _to_arviz be a place to put the storing of sampling metadata? (See arviz-devs/arviz#1146)

OriolAbril (Member, Author):

I definitely think it has to be added to from_pymc3, and therefore also to _to_arviz; is that the question?

Member:

I didn't know _to_arviz existed, so to me it sounds like code duplication?

OriolAbril (Member, Author):

I aliased from_pymc3 to _to_arviz because I was worried that calling a function named from_pymc3 inside pymc3 code would be confusing. I guess it depends on the general degree of familiarity with ArviZ.

@codecov bot commented Apr 16, 2020

Codecov Report

Merging #3883 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #3883   +/-   ##
=======================================
  Coverage   90.71%   90.71%           
=======================================
  Files         135      135           
  Lines       21314    21316    +2     
=======================================
+ Hits        19335    19337    +2     
  Misses       1979     1979           
Impacted Files             Coverage Δ
pymc3/backends/report.py   93.07% <100.00%> (+0.10%) ⬆️

@michaelosthege (Member)

@OriolAbril do you think we could transition to returning InferenceData in the long term?

@OriolAbril (Member, Author)

> @OriolAbril do you think we could transition to returning InferenceData in the long term?

Now that ArviZ is a dependency, it is feasible; however, it would be quite a breaking change. It should probably be discussed in an issue or lab meeting. Another alternative would be to add an option to pm.sampling to return an inference data object 🤔

@AlexAndorra (Contributor)

I like the idea of having an option to return an inference data object! It would make the modeling workflow easier and more elegant.

@rpgoldman (Contributor)

Returning an InferenceData is a nice idea, but it would radically change the posterior predictive sampling code, which is pretty rickety already. To be honest, I'm so scarred by my experience working on fast_sample_posterior_predictive that I wouldn't like to go back in there again.

@michaelosthege (Member)

> Returning an InferenceData is a nice idea, but it would radically change the posterior predictive sampling code, which is pretty rickety already. To be honest, I'm so scarred by my experience working on fast_sample_posterior_predictive that I wouldn't like to go back in there again.

But we already have a converter that swallows InferenceData.posterior objects in sample_posterior_predictive; we did that a few weeks ago, remember?
https://github.com/pymc-devs/pymc3/blob/e7bc8326541e2c7aeee1f3741ff1877f716077a1/pymc3/sampling.py#L1561

@rpgoldman (Contributor)

Ah, right. I had forgotten. And it's already ported to fast_sample_posterior_predictive. OK, never mind!

@lucianopaz (Contributor)

> @OriolAbril do you think we could transition to returning InferenceData in the long term?
>
> Now that ArviZ is a dependency, it is feasible; however, it would be quite a breaking change. It should probably be discussed in an issue or lab meeting. Another alternative would be to add an option to pm.sampling to return an inference data object 🤔

I think that the steps to transition to an InferenceData interface should be:

  1. Write a new trace backend that uses xarray datasets or whatever data container arviz uses under the hood. This way, arviz would only need to get a reference to the true trace instead of copying the contents into a new data structure.
  2. Proceed as scikit-learn does: issue a FutureWarning in pm.sample saying that the default backend will change from NDArray to XArray (or whatever it ends up being called).
  3. Leave the warning for the duration of at least one minor release before switching the default. By this I mean that, if the new backend is added in pymc3 v3.9, the default should change in v3.10.

The way the FutureWarning is usually handled is by setting the default argument value to a sentinel that indicates the user did not explicitly provide anything as its value (it could be None or a string like "unset"). Then, if the argument is unset, the FutureWarning is issued and the argument's value is set to NDArray to preserve backwards compatibility.
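A minimal sketch of that sentinel-default pattern using the standard warnings module (the function name sample, the backend parameter, and the backend strings are hypothetical placeholders, not the actual pymc3 signature):

```python
# Sketch of the sentinel-default FutureWarning pattern described above.
import warnings

UNSET = "unset"  # sentinel: distinguishes "not passed" from any real value

def sample(backend=UNSET):
    if backend is UNSET:
        warnings.warn(
            "In a future release the default trace backend will change "
            "from NDArray to XArray; pass backend explicitly to silence "
            "this warning.",
            FutureWarning,
        )
        backend = "NDArray"  # keep the old default for now
    return backend

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    assert sample() == "NDArray"                 # default: warns, old behaviour
    assert sample(backend="XArray") == "XArray"  # explicit: no warning

assert len(caught) == 1
assert issubclass(caught[0].category, FutureWarning)
```

Using an `is UNSET` identity check (rather than a falsiness test) is what makes the pattern safe even if None or an empty string later becomes a legitimate backend value.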

@nitishp25 commented Apr 18, 2020

Btw, this is how the memory usage looks after the changes done in this PR:

[Image: memory usage profile of the same example after the changes in this PR]

(Review thread on pymc3/backends/report.py: outdated, resolved)
@OriolAbril (Member, Author)

Addressed the comments about using inference data to avoid memory usage peaks due to pointwise log likelihood. I would leave returning an inference data object as a result of pm.sample for another PR.

@twiecki (Member) commented Apr 19, 2020

@OriolAbril Looks great -- thanks! Can you add a note to the release-notes under maintenance?

twiecki merged commit 80a82dd into pymc-devs:master on Apr 20, 2020
@twiecki (Member) commented Apr 20, 2020

Thanks @OriolAbril !

OriolAbril deleted the arviz_integration branch on April 20, 2020 at 09:09