Update robust glm notebook #3908

jonsedar · 2020-05-02T23:48:41Z

The first purpose of this PR is to update the GLM robust regression notebook already in the examples here: https://docs.pymc.io/notebooks/GLM-robust-with-outlier-detection.html

Those updates are everything from v2.1 onwards:

Version history:

version	date	author	changes
1.0	2015-12-21	jonsedar	Create and publish
2.0	2018-07-24	twiecki	Restate outlier model using `pm.Normal.dist().logp()` and `pm.Potential()`
2.1	2019-11-16	jonsedar	Restate `nu` in StudentT model to be more efficient, drop explicit use of theano shared vars, generally improve plotting / explanations / layout
2.2	2020-05-21	jonsedar	Minor tidyup for plots and warnings and rerun with pymc3.8

The second purpose of this PR is to clarify the docstring within sampling.py, specifically how kwargs are passed to the steppers (step_kwargs) when you have multiple steppers. I found the docstring needed more clarity while reworking the notebook, so I hope it's valid to include that change in this single PR. I think this also fixes #3197.
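
As a rough illustration of the distinction the docstring change is about (kwargs addressed directly when there is a single step method, versus kwargs wrapped in a dict keyed by step-method name when there are several), here is a toy router in plain Python. It is not PyMC3's code: `route_step_kwargs` is a hypothetical helper, the step names `nuts` and `binary_gibbs_metropolis` and the arguments `target_accept` and `transit_p` are used only as examples, and the error text is borrowed from the title of #3914.

```python
def route_step_kwargs(step_names, **kwargs):
    """Toy model: split sample()-style kwargs into one dict per step method."""
    # Kwargs named after a step method are treated as that method's own dict.
    routed = {name: dict(kwargs.pop(name, {})) for name in step_names}
    if kwargs:
        # Leftover "direct" kwargs (e.g. target_accept=0.9) are only
        # unambiguous when exactly one step method is in play.
        if len(step_names) != 1:
            raise ValueError(f"Unused step method arguments: {set(kwargs)}")
        routed[step_names[0]].update(kwargs)
    return routed


# Single step method: address its kwargs directly.
single = route_step_kwargs(["nuts"], target_accept=0.9)

# Compound step: wrap each method's kwargs in a dict keyed by its name.
compound = route_step_kwargs(
    ["nuts", "binary_gibbs_metropolis"],
    nuts={"target_accept": 0.9},
    binary_gibbs_metropolis={"transit_p": 0.7},
)
```

In this sketch, passing `target_accept=0.9` directly alongside two step methods raises the `ValueError`, which is the kind of ambiguity the docstring wording tries to head off.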

review-notebook-app · 2020-05-02T23:48:47Z

Check out this pull request on ReviewNB.

You'll be able to see the Jupyter notebook diff and discuss changes. Powered by ReviewNB.

codecov · 2020-05-03T00:16:03Z

Codecov Report

Merging #3908 into master will increase coverage by 0.04%.
The diff coverage is 93.75%.

@@            Coverage Diff             @@
##           master    #3908      +/-   ##
==========================================
+ Coverage   83.41%   83.45%   +0.04%     
==========================================
  Files         103      103              
  Lines       14190    14178      -12     
==========================================
- Hits        11836    11832       -4     
+ Misses       2354     2346       -8

Impacted Files	Coverage Δ
pymc3/examples/samplers_mvnormal.py	`0.00% <0.00%> (ø)`
pymc3/sampling.py	`86.19% <ø> (+0.71%)`	⬆️
pymc3/stats/__init__.py	`92.00% <ø> (-1.94%)`	⬇️
pymc3/backends/ndarray.py	`92.63% <100.00%> (+0.15%)`	⬆️
pymc3/backends/sqlite.py	`93.51% <100.00%> (+0.10%)`	⬆️
pymc3/backends/text.py	`97.16% <100.00%> (+0.11%)`	⬆️
pymc3/backends/tracetab.py	`100.00% <100.00%> (ø)`
pymc3/tests/sampler_fixtures.py	`96.74% <100.00%> (ø)`
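
The headline percentages can be reproduced from the hit and line counts in the diff table. A quick check, using a hypothetical helper `coverage_pct` (not part of Codecov), assuming coverage is simply hits divided by total lines:

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage as a percentage, rounded to two decimals."""
    return round(100 * hits / lines, 2)


master = coverage_pct(hits=11836, lines=14190)  # figures from the "master" column
branch = coverage_pct(hits=11832, lines=14178)  # figures from the "#3908" column
delta = round(branch - master, 2)

print(master, branch, delta)  # 83.41 83.45 0.04
```

Coverage rises even though hits fall by 4, because the total line count falls by 12 and misses fall by 8.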

* add deprecation warnings for old backends * mention backend deprecation #3902 * fix typo Co-authored-by: Colin <ColCarroll@users.noreply.github.com> Co-authored-by: Michael Osthege <zufallsprinzip@hotmail.de> Co-authored-by: Colin <ColCarroll@users.noreply.github.com>

twiecki · 2020-05-03T09:30:51Z

Wow, this is an amazing improvement on an already amazing post.

The one thing I want to make sure is that the headings are correct -- are section headings starting with ## rather than # which is reserved only for the title? Otherwise this creates problems with TOCs.
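
For anyone wanting to apply this heading rule mechanically, here is a minimal sketch that pushes every markdown heading in a notebook one level down while leaving the first `#` heading as the title. It assumes the notebook has already been loaded as a dict (for example with `json.load`) and that each cell's `source` is a list of lines. `demote_headings` is a hypothetical helper, not something used in this PR, and it does not try to skip `#` lines inside fenced code within markdown cells.

```python
import re

# Match ATX headings of level 1-5; level 6 cannot be pushed any lower.
HEADING = re.compile(r"^(#{1,5})\s")


def demote_headings(nb: dict) -> dict:
    """Demote markdown headings by one level, keeping the first H1 as the title."""
    title_kept = False
    for cell in nb["cells"]:
        if cell["cell_type"] != "markdown":
            continue  # '#' lines in code cells are comments, not headings
        lines = []
        for line in cell["source"]:
            match = HEADING.match(line)
            if match and match.group(1) == "#" and not title_kept:
                title_kept = True  # the first H1 stays as the notebook title
            elif match:
                line = "#" + line
            lines.append(line)
        cell["source"] = lines
    return nb


nb = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title\n", "\n", "# Setup\n", "## Imports\n"]},
        {"cell_type": "code", "source": ["# a comment, not a heading\n"]},
    ]
}
demote_headings(nb)
```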

pymc3/sampling.py

jonsedar · 2020-05-03T13:27:40Z

Thanks for the feedback guys - I've tidied the notebook and clarified the docstring language in my latest commit

AlexAndorra · 2020-05-03T14:09:37Z

Thanks Jon! I'm reviewing the NB right now -- will let you know when I'm done

review-notebook-app · 2020-05-03T14:30:42Z

View / edit / reply to this conversation on ReviewNB

AlexAndorra commented on 2020-05-03T14:30:41Z
----------------------------------------------------------------

Can we skip displaying the yml file and substitute the watermark cell below?

jonsedar commented on 2020-05-03T17:47:09Z
----------------------------------------------------------------

Yep - removed. The important bits (python 3.6, pymc3 3.8) are already noted

review-notebook-app · 2020-05-03T14:30:43Z

AlexAndorra commented on 2020-05-03T14:30:42Z
----------------------------------------------------------------

You don't need %matplotlib inline in latest jupyter version -- MPL plots will display inline by default

jonsedar commented on 2020-05-03T17:48:32Z
----------------------------------------------------------------

Probably worth keeping in case people run this in other IDEs (VSCode, PyCharm, Spyder etc.); I'm not sure how those work.

review-notebook-app · 2020-05-03T14:30:43Z

AlexAndorra commented on 2020-05-03T14:30:43Z
----------------------------------------------------------------

Can you tidy up the imports as follows:

absolute imports of internal python packages
absolute imports of outside libraries
relative imports

In each layer, sort alphabetically.

Here that would be:

import warnings

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc3 as pm
import seaborn as sns
from matplotlib.lines import Line2D
from scipy import stats

Also, you can use az.style.use('arviz-darkgrid') instead of sns.set(style='darkgrid', palette='muted', context='notebook') , and preferably set those plotting defaults in another cell below -- this is because of how MPL sets the defaults.
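
The layering described in this comment can be sketched in a few lines of plain Python. This is only an illustration of the convention, under some assumptions: `group_imports` is a hypothetical helper, standard-library detection relies on `sys.stdlib_module_names` (Python 3.10+) with a tiny hand-written fallback set, and "alphabetical" is taken to mean plain `import` lines first, then `from` imports, each sorted by module name, which reproduces the order listed above.

```python
import sys

# sys.stdlib_module_names exists from Python 3.10; otherwise use a small fallback.
STDLIB = getattr(sys, "stdlib_module_names", {"os", "sys", "warnings", "json", "re"})


def group_imports(lines):
    """Return [stdlib, third_party, relative] import lines, each layer sorted."""
    stdlib, third_party, relative = [], [], []
    for line in lines:
        module = line.split()[1]  # "import a.b as c" / "from a.b import c" -> "a.b"
        if module.startswith("."):
            relative.append(line)
        elif module.split(".")[0] in STDLIB:
            stdlib.append(line)
        else:
            third_party.append(line)

    def sort_key(line):
        # Plain imports before "from" imports, then alphabetical by module name.
        return (line.startswith("from "), line.split()[1].lower())

    return [sorted(group, key=sort_key) for group in (stdlib, third_party, relative)]


unsorted = [
    "from scipy import stats",
    "import seaborn as sns",
    "import warnings",
    "import pymc3 as pm",
    "from matplotlib.lines import Line2D",
    "import numpy as np",
    "import arviz as az",
    "import pandas as pd",
    "import matplotlib.pyplot as plt",
]
stdlib, third_party, relative = group_imports(unsorted)
```

Tools such as isort automate this kind of ordering; the sketch is only meant to make the three layers explicit.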

jonsedar commented on 2020-05-03T17:51:06Z
----------------------------------------------------------------

Sure, done.

review-notebook-app · 2020-05-03T14:30:44Z

AlexAndorra commented on 2020-05-03T14:30:44Z
----------------------------------------------------------------

This looks like a useless cell

jonsedar commented on 2020-05-03T17:52:46Z
----------------------------------------------------------------

Yeah, that can go - I tend to keep this around in my WIP templates as a location to abstract away long functions before they go to a source file

review-notebook-app · 2020-05-03T14:30:45Z

AlexAndorra commented on 2020-05-03T14:30:45Z
----------------------------------------------------------------

Nice plot!

jonsedar commented on 2020-05-03T17:54:36Z
----------------------------------------------------------------

ta :) though sometimes I really miss ggplot's geom_text

review-notebook-app · 2020-05-03T14:30:46Z

AlexAndorra commented on 2020-05-03T14:30:46Z
----------------------------------------------------------------

Typo: "... the same range and being more directly comparable..."

jonsedar commented on 2020-05-03T17:56:03Z
----------------------------------------------------------------

good catch

review-notebook-app · 2020-05-03T14:30:47Z

AlexAndorra commented on 2020-05-03T14:30:46Z
----------------------------------------------------------------

This warning should be solved in the latest ArviZ
combined and compact are False by default IIRC

jonsedar commented on 2020-05-03T17:57:17Z
----------------------------------------------------------------

Cool, will make a mental note to update this once that's out :)

I kept the defaults in since I think it can be useful to see them

review-notebook-app · 2020-05-03T14:30:47Z

AlexAndorra commented on 2020-05-03T14:30:47Z
----------------------------------------------------------------

Very nice!

For future reference, you can also use az.plot_joint to get this type of plot "out of the box". This makes use of ArviZ InferenceData functionality, and avoids pm.trace_to_dataframe, which will be deprecated soon.

jonsedar commented on 2020-05-03T18:02:04Z
----------------------------------------------------------------

Good to know - I'll definitely look out for that in future then: I find joint dists are massively helpful

review-notebook-app · 2020-05-03T14:30:48Z

AlexAndorra commented on 2020-05-03T14:30:48Z
----------------------------------------------------------------

Same comments as other plot_trace

review-notebook-app · 2020-05-03T14:30:49Z

AlexAndorra commented on 2020-05-03T14:30:49Z
----------------------------------------------------------------

Same comments as for other joint plot

jonsedar commented on 2020-05-03T18:02:33Z
----------------------------------------------------------------

Gotcha :)

review-notebook-app · 2020-05-03T14:30:50Z

AlexAndorra commented on 2020-05-03T14:30:50Z
----------------------------------------------------------------

Here also you can use az.plot_joint. It would shorten your code. Something like that should work:

# Convert each trace to InferenceData once, then reuse it in every ArviZ function
idata_i = az.from_pymc3(trace_i)
idata_d = az.from_pymc3(trace_d)
axes = az.plot_joint(
    idata_i,
    var_names=["mud", "muc"],
    joint_kwargs={"alpha": 0.1, "color": "red"},
    marginal_kwargs={"color": "r"},
)
az.plot_joint(idata_d, var_names=["mud", "muc"], joint_kwargs={"alpha": 0.1}, ax=axes)
axes[0].set_xlim((0, 6))
axes[0].set_ylim((-3, 2))
axes[0].set_xlabel(r"$\mu_d$")
axes[0].set_ylabel(r"$\mu_c$");

jonsedar commented on 2020-05-03T18:11:11Z
----------------------------------------------------------------

Thanks, I've made a note to come back and update ArviZ usage throughout when available: https://github.com/jonsedar/pymc3_examples/issues/15

AlexAndorra · 2020-05-03T14:34:18Z

Thanks Jon, this looks really nice and is a clear improvement over the old NB! I posted the first part of my review of your NB above, and I'll review the second part later this afternoon 😉

jonsedar · 2020-05-03T14:52:44Z

Thanks Alex, I'll wait for you to complete the review before I make any changes that you've suggested.

Cheers,

AlexAndorra · 2020-05-03T15:14:17Z

Yeah that'd be best if you can. I'm almost finished.

review-notebook-app · 2020-05-03T15:25:11Z

AlexAndorra commented on 2020-05-03T15:25:11Z
----------------------------------------------------------------

Typo: "A Bernouilli distribution is used..."

jonsedar commented on 2020-05-03T18:12:09Z
----------------------------------------------------------------

good catch

review-notebook-app · 2020-05-03T15:25:12Z

AlexAndorra commented on 2020-05-03T15:25:12Z
----------------------------------------------------------------

Maybe you can quickly explain how you chose the testvals ?

jonsedar commented on 2020-05-03T18:35:14Z
----------------------------------------------------------------

Fair point, added a note:

testval for is_outlier initialised in order to create class asymmetry

The other testvals were set also in order to create asymmetry, but I've found they don't need it. Have reverted to mean values for b0, b1, y_est_out, sigma_y_out

jonsedar · 2020-05-03T19:02:38Z

righto - I've logged a note to update these at some point :) https://github.com/jonsedar/pymc3_examples/issues/15

jonsedar · 2020-05-03T19:05:27Z

Added to the list :) https://github.com/jonsedar/pymc3_examples/issues/15

jonsedar · 2020-05-03T19:06:32Z

ah sweet typos... good catch, thanks!

jonsedar · 2020-05-03T19:20:05Z

okie dokie, I've addressed all @AlexAndorra's points - great review thank you - and I think this is ready to go

One note: I did the olde:

❯ git fetch upstream
❯ git rebase upstream/master

prior to my latest commit in this PR, and I see quite a few files changed in the meantime; must be a weekend with lots of folks working :)

I don't see any clashes, but do let me know if you'd prefer a clean PR

Cheers, Jon

fonnesbeck · 2020-05-03T21:10:48Z

Hey, Jon. Looks great. Why is section 4.2.3 a markdown cell and not code?

AlexAndorra · 2020-05-03T21:11:15Z

Thanks for the updates Jon! I'll review that tomorrow.

jonsedar · 2020-05-03T21:13:24Z

Hey, Jon. Looks great. Why is section 4.2.3 a markdown cell and not code?

Quite right - I'm an idiot! Will update now

review-notebook-app · 2020-05-04T08:34:25Z

AlexAndorra commented on 2020-05-04T08:34:24Z
----------------------------------------------------------------

Missed that yesterday, but just for future reference, here I think you could do this in one line with az.plot_forest ;)

AlexAndorra

Thanks for the updates Jon, this all looks good now! Just one last thing (sorry, forgot to tell you about it yesterday): could you please Black-format the NB with black_nbconvert?
That's what we use on the resources repo to standardize NBs: pip install black_nbconvert, then black_nbconvert /path/to/a/notebook.ipynb.
After that I'll merge 🍾

A lot of files have indeed been picked up by rebasing your branch. I don't know how you guys usually handle this, but I think it's no problem, as there are no conflicts and I'll squash & merge anyway.

jonsedar · 2020-05-04T11:57:55Z

Hi Alex,

Good call re: https://github.com/dfm/black_nbconvert, that's a new one to me, and it's no surprise to see it's by the ever-helpful & prolific DanFM!

I've just run that reformatting and a re-run, all seems well, so will commit now

jonsedar · 2020-05-04T12:34:59Z

Interesting test failure, if this requires underlying fixes to pymc3 let me know and perhaps for simplicity I can just make a new clean PR

AlexAndorra

Thanks a lot Jon, this is all good now 👌
I'll merge as soon as tests pass -- this error is super weird BTW. It looks like pip couldn't install numpy. Do you have an idea how to fix that?

jonsedar · 2020-05-04T12:48:13Z

No idea - it could just have been a network hiccup at the time... For good measure I'll recommit and trigger again!

jonsedar · 2020-05-04T13:56:55Z

It was a ghost in the machine :D

AlexAndorra · 2020-05-04T13:58:03Z

Apparently 😅
Thanks again for this great update Jon!

jonsedar · 2020-05-04T14:00:12Z

Thanks for all your help and reviews! Will try to pick up some bugs next...

BTW is there a protocol for deleting this branch on origin?

AlexAndorra · 2020-05-04T14:37:18Z

Good question -- I'm not aware of any such protocol, but that'd be good practice

jonsedar added 2 commits May 2, 2020 17:10

updated the Hogg notebook

ae503b0

attempted to clarify the kwargs in sample() docstring describing how …

f8bde12

…to pass kwargs to the steppers I believe this fixes #3197 I also noted this need for more clarity in my updated notebook in this PR `pymc3/docs/source/notebooks/GLM-robust-with-outlier-detection.ipynb`

michaelosthege and others added 2 commits May 3, 2020 11:25

Remove deprecated stuff (#3906)

d59a6e8

* remove file which is not used * remove deprecated code * repair tests and notebooks that used deprecated API * mention #3906 Co-authored-by: Michael Osthege <zufallsprinzip@hotmail.de>

twiecki reviewed May 3, 2020

pymc3/sampling.py (outdated, resolved)

AlexAndorra self-assigned this May 3, 2020

minor formatting to notebook and rework of docstring for sample function

5185696

notebook: dropped all headings one level lower to comply with TOC logic, and very minor language edits; sampling.py: clarified language around single vs compound step

twiecki approved these changes May 3, 2020

AlexAndorra self-requested a review May 3, 2020 14:15

jonsedar added 5 commits May 3, 2020 15:12

updated the Hogg notebook

3b38f8d

attempted to clarify the kwargs in sample() docstring describing how …

f6b8441

…to pass kwargs to the steppers I believe this fixes #3197 I also noted this need for more clarity in my updated notebook in this PR `pymc3/docs/source/notebooks/GLM-robust-with-outlier-detection.ipynb`

minor formatting to notebook and rework of docstring for sample function

e58b4dd

notebook: dropped all headings one level lower to comply with TOC logic, and very minor language edits; sampling.py: clarified language around single vs compound step

Merge branch 'update-robust-glm-notebook' of https://github.com/jonse…

01bc0c1

…dar/pymc3 into update-robust-glm-notebook

updates following AlexAndorra review:

090b783

upgrade to arviz=0.7; set prior params to slightly simpler (more justifiable) values, and testvals to simpler defaults; explanatory clarifications; formatting, typos

removed the note re step_kwargs, since this PR updates the appropriat…

4e62601

…e docstring

a cell had become markdown, silly. reset it to code and rerun

c369a1b

AlexAndorra requested changes May 4, 2020

minor code reformatting via black_nbconvert, final check and re-run

a902644

AlexAndorra approved these changes May 4, 2020

rerun notebook purely as a lazy but safe way to trigger new CI

9411945

AlexAndorra merged commit 727b88a into pymc-devs:master May 4, 2020

AlexAndorra mentioned this pull request May 4, 2020

return_inferencedata option for pm.sample #3911

Merged

9 tasks

jonsedar deleted the update-robust-glm-notebook branch May 4, 2020 14:26

AlexAndorra mentioned this pull request May 6, 2020

ValueError: Unused step method arguments: {'target_accept'} #3914

Closed
