Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update robust glm notebook #3908

Merged
merged 14 commits into from May 4, 2020
Merged

Update robust glm notebook #3908

merged 14 commits into from May 4, 2020

Conversation

jonsedar
Copy link
Contributor

@jonsedar jonsedar commented May 2, 2020

The first purpose of this PR is to update the GLM robust regression notebook already in the examples here: https://docs.pymc.io/notebooks/GLM-robust-with-outlier-detection.html

Those updates are everything from v2.1 onwards:

Version history:

version date author changes
1.0 2015-12-21 jonsedar Create and publish
2.0 2018-07-24 twiecki Restate outlier model using pm.Normal.dist().logp() and pm.Potential()
2.1 2019-11-16 jonsedar Restate nu in StudentT model to be more efficient, drop explicit use of theano shared vars, generally improve plotting / explanations / layout
2.2 2020-05-21 jonsedar Minor tidyup for plots and warnings and rerun with pymc3.8

The second purpose of this PR is to clarify the docstring within sampling.py, specifically the kwargs for step_kwargs when you have multiple steppers. I found the need for better clarity in the docstring during my rework of the notebook, so I hope it's valid to include a change in this single PR. I think this is also a fix for #3197

…to pass kwargs to the steppers

I believe this fixes #3197

I also noted this need for more clarity in my updated notebook in this PR

`pymc3/docs/source/notebooks/GLM-robust-with-outlier-detection.ipynb`
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

You'll be able to see Jupyter notebook diff and discuss changes. Powered by ReviewNB.

@codecov
Copy link

codecov bot commented May 3, 2020

Codecov Report

Merging #3908 into master will increase coverage by 0.04%.
The diff coverage is 93.75%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3908      +/-   ##
==========================================
+ Coverage   83.41%   83.45%   +0.04%     
==========================================
  Files         103      103              
  Lines       14190    14178      -12     
==========================================
- Hits        11836    11832       -4     
+ Misses       2354     2346       -8     
Impacted Files Coverage Δ
pymc3/examples/samplers_mvnormal.py 0.00% <0.00%> (ø)
pymc3/sampling.py 86.19% <ø> (+0.71%) ⬆️
pymc3/stats/__init__.py 92.00% <ø> (-1.94%) ⬇️
pymc3/backends/ndarray.py 92.63% <100.00%> (+0.15%) ⬆️
pymc3/backends/sqlite.py 93.51% <100.00%> (+0.10%) ⬆️
pymc3/backends/text.py 97.16% <100.00%> (+0.11%) ⬆️
pymc3/backends/tracetab.py 100.00% <100.00%> (ø)
pymc3/tests/sampler_fixtures.py 96.74% <100.00%> (ø)

michaelosthege and others added 2 commits May 3, 2020 11:25
* remove file which is not used

* remove deprecated code

* repair tests and notebooks that used deprecated API

* mention #3906

Co-authored-by: Michael Osthege <zufallsprinzip@hotmail.de>
* add deprecation warnings for old backends

* mention backend deprecation #3902

* fix typo

Co-authored-by: Colin <ColCarroll@users.noreply.github.com>

Co-authored-by: Michael Osthege <zufallsprinzip@hotmail.de>
Co-authored-by: Colin <ColCarroll@users.noreply.github.com>
@twiecki
Copy link
Member

twiecki commented May 3, 2020

Wow, this is an amazing improvement on an already amazing post.

The one thing I want to make sure is that the headings are correct -- are section headings starting with ## rather than # which is reserved only for the title? Otherwise this creates problems with TOCs.

pymc3/sampling.py Outdated Show resolved Hide resolved
@AlexAndorra AlexAndorra self-assigned this May 3, 2020
notebook: dropped all headings one level lower to comply with TOC logic, and very minor language edits

sampling.py: clarifired language around single vs compoundstep
@jonsedar
Copy link
Contributor Author

jonsedar commented May 3, 2020

Thanks for the feedback guys - I've tidied the notebook and clarified the docstring language in my latest commit

@AlexAndorra
Copy link
Contributor

Thanks Jon! I'm reviewing the NB right now -- will let you know when I'm done

@AlexAndorra AlexAndorra self-requested a review May 3, 2020 14:15
@review-notebook-app
Copy link

review-notebook-app bot commented May 3, 2020

View / edit / reply to this conversation on ReviewNB

AlexAndorra commented on 2020-05-03T14:30:41Z
----------------------------------------------------------------

Can we skip displaying the yml file and substitute the watermark cell below?


jonsedar commented on 2020-05-03T17:47:09Z
----------------------------------------------------------------

Yep - removed. The important bits (python 3.6, pymc3 3.8) are already noted

@review-notebook-app
Copy link

review-notebook-app bot commented May 3, 2020

View / edit / reply to this conversation on ReviewNB

AlexAndorra commented on 2020-05-03T14:30:42Z
----------------------------------------------------------------

You don't need %matplotlib inline in latest jupyter version -- MPL plots will display inline by default


jonsedar commented on 2020-05-03T17:48:32Z
----------------------------------------------------------------

probably worth keeping incase people run this in other IDEs (VSCode, PyCharm, Syder etc) I'm not sure how those work

@review-notebook-app
Copy link

review-notebook-app bot commented May 3, 2020

View / edit / reply to this conversation on ReviewNB

AlexAndorra commented on 2020-05-03T14:30:43Z
----------------------------------------------------------------

Can you tidy up the imports as follows:

  • absolute imports of internal python packages
  • absolute imports of outside libraries
  • relative imports

In each layer, sort alphabetically.

Here that would be:

import warnings

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc3 as pm
import seaborn as sns

from matplotlib.lines import Line2D
from scipy import stats

Also, you can use az.style.use('arviz-darkgrid') instead of sns.set(style='darkgrid', palette='muted', context='notebook') , and preferably set those plotting defaults in another cell below -- this is because of how MPL sets the defaults.



jonsedar commented on 2020-05-03T17:51:06Z
----------------------------------------------------------------

Sure, done.

@review-notebook-app
Copy link

review-notebook-app bot commented May 3, 2020

View / edit / reply to this conversation on ReviewNB

AlexAndorra commented on 2020-05-03T14:30:44Z
----------------------------------------------------------------

This looks like a useless cell


jonsedar commented on 2020-05-03T17:52:46Z
----------------------------------------------------------------

Yeah, that can go - I tend to keep this around in my WIP templates as a location to abstract away long functions before they go to a source file

@review-notebook-app
Copy link

review-notebook-app bot commented May 3, 2020

View / edit / reply to this conversation on ReviewNB

AlexAndorra commented on 2020-05-03T14:30:45Z
----------------------------------------------------------------

Nice plot!


jonsedar commented on 2020-05-03T17:54:36Z
----------------------------------------------------------------

ta :) though sometimes I really miss ggplot text_geom

@review-notebook-app
Copy link

review-notebook-app bot commented May 3, 2020

View / edit / reply to this conversation on ReviewNB

AlexAndorra commented on 2020-05-03T14:30:46Z
----------------------------------------------------------------

Typo: "... the same range and being more directly comparable..."


jonsedar commented on 2020-05-03T17:56:03Z
----------------------------------------------------------------

good catch

@review-notebook-app
Copy link

review-notebook-app bot commented May 3, 2020

View / edit / reply to this conversation on ReviewNB

AlexAndorra commented on 2020-05-03T14:30:46Z
----------------------------------------------------------------

  • This warning should be solved in the latest ArviZ
  • combined and compact are False by default IIRC

jonsedar commented on 2020-05-03T17:57:17Z
----------------------------------------------------------------

Cool, will make a mental note to update this once that's out :)

I kept the defaults in since I think it can be useful to see them

@review-notebook-app
Copy link

review-notebook-app bot commented May 3, 2020

View / edit / reply to this conversation on ReviewNB

AlexAndorra commented on 2020-05-03T14:30:47Z
----------------------------------------------------------------

Very nice!

For future reference, you can also use az.plot_joint to get this type of plots "out of the box". This makes use of ArviZ InferenceData functionalities, and avoids pm.trace_to_dataframe , which will be deprecated soon.


jonsedar commented on 2020-05-03T18:02:04Z
----------------------------------------------------------------

Good to know - I'll definitely look out for that in future then: I find joint dists are massively helpful

@review-notebook-app
Copy link

review-notebook-app bot commented May 3, 2020

View / edit / reply to this conversation on ReviewNB

AlexAndorra commented on 2020-05-03T14:30:48Z
----------------------------------------------------------------

Same comments as other plot_trace


@review-notebook-app
Copy link

review-notebook-app bot commented May 3, 2020

View / edit / reply to this conversation on ReviewNB

AlexAndorra commented on 2020-05-03T14:30:49Z
----------------------------------------------------------------

Same comments as for other joint plot


jonsedar commented on 2020-05-03T18:02:33Z
----------------------------------------------------------------

Gotcha :)

@review-notebook-app
Copy link

review-notebook-app bot commented May 3, 2020

View / edit / reply to this conversation on ReviewNB

AlexAndorra commented on 2020-05-03T14:30:50Z
----------------------------------------------------------------

Here also you can use az.plot_joint. It would shorten your code. Something like that should work:

idata_i = az.from_pymc3(trace_i) # do that once, so that conversion to InferenceData happens only once; then use in every ArviZ function
idata_d = az.from_pymc3(trace_d)

axes = az.plot_joint(idata_i, var_names=["mud", "muc"], joint_kwargs={"alpha": 0.1, "color": "red"}, marginal_kwargs={"color": "r"})
az.plot_joint(idata_d, var_names=["mud", "muc"], joint_kwargs={"alpha": 0.1}, ax=axes)
axes[0].set_xlim((0, 6))
axes[0].set_ylim((-3, 2))
axes[0].set_xlabel(r"$\mu_d$")
axes[0].set_ylabel(r"$\mu_c$");



jonsedar commented on 2020-05-03T18:11:11Z
----------------------------------------------------------------

Thanks, I've made a note to come back update arvin throughout when available https://github.com/jonsedar/pymc3_examples/issues/15

@AlexAndorra
Copy link
Contributor

Thanks Jon, this looks really nice and is a clear improvement over the old NB! I posted the first part of my review of your NB above, and I'll review the second part later this afternoon 😉

@jonsedar
Copy link
Contributor Author

jonsedar commented May 3, 2020

Thanks Alex, I'll wait for you to complete the review before I make any changes that you've suggested.

Cheers,

@AlexAndorra
Copy link
Contributor

Yeah that'd be best if you can. I'm almost finished.

@review-notebook-app
Copy link

review-notebook-app bot commented May 3, 2020

View / edit / reply to this conversation on ReviewNB

AlexAndorra commented on 2020-05-03T15:25:11Z
----------------------------------------------------------------

Typo: "A Bernouilli distribution is used..."


jonsedar commented on 2020-05-03T18:12:09Z
----------------------------------------------------------------

good catch

@review-notebook-app
Copy link

review-notebook-app bot commented May 3, 2020

View / edit / reply to this conversation on ReviewNB

AlexAndorra commented on 2020-05-03T15:25:12Z
----------------------------------------------------------------

Maybe you can quickly explain how you chose the testvals ?


jonsedar commented on 2020-05-03T18:35:14Z
----------------------------------------------------------------

Fair point, added a note:

testval for is_outlier initialised in order to create class asymmetry

The other testvals were set also in order to create asymmetry, but I've found they don't need it. Have reverted to mean values for b0, b1, y_est_out, sigma_y_out

Copy link
Contributor Author

jonsedar commented May 3, 2020

righto - I've logged a note to update these at some point :) https://github.com/jonsedar/pymc3_examples/issues/15


View entire conversation on ReviewNB

Copy link
Contributor Author

jonsedar commented May 3, 2020

Added to the list :) https://github.com/jonsedar/pymc3_examples/issues/15


View entire conversation on ReviewNB

Copy link
Contributor Author

jonsedar commented May 3, 2020

ah sweet typos... good catch, thanks!


View entire conversation on ReviewNB

…to pass kwargs to the steppers

I believe this fixes #3197

I also noted this need for more clarity in my updated notebook in this PR

`pymc3/docs/source/notebooks/GLM-robust-with-outlier-detection.ipynb`
notebook: dropped all headings one level lower to comply with TOC logic, and very minor language edits

sampling.py: clarifired language around single vs compoundstep
upgrade to arviz=0.7
set prior params to slightly simpler (more justifiable) values, and testvals to simplier defaults
explanatory clarifications
formatting, typos,
@jonsedar
Copy link
Contributor Author

jonsedar commented May 3, 2020

okie dokie, I've addressed all @AlexAndorra's points - great review thank you - and I think this is ready to go

One note: I did the olde:

❯ git fetch upstream
❯ git rebase upstream/master

prior to my latest commit in this PR and see quite a few files changed in the meantime , must be a weekend with lots of folks working :)

I don't see any clashes, but do let me know if you'd prefer a clean PR

Cheers, Jon

@fonnesbeck
Copy link
Member

Hey, Jon. Looks great. Why is section 4.2.3 a markdown cell and not code?

@AlexAndorra
Copy link
Contributor

AlexAndorra commented May 3, 2020 via email

@jonsedar
Copy link
Contributor Author

jonsedar commented May 3, 2020

Hey, Jon. Looks great. Why is section 4.2.3 a markdown cell and not code?

Quite right - I'm an idiot! Will update now

@review-notebook-app
Copy link

View / edit / reply to this conversation on ReviewNB

AlexAndorra commented on 2020-05-04T08:34:24Z
----------------------------------------------------------------

Missed that yesterday, but just for future reference, here I think you could do this in one line with az.plot_forest ;)


Copy link
Contributor

@AlexAndorra AlexAndorra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates Jon, this all looks good now! Just one last thing (sorry, forgot to tell you about it yesterday): could you please Black-format the NB with black_nbconvert?
That's what we use on the resources repo to standardize NBs: pip install black_nbconvert, then black_nbconvert /path/to/a/notebook.ipynb.
After that I'll merge 🍾

A lot of files have indeed been picked up by rebasing your branch. I don't know how you guys usually do, but I think it's no prb, as there is no conflicts and I'll squash & merge anyway.

@jonsedar
Copy link
Contributor Author

jonsedar commented May 4, 2020

Hi Alex,

Good call re: https://github.com/dfm/black_nbconvert, that's a new one to me, ands no surprise to see it's by the ever-helpful & prolific DanFM!

I've just run that reformatting and a re-run, all seems well, so will commit now

@jonsedar
Copy link
Contributor Author

jonsedar commented May 4, 2020

Interesting test failure, if this requires underlying fixes to pymc3 let me know and perhaps for simplicity I can just make a new clean PR

Copy link
Contributor

@AlexAndorra AlexAndorra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot Jon, this is all good now 👌
I'll merge as soon as tests pass -- this error is super weird BTW. It looks like pip couldn't install numpy. Do you have an idea how to fix that?

@jonsedar
Copy link
Contributor Author

jonsedar commented May 4, 2020

No idea - it could just have been a network hiccup at the time... For good measure I'll recommit and trigger again!

@jonsedar
Copy link
Contributor Author

jonsedar commented May 4, 2020

It was a ghost in the machine :D

@AlexAndorra AlexAndorra merged commit 727b88a into pymc-devs:master May 4, 2020
@AlexAndorra
Copy link
Contributor

Apparently 😅
Thanks again for this great update Jon!

@jonsedar
Copy link
Contributor Author

jonsedar commented May 4, 2020

Thanks for all your help and reviews! Will try to pick up some bugs next...

BTW is there a protocol for deleting this branch on origin?

@jonsedar jonsedar deleted the update-robust-glm-notebook branch May 4, 2020 14:26
@AlexAndorra
Copy link
Contributor

Good question -- I'm not aware of any such protocol, but that'd be good practice

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants