
Add a new intro tutorial #2991

Merged: 23 commits merged into dev from new-intro-tutorial on Dec 14, 2021
Conversation

@eb8680 (Member) commented Dec 12, 2021

open in nbviewer

Derived from Bayesian regression tutorials

Tasks:

  • Draft tutorial
  • Mention pyro.condition in sample section
  • Add better introduction and conclusion/next steps
  • Support CI smoke testing
  • Table of contents
  • Fix seaborn warnings
  • Add images separately if necessary
  • Fix model rendering
  • Check all links in notebook and add any missing ones e.g. to SVI tutorials
  • Update examples homepage sphinx
  • Deprecate existing intro tutorials
  • Update links to intro tutorial(s) in other documentation
  • Rename file to intro_long.ipynb
  • Fix Sphinx rendering of links around code markdown
  • Check and fix HTML rendering locally

@martinjankowiak (Collaborator)

cool, looks good! will take a closer look tomorrow, but two comments from a quick perusal:

  • would be nice to have a little more intro at the top. what is the purpose of the tutorial etc
  • given the length probably nice to add a table of contents linking to various sections

@martinjankowiak (Collaborator) commented Dec 13, 2021

more detailed comments/questions/etc:

  • is there any material from the previous intro tutorials worth keeping (for elsewhere)? importance sampling? geometric distribution? pyro.condition? something else? obviously no need to block this PR but something we should think about...
  • smoke_test = ('CI' in os.environ) etc. (see the sketch after this list)
  • typo 'observed variabels'
  • 'we can efficiently compute the pointwise log probability density' => add log to eqn
  • put in a plated graphic after you introduce "plate notation" in bold?
  • is it best to avoid lambdas in params in an intro tutorial? or at least explain why you're using them?
  • "would produce calibrated" well only if the model is well-specified...
  • some caveats may be more confusing than helpful for an intro tutorial? "Inference algorithms in Pyro change its behavior,"
  • what does this mean? "when any random variable is observed, the meaning of every other pyro.sample statement in a model changes"
  • "it stores its argument" -> "it stores init"
  • nit: "to perform them in parallel in a single" them = ?
  • "random scale parameter σy for the output" => "that controls the observation noise" or something?
  • you're using both \sigma and \sigma_y
  • the part around "A theoretically appealing choice..." flows a bit strangely. maybe proceed with "If we're going to somehow convert Bayesian inference into an optimization problem, we need an objective function or loss function." or some other glue. on the same note the first KL figure is just sort of plopped in there without discussion. tie into discussion? walk through figure?
  • discuss the rendered model for custom_guide i.e. independence of nodes
  • "The guide alone does not fully specify an inference algorithm, however. " connect back to graphic? something like "We've defined an initial distribution and a variational family of distributions but now we have to move the initial distribution towards the posterior by solving an optimization problem"
  • i would argue we should avoid too large learning rates (e.g. 0.03) in intros even if it's fine for a particular problem because it may suggest bad habits to users
  • interpret the locs and scales printout for the reader. loc is mean of normal etc
  • discuss use of plate for sampling from autoguide?
  • "this is done by first generating samples for the unobserved sites in " maybe avoid "sites" or explain what you mean by site? this appears to be your only usage
  • comment on log_gdp=None in svi_samples?
  • section heading: redoing => revisiting?
  • does the rendered model for the mvn require more explanation? i.e. discuss the auxiliary latent that's introduced by the autoguide?
  • "via the posterior_samples keyword argument instead of passing the guide as above" => "as in the previous example"?
  • close by pointing to a few "next step" tutorials?
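
For reference, the CI smoke-testing pattern mentioned in the second bullet above is roughly the following (a minimal sketch; the step counts are illustrative, not the notebook's actual values):

```python
import os

# Detect whether we are running under CI: if so, shrink the workload so the
# notebook acts as a quick smoke test rather than a full training run.
smoke_test = "CI" in os.environ
num_steps = 2 if smoke_test else 2000  # illustrative step counts
```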

@eb8680 (Member, Author) commented Dec 13, 2021

@martinjankowiak thanks, addressed most of your comments and added the rest to the PR todos.

@eb8680 (Member, Author) commented Dec 13, 2021

This is ready for more content review, the other tasks are mostly boilerplate.

@martinjankowiak (Collaborator)

looks great!!! a bit long, but such is the price of thoroughness...

  • what's the new file name going to be? intro?
  • remove pip install before merging?
  • nit: "Probabilistic models in Pyro are given" => "specified as", "encoded as", ...
  • nit: "strongly suggested" => suggests
  • "when any pyro.sample statement is observed, the meaning of every other pyro.sample statement in a model changes" i still think this is confusing. as i see it it's not so much that that every other statement in the model changes but rather that queries on the model (e.g. computing the posterior) change
  • nit: "that does not depend on q" -> q_phi
  • nit: "... return a dictionary of values for each pyro.sample site they contain...." maybe add "(which are themselves functions)" or the like (to make it clear you're talking about a vanilla python return and not some mysterious pyro internal thing; see the sketch after this list)
  • "...which repeats and vectorizes the sampling operations..." i guess the caveat as always is that that only works if the model is vectorized correctly. guess no need to mention here...
  • nit: add jax link
  • supernit: remove the dead cell at the end : )
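
On the bullet about guides returning a dictionary of values for each pyro.sample site, a minimal sketch of the pattern (a toy guide, not the tutorial's custom_guide):

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.distributions import constraints

def toy_guide(data):
    # Variational parameters registered in Pyro's parameter store.
    loc = pyro.param("loc", torch.tensor(0.0))
    scale = pyro.param("scale", torch.tensor(1.0), constraint=constraints.positive)
    # A guide is an ordinary Python function: returning the sampled values as a
    # plain dict keyed by sample-site name is a vanilla Python return, not some
    # mysterious Pyro internal.
    weight = pyro.sample("weight", dist.Normal(loc, scale))
    return {"weight": weight}
```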

@fritzo mentioned this pull request Dec 14, 2021
@eb8680 (Member, Author) commented Dec 14, 2021

> what's the new file name going to be? intro?

How about intro_long?

@eb8680 (Member, Author) commented Dec 14, 2021

Ok, I think I've finished everything, and this should be OK to merge pending final reviews. I'd prefer to merge an imperfect version and cut a release to update the website and docs, rather than iterate on this PR too much more, so that the old intro tutorials are moved out of sight of new users.

@eb8680 (Member, Author) commented Dec 14, 2021

"when any pyro.sample statement is observed, the meaning of every other pyro.sample statement in a model changes" i still think this is confusing. as i see it it's not so much that that every other statement in the model changes but rather that queries on the model (e.g. computing the posterior) change

This is the one thing that I don't have an obvious idea for improving, suggestions and/or direct edits are welcome.

:name: deprecated

intro_part_i
intro_part_ii
@eb8680 (Member, Author) commented on the diff:

I kept these and stuck them under a "Deprecated" header because there are links on the forum to the old intro tutorials that I don't want to break (yet).

@martinjankowiak (Collaborator)

@eb8680 looks great. i'd suggest this last change but we can merge as is if you prefer:

"the meaning of every other sample statement in a model changes following Bayes' rule" bayes rule is good. how about "...the cumulative effect of all the sample statements in a model changes following Bayes' rule"

@fritzo (Member) commented Dec 14, 2021

Looks great, this is a big improvement!

> A note on performance: this optimization process is surprisingly slow

Whoa, no need to recommend switching to JAX; we just need to tune the optimizer and use multiple particles. This is a tiny model, and inference should easily run in 1000 steps and under 30 seconds.

nits:

  • pyro.enable_validation(True) is no longer necessary; I recommend removing it

@fritzo (Member) commented Dec 14, 2021

Can I tune the optimizers and add %%time to the inference cells or something? I'd really like to give users a good first impression.

@eb8680 (Member, Author) commented Dec 14, 2021

@fritzo sure, although I think it's still going to be pretty slow. I had also turned down the learning rates at @martinjankowiak's suggestion but you could turn them up again.

@martinjankowiak (Collaborator)

the issue is that users will copy-paste code, change models, and then generate lots of forum questions wondering why their ELBOs don't converge with a giant LR. i'd recommend not going above lr = 0.005, to discourage high LRs as a default choice

@eb8680 (Member, Author) commented Dec 14, 2021

@martinjankowiak could you run make html on the tutorials locally and check the rendered HTML for glaring errors? I did this and I think I fixed everything, but a second pair of eyes couldn't hurt.

@fehiepsi (Member) left a comment

Looks great to me!! Some minor nits:

  • Add plt.show() at the end of cell 17 to remove the text <matplotlib.legend.Legend at 0x7f93ab767090>
  • The first formula in section "Example: revisiting Bayesian regression with a full-rank guide" is not split into two lines
  • Probably add links to docs for each referenced class (rather than only the first one that appears), just in case some users jump to a random section and can still click through to the docs.

@fritzo (Member) commented Dec 14, 2021

It's really important to give users a good first impression. Inference easily converges with a fast learning rate: I see lower loss using Adam({"lr": 0.02}) and 1000 learning steps, in under 5 seconds.
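
A sketch of the setup being described, using a stand-in model in place of the tutorial's regression example (the model, guide, and data here are illustrative assumptions):

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoNormal
from pyro.optim import Adam

# Stand-in model and data; the tutorial's regression model plays this role.
def model(data):
    loc = pyro.sample("loc", dist.Normal(0.0, 1.0))
    with pyro.plate("data", len(data)):
        pyro.sample("obs", dist.Normal(loc, 1.0), obs=data)

guide = AutoNormal(model)
data = torch.randn(100) + 3.0

# A faster learning rate and fewer steps, per the discussion above; multiple
# ELBO particles reduce gradient variance at a modest per-step cost.
svi = SVI(model, guide, Adam({"lr": 0.02}), loss=Trace_ELBO(num_particles=4))
losses = [svi.step(data) for _ in range(1000)]  # svi.step returns an ELBO loss estimate
```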

@eb8680 (Member, Author) commented Dec 14, 2021

@fritzo I am inclined to agree, given how slow 20000 steps is. How about adding @martinjankowiak's caveat about lowering learning rates in other models, or a link to SVI part 4? We can always change it to a safer value later if it becomes an issue.

@fritzo (Member) commented Dec 14, 2021

If you're worried about too-high learning rates, then the right solution is to add an example with a too-high learning rate and say "oops, the learning rate was too high and the loss didn't decrease, see this loss plot. let's try decreasing the learning rate".
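
That suggestion might look something like the following in the notebook (a sketch; the model, data, and exact learning rates are illustrative assumptions):

```python
import torch
import matplotlib.pyplot as plt
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoNormal
from pyro.optim import Adam

# Stand-in model and data; the tutorial's regression model plays this role.
def model(data):
    loc = pyro.sample("loc", dist.Normal(0.0, 1.0))
    with pyro.plate("data", len(data)):
        pyro.sample("obs", dist.Normal(loc, 1.0), obs=data)

data = torch.randn(100) + 3.0

def run_svi(lr, num_steps=1000):
    pyro.clear_param_store()  # start each run from scratch
    svi = SVI(model, AutoNormal(model), Adam({"lr": lr}), loss=Trace_ELBO())
    return [svi.step(data) for _ in range(num_steps)]

# Oops: with a too-high learning rate the loss fails to decrease...
plt.plot(run_svi(10.0), label="lr=10.0 (too high)")
# ...so let's try decreasing the learning rate.
plt.plot(run_svi(0.02), label="lr=0.02")
plt.xlabel("SVI step")
plt.ylabel("ELBO loss")
plt.legend()
plt.show()
```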

@eb8680 (Member, Author) commented Dec 14, 2021

> Probably add links to docs for each referenced class (rather than only the first one that appears), just in case some users jump to a random section and can still click through to the docs.

Agreed, this would be an improvement. I'll do this in a followup PR that doesn't block the release, because there are quite a lot of references and I need to figure out how to get Sphinx to render links whose names are formatted as code.

@fritzo (Member) commented Dec 14, 2021

Can we please remove the "Use JAX instead" statement before merging though? This is a silly thing to say in a first tutorial with mis-used optimization.

@eb8680 (Member, Author) commented Dec 14, 2021

> Can we please remove the "Use JAX instead" statement before merging though?

Sure, do you want to do this along with your fixes or should I?

@fritzo (Member) commented Dec 14, 2021

Sure, I can make some minor edits and push.

@fritzo (Member) commented Dec 14, 2021

@eb8680 I added a comment about convergence, tuned the optimizers, and encouraged users to look at the loss curve over time. However, I haven't been able to regenerate plots due to a missing seaborn.histplot, even after pip install -U seaborn; not sure what's wrong. Could you regenerate the plots?

@eb8680 (Member, Author) commented Dec 14, 2021

> However, I haven't been able to regenerate plots due to a missing seaborn.histplot, even after pip install -U seaborn; not sure what's wrong.

Hmm, this is concerning. FYI I have seaborn==0.11.2 and matplotlib==3.5.0 locally.

> Could you regenerate the plots?

Will do

@martinjankowiak (Collaborator)

can address later, but typo: 'espcially'

@eb8680 (Member, Author) commented Dec 14, 2021

I'll address the seaborn compatibility issues, documentation linking, and other typos in non-blocking followup PRs. Should be good to merge unless there are any obvious rendering issues.

@martinjankowiak merged commit 5a1d74e into dev on Dec 14, 2021.
@eb8680 deleted the new-intro-tutorial branch on December 14, 2021 at 17:15.