Skip to content

Conditional autoregressive priors #547

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

daniel-saunders-phil
Copy link
Contributor

@daniel-saunders-phil daniel-saunders-phil commented May 26, 2023

#417 is got stuck and the original author isn't pursing the PR anymore. However, it would be nice to get this notebook finished. I just don't have the permissions to change #417 directly. So I've made a new PR to, at the very least, test out the preview of read the docs and see if it passes continuous integration checks on github.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@daniel-saunders-phil
Copy link
Contributor Author

What's the right technique to generate and then fix these merge conflicts locally?

@daniel-saunders-phil
Copy link
Contributor Author

daniel-saunders-phil commented May 26, 2023

I've managed to pass all the pre-commit checks locally, so that's progress for this notebook. I ended up finding a half dozen other typos that were causing pre-commit to fail. However, the preview file is rendered terribly plus the branch conflicts. Not sure how to fix those yet.

$ git commit -m "changed aesara.tensor to pytensor.tensor"
jupytext......................................................................................Passed
black-jupyter.................................................................................Passed
nbqa-isort....................................................................................Passed
nbqa-pyupgrade................................................................................Passed
Check cells were executed sequentially........................................................Passed
bibtex-tidy...............................................................(no files to check)Skipped
Check notebooks have watermark (see Jupyter style guide from PyMC docs).......................Passed
Check no internal links are in the docs.......................................................Passed
jupytext......................................................................................Passed
codespell.....................................................................................Passed
[conditional_autoregressive_priors 4d2741d] changed aesara.tensor to pytensor.tensor
 2 files changed, 346 insertions(+), 17 deletions(-)

@daniel-saunders-phil daniel-saunders-phil marked this pull request as ready for review May 31, 2023 20:35
@daniel-saunders-phil daniel-saunders-phil changed the title DRAFT: Conditional autoregressive priors Conditional autoregressive priors May 31, 2023
@daniel-saunders-phil
Copy link
Contributor Author

I think it might be done! @bwengals and I fought through a lot of pre-commit to get here 🫠

Copy link
Member

@OriolAbril OriolAbril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments after going over the notebook. I haven't gone over the contents in depth yet though. Before going over the comments and further review, first note:

  1. I have commented everything on the myst file. Don't edit the myst file directly, apply the changes to the notebook and then run pre-commit.
  2. I am completely lost on what is the relation between this notebook and the existing one. Are they complementary, should this replace the other one, is the old one still relevant and salvageable (even if this one is somewhat complementary)?

# Conditional Autoregressive (CAR) Models for Spatial Data

:::{post} Jul 29, 2022
:tags: spatial
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also an autoregressive tag https://www.pymc.io/projects/examples/en/latest/blog/tag/autoregressive.html which seems relevant for the notebook based on the title. I'd also recommend going over the list of tags (left sidebar of https://www.pymc.io/projects/examples/en/latest/gallery.html) and add other that might be relevant too.

:::{post} Jul 29, 2022
:tags: spatial
:category: beginner, tutorial
:author: Conor Hassan
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also add yourself here

Comment on lines 11 to 12
substitutions:
extra_dependencies: bambi seaborn
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
substitutions:
extra_dependencies: bambi seaborn
myst:
substitutions:
extra_dependencies: bambi seaborn

I need to update the docs, they recenly updated the metadata keys for this. Ref: https://myst-parser.readthedocs.io/en/latest/syntax/optional.html#substitutions-with-jinja2. It also looks like geopandas and pylibsal and the like should also be added here.

Copy link
Contributor Author

@daniel-saunders-phil daniel-saunders-phil Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anything I should do on my end? The ipynb file just has:

:::{include} ../extra_installs.md
:::

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, only the metadata and these two lines are enough. You can see in the preview how this two lines add into the rendered notebook a predefined template with install instructions with pip, conda and from within jupyter: https://pymcio--547.org.readthedocs.build/projects/examples/en/547/case_studies/conditional_autoregressive_priors.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not clear on where I can add the metadata. The substitutions lines only appears in the myst file (as far as I can tell) and I thought that's entirely generated from the .ipynb.

Also, if it matters we don't need seaborn or bambi for this one. Just geopandas and pylibsal. Both should be installed with conda-forge.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The section I linked of the style guide covers how to edit the metadata from jupyter. This (together with cell level metadata which we also use) is one of the main reasons we use myst files in addition to ipynb. The ipynb is needed to store outputs and render the website properly, myst ones are needed for proper reviews to be possible (this might change with the improvements in jupyer diffs github is doing but for now I think they are still needed)

Copy link
Contributor Author

@daniel-saunders-phil daniel-saunders-phil Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still pretty unsure - the .ipynb metadata now reads:

  "substitutions": {
    "extra_dependencies": "geopandas libpysal"
  },

but the document preview is unchanged. What might I be missing? Is there an example of a notebook in the pymc library that has used this myst substitutions trick successfully I can model?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has been updated, you might have been a bit too eager and not waited for the rtd check (has status mark below) finished. I do see the new libraries now.

This behaviour is the one that appears in the style guide (and still works) but was recently deprecated. It should be:

  "myst": {
    "substitutions": {
      "extra_dependencies": "geopandas libpysal"
    }
  },

We need to load in the dataset to access the variables $\{y_i, x_i, E_i\}_{i=1}^N$. But more unique to the use of CAR models, is the creation of the necessary spatial adjacency matrix. For the models that we fit, all neighbours are weighted as $1$, circumventing the need for a weight matrix. The dataset can be accessed via the `pm.get_data` function.

```{code-cell} ipython3
df_scot_cancer = pd.read_csv(pm.get_data("scotland_lips_cancer.csv"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


```{code-cell} ipython3
independent_stacked = az.extract(independent_idata)
spat_df["INDEPENDENT_RES"] = independent_stacked.res.mean(axis=1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
spat_df["INDEPENDENT_RES"] = independent_stacked.res.mean(axis=1)
spat_df["INDEPENDENT_RES"] = independent_stacked.res.mean(dim="sample")

or, given I don't see independent_stacked variable being used later, it could even be:

spat_df["INDEPENDENT_RES"] = independent_data["posterior"]["res"].mean(dim=["chain", "draw"])

Comment on lines 252 to 253
fixed_spatial_stacked = az.extract(fixed_spatial_idata)
spat_df["SPATIAL_RES"] = fixed_spatial_stacked.res.mean(axis=1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar comment as above, axis=... should not be used with xarray objects

Comment on lines 302 to 304
```{code-cell} ipython3
car_stacked = az.extract(car_idata)
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this cell is not used anywhere, should be removed

Comment on lines 329 to 330
* Adapted from a previous PyMC example notebook, authored by Junpeng Lao {ref}`conditional_autoregressive_model` by Conor Hassan on July, 2022.
* Re-executed by Daniel Saunders in May, 2023
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably say something like:

* Adapted from {ref}`another PyMC example notebook <conditional_autoregressive_model>` by Conor Hassan and Daniel Saunders ([pymc-examples#417](https://github.com/pymc-devs/pymc-examples/pull/417) and [pymc-examples#547](https://github.com/pymc-devs/pymc-examples/pull/547/)).

Authorship of the other notebook should be on the other notebook I think.


```{code-cell} ipython3
%load_ext watermark
%watermark -n -u -v -iv -w -p pytensor,xarray
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
%watermark -n -u -v -iv -w -p pytensor,xarray
%watermark -n -u -v -iv -w -p xarray

pytensor is imported, so adding it here duplicates it on the watermark

@daniel-saunders-phil
Copy link
Contributor Author

  1. I am completely lost on what is the relation between this notebook and the existing one. Are they complementary, should this replace the other one, is the old one still relevant and salvageable (even if this one is somewhat complementary)?

I think this notebook replaces the other one for most users. The old CAR notebook is great for explaining why PyMC's car prior is implemented the way it is. So users who are interested in understanding how to write and modify their own algorithms for similar models autoregressive models would benefit from reading the old notebook (albeit, with up dates to pymc 5). For everyone else, the new notebook is more useful - it explain a bit about why we should care about spatial autocorrelation and is far more concise.

@OriolAbril
Copy link
Member

I think this notebook replaces the other one for most users. The old CAR notebook is great for explaining why PyMC's car prior is implemented the way it is. So users who are interested in understanding how to write and modify their own algorithms for similar models autoregressive models would benefit from reading the old notebook (albeit, with up dates to pymc 5). For everyone else, the new notebook is more useful - it explain a bit about why we should care about spatial autocorrelation and is far more concise.

Thanks, this is great!

Would you be interested in updating the other one too? If so we can continue the discussion once the time comes, otherwise (or in the meantime if it is expected to be a longish wait), what do you think about updating the title of the other notebook to be "About CAR models in PyMC"? I think it is more fitting given your description

@daniel-saunders-phil
Copy link
Contributor Author

daniel-saunders-phil commented Jun 6, 2023

Yeah I'd like to revise the other one at some point. I think I'll get to it after my next task (trying to develop the ICAR) so an interim title change sounds good. Would I update the (conditional_autoregressive_model)= at the top of the file plus the references to the title in my current notebook?

A quick explanation of the other changes to the file: previously the notebook suggested model 3 was unidentifiable and trying to sample from it would typically result in chains with loads of divergences. I wasn't able to reliably reproduce these divergences and can't find evidence of degenerate geometries in the posterior distribution. I think the divergences discussed in the earlier drafts might be due to bad luck. To avoid discussion of divergences, I rewrote the bottom few cells to provide a different motivation for the ICAR.

@OriolAbril
Copy link
Member

Would I update the (conditional_autoregressive_model)= at the top of the file plus the references to the title in my current notebook?

The main thing that needs to be updated is the title, the # Conditional Autoregressive (CAR) model. The ()= part is a target/anchor. It is something to be used for cross-referencing, and doesn't appear anywhere on the rendered website. As it is brand new and we can be sure no notebooks are using it, it can be updated too. Otherwise, it'd be best to add two targets, so we don't need to search for references to that in all notebooks and modify them (with potential git conflicts...). We already do that in https://github.com/pymc-devs/pymc-examples/blob/main/examples/case_studies/multilevel_modeling.ipynb for example

@daniel-saunders-phil
Copy link
Contributor Author

Hi @OriolAbril I think it might be ready to go again - I adjusted the metadata and the title of the old CAR notebook as you suggested. My read the docs preview doesn't seem to be up to date but the committed files are all where I think they should be.

Copy link
Member

@OriolAbril OriolAbril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a comment about the categories for the "about" notebook, other than that I think it is good to go, thanks!


:::{post} Aug 14, 2020
:tags: spatial, autoregressive, count data
:category: advanced, reference
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the notebook is more explanation type than reference.

@OriolAbril OriolAbril merged commit 35cd42f into pymc-devs:main Jun 15, 2023
@OriolAbril
Copy link
Member

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants