update data_container example to pymc 5.6 #559

Merged: 22 commits, Jul 18, 2023
2ca63ba
update data_container example to pymc 5.6
jessegrabowski Jul 11, 2023
9ba20e7
run pre-commit hooks
jessegrabowski Jul 11, 2023
4519864
rerun notebook
jessegrabowski Jul 11, 2023
08a46ff
add author information
jessegrabowski Jul 12, 2023
43ee41e
remove link to out-of-date example
jessegrabowski Jul 12, 2023
ba3d2c8
refactor baby weight example
jessegrabowski Jul 12, 2023
d8e5b0d
change title, fix typo
jessegrabowski Jul 13, 2023
3939e7e
add more detail on the ConstantData vs MutableData
jessegrabowski Jul 13, 2023
4370d39
fix typos
jessegrabowski Jul 13, 2023
708b6a3
address comments from review
jessegrabowski Jul 14, 2023
c8f29d0
remove reference to PyMC5
jessegrabowski Jul 14, 2023
070b730
add examples showing data on DAG, add more analysis/discussion to exa…
jessegrabowski Jul 14, 2023
5b9c347
reduce emphasis on observed keyword in introduction
jessegrabowski Jul 15, 2023
74fcfb1
more distinction between endogenous and exogenous (X and y) uses of M…
jessegrabowski Jul 15, 2023
1275eae
remove reference to
jessegrabowski Jul 15, 2023
aaa9bca
convert multiple model example to parameter recovery exercise
jessegrabowski Jul 15, 2023
bf069d8
remove filterwarnings
jessegrabowski Jul 15, 2023
3913f48
fix typos, add example showing exogenous data stored in idata, use RA…
jessegrabowski Jul 15, 2023
5bb72cc
fix broken references
jessegrabowski Jul 15, 2023
a050f09
fix broken references
jessegrabowski Jul 15, 2023
dfb7bf9
fix typo
jessegrabowski Jul 18, 2023
d24fb04
more typo fies
jessegrabowski Jul 18, 2023
4 changes: 2 additions & 2 deletions examples/howto/data_container.ipynb
@@ -64,7 +64,7 @@
"\n",
"After building the statistical model of your dreams, you're going to need to feed it some data. Data is typically introduced to a PyMC model in one of two ways. Some data is used as an exogenous input, called `X` in linear regression models, where `mu = X @ beta`. Other data are \"observed\" examples of the endogenous outputs of your model, called `y` in regression models, and is used as input to the likelihood function implied by your model. These data, either exogenous or endogenous, can be included in your model as wide variety of datatypes, including numpy `ndarrays`, pandas `Series` and `DataFrame`, and even pytensor `TensorVariables`. \n",
"\n",
-"Although you can pass these \"raw\" datatypes to your PyMC model, the best way to introduce data into your model is to use one of two {func}`pymc.data.Data` containers. These containers make it extremely easy to work with data in a PyMC model. They offer a range of benefits, including:\n",
+"Although you can pass these \"raw\" datatypes to your PyMC model, the best way to introduce data into your model is to use one of two {func}`pymc.Data` containers. These containers make it extremely easy to work with data in a PyMC model. They offer a range of benefits, including:\n",
"\n",
"1. Visualization of data as a component of your probabilistic graph\n",
"2. Access to labeled dimensions for readability and accessibility\n",
@@ -80,7 +80,7 @@
"source": [
"## Types of Data Containers\n",
"\n",
-" PyMC offers two data containers, depending on your needs: {func}`pymc.data.ConstantData` and {func}`pymc.data.MutableData`. Both will help you visualize how data fits into your model, store the data in an `InfereceData` for reproducibility, and give access to labeled dimenions. As the names suggest, however, only `MutableData` allows you to change your data. When `X` is `MutableData`, this enables out-of-sample inference tasks. When `y` is `MutableData`, it allows you to reuse the same model on multiple datasets to perform parameter recovery studies or sensitivity analysis. These abilities do, however, come with a small performance cost.\n",
+" PyMC offers two data containers, depending on your needs: {func}`pymc.ConstantData` and {func}`pymc.MutableData`. Both will help you visualize how data fits into your model, store the data in an `InfereceData` for reproducibility, and give access to labeled dimenions. As the names suggest, however, only `MutableData` allows you to change your data. When `X` is `MutableData`, this enables out-of-sample inference tasks. When `y` is `MutableData`, it allows you to reuse the same model on multiple datasets to perform parameter recovery studies or sensitivity analysis. These abilities do, however, come with a small performance cost.\n",
" \n",
" In past versions of PyMC, the only data container was `pm.Data`. This container is still available for backwards compatability, but the current best practice is to use either `pm.MutableData` or `pm.ConstantData`. "
]
4 changes: 2 additions & 2 deletions examples/howto/data_container.myst.md
@@ -47,7 +47,7 @@ az.style.use("arviz-darkgrid")

After building the statistical model of your dreams, you're going to need to feed it some data. Data is typically introduced to a PyMC model in one of two ways. Some data is used as an exogenous input, called `X` in linear regression models, where `mu = X @ beta`. Other data are "observed" examples of the endogenous outputs of your model, called `y` in regression models, and is used as input to the likelihood function implied by your model. These data, either exogenous or endogenous, can be included in your model as wide variety of datatypes, including numpy `ndarrays`, pandas `Series` and `DataFrame`, and even pytensor `TensorVariables`.

-Although you can pass these "raw" datatypes to your PyMC model, the best way to introduce data into your model is to use one of two {func}`pymc.data.Data` containers. These containers make it extremely easy to work with data in a PyMC model. They offer a range of benefits, including:
+Although you can pass these "raw" datatypes to your PyMC model, the best way to introduce data into your model is to use one of two {func}`pymc.Data` containers. These containers make it extremely easy to work with data in a PyMC model. They offer a range of benefits, including:

1. Visualization of data as a component of your probabilistic graph
2. Access to labeled dimensions for readability and accessibility
@@ -60,7 +60,7 @@ This notebook will illustrate each of these benefits in turn, and show you the b

## Types of Data Containers

-PyMC offers two data containers, depending on your needs: {func}`pymc.data.ConstantData` and {func}`pymc.data.MutableData`. Both will help you visualize how data fits into your model, store the data in an `InfereceData` for reproducibility, and give access to labeled dimenions. As the names suggest, however, only `MutableData` allows you to change your data. When `X` is `MutableData`, this enables out-of-sample inference tasks. When `y` is `MutableData`, it allows you to reuse the same model on multiple datasets to perform parameter recovery studies or sensitivity analysis. These abilities do, however, come with a small performance cost.
+PyMC offers two data containers, depending on your needs: {func}`pymc.ConstantData` and {func}`pymc.MutableData`. Both will help you visualize how data fits into your model, store the data in an `InfereceData` for reproducibility, and give access to labeled dimenions. As the names suggest, however, only `MutableData` allows you to change your data. When `X` is `MutableData`, this enables out-of-sample inference tasks. When `y` is `MutableData`, it allows you to reuse the same model on multiple datasets to perform parameter recovery studies or sensitivity analysis. These abilities do, however, come with a small performance cost.

In past versions of PyMC, the only data container was `pm.Data`. This container is still available for backwards compatability, but the current best practice is to use either `pm.MutableData` or `pm.ConstantData`.
