Add minimal pymc example #7281
Closed
5 commits
@@ -33,6 +33,140 @@ Features
* Simple extensibility
- Transparent support for missing value imputation


Linear Regression Example
==========================


Plant growth can be influenced by multiple factors, and understanding these relationships is crucial for optimizing agricultural practices.

Imagine we conduct an experiment to predict the growth of a plant based on different environmental variables.

.. code-block:: python

    import pymc as pm

    # Taking draws from a normal distribution
    seed = 42
    x_dist = pm.Normal.dist(shape=(100, 3))
    x_data = pm.draw(x_dist, random_seed=seed)

    # Independent Variables:
    # Sunlight Hours: Number of hours the plant is exposed to sunlight daily.
    # Water Amount: Daily water amount given to the plant (in milliliters).
    # Soil Nitrogen Content: Percentage of nitrogen content in the soil.


    # Dependent Variable:
    # Plant Growth (y): Measured as the increase in plant height (in centimeters) over a certain period.


    # Define coordinate values for all dimensions of the data
    coords = {
        "trial": range(100),
        "features": ["sunlight hours", "water amount", "soil nitrogen"],
    }

    # Define generative model
    with pm.Model(coords=coords) as generative_model:
        x = pm.Data("x", x_data, dims=["trial", "features"])

        # Model parameters
        betas = pm.Normal("betas", dims="features")
        sigma = pm.HalfNormal("sigma")

        # Linear model
        mu = x @ betas

        # Likelihood
        # Assuming we measure deviation of each plant from baseline
        plant_growth = pm.Normal("plant growth", mu, sigma, dims="trial")


    # Generating data from model by fixing parameters
    fixed_parameters = {
        "betas": [5, 20, 2],
        "sigma": 0.5,
    }
    with pm.do(generative_model, fixed_parameters) as synthetic_model:
        idata = pm.sample_prior_predictive(random_seed=seed)  # Sample from prior predictive distribution.
        # The key must match the variable name defined in the model ("plant growth")
        synthetic_y = idata.prior["plant growth"].sel(draw=0, chain=0)


    # Infer parameters conditioned on observed data
    with pm.observe(generative_model, {"plant growth": synthetic_y}) as inference_model:
        idata = pm.sample(random_seed=seed)

        summary = pm.stats.summary(idata, var_names=["betas", "sigma"])
        print(summary)

From the summary, we can see that the means of the inferred parameters are very close to the fixed parameters:

===================== ====== ===== ======== ========= =========== ========= ========== ========== =======
Params                mean   sd    hdi_3%   hdi_97%   mcse_mean   mcse_sd   ess_bulk   ess_tail   r_hat
===================== ====== ===== ======== ========= =========== ========= ========== ========== =======
betas[sunlight hours] 4.972  0.054 4.866    5.066     0.001       0.001     3003       1257       1
betas[water amount]   19.963 0.051 19.872   20.062    0.001       0.001     3112       1658       1
betas[soil nitrogen]  1.994  0.055 1.899    2.107     0.001       0.001     3221       1559       1
sigma                 0.511  0.037 0.438    0.575     0.001       0         2945       1522       1
===================== ====== ===== ======== ========= =========== ========= ========== ========== =======
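
For readers who want to see what the generative step amounts to without installing PyMC, the following is a minimal standard-library sketch of the same idea: each growth value is drawn from a normal distribution centred on the dot product of a feature row and the coefficients. The helper name ``simulate_growth`` is hypothetical and not part of PyMC, and the draws will not reproduce the numbers in the table above because a different random number generator is used.

```python
import random


def simulate_growth(x_rows, betas, sigma, rng):
    """Draw one growth value per row from Normal(mean = row . betas, sd = sigma).

    Hypothetical helper for illustration only; it is not a PyMC function.
    """
    growth = []
    for row in x_rows:
        # Linear predictor: the plain-Python equivalent of `x @ betas`
        mu = sum(x_i * b_i for x_i, b_i in zip(row, betas))
        # Gaussian noise around the linear predictor
        growth.append(rng.gauss(mu, sigma))
    return growth


rng = random.Random(42)
# 100 trials with 3 standard-normal features each, mirroring shape=(100, 3)
x_rows = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(100)]
# Same fixed parameters as in the README example
y = simulate_growth(x_rows, betas=[5, 20, 2], sigma=0.5, rng=rng)
```

Setting ``sigma`` to zero removes the noise, so each output equals its linear predictor exactly, which is a quick way to sanity-check the arithmetic.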

.. code-block:: python

    # Simulate new data conditioned on inferred parameters
    new_x_data = pm.draw(
        pm.Normal.dist(shape=(3, 3)),
        random_seed=seed,
    )
    new_coords = coords | {"trial": [0, 1, 2]}

    with inference_model:
        pm.set_data({"x": new_x_data}, coords=new_coords)
        idata = pm.sample_posterior_predictive(
            idata,
            predictions=True,
            extend_inferencedata=True,
            random_seed=seed,
        )

    pm.stats.summary(idata.predictions, kind="stats")

The new data conditioned on inferred parameters would look like:

========================== ====== ===== ======== =========
Output                     mean   sd    hdi_3%   hdi_97%
========================== ====== ===== ======== =========
plant growth (z-scored)[0] 14.21  0.509 13.232   15.144
plant growth (z-scored)[1] 24.43  0.518 23.347   25.32
plant growth (z-scored)[2] -6.743 0.515 -7.778   -5.834
========================== ====== ===== ======== =========

Review comment on the first row of this table: this is still the old name ("z-scored").

.. code-block:: python

    # Simulate new data, under a scenario where the first beta is zero
    with pm.do(
        inference_model,
        {inference_model["betas"]: inference_model["betas"] * [0, 1, 1]},
    ) as plant_growth_model:
        new_predictions = pm.sample_posterior_predictive(
            idata,
            predictions=True,
            random_seed=seed,
        )

    pm.stats.summary(new_predictions, kind="stats")

Under the above scenario, the new data would look like:

========================== ====== ===== ======== =========
Output                     mean   sd    hdi_3%   hdi_97%
========================== ====== ===== ======== =========
plant growth (z-scored)[0] 14.153 0.509 13.181   15.096
plant growth (z-scored)[1] 23.85  0.517 22.915   24.878
plant growth (z-scored)[2] -7.302 0.515 -8.315   -6.374
========================== ====== ===== ======== =========

Review comment on the first row of this table: Also needs updated name.
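
The intervention above multiplies the coefficient vector elementwise by ``[0, 1, 1]``, which switches off the first feature's contribution to the mean. A plain-Python sketch of that arithmetic, assuming illustrative coefficient and feature values (the names ``predict_mean`` and ``mask_betas`` and the feature row are hypothetical, not taken from the example), might look like this:

```python
def predict_mean(x_row, betas):
    """Linear predictor for one trial: the dot product of features and coefficients."""
    return sum(x_i * b_i for x_i, b_i in zip(x_row, betas))


def mask_betas(betas, mask):
    """Elementwise product, the plain-Python analogue of `betas * [0, 1, 1]`."""
    return [b * m for b, m in zip(betas, mask)]


betas = [5.0, 20.0, 2.0]   # the fixed values used to generate the synthetic data
x_row = [0.1, 0.7, -0.3]   # hypothetical standardized feature values for one trial

baseline = predict_mean(x_row, betas)
no_sunlight = predict_mean(x_row, mask_betas(betas, [0, 1, 1]))

# The two means differ by exactly the first feature's contribution
shift = baseline - no_sunlight
print(shift, betas[0] * x_row[0])
```

The difference between the two means is the first coefficient times the first feature value, so trials whose first feature is near zero are barely affected by the intervention.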

Getting started
===============

Shouldn't we show all the summary outputs? Why only the first?

I think for starters it's TMI and can scare people off. Convergence diagnostics is more advanced than what we want to demo here.

Oh, I thought you meant more columns, but you meant more rows?

I meant that every time we have `pm.stats.summary`, we should show the output. I already removed the extra convergence columns with `kind="stats"`. Right now it's only showing for the first usage.