
# regression_analysis_101

Estimating model parameters by optimising the fit of a model to empirical data is a staple of computational data science.

This repository contains a set of Jupyter notebooks that demonstrate how to implement a range of regression methods and a Bayesian MCMC approach for fitting models and estimating model parameters. These are demonstrated for a variety of geoscience applications, e.g. fitting an isochron to isotope data to estimate a geological age, fitting observed Bouguer gravity anomaly data from the Isle of Mull in Scotland using a simple gravity model for a buried spherical object, and estimating parameters for a simple continental geotherm model based on pressure-temperature (PT) estimates derived from mantle xenolith geochemistry.

Various other routine coding tasks (e.g. reading and writing data files) and plotting techniques (using Matplotlib routines) are also illustrated.

## A minimal example: fitting a straight line to a set of x and y values

A minimal example is fitting a simple linear model, i.e. a straight line, y = mx + c, to a set of measurements of y (commonly called the dependent variable or response variable) over a given range of x values (commonly called the predictor variable or independent variable), in order to estimate the slope, m, and intercept, c, of the line.

A routine method for achieving this is Ordinary Least Squares (OLS) regression. OLS aims to minimise the difference between the observed values of y and the values predicted by the model. It achieves this by finding the model with the smallest sum of squared differences; these differences are usually called the residuals. They are illustrated below by the vertical red lines between the green observed values and the blue line representing the model predictions. The routine in the scipy package that implements this is `scipy.stats.linregress`.

*Figure: residuals (red) between the observed values (green) and the fitted model line (blue).*
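As a minimal sketch of an OLS fit with `scipy.stats.linregress` (the data here are illustrative, not taken from the notebooks):

```python
import numpy as np
from scipy.stats import linregress

# Illustrative data lying exactly on the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# linregress returns the slope m, intercept c, and correlation coefficient r
result = linregress(x, y)
print(result.slope, result.intercept, result.rvalue)  # → 2.0 1.0 1.0
```

With real, noisy data the residuals `y - (result.slope * x + result.intercept)` are non-zero, and `result.stderr` gives the standard error of the fitted slope.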

## Taking care of the errors: how to handle uncertainty in measured data

OLS is quick and straightforward, but does not take account of the magnitude of measurement uncertainties on the x or y values. Orthogonal Distance Regression (ODR), implemented in the scipy package as `scipy.odr`, enables measurement errors in both variables to be accounted for, and is particularly appropriate if the measured data include outliers with large errors.
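A minimal sketch of a straight-line fit with `scipy.odr`, assuming illustrative (made-up) one-sigma uncertainties on both x and y:

```python
import numpy as np
from scipy.odr import ODR, Model, RealData

def linear(beta, x):
    """Straight-line model y = m*x + c, with beta = [m, c]."""
    return beta[0] * x + beta[1]

# Illustrative data on the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0
sx = np.full_like(x, 0.1)   # assumed 1-sigma uncertainties on x
sy = np.full_like(y, 0.2)   # assumed 1-sigma uncertainties on y

# RealData weights each point by its stated x and y uncertainties
data = RealData(x, y, sx=sx, sy=sy)
fit = ODR(data, Model(linear), beta0=[1.0, 0.0])  # beta0 is the initial guess
out = fit.run()
print(out.beta)      # fitted [m, c]
print(out.sd_beta)   # standard errors on the fitted parameters
```

Unlike OLS, ODR minimises the perpendicular (orthogonal) distances from the data points to the line, scaled by the stated uncertainties in both variables.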

## The Bayesian approach using a Markov Chain Monte Carlo sampling strategy

An alternative to using regression techniques to estimate model parameters is to use a Bayesian approach with a Markov Chain Monte Carlo (MCMC) sampling strategy.


An advantage of the Bayesian MCMC technique is that both the model parameters and the observed data can be treated as uncertain, and you can estimate their likely values (credible intervals) and their full posterior probability distributions. This can be implemented in Python using the pyMC3 package; see the pyMC3 docs page for details.
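pyMC3 automates the sampling with modern algorithms, but the underlying idea can be sketched with a hand-rolled Metropolis sampler in plain numpy. The data, noise level, and flat prior below are illustrative assumptions, not taken from the notebooks:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative synthetic data: y = 2x + 1 plus Gaussian noise of known sigma
x = np.linspace(0.0, 10.0, 20)
sigma = 0.1
y = 2.0 * x + 1.0 + rng.normal(0.0, sigma, x.size)

def log_posterior(theta):
    """Log posterior for (m, c): flat prior + Gaussian likelihood."""
    m, c = theta
    resid = y - (m * x + c)
    return -0.5 * np.sum((resid / sigma) ** 2)

# Metropolis sampling: propose a random step, accept with prob min(1, p_new/p_old)
n_steps = 30000
step = np.array([0.01, 0.05])          # proposal scales for m and c
chain = np.empty((n_steps, 2))
theta = np.array([1.0, 0.0])           # deliberately poor starting point
logp = log_posterior(theta)
for i in range(n_steps):
    proposal = theta + step * rng.normal(size=2)
    logp_new = log_posterior(proposal)
    if np.log(rng.random()) < logp_new - logp:
        theta, logp = proposal, logp_new
    chain[i] = theta

posterior = chain[5000:]               # discard burn-in
m_mean, c_mean = posterior.mean(axis=0)
m_lo, m_hi = np.percentile(posterior[:, 0], [2.5, 97.5])  # 95% credible interval for m
print(f"m = {m_mean:.3f} (95% CI {m_lo:.3f} to {m_hi:.3f}), c = {c_mean:.3f}")
```

The chain of accepted samples approximates the joint posterior of m and c, so credible intervals fall straight out of its percentiles; pyMC3 provides the same outputs (traces, posterior plots, credible intervals) without hand-tuning proposal scales.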
