Function needed to consistently apply Bayes theorem (using Bayes factor and lambda) #166

Closed
samnlindsay opened this issue Jan 13, 2021 · 4 comments

Comments

@samnlindsay
Contributor

samnlindsay commented Jan 13, 2021

Background

The same equation crops up in various forms, sometimes without us even realising it, wherever we update match probabilities based on specific observations.

In general form, these are all operations that can be expressed as:

posterior = f(prior, B)

where f is the “magic function” (Bayes' theorem) and B is a product of Bayes factors for the observations concerned. B can cover one column (e.g. intuition reports) or many (e.g. match probabilities), and it can describe Bayes factors for full matches only (e.g. correcting for blocking columns to estimate global lambda) or for any combination of observed gammas (e.g. match probabilities).
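As a minimal sketch of what such a function could look like, assuming f is the standard odds form of Bayes' theorem (posterior odds = prior odds × B): the name `bayes_update` and its argument names are hypothetical and are not taken from the codebase.

```python
def bayes_update(prior: float, bayes_factor: float) -> float:
    """Hypothetical f(prior, B): update a probability with a Bayes factor.

    Assumes the odds form of Bayes' theorem:
    posterior odds = prior odds * bayes_factor.
    """
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must be a probability in [0, 1]")
    if bayes_factor < 0.0:
        raise ValueError("bayes_factor must be non-negative")

    numerator = prior * bayes_factor
    denominator = numerator + (1.0 - prior)
    if denominator == 0.0:
        # Only happens for prior == 1 with bayes_factor == 0,
        # where the posterior is undefined.
        raise ValueError("posterior is undefined for prior=1 and bayes_factor=0")
    return numerator / denominator


# A Bayes factor of 1 carries no information, so the prior is unchanged.
print(bayes_update(0.5, 1.0))  # 0.5
# A Bayes factor of 3 turns even odds (1:1) into 3:1.
print(bayes_update(0.5, 3.0))  # 0.75
```

Working on a single probability plus a single factor keeps the function usable for every case listed in this issue, since the caller decides which Bayes factors go into B.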

How to add/subtract the influence of blocking on lambda

[Image: IMG_20210113_150614]

Mapping this function to Bayes

[Image: IMG_20210113_163611]
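The handwritten derivations in the two images are not recoverable from this text. As a hedged stand-in, here is the standard odds form of Bayes' theorem, written with p for the prior, p' for the posterior and B for the Bayes factor. Whether the images use exactly this notation is an assumption.

```latex
\frac{p'}{1 - p'} = B \cdot \frac{p}{1 - p}
\quad\Longrightarrow\quad
p' = f(p, B) = \frac{pB}{pB + (1 - p)}

% Two consequences that the examples in this issue rely on:
f\bigl(f(p, b_1),\, b_2\bigr) = f(p,\, b_1 b_2)
\qquad\text{and}\qquad
f\bigl(f(p, B),\, 1/B\bigr) = p
```

The first identity says that applying Bayes factors one at a time is equivalent to applying their product. The second says that an update is undone by applying the reciprocal factor, which is what adding or subtracting the influence of blocking on lambda amounts to.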

Examples

  • Updated match probability for a single column
match_probability = f(lambda, B)

where B is the Bayes factor for that column at the observed gamma level

  • Match probability for multiple columns
# One column at a time
match_probability = f(lambda, b1)
match_probability = f(match_probability, b2)
match_probability = f(match_probability, b3)
# ... and so on for each remaining column

# All at once
B = b1 * b2 * b3 * ...
match_probability = f(lambda, B)

  • Global lambda <-> blocking lambda
# Product of the Bayes factors for a match on each blocking variable
B = b1 * b2 * b3

lambda_blocking = f(lambda_global, B)
lambda_global = f(lambda_blocking, 1/B)
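The examples above can be checked numerically with a short sketch. It assumes f is the odds form of Bayes' theorem. The function name `bayes_update` and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def bayes_update(prior: float, bayes_factor: float) -> float:
    """Hypothetical f(prior, B), assuming posterior odds = prior odds * B."""
    numerator = prior * bayes_factor
    return numerator / (numerator + (1.0 - prior))


# Illustrative values only, not taken from any real model.
lambda_global = 0.01
b1, b2, b3 = 8.0, 0.5, 20.0

# One column at a time
step = bayes_update(lambda_global, b1)
step = bayes_update(step, b2)
sequential = bayes_update(step, b3)

# All at once
B = b1 * b2 * b3
all_at_once = bayes_update(lambda_global, B)
assert math.isclose(sequential, all_at_once)

# Global lambda <-> blocking lambda: the reciprocal factor undoes the update.
# Here b1, b2, b3 stand in for the Bayes factors of a match on each
# blocking variable.
lambda_blocking = bayes_update(lambda_global, B)
lambda_recovered = bayes_update(lambda_blocking, 1.0 / B)
assert math.isclose(lambda_recovered, lambda_global)
```

Both assertions hold up to floating-point rounding, which is why `math.isclose` is used rather than exact equality.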

Pedantic side note

The widely used symbol for Bayes factor is K, so when documenting the above, I would prefer we use K to refer to Bayes factors and perhaps reserve B for the argument given to the function above (some kind of "Bayesian update parameter" that defines the new information that turns our prior into our posterior).

@RobinL
Member

RobinL commented Mar 23, 2021

The more I think about this, the more I think we should use the log Bayes factor everywhere internally, since it is additive rather than multiplicative, and only convert to probabilities at the last moment, when they're needed.
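A minimal sketch of that idea, assuming base-2 logarithms (the choice of base is an assumption, and any base works as long as it is used consistently). All function names here are hypothetical.

```python
import math
from typing import Iterable


def prob_to_log_odds(probability: float) -> float:
    """Convert a probability in (0, 1) to log2 odds."""
    return math.log2(probability / (1.0 - probability))


def log_odds_to_prob(log_odds: float) -> float:
    """Convert log2 odds back to a probability, avoiding overflow."""
    if log_odds >= 0:
        return 1.0 / (1.0 + 2.0 ** (-log_odds))
    odds = 2.0 ** log_odds
    return odds / (1.0 + odds)


def posterior_from_bayes_factors(prior: float, bayes_factors: Iterable[float]) -> float:
    """Add log2 Bayes factors to the prior log odds, then convert once."""
    log_odds = prob_to_log_odds(prior)
    log_odds += sum(math.log2(b) for b in bayes_factors)
    return log_odds_to_prob(log_odds)


# Even prior odds with Bayes factors 2 and 4 give odds of 8:1,
# so the result is close to 8/9.
print(posterior_from_bayes_factors(0.5, [2.0, 4.0]))
```

Staying in log space until the final conversion means each column contributes a simple additive term, and removing a column's influence is a subtraction rather than a division.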

@RobinL
Member

RobinL commented Apr 11, 2021

Here's a set of formulas plus interactive functions which should provide the basis for this PR:

https://observablehq.com/d/65332d708285fead

@samnlindsay
Contributor Author

FYI, I retract my earlier objection to using B rather than K for Bayes Factor. It looks like B is actually quite common too. 🤷

@RobinL
Member

RobinL commented Oct 26, 2021

Closed by #201

@RobinL RobinL closed this as completed Oct 26, 2021