# Chapter 13. Bayesian Estimation in Hierarchical Models

* Psygrammer / Cognitive Modeling: Part 2 - Mathematical Psychology [1]
* 김무성

# Contents

* The Ideas of Hierarchical Bayesian Estimation
* Example: Shrinkage and Multiple Comparisons of Baseball Batting Abilities
* Example: Clinical Individual Differences in Attention Allocation
* Model Comparison as a Case of Estimation in Hierarchical Models
* Conclusion

# The Ideas of Hierarchical Bayesian Estimation

* Hierarchical Models Have Parameters with Hierarchical Meaning
* Advantages of the Bayesian Approach
* Some Mathematics and Mechanics of Bayesian Estimation

#### See also
* [2] Bayesian Estimation in MARK - http://warnercnr.colostate.edu/~gwhite/fw663/Bayesian%20Overview.ppt 

Bayesian estimation provides an entire distribution of credibility over the space of parameter values, not merely a single “best” value.
* The distribution precisely captures our uncertainty about the parameter estimate. 
* The essence of Bayesian estimation is to formally describe how uncertainty changes when new data are taken into account.
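
As a minimal sketch of how uncertainty changes when data arrive, the snippet below updates a beta prior on a success probability with binomial data. The uniform Beta(1, 1) prior and the hit/miss counts are hypothetical values chosen for illustration, not numbers from the chapter.

```python
from math import sqrt

def beta_update(a, b, hits, misses):
    """Conjugate update: Beta(a, b) prior plus binomial data gives a Beta posterior."""
    return a + hits, b + misses

def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, sqrt(var)

prior = (1.0, 1.0)                                     # uniform prior (assumed)
post_small = beta_update(*prior, hits=3, misses=7)     # after 10 observations
post_large = beta_update(*prior, hits=30, misses=70)   # after 100 observations

for label, (a, b) in [("prior", prior), ("10 obs", post_small), ("100 obs", post_large)]:
    mean, sd = beta_mean_sd(a, b)
    print(f"{label:>8}: Beta({a:g}, {b:g})  mean={mean:.3f}  sd={sd:.3f}")
```

The posterior is a whole distribution, and its standard deviation narrows as the data set grows while the observed proportion stays the same.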

<img src = "https://github.com/psygrammer/bayesianR/raw/master/part1/ch01_02/figure/fig2.1.png" width=600 />

<img src="https://github.com/psygrammer/bayesianR/raw/master/part1/ch01_02/figure/fig2.2.png" width=600 />

<img src="https://github.com/psygrammer/bayesianR/raw/master/part1/ch01_02/figure/fig2.3.png" width=600 />

<img src="https://github.com/psygrammer/bayesianR/raw/master/part1/ch01_02/figure/fig2.4.png" width=600 />

## Hierarchical Models Have Parameters with Hierarchical Meaning

<img src="https://github.com/psygrammer/bayesianR/raw/master/part2/ch09/img/fig9.1.png" />

Examples

* a type of trick coin, manufactured by the Acme Toy Company
* childhood obesity: weights of children measured in different schools, which have different school lunch programs and unknown socioeconomic statuses.

In general, a model is hierarchical if the probability of one parameter can be conceived to depend on the value of another parameter.

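* Expressed formally, suppose the observed data, denoted D, are described by a model with two parameters, denoted α and β.
* Likelihood: p(D|α,β)
* Prior: p(α,β)
* By Bayes' rule, the posterior is proportional to their product, p(D|α,β)p(α,β).
* The model is hierarchical when this product factors into a chain of dependencies:
    - p(D|α,β)p(α,β) = p(D|α)p(α|β)p(β)
    - That is, the data depend on β only through α, and the prior on α depends on β.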
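
The factorization p(D|α)p(α|β)p(β) can be read as a recipe for generating data from the top of the hierarchy down. The sketch below does this for the trick-coin example. The specific distributions (a uniform prior on the factory-level bias β, a beta distribution for each coin's bias α with a hypothetical concentration `kappa`, and ten flips per coin) are assumptions made for illustration only.

```python
import random

def sample_hierarchy(rng, kappa=20.0, n_flips=10):
    """Draw (beta, alpha, D) in the order given by p(beta) p(alpha|beta) p(D|alpha)."""
    beta = rng.uniform(0.2, 0.8)                                # p(beta): factory-level bias
    alpha = rng.betavariate(beta * kappa, (1 - beta) * kappa)   # p(alpha | beta): one coin's bias
    heads = sum(rng.random() < alpha for _ in range(n_flips))   # p(D | alpha): flips of that coin
    return beta, alpha, heads

rng = random.Random(0)
draws = [sample_hierarchy(rng) for _ in range(5)]
for beta, alpha, heads in draws:
    print(f"beta={beta:.2f}  alpha={alpha:.2f}  heads={heads}/10")
```

Note that the flips are generated from α alone. β influences the data only by shaping the distribution from which α is drawn.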

One of the primary applications of hierarchical models is describing data from individuals within groups.

* individual-level
* group-level
* The individual-level and group-level parameters are estimated simultaneously.

## Advantages of the Bayesian Approach

* Bayesian methods provide tremendous flexibility in designing models that are appropriate for describing the data at hand, and Bayesian methods provide a complete representation of parameter uncertainty (i.e., the posterior distribution) that can be directly interpreted.

* In a frequentist approach, although it may be possible to find a maximum-likelihood estimate (MLE) of parameter values in a hierarchical nonlinear model, the subsequent task of interpreting the uncertainty of the MLE can be very difficult.

## Some Mathematics and Mechanics of Bayesian Estimation

<img src="figures/cap13.1.png" width=600 />

* In some simple situations, the mathematical form of the posterior distribution can be analytically derived.
* A large class of algorithms for generating a representative random sample from a distribution is called Markov chain Monte Carlo (MCMC) methods.
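
To make the idea of MCMC concrete, here is a minimal random-walk Metropolis sampler, one of the simplest members of that class. It is a sketch only: the chapter's analyses use JAGS, not this code, and the data (3 successes in 12 trials), the uniform prior, and the tuning constants are hypothetical. The example is chosen so that the posterior is also available analytically, as Beta(4, 10), which lets the sampled mean be checked against the exact one.

```python
import math
import random

def log_posterior(theta, hits, trials):
    """Unnormalized log posterior: binomial likelihood times a uniform prior."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero posterior density outside (0, 1)
    return hits * math.log(theta) + (trials - hits) * math.log(1.0 - theta)

def metropolis(hits, trials, n_steps=20000, step_sd=0.1, seed=1):
    """Random-walk Metropolis sampler for a single success probability."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, hits, trials)
    chain = []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step_sd)
        candidate = log_posterior(proposal, hits, trials)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        chain.append(theta)
    return chain

hits, trials = 3, 12                     # hypothetical data
chain = metropolis(hits, trials)
kept = chain[2000:]                      # discard burn-in steps
mcmc_mean = sum(kept) / len(kept)
exact_mean = (1 + hits) / (2 + trials)   # mean of the analytic Beta(4, 10) posterior
print(f"MCMC mean = {mcmc_mean:.3f}, analytic mean = {exact_mean:.3f}")
```

The sampler only ever evaluates the unnormalized posterior, which is why MCMC remains usable in hierarchical models where the normalizing constant has no closed form.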

# Example: Shrinkage and Multiple Comparisons of Baseball Batting Abilities

* The Data
* The Descriptive Model with Its Meaningful Parameters
* Results: Interpreting the Posterior Distribution
* Shrinkage and Multiple Comparisons

#### See also
* [3] baseball terms -  http://www.howbaseballworks.com/Fielding.htm
* [4] Chapter 9. Hierarchical Models -  https://github.com/psygrammer/bayesianR/blob/master/part2/ch09/ch09_HierarchicalModels.md


An important goal for enthusiasts of baseball is estimating each player’s ability to bat the ball.

There are nine players in the field at once, who specialize in different positions. 

Therefore, based on the structure of the game, we know that players with different primary positions are likely to have different batting abilities.

## The Data

* The data consist of records from 
    - 948 players 
    - in the 2012 regular season of Major League Baseball 
    - who had at least one at-bat.
    - For player i, 
        - we have his number of opportunities at bat, ABi,
        - his number of hits Hi, and 
        - his primary position when in the field pp(i). 
* In the data, there were 
    - 324 pitchers 
         - with a median of 4.0 at-bats, 
    - 103 catchers 
        - with a median of 170.0 at-bats, and 
    - 60 right fielders 
        - with a median of 340.5 at-bats, 
    - along with 461 players in six other positions.

## The Descriptive Model with Its Meaningful Parameters

#### See also
* [5] Beta distribution (Korean Wikipedia) - https://ko.wikipedia.org/wiki/%EB%B2%A0%ED%83%80_%EB%B6%84%ED%8F%AC
* [6] Gamma distribution (Korean Wikipedia) - https://ko.wikipedia.org/wiki/%EA%B0%90%EB%A7%88_%EB%B6%84%ED%8F%AC
* [7] Digging into the Dirichlet Distribution by Max Sklar - http://www.slideshare.net/g33ktalk/machine-learning-meetup-12182013
* [8] PATTERN RECOGNITION AND MACHINE LEARNING / CHAPTER 2: PROBABILITY DISTRIBUTIONS - http://research.microsoft.com/en-us/um/people/cmbishop/PRML/slides/prml-slides-2.ppt
* [9] Conjugate families of distributions -  http://halweb.uc3m.es/esp/Personal/personas/mwiper/docencia/English/PhD_Bayesian_Statistics/ch3_2009.pdf

* We want to estimate, for each player, his underlying probability θi of hitting the ball when at bat. 
* The primary data to inform our estimate of θi are 
    - the player’s number of hits, Hi, and 
    - his number of opportunities at bat, ABi.
* But the estimate will also be informed by 
    - our knowledge of the player’s primary position, pp(i), and 
    - the data from all the other players (i.e., their hits, at-bats, and positions). 
* For example, 
    - if we know that player i is a pitcher, 
    - and we know that pitchers tend to have θ values around 0.13 (because of all the other data), 
    - then our estimate of θi should be anchored near 0.13 and 
    - adjusted by the specific hits and at-bats of the individual player.
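
The anchoring described in this example can be sketched with a simplified calculation. Suppose, as an assumption for illustration, that the pitchers' distribution is a fixed beta distribution with mean 0.13 and a hypothetical concentration `KAPPA = 100`. In the full hierarchical model these group-level values are estimated jointly with the individual abilities, so this is only an approximation of the idea. Under that assumption, the posterior mean for one player is (μκ + H) / (κ + AB).

```python
def shrunken_estimate(hits, at_bats, group_mean, kappa):
    """Posterior mean of theta under a Beta(group_mean*kappa, (1-group_mean)*kappa) prior."""
    a = group_mean * kappa + hits
    b = (1.0 - group_mean) * kappa + (at_bats - hits)
    return a / (a + b)

PITCHER_MEAN = 0.13   # typical pitcher ability mentioned in the text
KAPPA = 100.0         # hypothetical concentration, not a value from the chapter

# Two hypothetical pitchers: one with very little data, one with a lot.
few = shrunken_estimate(hits=2, at_bats=4, group_mean=PITCHER_MEAN, kappa=KAPPA)
many = shrunken_estimate(hits=100, at_bats=400, group_mean=PITCHER_MEAN, kappa=KAPPA)

print(f"2/4 at-bats:     raw = {2 / 4:.3f}, estimate = {few:.3f}")
print(f"100/400 at-bats: raw = {100 / 400:.3f}, estimate = {many:.3f}")
```

With only four at-bats the estimate stays close to the group value of 0.13, whereas with 400 at-bats the player's own record pulls the estimate most of the way toward his raw average.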

We will construct a hierarchical model that rationally shares information

* across players within positions, and
* across positions within all major league players.

<img src="figures/cap13.2.png" width=600 />

* We denote the ith player’s underlying probability of getting a hit as θi.
    - Then the number of hits Hi out of ABi at-bats is a random draw from a binomial distribution that has success rate θi, as illustrated at the bottom of Figure 13.1.
    - The arrow pointing to Hi is labeled with a “∼” symbol to indicate that the number of hits is a random variable distributed as a binomial distribution.
* To formally express our prior belief that different primary positions emphasize different skills and hence have different batting abilities, we assume that the player abilities θi come from distributions specific to each position.
* We model the distribution of θi’s for a position as a beta distribution,
    - which is a natural distribution for describing values that fall between zero and one, and is often used in this sort of application
    - The mean of the beta distribution for primary position pp is denoted μpp, and 
    - the narrowness of the distribution is denoted κpp.
    - The value of μpp represents the typical batting ability of players in primary position pp,
    - and the value of κpp represents how tightly clustered the abilities are across players in primary position pp.
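
The two lowest levels of the model described above can be written compactly as follows. The notes specify only that the beta distribution has mean μpp and narrowness κpp. Writing its two shape parameters as μκ and (1 − μ)κ is an assumed, though common, mean-and-concentration parameterization.

```latex
\begin{align*}
H_i &\sim \operatorname{Binomial}\bigl(AB_i,\ \theta_i\bigr) \\
\theta_i &\sim \operatorname{Beta}\bigl(\mu_{pp(i)}\,\kappa_{pp(i)},\ (1-\mu_{pp(i)})\,\kappa_{pp(i)}\bigr)
\end{align*}
```

Under this parameterization the mean of the beta distribution is μ and its variance is μ(1 − μ)/(κ + 1), so a larger κpp corresponds to more tightly clustered abilities within a position.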

There are 970 parameters in the model altogether: 948 individual θi, plus μpp and κpp for each of nine primary positions, plus μμ and κμ across positions, plus sκ and rκ (948 + 2 × 9 + 2 + 2 = 970). The Bayesian analysis yields credible combinations of the parameters in the 970-dimensional joint parameter space.

## Results: Interpreting the Posterior Distribution

* check of robustness against changes in top-level prior constants
* comparisons of positions
* comparisons of individual players

#### See also
* [10] Bayesian hypothesis testing (1): credible intervals and ROPE (in Korean) - http://egloos.zum.com/posterior/v/9634717
* [11] Chapter 13 - Goals, Power, and Sample Size - http://nbviewer.ipython.org/github/psygrammer/bayesianR/blob/master/part3/ch13/CHAPTER_13_GoalsPowerandSampleSize.ipynb

#### MCMC
* We used MCMC chains with a total saved length of 15,000 after 1,000 adaptation steps and 1,000 burn-in steps, using 3 parallel chains called from the runjags package (Denwood, 2013), thinned by 30 merely to keep a modest file size for the saved chain.
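
The arithmetic behind these settings can be spelled out as below. This is one reading of the description, assuming that the total of 15,000 saved steps is split evenly across the 3 chains and that thinning by 30 means every 30th step is kept. The variable names are illustrative and are not runjags arguments.

```python
saved_total = 15_000   # saved steps across all chains
n_chains = 3
thin = 30              # keep every 30th step
adapt = 1_000          # adaptation steps per chain
burn_in = 1_000        # burn-in steps per chain

saved_per_chain = saved_total // n_chains
sampling_steps_per_chain = saved_per_chain * thin
steps_per_chain = adapt + burn_in + sampling_steps_per_chain

print(f"saved per chain:          {saved_per_chain:,}")
print(f"sampling steps per chain: {sampling_steps_per_chain:,}")
print(f"all steps per chain:      {steps_per_chain:,}")
```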

#### posterior
* The diagnostics (see Box 1) assured us that the chains were adequate to provide an accurate and high-resolution representation of the posterior distribution. 

#### ESS
* The effective sample size (ESS) for all the reported parameters and differences exceeded 6,000, with nearly all exceeding 10,000.
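
For intuition about what ESS measures, the sketch below estimates it as N / (1 + 2 Σ ρ_k), where ρ_k is the chain's autocorrelation at lag k and the sum is cut off at the first lag where ρ_k falls below 0.05. This is a simple heuristic written for illustration. It is an assumption that it resembles the estimator behind the reported values, and R packages such as coda compute ESS differently. Both chains here are synthetic.

```python
import random

def effective_sample_size(chain, max_lag=200, cutoff=0.05):
    """Estimate ESS = N / (1 + 2 * sum of autocorrelations), truncating the sum
    at the first lag whose autocorrelation drops below `cutoff`."""
    n = len(chain)
    mean = sum(chain) / n
    centered = [v - mean for v in chain]
    var = sum(c * c for c in centered) / n
    acf_sum = 0.0
    for lag in range(1, max_lag + 1):
        cov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = cov / var
        if rho < cutoff:
            break
        acf_sum += rho
    return n / (1.0 + 2.0 * acf_sum)

rng = random.Random(0)
n = 5000
independent = [rng.gauss(0.0, 1.0) for _ in range(n)]   # no autocorrelation
sticky = [0.0]                                          # strongly autocorrelated chain
for _ in range(n - 1):
    sticky.append(0.9 * sticky[-1] + rng.gauss(0.0, 1.0))

ess_independent = effective_sample_size(independent)
ess_sticky = effective_sample_size(sticky)
print(f"independent chain: ESS ~ {ess_independent:.0f} of {n}")
print(f"sticky chain:      ESS ~ {ess_sticky:.0f} of {n}")
```

A chain whose successive steps are strongly correlated carries far less information than its raw length suggests, which is why ESS rather than the number of saved steps is reported.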


### check of robustness against changes in top-level prior constants

* Because we wanted the top-level prior distribution to be noncommittal and have minimal influence on the posterior distribution, we checked whether the choice of prior had any notable effect on the posterior.
* We conducted the analysis with different constants in the top-level gamma distributions, to check whether they had any notable influence on the resulting posterior distribution.
    - Whether all gamma distributions used shape and rate constants of 0.1 and 0.1, or 0.001 and 0.001, the results were essentially identical. The results reported here are for gamma constants of 0.001 and 0.001.
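
To see why these constants count as noncommittal, recall that a gamma distribution with shape s and rate r has mean s / r and standard deviation √s / r. The short sketch below evaluates both settings using only the constants quoted above.

```python
from math import sqrt

def gamma_mean_sd(shape, rate):
    """Mean and standard deviation of a gamma distribution in shape/rate form."""
    return shape / rate, sqrt(shape) / rate

summaries = {}
for shape, rate in [(0.1, 0.1), (0.001, 0.001)]:
    summaries[(shape, rate)] = gamma_mean_sd(shape, rate)
    mean, sd = summaries[(shape, rate)]
    print(f"gamma(shape={shape}, rate={rate}): mean={mean:.1f}, sd={sd:.2f}")
```

Both priors have mean 1, but their standard deviations are roughly 3 and 32 respectively, so each spreads its credibility over a very wide range of values.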

### comparisons of positions

<img src="figures/cap13.3.png" width=600 />

<img src="figures/cap13.4.png" width=600 />

### comparisons of individual players

<img src="figures/cap13.5.png" width=600 />

<img src="figures/cap13.6.png" width=600 />

<img src="figures/cap13.7.png" width=600 />

<img src="figures/cap13.8.png" width=600 />

## Shrinkage and Multiple Comparisons

# Example: Clinical Individual Differences in Attention Allocation

* The Data
* The Descriptive Model with Its Meaningful Parameters
* Results: Interpreting the Posterior Distribution

<img src="figures/cap13.9.png" width=600 />

## The Data

## The Descriptive Model with Its Meaningful Parameters

* hierarchical structure

<img src="figures/cap13.10.png" width=600 />

<img src="figures/cap13.11.png" width=600 />

<img src="figures/cap13.12.png" width=600 />

### hierarchical structure

## Results: Interpreting the Posterior Distribution

* check of robustness against changes in top-level prior constants
* comparison across groups of attention to body size
* comparisons across individual women’s attention to body size

### check of robustness against changes in top-level prior constants

<img src="figures/cap13.13.png" width=600 />

<img src="figures/cap13.14.png" width=600 />

### comparison across groups of attention to body size

<img src="figures/cap13.15.png" width=600 />

<img src="figures/cap13.16.png" width=600 />

### comparisons across individual women’s attention to body size

<img src="figures/cap13.17.png" width=600 />

# Model Comparison as a Case of Estimation in Hierarchical Models

<img src="figures/cap13.18.png" width=600 />

# Conclusion

# References

* [1] The Oxford Handbook of Computational and Mathematical Psychology - http://www.amazon.com/Handbook-Computational-Mathematical-Psychology-Library/dp/0199957991
* [2] Bayesian Estimation in MARK - http://warnercnr.colostate.edu/~gwhite/fw663/Bayesian%20Overview.ppt
* [3] baseball terms - http://www.howbaseballworks.com/Fielding.htm
* [4] Chapter 9. Hierarchical Models - https://github.com/psygrammer/bayesianR/blob/master/part2/ch09/ch09_HierarchicalModels.md
* [5] Beta distribution (Korean Wikipedia) - https://ko.wikipedia.org/wiki/%EB%B2%A0%ED%83%80_%EB%B6%84%ED%8F%AC
* [6] Gamma distribution (Korean Wikipedia) - https://ko.wikipedia.org/wiki/%EA%B0%90%EB%A7%88_%EB%B6%84%ED%8F%AC
* [7] Digging into the Dirichlet Distribution by Max Sklar - http://www.slideshare.net/g33ktalk/machine-learning-meetup-12182013
* [8] PATTERN RECOGNITION AND MACHINE LEARNING / CHAPTER 2: PROBABILITY DISTRIBUTIONS - http://research.microsoft.com/en-us/um/people/cmbishop/PRML/slides/prml-slides-2.ppt
* [9] Conjugate families of distributions - http://halweb.uc3m.es/esp/Personal/personas/mwiper/docencia/English/PhD_Bayesian_Statistics/ch3_2009.pdf
* [10] Bayesian hypothesis testing (1): credible intervals and ROPE (in Korean) - http://egloos.zum.com/posterior/v/9634717
* [11] Chapter 13 - Goals, Power, and Sample Size - http://nbviewer.ipython.org/github/psygrammer/bayesianR/blob/master/part3/ch13/CHAPTER_13_GoalsPowerandSampleSize.ipynb