# System Dynamics for, with, along the Hierarchy

## 0. Introduction and EM algorithm
### 0.1. Introduction
Just as the central dogma (DNA-RNA-Protein) inspired numerous biological analysis techniques, knowing the process by which nature generates data helps us decode and simulate it. Fig.1 shows three steps for updating prior knowledge of the system to its posterior: elicit the prior, complete the data, and estimate the posterior. This document describes how hierarchical structure appears in System Dynamics and aids its estimation, analysis, and decision making. Sec.1, 2, and 3 describe how to construct the prior, the data, and the data-averaged posterior. Sec.4 illustrates how this update (Sec.1-3) is iterated until the model is well calibrated. Sec.5 is the table of contents of this book, which contains modules and examples of the principles introduced in Sec.1-4.

In summary, this book shows how a hierarchically generated model (state, Sec.1) can be hierarchically updated (action, Sec.1, 2, 3) to explore the hierarchy of models (goal, Sec.4). We introduce recurring modules of architecture, actors, relation, policy, and parameter. The context is inventory management in military settings. War is a competition between two or more actors whose subcomponents cooperate with the goal of maximizing payoff. For simplicity, we limit the number of actors to two. We call this dynamic optimization within the feasible set defined by architecture $G$ and policy $\pi$. Our analytic principles carry over to other situations that involve strategizing for competitive and cooperative relationships.

![image.png](attachment:726b0299-7ce7-4761-aa68-ecd96dc19a1b.png)

Fig.1. Overview of three steps in inflow and outflow of model development

### 0.2. Preview of the first and third step with EM algorithm framework

> "EM alg. can be viewed as an iterative method for finding the mode of the marginal posterior density, $p(\phi|y)$, and is extremely useful for many common models for which it is hard to maximize $p(\phi|y)$ directly but easy to work with $p(\gamma|\phi, y)$ and $p(\phi|\gamma, y)$ ... EM is widely applicable because many models, including mixture models and some hierarchical models, can be re-expressed as distributions on augmented parameter spaces, where the added parameters $\gamma$ can be thought of as missing data ... The name ‘EM’ comes from the two alternating steps: finding the expectation of the needed functions (the sufficient statistics) of the missing values, and maximizing the resulting posterior density to estimate the parameters as if these functions of the missing data were observed. For many standard models, both steps—estimating the missing values given a current estimate of the parameter and estimating the parameters given current estimates of the missing values—are straightforward. - [Bayesian Data Analysis](http://www.stat.columbia.edu/~gelman/book/BDA3.pdf)"

Our first and third steps correspond to the expectation and maximization steps described above. The purpose of sequential conditioning in the first step is to quantify prior knowledge (or the prior assumed in our research hypothesis) about the hierarchical structure of the entire model, along with the prior at each level: the system architecture prior $P(G)$ and the policy prior $P(\pi)$. Conditioning is necessary because data generated from heterogeneous groups are mixtures; for instance, only when we condition on one of two competing shops can we tell that one uses a fixed period policy and the other a fixed quantity policy. In other words, we turn implicit knowledge into an explicit distribution by deconvolving the mixture (i.e. instead of maximizing $p(\phi|y)$ directly, we work with $p(\gamma|\phi, y)$ and $p(\phi|\gamma, y)$). In this sense, the first step elicits the prior distribution with the help of prior knowledge about the structure of the model.
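
The two-shop example can be made concrete with a small EM loop. The sketch below is only an illustration under stated assumptions: order quantities from each shop are taken to be roughly Gaussian, the priors are flat (so the posterior mode coincides with the maximum likelihood estimate), and the data, the function name `em_two_shops`, and all numbers are invented. The missing shop indicator plays the role of $\gamma$ and the mixture parameters play the role of $\phi$.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sigma):
    """Density of Normal(mu, sigma) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_two_shops(y, n_iter=100):
    """EM for a two-component Gaussian mixture (hypothetical helper).

    Returns (w, mu, sigma): w is the weight of component 1,
    mu and sigma are two-element lists of component means and sds.
    """
    w = 0.5
    mu = [min(y), max(y)]                 # crude initialisation
    sigma = [statistics.pstdev(y)] * 2
    n = len(y)
    for _ in range(n_iter):
        # E-step: expected value of the missing shop indicator (gamma)
        resp = []
        for yi in y:
            p0 = (1.0 - w) * normal_pdf(yi, mu[0], sigma[0])
            p1 = w * normal_pdf(yi, mu[1], sigma[1])
            resp.append(p1 / (p0 + p1))
        # M-step: update the parameters (phi) as if the indicators were observed
        n1 = sum(resp)
        n0 = n - n1
        mu[0] = sum((1.0 - r) * yi for r, yi in zip(resp, y)) / n0
        mu[1] = sum(r * yi for r, yi in zip(resp, y)) / n1
        var0 = sum((1.0 - r) * (yi - mu[0]) ** 2 for r, yi in zip(resp, y)) / n0
        var1 = sum(r * (yi - mu[1]) ** 2 for r, yi in zip(resp, y)) / n1
        sigma = [max(math.sqrt(var0), 1e-6), max(math.sqrt(var1), 1e-6)]
        w = n1 / n
    return w, mu, sigma


# Invented data: pooled order quantities from two shops with different policies.
rng = random.Random(0)
orders = [rng.gauss(20.0, 2.0) for _ in range(200)]
orders += [rng.gauss(35.0, 3.0) for _ in range(200)]
w, mu, sigma = em_two_shops(orders)
print(round(w, 2), [round(m, 1) for m in mu], [round(s, 1) for s in sigma])
```

Looking only at the pooled `orders` hides the two regimes; conditioning on the (estimated) indicator separates them, which is the sense in which conditioning deconvolves the mixture.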

With a draft of our generator, we run prior predictive checks to see how well it explains the observed data. We review and update the model until the observed data $y$ and the simulated data $\tilde{y}$ match reasonably well. Once the generator is complete, we estimate the parameters, policy, and system architecture on which we placed priors. Estimation proceeds by maximizing the payoff (a transformation of the augmented parameters, weighted by the data-averaged posterior). Finding the payoff-maximizing prior is the destination of this sequence of hypothesis tests.
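
A prior predictive check of this kind might look like the following sketch. It assumes a deliberately simple generator (daily demand that is Gaussian around an unknown mean, truncated at zero) and uses the sample mean as the test statistic; the function names, the observed values, and the two candidate priors are all hypothetical.

```python
import random
import statistics


def simulate_demand(mean_demand, n_days, rng, noise_sd=3.0):
    """Toy generator (assumption): Gaussian daily demand, truncated at zero."""
    return [max(0.0, rng.gauss(mean_demand, noise_sd)) for _ in range(n_days)]


def prior_predictive_pvalue(y_obs, prior_mean, prior_sd, n_sims=1000, seed=0):
    """Share of prior predictive datasets whose mean is at least the observed mean."""
    rng = random.Random(seed)
    t_obs = statistics.mean(y_obs)
    exceed = 0
    for _ in range(n_sims):
        mean_demand = rng.gauss(prior_mean, prior_sd)         # draw parameter from the prior
        y_rep = simulate_demand(mean_demand, len(y_obs), rng)  # draw data given the parameter
        if statistics.mean(y_rep) >= t_obs:
            exceed += 1
    return exceed / n_sims


# Invented observations with mean 20.
y_obs = [18.0, 22.0, 19.0, 21.0, 20.0, 23.0, 17.0, 20.0]
print(prior_predictive_pvalue(y_obs, prior_mean=20.0, prior_sd=5.0))
print(prior_predictive_pvalue(y_obs, prior_mean=5.0, prior_sd=1.0))
```

A value near 0 or 1 means the prior predictive distribution rarely reproduces the observed statistic, which is the signal to revise the generator before moving on.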

## 1. Elicit Prior: Relations for the Hierarchy

: Hierarchical structure can be formed by analyzing competitive and cooperative relations between actors.

Given the system architecture, modelers perturb the system by adding one actor or one relation and then observe the result of this unit change. Viewing each unit addition as 'branching in a network of models', modelers draw a large binary tree as they explore. Because the purpose of branching is to measure the contribution of one unit change, the two branches are in a competitive relation, while all components within the same branch are in a cooperative relation. An actor can be identified by a sequence of competitive and cooperative relations, which we call its relation id. For instance, in Fig.2, 1 and 2 can be identified with "red", while "red-green" indicates A1, A2, A3, A4. In this book, we focus on this relation id. It not only justifies hierarchical modeling but also provides a heterogeneity-based interpretation. For instance, the coefficient of a binary variable (i.e. a branching) denotes the difference between two groups, $\beta = E[Y|X=1] - E[Y|X=0]$. Mastering this inherent tree structure lets one explore the network of models with greater efficiency.
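
The identity $\beta = E[Y|X=1] - E[Y|X=0]$ can be checked numerically: with a single binary regressor, the least-squares slope equals the difference in group means. The sketch below uses invented payoff values for the two branches, and the helper names are hypothetical.

```python
def difference_in_means(y0, y1):
    """E[Y|X=1] - E[Y|X=0] estimated from two groups."""
    return sum(y1) / len(y1) - sum(y0) / len(y0)


def ols_slope(x, y):
    """Least-squares slope of y on x (with intercept)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Invented payoffs: X=0 is the baseline model, X=1 is the model after one branching.
y_baseline = [10.0, 12.0, 11.0, 9.0]
y_branched = [14.0, 15.0, 13.0, 16.0]
x = [0] * len(y_baseline) + [1] * len(y_branched)
y = y_baseline + y_branched

beta = ols_slope(x, y)
print(beta, difference_in_means(y_baseline, y_branched))
```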

The purpose of sequential conditioning in the first step is to construct the hierarchical structure of the model with a prior distribution assumed at each layer: $P(G)$ as the prior over choices of system architecture and $P(\pi)$ as the prior over policy candidates. Multiverse analysis (Figure 23 of [Bayesian Workflow](https://arxiv.org/pdf/2011.01808.pdf)) describes these endless layers of distributions, but for brevity we stop at the level of architecture. For instance, different architectures are possible in inventory models, such as allocation (one supply to two demand actors), dual sourcing (two supply to one demand), and multi echelon. Most research hypothesizes that the defined payoff generated from the proposed architecture (or policy) is distinguishably higher than that generated from other candidates.
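
Sequential conditioning can be read as ancestral sampling: draw an architecture from $P(G)$, then a policy from $P(\pi)$, then parameters given the policy. The sketch below is one possible encoding, not the book's implementation; the prior weights, the parameter ranges, and the independence of the policy prior from the architecture are all assumptions made for illustration.

```python
import random

# Hypothetical prior weights; in practice they encode the research hypothesis.
ARCHITECTURE_PRIOR = {"allocation": 0.5, "dual_sourcing": 0.3, "multi_echelon": 0.2}
POLICY_PRIOR = {"fixed_order_quantity": 0.5, "fixed_order_period": 0.5}


def draw_categorical(prior, rng):
    """Draw one key of `prior` with probability proportional to its value."""
    names = list(prior)
    weights = [prior[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def draw_model(rng):
    """One draw from the Architecture-Policy-Parameter hierarchy."""
    architecture = draw_categorical(ARCHITECTURE_PRIOR, rng)   # G ~ P(G)
    policy = draw_categorical(POLICY_PRIOR, rng)               # pi ~ P(pi)
    if policy == "fixed_order_quantity":                       # theta ~ P(theta | pi)
        theta = {"order_quantity": rng.uniform(10.0, 50.0)}
    else:
        theta = {"order_period": rng.randint(1, 14)}
    return architecture, policy, theta


rng = random.Random(0)
for _ in range(3):
    print(draw_model(rng))
```

Each draw is one candidate model in the network; pushing many draws through the generator gives the prior predictive distribution for the whole hierarchy.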

![image.png](attachment:6c7fa5c7-53da-4b95-b5b0-03839e3e9ad6.png) 

Fig.2 (Left) Hierarchies from relations; red as competitive and green as cooperative relations. (Right) Example of system architectures. A prior P(G) can be placed over them depending on our research hypothesis.

## 2. Data Completion with the Hierarchy

> "Data augmentation is designed to allow simulation-based computations to be performed more simply on the larger space of “complete data,” by analogy to the workings of the EM algorithm for maximum likelihood (Dempster, Laird, and Rubin 1977). Examples are censored data, latent mixture indicators, and latent continuous variables for discrete regressions. - [Parameterization and Bayesian Modeling](http://www.stat.columbia.edu/~gelman/research/published/parameterization.pdf)"

Both synthetic (augmented) and observed data make up the dataset. Specifically, data generated in the prior predictive check are synthetic, and if they differ substantially from the observed data, it is a sign to revisit the first step, elicitation. Using only the observed data in later steps is not recommended, as the data can be missing, too small, poorly balanced, etc.
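
As one concrete case of completing data, consider the censored data named in the quote: on a stock-out day, recorded sales are only a lower bound on demand. The sketch below fills in such days by drawing from a Gaussian demand model restricted to values at or above the recorded sales. The demand model, its current parameter values, and the helper name are assumptions for illustration.

```python
import random


def complete_censored_demand(sales, stocked_out, mean_demand, sd_demand, rng,
                             max_tries=1000):
    """Replace censored sales with demand draws that respect the lower bound.

    Uses simple rejection sampling; if no draw is accepted within `max_tries`,
    the recorded sales value is kept as a conservative fallback.
    """
    completed = []
    for sold, censored in zip(sales, stocked_out):
        if not censored:
            completed.append(sold)          # fully observed day: keep as is
            continue
        demand = sold
        for _ in range(max_tries):
            draw = rng.gauss(mean_demand, sd_demand)
            if draw >= sold:                # accept only draws consistent with the bound
                demand = draw
                break
        completed.append(demand)
    return completed


# Invented records: days 2 and 3 ended in a stock-out.
sales = [8.0, 12.0, 15.0, 9.0]
stocked_out = [False, True, True, False]
rng = random.Random(0)
print(complete_censored_demand(sales, stocked_out, mean_demand=12.0, sd_demand=4.0, rng=rng))
```

In a full data augmentation scheme this completion step would alternate with re-estimating `mean_demand` and `sd_demand` from the completed data.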

## 3. Uncertainty Propagation along the Hierarchy

: The Architecture-Policy-Parameter hierarchy built in step 1 by sequential conditioning allows uncertainty propagation, with which the dynamic hypothesis is tested in step 4.

This step is the reverse of step 1. Conditional on the dataset from step 2, samples of the parameters, policy, and system architecture are generated as estimates with uncertainty intervals. Here, an estimate is defined as the distribution of parameters maximizing the given payoff function. System dynamics' modeling power shines in this step: it serves as the generator with which modelers can determine the candidate with the greatest payoff. Candidates can be policies or model architectures, which we cover in this book as case studies. Discriminating between the payoffs of different policies or graph networks may be another interpretation of the modeler's role.
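
To show how a generator can rank candidates by payoff, the sketch below simulates a single stock under two candidate policies and collects the payoff over many demand scenarios. Everything specific here is an assumption: orders arrive instantly, payoff is the negative of holding plus shortage cost, mean demand is uniform on a made-up range, and the policy parameters are arbitrary.

```python
import random
import statistics


def simulate_payoff(policy, demand, holding_cost=1.0, shortage_cost=5.0,
                    start_stock=40.0):
    """Payoff = -(holding cost + shortage cost) for one demand path."""
    stock = start_stock
    payoff = 0.0
    for day, d in enumerate(demand):
        stock += policy(day, stock)          # assumption: orders arrive immediately
        sold = min(stock, d)
        payoff -= shortage_cost * (d - sold)
        stock -= sold
        payoff -= holding_cost * stock
    return payoff


def fixed_order_quantity(reorder_point=20.0, quantity=40.0):
    """Order a fixed quantity whenever stock is at or below the reorder point."""
    return lambda day, stock: quantity if stock <= reorder_point else 0.0


def fixed_order_period(period=7, order_up_to=60.0):
    """Every `period` days, order up to a target level."""
    return lambda day, stock: max(order_up_to - stock, 0.0) if day % period == 0 else 0.0


def compare_policies(policies, n_reps=200, n_days=60, seed=0):
    """Payoff samples per policy, using the same demand paths for every policy."""
    rng = random.Random(seed)
    payoffs = {name: [] for name in policies}
    for _ in range(n_reps):
        mean_demand = rng.uniform(5.0, 15.0)                     # parameter uncertainty
        demand = [max(0.0, rng.gauss(mean_demand, 3.0)) for _ in range(n_days)]
        for name, policy in policies.items():
            payoffs[name].append(simulate_payoff(policy, demand))
    return payoffs


candidates = {"fixed_quantity": fixed_order_quantity(), "fixed_period": fixed_order_period()}
results = compare_policies(candidates)
for name, values in results.items():
    print(name, round(statistics.mean(values), 1), round(statistics.stdev(values), 1))
```

Because every policy sees the same demand paths, differences in payoff reflect the policies rather than simulation noise; the output is a payoff distribution per candidate, not a single number.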

## 4. Dynamic Hypothesis Test on the Network of Models

> "Iterative simulation has the role of transforming an intractable spatial process into a tractable space–time process - [Parameterization and Bayesian Modeling](http://www.stat.columbia.edu/~gelman/research/published/parameterization.pdf)"

Repeated perturbation analysis with dynamic hypothesis testing makes the modeler's understanding of the system more concrete.

![image.png](attachment:6488714f-4f4a-42b1-87bb-bd34bd62984b.png) 

 Fig.4 Using simulation-based calibration to test dynamic hypothesis


Hypotheses about the system, policy, and parameters need to be updated depending on the results we get, and this iterative process can be accelerated with diagnostics. We recommend stress-testing the system by starting from different regions of possible parameter values and then observing the distributional outcome. Two steps are necessary for this: drawing the true parameter values from the prior, and using MCMC, which returns estimates as samples. Both increase computational cost, but in most cases they are worth the effort. In some sense, this allows modelers to be confident in their contribution. Given no choice but to streamline their model for the distributional test, modelers are forced to separate the already tested part of the model (proven hypotheses) from the novel part. Stress-testing the proposal makes the downstream experimental design and workflow much more straightforward.

However, we acknowledge the substantial computational cost: at least a hundred prior samples are recommended in vanilla simulation-based calibration (SBC), as introduced in [this](https://hyunjimoon.github.io/SBC/articles/SBC.html#simple-poisson-regression) vignette on Poisson regression. Hence, for modelers who cannot afford the cost, we recommend comparing a point-mass prior with posterior samples. It is important to avoid comparing a point-mass prior with a point-mass posterior (for instance, when using optimization-based inference), which is a projection onto one dimension that discards a substantial amount of information. Distributional output is needed for further model development: the right part of Fig.4 shows how the empirical cumulative distribution function difference (ecdf_diff) diagnoses the distributional difference between the prior and the posterior. For further prescriptions, see [Recommendations-for-problematic-diagnostics](https://github.com/hyunjimoon/SBC/wiki/Recommendations-for-problematic-diagnostics).
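
The logic of SBC can be sketched without any package. The example below does not use the SBC package's API; it is a hand-rolled illustration on an assumed conjugate model ($\theta \sim N(0,1)$, $y_i \sim N(\theta,1)$) where exact posterior draws stand in for MCMC. The function names and the `post_sd_scale` knob, which deliberately breaks the posterior to mimic a faulty sampler, are hypothetical.

```python
import math
import random


def sbc_ranks(n_sims=500, n_obs=10, n_post=99, post_sd_scale=1.0, seed=0):
    """Rank of each prior draw among its own posterior draws."""
    rng = random.Random(seed)
    ranks = []
    for _ in range(n_sims):
        theta = rng.gauss(0.0, 1.0)                         # true value drawn from the prior
        y = [rng.gauss(theta, 1.0) for _ in range(n_obs)]   # data simulated from theta
        post_mean = sum(y) / (n_obs + 1)                    # exact conjugate posterior mean
        post_sd = post_sd_scale * math.sqrt(1.0 / (n_obs + 1))
        draws = [rng.gauss(post_mean, post_sd) for _ in range(n_post)]
        ranks.append(sum(1 for d in draws if d < theta))
    return ranks


def max_ecdf_diff(ranks, n_post):
    """Largest gap between the rank ECDF and the uniform CDF on 0..n_post."""
    n = len(ranks)
    counts = [0] * (n_post + 1)
    for r in ranks:
        counts[r] += 1
    gap, cumulative = 0.0, 0
    for k in range(n_post + 1):
        cumulative += counts[k]
        gap = max(gap, abs(cumulative / n - (k + 1) / (n_post + 1)))
    return gap


calibrated = max_ecdf_diff(sbc_ranks(), n_post=99)
overconfident = max_ecdf_diff(sbc_ranks(post_sd_scale=0.3), n_post=99)
print(round(calibrated, 3), round(overconfident, 3))
```

When the posterior is correct, the ranks are uniform and the gap stays small; a posterior that is too narrow piles ranks at both ends and the gap grows. The cost scales with `n_sims` times the cost of one posterior fit, which is the computational burden discussed above.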

The following are not covered in the book. First is extending SBC to test and calibrate policy and architecture. Because the prior distribution (constructed with uncertainty propagation in step 3) can be compared with its data-averaged version, SBC use cases can be extended. [Workflow Techniques for the Robust Use of Bayes Factors](https://arxiv.org/pdf/2103.08744.pdf) and [Bayes factors and posterior estimation: Two sides of the very same coin](https://arxiv.org/pdf/2204.06054.pdf) are two works that guide how the constructed uncertainty can be understood in the context of testing. The first paper proposes setting a data-driven threshold for the Bayes factor, and the second warns modelers to be consistent about the assumption that M0 is correct when reporting Bayes factors and posterior estimates.

Second is making SBC less computationally heavy while maintaining its identity as a distributional comparison. Ongoing attempts adjust for the miscalibration that arises when we are no longer averaging over the prior, for example by calculating a data-based gradient or by importance sampling. Use cases are when the number of replication draws N is not large, or when we want to focus on the posterior or some other region of interest in parameter space.

# 5. Table of Contents of our Book

Branching is the main theme of our book. Starting from the simplest "supply-stock-demand" model, we inspect the global network of the "Military Inventory model with two competing actors". Modules (Mod1~7) and examples (e.g.) are introduced along the way.

## Elicit Prior for Hierarchy
### 1. Basic Architecture and Policy
- Mod1) **Inventory** Model with Supply and Demand
- Mod2) **Fixed Order Quantity** Inventory Model
- Mod3) **Fixed Order Period** Inventory Model

### 2. Advanced Architecture of Supply and Demand
- e.g. Inventory Allocation Model (1:2)
- e.g. Dual Sourcing Inventory Model (2:1)
- e.g. Multi Echelon Inventory Model (1:1:1)

### 3. Uncertainty-targeted Policy: Competing with Nature
- Mod4) **Forecast** against Demand Uncertainty
- Mod5) **Adjustment** against Delay Uncertainty
- e.g. Demand Disruption Model (Wartime)
- e.g. Supply Disruption Model

### 4. Actor-targeted Policy: Compete and Cooperate with other Actors
- Mod6) Combat Model: A1 vs A2
- e.g. Lead Time Pooling Model: (A1 + A2) vs !(A1 + A2)
- e.g. Repair Model with Inventory: (A1 + A2 + A3) vs !(A1 + A2 + A3)

## Data Completion with the Hierarchy
### 5. Data Pooling
- e.g. Total Life Cycle of Ship Engine
- e.g. Covid and/or Bookstore

## Estimate Posterior with Hierarchy
### 6. Parameter Posterior
- Inventory Optimization Model
- Newsvendor Model
- Lead Time and Demand uncertainty
- Decision variable with Forecasting Model

### 7. Policy Posterior

### 8. Architecture Posterior

## Explore Hierarchy of Models
### 9. Distributional Comparison between Prior and DAP
- Mod7) Simulation-based calibration
- Mod7) Bayes factor
- e.g. Decision in Ship Engine Failure Model
