# ARCHITECTURES

## STRUCTURAL EQUATION MODEL

### Specification

SEM can estimate coefficients for both observed and latent variables.

If the model includes latent factors, using OLS directly would be incorrect or incomplete, because OLS only accounts for relationships among observed variables.

SEM provides more accurate estimates of the structural coefficients ($\beta$) by taking latent variables and measurement errors into account.

In this direction, suppose we introduce a latent variable $\eta$ representing a construct measured by observed indicators $X_1$ and $X_2$. Then the structural model becomes:

$$
Y = \beta \eta + \zeta_Y
$$

and the measurement model for the latent variable $\eta$ is:

$$
X_1 = \lambda_1 \eta + \varepsilon_1
$$

$$
X_2 = \lambda_2 \eta + \varepsilon_2
$$

where  

- $\beta$: regression coefficient of $Y$ on the latent variable $\eta$  
- $\lambda_1, \lambda_2$: factor loadings of observed variables $X_1$ and $X_2$ on $\eta$  
- $\zeta_Y$: structural error term for $Y$  
- $\varepsilon_1, \varepsilon_2$: measurement errors of $X_1$ and $X_2$  

The **path diagram** can be illustrated as:

- Latent variable $\eta$ → $Y$ (regression path)  
- $\eta$ → $X_1$, $\eta$ → $X_2$ (factor loadings)  
- Error terms: measurement errors $\varepsilon_1$, $\varepsilon_2$ for the indicators and structural error $\zeta_Y$ for $Y$  

This structure explicitly separates the **measurement model** (how observed variables reflect latent constructs) from the **structural model** (how latent variables influence outcomes), which is the core strength of SEM.
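
To make the specification concrete, the following sketch simulates data from the three equations above. The parameter values, error standard deviations, and sample size are illustrative assumptions, not values taken from any dataset. It shows that a simple regression slope of $Y$ on the single indicator $X_1$ does not recover $\beta$, because $X_1$ is a rescaled and noisy proxy for $\eta$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Assumed illustrative parameter values (not estimated from real data)
beta, lam1, lam2 = 0.6, 0.8, 0.3

eta = rng.normal(0.0, 1.0, n)                 # latent factor with Var(eta) = 1
x1 = lam1 * eta + rng.normal(0.0, 0.6, n)     # measurement equation for X1
x2 = lam2 * eta + rng.normal(0.0, 0.9, n)     # measurement equation for X2
y = beta * eta + rng.normal(0.0, 0.8, n)      # structural equation for Y

# Simple regression slope of Y on X1: Cov(X1, Y) / Var(X1)
slope_y_on_x1 = np.cov(x1, y)[0, 1] / np.var(x1, ddof=1)

print(f"beta used in simulation: {beta}")
print(f"slope of Y on X1:        {slope_y_on_x1:.3f}")
```

Under these assumptions the population slope is $\lambda_1 \beta / (\lambda_1^2 + Var(\varepsilon_1)) = 0.48$, which differs from $\beta = 0.6$.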

### Identification

Suppose the latent variable $\eta$ is measured by $X_1$ and $X_2$, and affects $Y$.  

The covariance matrix of the observed variables can be written as:

$$
\Sigma =
\begin{bmatrix}
Var(X_1) & Cov(X_1, X_2) & Cov(X_1, Y) \\
Cov(X_1, X_2) & Var(X_2) & Cov(X_2, Y) \\
Cov(X_1, Y) & Cov(X_2, Y) & Var(Y)
\end{bmatrix}
$$

Unknown parameters now include:

- Factor loadings: $\lambda_1, \lambda_2$  
- Structural coefficient: $\beta$  
- Error variances: $Var(\varepsilon_1), Var(\varepsilon_2), Var(\zeta_Y)$  

So, with the latent variance fixed for scaling ($Var(\eta)=1$), the total number of unknowns is 6.  

The number of independent elements in the covariance matrix is:

$$
n_{independent} = \frac{n \cdot (n+1)}{2}
$$

where  

- $n$ = number of observed variables  
- $n_{independent}$ = number of independent pieces of information available for identification  

Number of independent pieces of information in covariance matrix of 3 observed variables:

$$
\frac{3 \cdot (3+1)}{2} = 6
$$

- To scale the latent variable, we typically fix $Var(\eta)=1$ or fix one loading (e.g. $\lambda_1=1$); either way, 6 free parameters remain.
- Since the number of unknowns equals the number of independent pieces of information, the model is **just-identified** (it can be solved exactly, but no test of fit is possible).  
- Adding more observed indicators for the latent variable or more latent constructs can make the model **over-identified**, which allows testing the model fit.  
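
The counting rule above can be sketched as a small helper. The function names are hypothetical, and the rule is only a necessary condition for identification, not a sufficient one.

```python
def covariance_elements(n_observed: int) -> int:
    """Number of distinct variances and covariances among n observed variables."""
    return n_observed * (n_observed + 1) // 2


def degrees_of_freedom(n_observed: int, n_free_parameters: int) -> int:
    """Independent covariance elements minus free parameters."""
    return covariance_elements(n_observed) - n_free_parameters


def identification_status(df: int) -> str:
    if df == 0:
        return "just-identified"
    if df > 0:
        return "over-identified"
    return "under-identified"


# The model in this section: 3 observed variables, 6 free parameters
df = degrees_of_freedom(3, 6)
status = identification_status(df)
print(df, status)
```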

### Estimation

With the latent variable $\eta$, estimation is based on the **model-implied covariance matrix** rather than simple correlations.

The model-implied covariances are:

$$
Cov(X_1, Y) = \lambda_1 \cdot \beta \cdot Var(\eta)
$$

$$
Cov(X_2, Y) = \lambda_2 \cdot \beta \cdot Var(\eta)
$$

$$
Cov(X_1, X_2) = \lambda_1 \cdot \lambda_2 \cdot Var(\eta)
$$

$$
Var(X_1) = \lambda_1^2 \cdot Var(\eta) + Var(\varepsilon_1)
$$

$$
Var(X_2) = \lambda_2^2 \cdot Var(\eta) + Var(\varepsilon_2)
$$

$$
Var(Y) = \beta^2 \cdot Var(\eta) + Var(\zeta_Y)
$$

The parameters ($\lambda_1, \lambda_2, \beta, Var(\varepsilon_1), Var(\varepsilon_2), Var(\zeta_Y)$) are estimated by **Maximum Likelihood (ML)**, minimizing the difference between the observed covariance matrix $\Sigma$ and the model-implied covariance matrix $\hat{\Sigma}(\theta)$:

$$
F_{ML}(\theta) = \log |\hat{\Sigma}(\theta)| + \text{tr}(\Sigma \hat{\Sigma}^{-1}(\theta)) - \log |\Sigma| - p
$$

where  

- $p$ = number of observed variables  
- $\theta$ = set of parameters  
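
As a minimal sketch of the fit function, the code below builds the model-implied covariance matrix from the six equations above and evaluates $F_{ML}$. The helper names `implied_cov` and `f_ml` and all numeric values are assumptions made for illustration; a real analysis would minimize $F_{ML}$ over $\theta$ with an optimizer or an SEM package.

```python
import numpy as np


def implied_cov(lam1, lam2, beta, var_e1, var_e2, var_zeta, var_eta=1.0):
    """Model-implied covariance matrix of (X1, X2, Y)."""
    coef = np.array([lam1, lam2, beta])  # coefficient of each variable on eta
    return var_eta * np.outer(coef, coef) + np.diag([var_e1, var_e2, var_zeta])


def f_ml(sigma_obs, sigma_model):
    """ML discrepancy between observed and model-implied covariance matrices."""
    p = sigma_obs.shape[0]
    _, logdet_model = np.linalg.slogdet(sigma_model)
    _, logdet_obs = np.linalg.slogdet(sigma_obs)
    trace_term = np.trace(sigma_obs @ np.linalg.inv(sigma_model))
    return logdet_model + trace_term - logdet_obs - p


# Treat the covariance implied by some "true" parameters as the observed matrix
sigma_obs = implied_cov(0.8, 0.3, 0.6, 0.36, 0.91, 0.64)

fit_at_truth = f_ml(sigma_obs, sigma_obs)                                   # zero up to rounding
fit_misspecified = f_ml(sigma_obs, implied_cov(0.8, 0.3, 0.2, 0.36, 0.91, 0.64))
print(fit_at_truth, fit_misspecified)
```

The function is zero when the two matrices coincide and positive otherwise, which is why minimizing it pulls $\hat{\Sigma}(\theta)$ toward $\Sigma$.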

### Modification

Suppose estimation gives:

$$
\hat{\beta} = 0.6, \quad \hat{\lambda}_2 = 0.3
$$

But in the data we observe $Cov(X_2, Y)$ is much larger than predicted.  
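
As a worked check, assuming the latent variance is fixed at $Var(\eta)=1$, the covariance predicted by these estimates is:

$$
Cov(X_2, Y) = \hat{\lambda}_2 \cdot \hat{\beta} \cdot Var(\eta) = 0.3 \cdot 0.6 \cdot 1 = 0.18
$$

so a sample covariance well above 0.18 would indicate that the model is missing part of the relationship between $X_2$ and $Y$.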

**Modification Index (MI)** may suggest freeing a parameter that is currently fixed (usually at zero), e.g., adding a direct path from $X_2$ to $Y$. (A loading such as $\lambda_2$ would only be a candidate if it had been fixed beforehand.)

Formally, if the ML fit function is  

$$
F_{ML}(\theta) = \log |\hat{\Sigma}(\theta)| + \text{tr}(\Sigma \hat{\Sigma}^{-1}(\theta)) - \log |\Sigma| - p,
$$  

then for a fixed parameter $\theta_j$, the modification index is approximated as:

$$
MI_j = \frac{\left( \frac{\partial F}{\partial \theta_j} \big|_{\theta=\hat{\theta}} \right)^2}{\frac{\partial^2 F}{\partial \theta_j^2} \big|_{\theta=\hat{\theta}}}
$$

where  

- $\frac{\partial F}{\partial \theta_j}$ = first derivative (score function)  
- $\frac{\partial^2 F}{\partial \theta_j^2}$ = second derivative (information matrix)  
- $\hat{\theta}$ = estimated parameters under the current model  

$MI_j$ approximates the expected drop in the chi-square statistic ($\Delta \chi^2$) if the parameter $\theta_j$ is freed. A large $MI_j$ suggests adding this path/parameter could substantially improve model fit.

**Example of modified equations with latent variable $\eta$:**

$$
Y = \beta \eta + \gamma X_2 + \zeta_Y
$$

$$
X_1 = \lambda_1 \eta + \varepsilon_1
$$

$$
X_2 = \lambda_2 \eta + \varepsilon_2
$$

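Here, the new path $\gamma X_2$ allows $X_2$ to have a direct effect on $Y$, in addition to its association with $Y$ through the latent variable $\eta$. Note that with only three observed variables this adds a seventh free parameter against six covariance elements, so in practice the path can only be estimated once further indicators are available.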

### Interpretation

Final interpretation of parameters (taking $\hat{\lambda}_1 = 0.8$ as an illustrative value alongside the estimates above):  

- $\beta$ = 0.6 means that the latent factor $\eta$ has a moderate-to-strong direct effect on $Y$.  
- $\lambda_1$ = 0.8 means that $X_1$ is a strong indicator of the latent variable $\eta$.  
- $\lambda_2$ = 0.3 means that $X_2$ is a relatively weak indicator of the latent variable $\eta$.  
- If $X_2$ also has a direct effect on $Y$ ($\gamma \neq 0$), then $X_2$ is related to $Y$ in two ways: directly (via $\gamma$) and through their shared dependence on $\eta$ (via $\lambda_2 \cdot \beta$). Because the arrow runs from $\eta$ to $X_2$, the second component is a common-factor association rather than an indirect causal effect of $X_2$.  

## STRUCTURAL CAUSAL MODEL

### Specification

SEM coefficients are not causal effects by themselves; causal effects can be computed with a Structural Causal Model (SCM). In the SCM presented here there are no latent variables: causal inference is based solely on observed variables and their relationships, so the model coefficients are estimated by linear or non-linear regression.

In SCM, relationships between variables are expressed as structural equations:

$$
Y = f_Y(X_1, X_2, U_Y)
$$  
$$
X_1 = f_{X_1}(U_{X_1}), \quad X_2 = f_{X_2}(U_{X_2})
$$

where $U$ denotes the **exogenous (external) variables**, i.e., factors determined outside the model.

A linear SCM is:

$$
Y = \beta_1 X_1 + \beta_2 X_2 + U_Y
$$

- Here, $\beta_1$ and $\beta_2$ are computed via OLS.

Graphically, this is represented as a **Directed Acyclic Graph (DAG)**, analogous to the SEM **path diagram**. Acyclicity matters in both: if there is a cycle, it is unclear which variable should be computed first, and the model cannot be solved by simple forward substitution.

$$
X_1 \rightarrow Y \leftarrow X_2
$$
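
A minimal sketch of this linear SCM, assuming standard normal exogenous variables and illustrative coefficient values, simulates the three structural equations and recovers $\beta_1$ and $\beta_2$ by OLS:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
beta1, beta2 = 2.0, -1.0      # assumed illustrative coefficients

# Structural equations; every U is an independent exogenous draw
x1 = rng.normal(size=n)       # X1 = f_X1(U_X1)
x2 = rng.normal(size=n)       # X2 = f_X2(U_X2)
u_y = rng.normal(size=n)
y = beta1 * x1 + beta2 * x2 + u_y

# OLS without intercept (all variables have mean zero by construction)
design = np.column_stack([x1, x2])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print(coef)
```

OLS recovers the structural coefficients here only because $U_Y$ is independent of $X_1$ and $X_2$ by construction.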

### Identification

In a Structural Causal Model (SCM), identification refers to whether a causal effect can be uniquely estimated from observational data.

The causal effect of a variable $X$ on $Y$ is expressed using the **do-operator**:

$$
P(Y \mid do(X=x))
$$

The do-operator represents an **intervention**: we set $X$ to $x$ externally, which removes all incoming edges into $X$ in the DAG. This is what distinguishes causal effects from mere correlations.

The **backdoor criterion** is a graphical test for whether $P(Y \mid do(X))$ can, in principle, be computed from observational data:

A set of variables $Z$ satisfies the backdoor criterion relative to $(X, Y)$ if:

1. No variable in $Z$ is a descendant of $X$.
2. $Z$ blocks all backdoor paths from $X$ to $Y$.  
   - A backdoor path is any path from $X$ to $Y$ that starts with an incoming arrow into $X$.

If such a set $Z$ exists, the causal effect is identifiable:

$$
P(Y \mid do(X)) = \sum_z P(Y \mid X, Z=z) P(Z=z)
$$

This formula is called the **backdoor adjustment formula**.

Consider a DAG:

$$
Z \rightarrow X_1 \rightarrow Y \quad \text{and} \quad Z \rightarrow Y
$$

Here, $Z$ is a confounder creating a backdoor path $X_1 \leftarrow Z \rightarrow Y$.  
To identify the causal effect of $X_1$ on $Y$, we **condition on $Z$**:

$$
P(Y \mid do(X_1)) = \sum_z P(Y \mid X_1, Z=z) P(Z=z)
$$

This removes the bias introduced by the backdoor path, giving the true causal effect.
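
The adjustment can be sketched on a toy discrete example. All probabilities below are made-up illustrative values for binary $Z$, $X$, $Y$ in the DAG $Z \rightarrow X \rightarrow Y$, $Z \rightarrow Y$; the function names are hypothetical.

```python
# Assumed toy distribution (binary variables)
p_z = {0: 0.6, 1: 0.4}                              # P(Z=z)
p_x1_given_z = {0: 0.2, 1: 0.8}                     # P(X=1 | Z=z)
p_y1_given_xz = {(0, 0): 0.1, (0, 1): 0.5,
                 (1, 0): 0.3, (1, 1): 0.7}          # P(Y=1 | X=x, Z=z)


def p_x_given_z(x, z):
    return p_x1_given_z[z] if x == 1 else 1 - p_x1_given_z[z]


def backdoor(x):
    """P(Y=1 | do(X=x)) = sum_z P(Y=1 | x, z) P(z)."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in p_z)


def observational(x):
    """P(Y=1 | X=x) = sum_z P(Y=1 | x, z) P(z | x), without adjustment."""
    p_x = sum(p_x_given_z(x, z) * p_z[z] for z in p_z)
    joint = sum(p_y1_given_xz[(x, z)] * p_x_given_z(x, z) * p_z[z] for z in p_z)
    return joint / p_x


causal_effect = backdoor(1) - backdoor(0)
naive_difference = observational(1) - observational(0)
print(f"adjusted: {causal_effect:.3f}, unadjusted: {naive_difference:.3f}")
```

In this example the unadjusted difference overstates the effect because $Z$ raises both the chance of $X=1$ and the chance of $Y=1$.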

If some confounders between $X$ and $Y$ are **unobservable** (and so cannot be included in a backdoor set), the backdoor criterion cannot be applied.  

In such cases, the effect may still be identified through a **mediator $M$** with $X \to M \to Y$, provided the following conditions hold:

1. $M$ intercepts all directed (causal) paths from $X$ to $Y$.
2. There is no unblocked backdoor path from $X$ to $M$.
3. All backdoor paths from $M$ to $Y$ are blocked by $X$.

Then, the **frontdoor criterion** can be used to identify the causal effect via $M$:

$$
P(Y \mid do(X)) = \sum_m P(M=m \mid X) \sum_{x'} P(Y \mid M=m, X=x') P(X=x')
$$

This allows identification even in the presence of **unobserved confounding**.
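
The following sketch checks the frontdoor formula on a toy model with a hidden binary confounder $U$ ($U \rightarrow X$, $U \rightarrow Y$, $X \rightarrow M \rightarrow Y$). All probabilities are assumed illustrative values. The formula only uses the observable joint distribution of $(X, M, Y)$, and its result is compared with the true interventional probability computed from the full model.

```python
from itertools import product

# Assumed toy model; U is hidden from the analyst
p_u = {0: 0.5, 1: 0.5}                               # P(U=u)
p_x1_u = {0: 0.2, 1: 0.8}                            # P(X=1 | U=u)
p_m1_x = {0: 0.1, 1: 0.9}                            # P(M=1 | X=x)
p_y1_mu = {(0, 0): 0.1, (0, 1): 0.4,
           (1, 0): 0.5, (1, 1): 0.9}                 # P(Y=1 | M=m, U=u)


def bern(p_one, value):
    return p_one if value == 1 else 1 - p_one


# Observable joint P(X, M, Y), obtained by summing out U
joint = {
    (x, m, y): sum(p_u[u] * bern(p_x1_u[u], x) * bern(p_m1_x[x], m)
                   * bern(p_y1_mu[(m, u)], y) for u in p_u)
    for x, m, y in product((0, 1), repeat=3)
}


def p_x(x):
    return sum(joint[(x, m, y)] for m in (0, 1) for y in (0, 1))


def p_m_given_x(m, x):
    return sum(joint[(x, m, y)] for y in (0, 1)) / p_x(x)


def p_y1_given_mx(m, x):
    return joint[(x, m, 1)] / (joint[(x, m, 0)] + joint[(x, m, 1)])


def frontdoor(x):
    """P(Y=1 | do(X=x)) from observables only."""
    return sum(p_m_given_x(m, x)
               * sum(p_y1_given_mx(m, xp) * p_x(xp) for xp in (0, 1))
               for m in (0, 1))


def truth(x):
    """P(Y=1 | do(X=x)) from the full model, using the hidden U."""
    return sum(p_u[u] * bern(p_m1_x[x], m) * p_y1_mu[(m, u)]
               for u in p_u for m in (0, 1))


for x in (0, 1):
    print(x, round(frontdoor(x), 4), round(truth(x), 4))
```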

### Estimation

If a set of variables $Z$ satisfies the backdoor criterion relative to $(X, Y)$, the causal effect of $X$ on $Y$ can be computed as:

$$
P(Y \mid do(X=x)) = \sum_z P(Y \mid X=x, Z=z) \, P(Z=z)
$$

- Here, $P(Y \mid X, Z)$ and $P(Z)$ are estimated from **observational data**, whereas identification only established that the formula is valid in principle.  
- The summation over $z$ integrates out the confounding effects.  
- This gives the **full causal distribution** of $Y$ under the intervention $do(X=x)$.

In a linear SCM:

$$
Y = \beta_1 X_1 + \beta_2 X_2 + U_Y
$$

where $U_Y$ is an exogenous error term independent of $X_1$ and $X_2$.

- The expected value of $Y$ under an intervention on $X_1$ depends on whether $X_1$ causally affects $X_2$:

If $X_1$ does not affect $X_2$ (for example, when they are independent):

$$
E[Y \mid do(X_1=x_1)] = \beta_1 x_1 + \beta_2 E[X_2]
$$

- In this case, $\beta_1$ represents the **causal effect** of $X_1$ on $Y$.

If $X_1$ affects $X_2$ (so $X_2$ lies on a pathway from $X_1$ to $Y$):

$$
E[Y \mid do(X_1=x_1)] = \beta_1 x_1 + \beta_2 E[X_2 \mid do(X_1=x_1)]
$$

- Here, $\beta_1$ alone does **not** represent the total causal effect; it is only the direct effect.  
- The term $E[X_2 \mid do(X_1=x_1)]$ accounts for the influence of $X_1$ on $X_2$, i.e., the **indirect (mediated) pathway** $X_1 \to X_2 \to Y$.
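
A simulation sketch of the second case, under the assumption that $X_1$ affects $X_2$ linearly with a hypothetical coefficient `a` (so $X_2 = a X_1 + U_{X_2}$): the intervention replaces the equation for $X_1$ with a constant, and every downstream variable is recomputed.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
beta1, beta2 = 1.0, 0.5   # assumed coefficients in the equation for Y
a = 2.0                   # assumed (hypothetical) effect of X1 on X2


def mean_y_under_do(x1_value):
    """Monte Carlo estimate of E[Y | do(X1 = x1_value)]."""
    x1 = np.full(n, x1_value)                 # do(X1 = x1): X1 is set externally
    x2 = a * x1 + rng.normal(size=n)          # X2 responds to the intervened X1
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    return y.mean()


total_effect = mean_y_under_do(1.0) - mean_y_under_do(0.0)
print(f"simulated total effect: {total_effect:.3f}")
print(f"beta1 + beta2 * a:      {beta1 + beta2 * a:.3f}")
```

Under these assumptions the total effect is $\beta_1 + \beta_2 a$, which exceeds the direct effect $\beta_1$.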

### Evaluation

In SCM, model evaluation is done by comparing the causal inferences with either observational or experimental (RCT) data.

Suppose our SCM predicts the causal effect of $X_1$ on $Y$:

$$
E[Y \mid do(X_1=1)] - E[Y \mid do(X_1=0)] = 0.5
$$

- This means that intervening and setting $X_1$ from 0 to 1 increases the expected outcome by 0.5.  

With **observational data**, we check whether the predicted causal effect is consistent with adjusted estimates; with **experimental (RCT) data**, we intervene on $X_1$ directly and measure $Y$.  

Let $\hat{E}[Y \mid do(X_1=1)]$ be the outcome predicted by the model, and $E_{\text{exp}}[Y \mid X_1=1]$ the mean outcome observed in the experiment:

$$
\text{Fit Error} = \left| \hat{E}[Y \mid do(X_1=1)] - E_{\text{exp}}[Y \mid X_1=1] \right|
$$

- Smaller fit error means better model.  

To evaluate the SCM more comprehensively, consider the causal effect across the **entire range of $X_1$**. Metrics such as **Mean Squared Error (MSE)**, **Bias**, and **Variance** can be computed.

**Mean Squared Error (MSE):**

$$
\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(\hat{E}[Y \mid do(X_1=x_i)] - E_{\text{exp}}[Y \mid X_1=x_i] \right)^2
$$

- Smaller MSE indicates better overall fit across the intervention range.

**Bias:**

$$
\text{Bias} = \frac{1}{n} \sum_{i=1}^{n} \left(\hat{E}[Y \mid do(X_1=x_i)] - E[Y \mid do(X_1=x_i)] \right)
$$

- Positive bias means overestimation, negative bias means underestimation, and zero bias indicates that the causal effect is estimated without systematic error on average.

**Variance:**

$$
\text{Variance} = \frac{1}{n} \sum_{i=1}^{n} \left(\hat{E}[Y \mid do(X_1=x_i)] - \overline{\hat{E}[Y \mid do(X_1)]} \right)^2
$$

- This measures how much the predicted outcome varies across the intervention values $x_i$; a high value indicates predictions are sensitive to changes in $X_1$, even if unbiased.

- These metrics provide a more complete assessment of model performance across all possible interventions on $X_1$, rather than just a single value.
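
These metrics can be computed as in the sketch below. The numbers are hypothetical, and the experimental means are used as a stand-in for the true $E[Y \mid do(X_1=x_i)]$ in the bias formula, which is an assumption since the true values are not observable.

```python
import numpy as np

# Hypothetical values on a grid of interventions x_i = 0, 1, 2, 3
predicted = np.array([1.0, 1.5, 2.0, 2.5])      # model: E_hat[Y | do(X1 = x_i)]
experimental = np.array([1.1, 1.4, 2.2, 2.4])   # RCT:   E_exp[Y | X1 = x_i]

fit_error = np.abs(predicted - experimental)            # per-intervention error
mse = np.mean((predicted - experimental) ** 2)
bias = np.mean(predicted - experimental)                # RCT used as proxy for truth
variance = np.mean((predicted - predicted.mean()) ** 2)

print(f"MSE={mse:.4f}, Bias={bias:.4f}, Variance={variance:.4f}")
```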

### Modification

If confounders are missing or directions are wrong, the DAG must be revised via theory or causal discovery algorithms:

- **PC Algorithm**: Uses conditional independence tests to infer DAG edges and directions.  
- **GES (Greedy Equivalence Search)**: Score-based search over DAG structures to find the best-fitting model.  
- **LiNGAM (Linear Non-Gaussian Acyclic Model)**: Assumes linear relations with non-Gaussian errors to identify causal directions.

Previous:

$$
P(X_1, X_2, Y) = P(X_1) P(X_2) P(Y \mid X_1, X_2)
$$

means

$$
X_1 \;\; \longrightarrow \;\; Y \;\; \longleftarrow \;\; X_2
$$

Updated:

$$
P(X_1, X_2, Y) = P(X_2) P(X_1 \mid X_2) P(Y \mid X_1, X_2)
$$

means

$$
X_2 \;\; \longrightarrow \;\; X_1 \;\; \longrightarrow \;\; Y, \;\; X_2 \;\; \longrightarrow \;\; Y
$$
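
One way such a revision might be motivated is sketched below. The previous DAG implies that $X_1$ and $X_2$ are independent, so a clearly non-zero correlation in the data contradicts it. The data are simulated from the updated DAG with assumed coefficients, and the Fisher z-test shown is only the kind of independence check that constraint-based methods rely on, not an implementation of the PC algorithm.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000

# Data simulated from the updated DAG (assumed coefficients)
x2 = rng.normal(size=n)
x1 = 0.7 * x2 + rng.normal(size=n)               # X2 -> X1
y = 1.0 * x1 + 0.5 * x2 + rng.normal(size=n)     # X1 -> Y, X2 -> Y

# Test the independence X1 _||_ X2 implied by the previous DAG
r = np.corrcoef(x1, x2)[0, 1]
z_stat = np.sqrt(n - 3) * np.arctanh(r)          # Fisher z-transform statistic
reject_independence = abs(z_stat) > 1.96         # two-sided test at about 5%

print(f"r = {r:.3f}, z = {z_stat:.1f}, reject independence: {reject_independence}")
```

Rejecting independence says only that some connection between $X_1$ and $X_2$ is missing; the direction $X_2 \rightarrow X_1$ still has to come from theory or from additional assumptions.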

### Interpretation

SCM interprets **causal effects** rather than regression coefficients:

$$
E[Y \mid do(X_1=1)] - E[Y \mid do(X_1=0)] = 0.5
$$

This means that increasing $X_1$ from 0 to 1 by intervention **causally increases** $Y$ by 0.5 on average.

### Counterfactualization

SCM allows us to reason about **counterfactuals**, which answer questions like:

*"What would $Y$ have been if $X_1$ had taken a different value, given what we actually observed?"*

A counterfactual is denoted as:

$$
Y_{X_1=x'} \;\;\text{given that actually}\;\; X_1=x, X_2=x_2, Y=y
$$

Counterfactual computation in SCM typically involves three steps:

**Abduction:** Estimate the exogenous variables $U$ based on the observed data:

$$
U_Y = Y - (\beta_1 X_1 + \beta_2 X_2), \quad
U_{X_2} = f_{X_2}^{-1}(X_2)
$$

**Action:** Intervene to set $X_1$ to the counterfactual value $x'$ using the do-operator:

$$
do(X_1 = x')
$$

**Prediction:** Compute the counterfactual outcome using the structural equations and the exogenous variables, adjusting all other relevant variables:

$$
X_2^{cf} = f_{X_2}(U_{X_2}) \quad \Rightarrow \quad
Y_{X_1=x'} = \beta_1 x' + \beta_2 X_2^{cf} + U_Y
$$

- Here, $X_2^{cf}$ is the counterfactual value of $X_2$, recomputed from its exogenous variable $U_{X_2}$. In this model $X_2$ does not depend on $X_1$, so $X_2^{cf}$ equals the observed $X_2$; in general, this step ensures that every variable downstream of the intervention is updated rather than held fixed at its observed value.

In short, SCM without counterfactuals gives **average causal effects**, while SCM with counterfactuals gives **individual causal effects** that properly account for all dependencies.
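
The three steps can be sketched for a single observed unit in the linear SCM. The coefficients and the observed values are assumptions chosen for illustration.

```python
# Assumed structural coefficients and one observed unit
beta1, beta2 = 2.0, 1.0
x1_obs, x2_obs, y_obs = 1.0, 3.0, 5.5

# Abduction: recover this unit's exogenous noise from what was observed
u_y = y_obs - (beta1 * x1_obs + beta2 * x2_obs)

# Action: set X1 to the counterfactual value x'
x1_cf = 0.0

# Prediction: X2 has no arrow from X1 in this model, so it keeps its value;
# Y is recomputed with the same noise term u_y
x2_cf = x2_obs
y_cf = beta1 * x1_cf + beta2 * x2_cf + u_y

print(f"U_Y = {u_y}, counterfactual Y = {y_cf}")
```

For this unit the individual effect of changing $X_1$ from 1 to 0 is $y_{cf} - y_{obs} = -2.0$, which equals $-\beta_1$ because the model is linear and the noise term is held fixed.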

# ARCHITECTURES IN PYTHON

## STRUCTURAL EQUATION MODEL IN PYTHON

In [None]:
# SOON

## STRUCTURAL CAUSAL MODEL IN PYTHON

In [None]:
# SOON