---
title: "Downscaling Global Economic Activity Data: A Spatial and Sectoral perspective using Bayesian Hierarchical Modelling"
format: 
    revealjs:
        theme: ["./pres_style.scss",default]
    # pdf : default
    # beamer
author:
  - name: Ivann Schlosser
    email: ivann.schlosser@ouce.ox.ac.uk
    url: ischlosser.com
    affiliations:
      - name: Oxford Programme for Sustainable Infrastructure Systems (OPSIS)
        address: South Parks Road
        postal-code: OX1 3QY
        city: Oxford
# bibliography: references.bib
---


# Introduction

## Glossary

**BHM** - Bayesian Hierarchical Model

**MCMC** - Markov Chain Monte Carlo

**Marginal Probability** - probability distribution of a single variable, obtained by summing (or integrating) the joint distribution over the other variables

## Where we left off

- Pycnophylactic condition: downscaled values within each region sum to the regional total
- Rescaling, using Overture POIs and their categories as the source for the density distribution of activity on the ground
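As a reminder of that earlier approach, the rescaling can be sketched as follows: each cell receives a share of its region's total proportional to its POI-derived weight, so the cells of every region sum to the regional figure. This is a minimal sketch assuming one weight per cell and a known region per cell; `pycnophylactic_rescale` and its arguments are hypothetical names, not the original implementation.

```python
from collections import defaultdict


def pycnophylactic_rescale(weights, regions, totals):
    """Distribute each region's total over its cells in proportion to the
    cell weights (e.g. POI counts), so cell values sum to the regional total.

    weights: list of non-negative cell weights
    regions: list of region ids, one per cell
    totals:  dict mapping region id -> regional total
    """
    # Sum and count of weights per region, used to normalise the cell shares
    region_weight = defaultdict(float)
    region_count = defaultdict(int)
    for w, r in zip(weights, regions):
        region_weight[r] += w
        region_count[r] += 1

    downscaled = []
    for w, r in zip(weights, regions):
        if region_weight[r] > 0:
            downscaled.append(totals[r] * w / region_weight[r])
        else:
            # Assumption: a region with no weight at all is split evenly,
            # so the pycnophylactic condition still holds
            downscaled.append(totals[r] / region_count[r])
    return downscaled


cells = pycnophylactic_rescale(
    weights=[1, 3, 0, 2, 2],
    regions=["A", "A", "A", "B", "B"],
    totals={"A": 100.0, "B": 50.0},
)
print(cells)  # [25.0, 75.0, 0.0, 25.0, 25.0]
```

The hard constraint listed in the cons below is visible here: the regional totals are reproduced exactly, with no room for uncertainty in either the totals or the weights.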

### Pros

- Simple
- Efficient
- Global

### Cons

- Did not use additional available information: sector-level data, regional accounts from local sources, and other global layers such as population and non-residential built infrastructure
- Univariate model
- Hard constraint
- No flexibility

## Can we do better?

**YES**, but at a higher cost.

However, it allows us to address all of the previous cons while preserving the pros.

The methodology was significantly expanded to use Bayesian Hierarchical Modelling (BHM).


## Methods

### BHM in 2 slides

As the name suggests, this method rests on the familiar Bayes' rule.
As a reminder: if we have two random variables (RVs) and model them together, we might ask questions such as:

> What is the probability of a joint event, or a conditional event, having observed the outcome of only one of the variables?

In practice, this involves the joint distribution $p(x,z)$, and Bayes' rule tells us it can be factorised in two equivalent ways:
$$
\begin{aligned}
p(x,z) &= p(z)\, p(x|z) \\
p(x,z) &= p(x)\, p(z|x)
\end{aligned}
$$

 <!-- "What is the probability of observing $x$, having already observed $z$ ?"
 
 Is equivalent to : 
 
 "What is the probability of observing both $x$ and $z$ ?"  -->

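Equating the two factorisations links the marginals $p(x)$ and $p(z)$ through the conditionals. Rearranged, this gives the familiar form of Bayes' rule:

$$
p(x) = \frac{p(z)\, p(x|z)}{p(z|x)}
\quad\Longleftrightarrow\quad
p(z|x) = \frac{p(x|z)\, p(z)}{p(x)}
$$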
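A quick numerical check of these identities on a small discrete joint distribution; the probabilities are made up for illustration.

```python
# Made-up joint distribution p(x, z) over x in {0, 1} and z in {0, 1}
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}

# Marginals: sum the joint over the other variable
p_x = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (0, 1)}
p_z = {z: sum(p for (_, zz), p in joint.items() if zz == z) for z in (0, 1)}

# Conditionals from the definition p(x|z) = p(x, z) / p(z), and likewise p(z|x)
p_x_given_z = {(x, z): joint[(x, z)] / p_z[z] for (x, z) in joint}
p_z_given_x = {(x, z): joint[(x, z)] / p_x[x] for (x, z) in joint}

for x, z in joint:
    # Both factorisations recover the joint
    assert abs(p_z[z] * p_x_given_z[(x, z)] - joint[(x, z)]) < 1e-12
    assert abs(p_x[x] * p_z_given_x[(x, z)] - joint[(x, z)]) < 1e-12
    # Rearranged form: p(x) = p(z) p(x|z) / p(z|x)
    assert abs(p_z[z] * p_x_given_z[(x, z)] / p_z_given_x[(x, z)] - p_x[x]) < 1e-12
```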

The key point in BHM is that we apply this reasoning to our model and its assumptions. In other words, we ask:

> What is the probability of observing some data from my model, given that it has some specified parameters $\vec{\mathbf{\theta}}$?

### Simple example

Let's say we have an RV $X \sim \mathcal{N}(\mu,\sigma)$. In the simplest and most common cases, both parameters are fixed and known. However, we can think of situations where one or both of them are not. In that case, the RV is parametrised and can be written as $p_X(x)\equiv p_X(x|\mu,\sigma)$: the values we draw from it are conditioned on the parameters, which need to be *specified*.
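To make the conditioning concrete, the sketch below draws values of $X$ in two stages: first $\mu$ from an assumed prior, then $X$ given that $\mu$. The prior $\mathcal{N}(0, 2)$ and the fixed $\sigma = 1$ are arbitrary choices for illustration; `draw_x` is a hypothetical helper. Python's `random.gauss` takes the standard deviation as its second argument.

```python
import random

rng = random.Random(42)  # seeded for reproducibility

SIGMA = 1.0  # assumed known and fixed


def draw_x(n):
    """Draw n values of X in two stages: mu from its prior, then X | mu."""
    mu = rng.gauss(0.0, 2.0)  # assumed prior on mu: N(0, 2)
    xs = [rng.gauss(mu, SIGMA) for _ in range(n)]  # X | mu ~ N(mu, SIGMA)
    return mu, xs


mu, xs = draw_x(1000)
print(f"mu = {mu:.2f}, sample mean = {sum(xs) / len(xs):.2f}")
```

Each call draws a new $\mu$, so sample means differ between calls, while within a call they concentrate around that call's $\mu$.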

### What about observed data?
Now suppose we have some observed data $\{X_i\}$. Having laid out the general behaviour of our model, we can turn to the data and ask:

"What are the chances of observing the sample $\{X_i\}$, conditioned on the parameters $\vec{\mathbf{\theta}} = \{\mu, \sigma\}$?"

In other words, we are looking at the likelihood $p(\{X_i\}|\mu,\sigma)$, which for independent observations factorises as $\prod_i p(X_i|\mu,\sigma)$.
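For independent draws, the likelihood is a product of individual densities, so in practice one works with its logarithm, a sum. A minimal sketch with a made-up sample, comparing two candidate values of $\mu$ with $\sigma$ fixed; the helper names are hypothetical.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log-density of N(mu, sigma) at x (sigma is the standard deviation)."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)


def log_likelihood(sample, mu, sigma):
    """log p({X_i} | mu, sigma) for independent observations."""
    return sum(normal_logpdf(x, mu, sigma) for x in sample)


sample = [4.8, 5.1, 5.3, 4.9, 5.4]  # made-up observations
print(log_likelihood(sample, mu=5.0, sigma=0.5))
print(log_likelihood(sample, mu=3.0, sigma=0.5))  # lower: the data are less likely
```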

##

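### Posterior distribution

The posterior distribution is obtained by updating the prior $p(\vec{\mathbf{\theta}})$, our belief about the parameters before seeing any data, with the likelihood of the observed data:
$$
p(\vec{\mathbf{\theta}}|\{X_i\}) \propto p(\{X_i\}|\vec{\mathbf{\theta}})\, p(\vec{\mathbf{\theta}})
$$
Loosely, this step is similar to training in deep learning, except that the result is a full distribution over the parameters rather than a single set of values.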
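With a single parameter, this update can be computed directly on a grid: evaluate prior × likelihood at each candidate value and normalise. The sketch below does this for $\mu$ with $\sigma$ known; the prior, sample and grid are assumptions for illustration. For a normal prior with known $\sigma$ the posterior mean also has a closed form, which provides a check.

```python
import math


def normal_logpdf(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)


sample = [4.8, 5.1, 5.3, 4.9, 5.4]  # made-up observations
sigma = 0.5                          # assumed known
prior_mu, prior_sd = 0.0, 10.0       # assumed weakly informative prior on mu

grid = [i * 0.01 for i in range(0, 1001)]  # candidate mu values in [0, 10]

# Unnormalised log-posterior: log prior + log likelihood
log_post = [
    normal_logpdf(m, prior_mu, prior_sd) + sum(normal_logpdf(x, m, sigma) for x in sample)
    for m in grid
]
# Normalise over the grid (subtract the max first for numerical stability)
peak = max(log_post)
weights = [math.exp(lp - peak) for lp in log_post]
total = sum(weights)
posterior = [w / total for w in weights]

post_mean = sum(m * p for m, p in zip(grid, posterior))

# Conjugate closed form for comparison (normal prior, known sigma)
n = len(sample)
precision = 1 / prior_sd**2 + n / sigma**2
exact_mean = (prior_mu / prior_sd**2 + sum(sample) / sigma**2) / precision
print(round(post_mean, 3), round(exact_mean, 3))
```

Grids only scale to a handful of parameters, which is why the model below relies on sampling instead.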

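### MCMC
In practice, the posterior rarely has a closed form, so we approximate it by sampling. Markov Chain Monte Carlo (MCMC) methods build a Markov chain over parameter values whose stationary distribution is the posterior. Each transition only requires evaluating prior × likelihood up to a constant, so the normalising constant $p(\{X_i\})$ is never needed. After a tuning (burn-in) phase, the samples of the chain are used as draws from the posterior.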
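A minimal random-walk Metropolis sampler illustrates the idea on a toy problem: inferring $\mu$ with $\sigma = 0.5$ known, a $\mathcal{N}(0, 10)$ prior and a small made-up sample. It is a sketch of the simplest MCMC algorithm, not the NUTS sampler that PyMC assigns to continuous variables by default; the step size, chain length and burn-in are arbitrary choices.

```python
import math
import random


def log_posterior(mu, sample, sigma=0.5, prior_sd=10.0):
    """Unnormalised log-posterior of mu: N(0, prior_sd) prior, N(mu, sigma) likelihood."""
    log_prior = -0.5 * (mu / prior_sd) ** 2
    log_lik = sum(-0.5 * ((x - mu) / sigma) ** 2 for x in sample)
    return log_prior + log_lik


def metropolis(sample, n_steps=20_000, step=0.3, seed=0):
    rng = random.Random(seed)
    mu = 0.0  # arbitrary starting point
    current = log_posterior(mu, sample)
    chain = []
    for _ in range(n_steps):
        proposal = mu + rng.gauss(0.0, step)  # symmetric random-walk proposal
        candidate = log_posterior(proposal, sample)
        # Accept with probability min(1, p(proposal) / p(current)); only the
        # ratio is needed, so the normalising constant cancels.
        # 1 - random() lies in (0, 1], which avoids log(0).
        if math.log(1.0 - rng.random()) < candidate - current:
            mu, current = proposal, candidate
        chain.append(mu)
    return chain


sample = [4.8, 5.1, 5.3, 4.9, 5.4]  # made-up observations
chain = metropolis(sample)
kept = chain[2_000:]  # discard burn-in
# Should be close to the analytic posterior mean, 102 / 20.01 (about 5.097)
print(sum(kept) / len(kept))
```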

## Spatial Economic Activity

After this brief review, we turn to the specifics of our problem: how do we build a downscaling model with this framework?
The idea is to embed a fine-scale spatial econometrics model into this Bayesian framework. The econometrics model dictates the behaviour at the fine spatial scale, while the Bayesian formalism described above ensures that this behaviour agrees with our prior knowledge and constraints. The problem itself relies on two main hierarchies: spatial and sectoral.

## Spatial and Sectoral Hierarchies
::: {.columns}
::: {.column}
### Levels of Spatial Granularity
![](imgs/spat_hierarchy_plot.png)
:::
::: {.column }
### Levels of Economic Activity Classification
![](imgs/sector_levels_plot.png)
:::
::: 
<!-- end columns -->
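One way to encode both hierarchies is with 0/1 membership matrices: a spatial matrix mapping fine locations to regions, and a sectoral matrix mapping fine sectors to coarser industries. Aggregation then becomes a matrix product, which is how the `R` and `G` matrices are used in the model code of this presentation (`R @ y`, then `@ G.T`). The sketch below builds such matrices with plain Python; the helper functions and the location, region, sector and industry labels are hypothetical.

```python
def membership_matrix(members, groups):
    """0/1 matrix M with M[g][j] = 1 if item j belongs to group g."""
    return [[1 if m == g else 0 for m in members] for g in groups]


def matmul(a, b):
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


# Hypothetical hierarchy: 4 locations in 2 regions, 3 sectors in 2 industries
location_region = ["north", "north", "south", "south"]
sector_industry = ["agri", "manuf", "manuf"]
R = membership_matrix(location_region, ["north", "south"])  # (regions, locations)
G = membership_matrix(sector_industry, ["agri", "manuf"])   # (industries, sectors)

# Fine-scale output y: one row per location, one column per sector
y = [[1.0, 2.0, 0.5],
     [0.0, 1.0, 1.0],
     [3.0, 0.0, 2.0],
     [1.0, 1.0, 1.0]]

y_region_sector = matmul(R, y)                             # spatial aggregation
y_region_industry = matmul(y_region_sector, transpose(G))  # sectoral aggregation
print(y_region_industry)  # [[1.0, 4.5], [4.0, 4.0]]
```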


## Econometrics modelling

We use a simplified linear model, inspired by (spatial) econometrics, to *inform* the Bayesian method about how we expect the predictor variables to be linked to industry-level output. We apply this model to every location with at least one non-zero predictor variable.

$$
\mu_{S_i} = \sum_m \alpha_m \, x_m
$$

where $\alpha_m$ are learned parameters and $x_m$ are the proxy variables assigned to the relevant sectors. The result is a multiple linear model in which some variables are masked, based on reasonable assumptions about their impact on a particular output; this step encodes the *expert knowledge* that Bayesian methods rely on. Selecting only the most relevant variables for each output also reduces the dimensionality of the problem, which matters because data to validate or inform the model are often lacking.
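In matrix form, the masked predictor for all locations and sectors is $X (W \odot M)^\top$, where $W$ holds the $\alpha$ coefficients (one row per sector), $M$ is the 0/1 expert-knowledge mask and $X$ holds the proxy values (one row per location). This mirrors `X_ @ W_masked.T` in the model code. A plain-Python sketch with made-up numbers; the proxy interpretations in the comments are hypothetical.

```python
# Made-up example: 3 locations, 2 sectors, 3 proxy variables
# (hypothetically: POI density, population, non-residential built area)
X = [[1.0, 0.5, 0.0],
     [0.2, 1.0, 0.3],
     [0.0, 0.0, 1.0]]          # (locations, proxies)
W = [[0.8, 0.1, 0.4],
     [0.3, 0.9, 0.2]]          # (sectors, proxies): the alpha coefficients
mask = [[1, 0, 1],
        [0, 1, 1]]             # expert knowledge: which proxies inform which sector

# Element-wise masking: a masked coefficient contributes nothing
W_masked = [[w * m for w, m in zip(w_row, m_row)] for w_row, m_row in zip(W, mask)]

# mu_pred[l][s] = sum_m W_masked[s][m] * X[l][m], i.e. X @ W_masked.T
mu_pred = [[sum(w * x for w, x in zip(w_row, x_row)) for w_row in W_masked] for x_row in X]
print(mu_pred)
```

Because each sector only sees its unmasked proxies, the effective number of free coefficients is the number of ones in the mask rather than sectors × proxies.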


For each location, the linear model yields a value $\mu_{S_i}$ for a specific sector, which the Bayesian method interprets as the central value of that sector's output at that location. It is combined with an uncertainty parameter $\sigma$, measured beforehand from the average availability of data in the system. The tuple $(\mu_{S_i}, \sigma)$, with $\sigma$ fixed and $\mu_{S_i}$ obtained from the sampled linear combination of the $\alpha$ parameters, then parametrises the distribution of the sector's output at that location (a log-normal in the model code, so $\mu_{S_i}$ is the mean of the *logarithm* of the output).
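Since the model code draws fine-scale output from `pm.LogNormal(mu=mu_pred, sigma=std)`, $\log y \sim \mathcal{N}(\mu_{S_i}, \sigma)$: the median output is $e^{\mu_{S_i}}$ and the mean output is $e^{\mu_{S_i} + \sigma^2/2}$. A quick simulation check with Python's standard library, whose `random.lognormvariate(mu, sigma)` uses the same parametrisation; the values of $\mu_{S_i}$ and $\sigma$ are arbitrary.

```python
import math
import random
import statistics

rng = random.Random(1)
mu_s, sigma = 1.0, 0.5  # arbitrary illustrative values

draws = [rng.lognormvariate(mu_s, sigma) for _ in range(100_000)]
logs = [math.log(d) for d in draws]

print(statistics.fmean(logs))    # close to mu_s
print(statistics.median(draws))  # close to exp(mu_s)
print(statistics.fmean(draws))   # close to exp(mu_s + sigma**2 / 2)
```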

## All together

::: {.columns}
::: {.column}


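```python
import pymc as pm

# Inputs assumed to be defined beforehand (data preparation step):
#   X (n_locations, n_proxies) proxy values; w_mask (n_sectors, n_proxies) 0/1 mask
#   mu: prior mean of W; std: fixed log-scale sigma of the fine-scale output
#   R (n_regions, n_locations), G (n_industries, n_sectors) aggregation matrices
#   region_sector_totals, region_sector_totals_uncert (n_regions, n_industries)
#   entropy (bool flag), alpha (entropy weight), sampler settings used below

with pm.Model() as model:
    # Linear-model coefficients, masked by expert knowledge
    W = pm.Normal("W", mu=mu, sigma=3, shape=(n_sectors, n_proxies))
    W_masked = pm.Deterministic("W_masked", W * w_mask)
    X_ = pm.Data("X", X)

    # Linear predictor: log-scale location of each (location, sector) output
    mu_pred = pm.Deterministic("mu_pred", X_ @ W_masked.T)
    # Latent fine-scale output, log-normal around the linear predictor
    y = pm.LogNormal("y", mu=mu_pred, sigma=std, shape=(n_locations, n_sectors))

    # Matrix multiplication to aggregate to region-sector
    R_shared = pm.Data("R", R)
    G_shared = pm.Data("G", G)
    # spatial aggregation: locations -> regions
    y_region_sector = pm.Deterministic("y_region_sector", R_shared @ y)
    # sectoral aggregation: sectors -> industries
    y_region_industry = pm.Deterministic(
        "y_region_industry", y_region_sector @ G_shared.T
    )

    if entropy:
        # Optional entropy term over all (location, sector) shares;
        # with alpha > 0 it favours spreading output more evenly
        eps = 1e-10
        y_soft = y / y.sum()
        entropy_ = -pm.math.sum(y_soft * pm.math.log(y_soft + eps), axis=None)
        # Separate name so the boolean `entropy` flag is not overwritten
        entropy_det = pm.Deterministic("entropy_", entropy_)
        pm.Potential("entropy", alpha * entropy_det)

    # Likelihood: aggregated output should match the observed regional accounts
    Y_obs = pm.Normal(
        "Y_obs",
        mu=y_region_industry,
        # must be strictly positive: add an eps if some totals can be zero
        sigma=region_sector_totals * region_sector_totals_uncert,
        observed=region_sector_totals,
    )

    trace = pm.sample(
        samples,
        tune=tunes,
        target_accept=target_accept,
        max_treedepth=max_treedepth,  # passed through to the NUTS sampler
        cores=n_cores,
        chains=chains,
    )
```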


:::

::: {.column}

![](imgs/NGA.20_1_model_graph.png)

:::
::: <!-- end columns -->



## Data 

### Input

- POIs (Overture)
- NRES (non-residential built infrastructure)
- DOSE-WDI
- Copernicus
- GHSL population


- BEA
- ILOSTAT
- UK value added
- EU IO tables

### Validation

- Kummu
- BEA
- EU IO
- Null model (old method)

## Catalogue 

### Subsectors

- GEM
- CGFI
- Climate TRACE
- EDGAR
- MapSPAM

