# Adapting mixing matrices

## Goal of this notebook
Many countries around the world have
no empirically collected survey data on contact patterns,
so we often have to extrapolate from one setting to another.
There are many different ways that we could do this,
because the social interactions between population groups
are likely to depend on many factors,
including the population's age structure, the composition of workplaces and schools,
and a wide range of social and cultural factors.
Most of these factors are outside the scope of this series,
but let's work through how we adjust for differences in
age structure alone between the setting in which the data were collected
and the setting we want to model.

## Example to consider
Let's adapt our mixing matrix, derived from the United Kingdom data of the
POLYMOD study, to the age structure of the population of India.
In one sense, this is not a good example, because the two settings
differ in many cultural respects beyond
the population distribution.
However, the population distributions themselves also differ substantially,
so this example should illustrate the age-structure adjustment well.

In [None]:
import numpy as np
import pandas as pd
import plotly.express as px
pd.options.plotting.backend = "plotly"

from summer2 import CompartmentalModel, Stratification, Multiply
from summer2.parameters import Parameter, DerivedOutput, Function

## Collating our data
First, let's define the age structure for each of the two settings we're considering,
and plot the proportion of the population in each age group.
As the plot shows, the differences in the population structure are considerable.

In [None]:
# Use the same age groups as in previous notebooks
age_groups = list(range(0, 75, 5))

# United Kingdom
uk_pops_list = [
    3458060, 3556024, 3824317, 3960916, 3911291, 3762213, 4174675, 4695853, 
    4653082, 3986098, 3620216, 3892985, 3124676, 2706365, 6961183,
]
uk_age_pops = pd.Series(uk_pops_list, index=age_groups)
uk_age_props = uk_age_pops / uk_age_pops.sum()

# India
india_pops_list = [
    129542986, 125382900, 120793708, 116260529, 107868736, 94680659, 84290793, 
    74654335, 66648349, 58227885, 49861117, 36403014, 28622039, 22542089, 31830788,
]
india_age_pops = pd.Series(india_pops_list, index=age_groups)
india_age_props = india_age_pops / india_age_pops.sum()

pd.DataFrame(
    {
        "india": india_age_props,
        "uk": uk_age_props,
    }
).plot(
    labels=dict(index="age", value="proportion of population"),
)
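
One way to put a number on the difference seen in the plot is to compare the share of each population aged under 15. The sketch below is a minimal illustration, assuming the population counts from the cell above are copied in as plain arrays; `share_under` is a hypothetical helper introduced here and is not part of the notebook or of `summer2`.

```python
import numpy as np

# Population counts copied from the cell above
# (15 five-year bands starting at ages 0, 5, ..., 70)
uk_pops = np.array([
    3458060, 3556024, 3824317, 3960916, 3911291, 3762213, 4174675, 4695853,
    4653082, 3986098, 3620216, 3892985, 3124676, 2706365, 6961183,
])
india_pops = np.array([
    129542986, 125382900, 120793708, 116260529, 107868736, 94680659, 84290793,
    74654335, 66648349, 58227885, 49861117, 36403014, 28622039, 22542089, 31830788,
])


def share_under(pops, n_bands):
    """Proportion of the total population in the first n_bands age bands (hypothetical helper)."""
    return pops[:n_bands].sum() / pops.sum()


# The first three bands cover ages 0 to 14
uk_under_15 = share_under(uk_pops, 3)
india_under_15 = share_under(india_pops, 3)
print(f"UK under 15: {uk_under_15:.1%}, India under 15: {india_under_15:.1%}")
```

A single summary figure like this hides most of the detail in the plot, but it gives a quick check that the two distributions differ in the direction we expect.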

## Get the base matrix
Next, let's build the base matrix for the United Kingdom
that we want to adapt, and have a look at it.

In [None]:
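def build_polymod_britain_matrix():
    """Return the 15 x 15 POLYMOD mixing matrix used for the United Kingdom.

    The nested list below is transposed before being returned.
    """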
    matrix = [
        [1.92, 0.65, 0.41, 0.24, 0.46, 0.73, 0.67, 0.83, 0.24, 0.22, 0.36, 0.20, 0.20, 0.26, 0.13],
        [0.95, 6.64, 1.09, 0.73, 0.61, 0.75, 0.95, 1.39, 0.90, 0.16, 0.30, 0.22, 0.50, 0.48, 0.20],
        [0.48, 1.31, 6.85, 1.52, 0.27, 0.31, 0.48, 0.76, 1.00, 0.69, 0.32, 0.44, 0.27, 0.41, 0.33],
        [0.33, 0.34, 1.03, 6.71, 1.58, 0.73, 0.42, 0.56, 0.85, 1.16, 0.70, 0.30, 0.20, 0.48, 0.63],
        [0.45, 0.30, 0.22, 0.93, 2.59, 1.49, 0.75, 0.63, 0.77, 0.87, 0.88, 0.61, 0.53, 0.37, 0.33],
        [0.79, 0.66, 0.44, 0.74, 1.29, 1.83, 0.97, 0.71, 0.74, 0.85, 0.88, 0.87, 0.67, 0.74, 0.33],
        [0.97, 1.07, 0.62, 0.50, 0.88, 1.19, 1.67, 0.89, 1.02, 0.91, 0.92, 0.61, 0.76, 0.63, 0.27],
        [1.02, 0.98, 1.26, 1.09, 0.76, 0.95, 1.53, 1.50, 1.32, 1.09, 0.83, 0.69, 1.02, 0.96, 0.20],
        [0.55, 1.00, 1.14, 0.94, 0.73, 0.88, 0.82, 1.23, 1.35, 1.27, 0.89, 0.67, 0.94, 0.81, 0.80],
        [0.29, 0.54, 0.57, 0.77, 0.97, 0.93, 0.57, 0.80, 1.32, 1.87, 0.61, 0.80, 0.61, 0.59, 0.57],
        [0.33, 0.38, 0.40, 0.41, 0.44, 0.85, 0.60, 0.61, 0.71, 0.95, 0.74, 1.06, 0.59, 0.56, 0.57],
        [0.31, 0.21, 0.25, 0.33, 0.39, 0.53, 0.68, 0.53, 0.55, 0.51, 0.82, 1.17, 0.85, 0.85, 0.33],
        [0.26, 0.25, 0.19, 0.24, 0.19, 0.34, 0.40, 0.39, 0.47, 0.55, 0.41, 0.78, 0.65, 0.85, 0.57],
        [0.09, 0.11, 0.12, 0.20, 0.19, 0.22, 0.13, 0.30, 0.23, 0.13, 0.21, 0.28, 0.36, 0.70, 0.60],
        [0.14, 0.15, 0.21, 0.10, 0.24, 0.17, 0.15, 0.41, 0.50, 0.71, 0.53, 0.76, 0.47, 0.74, 1.47],
    ]
    return np.array(matrix).T

In [None]:
uk_matrix = build_polymod_britain_matrix()
px.imshow(uk_matrix)

## Adaptation
Next, we need to calculate the ratio of the proportion
of the population in each age group in India to that in the UK.
Then we multiply each row of our matrix element-wise
by these age ratios,
so that each column is scaled by the ratio for its age group.
This is equivalent to multiplying the original matrix on the right
by a diagonal matrix of the same dimensions,
with the population ratios we previously calculated on its diagonal.

In [None]:
india_uk_ratios = india_age_props / uk_age_props
adjusted_matrix = np.dot(uk_matrix, np.diag(india_uk_ratios))
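
To make the equivalence concrete, the sketch below compares the two approaches on a small example. `toy_matrix` and `toy_ratios` are hypothetical illustrative values, not data from this notebook; the only claim is that right-multiplying by a diagonal matrix gives the same result as scaling each column by its ratio through NumPy broadcasting.

```python
import numpy as np

# Hypothetical 2 x 2 matrix and ratio vector, for illustration only
toy_matrix = np.array([
    [2.0, 1.0],
    [0.5, 3.0],
])
toy_ratios = np.array([2.0, 0.5])

# Approach used in the cell above: right-multiply by a diagonal matrix
via_diag = np.dot(toy_matrix, np.diag(toy_ratios))

# Equivalent approach: broadcasting multiplies every row element-wise by the
# ratio vector, so column j is scaled by toy_ratios[j]
via_broadcast = toy_matrix * toy_ratios

print(via_diag)
print(np.allclose(via_diag, via_broadcast))
```

The broadcasting form avoids building the diagonal matrix, which makes little difference for a 15 x 15 matrix, so the choice between the two is mainly one of readability.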

## Results
As you can see from the adapted matrix displayed below,
we now have a matrix in which the contacts are much more skewed towards
the younger age groups, as we would expect for a population with
a younger age distribution, such as India's.

In [None]:
px.imshow(adjusted_matrix)

This matrix is still not ideal for
a model of an infectious disease in India,
because it adjusts only for age structure
and not for the other differences between the two settings.
However, failing to apply this adjustment
and using the UK matrix as reported
with the Indian population distribution
would be plain wrong.