# Preliminaries

In [1]:
!rm -f *.pdf

<br>

### Paths

In [2]:
import os
import pathlib
import sys

In [10]:
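if 'google.colab' not in str(get_ipython()):

    # The project root is the deepest directory named 'infections' in the working directory's path
    parts = pathlib.Path(os.getcwd()).parts
    limit = max(index for index, value in enumerate(parts) if value == 'infections')
    parent = os.path.join(*parts[:(limit + 1)])

    # Add the project's src directory to the module search path
    sys.path.append(os.path.join(parent, 'src'))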


In [11]:
parent

'J:\\library\\premodelling\\projects\\infections'
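
The path logic above can be sketched as a small, testable function. This is a minimal sketch, not the notebook's own code: the function name `project_root` and the example paths are hypothetical, and `PurePosixPath` is used only so that the example behaves the same on every operating system.

```python
import pathlib


def project_root(cwd: str, name: str = 'infections') -> pathlib.PurePosixPath:
    """Return the deepest ancestor of cwd (or cwd itself) whose directory name is `name`."""
    parts = pathlib.PurePosixPath(cwd).parts
    matches = [index for index, value in enumerate(parts) if value == name]
    if not matches:
        # The notebook's max(...) call would fail here too; this message is just clearer
        raise ValueError(f"no directory named '{name}' in {cwd}")
    return pathlib.PurePosixPath(*parts[:max(matches) + 1])


# Hypothetical working directories
root = project_root('/home/user/infections/notebooks')
nested = project_root('/home/user/infections/notebooks/infections/sub')
print(root)
print(nested)
```

When the directory name occurs more than once, the deepest occurrence is selected, which mirrors the `max(...)` in the cell above.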

<br>
<br>

### Libraries

In [4]:
%matplotlib inline

import datetime

import logging
import collections

import numpy as np
import pandas as pd

import time

<br>
<br>

### Custom

In [5]:
import src.preprocessing.interface

import src.virus.measures
import src.virus.agegroupcases
import src.virus.agegroupvaccinations

import config

<br>

Setting up

In [6]:
configurations = config.Config()

<br>

The coronavirus.data.gov.uk API (application programming interface) data fields to extract per lower tier local authority (LTLA) geographic area, and per NHS Trust, of England.

In [7]:
fields_ltla = configurations.fields_ltla
fields_trusts = configurations.fields_trust

<br>

The unique LTLA and NHS Trust codes of England.

In [8]:
districts = configurations.districts()
codes_ltla = districts.ltla.unique()

In [9]:
trusts = configurations.trusts()
codes_trusts = trusts.trust_code.unique()

<br>
<br>

### Logging

In [10]:
logging.basicConfig(level=logging.INFO,
                    format='\n\n%(message)s\n%(asctime)s.%(msecs)03d',
                    datefmt='%Y-%m-%d %H:%M:%S')
logger = logging.getLogger(__name__)
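
To see what a message looks like under this format, the same format strings can be applied to a single hand-built record. This is an illustrative sketch: the record name `example` and the message arguments are made up, and building a `LogRecord` directly is only a way of avoiding any dependence on handler configuration.

```python
import logging

# The same format strings as the logging.basicConfig call above
formatter = logging.Formatter(fmt='\n\n%(message)s\n%(asctime)s.%(msecs)03d',
                              datefmt='%Y-%m-%d %H:%M:%S')

# A hand-built record; the name and arguments are illustrative only
record = logging.LogRecord(name='example', level=logging.INFO, pathname='', lineno=0,
                           msg='%d LTLA areas queried.', args=(3,), exc_info=None)

# Two blank lines, the message, then a timestamp with milliseconds on its own line
text = formatter.format(record)
print(text)
```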

<br>
<br>

# Part I

## Integration, Features Engineering


### The Supplementary Data Files

In [11]:
times = src.preprocessing.interface.Interface().exc()



preprocessing ...
2022-02-13 13:04:50.125



districts: 
 ['2020: succeeded', '2019: succeeded', '2018: succeeded', '2017: succeeded', '2016: succeeded', '2015: succeeded']

patients: 
 ['2019: succeeded', '2018: succeeded', '2017: succeeded', '2016: succeeded', '2015: succeeded', '2014: succeeded', '2013: succeeded', '2012: succeeded', '2011: succeeded']

populations MSOA: 
 ['2012: succeeded', '2013: succeeded', '2014: succeeded', '2015: succeeded', '2016: succeeded', '2017: succeeded', '2018: succeeded', '2019: succeeded', '2020: succeeded']

MSOA Populations Disaggregated by Sex & Age Group: 
 ['2012: succeeded', '2013: succeeded', '2014: succeeded', '2015: succeeded', '2016: succeeded', '2017: succeeded', '2018: succeeded', '2019: succeeded', '2020: succeeded']

LTLA Populations: 
 ['2012: succeeded', '2013: succeeded', '2014: succeeded', '2015: succeeded', '2016: succeeded', '2017: succeeded', '2018: succeeded', '2019: succeeded', '2020: succeeded']

LTLA Populations Disaggregated by Sex & Age Group: 
 ['2011: succeeded', 

<br>

Delete compute DAG diagrams

In [12]:
!rm -f *.pdf

<br>

Times

In [13]:
pd.DataFrame.from_records(data=times['programs'])

| | desc | program | seconds |
|---|---|---|---|
| 0 | districts | preprocessing.districts | 1.701097 |
| 1 | patients | preprocessing.patients | 285.608336 |
| 2 | MSOA populations | preprocessing.populationsmsoa | 220.888634 |
| 3 | MSOA populations: age group & sex brackets | preprocessing.agegroupsexmsoa | 3.115178 |
| 4 | LTLA populations | preprocessing.populationsltla | 2.546145 |
| 5 | LTLA populations: age group & sex brackets | preprocessing.agegroupsexltla | 1.814104 |
| 6 | 2011 demographic data | preprocessing.exceptions | 4.37525 |
| 7 | special MSOA demographics for vac | preprocessing.vaccinationgroupsmsoa | 8.172467 |
| 8 | special LTLA demographics for vac | preprocessing.vaccinationgroupsltla | 1.733099 |


<br>
<br>

### coronavirus.data.gov.uk

England's SARS-CoV-2 infection and coronavirus disease 2019 (COVID-19) measures are available via the United Kingdom's coronavirus.data.gov.uk API.  Four data sets are of interest; each is read by one of the four steps that follow.  Instead of running the four steps, you may run

> %%bash
>
> `python src/virus/interface.py`

<br>

**Lower Tier Local Authority Level Measures**

> ```python
> measures = src.virus.measures.Measures(fields=fields_ltla, path=os.path.join('ltla', 'measures')) \
>     .exc(area_codes=codes_ltla, area_type='ltla')
> logger.info('%d LTLA areas queried.', len(measures))
> time.sleep(60)
> ```

<br>

**NHS Trust Level Measures**

> ```python
> measures = src.virus.measures.Measures(fields=fields_trusts, path=os.path.join('trusts', 'measures')) \
>     .exc(area_codes=codes_trusts, area_type='nhsTrust')
> logger.info('%d NHS Trusts queried.', len(measures))
> time.sleep(60)
> ```

<br>

**LTLA Level Measures: Cases Disaggregated by Age Group**

> ```python
> measures = src.virus.agegroupcases.AgeGroupCases().exc(area_codes=codes_ltla, area_type='ltla')
> logger.info('%d LTLA areas queried.', len(measures))
> time.sleep(60)
> ```

<br>

**LTLA Level Measures: Vaccinations Disaggregated by Age Group**

A few areas do not have any data, although the response status of their requests is 200; these areas are excluded.

> ```python
> area_codes = list(set(codes_ltla) - {'E06000053', 'E09000001', 'E06000060'})
> measures = src.virus.agegroupvaccinations.AgeGroupVaccinations().exc(area_codes=area_codes, area_type='ltla')
> logger.info('%d LTLA areas queried.', len(measures))
> ```
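
Rather than hard-coding the exclusions, areas that respond with status 200 but carry no records could be detected programmatically. The following is a minimal sketch under stated assumptions: the payload shape (a parsed JSON body with a `data` list), the function name `areas_without_data`, and the area codes are all hypothetical, and nothing here is taken from `src.virus.agegroupvaccinations`.

```python
from typing import Dict, List, Tuple


def areas_without_data(responses: Dict[str, Tuple[int, dict]]) -> List[str]:
    """Area codes whose request succeeded (status 200) but returned no records."""
    return sorted(code for code, (status, payload) in responses.items()
                  if status == 200 and not payload.get('data'))


# Hypothetical responses: {area code: (HTTP status, parsed JSON body)}
responses = {
    'AREA_A': (200, {'data': [{'date': '2022-01-01', 'value': 5}]}),
    'AREA_B': (200, {'data': []}),
    'AREA_C': (200, {}),
}

empty = areas_without_data(responses)
area_codes = sorted(set(responses) - set(empty))
print(empty)
print(area_codes)
```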

<br>
<br>

### Weights

Determining the multi-granularity patient flow weights, from LTLA $\longrightarrow$ NHS Trust, via the MSOA (middle layer super output area) $\longrightarrow$ NHS Trust patient numbers.

In [2]:
%%bash

python src/catchments/interface.py



2011
2022-02-13 16:24:43.205


2011: weights calculated for approx. 139 trusts
2022-02-13 16:24:56.418


2012
2022-02-13 16:24:56.419


2012: weights calculated for approx. 140 trusts
2022-02-13 16:25:10.309


2013
2022-02-13 16:25:10.310


2013: weights calculated for approx. 140 trusts
2022-02-13 16:25:24.140


2014
2022-02-13 16:25:24.140


2014: weights calculated for approx. 140 trusts
2022-02-13 16:25:37.835


2015
2022-02-13 16:25:37.836


2015: weights calculated for approx. 140 trusts
2022-02-13 16:25:51.067


2016
2022-02-13 16:25:51.067


2016: weights calculated for approx. 140 trusts
2022-02-13 16:26:04.781


2017
2022-02-13 16:26:04.781


2017: weights calculated for approx. 140 trusts
2022-02-13 16:26:18.567


2018
2022-02-13 16:26:18.567


2018: weights calculated for approx. 140 trusts
2022-02-13 16:26:32.859


2019
2022-02-13 16:26:32.859


2019: weights calculated for approx. 140 trusts
2022-02-13 16:26:47.484


For approx. 140 trusts, a file has been created per t
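
The notebook does not state the weight formula, so the following is only one plausible reading, offered as a sketch: the weight of an (LTLA, NHS Trust) pair is assumed to be the fraction of the LTLA's patients, summed over its MSOAs, that flow to that trust. All names and numbers are hypothetical; the calculation in `src/catchments` may differ.

```python
from collections import defaultdict

# Hypothetical inputs
# patients[(msoa, trust)]: the number of patients from an MSOA admitted to a trust
patients = {('M1', 'T1'): 30, ('M1', 'T2'): 10, ('M2', 'T1'): 20, ('M3', 'T2'): 40}
# membership[msoa]: the LTLA that the MSOA belongs to
membership = {'M1': 'L1', 'M2': 'L1', 'M3': 'L2'}


def flow_weights(patients, membership):
    """Fraction of each LTLA's patients that flows to each trust (assumed definition)."""
    flows = defaultdict(float)   # patients per (ltla, trust)
    totals = defaultdict(float)  # patients per ltla, across all trusts
    for (msoa, trust), count in patients.items():
        ltla = membership[msoa]
        flows[(ltla, trust)] += count
        totals[ltla] += count
    return {key: value / totals[key[0]] for key, value in flows.items()}


weights = flow_weights(patients, membership)
for (ltla, trust), weight in sorted(weights.items()):
    print(ltla, trust, round(weight, 4))
```

Under this definition the weights of each LTLA sum to one, so a measure distributed with them is conserved across trusts.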

<br>

Delete compute DAG diagrams

In [1]:
!rm -f *.pdf

<br>
<br>

### Vaccination Specific Weights

Determining the vaccination-specific multi-granularity patient flow weights.  These are calculated separately because the age brackets of the vaccinations data differ from the standard 5-year brackets.

In [None]:
%%bash

python src/vaccinations/interface.py

<br>
<br>

### Design Matrix & Outcome Variables


Estimating the coronavirus measures of each NHS Trust by transforming LTLA measures into weighted NHS Trust components, via the calculated multi-granularity patient flow weights.  Subsequently, a tensor is constructed; it consists of the raw matrix of independent variable vectors, and the outcome vector.

In [None]:
%%bash

python src/design/interface.py
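
The transformation described above can be illustrated as a weighted sum: each trust's estimate is the sum, over the LTLAs that feed it, of the weight multiplied by the LTLA measure. This is a minimal sketch with hypothetical weights and case counts; it is not the code of `src/design/interface.py`.

```python
# Hypothetical LTLA -> NHS Trust weights; the weights of each LTLA sum to one
weights = {('L1', 'T1'): 0.75, ('L1', 'T2'): 0.25, ('L2', 'T2'): 1.0}

# Hypothetical daily cases per LTLA
cases = {'L1': 200, 'L2': 80}


def trust_estimates(weights, measure):
    """Estimate a measure per trust as the weighted sum of its LTLA components."""
    estimates = {}
    for (ltla, trust), weight in weights.items():
        estimates[trust] = estimates.get(trust, 0.0) + weight * measure[ltla]
    return estimates


estimates = trust_estimates(weights, cases)
print(estimates)
```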

<br>
<br>

## Delete DAG Diagrams

In [None]:
%%bash

rm -f *.pdf