# Preliminaries

In [None]:
!rm -f *.pdf

<br>

### Paths

In [None]:
import os
import pathlib
import sys

In [None]:
if 'google.colab' not in str(get_ipython()):

    parts = pathlib.Path(os.getcwd()).parts
    limit = max(index for index, value in enumerate(parts) if value == 'infections')
    parent = os.path.join(*parts[:(limit + 1)])

    sys.path.append(os.path.join(parent, 'src'))
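
The path logic above locates the deepest directory named `infections` in the current working directory and treats it as the project root. A minimal sketch of the same idea, written as a function that fails loudly when the directory name is absent; `find_project_root` is a hypothetical helper and is not part of this repository:

```python
import pathlib


def find_project_root(start: pathlib.PurePath, name: str = 'infections') -> pathlib.PurePath:
    """Return the deepest ancestor of `start` (or `start` itself) that is named `name`."""
    parts = start.parts
    indices = [index for index, value in enumerate(parts) if value == name]
    if not indices:
        # max() of an empty sequence would raise a less informative error
        raise ValueError(f"'{name}' does not appear in {start}")
    # Rebuild the path with the same class as the input, up to and including the match
    return type(start)(*parts[:max(indices) + 1])


# A made-up path in which the directory name occurs twice; the deepest match is used
example = pathlib.PurePosixPath('/home/user/infections/notebooks/infections/analysis')
root = find_project_root(example)
```

In the notebook this would be called as `find_project_root(pathlib.Path.cwd())`.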


In [None]:
parent

<br>
<br>

### Libraries

In [None]:
%matplotlib inline

import datetime

import logging
import collections

import numpy as np
import pandas as pd

import time

<br>
<br>

### Custom

In [None]:
import src.preprocessing.interface

import src.virus.measures
import src.virus.agegroupcases
import src.virus.agegroupvaccinations

import config

<br>

Setting up

In [None]:
configurations = config.Config()

<br>

The coronavirus.data.gov.uk API (application programming interface) data fields to extract per lower tier local authority (LTLA) geographic area, and per NHS Trust, of England:

In [None]:
fields_ltla = configurations.fields_ltla
fields_trusts = configurations.fields_trust

<br>

England's unique set of LTLA & NHS Trust codes.

In [None]:
districts = configurations.districts()
codes_ltla = districts.ltla.unique()

In [None]:
trusts = configurations.trusts()
codes_trusts = trusts.trust_code.unique()

<br>
<br>

### Logging

In [None]:
logging.basicConfig(level=logging.INFO,
                    format='\n\n%(message)s\n%(asctime)s.%(msecs)03d',
                    datefmt='%Y-%m-%d %H:%M:%S')
logger = logging.getLogger(__name__)

<br>
<br>

# Part I

## Integration, Features Engineering


### The Supplementary Data Files

In [None]:
times = src.preprocessing.interface.Interface().exc()

<br>

Delete compute DAG diagrams

In [None]:
!rm -f *.pdf

<br>

Times

In [None]:
pd.DataFrame.from_records(data=times['programs'])

<br>
<br>

### coronavirus.data.gov.uk

England's SARS-CoV-2 infection and coronavirus disease 2019 (COVID-19) measures are available via the United Kingdom's coronavirus.data.gov.uk API.  Four data sets are of interest; they are read via the four steps that follow.  Instead of running the four steps below, you may run

> %%bash
>
> `python src/virus/interface.py`
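
For orientation, a single-area request can be sketched as below. This assumes the public v1 endpoint `https://api.coronavirus.data.gov.uk/v1/data` with its `filters` and `structure` query parameters. The `build_query` helper and the example field names are illustrative only; they are not taken from `src/virus`, whose classes may build their requests differently.

```python
import json
import urllib.parse

# Assumed endpoint of the v1 data API
ENDPOINT = 'https://api.coronavirus.data.gov.uk/v1/data'


def build_query(area_type: str, area_code: str, fields: list) -> str:
    """Build the request URL for one area; no network call is made here."""
    # Filters are joined by semicolons, e.g. areaType=ltla;areaCode=E06000001
    filters = f'areaType={area_type};areaCode={area_code}'
    # The structure parameter maps output names to API metric names
    structure = json.dumps({field: field for field in fields}, separators=(',', ':'))
    return ENDPOINT + '?' + urllib.parse.urlencode({'filters': filters, 'structure': structure})


# Illustrative field names; the notebook's own lists are fields_ltla and fields_trusts
url = build_query(area_type='ltla', area_code='E06000001',
                  fields=['date', 'newCasesBySpecimenDate'])
```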

<br>

**Lower Tier Local Authority Level Measures**

> ```python
> measures = src.virus.measures.Measures(fields=fields_ltla, path=os.path.join('ltla', 'measures')) \
>     .exc(area_codes=codes_ltla, area_type='ltla')
> logger.info('%d LTLA areas queried.', len(measures))
> time.sleep(60)
> ```

<br>

**Trust level measures**

> ```python
> measures = src.virus.measures.Measures(fields=fields_trusts, path=os.path.join('trusts', 'measures')) \
>     .exc(area_codes=codes_trusts, area_type='nhsTrust')
> logger.info('%d NHS Trusts queried.', len(measures))
> time.sleep(60)
> ```

<br>

**LTLA Level measures: Cases disaggregated by Age Group**

> ```python
> measures = src.virus.agegroupcases.AgeGroupCases().exc(area_codes=codes_ltla, area_type='ltla')
> logger.info('%d LTLA areas queried.', len(measures))
> time.sleep(60)
> ```

<br>

**LTLA Level measures: Vaccinations disaggregated by Age Group** 

A few areas do not have any data, although their requests return an HTTP 200 status; they are excluded from the set of area codes queried.

> ```python
> area_codes = list(set(codes_ltla) - {'E06000053', 'E09000001', 'E06000060'})
> measures = src.virus.agegroupvaccinations.AgeGroupVaccinations().exc(area_codes=area_codes, area_type='ltla')
> logger.info('%d LTLA areas queried.', len(measures))
> ```
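
Because a successful status does not guarantee records, the empty areas could also be detected after the fact rather than hard-coded. A minimal sketch, assuming each area's response has already been reduced to a list of records; `split_by_availability` and the toy records are hypothetical and are not part of `src/virus`:

```python
def split_by_availability(responses: dict) -> tuple:
    """Separate areas that returned records from areas that returned none."""
    available = {code: records for code, records in responses.items() if records}
    empty = sorted(code for code, records in responses.items() if not records)
    return available, empty


# Toy input: the record contents are invented purely for illustration
responses = {
    'E06000001': [{'date': '2021-06-01', 'value': 1}],
    'E06000053': [],
    'E09000001': [],
}
available, empty = split_by_availability(responses)
```

The `empty` list could then be logged, so that changes in data availability are noticed.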

<br>
<br>

### Weights

Determining the multi-granularity patient flow weights, from LTLA $\longrightarrow$ NHS Trust, via the middle layer super output area (MSOA) $\longrightarrow$ NHS Trust patient numbers.

In [None]:
%%bash

python src/catchments/interface.py
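
The aggregation idea can be sketched as follows, assuming a table of MSOA-to-Trust patient counts in which each MSOA is labelled with its parent LTLA. The column names and the numbers are illustrative, and the weight shown here (the share of an LTLA's patients that flows to each Trust) is only one plausible definition; `src/catchments` may define its weights differently.

```python
import pandas as pd

# Toy MSOA -> NHS Trust patient counts, each MSOA labelled with its parent LTLA
flows = pd.DataFrame({
    'msoa': ['M1', 'M1', 'M2', 'M3'],
    'ltla': ['A', 'A', 'A', 'B'],
    'trust_code': ['T1', 'T2', 'T1', 'T2'],
    'patients': [30, 10, 60, 50]})

# Aggregate the MSOA level flows to LTLA -> NHS Trust flows
ltla_trust = flows.groupby(['ltla', 'trust_code'], as_index=False)['patients'].sum()

# Weight: the fraction of an LTLA's patients that flows to each NHS Trust
ltla_trust['weight'] = (ltla_trust['patients']
                        / ltla_trust.groupby('ltla')['patients'].transform('sum'))

weights = ltla_trust.set_index(['ltla', 'trust_code'])['weight']
```

By construction, the weights of each LTLA sum to one.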

<br>

Delete compute DAG diagrams

In [None]:
!rm -f *.pdf

<br>
<br>

### Vaccination Specific Weights

Determining the vaccination-specific multi-granularity patient flow weights.  These differ because the vaccination age brackets differ from the standard 5-year age brackets.

In [None]:
%%bash

python src/vaccinations/interface.py

<br>
<br>

### Design Matrix & Outcome Variables


Estimating coronavirus measures per NHS Trust by transforming LTLA measures into weighted NHS Trust components via the calculated multi-granularity patient flow weights.  Subsequently, a tensor consisting of the raw matrix of independent variable vectors and the outcome vector is constructed.

In [None]:
%%bash

python src/design/interface.py
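
The transformation step can be sketched as a weighted sum: each LTLA measure is multiplied by its LTLA-to-Trust weight, and the resulting components are summed per Trust and date. The frames, column names, and values below are illustrative assumptions; the sketch does not reproduce `src/design`, and it omits the tensor construction.

```python
import pandas as pd

# Illustrative LTLA -> NHS Trust weights
weights = pd.DataFrame({
    'ltla': ['A', 'A', 'B'],
    'trust_code': ['T1', 'T2', 'T2'],
    'weight': [0.9, 0.1, 1.0]})

# Illustrative LTLA level measure for a single date
measures = pd.DataFrame({
    'date': ['2021-01-01', '2021-01-01'],
    'ltla': ['A', 'B'],
    'daily_cases': [200, 40]})

# Each LTLA row fans out into one weighted component per NHS Trust it feeds
components = measures.merge(weights, on='ltla', how='inner')
components['weighted_cases'] = components['weight'] * components['daily_cases']

# Summing the components gives the estimate per NHS Trust and date
estimates = components.groupby(['date', 'trust_code'])['weighted_cases'].sum()
```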

<br>
<br>

## Delete DAG Diagrams

In [None]:
%%bash

rm -f *.pdf