ML-based COVID case nowcasting

Description

TLDR: This project trains a logistic regression model to predict COVID test results from daily vital data (resting heart rate and steps), augmented with survey data (symptoms, sex, and age) and COVID variant shares. The trained model is then used to estimate the COVID case incidence in the user population over time.

Figures in this README are updated daily with the latest data via GitHub Actions.

Usage:

From the project root:

run poetry install to set up the environment and install dependencies,
run make notebook to start Jupyter Notebook in the virtual environment.

Data:

The data consist of daily values for resting heart rate and steps for each user on many (not all) days. In addition, users report approximately once a week which symptoms they experienced and whether they got tested for COVID during the past seven days. If they took a test, they also report the test result. Users also reported their age and sex once.

Feature Construction:

Survey and vital data are coded as follows:

Symptoms: 1 if the user experienced the symptom and 0 if not,
Age: age groups in 5-year brackets starting at 1935 and ending at 2010, labeled as integers from 1 to 16,
Sex: 1 female, 2 male, 3 diverse,
Resting heart rate (rhr): beats per minute, one value per day,
Steps: number of steps, one value per day.
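The coding above can be sketched as follows. This is a minimal illustration, not the project's actual preprocessing: the column names (`birth_year`, `sex`), the raw sex labels, and the helper names are hypothetical, and the sketch assumes that the brackets refer to birth years, with bracket 1 starting at 1935 and bracket 16 starting at 2010.

```python
from typing import List

import pandas as pd

# Hypothetical raw labels; the project's real survey schema may differ.
SEX_CODES = {"female": 1, "male": 2, "diverse": 3}


def encode_age_bracket(year: int, first: int = 1935, last: int = 2010, width: int = 5) -> int:
    """Return the 1-based index of the 5-year bracket that contains `year`.

    Assumes bracket k starts at first + (k - 1) * width, so 1935 -> 1 and 2010 -> 16.
    """
    if not first <= year < last + width:
        raise ValueError(f"year {year} lies outside the covered brackets")
    return (year - first) // width + 1


def encode_survey(raw: pd.DataFrame, symptom_cols: List[str]) -> pd.DataFrame:
    """Encode symptoms as 0/1, age as a bracket index, and sex as 1/2/3."""
    out = pd.DataFrame(index=raw.index)
    for col in symptom_cols:
        out[col] = raw[col].astype(bool).astype(int)
    out["age"] = raw["birth_year"].map(encode_age_bracket)
    out["sex"] = raw["sex"].map(SEX_CODES)
    return out
```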

For vital data we construct the following derived features:

For resting heart rate, we calculate the median over the 60 days before the week for which symptoms and test results were reported, provided these 60 days contain data on more than 30 days. From that median, we subtract the maximum value during the seven days for which test and symptoms were reported (provided the week contains data on 3 or more days). Steps are treated the same way, except that we subtract the mean during the test week instead of the maximum.
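The derived feature described above could be computed roughly like this. The function name, its arguments, and the convention that `week_start` is the first of the seven reported days are assumptions made for illustration; the project's own implementation is not shown in this README.

```python
import pandas as pd


def deviation_from_baseline(daily: pd.Series, week_start: pd.Timestamp, week_stat: str) -> float:
    """Baseline median minus a statistic of the reporting week.

    daily:      one user's values indexed by calendar day (DatetimeIndex);
                days without data are missing or NaN.
    week_start: first day of the seven-day reporting week (assumed convention).
    week_stat:  "max" for resting heart rate, "mean" for steps.
    """
    daily = daily.dropna().sort_index()
    one_day = pd.Timedelta(days=1)

    # The 60 days directly before the reporting week.
    baseline = daily.loc[week_start - 60 * one_day : week_start - one_day]
    # The seven days of the reporting week itself.
    week = daily.loc[week_start : week_start + 6 * one_day]

    # Require data on more than 30 baseline days and on at least 3 week days.
    if len(baseline) <= 30 or len(week) < 3:
        return float("nan")

    return float(baseline.median() - week.agg(week_stat))
```

With this sign convention, a resting heart rate that rises above the baseline during the test week yields a negative feature value.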

Model:

The model consists of a scikit-learn StandardScaler followed by a LogisticRegression classifier:

import pandas as pd
from typing import List
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def train_model(features: List[str], target: str, data: pd.DataFrame):
    # Select the feature columns and the binary target from the data frame.
    X = data[features].values
    y = data[target].astype(int).values

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.33, random_state=42
    )

    model = make_pipeline(StandardScaler(), LogisticRegression())
    model.fit(X_train, y_train)

    return model, X_test, y_test

Model evaluation

To evaluate the model's quality, I use the following plots: A) precision and recall vs. decision threshold, B) true positive and false positive rate vs. decision threshold, C) feature importance as regression coefficients, and D) the confusion matrix.
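The quantities behind these four panels can be obtained from scikit-learn as sketched below. The helper name `evaluation_summary`, the feature names, and the synthetic data are made up for illustration; only the scikit-learn calls themselves are standard, and the plotting is left out.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_recall_curve, roc_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def evaluation_summary(model, X_test, y_test, feature_names, threshold=0.5):
    """Collect the data for panels A) to D) from a fitted scaler + logistic regression pipeline."""
    proba = model.predict_proba(X_test)[:, 1]  # probability of a positive test

    # A) precision and recall for every candidate decision threshold
    precision, recall, pr_thresholds = precision_recall_curve(y_test, proba)
    # B) false positive and true positive rate for every candidate threshold
    fpr, tpr, roc_thresholds = roc_curve(y_test, proba)
    # C) coefficients of the last pipeline step (on standardized features)
    coefficients = dict(zip(feature_names, model[-1].coef_[0]))
    # D) confusion matrix at the chosen threshold
    matrix = confusion_matrix(y_test, (proba >= threshold).astype(int), labels=[0, 1])

    return {
        "precision": precision, "recall": recall, "pr_thresholds": pr_thresholds,
        "fpr": fpr, "tpr": tpr, "roc_thresholds": roc_thresholds,
        "coefficients": coefficients, "confusion_matrix": matrix,
    }


# Synthetic stand-in data (not project data), only to show the call.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)
model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X[:150], y[:150])
summary = evaluation_summary(model, X[150:], y[150:], ["rhr_dev", "steps_dev"])
```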

To nowcast COVID cases, I

score all data (including the data without test results) with the trained model,
classify the resulting infection probabilities with a threshold of 0.5,
exclude positive classifications in weeks after a previous positive classification to only count new cases,
use a rolling average over 7 days, normalize by all observations on each day, and scale the result to the daily incidence per 100,000.
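The steps above can be sketched in pandas as follows. Several details are assumptions, since the README does not spell them out: the column names (`user_id`, `date`, `probability`), the reading of "previous positive classification" as the same user's immediately preceding report, and the order of operations (daily share of new cases first, then the 7-day rolling mean).

```python
import pandas as pd


def nowcast_incidence(scored: pd.DataFrame, threshold: float = 0.5) -> pd.Series:
    """Estimate the daily incidence per 100,000 from model scores.

    scored: one row per user report with the (hypothetical) columns
            user_id, date (datetime64) and probability (model output).
    """
    df = scored.sort_values(["user_id", "date"]).copy()

    # Classify the infection probabilities with the decision threshold.
    df["positive"] = df["probability"] >= threshold

    # A positive counts as a new case only if the same user's previous
    # report was not already classified positive (assumed interpretation).
    prev_positive = df.groupby("user_id")["positive"].shift(1, fill_value=False).astype(bool)
    df["new_case"] = df["positive"] & ~prev_positive

    # Normalize new cases by the number of observations on each day.
    daily = df.groupby("date").agg(
        new_cases=("new_case", "sum"),
        observations=("new_case", "size"),
    )
    share = daily["new_cases"] / daily["observations"]

    # Smooth over 7 calendar days and scale to cases per 100,000.
    return share.rolling("7D").mean() * 100_000


# Tiny made-up example: three users, two reporting days.
reports = pd.DataFrame({
    "user_id": ["a", "a", "b", "b", "c", "c"],
    "date": pd.to_datetime(["2022-01-01", "2022-01-10"] * 3),
    "probability": [0.9, 0.8, 0.2, 0.3, 0.1, 0.1],
})
incidence = nowcast_incidence(reports)
```

In the example, user `a` is positive on both days, but only the first positive is counted as a new case.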

This incidence is compared to reported incidences stratified by vaccination status in the following plot:
