# Back-track people with symptoms using geo-location data to identify the exposed population

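### The code takes the following inputs:
- **location data** of a population, with columns for latitude, longitude, time and user ID
- **target_id** - ID of the infected user
- **optional parameters** - *max_radius* (in meters), *time_window* (in minutes)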

### The code returns:
- **id** - IDs of risky people
- **duration** - Time they spent within the same radius as the infected, in minutes
- **lat_point** and **lng_point** - Coordinates of their location closest to the infected
- **min_dist** - Minimal distance between the infected and the risky individual
- **num_encounters** - Number of encounters with the infected
- **score** - Risk score calculated as *duration(normalized)* × *num_encounters(normalized)* × *min_dist_inverse*
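
The score formula can be sketched in a few lines. How `backtrack` normalizes the duration and the number of encounters is not stated here, so the sketch assumes each is divided by the largest value among the returned individuals; that assumption happens to reproduce the two scores in the sample output shown further down. `risk_scores` is a hypothetical helper, not part of `backtrack`.

```python
def risk_scores(records):
    """Score = duration(normalized) x num_encounters(normalized) x 1/min_dist.

    Assumption: "normalized" means divided by the maximum over all records.
    """
    max_duration = max(r["duration"] for r in records)
    max_encounters = max(r["num_encounters"] for r in records)
    return {
        r["id"]: (r["duration"] / max_duration)
        * (r["num_encounters"] / max_encounters)
        * (1.0 / r["min_dist"])  # inverse of the minimal distance
        for r in records
    }


# Values copied from the sample output of this notebook
records = [
    {"id": 4526, "duration": 45000.0, "num_encounters": 26.0,
     "min_dist": 17.62430757596907},
    {"id": 2102, "duration": 16200.0, "num_encounters": 9.0,
     "min_dist": 21.266039243729303},
]

scores = risk_scores(records)
for user_id, score in scores.items():
    print(user_id, round(score, 6))
```

Under this assumption the individual with the longest duration and the most encounters gets a score equal to the inverse of their minimal distance, so closer and longer contact always ranks higher.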

In [1]:
# First load some necessary packages
import pandas as pd
import datetime
import numpy as np
import math

# Load backtrack
import backtrack

In [2]:
# Load a sample of geo-location data.
# The data must contain four columns: latitude, longitude, time and user ID.

# The file has no header row; column 0 is not needed and is dropped.
data = (
    pd.read_csv('data/locations_sample.csv', header=None)
    .rename(columns={1: 'lat', 2: 'lng', 3: 'time', 4: 'id'})
    .drop(columns=[0])
)
data['time'] = pd.to_datetime(data['time'])
data = data.drop_duplicates()
data.sample(5)

| | lat | lng | time | id |
|---|---|---|---|---|
| 287967 | -18.889738 | -48.300511 | 2018-04-29 00:59:31 | 4407 |
| 76419 | -18.907586 | -48.329319 | 2018-04-01 08:06:02 | 6766 |
| 257171 | -21.169048 | -47.827628 | 2018-04-23 06:05:10 | 9988 |
| 256467 | -18.91565 | -48.281382 | 2018-04-23 07:49:54 | 5673 |
| 19606 | -18.861924 | -48.868697 | 2018-03-04 10:20:49 | 7682 |
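
The column requirement stated in the loading cell can be checked before the data is passed on. The following is a minimal sketch; `check_location_data` is a hypothetical helper that is not part of `backtrack`, and the coordinate range check is an extra sanity test added here as an assumption about what counts as valid input.

```python
import pandas as pd

REQUIRED_COLUMNS = ("lat", "lng", "time", "id")


def check_location_data(df):
    """Raise an error if the frame lacks the expected columns or types."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    if not pd.api.types.is_datetime64_any_dtype(df["time"]):
        raise TypeError("'time' must be a datetime column; use pd.to_datetime")
    lat_ok = df["lat"].between(-90, 90).all()
    lng_ok = df["lng"].between(-180, 180).all()
    if not (lat_ok and lng_ok):
        raise ValueError("coordinates outside the valid latitude/longitude range")
    return df


# Two rows copied from the sample shown above
sample = pd.DataFrame({
    "lat": [-18.889738, -18.907586],
    "lng": [-48.300511, -48.329319],
    "time": ["2018-04-29 00:59:31", "2018-04-01 08:06:02"],
    "id": [4407, 6766],
})
sample["time"] = pd.to_datetime(sample["time"])
check_location_data(sample)
```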


In [3]:
# We provide the following parameters
target_id = 3706  # ID of the infected user
max_radius = 70  # In meters. Encounters farther apart than max_radius are discarded
time_window = 20  # In minutes. Window within which two people must appear at a similar location. The time granularity of the data determines the smallest sensible value.
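
The two thresholds can be read as a pairwise test: two location records count as an encounter when they are at most `max_radius` meters apart and at most `time_window` minutes apart. The sketch below is one way to express that test. It assumes great-circle (haversine) distance on a spherical Earth of radius 6,371 km; `haversine_m` and `is_encounter` are hypothetical helpers, not functions of `backtrack`, whose internal distance method is not shown in this notebook.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical assumption)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_encounter(p, q, max_radius=70, time_window=20):
    """p and q are (lat, lng, datetime) tuples; thresholds in meters and minutes."""
    close_in_time = abs(p[2] - q[2]) <= timedelta(minutes=time_window)
    close_in_space = haversine_m(p[0], p[1], q[0], q[1]) <= max_radius
    return close_in_time and close_in_space


# Illustrative points and timestamps only, not records from the sample file
a = (-19.747628, -47.9341096, datetime(2018, 4, 1, 8, 0))
b = (-19.7476675, -47.9343416, datetime(2018, 4, 1, 8, 10))
c = (-19.7476675, -47.9343416, datetime(2018, 4, 1, 8, 30))

print(is_encounter(a, b))  # within both thresholds
print(is_encounter(a, c))  # same place, but 30 minutes apart
```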

In [4]:
# Run the back-tracking for the infected user
backtrack.get_risky(data, target_id, max_radius, time_window)

[{'id': 4526,
  'duration': 45000.0,
  'lat_point': -19.747628,
  'lng_point': -47.9341096,
  'min_dist': 17.62430757596907,
  'num_encounters': 26.0,
  'score': 0.05673981775962141},
 {'id': 2102,
  'duration': 16200.0,
  'lat_point': -19.7476675,
  'lng_point': -47.9343416,
  'min_dist': 21.266039243729303,
  'num_encounters': 9.0,
  'score': 0.005859830464299074}]

The code returns a list of dictionaries, one per risky individual, with the following keys:
- 'id' - ID of the risky individual
- 'duration' - the total time, in minutes, that the individual was in the vicinity of the infected
- 'lat_point' - latitude of the point where the risky individual was nearest to the infected
- 'lng_point' - longitude of the point where the risky individual was nearest to the infected
- 'min_dist' - minimal distance recorded between the risky individual and the infected individual
- 'num_encounters' - total number of relevant recorded points at which the risky individual was near the infected
- 'score' - the risk score, calculated as duration(normalized) × num_encounters(normalized) × min_dist_inverse
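
Because the result is a plain list of dictionaries, it can be loaded straight into a pandas DataFrame for ranking or export. A minimal sketch using the sample output above; `risky` is just a name chosen here for the return value of `backtrack.get_risky`.

```python
import pandas as pd

# Sample output copied from the call above
risky = [
    {"id": 4526, "duration": 45000.0, "lat_point": -19.747628,
     "lng_point": -47.9341096, "min_dist": 17.62430757596907,
     "num_encounters": 26.0, "score": 0.05673981775962141},
    {"id": 2102, "duration": 16200.0, "lat_point": -19.7476675,
     "lng_point": -47.9343416, "min_dist": 21.266039243729303,
     "num_encounters": 9.0, "score": 0.005859830464299074},
]

# One row per risky individual, highest risk first
ranked = (
    pd.DataFrame(risky)
    .sort_values("score", ascending=False)
    .reset_index(drop=True)
)
print(ranked[["id", "score", "min_dist", "num_encounters"]])
```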