# Back-tracking people with symptoms using geo-location data to identify the exposed population

* The code takes the **location data** of a population as input.
* It returns the **at-risk population**, with a **score** for each user. The score is based on the **number of encounters with the infected user**, the **distance from the infected user**, and the **time spent near the infected user**.
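As an illustration of how the three factors could be combined, here is a minimal sketch. The formula is an assumption made for this example, not the scoring rule used by `backtrack`; the function name `toy_score` and the sample values are hypothetical. The column names mirror those in the output of `get_risky`.

```python
import pandas as pd

def toy_score(encounters: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical score: more encounters, longer exposure and
    shorter distance all push the score up."""
    out = encounters.copy()
    # Closer contacts get a larger weight (min_dist is in meters).
    out["min_dist_inverse"] = 1.0 / out["min_dist"]
    # Assumed combination: a simple product of the three factors.
    raw = out["num_encounters"] * out["duration"] * out["min_dist_inverse"]
    # Scale so the riskiest user gets a score of 1.
    out["score"] = raw / raw.max()
    return out.sort_values("score", ascending=False).reset_index(drop=True)

# Made-up input: one row per user who met the infected user.
example = pd.DataFrame({
    "id": [1, 2],
    "duration": [600.0, 1200.0],   # seconds spent near the infected user
    "min_dist": [10.0, 40.0],      # closest approach, in meters
    "num_encounters": [3, 2],
})
scored = toy_score(example)
print(scored[["id", "score"]])
```

A product means that a user who scores low on any one factor gets a low overall score; a weighted sum would be a reasonable alternative if the factors should contribute independently.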

In [1]:
# First load some necessary packages
import pandas as pd
import datetime
import numpy as np
import math

# Load backtrack
import backtrack

In [2]:
# Load a sample geo-location dataset.
# The data must contain four columns: latitude, longitude, time and user ID.

data = (
    pd.read_csv('data/locations_sample.csv', header=None)  # forward slash works on every OS
    .rename(columns={1: 'lat', 2: 'lng', 3: 'time', 4: 'id'})
    .drop(columns=[0])
)
data['time'] = pd.to_datetime(data['time'])
data = data.drop_duplicates()
data.sample(5)

| | lat | lng | time | id |
|---|---|---|---|---|
| 242288 | -20.596386 | -47.647472 | 2018-04-19 09:16:48 | 9817 |
| 94399 | -18.925207 | -48.279933 | 2018-04-03 20:07:53 | 8207 |
| 423741 | -18.919943 | -48.330219 | 2018-05-26 04:06:06 | 9752 |
| 380005 | -19.870217 | -44.60679 | 2018-05-17 05:38:56 | 10017 |
| 326131 | -18.912178 | -48.285765 | 2018-05-06 22:48:38 | 10624 |


In [3]:
# Set the following parameters
target_id = 3706  # ID of the infected user
max_radius = 70  # In meters. Encounters farther apart than max_radius are discarded.
time_window = 20  # In minutes. Time window within which two people must appear at a similar location to count as an encounter. Choose the minimum time step according to the time granularity of the data.
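To make the roles of `max_radius` and `time_window` concrete, the sketch below shows one way such a filter could work: pair every record of the infected user with every record of other users, keep pairs that are close in time, then keep pairs that are close in space. This is an assumption about the general approach, not the code inside `backtrack`; `haversine_m`, `find_encounters` and the toy data are hypothetical names and values. It relies on `DataFrame.merge(how="cross")`, which requires pandas 1.2 or later.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between points given in degrees."""
    lat1, lng1, lat2, lng2 = map(np.radians, (lat1, lng1, lat2, lng2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

def find_encounters(data, target_id, max_radius, time_window):
    """Return record pairs within time_window minutes and max_radius meters."""
    target = data[data["id"] == target_id]
    others = data[data["id"] != target_id]
    # Every record of other users paired with every record of the target.
    pairs = others.merge(target, how="cross", suffixes=("", "_target"))
    # Keep pairs recorded close together in time.
    dt = (pairs["time"] - pairs["time_target"]).abs()
    pairs = pairs[dt <= pd.Timedelta(minutes=time_window)].copy()
    # Keep pairs recorded close together in space.
    pairs["dist"] = haversine_m(pairs["lat"], pairs["lng"],
                                pairs["lat_target"], pairs["lng_target"])
    return pairs[pairs["dist"] <= max_radius]

# Made-up data: user 1 is infected; only user 2 is close in time and space.
toy = pd.DataFrame({
    "lat": [0.0, 0.0, 0.0, 0.0],
    "lng": [0.0, 0.0001, 0.01, 0.0],
    "time": pd.to_datetime(["2020-01-01 12:00", "2020-01-01 12:05",
                            "2020-01-01 12:05", "2020-01-01 14:00"]),
    "id": [1, 2, 3, 4],
})
toy_encounters = find_encounters(toy, target_id=1, max_radius=70, time_window=20)
print(toy_encounters[["id", "dist"]])
```

The cross join grows with the product of the two record counts, so on a large dataset it would be worth restricting the candidates first, for example by bucketing records by time before pairing them.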

In [6]:
# Run the back-tracking to get the at-risk users and their scores
backtrack.get_risky(data, target_id, max_radius, time_window)

| | id | duration | lat_point | lng_point | min_dist | num_encounters | min_dist_inverse | score |
|---|---|---|---|---|---|---|---|---|
| 0 | 4526 | 45000.0 | -19.747628 | -47.93411 | 17.624308 | 26.0 | 0.05674 | 26.0 |
| 1 | 2102 | 16200.0 | -19.747667 | -47.934342 | 21.266039 | 9.0 | 0.047023 | 2.685162 |
