<a href="https://colab.research.google.com/github/samvatsan/MaskTask/blob/master/MaskTask_Code.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Parts of the Program
(The code may be out of order because it was imported from multiple files.)
1. Raw Mask Data Generator: generates random numbers from a Gaussian distribution (around a mean and standard deviation) to model the data the program would receive from the iButton Hygrochron sensor.
    **This program is located toward the end of the code and is marked by a text block.**

2. Use the generated data to determine whether a mask is on or off.
    **This is located right above the random generation code.** It applies threshold values for temperature and relative humidity inside a mask (data supplied by the random generation function) to determine when a mask is on or off, and lists the readings taken while it is on.

3. Calculate the number of hours the mask is worn (given timestamps for when it was worn).
      This portion is not complete.

4. Create a dataframe listing the names of the people who wore the mask for the required amount of time (or more), and another dataframe listing the people (or MaskIDs) who did not.
    Using a threshold value for how long the mask should be on, we built this program around a sample community of students who were expected to wear their masks for 6 hours (a week). **This code is located toward the beginning of the file.**

    NOTE: All realistically simulated data for humidity and temperature, as well as the calculation of hours and assignment of thresholds were done using research from a study conducted ([link text](https://www.ncbi.nlm.nih.gov/pmc/


Calculation from hours into names (part 4)

In [None]:
# Code to read csv file into Colaboratory:
!pip install -U -q PyDrive
!pip install -U protobuf==3.8.0

# Needed to manipulate data
import pandas as pd

# Needed to convert to numpy to feed keras
import numpy as np

# Needed to access data files from Gdrive
#from pydrive.auth import GoogleAuth
#from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
import gspread

Requirement already up-to-date: protobuf==3.8.0 in /usr/local/lib/python3.6/dist-packages (3.8.0)


In [None]:
#This function returns the contents of a Google Sheets worksheet as a dataframe
def df_from_sheet(link, sheet_name):
  wb = gc.open_by_url(link)
  sheet = wb.worksheet(sheet_name)
  data = sheet.get_all_values()
  df = pd.DataFrame(data)

  # Move first row data to header row
  new_header = df.iloc[0] #grab the first row for the header
  df = df[1:] #take the data less the header row
  df.columns = new_header #set the header row as the df header
  # Strip spaces from column headers
  df = df.rename(columns=lambda x: x.strip())

  return df
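
To see what the header handling does without a Google Sheets connection, the sketch below applies the same steps to a small in-memory list of rows, shaped like the list of lists that `get_all_values()` returns. The helper name `df_from_rows` and the sample rows are assumptions made for illustration only.

```python
import pandas as pd

def df_from_rows(data):
    """Hypothetical helper: same header handling as df_from_sheet, minus the sheet access."""
    df = pd.DataFrame(data)
    new_header = df.iloc[0]   # first row holds the column names
    df = df[1:]               # keep only the data rows (index now starts at 1)
    df.columns = new_header
    # Strip stray spaces from the column names
    df = df.rename(columns=lambda x: x.strip())
    return df

# Made-up rows; every cell is a string, as it is when read from a sheet
sample_rows = [[" Name", "Hours "], ["A", "7.9"], ["B", "6.5"]]
sample_df = df_from_rows(sample_rows)
print(list(sample_df.columns))
print(sample_df["Hours"].tolist())
```

Because the cells arrive as strings, numeric columns still need `pd.to_numeric` before any comparison against a threshold.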


In [None]:
#function that returns the rows whose value in col_name is less than threshold_value
  #the threshold is supplied by the caller, so it is not always the minimum time for wearing a mask
  #(the sensor readings use different thresholds)
def get_all_below_threshold(df, col_name, threshold_value):
  return df[df[col_name] < threshold_value].dropna()

#function that returns the rows whose value in col_name is greater than threshold_value
def get_all_above_threshold(df, col_name, threshold_value):
  return df[df[col_name] > threshold_value].dropna()

def get_good_rows(df, col1, thresh1, col2, thresh2):
  df1 = get_all_above_threshold(df, col1, thresh1)
  df2 = get_all_above_threshold(df, col2, thresh2)
  return pd.concat([df1, df2]).drop_duplicates().reset_index(drop=True)
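
`get_good_rows` concatenates the two filtered tables, so it keeps a row when either reading is above its threshold. If the intent is to require both conditions at once, a boolean mask is one option. This is a sketch under that assumption: `get_rows_meeting_both` is a hypothetical name, and the four sample readings are copied from the raw mask data shown in this notebook.

```python
import pandas as pd

def get_rows_meeting_both(df, col1, thresh1, col2, thresh2):
    """Hypothetical variant: keep rows where BOTH columns exceed their thresholds."""
    mask = (df[col1] > thresh1) & (df[col2] > thresh2)
    return df[mask].reset_index(drop=True)

readings = pd.DataFrame({
    "MaskID": ["A", "A", "A", "B"],
    "Timestamp": ["7:10", "7:30", "7:40", "7:00"],
    "RelativeHumidity": [62.1, 73.0, 70.9, 90.0],
    "Temperature": [25.6, 23.8, 30.5, 23.5],
})

both = get_rows_meeting_both(readings, "RelativeHumidity", 65, "Temperature", 24.5)
print(both)
```

Only the 7:40 reading for mask A passes both checks here; the other three pass one check each and would all be kept by `get_good_rows`.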

In [None]:
# Authenticate the user and create the gspread client used to read the Google Sheets data
auth.authenticate_user()
gc = gspread.authorize(GoogleCredentials.get_application_default())


In [None]:
# Load Hours data
durationDF = df_from_sheet('https://docs.google.com/spreadsheets/d/1kJVCIEfbbdKgTj3Qp7Bzu5GnSlht00gYOJswEeK9Nf8/edit?usp=sharing', 'Person Mask hours')
list(durationDF.columns.values)

['Name', 'Hours']

In [None]:
#class of ANSI escape codes used to color and style the printed text
class color:
   PURPLE = '\033[95m'
   CYAN = '\033[96m'
   DARKCYAN = '\033[36m'
   BLUE = '\033[94m'
   GREEN = '\033[92m'
   YELLOW = '\033[93m'
   RED = '\033[91m'
   BOLD = '\033[1m'
   UNDERLINE = '\033[4m'
   END = '\033[0m'


In [None]:
# Convert the Hours column to a numeric type (values read from Google Sheets are strings by default) so we can compute statistics on it
durationDF[["Hours"]] = durationDF[["Hours"]].apply(pd.to_numeric)
# Sort by Hours in descending order
sorted_df = durationDF.sort_values(["Hours"], ascending=False)
print(sorted_df)

# People should wear the mask for at least THRESHOLD_HOURS to prevent contagious transmission
THRESHOLD_HOURS = 6

# Offenders are those who wore a mask for less than the threshold value
offenders = get_all_below_threshold(durationDF, "Hours", THRESHOLD_HOURS)

# Do-gooders are those who wore the mask for more than the threshold value.
# Both helpers use strict comparisons, so someone at exactly THRESHOLD_HOURS lands in neither group.
goodfolks = get_all_above_threshold(durationDF, "Hours", THRESHOLD_HOURS)

print(color.RED + color.BOLD + 'PRINCIPAL\'S NOTICE BOARD POST OF OFFENDERS' + color.END)
print(offenders["Name"])
print(color.DARKCYAN + color.BOLD + 'PRINCIPAL\'S NOTICE BOARD POST OF DO-GOODERS' + color.END)
print(goodfolks)


0  Name  Hours
4     D    8.2
1     A    7.9
8     H    7.5
9     I    7.2
2     B    6.5
5     E    6.0
6     F    5.6
3     C    4.0
7     G    3.8
10    J    1.0
[91m[1mPRINCIPAL'S NOTICE BOARD POST OF OFFENDERS[0m
3     C
6     F
7     G
10    J
Name: Name, dtype: object
[36m[1mPRINCIPAL'S NOTICE BOARD POST OF DO-GOODERS[0m
0 Name  Hours
1    A    7.9
2    B    6.5
4    D    8.2
8    H    7.5
9    I    7.2
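
In the output above, E (exactly 6.0 hours) appears in neither list, because both helper functions use strict comparisons. The following is a minimal sketch of an inclusive split, assuming that someone who reaches exactly `THRESHOLD_HOURS` should count as compliant. `split_by_threshold` is a hypothetical helper, and the hours are copied from the printed table.

```python
import pandas as pd

THRESHOLD_HOURS = 6

def split_by_threshold(df, col_name, threshold_value):
    """Hypothetical helper: return (below, at_or_above); every row lands in exactly one group."""
    meets = df[col_name] >= threshold_value
    return df[~meets], df[meets]

hours = pd.DataFrame({
    "Name": list("ABCDEFGHIJ"),
    "Hours": [7.9, 6.5, 4.0, 8.2, 6.0, 5.6, 3.8, 7.5, 7.2, 1.0],
})

below, at_or_above = split_by_threshold(hours, "Hours", THRESHOLD_HOURS)
print(below["Name"].tolist())
print(at_or_above["Name"].tolist())
```

Building both groups from one boolean mask keeps them complementary, so no one can be dropped from both lists.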


Print raw mask data and classify mask as on or off (if someone is wearing it or not) (part 2)


In [None]:
# Load Mask data
rawDF = df_from_sheet('https://docs.google.com/spreadsheets/d/1kJVCIEfbbdKgTj3Qp7Bzu5GnSlht00gYOJswEeK9Nf8/edit?usp=sharing', 'Raw Mask Data')
#list(rawDF.columns.values)
rawDF

Unnamed: 0,MaskID,Timestamp,RelativeHumidity,Temperature
1,A,7:00,55.2,19.7
2,A,7:10,62.1,25.6
3,A,7:20,63.0,25.8
4,A,7:30,73.0,23.8
5,A,7:40,70.9,30.5
6,A,7:50,62.0,27.4
7,A,8:00,60.0,20.3
8,A,9:00,55.0,21.3
9,A,10:00,54.9,31.4
10,B,7:00,90.0,23.5


In [None]:
rawDF[["RelativeHumidity"]] = rawDF[["RelativeHumidity"]].apply(pd.to_numeric)
rawDF[["Temperature"]] = rawDF[["Temperature"]].apply(pd.to_numeric)
mask_on_humidity = get_all_above_threshold(rawDF, "RelativeHumidity", 65)
mask_on_humidity
#mask_on

#Threshold values (25th percentiles) for a mask that is on: temperature 24.5 (Celsius); relative humidity 65 (percent)
#Before dropping the values that fall below the threshold, classify them as "off" (append to a list called off)
#Similarly, append the values that are above the threshold to a list called "on"
#Generate random floats within a range to fill the chart with random data

#FOR LATER: maybe find get all below thresh and then subtract the amount of time that they are  


Unnamed: 0,MaskID,Timestamp,RelativeHumidity,Temperature
4,A,7:30,73.0,23.8
5,A,7:40,70.9,30.5
10,B,7:00,90.0,23.5
12,B,9:00,71.2,22.6
13,B,10:00,92.3,35.6
14,B,11:00,80.0,32.3
15,B,12:00,85.4,18.7
16,B,13:00,91.9,15.6


In [None]:
#call get_all_above_threshold function to test if it works
mask_on_temp = get_all_above_threshold(rawDF, "Temperature", 24.5)
mask_on_temp

Unnamed: 0,MaskID,Timestamp,RelativeHumidity,Temperature
2,A,7:10,62.1,25.6
3,A,7:20,63.0,25.8
5,A,7:40,70.9,30.5
6,A,7:50,62.0,27.4
9,A,10:00,54.9,31.4
13,B,10:00,92.3,35.6
14,B,11:00,80.0,32.3


In [None]:
#returns every row where the humidity OR the temperature is above its threshold (get_good_rows concatenates both filtered tables)
two_tables = get_good_rows(rawDF, "RelativeHumidity", 65, "Temperature", 24.5)
two_tables

Unnamed: 0,MaskID,Timestamp,RelativeHumidity,Temperature
0,A,7:30,73.0,23.8
1,A,7:40,70.9,30.5
2,B,7:00,90.0,23.5
3,B,9:00,71.2,22.6
4,B,10:00,92.3,35.6
5,B,11:00,80.0,32.3
6,B,12:00,85.4,18.7
7,B,13:00,91.9,15.6
8,A,7:10,62.1,25.6
9,A,7:20,63.0,25.8
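
Part 3 (hours worn) is listed as incomplete. One possible approach, sketched here under stated assumptions rather than as the project's method, is to treat each reading as covering the interval until the next reading for the same mask, then sum the intervals whose starting reading is classified as on. The function name `estimate_hours_worn`, the "on" rule (either threshold exceeded, matching `get_good_rows`), and the calendar date attached to the times are all assumptions. The readings are those of mask A in the raw data above.

```python
import pandas as pd

def estimate_hours_worn(df, rh_thresh=65, temp_thresh=24.5):
    """Hypothetical estimate of hours worn per MaskID from timestamped readings."""
    df = df.copy()
    df["Timestamp"] = pd.to_datetime(df["Timestamp"])
    df = df.sort_values(["MaskID", "Timestamp"])
    # A reading counts as "on" if either threshold is exceeded (same rule as get_good_rows)
    df["on"] = (df["RelativeHumidity"] > rh_thresh) | (df["Temperature"] > temp_thresh)
    # Time until the next reading of the same mask; the last reading of each mask has none (NaT)
    next_ts = df.groupby("MaskID")["Timestamp"].shift(-1)
    df["interval_h"] = (next_ts - df["Timestamp"]).dt.total_seconds() / 3600
    # Sum the intervals that start with an "on" reading; missing intervals are skipped
    return df[df["on"]].groupby("MaskID")["interval_h"].sum()

times = ["07:00", "07:10", "07:20", "07:30", "07:40", "07:50", "08:00", "09:00", "10:00"]
readings = pd.DataFrame({
    "MaskID": ["A"] * 9,
    "Timestamp": ["2020-07-24 " + t for t in times],  # date is an assumption
    "RelativeHumidity": [55.2, 62.1, 63.0, 73.0, 70.9, 62.0, 60.0, 55.0, 54.9],
    "Temperature": [19.7, 25.6, 25.8, 23.8, 30.5, 27.4, 20.3, 21.3, 31.4],
})

hours_worn = estimate_hours_worn(readings)
print(hours_worn)
```

With these readings the five "on" intervals between 7:10 and 8:00 add up to 50 minutes. The final 10:00 reading is "on" but contributes nothing because no later reading bounds it, and a mask with no "on" readings would be absent from the result.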


Random Mask Data Generation (part 1) -- we do not have access to a sensor, so it was more feasible to generate our own data

(copied from other Colab file)

In [None]:
# Code to read csv file into Colaboratory:
#!pip install -U -q PyDrive
#!pip install -U protobuf==3.8.0
!pip install -U gspread

# Needed to manipulate data
import pandas as pd

# Needed to convert to numpy to feed keras
import numpy as np

# Needed to access data files from Gdrive
#from pydrive.auth import GoogleAuth
#from pydrive.drive import GoogleDrive
from google.colab import auth
import gspread
from oauth2client.client import GoogleCredentials
import datetime
import random


Requirement already up-to-date: gspread in /usr/local/lib/python3.6/dist-packages (3.6.0)


In [None]:
# Authenticate the user and create the gspread client used to read the Google Sheets data
auth.authenticate_user()
gc = gspread.authorize(GoogleCredentials.get_application_default())

In [None]:
LOW_RH = 29
HIGH_RH = 93
LOW_TEMP = 21
HIGH_TEMP = 37
NUM_STUDENTS = 10
NUM_SAMPLES = 100
START_TIME = datetime.datetime(2020, 7, 24, 8, 0, 0)
END_TIME = datetime.datetime(2020, 7, 24, 15, 0, 0)
initial_data = [[1, START_TIME, LOW_RH, LOW_TEMP]]
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

# Mask has been calibrated at the beginning of the day, so every student starts at the baseline readings.
# Rows are collected in a list because DataFrame.append was removed in pandas 2.0.
baseline_rows = [
    {'StudentID': student_id, 'Timestamp': START_TIME.strftime(TIME_FORMAT), 'RelativeHumidity': LOW_RH, 'Temperature': LOW_TEMP}
    for student_id in range(1, NUM_STUDENTS + 1)
]
df = pd.DataFrame(baseline_rows, columns=['StudentID', 'Timestamp', 'RelativeHumidity', 'Temperature'])
df

Unnamed: 0,StudentID,Timestamp,RelativeHumidity,Temperature
0,1,2020-07-24 08:00:00,29,21
1,2,2020-07-24 08:00:00,29,21
2,3,2020-07-24 08:00:00,29,21
3,4,2020-07-24 08:00:00,29,21
4,5,2020-07-24 08:00:00,29,21
5,6,2020-07-24 08:00:00,29,21
6,7,2020-07-24 08:00:00,29,21
7,8,2020-07-24 08:00:00,29,21
8,9,2020-07-24 08:00:00,29,21
9,10,2020-07-24 08:00:00,29,21


In [None]:
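prev_time = START_TIME
random.seed(1)  # fixed seed so the simulated readings are reproducible
rows = []
for student_id in range(1, NUM_STUDENTS + 1):
  # NUM_SAMPLES evenly spaced timestamps, from one hour after START_TIME through END_TIME
  for ts in pd.date_range(start=START_TIME + datetime.timedelta(hours=1), end=END_TIME, periods=NUM_SAMPLES):
    rh = round(random.gauss(70, 17), 1)   # relative humidity: mean 70, standard deviation 17
    temp = round(random.gauss(25, 5), 1)  # temperature: mean 25, standard deviation 5
    rows.append({'StudentID': student_id, 'Timestamp': ts.strftime(TIME_FORMAT), 'RelativeHumidity': rh, 'Temperature': temp})

# DataFrame.append was removed in pandas 2.0, so add all simulated rows in one concat
df = pd.concat([df, pd.DataFrame(rows)], ignore_index=True)

df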

Unnamed: 0,StudentID,Timestamp,RelativeHumidity,Temperature
0,1,2020-07-24 08:00:00,29,21
1,2,2020-07-24 08:00:00,29,21
2,3,2020-07-24 08:00:00,29,21
3,4,2020-07-24 08:00:00,29,21
4,5,2020-07-24 08:00:00,29,21
...,...,...,...,...
1005,10,2020-07-24 14:45:27,83.8,27.8
1006,10,2020-07-24 14:49:05,59.9,26.1
1007,10,2020-07-24 14:52:43,38.4,28.1
1008,10,2020-07-24 14:56:21,80.9,25.7
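
`random.gauss` is unbounded, so an occasional sample can fall outside the `LOW_RH` to `HIGH_RH` and `LOW_TEMP` to `HIGH_TEMP` ranges defined earlier, which the generation loop does not otherwise enforce. If the simulated readings should stay within those ranges, one option is to clamp each sample. This is a sketch of that idea, not what the notebook does; `clamp` and `simulate_reading` are hypothetical helpers.

```python
import random

LOW_RH, HIGH_RH = 29, 93
LOW_TEMP, HIGH_TEMP = 21, 37

def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))

def simulate_reading(rng):
    """Hypothetical helper: one (humidity, temperature) sample kept within the configured ranges."""
    rh = clamp(round(rng.gauss(70, 17), 1), LOW_RH, HIGH_RH)
    temp = clamp(round(rng.gauss(25, 5), 1), LOW_TEMP, HIGH_TEMP)
    return rh, temp

rng = random.Random(1)  # separate generator so the global random state is untouched
samples = [simulate_reading(rng) for _ in range(1000)]
print(min(s[0] for s in samples), max(s[0] for s in samples))
print(min(s[1] for s in samples), max(s[1] for s in samples))
```

Clamping piles samples up at the boundaries (a noticeable share of temperatures would sit at exactly `LOW_TEMP`), so resampling until a value falls in range may suit some uses better.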
