# Exposure Analysis with pre-created Activity Space Rasters (KDE/DR)

////////////////////////////////////////////////////////////////////////////////////
#### [Metadata]
##### Author: Jay (Jiue-An) Yang, @JiueAnYang
##### Organization: Health Data at Scale Collaboratory, City of Hope
##### Last Updated: Dec 20, 2023
##### Latest Run On: Jun 30, 2022
////////////////////////////////////////////////////////////////////////////////////
***

#### [Requirements]
##### 1. A file directory/.gdb containing pre-generated **[Activity Space Rasters]** from [GPS points]
##### 2. A file directory/.gdb with **[raster files]** of environmental variables that need to be processed
##### 3. A polygon of the research area boundary, used to restrict the exposure calculation to that area

---
#### [Update Notes] 
#####  2021-10-16: 1. Commented out the step where the KDE-Exposure layers are saved to a new GDB

***
### Model Workflow as shown using ArcGIS Model Builder

![Model workflow in ArcGIS Model Builder](step2_Calculate_Exposure_withActivitySpace_Raster)

---
## Step 1: Parameter Setup

In [1]:
### Import required modules
import arcpy
from arcpy import env
from arcpy.sa import *
arcpy.CheckOutExtension("Spatial")

import glob
import numpy as np
import pandas as pd
from tqdm.notebook import tqdm
from IPython.display import clear_output

### Set environment options
arcpy.env.overwriteOutput = True

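### Specify spatial reference/projection for the analysis and the outputs
### (note: spatial_ref is not applied anywhere below; set arcpy.env.outputCoordinateSystem = spatial_ref if outputs should use it)
spatial_ref = arcpy.SpatialReference('North America Albers Equal Area Conic')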


### ----------------------------------- ###
### -----  Set Project Parameters ----- ###  

### Define whether this is a 'Daily' or 'Total' exposure calculation
expo_type = 'Daily'             # options: 'Total' or 'Daily'

dataset = 'MENU'                # Set the study name (PQ/RFH/MENU...etc); Step 2.1 matches 'RFH' and 'MENU' exactly (uppercase)
running_method = 'DR'           # Set the method that you are running (KDE/DR/Buffer...etc)
point_type = 'AllPoint'         # Set the point type that you are running (AllPoint/Stationary/Vehicle/Walking...etc)
exposure_name = 'NDVI2014'      # Set the exact name of the exposure raster in ENV_raster_dir (Walkability/RecreationCt/NDVI2014/PM252013...etc)
runDate = '20221015'            # Record the run date (YYYYMMDD); used in the output table names
scale = 200                     # The search radius that was used to create the Activity Space Rasters


### -------------------------------- ###
### -----  Set Input Data Path ----- ###  

### Project directory and project workspace (the main GDB)
project_default_workspace = r"C:/Users/jiyang/Documents/ArcGIS/Projects/TWSA_PA/TWSA_PA.gdb/" 
project_dir = r"C:/Users/jiyang/Documents/ArcGIS/Projects/TWSA_PA/"
env.workspace = project_default_workspace

### Set the path of the activity space rasters (should be a GDB with rasters)
activity_raster_dir = r"C:/Users/jiyang/Documents/ArcGIS/Projects/TWSA_PA/TWSA_PA_ActivitySpace_PQ_KDE_200r_50c.gdb/"  

### Set the path of the GDB where the environmental raster is stored 
ENV_raster_dir = r"C:/Users/jiyang/Documents/ArcGIS/Projects/TWSA_PA/TWSA_PA_Exposure_Layers.gdb/"

### Set path to some specific layers (research area...etc)
research_area = r"C:/Users/jiyang/Documents/ArcGIS/Projects/TWSA_PA/TWSA_PA_Exposure_Layers.gdb/SoCal_CatchmentArea_Boundary_proj"    

### (Optional) Output GDB for saving the Activity Space x Exposure rasters (NOT recommended if storage space is limited)
# KDE_Exposure_output_dir = r"C:/Users/jiyang/Documents/ArcGIS/Projects/TWSA_PA/ActivitySpace_withExposure_.gdb/" 


### ------------------------------- ###
### -----  Set output paths   ----- ###  

## Set the output folder, where the exposure results will be saved
output_dir = r"C:/Users/jiyang/Documents/ArcGIS/Projects/TWSA_PA/Outputs/"

## Set name for final output stored as a Table in the geodatabase
final_output_table = "Exposure_Table_" + expo_type + '_' + exposure_name + '_' + dataset + '_' + running_method + '_' + str(scale) + '_' + runDate

## Set name for final output stored as a .csv file in output folder (same name as the table)
final_output_table_csv_name = final_output_table + ".csv"
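The table and CSV names above follow one pattern: the parameters joined with underscores behind an `Exposure_Table` prefix, with `.csv` appended for the file. A minimal sketch of that naming rule as a function, assuming the same parameter order as the cell above (`build_output_names` is a hypothetical helper, not part of the original notebook):

```python
def build_output_names(expo_type, exposure_name, dataset, running_method, scale, run_date):
    """Return (gdb_table_name, csv_file_name) following the notebook's naming pattern."""
    parts = ["Exposure_Table", expo_type, exposure_name, dataset,
             running_method, str(scale), run_date]
    table_name = "_".join(parts)
    return table_name, table_name + ".csv"

table_name, csv_name = build_output_names('Daily', 'NDVI2014', 'MENU', 'DR', 200, '20221015')
print(table_name)  # Exposure_Table_Daily_NDVI2014_MENU_DR_200_20221015
print(csv_name)    # Exposure_Table_Daily_NDVI2014_MENU_DR_200_20221015.csv
```

Keeping the rule in one place means the GDB table and the CSV can never drift apart when a parameter is renamed.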

---
## Step 2: Pre-Processing, Check Data and Exposure Layer

### 2.1 Check if input directory contains valid data

In [2]:
### Check if directory contains valid data 

## Set the current workspace to Raster gdb
arcpy.env.workspace = activity_raster_dir

## Get the Activity Space raster list (ListRasters returns None if the workspace is invalid)
activity_raster_list = arcpy.ListRasters("*") or []
total_i = len(activity_raster_list)

## Placeholder for the parsed participant IDs (and dates, for Daily runs), one entry per raster
pt_rasters_lists = []  

if expo_type not in ('Daily', 'Total'):
    print ("Check your expo_type parameter! It must be 'Daily' or 'Total'.")
elif total_i > 0:
    print ("There is a total of {} raster files to be processed in the directory/geodatabase.".format(total_i))   
    for img in activity_raster_list:
        pt_id = img.split("_")[0]
        if expo_type == 'Daily':
            ## The date's token position is customized to the RfH/MENU study datasets; change it to match your raster names
            if dataset in ('RFH', 'MENU') and running_method == 'KDE':
                img_date = img.split("_")[5]
            else:
                img_date = img.split("_")[3]
            pt_rasters_lists.append([pt_id, img_date])
        else:
            pt_rasters_lists.append(pt_id)
else:
    print ("Warning! No input rasters were found in 'activity_raster_dir'; double check your path and data.")
    print ("-"*30)
    print ("Your current 'activity_raster_dir' is {}".format(activity_raster_dir))
    
# Set environment back to project default
arcpy.env.workspace = project_default_workspace

There is a total of 2613 raster files to be processed in the directory/geodatabase.
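Step 2.1 relies on the raster names encoding the participant ID as the first underscore-separated token and, for daily rasters, the date at token index 5 (RFH/MENU KDE rasters) or index 3 (all other cases). A stdlib sketch of that parsing rule for a single name follows; the function name and the example raster names are hypothetical, constructed only to match those token positions:

```python
def parse_activity_raster_name(name, expo_type, dataset, running_method):
    """Mirror the Step 2.1 parsing rule for one activity space raster name.

    Returns [pt_id, date] for 'Daily' runs and pt_id for 'Total' runs.
    """
    tokens = name.split("_")
    pt_id = tokens[0]
    if expo_type == 'Total':
        return pt_id
    if expo_type != 'Daily':
        raise ValueError("expo_type must be 'Daily' or 'Total', got {!r}".format(expo_type))
    # RFH/MENU KDE rasters carry extra tokens before the date (index 5); others use index 3
    date_index = 5 if dataset in ('RFH', 'MENU') and running_method == 'KDE' else 3
    return [pt_id, tokens[date_index]]

# Hypothetical names, constructed only to show the token positions
print(parse_activity_raster_name("P001_DR_200_20210105", 'Daily', 'MENU', 'DR'))                  # ['P001', '20210105']
print(parse_activity_raster_name("P001_KDE_200r_50c_AllPoint_20210105", 'Daily', 'MENU', 'KDE'))  # ['P001', '20210105']
print(parse_activity_raster_name("P001_DR_200_Total", 'Total', 'MENU', 'DR'))                     # P001
```

Raising on an unknown `expo_type`, instead of printing and moving on, is a deliberate choice in this sketch: it keeps the parsed list from silently falling out of step with the raster list.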


### 2.2 Confirm exposure layers (raster format) are in place

In [5]:
## Get the list of ENV layers ['name','path'], to make sure the exposure rasters are there

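## Set the current workspace to ENV gdb
arcpy.env.workspace = ENV_raster_dir

## Get the ENV list (ListRasters returns None if the workspace is invalid)
rasters = arcpy.ListRasters("*") or []
env_rasters_names = []
for env_raster in rasters:
    env_rasters_names.append(env_raster)
    print ([env_raster, ENV_raster_dir + env_raster])
    
print ("-"*90)
print ("There is a total of {} ENV rasters in the geodatabase: {}".format(len(rasters), env_rasters_names))

## Make sure the exposure layer selected in Step 1 is among them
if exposure_name not in env_rasters_names:
    print ("Warning! Exposure layer '{}' was not found in {}".format(exposure_name, ENV_raster_dir))
    
## Set environment back to project workspace
arcpy.env.workspace = project_default_workspace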

['RecreationCt', 'C:/Users/jiyang/Documents/ArcGIS/Projects/TWSA_PA/TWSA_PA_Exposure_Layers_200.gdb/RecreationCt']
['Walkability', 'C:/Users/jiyang/Documents/ArcGIS/Projects/TWSA_PA/TWSA_PA_Exposure_Layers_200.gdb/Walkability']
['NDVI2014', 'C:/Users/jiyang/Documents/ArcGIS/Projects/TWSA_PA/TWSA_PA_Exposure_Layers_200.gdb/NDVI2014']
------------------------------------------------------------------------------------------
There is a total of 3 ENV rasters in the geodatabase: ['RecreationCt', 'Walkability', 'NDVI2014']
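Because the loop in Step 3.1 is long-running (the recorded run took roughly 13,950 seconds), it can pay to check the inputs gathered in Steps 2.1 and 2.2 before starting it. A minimal stdlib sketch of such a pre-flight check; `preflight_check` is a hypothetical helper that takes plain Python lists in place of the `arcpy` results:

```python
def preflight_check(exposure_name, env_raster_names, parsed_entries, activity_rasters):
    """Return a list of human-readable problems; an empty list means good to go."""
    problems = []
    if exposure_name not in env_raster_names:
        problems.append("exposure layer '{}' not found; available: {}".format(
            exposure_name, env_raster_names))
    if not activity_rasters:
        problems.append("no activity space rasters found")
    elif len(parsed_entries) != len(activity_rasters):
        # Step 3.1 pairs the two lists with zip(), which matches by position and
        # silently stops at the shorter list, so the lengths must agree
        problems.append("parsed {} entries for {} rasters".format(
            len(parsed_entries), len(activity_rasters)))
    return problems

# ENV names taken from the Step 2.2 output above; the activity raster name is hypothetical
print(preflight_check('NDVI2014', ['RecreationCt', 'Walkability', 'NDVI2014'],
                      [['P001', '20210105']], ['P001_DR_200_20210105']))  # []
```

In the notebook, the inputs would correspond to `exposure_name`, `env_rasters_names`, `pt_rasters_lists`, and `activity_raster_list`.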


---
## Step 3: Calculate Exposure 

### 3.0 (Optional) Run this block to clear the output table before running the workflow

In [None]:
## Check the name of the exposure output table (in GDB) 
print (final_output_table)

In [7]:
# Set workspace back to the project default
arcpy.env.workspace = project_default_workspace

### Clear final output table if there is data inside
if arcpy.Exists(final_output_table):
    arcpy.DeleteRows_management(final_output_table)
    ## Also clear the intermediate per-participant table, if it exists
    if arcpy.Exists("Exposure_Table_" + exposure_name):
        arcpy.DeleteRows_management("Exposure_Table_" + exposure_name)
    print ("Output table --> {} <-- in geodatabase is cleared.".format(final_output_table))
else:
    print ("Output table not in the geodatabase, good to go!")

Output table not in the geodatabase, good to go!


### 3.1 Loop through Activity Space and calculate Exposure

In [9]:
import time
from datetime import datetime
start = time.time()

### Define a log file, to keep records 
now = datetime.now()
dt_string = now.strftime("%d-%m-%Y-%H-%M-%S")
log_file_name = "log_" + dataset + '_' + dt_string + ".txt"

### Initiate the run, with the log file recording messages
with open(output_dir + log_file_name, "a") as f:
    
    ### Loop through all Activity Rasters in the data directory
    i = 0
    total_i = len(activity_raster_list)
    for item in tqdm(list(zip(pt_rasters_lists, activity_raster_list))):   ### Full run with a tqdm progress bar
    
        clear_output(wait=True)
        
        ## Get the PT ID (and date, for daily) that is being processed
        if expo_type == 'Daily':
            pt_ID = item[0][0]
            date = item[0][1]
            pt_label = "{}-{}".format(pt_ID, date)
        else:
            pt_ID = item[0]
            pt_label = pt_ID
            
        raster_file = item[1]

        ## Build the progress message (`date` only exists for Daily runs)
        msg = "Working on : {pt} ({index}/{total})".format(pt = pt_label, index = i+1, total = total_i)
        print (msg)

        try: 
            ### -------------------------------------------------------------
            ### Step 1: Process: Raster Calculator (Raster Calculator) (sa)
            ### -------------------------------------------------------------
            weighted_exposure = Raster(ENV_raster_dir + exposure_name) * Raster(activity_raster_dir + raster_file)
            ### Uncomment these two lines to save the Exposure Output Raster into another geodatabase
            ### (requires KDE_Exposure_output_dir from Step 1; Daily runs only, since the name uses `date`)
            # output_raster_name = "KDE_Exposure_" + exposure_name + "_" + pt_ID + "_" + date 
            # weighted_exposure.save(KDE_Exposure_output_dir + output_raster_name) 
            print ("Step 1: Weighted exposure computed")

            
            ### --------------------------------------------------------
            ### Step 2: Calculate statistics for the participant
            ### --------------------------------------------------------
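            ### One row of statistics per research-area polygon; the relative table name resolves to the project GDB (env.workspace)
            exposure_statistics_table = ZonalStatisticsAsTable(research_area, "OBJECTID", weighted_exposure, "Exposure_Table_" + exposure_name, "DATA", "ALL")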
            print ("Step 2: Exposure statistics computed")

            
            ### --------------------------------------------------------
            ### Step 3: Add new fields to the table
            ### --------------------------------------------------------

            ### Add PT_ID to the field
            ### (pass each value itself as a literal expression, rather than a variable name,
            ###  so CalculateField does not depend on the notebook's namespace)
            arcpy.AddField_management("Exposure_Table_" + exposure_name, "PT_ID", "TEXT", "","", 100, "", "NULLABLE", "","")
            arcpy.CalculateField_management("Exposure_Table_" + exposure_name, "PT_ID", "'{}'".format(pt_ID), "PYTHON3")    
            
            ### Add DATE to the field if this is for DAILY exposure
            if expo_type == 'Daily':
                arcpy.AddField_management("Exposure_Table_" + exposure_name, "date_int", "LONG", "","", 100, "", "NULLABLE", "","")
                arcpy.CalculateField_management("Exposure_Table_" + exposure_name, "date_int", str(int(date)), "PYTHON3")    

            ### Add exposure_layer name to the field
            arcpy.AddField_management("Exposure_Table_" + exposure_name, "exposure_layer", "TEXT", "","", 100, "", "NULLABLE", "","")
            arcpy.CalculateField_management("Exposure_Table_" + exposure_name, "exposure_layer", "'{}'".format(exposure_name), "PYTHON3")    
            print ("Step 3: Additional fields added to output table.")
            
            ### ------------------------------------------------------------------------
            ### Step 4: Append the participant results table to the final output table
            ### ------------------------------------------------------------------------
            if arcpy.Exists(final_output_table):
                arcpy.Append_management("Exposure_Table_" + exposure_name, final_output_table, "NO_TEST")
                print ("Step 4: Exposure results for [", pt_ID, "] added to output table")
            else:
                arcpy.Copy_management("Exposure_Table_" + exposure_name, final_output_table)
                print ("Step 4: Output table created with results for [", pt_ID, "]")

        except Exception as e:
            ### Write an error message to the log file to record the PT-Date that was not processed
            if expo_type == 'Daily':
                msg = "\nExposure was not calculated for : {pt}-{dt} ({err})".format(pt = pt_ID, dt = date, err = e)
            else:
                msg = "\nExposure was not calculated for : {pt} ({err})".format(pt = pt_ID, err = e)
            f.write(msg)

        i+=1

    # --------------------------------------------------------
    # Step 5: Convert Table to Pandas Dataframe
    # --------------------------------------------------------
    
    ### Get column names from the result table (`fld` avoids reusing the log file handle's name `f`)
    field_names = [fld.name for fld in arcpy.ListFields(final_output_table)]
    
    ### Convert table to NP Array, and convert NP Array to Pandas Dataframe
    np_arr = arcpy.da.TableToNumPyArray(final_output_table, field_names)
    df = pd.DataFrame(data = np_arr)
    
    ### Add additional information columns to output file 
    df['study']  = dataset
    df['method'] = running_method
    df['point_type'] = point_type
    df['scale'] = scale
    
    ### Reorganize the column order and export to .csv (Daily runs also keep the date column)
    id_cols = ['study', 'PT_ID', 'date_int'] if expo_type == 'Daily' else ['study', 'PT_ID']
    stat_cols = ['COUNT', 'AREA', 'MIN', 'MAX', 'RANGE', 'MEAN', 'STD', 'SUM', 'MEDIAN', 'PCT90']
    df[id_cols + ['point_type', 'scale', 'method', 'exposure_layer'] + stat_cols].to_csv(
        output_dir + final_output_table_csv_name, index = False)
        
    ### Some text output for the user
    print ("-"*60)
    print ("Exposure Analysis for {} + {} + {} Completed !".format(dataset,running_method,exposure_name))
    print ("")
    print ("Output table available at : ", output_dir + final_output_table_csv_name)

    end = time.time()
    print("Exposure task complete, total time spent : {} ".format(end - start))
    
    f.write("\n")
    f.write("-"*60)
    f.write("\nExposure task complete, total time spent : {} ".format(end - start))    

    ### Note: The final output table is saved as a .csv file at [output_dir + final_output_table_csv_name].


------------------------------------------------------------
Exposure Analysis for MENU + DR + NDVI2014 Completed !

Output table available at :  C:/Users/jiyang/Documents/ArcGIS/Projects/TWSA_PA/Outputs/Exposure_Table_Daily_NDVI2014_MENU_DR_200_20220914.csv
Exposure task complete, total time spent : 13952.722834348679 
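Steps 1 and 2 of the loop multiply the environmental raster by the activity space raster cell by cell, then summarize the product over the research area with `ZonalStatisticsAsTable(..., "DATA", "ALL")`. The arithmetic can be sketched in plain Python on small in-memory grids. This is an illustration under stated assumptions, not a re-implementation of the Spatial Analyst tool: the grids are assumed to be already aligned (same extent and cell size), NoData is represented as `None`, `cell_size` is a hypothetical parameter used only for AREA, STD is taken as the population standard deviation, and PCT90 uses a nearest-rank percentile, which may not match ArcGIS's exact percentile method.

```python
import math
import statistics

def weighted_zonal_stats(env_grid, activity_grid, cell_size=1.0):
    """Multiply two aligned grids cell by cell and summarize the non-NoData products."""
    values = []
    for env_row, act_row in zip(env_grid, activity_grid):
        for env_val, act_val in zip(env_row, act_row):
            # NoData in either input gives NoData in the product; skip it (like "DATA")
            if env_val is None or act_val is None:
                continue
            values.append(env_val * act_val)
    if not values:
        return None
    ordered = sorted(values)
    # Nearest-rank 90th percentile (assumed convention), using integer-safe arithmetic
    rank = max(1, math.ceil(90 * len(ordered) / 100))
    return {
        'COUNT': len(values),
        'AREA': len(values) * cell_size ** 2,
        'MIN': ordered[0],
        'MAX': ordered[-1],
        'RANGE': ordered[-1] - ordered[0],
        'MEAN': statistics.mean(values),
        'STD': statistics.pstdev(values),
        'SUM': sum(values),
        'MEDIAN': statistics.median(values),
        'PCT90': ordered[rank - 1],
    }

# Toy 2x3 grids with hypothetical values: an environmental layer and activity space weights
env = [[0.5, 0.25, None],
       [1.0, 0.75, 0.5]]
activity = [[1.0, 0.5, 0.5],
            [0.0, 0.5, 0.25]]
stats = weighted_zonal_stats(env, activity, cell_size=10.0)
print(stats['COUNT'], stats['SUM'], stats['PCT90'])  # 5 1.125 0.5
```

Cells where the activity weight is zero still count toward COUNT and AREA here, because zero is valid data rather than NoData; whether that matches the activity space rasters depends on how they encode areas that were never visited.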
