# Use Case: Visualizing Scheduling of Skilled Trades Work Orders

### Original Context: Ongoing evaluation of Work Order Reform

This notebook contains an example of applied analysis using a subset of work order data. These data represent "cases" associated with the Work Order Reform program -- that is, units of work (typically one work order, but perhaps more) assigned to a particular trade, in a particular location, scheduled to start on a particular day. The query yielding these cases can be found in the "Data/" directory located alongside this notebook.

This analysis makes use of the same libraries and functions used in Example Notebook 0. This document therefore contains fewer annotations throughout, and those that exist pertain mainly to functionality not covered in earlier notebooks or to the logic of modifications made to this dataset in particular.


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datetime

pd.set_option('display.max_columns',None)

Note two pieces of date-handling functionality used below:
- Once data in a column are converted to `datetime` objects with `pd.to_datetime()`, the date portion of each value can be accessed using the syntax `<datetime object>.date()`
- Other portions of a date or time can be accessed through the `.dt` accessor that pandas provides on datetime columns. In the line that creates `'SCHEDMONTH'`, for example, `dt.to_period()` is called with the argument `'M'`, for month, yielding the month associated with each date.
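
As a minimal sketch of these two operations, the following uses a made-up two-row table in place of the real work order file. The column name `SCHEDDATE` and the timestamp values are assumptions chosen for illustration only.

```python
import datetime
import pandas as pd

# Toy stand-in for the STARTDATE column (values are invented for illustration)
toy = pd.DataFrame({'STARTDATE': ['2022-08-24 07:30:00', '2022-09-02 13:00:00']})

# Convert text to pandas datetime values
toy['SCHEDSTART'] = pd.to_datetime(toy['STARTDATE'])

# The .dt accessor exposes date/time functionality for the whole column
toy['SCHEDMONTH'] = toy['SCHEDSTART'].dt.to_period('M')

# .date() on each individual value keeps only the calendar date
toy['SCHEDDATE'] = toy['SCHEDSTART'].apply(lambda x: x.date())

# Each entry of SCHEDDATE is now a plain datetime.date object
is_plain_date = isinstance(toy['SCHEDDATE'].iloc[0], datetime.date)
print(toy[['SCHEDMONTH', 'SCHEDDATE']])
```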

In [None]:
wos = pd.read_csv('Data/WOR_cases_20220824.csv')

wos['REPORTDATE'] = pd.to_datetime(wos['CREATEDATE'])
wos['REPORTDATE'] = wos['REPORTDATE'].apply(lambda x: x.date())
wos['SCHEDSTART'] = pd.to_datetime(wos['STARTDATE'])
wos['SCHEDMONTH'] = wos['SCHEDSTART'].dt.to_period('M')
wos['SCHEDSTART'] = wos['SCHEDSTART'].apply(lambda x: x.date())


The following line restricts our `DataFrame` `wos` to only those cases that are scheduled for work on or after Aug. 24, 2022 (which is created as a `datetime.date` object).

In [None]:
wos = wos[wos['SCHEDSTART'] >= datetime.date(2022,8,24)]
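
The same kind of filter can be tried on a small invented table. The sketch below assumes a column of `datetime.date` values, as produced by the conversion step earlier; the names `cutoff`, `mask`, and `kept` are illustrative.

```python
import datetime
import pandas as pd

# Three made-up cases scheduled around the cutoff date
toy = pd.DataFrame({'SCHEDSTART': [datetime.date(2022, 8, 23),
                                   datetime.date(2022, 8, 24),
                                   datetime.date(2022, 9, 1)]})

cutoff = datetime.date(2022, 8, 24)

# The comparison yields one True/False value per row
mask = toy['SCHEDSTART'] >= cutoff

# Indexing with the mask keeps only the rows marked True
kept = toy[mask]
print(kept)
```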

The following function marks each case as 'PRE' or 'POST', representing whether the date in the column passed into the function occurred before or after implementation of Work Order Reform in the borough where the case is located. Note that this analysis was created prior to implementation in Manhattan.

In [None]:
#Date on which Work Order Reform took effect in each borough
WOR_START = {
    'QS': datetime.date(2021,11,9),
    'BX': datetime.date(2022,3,14),
    'BK': datetime.date(2022,8,1),
}

def pre_post_wor(row, datecol):
    start = WOR_START.get(row['SITEID'])

    #Boroughs with no implementation date listed above (e.g., Manhattan) are always 'PRE'
    if start is None:
        return 'PRE'

    if row[datecol] < start:
        return 'PRE'
    elif row[datecol] >= start:
        return 'POST'
    else:
        #Fallback when neither comparison is True (e.g., a NaT value)
        return None

Using the `apply()` functionality of our `DataFrame`, we run our `pre_post_wor()` function on each row, passing in as arguments the content of that row and the name of our date column (in this case, `'REPORTDATE'`). Note that the `axis = 1` argument passed into `apply()` signifies that we are operating on each **row**, as opposed to each **column** (in which case we would use `axis = 0`).

In [None]:
wos['PREPOST'] = wos.apply(lambda row: pre_post_wor(row, 'REPORTDATE'), axis=1)
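
The difference between the two `axis` values is easiest to see on a small table. This is a sketch with invented columns `a` and `b`, not part of the work order data.

```python
import pandas as pd

toy = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# axis=1: the function receives one row at a time, so we get one result per row
row_sums = toy.apply(lambda row: row['a'] + row['b'], axis=1)

# axis=0: the function receives one column at a time, so we get one result per column
col_sums = toy.apply(lambda col: col.sum(), axis=0)

print(row_sums.tolist())
print(col_sums.to_dict())
```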

Pandas' `pivot_table()` function is yet another way to reshape data into the form of our choosing, in a manner that should be familiar to anyone who has used Excel's PivotTable functionality. The syntax is somewhat complicated -- please reference the following guide and documentation for a full explanation:
- [Pandas Pivot Table: A Guide](https://builtin.com/data-science/pandas-pivot-tables)
- [Official documentation for pandas.pivot_table()](https://pandas.pydata.org/docs/reference/api/pandas.pivot_table.html)

In [None]:
#Note that, although we provide 'STARTDATE' as our values column in this case, we could just as easily use any other
#column, since our aggregation function -- which is what actually fills in our "cells" -- is 'count'
current_by_month = wos.pivot_table(index=['SITEID','ZZCRAFT'], columns='SCHEDMONTH', values='STARTDATE', aggfunc='count')

#Here, we produce a normalized version of the above by dividing the contents of each cell by the sum of its row
norm_by_month = current_by_month.div(current_by_month.sum(axis=1), axis=0)

#Here, we add a new column containing the total of each row
current_by_month['TOTAL'] = current_by_month.sum(axis=1)
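
To see what the count and normalization steps above do, the following sketch applies them to four invented cases. The craft names and dates are assumptions for illustration; only the column names mirror the real dataset.

```python
import pandas as pd

# Four made-up cases: three for one trade in Brooklyn, one for another trade in Queens
toy = pd.DataFrame({
    'SITEID':     ['BK', 'BK', 'BK', 'QS'],
    'ZZCRAFT':    ['PLUMBER', 'PLUMBER', 'PLUMBER', 'PAINTER'],
    'SCHEDMONTH': ['2022-09', '2022-09', '2022-10', '2022-09'],
    'STARTDATE':  ['2022-09-01', '2022-09-15', '2022-10-03', '2022-09-20'],
})

# One row per borough/trade pair, one column per month, cells hold case counts
counts = toy.pivot_table(index=['SITEID', 'ZZCRAFT'], columns='SCHEDMONTH',
                         values='STARTDATE', aggfunc='count')

# Divide each cell by its row total, so each row sums to 1
shares = counts.div(counts.sum(axis=1), axis=0)

print(counts)
print(shares)
```

Combinations with no cases (here, the Queens trade in October) appear as `NaN` rather than zero, which is why the Excel export later replaces missing values with `'-'`.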

In [None]:
wos

In [None]:
current_by_prepost = wos.pivot_table(index=['SITEID','ZZCRAFT','SCHEDMONTH','PREPOST'],  
                          values='REPORTDATE', aggfunc='count').fillna(0)

#We can rename our single column to better reflect that it is, in fact, a count of cases
current_by_prepost.columns = ['CASE_COUNT']

pct_new = wos.pivot_table(index=['SITEID','ZZCRAFT','SCHEDMONTH'], 
                          columns='PREPOST', 
                          values='REPORTDATE', aggfunc='count').fillna(0)

pct_new['PCT_POST'] = pct_new['POST']/(pct_new['POST']+pct_new['PRE'])
pct_new = pct_new.reset_index().pivot_table(index=['SITEID','ZZCRAFT'], columns=['SCHEDMONTH'], values='PCT_POST')


In [None]:
###Write multiple sheets to Excel using ExcelWriter
#The with-block saves and closes the file on exit (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('Scheduled_WOs_by_Craft_and_Month_07052022.xlsx') as writer:
    current_by_month.fillna('-').to_excel(writer, sheet_name='Total by Craft, Month')
    norm_by_month.fillna('-').to_excel(writer, sheet_name='Percent by Craft, Month')
    current_by_prepost.fillna('-').to_excel(writer, sheet_name='Totals by Craft, Month, PrePost')
    pct_new.fillna('-').to_excel(writer, sheet_name='Pct Post-WOR by Craft, Month')

### Visualizing Scheduling by Trade and Borough

In addition to the `FacetGrid()` functionality explored in Example 0, Seaborn allows us to visualize subsets of data in a single step using the `catplot()` function (for "category plot"), which can produce several kinds of chart based on the `kind` value it is provided.

Here, we provide the following specifications:
- Our `data` come from `current_by_prepost`, with its index reset
- The x axis in each of our plots comes from the `'SCHEDMONTH'` column
- Our y values come from the `'CASE_COUNT'` column
- Each `row` represents a different value of `'ZZCRAFT'` -- i.e., a different skilled trade
- Each `col` represents a different value of `'SITEID'`, which contains borough information
- Tallies of pre- and post-WOR cases are represented by different colors, as indicated by the argument `hue = 'PREPOST'` and shown in the figure's legend
- We set our `kind` to `'bar'`, for a bar graph. See the documentation for `sns.catplot()` for other possible values.

In [None]:
plot = sns.catplot(data=current_by_prepost.reset_index(), x='SCHEDMONTH', y='CASE_COUNT', row='ZZCRAFT', col='SITEID', hue='PREPOST', kind='bar')
plt.savefig("Plots/full_grid.pdf", bbox_inches='tight')
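
The grid layout that `catplot()` produces can be checked on a small invented table. In this sketch the counts and craft names are made up, `height=2` is chosen only to keep the figure small, and the `Agg` backend is selected as an assumption so that the code also runs outside a notebook.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; not needed inside a notebook
import pandas as pd
import seaborn as sns

# Made-up counts for two trades in two boroughs
toy = pd.DataFrame({
    'SITEID':     ['BK', 'BK', 'QS', 'QS', 'BK', 'QS'],
    'ZZCRAFT':    ['PLUMBER', 'PLUMBER', 'PLUMBER', 'PAINTER', 'PAINTER', 'PAINTER'],
    'SCHEDMONTH': ['2022-09', '2022-09', '2022-09', '2022-09', '2022-10', '2022-10'],
    'PREPOST':    ['PRE', 'POST', 'POST', 'PRE', 'POST', 'POST'],
    'CASE_COUNT': [5, 3, 4, 2, 6, 1],
})

grid = sns.catplot(data=toy, x='SCHEDMONTH', y='CASE_COUNT',
                   row='ZZCRAFT', col='SITEID', hue='PREPOST',
                   kind='bar', height=2)

# grid.axes is a 2-D array with one subplot per trade (rows) and borough (columns)
print(grid.axes.shape)
```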

### Creating plots en masse

With just a bit of knowledge of basic Python concepts -- here, for-loops and try/except blocks -- it is possible to flexibly create plots to identical specifications using arbitrary subsets of data. (A clear overview of these and other basic Python concepts can be found [here](https://www.geeksforgeeks.org/python-if-else/).)

Here, we use two nested for-loops to produce a plot for each combination of borough and skilled trade, as well as for the sum of all skilled trades in each borough.

In [None]:
plot_data = current_by_prepost.reset_index()
plot_data.columns = ['Borough', 'Craft', 'Month Scheduled', 'Pre_Post', 'Case Count']

for boro in ['BK','BX','MN','QS']:
    for trade in list(plot_data['Craft'].unique()):
        dataset = plot_data[(plot_data['Borough']==boro)&(plot_data['Craft']==trade)]
        
        #Close any figure left open by a previous iteration
        try:
            plt.close('all')
        except Exception:
            pass

        #plt.subplot() returns an Axes object, which we draw on and title
        ax = plt.subplot()
        sns.barplot(data=dataset, x='Month Scheduled', y='Case Count', hue='Pre_Post', errorbar=None, ax=ax)
        ax.set_title(f"Case Count by Month: {boro}, {trade}")
        plt.savefig(f"Plots/{boro}_{trade}.pdf", bbox_inches='tight')
    
    
    try:
        plt.close('all')
    except Exception:
        pass

    ax = plt.subplot()
    #estimator='sum' adds up the counts across trades; the default estimator would plot their mean
    sns.barplot(data=plot_data[plot_data['Borough']==boro], x='Month Scheduled', y='Case Count', hue='Pre_Post', estimator='sum', errorbar=None, ax=ax)
    ax.set_title(f"Case Count by Month: {boro}, all trades")
    plt.savefig(f"Plots/{boro}_ALL.pdf", bbox_inches='tight')
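
The structure of the nested loops above can be examined without drawing anything. The sketch below uses an invented three-row table and simply collects the file names that would be written. Skipping empty borough/trade combinations is an assumption added here for illustration and is not part of the cell above.

```python
import pandas as pd

# Made-up counts: two trades in Brooklyn, one in Queens
toy = pd.DataFrame({
    'Borough':    ['BK', 'BK', 'QS'],
    'Craft':      ['PLUMBER', 'PAINTER', 'PLUMBER'],
    'Case Count': [4, 2, 7],
})

planned = []
for boro in ['BK', 'QS']:
    # Inner loop: one file per trade within the current borough
    for trade in toy['Craft'].unique():
        subset = toy[(toy['Borough'] == boro) & (toy['Craft'] == trade)]
        if subset.empty:
            continue  # assumption: nothing to plot for this combination
        planned.append(f"Plots/{boro}_{trade}.pdf")

    # After the inner loop finishes: one summary file per borough
    planned.append(f"Plots/{boro}_ALL.pdf")

print(planned)
```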
            

Finally, the following block produces a formatted table containing the percentage of cases in each borough that were created before and after the introduction of Work Order Reform.

In [None]:
#Passing 'sum' as a string avoids the warning newer pandas versions raise for the builtin sum
temp = current_by_prepost.reset_index()[['SITEID', 'PREPOST', 'CASE_COUNT']].pivot_table(index='SITEID', columns='PREPOST', values='CASE_COUNT', aggfunc='sum').fillna(0)
#Format each share as a percentage with one decimal place
temp['Percent Pre'] = temp.apply(lambda row: f"{row['PRE']*100/(row['POST']+row['PRE']):.1f}%", axis=1)
temp['Percent Post'] = temp.apply(lambda row: f"{row['POST']*100/(row['POST']+row['PRE']):.1f}%", axis=1)
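
The arithmetic behind this table can be verified on invented counts. The sketch below computes the same percentages column-wise rather than row by row; the names `counts`, `table`, and `total` are illustrative.

```python
import pandas as pd

# Made-up counts: Brooklyn has cases on both sides of the cutoff, Queens only 'POST'
counts = pd.DataFrame({
    'SITEID':     ['BK', 'BK', 'BK', 'QS'],
    'PREPOST':    ['PRE', 'PRE', 'POST', 'POST'],
    'CASE_COUNT': [20, 10, 10, 5],
})

# Sum the counts per borough; a borough with no 'PRE' cases gets 0 instead of NaN
table = counts.pivot_table(index='SITEID', columns='PREPOST',
                           values='CASE_COUNT', aggfunc='sum').fillna(0)

total = table['PRE'] + table['POST']

# Format each share as text with one decimal place and a percent sign
table['Percent Pre'] = (table['PRE'] * 100 / total).map('{:.1f}%'.format)
table['Percent Post'] = (table['POST'] * 100 / total).map('{:.1f}%'.format)

print(table)
```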
