# 3b Corridor Analysis (loops)

This module plots delay, speed, volume, and percent of good data for each loopgroup (using unsmoothed loopgroup data), both as a function of time and along a corridor as a function of space. The plots help diagnose data quality in two ways. First, comparing the actual loop data here with the plots from the contour data shows whether the data processing introduced errors. Second, they show the amount of delay calculated without aggregating across the year, as the contour data does. You can also look for discontinuities and noisy data in each loopgroup, which helps determine which data is reliable.

The module also produces throughput plots both for diagnostic purposes and for publishing.

The raw loop data needs to be downloaded from the Tracflow website and placed in the appropriate folder (default is *./[ccr]/1_Data/[region]/4_Loop Data/[year]*).

This module also requires that the *3a_Corridor_Analysis* module has been completed and the Corridor object binary files are saved to an appropriate location (default is *./[ccr]/2_Corridor Output/[region]/*).
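
The two default locations above can be written out as a small helper. This is only a sketch of the folder convention described in this section: `default_paths` and its dictionary keys are hypothetical, and it is not the `define_paths` function that PyMAS provides.

```python
import os

def default_paths(ccr, region, year, root='.'):
    """Return the default folders described above (hypothetical helper)."""
    return {
        # raw loop data downloaded for one analysis year
        'loop_data': os.path.join(root, ccr, '1_Data', region,
                                  '4_Loop Data', str(year)),
        # Corridor object binaries written by 3a_Corridor_Analysis
        'corridor_output': os.path.join(root, ccr, '2_Corridor Output', region),
    }

paths = default_paths('CCR 18', 'NWR', 2017)
print(paths['loop_data'])
print(paths['corridor_output'])
```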

## Inputs

The inputs for this module are:

Base Year, Current Year : the base and current analysis years for the current analysis. (e.g. 2015, 2017)

CCR : the current CCR and name of the main folder for the current CCR (e.g. 'CCR 18')

Analyst : analyst's name

Suffix (in and out) : the *in* Suffix is the suffix of previously processed and written Corridor objects (written either by this module or by the *3a_Corridor_Analysis* module). For example, to load the Corridor object *5 NWR_with_loops.dat*, enter *_with_loops* as the *in* Suffix.  
The *out* Suffix is a tag added to the filenames of the Excel and binary outputs of this module. For example, if the *out* Suffix is *_8-20-2018*, the Excel output file for the 5 NWR loopgroups is named *5 NWR_thru_loops_8-20-2018.xlsx*.

Update Plotting : a boolean (True/False) indicating whether to reload the plot parameters (the plot_params attribute of the Corridor objects), which are read from the *Plotting* sheet of the *[corridor]_[region]_config.xlsx* file.

Percent Good Days : this is a threshold for the percent of good days of data that a loopgroup must have (both for base year and current year) to be plotted in png format. For example, if 40% is chosen (~ 100 good days) then all of the loopgroups in a given corridor with at least 40% of the days having good data will have the throughput plotted and output in png format (this is not for publication, just for data exploration).

by_day Format : this is the format to output the diagnostic plots for each loopgroup containing percent of good days, delay, volume, and speed for each day in the base year and current year. Image formats (e.g. png) that open in Windows Photos are preferable because you can quickly scroll through the images and look for patterns.

by_mp Format : this is the format to output the diagnostic plots for each corridor containing percent of good days, delay, volume, and speed at each loopgroup plotted by milepost. Since this is a large plot (especially for the longer corridors such as I-5 and I-405) it is helpful to plot this in pdf format so that it is clear and you can zoom in.
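
The *in* and *out* Suffix convention described in the inputs above can be sketched as follows. The helper names are hypothetical, and the `_thru_loops` tag is taken from the example filename in this section rather than from the PyMAS source, so treat the pattern as an assumption.

```python
def corridor_dat_name(cor, reg, suffix_in=''):
    """Binary file name for a Corridor object, e.g. '5 NWR_with_loops.dat'."""
    return '%s %s%s.dat' % (cor, reg, suffix_in)

def loop_excel_name(cor, reg, suffix_out=''):
    """Excel output name for a corridor's loopgroups (assumed pattern)."""
    return '%s %s_thru_loops%s.xlsx' % (cor, reg, suffix_out)

dat_name = corridor_dat_name(5, 'NWR', '_with_loops')
xlsx_name = loop_excel_name(5, 'NWR', '_8-20-2018')
plain_name = corridor_dat_name(90, 'NWR')   # no suffix: the loop-less object
print(dat_name)
print(xlsx_name)
print(plain_name)
```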

## Main Code Block

### Import Dependencies

In [None]:
#---------------------------- import dependencies -----------------------------
import os
import pickle
import time
from pymas.corridor_tools import *
from pymas.pymas_classes import Corridor, LoopGroup

# disable warnings
import warnings
warnings.filterwarnings('ignore')

### Prepare loopgroup data

These steps prepare the loopgroup data to be attached to the Corridor objects built in the 3a_Corridor_Analysis module. 

#### Extract loop files from .zip files

Starting with the .zip files from TRAC, this cell will extract the loop .xlsx files and move the original zip files and accompanying plots (provided by TRAC) into subfolders in the *./[ccr]/1_Data/[region]/4_Loop Data/[year]/* folder.
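
For readers who want a picture of what this step involves, here is a minimal sketch using only the standard library. It is not the `unzip_loops` implementation: the function name, the `zip` subfolder name, and the demo file names are all assumptions, and it handles only the .xlsx members and the zip archive itself, not the accompanying plots.

```python
import os
import shutil
import tempfile
import zipfile

def extract_loop_files(folder, archive_subdir='zip'):
    """Extract .xlsx members from each .zip in `folder`, then archive the zip."""
    archive_dir = os.path.join(folder, archive_subdir)
    if not os.path.isdir(archive_dir):
        os.makedirs(archive_dir)
    extracted = []
    for fname in sorted(os.listdir(folder)):
        if not fname.lower().endswith('.zip'):
            continue
        zpath = os.path.join(folder, fname)
        with zipfile.ZipFile(zpath) as zf:
            for member in zf.namelist():
                # keep only the loop spreadsheets
                if member.lower().endswith('.xlsx'):
                    zf.extract(member, folder)
                    extracted.append(member)
        # move the original archive out of the way once it is unpacked
        shutil.move(zpath, os.path.join(archive_dir, fname))
    return extracted

# demo in a throwaway directory with placeholder file contents
demo_dir = tempfile.mkdtemp()
with zipfile.ZipFile(os.path.join(demo_dir, 'loops.zip'), 'w') as zf:
    zf.writestr('loop_a.xlsx', 'placeholder')
    zf.writestr('plot_a.png', 'placeholder')
extracted = extract_loop_files(demo_dir)
print(extracted)
```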

In [None]:
interface = '0_Interface.xlsx'
sheet = 'Inputs'

#------------------------------------------------------------------------------
# You shouldn't need to edit anything below this line
#------------------------------------------------------------------------------

# get inputs from 0_Interface.xlsx 
inputs = get_inputs('3b', interface, sheet)
base_year = inputs['base_year']
curr_year = inputs['curr_year']
ccr = inputs['ccr']

reg_cors = get_batchlist()

for reg, cor_list in reg_cors.items():
    print('\nProcessing %s'%reg)
    paths = define_paths(ccr, reg, base_year, curr_year)
    
    for cor in cor_list:
        unzip_loops(cor, paths)

#### Build LoopGroups and write to .dat

This is an optional (but highly recommended) step, which builds LoopGroup objects and writes them to binary files. The benefit to including this step is that if you need to rebuild corridor objects and call the add_loops function, it is much faster to add loops from the binary files produced by this cell than by re-building the LoopGroup objects from Excel files.

**This is the most time-consuming and computationally demanding step in the entire PyMAS program. Expect it to take ~5 seconds per loop (for NWR, which has ~650 loops, this takes ~1 hour to complete). This is why it is helpful to run this optional step. Otherwise, re-building the loops from scratch if a Corridor object needs to be rebuilt will take a substantial amount of time, whereas adding pre-built loops to a Corridor object will only require a fraction of this time.**
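
The time saving comes from a build-once, load-many pattern: an expensive object is pickled the first time it is built and read back from disk afterwards. The sketch below illustrates that pattern in isolation. `load_or_build`, `slow_build`, and the dictionary standing in for a LoopGroup object are hypothetical and are not part of PyMAS.

```python
import os
import pickle
import tempfile

def load_or_build(name, folder, build):
    """Return the cached object for `name`, building and caching it if needed."""
    path = os.path.join(folder, '%s.dat' % name)
    if os.path.isfile(path):
        # fast path: read the previously pickled object
        with open(path, 'rb') as f:
            return pickle.load(f)
    # slow path: build from scratch, then cache for next time
    obj = build(name)
    with open(path, 'wb') as f:
        pickle.dump(obj, f)
    return obj

calls = []

def slow_build(name):
    calls.append(name)        # record each expensive rebuild
    return {'loop': name}     # stand-in for a LoopGroup object

cache_dir = tempfile.mkdtemp()
first = load_or_build('example_loop', cache_dir, slow_build)
second = load_or_build('example_loop', cache_dir, slow_build)
print(len(calls))  # the builder ran only once
```

At roughly 5 seconds per loop, skipping the rebuild for ~650 loops is what turns an hour-long step into a much shorter one.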

In [None]:
interface = '0_Interface.xlsx'
sheet = 'Inputs'

#------------------------------------------------------------------------------
# You shouldn't need to edit anything below this line
#------------------------------------------------------------------------------

# get inputs from 0_Interface.xlsx 
inputs = get_inputs('3b', interface, sheet)
base_year = inputs['base_year']
curr_year = inputs['curr_year']
ccr = inputs['ccr']
analyst = inputs['analyst']

reg_cors = get_batchlist()

for reg, cor_list in reg_cors.items():
    print('\nProcessing %s'%reg)
    
    paths = define_paths(ccr, reg, base_year, curr_year)
    p = paths['loop_path']
    
    for cor in cor_list:
        print('\nProcessing %s %s'%(cor,reg))
        for loop in get_loops(cor, paths):
            print('Building %s LoopGroup object'%loop)
            obj = LoopGroup(loop, base_year, curr_year, paths, analyst)
            
            with open(os.path.join(p, '%s.dat'%loop), 'wb') as f:
                pickle.dump(obj, f)

### Build objs dictionary

This cell reads in Corridor objects that were written previously (either by the current module or by *3a_Corridor_Analysis*) and stores them in a dictionary called *objs*.

In [2]:
interface = '0_Interface.xlsx'
sheet = 'Inputs'

#------------------------------------------------------------------------------
# You shouldn't need to edit anything below this line
#------------------------------------------------------------------------------

# get inputs from 0_Interface.xlsx 
inputs = get_inputs('3b', interface, sheet)
base_year = inputs['base_year']
curr_year = inputs['curr_year']
ccr = inputs['ccr']
analyst = inputs['analyst']
suffix = inputs['suffix_in']

reg_cors = get_batchlist()

# Initialize empty dictionary
objs = {}

# loop through all regions
for reg, cor_list in reg_cors.items():
    if len(cor_list) > 0:
        print('\nLoading %s'%(reg))
        paths = define_paths(ccr, reg, base_year, curr_year)
    
    # loop through corridors and read into dictionary
    for cor in cor_list:
        name = '%s %s'%(cor, reg)
        
        print('Loading %s%s.dat Corridor object'%(name,suffix))
        try:
            objs[name] = read_object(name, 'Corridor', paths, suffix)
        except IOError as e:
            print(e)
            
            
print('\n\nDone building objs dictionary')


Loading NWR
Loading 5 NWR_with_loops.dat Corridor object
Loading 90 NWR_with_loops.dat Corridor object
Loading 167 NWR_with_loops.dat Corridor object
Loading 405 NWR_with_loops.dat Corridor object
Loading 520 NWR_with_loops.dat Corridor object


Done building objs dictionary


### Add loops

This cell adds loop data to each Corridor object. As noted above, this step is much faster if the LoopGroup objects are already built and stored as .dat files in the *./[ccr]/1_Data/[region]/4_Loop Data/* folder.

In [None]:
interface = '0_Interface.xlsx'
sheet = 'Inputs'

#------------------------------------------------------------------------------
# You shouldn't need to edit anything below this line
#------------------------------------------------------------------------------

# get inputs from 0_Interface.xlsx 
inputs = get_inputs('3b', interface, sheet)
base_year = inputs['base_year']
curr_year = inputs['curr_year']
ccr = inputs['ccr']
analyst = inputs['analyst']
suffix = inputs['suffix_in']


reg_cors = get_batchlist()

# loop through all regions
for reg, cor_list in reg_cors.items():
    if len(cor_list) > 0:
        print('\nProcessing %s'%(reg))
        paths = define_paths(ccr, reg, base_year, curr_year)
    
    # loop through corridors
    for cor in cor_list:       
        name = '%s %s'%(cor, reg)
        objs[name].add_loops()

print('\n\nDone adding loops')

### Write objects to .dat

This cell writes the Corridor objects to .dat files. Use a suffix here (in the *out* Suffix cell) so that this file doesn't overwrite the Corridor object written in *3a_Corridor_Analysis*. Subsequent steps, such as *4b_Commute_Analysis*, do not need the loop data in the Corridor object, so it is better to keep the loop-less objects saved as *[corridor] [region].dat*. When those objects are later loaded into memory to extract information, a much smaller object is loaded.
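
To illustrate why the loop-less object is cheaper to load, the sketch below pickles two stand-in dictionaries, one with attached loop data and one without, and compares their serialized sizes. The dictionaries and their contents are purely illustrative. They are not Corridor objects, and the sizes say nothing about real PyMAS files.

```python
import pickle

# stand-in for a Corridor object without loop data
lean = {'name': '5 NWR', 'loops': {}}

# stand-in for the same object with 50 loopgroups of 1000 values each
heavy = {'name': '5 NWR',
         'loops': {i: list(range(1000)) for i in range(50)}}

lean_size = len(pickle.dumps(lean))
heavy_size = len(pickle.dumps(heavy))

print('without loops: %d bytes' % lean_size)
print('with loops:    %d bytes' % heavy_size)
```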

In [None]:
interface = '0_Interface.xlsx'
sheet = 'Inputs'

#------------------------------------------------------------------------------
# You shouldn't need to edit anything below this line
#------------------------------------------------------------------------------

# get inputs from 0_Interface.xlsx 
inputs = get_inputs('3b', interface, sheet)
suffix = inputs['suffix_out']

reg_cors = get_batchlist()

# loop through all regions
for reg, cor_list in reg_cors.items():
    
    if len(cor_list) > 0:
        print('\nProcessing %s'%(reg))

    # loop through corridors
    for cor in cor_list:       
        name = '%s %s'%(cor, reg)
        print('Writing %s%s.dat'%(name, suffix))
        
        objs[name].export_dat(suffix)

print('\n\nDone writing .dat files')

### Plot loop data and export Loop Summary Excel file

This cell generates plots from the loopgroup data. It plots throughput data in .png format for all loopgroups with at least the specified percentage of good days of data (*Percent Good Days* in the input sheet), and in .pdf format for all loopgroups indicated in the *Plotting* sheet of the corridor config file. (The png files are for data exploration; the pdf files are for publication.)
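
The *Percent Good Days* rule can be sketched as a simple filter that keeps a loopgroup only if it meets the threshold in both the base year and the current year. This is an illustration of the rule as described, not the PyMAS implementation. The function name, the loopgroup ids, and the good-day counts are made up, and the 250-day denominator is an assumption inferred from the "40% is about 100 good days" example.

```python
def loopgroups_to_plot(good_days, days_in_year, threshold_pct):
    """Return loopgroup ids meeting the threshold in BOTH years.

    good_days maps loopgroup id -> (base_year_good_days, curr_year_good_days).
    """
    selected = []
    for lg, (base, curr) in sorted(good_days.items()):
        base_pct = 100.0 * base / days_in_year
        curr_pct = 100.0 * curr / days_in_year
        if base_pct >= threshold_pct and curr_pct >= threshold_pct:
            selected.append(lg)
    return selected

# illustrative counts only
good_days = {
    'lg_a': (120, 110),   # passes in both years
    'lg_b': (90, 200),    # fails in the base year
    'lg_c': (100, 100),   # exactly at the threshold in both years
}
selected = loopgroups_to_plot(good_days, days_in_year=250, threshold_pct=40)
print(selected)
```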

It also produces an Excel file for each Corridor that includes the loop summary along the corridor (speed, volume, delay, number of good days) as well as throughput at each of the loopgroups that are plotted for publication (i.e. the loopgroups indicated in the *Plotting* sheet of the corridor config file).

In [3]:
interface = '0_Interface.xlsx'
sheet = 'Inputs'

#------------------------------------------------------------------------------
# You shouldn't need to edit anything below this line
#------------------------------------------------------------------------------

# get inputs from 0_Interface.xlsx 
inputs = get_inputs('3b', interface, sheet)
base_year = inputs['base_year']
curr_year = inputs['curr_year']
ccr = inputs['ccr']
analyst = inputs['analyst']
suffix = inputs['suffix_out']
pctgd = inputs['pctgd']
by_day_fmt = inputs['by_day_fmt']
by_mp_fmt = inputs['by_mp_fmt']
update = inputs['update_pp']

reg_cors = get_batchlist()


# loop through all regions
for reg, cor_list in reg_cors.items():

    if len(cor_list) > 0:
        print('\nProcessing %s'%(reg))

    # loop through corridors
    for cor in cor_list:       
        name = '%s %s'%(cor, reg)
        
        if name not in objs:
            print('%s Corridor object not in objs dictionary'%name)
            continue
        
        print('\nPlotting %s'%name)
        
        if update:
            objs[name].update_plot_params()
        
        # plot throughput
        objs[name].plot_throughput()    
        objs[name].plot_throughput(lgs='config', fmt='pdf')
        
        # export Excel loop data
        objs[name].export_excel_lg(suffix=suffix)
        
        # plot diagnostics
        objs[name].plot_lg_data(by_mp_fmt=by_mp_fmt,
                                by_day_fmt=by_day_fmt)     
        
print('\n\nDone plotting.')


Processing NWR

Plotting 5 NWR
005es15792_MS__ is missing data for 2015
005es16064_MS__ is missing data for 2015
005es16640_MN__ is missing data for 2015
005es16701_MS__ is missing data for 2015
005es16885_MS__ is missing data for 2015
005es17264_MS__ is missing data for 2015
005es17826_MS__ is missing data for 2015

Plotting 90 NWR
090es00380_MW__ is missing data for 2015
090es00390_ME__ is missing data for 2015
090es00627_ME__ is missing data for 2015
090es00627_MW__ is missing data for 2015
090es00647_ME__ is missing data for 2015
090es00647_MW__ is missing data for 2015
090es01090_ME__ is missing data for 2017
090es01864_MW__ is missing data for 2017

Plotting 167 NWR
167es01565_MN__ is missing data for 2015

Plotting 405 NWR
405es01536_MN__ is missing data for 2017
405es01536_MS__ is missing data for 2017
405es01577_MN__ is missing data for 2017
405es01577_MS__ is missing data for 2017
405es01686_MN__ is missing data for 2017
405es01686_MS__ is missing data for 2017
405es01724_MN