# Whereabouts Streets Data Extraction
This notebook demonstrates how to access a Street and Bridge Operations PDF file and extract its data to create a work order plan template.

<div style="text-align:center"><img src="https://upload.wikimedia.org/wikipedia/en/9/94/Closeup_of_pavement_with_grass.JPG" /></div>

## Introduction
The purpose of this notebook is to create Street and Bridge work order plans based on segment IDs and additional comments on long line markings. The markings feature layers are published on the City of Austin ArcGIS Portal page and are available for public view as well.

The schedule of streets where sealcoat and overlay work has been completed is sent by email from Street and Bridge Operations on a daily basis. It arrives as a PDF file that lists weather conditions and temperature and provides a table of streets where paving is complete.

<b>The only manual steps the user has to perform are:</b>
- Input segment IDs
- Make comments on long line markings
- Specify the PDF name and file path used to retrieve the table of completed paved streets
- Create any missing markings assets that are not visible in aerial imagery

This process replaces the previous workflow of manually editing a plans layout by copy-pasting imagery, writing Location IDs, work groups, and markings found, and then exporting plans one at a time. An Excel document is created from this input, and the segment IDs are read to find all short line and specialty point markings. Ideally, this will generate multiple PDF plans in a shorter time frame.

<i><b>Disclaimer:</b> This product is for informational purposes and may not have been prepared for or be suitable for legal, engineering, or surveying purposes. No warranty is made by the City of Austin regarding specific accuracy or completeness.</i>

## Imports
The packages used for this project are:
- [pandas](https://pandas.pydata.org/) to create a dataframe of the extracted table and transform the data
- [openpyxl](https://openpyxl.readthedocs.io/en/stable/) to edit Excel files
- [arcgis](https://esri.github.io/arcgis-python-api/apidoc/html/) to search for markings feature layer dataset

In [1]:
import pandas as pd
%run C:\Users\Govs\Projects/CopyGISFeatures.py
%run C:\Users\Govs\Projects/FeatureLayerDataFrame.py
    
from openpyxl import Workbook,load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows

from arcgis.gis import GIS
from arcgis.features import FeatureLayer

from functools import reduce
import numpy as np

## Constants

The month and day constants determine the name of the PDF file to load into a dataframe. The folder path determines where the plans are created, depending on the year. These constants are set at the top so they can be changed as needed.

<i>The table below explains the purpose of each constant.</i>

| Constant | Description   |
|:--------:|----|
| <b>MONTH, DAY, YEAR</b> |Date used to find PDF in month-day format and file path based on year|
|<b>FOLDER</b>      |File directory used to import SBO whereabouts reports from email|
|<b>FILE_NAME</b>   |File path, without extension, used to extract SBO whereabouts reports|
|<b>SIGN_IN</b>   |Whether to prompt user to sign in to Outlook email|
|<b>INPUT</b>|Whether to prompt user to input segment IDs and comments to export to Excel| 
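
As a rough illustration of how a year, a month folder, and a base name combine into the folder, Excel, and CSV paths, the sketch below uses `pathlib`. The helper name `report_paths`, the `C:\plans` root, and the arguments are hypothetical and chosen only for the example; they are not the notebook's actual layout.

```python
from pathlib import PureWindowsPath

def report_paths(root, year, month_folder, base_name):
    """Return (folder, excel_file, csv_file) for one month's combined report.

    Hypothetical helper: the folder layout is an assumption for illustration.
    """
    folder = PureWindowsPath(root) / str(year) / "Whereabouts_Summary"
    stem = folder / month_folder / base_name
    return folder, stem.with_suffix(".xlsx"), stem.with_suffix(".csv")

folder, excel_file, csv_file = report_paths(
    r"C:\plans", 2019, "9_September", "SBO_combined_sept"
)
print(excel_file)
```

`PureWindowsPath` only manipulates path strings, so the sketch runs on any operating system without touching the file system.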

In [3]:
YEAR = str(2019)
FOLDER = (r"G:\ATD\Signs_and_Markings\MARKINGS\Whereabouts WORK ORDERS\{}\Whereabouts_Summary"
         ).format(YEAR)
FILE_NAME = FOLDER + r'\9_September\SBO_combined_sept'
EXCEL_FILE = FILE_NAME + ".xlsx"
CSV_FILE = FILE_NAME + ".csv"
INPUT= True
%store FOLDER
%store EXCEL_FILE

Stored 'FOLDER' (str)
Stored 'EXCEL_FILE' (str)


## Methods
These functions are used to extract and transform the data into a usable format.

<i>The table below explains the purpose of each:</i>

| Method | Description   |
|:--------:|----|
|<b>input_form</b> |Prompts user to input segment IDs and long line specifications|
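|<b>query_df</b>   |Queries a feature layer by segment IDs and appends the matching markings to a dataframe|
|<b>specifications</b> |Returns dataframe of the listed specifications|
|<b>location_in_df</b> |Returns dataframes of markings counts and pages|
|<b>create_cover</b> |Returns dataframe of the cover page|
|<b>create_pages</b> |Returns dataframe of pages|
|<b>create_ws</b> |Creates a worksheet in the Excel file, replacing the worksheet if it already exists|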

In [138]:
# Prompts user to input segment IDs and longline while changing the dataframe to include user input
def input_form(df):
    segments = []
    longline = []
    for index,row in df.iterrows():
        console = input(row['Location'] + "\nSegment ID list: ")
        if not console:
            segments.append(None)
            longline.append(None)
        else:
            temp = GISFeatures(console).to_df()
            segments.append(str(list(temp.SEGMENT_ID))[1:-1])
            comment = input("Longline: ")
            longline.append(comment)
    df['Segment IDs'], df['LongLine'] = (segments,longline)
    print("\nInput complete.")
    return df.dropna(subset=['Segment IDs'])
    
# Returns query dataframe appended if markings exist in the listed segment IDs
def query_df(fc,index,f,df,df1):
    q = "SEGMENT_ID IN({})".format(df["Segment IDs"][index])
    if q != "SEGMENT_ID IN(N/A)":
        c = fc.query(where=q,return_count_only=True) 
        if c != 0:
            sdf = fc.query(where=q).sdf.filter(items=f)
            sdf["Location ID"] = df["Location ID"][index]
            sdf["LongLine"] = df["LongLine"][index]
            # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
            df1 = pd.concat([df1,sdf],sort=True)
    df1['COUNTS'] = 1
    return df1

# Return dataframe of the listed specifications
def specifications(df,i):
    df["SPECIFICATIONS"] = ''
    for index,row in df.iterrows():
        keys = list(row[i:])
        values = list(df.columns)[i:]
        spec = []
        for k,v in zip(keys,values):
            if k != 'N/A' and k != '' and v != 'WORK GROUPS':
                spec.append('{} {}'.format(int(k),v.lower().replace('_',' ')))
        # Build the sentence once per row, after all marking counts are collected
        if row['LongLine'] != 'N/A':
            sentence = 'Install {}, '.format(row['LongLine']) + ', '.join(spec)
        else:
            sentence = 'Install ' + ', '.join(spec)
        df.at[index,'SPECIFICATIONS'] = sentence
    if 'WORK GROUPS' in df.columns:
        df.loc[df.Street != None,'WORK GROUPS'] = df.loc[df.Street != None,'WORK GROUPS'].apply(str)
    return df

# Returns dataframe of markings count and pages
def location_in_df(df,markings_type,workgroup):
    if 'Location ID' in df:
        count = df.groupby(['Location ID',markings_type]).count()[['SEGMENT_ID']].rename(columns={"SEGMENT_ID":'COUNTS'})
        count = count.pivot_table(values='COUNTS',index='Location ID',columns=(markings_type),aggfunc='first').reset_index()
        count[workgroup] = workgroup
        page = df.groupby(['Location ID','SEGMENT_ID','LongLine',markings_type]).count()[['COUNTS']]
        page = page.pivot_table(
            values='COUNTS',index=['Location ID','SEGMENT_ID','LongLine'],columns=(markings_type),aggfunc='first')
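        return count,page
    # No markings were found: return empty frames so callers can still unpack and test .empty
    return pd.DataFrame(),pd.DataFrame()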

# Returns dataframe of cover page
def create_cover(cover,sl_count,sp_count,wg):
    cover.loc[cover.LongLine != 'N/A', wg[2]] = wg[2]
    cover.loc[cover.LongLine == 'N/A', wg[2]] = 'N/A' 
    if not sl_count.empty and not sp_count.empty:
        cover = reduce(lambda z,y: pd.merge_ordered(z,y,on='Location ID'), [cover,sl_count,sp_count])
    elif not sl_count.empty or not sp_count.empty:
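        count = sl_count if sp_count.empty else sp_count
        # Work group names must match the entries of wg exactly ('SHORT LINE', not 'SHORTLINE')
        wg_remove = 'SPECIALTY MARKINGS' if sp_count.empty else 'SHORT LINE'
        # Merge the available counts into the cover table
        cover = pd.merge_ordered(cover,count,on='Location ID')
        wg.remove(wg_remove)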
    else:
        cover = specifications(cover,9)
        return cover
    cover = cover.dropna(how='all',subset=list(cover.columns)[6:]).fillna('N/A')
    cover['WORK GROUPS'] = cover[wg].apply(','.join,1).apply(lambda x: [s for s in x.split(',') if s != 'N/A'])
    cover = cover.drop(columns = wg).fillna('N/A')
    cover = specifications(cover,9)
    cover['PAGE'] = 1
    return cover[cover['WORK GROUPS'] != '[]']

# Returns dataframe of pages
def create_pages(pages,sl_page,sp_page):
    if not sl_page.empty and not sp_page.empty: # if it has shortline and specialty
        pages = pd.merge_ordered(sl_page,sp_page,on=('Location ID','SEGMENT_ID','LongLine')).fillna("N/A")
        pages = specifications(pages,3)
        pages = pd.merge_ordered(pages,streets,on=('Location ID','SEGMENT_ID','LongLine'))
        pages = pages.sort_values(by=['Location ID','BLOCK']).drop(columns='BLOCK').reset_index(drop = True)
    elif not sl_page.empty or not sp_page.empty: # if it has shortline or specialty
        page = sl_page if sp_page.empty else sp_page
        pages = specifications(page.fillna('N/A'),3)
        pages = pd.merge_ordered(pages,streets,on=('Location ID','SEGMENT_ID','LongLine')).sort_values(
            by=['BLOCK','Location ID']).reset_index(drop = True).drop(columns='BLOCK')
        pages = pages.dropna(subset=['SPECIFICATIONS'])
        page = 1
        for index, row in streets.iterrows():
            if index != 0 and (row['Location ID'] != pages['Location ID'][index - 1]):
                page = 2
                pages.at[index,'PAGE'] = page
            else:
                page += 1
                pages.at[index,'PAGE'] = page
    else:
        pages.loc[cover.Street != None,'PAGE'] = 2
    return pages

# Creates worksheet in the Excel file, replacing the worksheet if it already exists
def create_ws(df,sheet_name):
    if sheet_name in wb:
        del wb[sheet_name]
    ws = wb.create_sheet(sheet_name)
    for r in dataframe_to_rows(df, index=False, header=True):
        ws.append(r)
    wb.save(EXCEL_FILE)
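
The counting step in `location_in_df` can be hard to picture without data. The sketch below shows the same idea on a made-up table: count markings per location and marking type, then spread the types into columns. It uses `groupby` + `size` + `unstack` as a simplified stand-in for the `groupby`/`pivot_table` combination above, and every value in the toy frame is invented.

```python
import pandas as pd

# Toy stand-in for the short line query result (values are made up)
markings = pd.DataFrame({
    "Location ID": [100, 100, 100, 200],
    "SEGMENT_ID": [1, 1, 2, 3],
    "SHORT_LINE_TYPE": ["STOP_LINE", "CROSSWALK", "STOP_LINE", "STOP_LINE"],
})

# One row per location, one column per marking type, cell = number of markings
counts = (
    markings.groupby(["Location ID", "SHORT_LINE_TYPE"])
    .size()
    .unstack("SHORT_LINE_TYPE")
    .reset_index()
)
print(counts)
```

Marking types that do not occur at a location come out as missing values, which is why the notebook fills them with `'N/A'` before building the specifications.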

## Import SBO Report
The first step is to import the SBO report from a CSV file and convert it into a dataframe.

In [8]:
cols = {'altref':'Location ID','on_street':'Street','from_street':'From','to_street':'To','actfinish_1':'Finish Date'}

sbo = pd.read_csv(CSV_FILE)

if 'altref' in sbo.columns:
    df = sbo.filter(items=list(cols.keys())).sort_values('altref').rename(columns=cols).reset_index(drop=True)
    df['Location'] = (df["Street"] + ' FROM ' + df["From"] + ' TO ' + df["To"]).str.upper()
    df.to_csv(CSV_FILE)
else:
    df = sbo.copy().drop(columns='Unnamed: 0').dropna(subset=['Segment IDs'])

This will display the first 10 rows of the report from SBO:

In [10]:
display(df.head(10))

Unnamed: 0,Location ID,Street,From,To,Finish Date,Location
0,43104,MEINARDUS DR,ST ELMO RD E,SPONBERG DR,"Aug 27, 2019, 11:04 AM",MEINARDUS DR FROM ST ELMO RD E TO SPONBERG DR
1,43114,RIVER HILLS RD,2112,2225,"Aug 21, 2019, 6:00 PM",RIVER HILLS RD FROM 2112 TO 2225
2,62527,AFFIRMED DR,MANOWAR STRETCH DR,THUNDER GULCH DR,"Sep 17, 2019, 10:23 AM",AFFIRMED DR FROM MANOWAR STRETCH DR TO THUNDER...
3,62528,ALOMAR CV,QUIRIN DR,PEARCE LN,"Sep 17, 2019, 10:23 AM",ALOMAR CV FROM QUIRIN DR TO PEARCE LN
4,62529,ALYSHEBA DR,SEATTLE SLEW DR,WAR ADMIRAL DR,"Sep 17, 2019, 10:23 AM",ALYSHEBA DR FROM SEATTLE SLEW DR TO WAR ADMIRA...
5,62530,BAHAN DR,GILWELL DR,THOME VALLEY DR,"Sep 16, 2019, 10:23 AM",BAHAN DR FROM GILWELL DR TO THOME VALLEY DR
6,62534,DEARBONNE DR,FELLER CV,MANOWAR STRETCH DR,"Sep 19, 2019, 8:16 AM",DEARBONNE DR FROM FELLER CV TO MANOWAR STRETCH DR
7,62536,FELLER CV,DEARBONNE DR,CUL DE SAC,"Sep 19, 2019, 8:16 AM",FELLER CV FROM DEARBONNE DR TO CUL DE SAC
8,62538,FRYMAN HILL DR,VIZQUEL LOOP,THOME VALLEY DR,"Sep 16, 2019, 10:23 AM",FRYMAN HILL DR FROM VIZQUEL LOOP TO THOME VALL...
9,62539,GILWELL DR,BAHAN DR,DEAD END,"Sep 19, 2019, 8:16 AM",GILWELL DR FROM BAHAN DR TO DEAD END


## Loading and Transforming Data

### PDF tables to Excel

Now that the PDFs have been extracted and exported to the folder path, the next step is to extract the tables in the PDF and export them as an Excel file.

An input form is generated so the user can enter segment ID and comment information for each of the streets listed. The columns list takes only the relevant columns from the extracted table. The `pdfplumber` package is used to extract tables from the PDF before the user is prompted to submit data.

The input is stored as a DataFrame and saved to an Excel document. If the user already provided input from a previous session, the dataframe is loaded from the saved file instead.

In [33]:
from pathlib import Path

# Prompts for input and exports to Excel and CSV unless the Excel file already exists; in that case the saved CSV copy is read instead
if Path(EXCEL_FILE).exists():
    df = pd.read_csv(CSV_FILE,index_col=0)
    df = df.reset_index(drop=True).fillna("N/A")
else:
    input_form(df)
    df = df.fillna("N/A")
    df['LongLine'] = ['N/A' if x == '' else x for x in df['LongLine']]
    df.to_excel(EXCEL_FILE)
    df.to_csv(CSV_FILE)

In [34]:
display(df)

Unnamed: 0,Location ID,Street,From,To,Finish Date,Location,Segment IDs,LongLine
0,62528,ALOMAR CV,QUIRIN DR,PEARCE LN,"Sep 17, 2019, 10:23 AM",ALOMAR CV FROM QUIRIN DR TO PEARCE LN,"'3282735', '2036597'",
1,62530,BAHAN DR,GILWELL DR,THOME VALLEY DR,"Sep 16, 2019, 10:23 AM",BAHAN DR FROM GILWELL DR TO THOME VALLEY DR,"'2036627', '2036625', '2036624', '2036626'",
2,62547,MANOWAR STRETCH DR,DEARBONNE DR,SEA BISCUIT DR,"Sep 16, 2019, 10:23 AM",MANOWAR STRETCH DR FROM DEARBONNE DR TO SEA BI...,"'2036632', '2036633', '2036634'",
3,62549,MUCK DR,RANDLEMAN DR,13504,"Sep 17, 2019, 10:23 AM",MUCK DR FROM RANDLEMAN DR TO 13504,'2046337',
4,62561,SPIERS WAY,12816,LIPTON LOOP,"Sep 19, 2019, 8:16 AM",SPIERS WAY FROM 12816 TO LIPTON LOOP,"'2045145', '2038244', '2046330', '2046331', '2...",
5,62566,THOME VALLEY DR,ROSS RD,GILWELL DR,"Sep 19, 2019, 8:16 AM",THOME VALLEY DR FROM ROSS RD TO GILWELL DR,"'2036610', '2036606', '2036609', '2036607', '2...","turn bay, partial double yellow centerline"
6,62573,WINTERS CV,NIJMEGEN DR,13120,"Sep 18, 2019, 2:18 PM",WINTERS CV FROM NIJMEGEN DR TO 13120,'2045149',
7,62661,SHOAL CREEK BLVD,31ST ST W,34TH ST W,"Aug 17, 2019, 2:12 PM",SHOAL CREEK BLVD FROM 31ST ST W TO 34TH ST W,'2016472',bike lane
8,62700,SLAUGHTER LN E,100,BRANDT RD,"Sep 6, 2019, 8:00 AM",SLAUGHTER LN E FROM 100 TO BRANDT RD,"'3260082', '3260074', '2044862', '3260079', '3...","turn bays, lane lines"
9,63043,METROPOLIS,BURLESON RD,METLINK DR,"Sep 14, 2019, 10:35 AM",METROPOLIS FROM BURLESON RD TO METLINK DR,'2046722',"lane lines, double yellow centerlines"


This file contains a table for the list of streets with the following columns:
- <i>Location ID</i>: unique identifier used for street paving
- <i>Street</i>: main street that is paved
- <i>From</i>: intersecting cross street
- <i>To</i>: intersecting cross street
- <i>Segment IDs</i>: list of segment IDs where street is paved, separated by commas
- <i>LongLine</i>: Notes on long line markings
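
The Segment IDs column stores each list as a single string of quoted, comma-separated values, as in the table output above. A minimal sketch of turning such a cell back into a list of integers follows. The helper name `parse_segment_ids` is hypothetical and not part of the notebook.

```python
def parse_segment_ids(cell):
    """Split a stored Segment IDs cell such as "'3282735', '2036597'" into integers."""
    # Missing values may be NaN (a float) or the 'N/A' placeholder
    if not isinstance(cell, str) or cell.strip() in ("", "N/A"):
        return []
    return [int(part.strip().strip("'")) for part in cell.split(",") if part.strip()]

print(parse_segment_ids("'3282735', '2036597'"))  # [3282735, 2036597]
```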

### Feature Layer Data Query

The next task is to find the markings for the list of segment IDs the user has entered. The `arcgis` package is useful here for extracting the markings available in each segment ID, since the dataset is already publicly available.

Since the markings datasets are publicly available, we can log in to ArcGIS Online anonymously. 

Use `client_id` instead of `None` if you wish to log in through an AGOL federated account. Note that it will prompt the user to enter a code, which can be found by following the instructions. Going through an AGOL federated account is useful if the user wishes to add their own layers as a reference, such as [NearMap](https://go.nearmap.com/) aerial imagery. 

It will search through the markings feature layer based on the list of segment IDs provided by the excel file.
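
To make the query step concrete, here is a small sketch of how a where clause for a list of segment IDs could be assembled before it is passed as the `where` argument of `FeatureLayer.query`. The helper name `segment_where_clause` and its `quoted` flag are hypothetical. The default quoting mirrors how the IDs are stored in the Segment IDs column, and whether a given layer expects quoted or unquoted values is an assumption to verify against that layer.

```python
def segment_where_clause(segment_ids, field="SEGMENT_ID", quoted=True):
    """Build a SQL-style IN clause for a list of segment IDs (hypothetical helper)."""
    if not segment_ids:
        return None  # nothing to query
    fmt = "'{}'" if quoted else "{}"
    # int() rejects anything that is not a plain number before it reaches the query
    ids = ", ".join(fmt.format(int(s)) for s in segment_ids)
    return "{} IN ({})".format(field, ids)

where = segment_where_clause([2036632, 2036633, 2036634])
print(where)  # SEGMENT_ID IN ('2036632', '2036633', '2036634')
```

Returning `None` for an empty list lets the caller skip the request entirely instead of sending a clause that matches nothing.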

In [35]:
# variables used to find and query feature layer in AGOL
gis = GIS("https://austin.maps.arcgis.com/home/index.html")
url = r"https://services.arcgis.com/0L95CJ0VTaxqcmED/arcgis/rest/services/TRANSPORTATION_{}/FeatureServer/0"
sl,sp,streets = (pd.DataFrame(),pd.DataFrame(),pd.DataFrame())

# Columns for data frames: cols[:2] for short line, cols[1:] for specialty point, s_col for street segments
cols = ['SHORT_LINE_TYPE','SEGMENT_ID','SPECIALTY_POINT_TYPE','SPECIALTY_POINT_SUB_TYPE']
s_col = ['LEFT_BLOCK_FROM','RIGHT_BLOCK_FROM','SEGMENT_ID']

for index,row in df.iterrows():
    streets = query_df(FeatureLayer(url.format("street_segment")),index,s_col,df,streets)      
    sl = query_df(FeatureLayer(url.format("markings_short_line")),index,cols[:2],df,sl)
    sp = query_df(FeatureLayer(url.format("markings_specialty_point")),index,cols[1:],df,sp)
sp = FeatureLayerDataFrame(sp).specialty_markings(sp)

# Order table
streets['BLOCK'] = np.maximum(streets[s_col[0]],streets[s_col[1]])
streets = streets.sort_values(by=['BLOCK','Location ID']).reset_index(drop = True)
streets = streets.rename(columns={'COUNTS':'PAGE'}).drop(s_col[:2],axis=1)

page = 1
for index, row in streets.iterrows():
    if index != 0 and (row['Location ID'] != streets['Location ID'][index - 1]):
        page = 2
        streets.at[index,'PAGE'] = page
    else:
        page += 1
        streets.at[index,'PAGE'] = page
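
The page numbering loop can also be written without explicit iteration. The sketch below assumes the intent is to number the segments of each location consecutively starting at page 2, since page 1 is used by the cover. Unlike the loop, it groups rows by Location ID even when they are not adjacent, so it is an alternative under that assumption rather than a drop-in replacement. The toy rows reuse IDs from the table output above.

```python
import pandas as pd

segments = pd.DataFrame({
    "Location ID": [62528, 62528, 62530, 62530, 62530],
    "SEGMENT_ID": [2036597, 3282735, 2036625, 2036626, 2036627],
})

# cumcount numbers the rows within each location from 0; segment pages start at 2
segments["PAGE"] = segments.groupby("Location ID").cumcount() + 2
print(segments["PAGE"].tolist())  # [2, 3, 2, 3, 4]
```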

### Plans Table Creation

In [141]:
wg = ['SHORT LINE','SPECIALTY MARKINGS','LONGLINE']
sl_count,sl_page = location_in_df(sl,'SHORT_LINE_TYPE',wg[0])
sp_count,sp_page = location_in_df(sp,'SPECIALTY_POINT_TYPE',wg[1])
cover = create_cover(df.copy(),sl_count,sp_count,wg)
pages = create_pages(df.copy(),sl_page,sp_page)

#### Cover Table
This dataframe lists pavement markings queried by segment IDs with the following columns:
- <i>Location ID</i>: Unique identifier used for street paving
- <i>Location</i>: Location of whereabouts work
- <i>WORK GROUPS</i>: Type of markings work group assigned to work order
- <i>SPECIFICATIONS</i>: Lists all markings that need to be installed on work order.

The dataframe will be saved in an Excel sheet so it can be used again to generate the template.
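
To show how a SPECIFICATIONS sentence is put together, the sketch below combines a long line comment with per-type marking counts. It is a simplified stand-in for the `specifications` function, not a copy of it. The helper name `build_specification` and the marking type names in the example are assumptions for illustration.

```python
def build_specification(longline, counts):
    """Compose an 'Install ...' sentence from a long line comment and marking counts."""
    parts = []
    if longline and longline != "N/A":
        parts.append(longline)
    for marking_type, count in counts.items():
        if count in ("N/A", "", None):
            continue  # this marking type was not found at the location
        parts.append("{} {}".format(int(count), marking_type.lower().replace("_", " ")))
    return "Install " + ", ".join(parts)

sentence = build_specification(
    "bike lane", {"STOP_LINE": 1, "CHEVRON_SYMBOL": 2.0, "CROSSWALK": "N/A"}
)
print(sentence)  # Install bike lane, 1 stop line, 2 chevron symbol
```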

In [140]:
display(cover.filter(['Location ID','Location','Comments','WORK GROUPS','SPECIFICATIONS']))

Unnamed: 0,Location ID,Location,WORK GROUPS,SPECIFICATIONS
0,62528,ALOMAR CV FROM QUIRIN DR TO PEARCE LN,['SHORT LINE'],Install 2 stop line
1,62530,BAHAN DR FROM GILWELL DR TO THOME VALLEY DR,['SHORT LINE'],Install 1 stop line
2,62547,MANOWAR STRETCH DR FROM DEARBONNE DR TO SEA BI...,['SHORT LINE'],Install 2 stop line
3,62549,MUCK DR FROM RANDLEMAN DR TO 13504,['SHORT LINE'],Install 1 stop line
5,62566,THOME VALLEY DR FROM ROSS RD TO GILWELL DR,"['SHORT LINE', 'LONGLINE']","Install turn bay, partial double yellow center..."
6,62573,WINTERS CV FROM NIJMEGEN DR TO 13120,['SHORT LINE'],Install 1 stop line
7,62661,SHOAL CREEK BLVD FROM 31ST ST W TO 34TH ST W,"['SHORT LINE', 'SPECIALTY MARKINGS', 'LONGLINE']","Install bike lane, 1 stop line, 2 chevron symb..."
8,62700,SLAUGHTER LN E FROM 100 TO BRANDT RD,"['SHORT LINE', 'SPECIALTY MARKINGS', 'LONGLINE']","Install turn bays, lane lines, 3 stop line, 14..."
9,63043,METROPOLIS FROM BURLESON RD TO METLINK DR,"['SHORT LINE', 'LONGLINE']","Install lane lines, double yellow centerlines,..."
10,63044,METROPOLIS DR FROM METLINK DR TO US 183 HWY SB S,"['SHORT LINE', 'LONGLINE']","Install lane lines, double yellow centerlines,..."


#### Pages Table
This dataframe lists pavement markings queried by segment IDs with the following columns:
- <i>Location ID</i>: Unique identifier used for street paving
- <i>SEGMENT_ID</i>: Segment ID of page reference
- <i>SPECIFICATIONS</i>: Lists all markings that need to be installed on work order.
- <i>PAGE</i>: Page number used for work order

The dataframe will be saved in an Excel sheet so it can be used again to generate the template.

In [134]:
display(pages.filter(['Location ID','SEGMENT_ID','SPECIFICATIONS','PAGE'])) 

Unnamed: 0,Location ID,SEGMENT_ID,SPECIFICATIONS,PAGE
0,62528,2036597,Install 1 stop line,2
1,62528,3282735,Install 1 stop line,2
2,62530,2036625,"Install 1 crosswalk, 1 stop line",2
3,62530,2036626,,3
4,62530,2036627,,4
...,...,...,...,...
78,74357,2018563,"Install turn bays, lane lines, 1 crosswalk, 17...",2
79,74361,2040261,"Install lane lines, double yellow turn bays, t...",2
80,79944,2044888,Install 1 stop line,2
81,79944,2044889,,3


## Create Worksheets of DataFrames

In [135]:
wb = load_workbook(filename = EXCEL_FILE)
create_ws(cover,'Cover')
create_ws(pages,'Pages')
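
The worksheet step can be tried without touching the real plans file. The sketch below writes a toy dataframe into an in-memory `openpyxl` workbook and replaces the sheet when it is written a second time. The helper name `write_sheet` and the toy data are assumptions. Unlike `create_ws`, it takes the workbook as an argument and does not save to disk.

```python
import pandas as pd
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows

def write_sheet(workbook, df, sheet_name):
    """Replace (or create) a worksheet and fill it with the dataframe's header and rows."""
    if sheet_name in workbook.sheetnames:
        del workbook[sheet_name]
    ws = workbook.create_sheet(sheet_name)
    for row in dataframe_to_rows(df, index=False, header=True):
        ws.append(row)
    return ws

wb_demo = Workbook()  # in-memory workbook; nothing is written to disk
toy = pd.DataFrame({"Location ID": [62528], "PAGE": [1]})
write_sheet(wb_demo, toy, "Cover")
ws_demo = write_sheet(wb_demo, toy, "Cover")  # second call replaces the first sheet
print(wb_demo.sheetnames)
```

Passing the workbook in explicitly, rather than relying on a global `wb`, makes the function easier to reuse and test.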

## Generating Whereabouts Plans
To generate whereabouts plans, we will use the `arcpy` package, which requires Python 2 and ArcMap 10.5. Eventually, this notebook will be able to use `arcpy` in Python 3.

[Click here to access notebook](PlansTemplate.ipynb)