# Whereabouts Streets Data Extraction
This notebook demonstrates how to access the Street and Bridge Operations PDF files and extract their data to create a work order plan template.

<div style="text-align:center"><img src="https://upload.wikimedia.org/wikipedia/en/9/94/Closeup_of_pavement_with_grass.JPG" /></div>

## Introduction
The purpose of this notebook is to create Street and Bridge work order plans based on segment IDs and additional comments on long line markings. The markings feature layers are published on the City of Austin ArcGIS Portal and are also available for public viewing.

The schedule of streets where sealcoat and overlay work has been completed is received by email from Street and Bridge Operations (SBO) on a daily basis. It arrives as a PDF file that lists the weather conditions and temperature and provides a table of the streets where paving was completed.

<b>The only manual steps for the user are to:</b>
- Input Segment IDs
- Make comments on long line markings
- Specify the PDF name and file path used to retrieve the table of completed (paved) streets
- Create any missing markings assets that are not visible in aerial imagery

This replaces the previous process of manually editing a plans layout by copy-pasting imagery, writing in Location IDs, work groups, and the markings found, and then exporting the plans one at a time. An Excel document is created from this input, and its segment IDs are used to find all short line and specialty point markings. The goal is to generate multiple PDF plans in much less time.

In the future, I would like to make this script more customizable so that it runs seamlessly without entering Segment IDs: the user would enter only the specific long line markings, and the segments would come from the maintained streets feature layer.

## Imports
The packages used for this project are:
- [pandas](https://pandas.pydata.org/) to create a DataFrame from the extracted table and transform the data
- [openpyxl](https://openpyxl.readthedocs.io/en/stable/) to edit Excel files
- [arcgis](https://esri.github.io/arcgis-python-api/apidoc/html/) to query the markings feature layer datasets

In [1]:
import pandas as pd
from openpyxl import Workbook,load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows

from arcgis.gis import GIS
from arcgis.features import FeatureLayer

from functools import reduce
import numpy as np

## Notes
- May remove `arcgis.gis` and move the markings asset calculations to the second notebook to use GISMAINT1.

## Constants

The MONTH, DAY, and YEAR constants determine which PDF file name is loaded into a DataFrame, and YEAR also selects the folder path where the plans are created. These constants are set at the top so they can be changed as needed.

<i>The table below explains the purpose of each constant.</i>

| Constant | Description |
|:--------:|----|
| <b>MONTH, DAY, YEAR</b> |Date used to find the PDF (month-day format) and the file path (year)|
|<b>FOLDER</b>      |File directory where the SBO whereabouts reports imported from email are stored|
|<b>FILE_NAME</b>   |Base file path (without extension) of the combined SBO whereabouts report|
|<b>EXCEL_FILE</b>   |Excel workbook (FILE_NAME + `.xlsx`) where the input and plan tables are saved|
|<b>SIGN_IN</b>   |Whether to prompt the user to sign in to Outlook email|
|<b>INPUT</b>|Whether to prompt the user to input segment IDs and comments to export to Excel|

In [22]:
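# Year of the reports; also selects the year folder below
YEAR = str(2019)
FOLDER = r"G:\ATD\Signs_and_Markings\MARKINGS\Whereabouts WORK ORDERS\{}\Whereabouts_Summary".format(YEAR)
# Base path (no extension) shared by the combined CSV and Excel outputs
FILE_NAME = FOLDER + r'\SBO_Combined'
EXCEL_FILE = FILE_NAME + ".xlsx"
INPUT = True
# Share the paths with other notebooks (retrieve them there with %store -r)
%store FOLDER
%store EXCEL_FILE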

Stored 'FOLDER' (str)
Stored 'EXCEL_FILE' (str)


In [42]:
# Map source CSV columns to report column names, and crew codes to readable work types
cols = {'altref':'Location ID','on_street':'Street','from_street':'From','to_street':'To',
        'crew':'Type','actfinish_1':'Finish Date'}
values = {'OVL1':'Overlay','SLCT1':'Sealcoat','SLCT2':'Sealcoat','MILL':'Mill','DISTN1':'Overlay (DIS)'}

sealcoat = pd.read_csv(FOLDER + r'\sbo_seal_coat_{}.csv'.format(YEAR))
overlay = pd.read_csv(FOLDER + r'\sbo_overlay_{}.csv'.format(YEAR))
# Combine both reports (pd.concat replaces DataFrame.append, removed in pandas 2.0),
# decode crew codes, keep only the mapped columns, and sort by Location ID
df = (pd.concat([sealcoat, overlay], sort=True)
      .replace(values)
      .filter(items=list(cols.keys()))
      .sort_values('altref')
      .rename(columns=cols)
      .reset_index(drop=True))
df['Location'] = (df["Street"] + ' FROM ' + df["From"] + ' TO ' + df["To"]).str.upper()
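A small, self-contained illustration of the reshaping in the cell above, using two made-up rows in place of the real sealcoat and overlay CSV exports (the street names, IDs, and dates are hypothetical, and only two crew codes are mapped). It uses `pd.concat`, the replacement for `DataFrame.append`, which was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical stand-ins for the sealcoat and overlay CSV exports
sealcoat = pd.DataFrame({'altref': [2], 'on_street': ['Main St'], 'from_street': ['1st St'],
                         'to_street': ['2nd St'], 'crew': ['SLCT1'], 'actfinish_1': ['Jul 19, 2019']})
overlay = pd.DataFrame({'altref': [1], 'on_street': ['Oak Ave'], 'from_street': ['Elm St'],
                        'to_street': ['Pine St'], 'crew': ['OVL1'], 'actfinish_1': ['Jul 20, 2019']})

cols = {'altref': 'Location ID', 'on_street': 'Street', 'from_street': 'From',
        'to_street': 'To', 'crew': 'Type', 'actfinish_1': 'Finish Date'}
values = {'OVL1': 'Overlay', 'SLCT1': 'Sealcoat'}

# Stack both reports, decode crew codes, keep the mapped columns, sort by Location ID
combined = (pd.concat([sealcoat, overlay], sort=True)
            .replace(values)
            .filter(items=list(cols))
            .sort_values('altref')
            .rename(columns=cols)
            .reset_index(drop=True))
# One upper-case description per row, e.g. "OAK AVE FROM ELM ST TO PINE ST"
combined['Location'] = (combined['Street'] + ' FROM ' + combined['From']
                        + ' TO ' + combined['To']).str.upper()
print(combined[['Location ID', 'Type', 'Location']])
```

Because `replace` is given a flat dictionary, it maps values across every column, so any other column that happened to contain a crew code would be rewritten as well.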

The cell below displays the first 10 rows of the combined SBO report and saves the report to CSV.

In [43]:
display(df.head(10))
df.to_csv(FILE_NAME + '.csv')

| | Location ID | Street | From | To | Type | Finish Date | Location |
|---|---|---|---|---|---|---|---|
| 0 | 30887.0 | GRANDVIEW ST | 31St St W | 34Th St W | Overlay | Jul 19, 2019, 3:07 PM | GRANDVIEW ST FROM 31ST ST W TO 34TH ST W |
| 1 | 34534.0 | GLISSMAN RD | Springdale Rd | Mansell Ave | Overlay | Jul 19, 2019, 3:11 PM | GLISSMAN RD FROM SPRINGDALE RD TO MANSELL AVE |
| 2 | 37936.0 | 25TH 1/2 ST W | San Gabriel St | Leon St | Overlay | Jul 19, 2019, 3:15 PM | 25TH 1/2 ST W FROM SAN GABRIEL ST TO LEON ST |
| 3 | 37977.0 | MLK BLVD W | RIO GRANDE | PEARL ST | Overlay | Jul 19, 2019, 3:19 PM | MLK BLVD W FROM RIO GRANDE TO PEARL ST |
| 4 | 37986.0 | SHOAL CREST AVE | 28Th 1/2 St W | 29Th St W | Overlay | Jul 19, 2019, 3:09 PM | SHOAL CREST AVE FROM 28TH 1/2 ST W TO 29TH ST W |
| 5 | 39099.0 | 3RD ST E | San Saba St | Tillery Sq | Overlay | Jul 19, 2019, 1:36 PM | 3RD ST E FROM SAN SABA ST TO TILLERY SQ |
| 6 | 39168.0 | FAIRVIEW DR | 4500 | BIG BEND DR | Overlay | Jul 23, 2019, 10:38 AM | FAIRVIEW DR FROM 4500 TO BIG BEND DR |
| 7 | 39187.0 | HARTFORD RD | ENFIELD RD | WINDSOR RD | Overlay | Mar 21, 2019, 3:00 PM | HARTFORD RD FROM ENFIELD RD TO WINDSOR RD |
| 8 | 39234.0 | MOUNTAIN LAUREL DR | EXPOSITION BLVD | 2809 | Overlay | Jul 23, 2019, 10:36 AM | MOUNTAIN LAUREL DR FROM EXPOSITION BLVD TO 2809 |
| 9 | 39265.0 | ROBBINS PL | Vance Cir | 22Nd St W | Overlay | Jul 19, 2019, 3:13 PM | ROBBINS PL FROM VANCE CIR TO 22ND ST W |


## Methods
These functions are used to extract and transform the data into a usable format.

<i>The table below explains the purpose of each:</i>

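| Method | Description |
|:--------:|----|
|<b>lists_to_df</b> |Converts an extracted nested list into a DataFrame|
|<b>pdf_table_to_df</b> |Extracts the table from a PDF and converts it to a DataFrame|
|<b>input_form</b> |Prompts the user to input segment IDs and long line specifications|
|<b>query_df</b>   |Queries a feature layer by segment IDs and appends the matching records|
|<b>specialty_markings</b> |Renames specialty point markings based on their domain codes|
|<b>specifications</b> |Builds the "Install ..." specification sentence for each row|
|<b>location_in_df</b> |Counts markings per location (cover) and per segment (pages)|
|<b>create_cover</b> |Builds the cover page table|
|<b>create_pages</b> |Builds the table of plan pages|
|<b>create_ws</b> |Writes a DataFrame to a worksheet, replacing any existing sheet with the same name|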
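The specialty point layer stores each marking as a pair of coded values, a type and a sub-type, and `specialty_markings` turns each pair into a readable name by indexing lists with `code - 1`. A minimal sketch of that lookup, using a shortened subset of the notebook's lists (the helper name `decode` is hypothetical):

```python
# Shortened subset of the domain lists used in specialty_markings
WORD = ["Stop", "Yield", "Ahead", "Only"]
ARROW = ["Through", "Left", "Right"]
# Suffix and sub-type list for each type code; index 0 is type code 1
SUFFIX = ["word", "arrow"]
SUBTYPES = [WORD, ARROW]


def decode(type_code, sub_code):
    """Map 1-based (type, sub-type) domain codes to a readable marking name."""
    t, s = int(type_code), int(sub_code)  # codes may arrive as floats or strings
    return SUBTYPES[t - 1][s - 1] + " " + SUFFIX[t - 1]


print(decode(1, 4))    # Only word
print(decode(2.0, 3))  # Right arrow
```

The lists must stay in the same order as the layer's coded-value domains; if a domain changes, the names shift silently rather than raising an error.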

In [25]:
# Prompts the user for segment IDs and long line comments for each location,
# then adds them to the DataFrame as 'Segment IDs' and 'LongLine'
def input_form(df):
    segments, longline = [], []
    for index, row in df.iterrows():
        console = input(row['Location'] + "\nSegment ID list: ")
        try:
            if '\t' not in console:
                # Plain entry: keep the typed segment IDs as-is
                segments.append(console)
            else:
                # Tab-separated entry: a pasted attribute table.
                # The 32/33 offsets are hard-coded to the layout of that table.
                new_tbl = []
                i = 32
                tbl = console.split('\t')
                while i < len(tbl):
                    try:
                        temp = tbl[i].split(' ')
                        tbl[i:i] = temp
                        new_tbl.append(tbl[i - 32:i + 1])
                        del tbl[i + 2]
                        i += 33
                    except IndexError:
                        break
                # The first record is the header row; keep only the SEGMENT_ID column
                temp = pd.DataFrame(new_tbl[1:], columns=new_tbl[0])
                segments.append(str(list(temp.SEGMENT_ID))[1:-1])
        except ValueError:
            print("Skipping input...")
            segments.append(None)
        comment = input("Longline: ")
        longline.append(comment)
    df['Segment IDs'], df['LongLine'] = (segments, longline)
    print("\nInput complete.")
    return df
    
# Queries feature layer fc for the segment IDs in row `index` of df and appends any
# matching records (fields f) to df1, tagged with the row's Location ID and LongLine
def query_df(fc, index, f, df, df1):
    q = "SEGMENT_ID IN({})".format(df["Segment IDs"][index])
    # Rows without segment IDs were filled with 'N/A'; skip them
    if q != "SEGMENT_ID IN(N/A)":
        c = fc.query(where=q, return_count_only=True)
        if c != 0:
            sdf = fc.query(where=q).sdf.filter(items=f)
            sdf["Location ID"] = df["Location ID"][index]
            sdf["LongLine"] = df["LongLine"][index]
            df1 = pd.concat([df1, sdf], sort=True)
    # One count per record, used later when pivoting
    df1['COUNTS'] = 1
    return df1

# Renames specialty point markings by decoding their (type, sub-type) domain codes
def specialty_markings(df, field):
    if field not in df.columns:
        return pd.DataFrame()
    # Sub-type names for each specialty point type, in domain-code order
    word = ["Stop","Yield","Ahead","Only","Merge","Ped", "X-ing","MPH","Bus Only","Ped X-ing","Keep Clear","Do Not Block"]
    arrow = ["Through","Left","Right","Left/Right","Left/Right/Through","Left/Through","Right/Through",
             "U-turn","Lane reduction","Wrong way","Bike"]
    other = ["Green pad", "Green launch pad", "Speed hump marking","Diagonal crosshatch", "Chevron crosshatch"]
    parking = ["Parking 'L'", "Parking 'T'", "Parking stall line", "Handicap symbol"]
    symbol = ["Bike","Shared lane (Sharrow)","Bicyclist","Railroad Crossing (RxR)","Chevron","Pedestrian","Diamond"]
    rpm = ['blue','']
    # Suffix and sub-type list for each type code (codes are 1-based)
    t = ['word','arrow','symbol','','','rpm']
    st = [word,arrow,symbol,other,parking,rpm]
    names = []
    for type_code, sub_code in zip(df[field], df.SPECIALTY_POINT_SUB_TYPE):
        x, y = int(type_code), int(sub_code)
        names.append(st[x - 1][y - 1] + " " + t[x - 1])
    df[field] = names
    return df.drop('SPECIALTY_POINT_SUB_TYPE', axis=1)

# Adds a SPECIFICATIONS column: an "Install ..." sentence listing the long line
# comment and the marking counts found in columns i onward
def specifications(df, i):
    df["SPECIFICATIONS"] = ''
    values = list(df.columns)[i:]
    for index, row in df.iterrows():
        keys = list(row.iloc[i:])
        spec = []
        for k, v in zip(keys, values):
            # Skip empty counts and the WORK GROUPS column
            if k != 'N/A' and k != '' and v != 'WORK GROUPS':
                spec.append('{} {}'.format(int(k), v.lower().replace('_', ' ')))
        # Build the sentence once per row, after all counts are collected
        if row['LongLine'] != 'N/A':
            sentence = 'Install {}, '.format(row['LongLine']) + ', '.join(spec)
        else:
            sentence = 'Install ' + ', '.join(spec)
        df.at[index, 'SPECIFICATIONS'] = sentence
    if 'WORK GROUPS' in df.columns:
        # Convert the work group lists to strings (e.g. for writing to Excel)
        df['WORK GROUPS'] = df['WORK GROUPS'].apply(str)
    return df

# Returns per-location marking counts (count) and per-segment counts (page), with one
# column per marking type; returns empty DataFrames when df has no records
def location_in_df(df, markings_type, workgroup):
    if 'Location ID' not in df:
        return pd.DataFrame(), pd.DataFrame()
    count = df.groupby(['Location ID', markings_type]).count()[['SEGMENT_ID']].rename(columns={"SEGMENT_ID": 'COUNTS'})
    count = count.pivot_table(values='COUNTS', index='Location ID', columns=markings_type, aggfunc='first').reset_index()
    count[workgroup] = workgroup
    page = df.groupby(['Location ID', 'SEGMENT_ID', 'LongLine', markings_type]).count()[['COUNTS']]
    page = page.pivot_table(values='COUNTS', index=['Location ID', 'SEGMENT_ID', 'LongLine'],
                            columns=markings_type, aggfunc='first')
    return count, page

# Returns the cover page table: one row per location with its work groups,
# total marking counts, and specification sentence
def create_cover(cover, sl_count, sp_count, wg):
    # The long line work group applies only where a long line comment was given
    cover.loc[cover.LongLine != 'N/A', wg[2]] = wg[2]
    cover.loc[cover.LongLine == 'N/A', wg[2]] = 'N/A'
    if not sl_count.empty and not sp_count.empty:
        cover = reduce(lambda z, y: pd.merge_ordered(z, y, on='Location ID'), [cover, sl_count, sp_count])
    elif not sl_count.empty or not sp_count.empty:
        # Only one marking type was found: merge it and drop the other work group
        count = sl_count if sp_count.empty else sp_count
        wg_remove = 'SPECIALTY MARKINGS' if sp_count.empty else 'SHORT LINE'
        cover = pd.merge_ordered(cover, count, on='Location ID')
        wg.remove(wg_remove)
    else:
        return specifications(cover, 9)
    cover = cover.dropna(how='all', subset=list(cover.columns)[6:]).fillna('N/A')
    # Collect the work groups that apply to each location into a list
    cover['WORK GROUPS'] = cover[wg].apply(','.join, axis=1).apply(lambda x: [s for s in x.split(',') if s != 'N/A'])
    cover = cover.drop(columns=wg).fillna('N/A')
    cover = specifications(cover, 9)
    cover['PAGE'] = 1
    return cover

# Returns the table of plan pages: one row per segment with its marking counts,
# specification sentence, and page number. Relies on the global `streets` table.
def create_pages(pages, sl_page, sp_page):
    keys = ['Location ID', 'SEGMENT_ID', 'LongLine']
    if not sl_page.empty and not sp_page.empty:
        pages = pd.merge_ordered(sl_page, sp_page, on=keys).fillna("N/A")
        pages = specifications(pages, 3)
        pages = pd.merge_ordered(pages, streets, on=keys).drop(columns='BLOCK')
        pages = pages.sort_values(by=['Location ID', 'PAGE']).reset_index(drop=True)
    elif not sl_page.empty or not sp_page.empty:
        # Move the pivot index (the key columns) back into columns before use
        page = (sl_page if sp_page.empty else sp_page).reset_index()
        pages = specifications(page.fillna('N/A'), 3)
        pages = pd.merge_ordered(pages, streets, on=keys).sort_values(
            by=['BLOCK', 'Location ID']).reset_index(drop=True).drop(columns='BLOCK')
        pages = pages.dropna(subset=['SPECIFICATIONS']).reset_index(drop=True)
        # Renumber pages: restart at 2 for each new location (page 1 is the cover)
        page = 1
        for index, row in pages.iterrows():
            if index != 0 and row['Location ID'] != pages['Location ID'][index - 1]:
                page = 2
            else:
                page += 1
            pages.at[index, 'PAGE'] = page
    else:
        # No markings found: every location gets a single page after the cover
        pages['PAGE'] = 2
    return pages

# Writes df to the sheet `sheet_name` of the global workbook `wb`, replacing the
# sheet if it already exists, then saves the workbook to EXCEL_FILE
def create_ws(df, sheet_name):
    if sheet_name in wb:
        del wb[sheet_name]
    ws = wb.create_sheet(sheet_name)
    for r in dataframe_to_rows(df, index=False, header=True):
        ws.append(r)
    wb.save(EXCEL_FILE)
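The `specifications` helper turns each row's marking counts into the work order sentence shown in the SPECIFICATIONS column (for example, "Install 2 crosswalk, 5 stop line"). A stripped-down, pure-Python sketch of that rule, using a hypothetical `build_specification` helper that takes a plain dictionary instead of a DataFrame row: `'N/A'` or empty counts are skipped, names are lower-cased with underscores replaced, and the long line comment is prepended when present.

```python
def build_specification(counts, longline='N/A'):
    """Build an 'Install ...' sentence from {marking name: count} pairs.

    Mirrors the rule in specifications(): skip 'N/A' or empty counts,
    lower-case the names, and replace underscores with spaces.
    """
    spec = ['{} {}'.format(int(k), name.lower().replace('_', ' '))
            for name, k in counts.items() if k not in ('N/A', '')]
    if longline != 'N/A':
        return 'Install {}, '.format(longline) + ', '.join(spec)
    return 'Install ' + ', '.join(spec)


print(build_specification({'CROSSWALK': 2, 'STOP_LINE': 5.0, 'YIELD_LINE': 'N/A'}))
# Install 2 crosswalk, 5 stop line
```

Counts are passed through `int` because pivoted count columns usually become floats once any location is missing a marking type.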

## Loading and Transforming Data

### PDF tables to Excel

Now that the PDFs have been extracted and exported to the folder path, the next step is to extract the tables from the PDFs and export them to an Excel file.

An input form is generated so the user can enter the Segment IDs and comments for each listed street. Only the relevant columns are kept from the extracted table. The `pdfplumber` package will be used to extract the tables from the PDF, after which the user is prompted to submit data.

The input is stored as a DataFrame and saved to an Excel document. If the user already provided input in a previous session, the DataFrame is loaded from the saved CSV file instead.
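Reusing a previous session's input is a load-or-create cache keyed on the CSV file. A minimal sketch of that pattern, assuming a hypothetical `collect` callable standing in for `input_form` and a temporary file in place of `FILE_NAME + '.csv'`. Note that `read_csv` parses the string `N/A` as missing by default, which is why the loaded file goes through `fillna("N/A")` again:

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_or_create(csv_path, collect):
    """Return the cached DataFrame at csv_path, or build it with collect() and cache it."""
    path = Path(csv_path)
    if path.exists():
        # 'N/A' is read back as NaN by default, so restore it
        return pd.read_csv(path, index_col=0).fillna('N/A')
    df = collect().fillna('N/A')
    df.to_csv(path)
    return df


calls = []


def collect():
    # Hypothetical stand-in for the interactive input form
    calls.append(1)
    return pd.DataFrame({'Location ID': [37977], 'LongLine': [None]})


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / 'SBO_Combined.csv'
    first = load_or_create(csv_path, collect)   # builds and saves
    second = load_or_create(csv_path, collect)  # reads the saved file
    print(len(calls), second.loc[0, 'LongLine'])  # 1 N/A
```

Deleting the cached CSV is then the way to force a fresh round of input.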

In [44]:
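# Rebuild the user input from a saved copy of the console session:
# upper-case lines are locations, 'Segment ID list: ' lines hold the pasted
# segment IDs, and all other lines are 'Longline: ' answers
segments = []
locations = []
longline = []
with open(r'C:\Users\Govs\Projects\Files\text console.txt', 'r') as f:
    lines = f.readlines()
    for l in lines:
        if l.isupper():
            sentence = l.replace('\n', '')
            locations.append(sentence)
        elif 'Segment ID list: ' in l:
            # Only pasted attribute tables (with an 'OBJECTID *' header) are parsed
            if 'OBJECTID *' in l:
                new_tbl = []
                i = 32
                tbl = l.split('\t')
                while i < len(tbl):
                    try:
                        temp = tbl[i].split(' ')
                        tbl[i:i] = temp
                        new_tbl.append(tbl[i - 32:i + 1])
                        del tbl[i + 2]
                        i += 33
                    except IndexError:
                        break
                temp = pd.DataFrame(new_tbl[1:], columns=new_tbl[0])
                segments.append(str(list(temp.SEGMENT_ID))[1:-1])
            else:
                segments.append(None)
        else:
            # Strip the 10-character 'Longline: ' prompt; blank answers become 'N/A'
            sentence = l[10:].replace('\n', '')
            if sentence == '':
                sentence = 'N/A'
            longline.append(sentence)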
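The cell above rebuilds the input from a saved copy of the console session: upper-case lines are street locations, lines containing `Segment ID list: ` hold the pasted IDs, and the remaining lines start with the 10-character prompt `Longline: `. A simplified sketch of that classification on a hypothetical transcript; unlike the cell above, it keeps the raw ID text instead of parsing a pasted attribute table:

```python
def parse_transcript(lines):
    """Split a saved input transcript into locations, segment ID strings, and long line notes."""
    locations, segments, longline = [], [], []
    for line in lines:
        line = line.rstrip('\n')
        if line.isupper():
            locations.append(line)
        elif 'Segment ID list: ' in line:
            # Simplified: keep whatever was typed after the prompt, or None if blank
            ids = line.split('Segment ID list: ', 1)[1].strip()
            segments.append(ids or None)
        else:
            # Drop the 'Longline: ' prompt; blank answers become 'N/A'
            note = line[len('Longline: '):]
            longline.append(note if note else 'N/A')
    return locations, segments, longline


transcript = [
    "ROBBINS PL FROM VANCE CIR TO 22ND ST W\n",
    "Segment ID list: '2017645'\n",
    "Longline: \n",
    "BIG BEND DR FROM FAIRVIEW DR TO BALCONES DR\n",
    "Segment ID list: '2013565', '2040400'\n",
    "Longline: double yellow centerline\n",
]
print(parse_transcript(transcript))
```

Classifying by `isupper()` works because the locations are upper-cased when the report is built, but a long line note typed entirely in capitals would be misread as a location.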

In [45]:
from pathlib import Path

# Columns of extracted table
columns = ["Location ID", "Street", "From", "To"]

# Prompt for input and export to Excel and CSV, unless the CSV from a previous
# session already exists; in that case, read the CSV instead
if Path(FILE_NAME + '.csv').exists():
    df = pd.read_csv(FILE_NAME + '.csv', index_col=0)
    df = df.fillna("N/A")
else:
    df = input_form(df)
    df = df.fillna("N/A")
    df.to_excel(EXCEL_FILE)
    df.to_csv(FILE_NAME + '.csv')

In [46]:
# Attach the recovered input to the report by matching on Location
df1 = pd.DataFrame({'Location': locations, 'Segment IDs': segments, 'LongLine': longline})
df2 = df.merge(df1, on='Location')
# Keep only locations that have segment IDs
df2 = df2.dropna(subset=["Segment IDs"])
df2.to_excel(EXCEL_FILE)
df2.to_csv(FILE_NAME + '.csv')
df = df2.copy()

In [59]:
df = df.fillna("N/A")
df['LongLine'] = ['N/A' if x == ' ' else x for x in df['LongLine']]
display(df)

| | Location ID | Street | From | To | Type | Finish Date | Location | Segment IDs | LongLine |
|---|---|---|---|---|---|---|---|---|---|
| 3 | 37977.0 | MLK BLVD W | RIO GRANDE | PEARL ST | Overlay | Jul 19, 2019, 3:19 PM | MLK BLVD W FROM RIO GRANDE TO PEARL ST | '2017702', '2017692', '2017716' | lane lines, bike lanes, double yellow turn ba... |
| 5 | 39099.0 | 3RD ST E | San Saba St | Tillery Sq | Overlay | Jul 19, 2019, 1:36 PM | 3RD ST E FROM SAN SABA ST TO TILLERY SQ | '2019491', '2019516', '2019542', '2019503' | |
| 6 | 39168.0 | FAIRVIEW DR | 4500 | BIG BEND DR | Overlay | Jul 23, 2019, 10:38 AM | FAIRVIEW DR FROM 4500 TO BIG BEND DR | '2013551', '2040360', '2013580', '2013574' | double yellow centerline |
| 7 | 39187.0 | HARTFORD RD | ENFIELD RD | WINDSOR RD | Overlay | Mar 21, 2019, 3:00 PM | HARTFORD RD FROM ENFIELD RD TO WINDSOR RD | '2020525', '2020505', '2020541', '2020593', '2... | double yellow center line,turn bay |
| 9 | 39265.0 | ROBBINS PL | Vance Cir | 22Nd St W | Overlay | Jul 19, 2019, 3:13 PM | ROBBINS PL FROM VANCE CIR TO 22ND ST W | '2017645' | |
| 12 | 40772.0 | 43RD ST E | SPEEDWAY | DE (609) | Overlay | Jul 23, 2019, 10:21 AM | 43RD ST E FROM SPEEDWAY TO DE (609) | '2016314', '2016338', '2016273', '3318434', '2... | |
| 13 | 40785.0 | 6TH ST E | PEDERNALES ST | CALLES ST | Overlay | Jul 19, 2019, 1:23 PM | 6TH ST E FROM PEDERNALES ST TO CALLES ST | '2019299' | |
| 15 | 40804.0 | BIG BEND DR | FAIRVIEW DR | BALCONES DR | Overlay | Jul 19, 2019, 3:22 PM | BIG BEND DR FROM FAIRVIEW DR TO BALCONES DR | '2013565', '2040400' | double yellow centerline |
| 17 | 40821.0 | CALLES ST | 6TH ST E | 7TH ST E | Overlay | Jul 19, 2019, 1:26 PM | CALLES ST FROM 6TH ST E TO 7TH ST E | '2019238' | |
| 25 | 40941.0 | RATHERVUE PL | DUVAL ST | HARRIS PARK AVE | Overlay | Jul 19, 2019, 3:12 PM | RATHERVUE PL FROM DUVAL ST TO HARRIS PARK AVE | '2017117' | |


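This table lists the paved streets with the following columns:
- <i>Location ID</i>: unique identifier used for street paving
- <i>Street</i>: main street that was paved
- <i>From</i>: intersecting cross street where the paved section begins
- <i>To</i>: intersecting cross street where the paved section ends
- <i>Type</i>: type of paving work (for example, Overlay or Sealcoat)
- <i>Finish Date</i>: date and time the work was finished
- <i>Location</i>: street and cross streets combined into one upper-case description
- <i>Segment IDs</i>: comma-separated list of segment IDs where the street was paved
- <i>LongLine</i>: notes on long line markings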

### Feature Layer Data Query

The next task is to find the markings in each segment ID the user entered. For this task, the `arcgis` package is used to extract the markings in each segment ID from the publicly available dataset.

Since the markings datasets are publicly available, we can log in to ArcGIS Online anonymously.

To log in through a federated AGOL account instead, pass your `client_id` to `GIS`; you will then be prompted to enter a code, which you can obtain by following the instructions shown. Logging in through a federated account is useful if you want to add your own layers as a reference, such as [NearMap](https://go.nearmap.com/) aerial imagery.

The code below then queries the markings feature layers using the list of segment IDs from the Excel file.
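`query_df` filters each layer with an SQL `IN` clause built from the comma-separated, quoted segment ID string produced by the `str(list)[1:-1]` trick in `input_form`. A small sketch of that string handling with the hypothetical helper `segment_where_clause`; the clause itself would still be passed to `FeatureLayer.query(where=...)`:

```python
def segment_where_clause(segment_ids):
    """Build the WHERE clause used to query a layer by segment ID.

    Returns None when there is nothing to query, mirroring the 'N/A'
    check in query_df.
    """
    if not segment_ids:
        return None
    # str(list)[1:-1] turns ['2017702', '2017692'] into "'2017702', '2017692'"
    id_text = str(list(segment_ids))[1:-1]
    return "SEGMENT_ID IN({})".format(id_text)


print(segment_where_clause(['2017702', '2017692', '2017716']))
# SEGMENT_ID IN('2017702', '2017692', '2017716')
```

Quoting the IDs as strings assumes SEGMENT_ID is a text field or that the service coerces the values; for a numeric field, unquoted integers would be the safer choice.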

In [60]:
# Connect to the City of Austin ArcGIS Online organization anonymously
gis = GIS("https://austin.maps.arcgis.com/home/index.html")
url = r"https://services.arcgis.com/0L95CJ0VTaxqcmED/arcgis/rest/services/TRANSPORTATION_{}/FeatureServer/0"
sl, sp, streets = (pd.DataFrame(), pd.DataFrame(), pd.DataFrame())

# Fields to keep: short line uses cols[:2], specialty point uses cols[1:]
cols = ['SHORT_LINE_TYPE', 'SEGMENT_ID', 'SPECIALTY_POINT_TYPE', 'SPECIALTY_POINT_SUB_TYPE']
s_col = ['LEFT_BLOCK_FROM', 'RIGHT_BLOCK_FROM', 'SEGMENT_ID']

# Create each feature layer once, then query it for every location
street_fl = FeatureLayer(url.format("street_segment"))
sl_fl = FeatureLayer(url.format("markings_short_line"))
sp_fl = FeatureLayer(url.format("markings_specialty_point"))
for index, row in df.iterrows():
    streets = query_df(street_fl, index, s_col, df, streets)
    sl = query_df(sl_fl, index, cols[:2], df, sl)
    sp = query_df(sp_fl, index, cols[1:], df, sp)
sp = specialty_markings(sp, cols[2])

# Order segments by block number
streets['BLOCK'] = np.maximum(streets[s_col[0]], streets[s_col[1]])
streets = streets.sort_values(by=['BLOCK', 'Location ID']).reset_index(drop=True)
streets = streets.rename(columns={'COUNTS': 'PAGE'}).drop(s_col[:2], axis=1)

# Page 1 is the cover; pages restart at 2 whenever the location changes
page = 1
for index, row in streets.iterrows():
    if index != 0 and row['Location ID'] != streets['Location ID'][index - 1]:
        page = 2
    else:
        page += 1
    streets.at[index, 'PAGE'] = page
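The loop above numbers the plan pages: page 1 is reserved for the cover, so the first segment of each location lands on page 2 and later segments of the same location count up. Because rows are sorted by `BLOCK` first, a location that reappears after a different one starts again at 2. A pure-Python sketch of the same rule over a list of hypothetical location IDs:

```python
def number_pages(location_ids):
    """Assign plan page numbers, restarting at 2 whenever the location changes."""
    pages = []
    page = 1  # page 1 is the cover
    for i, loc in enumerate(location_ids):
        if i != 0 and loc != location_ids[i - 1]:
            page = 2
        else:
            page += 1
        pages.append(page)
    return pages


print(number_pages([37977, 37977, 37977, 39099, 39099]))  # [2, 3, 4, 2, 3]
```

Only the previous row is compared, so non-contiguous rows of one location do not continue their earlier numbering; grouping with `cumcount` would behave differently in that case.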

### Plans Table Creation

#### Cover Table
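`location_in_df` counts markings per location and marking type, then pivots the types into columns, which is what produces the wide cover and page tables. A minimal sketch of that step on hypothetical short line records (the extra `reset_index()` before pivoting is a small deviation that makes the key columns explicit):

```python
import pandas as pd

# Hypothetical short line records: one row per marking returned by the layer query
sl = pd.DataFrame({
    'Location ID': [39099, 39099, 39099, 39265],
    'SEGMENT_ID': ['2019491', '2019542', '2019542', '2017645'],
    'SHORT_LINE_TYPE': ['STOP_LINE', 'CROSSWALK', 'STOP_LINE', 'STOP_LINE'],
})

# Count markings per location and type, then spread the types into columns
count = (sl.groupby(['Location ID', 'SHORT_LINE_TYPE'])
           .count()[['SEGMENT_ID']]
           .rename(columns={'SEGMENT_ID': 'COUNTS'})
           .reset_index()
           .pivot_table(values='COUNTS', index='Location ID',
                        columns='SHORT_LINE_TYPE', aggfunc='first')
           .reset_index())
print(count)
```

Locations without a given marking type get `NaN` in that column, which `create_cover` later fills with `'N/A'` and `specifications` skips.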

In [61]:
wg = ['SHORT LINE','SPECIALTY MARKINGS','LONGLINE']
sl_count,sl_page = location_in_df(sl,'SHORT_LINE_TYPE',wg[0])
sp_count,sp_page = location_in_df(sp,'SPECIALTY_POINT_TYPE',wg[1])
cover = create_cover(df.copy(),sl_count,sp_count,wg)
pages = create_pages(df.copy(),sl_page,sp_page)

The cover DataFrame lists the pavement markings found for each location's segment IDs, with the following columns:
- <i>Location ID</i>: unique identifier used for street paving
- <i>LongLine</i>: notes on long line markings
- one column per marking type (for example, CROSSWALK or Only word) with its count
- <i>WORK GROUPS</i>: markings work groups assigned to the work order
- <i>SPECIFICATIONS</i>: all markings that need to be installed for the work order
- <i>PAGE</i>: plan page number (1 for the cover)


The DataFrames are saved to Excel worksheets so they can be reused to generate the template.

In [62]:
display(cover)
display(pages) 

| | Location ID | Street | From | To | Type | Finish Date | Location | Segment IDs | LongLine | CROSSWALK | ... | Merge word | Only word | Parking stall line | Right arrow | Right/Through arrow | Speed hump marking | Through arrow | WORK GROUPS | SPECIFICATIONS | PAGE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 37977.0 | MLK BLVD W | RIO GRANDE | PEARL ST | Overlay | Jul 19, 2019, 3:19 PM | MLK BLVD W FROM RIO GRANDE TO PEARL ST | '2017702', '2017692', '2017716' | lane lines, bike lanes, double yellow turn ba... | 1 | ... | 1 | | | 2 | | | | ['SHORT LINE', 'SPECIALTY MARKINGS', 'LONGLINE'] | Install lane lines, bike lanes, double yellow... | 1 |
| 1 | 39099.0 | 3RD ST E | San Saba St | Tillery Sq | Overlay | Jul 19, 2019, 1:36 PM | 3RD ST E FROM SAN SABA ST TO TILLERY SQ | '2019491', '2019516', '2019542', '2019503' | | 2 | ... | | | | | | | | ['SHORT LINE'] | Install 2 crosswalk, 5 stop line | 1 |
| 2 | 39168.0 | FAIRVIEW DR | 4500 | BIG BEND DR | Overlay | Jul 23, 2019, 10:38 AM | FAIRVIEW DR FROM 4500 TO BIG BEND DR | '2013551', '2040360', '2013580', '2013574' | double yellow centerline | 3 | ... | | 1 | | 2 | | | | ['SHORT LINE', 'SPECIALTY MARKINGS', 'LONGLINE'] | Install double yellow centerline, 3 crosswalk... | 1 |
| 3 | 39187.0 | HARTFORD RD | ENFIELD RD | WINDSOR RD | Overlay | Mar 21, 2019, 3:00 PM | HARTFORD RD FROM ENFIELD RD TO WINDSOR RD | '2020525', '2020505', '2020541', '2020593', '2... | double yellow center line,turn bay | 2 | ... | | 1 | | | | | | ['SHORT LINE', 'SPECIALTY MARKINGS', 'LONGLINE'] | Install double yellow center line,turn bay, 2... | 1 |
| 4 | 39265.0 | ROBBINS PL | Vance Cir | 22Nd St W | Overlay | Jul 19, 2019, 3:13 PM | ROBBINS PL FROM VANCE CIR TO 22ND ST W | '2017645' | | 1 | ... | | | | | | | | ['SHORT LINE'] | Install 1 crosswalk, 2 stop line | 1 |
| 5 | 40772.0 | 43RD ST E | SPEEDWAY | DE (609) | Overlay | Jul 23, 2019, 10:21 AM | 43RD ST E FROM SPEEDWAY TO DE (609) | '2016314', '2016338', '2016273', '3318434', '2... | | 5 | ... | | | | | | | | ['SHORT LINE', 'SPECIALTY MARKINGS'] | Install 5 crosswalk, 5 stop line, 1 chevron sy... | 1 |
| 6 | 40785.0 | 6TH ST E | PEDERNALES ST | CALLES ST | Overlay | Jul 19, 2019, 1:23 PM | 6TH ST E FROM PEDERNALES ST TO CALLES ST | '2019299' | | 1 | ... | | | | | | | | ['SHORT LINE', 'SPECIALTY MARKINGS'] | Install 1 crosswalk, 1 stop line, 4 chevron sy... | 1 |
| 7 | 40804.0 | BIG BEND DR | FAIRVIEW DR | BALCONES DR | Overlay | Jul 19, 2019, 3:22 PM | BIG BEND DR FROM FAIRVIEW DR TO BALCONES DR | '2013565', '2040400' | double yellow centerline | | ... | | | | | | | | ['SHORT LINE', 'LONGLINE'] | Install double yellow centerline, 1 stop line... | 1 |
| 8 | 40821.0 | CALLES ST | 6TH ST E | 7TH ST E | Overlay | Jul 19, 2019, 1:26 PM | CALLES ST FROM 6TH ST E TO 7TH ST E | '2019238' | | 1 | ... | | | | | | | | ['SHORT LINE'] | Install 1 crosswalk, 1 stop line | 1 |
| 9 | 40941.0 | RATHERVUE PL | DUVAL ST | HARRIS PARK AVE | Overlay | Jul 19, 2019, 3:12 PM | RATHERVUE PL FROM DUVAL ST TO HARRIS PARK AVE | '2017117' | | | ... | | | | | | | | ['SHORT LINE'] | Install 1 stop line | 1 |


| | Location ID | SEGMENT_ID | LongLine | CROSSWALK | STOP_LINE | YIELD_LINE | Bicyclist symbol | Bike arrow | Chevron symbol | Diagonal crosshatch | ... | Left arrow | Merge word | Only word | Parking stall line | Right arrow | Right/Through arrow | Speed hump marking | Through arrow | SPECIFICATIONS | PAGE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 37977.0 | 2017716 | lane lines, bike lanes, double yellow turn ba... | | 2 | | 2 | 2 | | | ... | | 1 | | | | | | | Install lane lines, bike lanes, double yellow... | 2 |
| 1 | 37977.0 | 2017702 | lane lines, bike lanes, double yellow turn ba... | 1 | | | 2 | 3 | 3 | 3 | ... | | | | | 2 | | | | Install lane lines, bike lanes, double yellow... | 3 |
| 2 | 37977.0 | 2017692 | lane lines, bike lanes, double yellow turn ba... | | | | 1 | 1 | | | ... | | | | | | | | | Install lane lines, bike lanes, double yellow... | 4 |
| 3 | 39099.0 | 2019491 | | | 1 | | | | | | ... | | | | | | | | | Install 1 stop line | 2 |
| 4 | 39099.0 | 2019503 | | | 1 | | | | | | ... | | | | | | | | | Install 1 stop line | 2 |
| 5 | 39099.0 | 2019516 | | | 1 | | | | | | ... | | | | | | | | | Install 1 stop line | 2 |
| 6 | 39099.0 | 2019542 | | 2 | 2 | | | | | | ... | | | | | | | | | Install 2 crosswalk, 2 stop line | 3 |
| 7 | 39168.0 | 2013580 | double yellow centerline | 1 | | | | | | | ... | | | 1 | | 2 | | | | Install double yellow centerline, 1 crosswalk... | 2 |
| 8 | 39168.0 | 2013574 | double yellow centerline | 1 | | | | | | | ... | | | | | | | | | Install double yellow centerline, 1 crosswalk | 3 |
| 9 | 39168.0 | 2013551 | double yellow centerline | 1 | | | | | | | ... | | | | | | | | | Install double yellow centerline, 1 crosswalk | 4 |


#### Pages Table

## Create Worksheets of DataFrames

In [64]:
wb = load_workbook(filename = EXCEL_FILE)
create_ws(cover,'Cover')
create_ws(pages,'Pages')

## Generating Whereabouts Plans
To generate whereabouts plans, we have to use the `arcpy` package, which in this setup requires Python 2 and ArcMap 10.5. Eventually, this notebook will be able to use `arcpy` in Python 3.

[Click here to access notebook](PlansTemplate.ipynb)

# (Optional) Create Spreadsheet of Completed Streets
This optional step builds a report of all streets extracted from the PDFs in FOLDER.

In [9]:
import os
import pandas as pd

# Columns of extracted table
columns = ["Location ID", "Street", "From", "To"]
report = FOLDER + "\\SBO Street List.xlsx"

try:
    # Reuse the report if it was already generated
    df = pd.read_excel(report)
except FileNotFoundError:
    frames = []
    for foldername, subfolders, files in os.walk(FOLDER):
        for file in files:
            if file.endswith('.pdf'):
                # pdf_table_to_df is not defined in this notebook; it is assumed
                # to read the PDF at FILE_NAME
                FILE_NAME = os.path.join(foldername, file[:-4])
                df1 = pdf_table_to_df(columns)
                df1["filename"] = file
                frames.append(df1)
    df = pd.concat(frames, sort=True) if frames else pd.DataFrame()
    df.to_excel(report, sheet_name="Report")
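The fallback branch walks FOLDER, including subfolders, looking for PDF reports. A minimal sketch of that discovery step using `pathlib`, run against a temporary folder with hypothetical file names; each returned base path would then be handed to `pdf_table_to_df`:

```python
import tempfile
from pathlib import Path


def find_pdfs(folder):
    """Return (base path without extension, file name) for every PDF under folder, sorted by name."""
    found = ((str(p.with_suffix('')), p.name) for p in Path(folder).rglob('*.pdf'))
    return sorted(found, key=lambda item: item[1])


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'July').mkdir()
    for name in ['SBO_0719.pdf', 'July/SBO_0723.pdf', 'notes.txt']:
        (root / name).touch()
    found = find_pdfs(root)
    print([name for _, name in found])  # ['SBO_0719.pdf', 'SBO_0723.pdf']
```

Building each path from the folder the file was actually found in, rather than from FOLDER, keeps PDFs in subfolders readable.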