# Whereabouts Streets Data Extraction
This notebook demonstrates how to access the Street and Bridge Operations (SBO) whereabouts PDF reports and extract their data to create a work order plan template.

<div style="text-align:center"><img src="https://upload.wikimedia.org/wikipedia/en/9/94/Closeup_of_pavement_with_grass.JPG" /></div>

## Introduction
The purpose of this notebook is to create Street and Bridge work order plans based on segment IDs and additional comments on long line markings. The markings feature layers are published on the City of Austin ArcGIS portal and are also available for public view.

The schedule of streets where sealcoat and overlay work has been completed is emailed daily by Street and Bridge Operations. It arrives as a PDF file that lists weather conditions and temperature and provides a table of the streets where paving is complete.

<b>The only manual steps for the user are to:</b>
- Input segment IDs
- Add comments on long line markings
- Specify MONTH, DAY, and YEAR, which determine the PDF file name and folder path used to retrieve the table of completed streets
- Create any missing markings assets that are not visible in aerial imagery

This replaces the previous process of manually editing a plans layout: copy-pasting imagery, writing Location IDs, work groups, and markings found, and then exporting plans one at a time. An Excel document is created from the user's input, and its segment IDs are used to find all short line and specialty point markings. The goal is to generate multiple PDF plans in less time.

In the future, I would like to make this script more customizable so that it runs without manually entering segment IDs, requiring input only for specific long line markings by using the maintained streets feature layer.

## Imports
The packages used for this project are:
- [exchangelib](https://github.com/ecederstrand/exchangelib) to access the attachments sent by Street and Bridge Operations
- [pdfplumber](https://github.com/jsvine/pdfplumber) to extract tables from the whereabouts report
- [tabula-py](https://github.com/chezou/tabula-py) to test table extraction from the newer overlay report
- [pandas](https://pandas.pydata.org/) to create a dataframe of the extracted table and transform the data
- [openpyxl](https://openpyxl.readthedocs.io/en/stable/) to edit Excel files
- [arcgis](https://esri.github.io/arcgis-python-api/apidoc/html/) to search the markings feature layers
- [NumPy](https://numpy.org/) to compute the block numbers used to order street segments

In [10]:
from exchangelib import DELEGATE, Account, Credentials, Configuration, FileAttachment, ItemAttachment
import pdfplumber
import pandas as pd
from openpyxl import Workbook,load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows
import tabula

from arcgis.gis import GIS
from arcgis.features import FeatureLayer

from functools import reduce
import numpy as np

## Constants

The MONTH and DAY constants determine which PDF file is read into a dataframe, and YEAR determines the folder where the plans are created. These constants are set at the top so they can be changed as needed.

<i>The table below explains the purpose of each constant.</i>

| Constant | Description   |
|:--------:|----|
| <b>MONTH, DAY, YEAR</b> |Date used to find the PDF (in "Month Day" format) and the folder path for that year|
|<b>FOLDER</b>      |Directory where SBO whereabouts reports downloaded from email are saved|
|<b>FILE_NAME</b>   |Path of the day's SBO whereabouts report, without the file extension|
|<b>EXCEL_FILE</b>  |Path of the Excel file that stores the user's input for that day|
|<b>SIGN_IN</b>   |Whether to prompt the user to sign in to Outlook email|
|<b>INPUT</b>|Whether to prompt the user to input segment IDs and comments to export to Excel|
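As a minimal sketch, the same folder and file paths can be built with `pathlib.PureWindowsPath`, which joins Windows-style paths without manual `"\\"` concatenation and can be used on any operating system. The helper `build_paths` is hypothetical; the folder layout mirrors the `FOLDER`, `FILE_NAME` and `EXCEL_FILE` constants.

```python
from pathlib import PureWindowsPath


def build_paths(month, day, year, root):
    """Return (folder, pdf_path, excel_path) for one whereabouts report.

    Hypothetical helper mirroring FOLDER / FILE_NAME / EXCEL_FILE:
    <root>\\<year>\\Whereabouts_Summary\\<Month> <Day>.pdf
    """
    folder = PureWindowsPath(root) / str(year) / "Whereabouts_Summary"
    stem = f"{month} {day}"  # e.g. "May 30"
    return folder, folder / f"{stem}.pdf", folder / f"{stem}.xlsx"


folder, pdf_path, excel_path = build_paths(
    "May", 30, 2019, r"G:\ATD\Signs_and_Markings\MARKINGS\Whereabouts WORK ORDERS"
)
print(pdf_path)
```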

In [2]:
MONTH,DAY,YEAR = ('May',str(30),str(2019))
FOLDER = (r"G:\ATD\Signs_and_Markings\MARKINGS\Whereabouts WORK ORDERS\{}\Whereabouts_Summary").format(YEAR)
FILE_NAME = "\\".join((FOLDER," ".join((MONTH,DAY))))
EXCEL_FILE = FILE_NAME + ".xlsx"
SIGN_IN = False  # Email sign-in disabled: an exchangelib update introduced a bug
INPUT= True

%store FOLDER
%store EXCEL_FILE

Stored 'FOLDER' (str)
Stored 'EXCEL_FILE' (str)


In [107]:
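# Exploratory: SBO started sending a new overlay report (sbo_overlay.pdf).
# Testing tabula as a possible replacement for pdfplumber when no Excel file exists yet.
path = FOLDER + '\\' + 'sbo_overlay'
# Intended column names for the new report (not applied yet)
header = ['d','s','Day','WO No.', 'Altref', 'Crew', 'Street', 'From Street', 'To Street', 'Lane Miles']
# With multiple_tables=True, read_pdf returns a list of dataframes (one per detected table)
tables = tabula.read_pdf(path + '.pdf', pages='all', multiple_tables=True, stream=True)

for x in tables:
    display(x)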

|   | 0 | 1 |
|---|---|---|
| 0 |  | Lane Mi |
| 1 |  | 0.797 |
| 2 | DISTN1 | 0.514 |
| 3 | MILL | 1.242 |
| 4 | OVL1 | 98.216 |
| 5 | Total | 100.769 |

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  | Day | WO No. | Altref | Crew | On_Street | From_Street | To_Street | Lane Miles from Act | Lane Miles From Act Area |
| 1 |  |  |  |  |  |  |  |  | Length and Act Width | Repair |
| 2 | 0, 2018 |  |  |  |  |  |  |  |  |  |
| 3 |  | WED | SP18-43162 | 43162 | OVL1 | CAMP CV | CAMP FIRE TRL | 7311 | 0.203 | 0.203 |
| 4 |  |  | SP18-43164 | 43164 | OVL1 | CANTEEN CIR | CAMP FIRE TRL | 7309 | 0.162 | 0.162 |
| 5 |  |  | SP18-43194 | 43194 | OVL1 | FIRE CV | CAMP FIRE TRL | 7311 | 0.210 | 0.210 |
| 6 |  |  |  |  |  |  | Week Ending : | Oct 6, 2018 | 0.575 | 0.575 |
| 7 |  |  |  |  |  |  | Total at the end of Week | 0 | 0.575 | 0.575 |
| 8 | , 2018 |  |  |  |  |  |  |  |  |  |
| 9 |  |  | SP19-62520 | 62520 | OVL1 | ORLEANS CT | KINGS HWY | 5111 | 0.207 | 0.207 |

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| 0 |  | SP19-70253 | 70253 | OVL1 | HAVANA ST | 0.664 | 0.664 |
| 1 | THU | SP19-70255 | 70255 | OVL1 | POWELL CIR | 0.657 | 0.657 |
| 2 | FRI | SP19-70260 | 70260 | OVL1 | SOUTH PARK DR | 0.047 | 0.047 |
| 3 |  | SP19-70257 | 70257 | OVL1 | LIGHTSEY RD | 0.557 | 0.557 |

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  |  |  |  |  |  | Week Ending : | Nov 3, 2018 | 2.186 | 2.186 |
| 1 |  |  |  |  |  |  | Total at the end of Week | 4 | 7.934 | 7.934 |
| 2 | Nov 4, 2018 |  |  |  |  |  |  |  |  |  |
| 3 |  | THU | SP19-62674 | 62674.0 | OVL1 | TRAVIS HILLS DR | TRAVIS COOK RD | BELL DR | 0.246 | 0.246 |
| 4 |  |  | SP19-63334 | 63334.0 | OVL1 | OLD BEE CAVES RD | 9200 | 9217 | 0.784 | 0.784 |
| 5 |  | FRI | SP18-43119 | 43119.0 | OVL1 | VEGA AVE | PATTON RANCH RD | WILLIAM CANNON DR W | 0.98 | 0.98 |
| 6 |  | SAT | SP19-63290 | 63290.0 | OVL1 | JESSIE ST | HILLMONT ST | TREADWELL ST | 0.737 | 0.737 |
| 7 |  |  |  |  |  |  | Week Ending : | Nov 10, 2018 | 2.747 | 2.747 |
| 8 |  |  |  |  |  |  | Total at the end of Week | 5 | 10.681 | 10.681 |
| 9 | Nov 11, 2018 |  |  |  |  |  |  |  |  |  |

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | MON | SP19-62641 | 62641 | OVL1 | OAK BLVD | OAK BLVD E | US HWY 290 SVC RD WB W | 0.217 | 0.217 |
| 1 | WED | SP19-62625 | 62625 | OVL1 | KATHY CV | 2401 | 2417 | 0.392 | 0.392 |
| 2 |  | SP19-62639 | 62639 | OVL1 | MOUNTAIN VIEW DR | KATHY CV | BARTON HILLS DR | 0.74 | 0.74 |

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| 0 |  |  |  |  |  | Week Ending : | Dec 8, 2018 | 1.349 | 1.349 |
| 1 |  |  |  |  |  | Total at the end of Week | 9 | 14.836 | 14.836 |
| 2 | MON | SP19-63303 | 63303.0 | OVL1 | MEADOWRIDGE DR | ANN ARBOR AVE | BLUEBONNET LN | 0.338 | 0.338 |
| 3 |  | SP18-43138 | 43138.0 | OVL1 | ANN ARBOR AVE | RUNDELL PL | RABB GLEN ST | 0.83 | 0.83 |
| 4 |  | SP19-63273 | 63273.0 | OVL1 | ANN ARBOR AVE | RABB GLEN ST | DE VERNE ST | 0.116 | 0.116 |
| 5 |  | SP19-62647 | 62647.0 | OVL1 | RABB GLEN ST | BLUEBONNET LN | ANN ARBOR AVE | 0.317 | 0.317 |
| 6 | TUE | SP19-62602 | 62602.0 | OVL1 | CEDARVIEW DR | BARTON SKWY | OAKLANE DR | 0.344 | 0.344 |
| 7 |  |  |  |  |  | Week Ending : | Dec 15, 2018 | 1.945 | 1.945 |
| 8 |  |  |  |  |  | Total at the end of Week | 10 | 16.781 | 16.781 |
| 9 | MON | SP18-43015 | 43015.0 | OVL1 | DANA CV | 2501 | SPYGLASS DR | 0.19 | 0.19 |

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  |  |  |  |  |  | Total at the end of Week | 14 | 19.502 | 19.502 |
| 1 | 3, 2019 |  |  |  |  |  |  |  |  |  |
| 2 |  | MON | SP19-62506 | 62506.0 | OVL1 | AMETHYST TRL | WANDER LN | STOUT OAK TRL | 0.336 | 0.336 |
| 3 |  |  | SP19-63314 | 63314.0 | OVL1 | ROBIN RIDGE LN | MISSEL THRUSH DR | OLD STAGE TRL | 0.181 | 0.181 |
| 4 |  | WED | SP18-43209 | 43209.0 | OVL1 | LINDELL AVE | LIVE OAK ST W | 2225 | 0.308 | 0.308 |
| 5 |  |  | SP18-43145 | 43145.0 | OVL1 | BARTLETT ST | CONGRESS AVE S | EUCLID AVE | 0.258 | 0.258 |
| 6 |  | THU | SP19-63289 | 63289.0 | OVL1 | HOSTA CV | 10101 | MEDALLION LN | 0.171 | 0.171 |
| 7 |  |  | SP19-62637 | 62637.0 | OVL1 | MEDALLION LN | SWAN DR | 11628 | 0.309 | 0.309 |
| 8 |  |  |  |  |  |  | Week Ending : | Jan 19, 2019 | 1.563 | 1.563 |
| 9 |  |  |  |  |  |  | Total at the end of Week | 15 | 21.065 | 21.065 |

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | TUE | SP19-62516 | 62516 | OVL1 | LOFTON CLIFF DR | BAHAN DR | VIZQUEL LOOP | 0.355 | 0.355 |
| 1 | WED | SP19-62512 | 62512 | OVL1 | GILWELL DR | ROSS RD | BAHAN DR | 0.769 | 0.769 |

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| 0 |  |  |  |  |  | Week Ending : | Feb 9, 2019 | 2.025 | 2.025 |
| 1 |  |  |  |  |  | Total at the end of Week | 18 | 29.249 | 29.249 |
| 2 | THU | SP19-63304 | 63304.0 | OVL1 | MONTOPOLIS DR | RICHARDSON LN | LARCH TER | 1.318 | 1.318 |
| 3 | FRI | SP19-62618 | 62618.0 | OVL1 | HOGAN AVE | GROVE BLVD | MONTOPOLIS DR | 1.231 | 1.231 |
| 4 |  |  |  |  |  | Week Ending : | Feb 16, 2019 | 2.549 | 2.549 |
| 5 |  |  |  |  |  | Total at the end of Week | 19 | 31.798 | 31.798 |
| 6 | WED | SP19-62608 | 62608.0 | OVL1 | ELFLAND DR | RIVERCREST DR | 6709 | 0.175 | 0.175 |
| 7 |  | SP18-43062 | 43062.0 | OVL1 | WALNUT CLAY DR | MOUNTAINCLIMB DR | DRY BEND CV | 0.271 | 0.271 |
| 8 | THU | SP18-43204 | 43204.0 | OVL1 | LAKEVIEW CIR | WESTSLOPE DR | 5810 | 0.208 | 0.208 |
| 9 |  | SP18-43258 | 43258.0 | OVL1 | Valley Cir | Westlope Dr | 5812 | 0.213 | 0.213 |

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  |  |  |  |  |  | Total at the end of Week | 21 | 36.935 | 36.935 |
| 1 | , 2019 |  |  |  |  |  |  |  |  |  |
| 2 |  | MON | SP18-43014 | 43014.0 | OVL1 | CULLEN AVE | GROVER AVE | WOODROW AVE | 0.46 | 0.46 |
| 3 |  |  |  |  |  |  | Week Ending : | Mar 9, 2019 | 0.46 | 0.46 |
| 4 |  |  |  |  |  |  | Total at the end of Week | 22 | 37.395 | 37.395 |
| 5 | 0, 2019 |  |  |  |  |  |  |  |  |  |
| 6 |  | THU | SP19-62615 | 62615.0 | OVL1 | HARRISGLENN DR | 12700 | 13437 | 3.119 | 3.119 |
| 7 |  | FRI | SP19-62524 | 62524.0 | OVL1 | SUNNY VALE ST | SUMMIT ST | LOMA DR | 0.395 | 0.395 |
| 8 |  |  | SP19-62589 | 62589.0 | OVL1 | ANTLER DR | MATAGORDA ST | FAWN DR | 0.185 | 0.185 |
| 9 |  | SAT | SP19-62683 | 62683.0 | OVL1 | WONSLEY DR W | GEORGIAN DR | 219 | 0.335 | 0.335 |

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  |  |  |  |  |  | Week Ending : | Mar 30, 2019 | 3.323 | 3.323 |
| 1 |  |  |  |  |  |  | Total at the end of Week | 25 | 48.102 | 48.103 |
| 2 | 1, 2019 |  |  |  |  |  |  |  |  |  |
| 3 |  | MON | SP18-42988 | 42988.0 | OVL1 | PINNACLE RD | SILVER HILL DR | ALLEN RD | 0.328 | 0.328 |
| 4 |  | TUE | SP19-63302 | 63302.0 | OVL1 | MC CALL RD | WINDSOR RD | INDIAN TRL | 0.327 | 0.327 |
| 5 |  | WED | SP19-62633 | 62633.0 | OVL1 | LONGVIEW ST | 24TH ST W | 2526 | 0.799 | 0.799 |
| 6 |  |  |  |  |  |  | Week Ending : | Apr 6, 2019 | 1.454 | 1.454 |
| 7 |  |  |  |  |  |  | Total at the end of Week | 26 | 49.556 | 49.557 |
| 8 | , 2019 |  |  |  |  |  |  |  |  |  |
| 9 |  | TUE | SP19-62681 | 62681.0 | OVL1 | WINDSOR RD | 2300 | 2310 | 0.759 | 0.759 |

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  | THU | SP19-43034 | 43034.0 | OVL1 | LINDELL LN | 8702 |  | BLUE BLUFF RD | 3.015 | 3.015 |
| 1 |  |  |  |  |  |  | Week Ending : |  | Apr 27, 2019 | 3.351 | 3.351 |
| 2 |  |  |  |  |  |  | Total at the end of Week |  | 29 | 57.832 | 57.833 |
| 3 | 8, 2019 |  |  |  |  |  |  |  |  |  |  |
| 4 |  | MON | SP19-62594 | 62594.0 | OVL1 | BLOOR RD | BLUE BLUFF RD |  | 12500 | 3.357 | 3.357 |
| 5 |  | TUE | SP19-62595 | 62595.0 | OVL1 | BLUE BLUFF RD | BLOOR RD |  | 10421 | 2.57 | 2.57 |
| 6 |  |  |  |  |  |  | Week Ending : |  | May 4, 2019 | 5.927 | 5.927 |
| 7 |  |  |  |  |  |  | Total at the end of Week |  | 30 | 63.759 | 63.76 |
| 8 | , 2019 |  |  |  |  |  |  |  |  |  |  |
| 9 |  |  | SP19-62658 | 62658.0 | OVL1 | SAN MARINO DR | 3301 |  | WOODWARD ST | 0.547 | 0.547 |

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | FRI | SP19-62626 | 62626 | OVL1 | KERN RAMBLE | FRENCH PL | CONCORDIA AVE | 0.512 | 0.512 |
| 1 |  | SP19-62582 | 62582 | OVL1 | 37TH ST E | LAFAYETTE | KERN RAMBLE | 0.369 | 0.369 |

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| 0 |  |  |  |  |  | Week Ending : | May 25, 2019 | 3.371 | 3.371 |
| 1 |  |  |  |  |  | Total at the end of Week | 33 | 72.261 | 72.262 |
| 2 | TUE | SP19-62652 | 62652.0 | OVL1 | ROBINSON AVE | CONCORDIA AVE | 38TH 1/2 ST E | 0.408 | 0.408 |
| 3 | THU | SP19-63322 | 63322.0 | OVL1 | SPRINGDALE RD | 8108 | 8495 | 1.561 | 1.561 |
| 4 |  | SP19-81369 | 81369.0 | DISTN1 | BRICKFORD CV | DEAD END | WEST GATE BLVD | 0.06 | 0.06 |
| 5 | FRI | SP19-71280 |  | OVL1 | FRENCH PL | KERN RAMBLE | EDGEWOOD | 0.115 | 0.115 |
| 6 |  | SP18-42979 | 42979.0 | MILL | 9TH ST E | NECHES ST | I 35 SVC RD SB N | 0.608 | 0.608 |
| 7 |  |  |  |  |  | Week Ending : | Jun 1, 2019 | 2.752 | 2.752 |
| 8 |  |  |  |  |  | Total at the end of Week | 34 | 75.013 | 75.014 |
| 9 | MON | SP18-43017 | 43017.0 | OVL1 | DERBY CV | WHELESS LN | 6106 | 0.152 | 0.152 |

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 |  |  |  |  |  |  | Total at the end of Week | 36 | 80.302 | 80.303 |
| 1 | 6, 2019 |  |  |  |  |  |  |  |  |  |
| 2 |  |  | SP19-62614 | 62614.0 | OVL1 | HARRIS BRANCH PKWY | 12300 | 12351 | 1.619 | 1.619 |
| 3 |  |  |  |  |  |  | Week Ending : | Jun 22, 2019 | 1.619 | 1.619 |
| 4 |  |  |  |  |  |  | Total at the end of Week | 37 | 81.921 | 81.922 |
| 5 | 3, 2019 |  |  |  |  |  |  |  |  |  |
| 6 |  | THU | SP19-43077 | 43077.0 | OVL1 | BRAKER LN W | RENEL DR | SWEARINGEN DR | 1.861 | 1.861 |
| 7 |  | SAT | SP19-63282 | 63282.0 | OVL1 | ESTES AVE | 1000 | PROCK LN | 0.338 | 0.338 |
| 8 |  |  | SP19-63311 | 63311.0 | OVL1 | PROCK LN | SARA DR | LOTT AVE | 0.358 | 0.358 |
| 9 |  |  |  |  |  |  | Week Ending : | Jun 29, 2019 | 2.557 | 2.557 |

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| 0 | 16-354510 | 34534 | OVL1 | GLISSMAN RD | Springdale Rd | Mansell Ave | 0.0 | 0.0 |
| 1 | 16-354513 | 40941 | OVL1 | RATHERVUE PL | DUVAL ST | HARRIS PARK AVE | 0.0 | 0.0 |
| 2 | 16-354518 | 39265 | OVL1 | ROBBINS PL | Vance Cir | 22Nd St W | 0.0 | 0.0 |
| 3 | 16-354519 | 37936 | OVL1 | 25TH 1/2 ST W | San Gabriel St | Leon St | 0.0 | 0.0 |
| 4 | 16-354523 | 37977 | OVL1 | MLK BLVD W | RIO GRANDE | PEARL ST | 0.0 | 0.0 |
| 5 | 16-354524 | 40892 | OVL1 | LAUREL CANYON DR | CRESTWAY DR | CRESTWAY DR | 0.0 | 0.0 |
| 6 | 16-354598 | 42974 | OVL1 | SUNSET RIDGE | TRAVIS COOK RD | NEW PAVEMENT | 0.0 | 0.0 |
| 7 | 16-370226 | 40804 | OVL1 | BIG BEND DR | FAIRVIEW DR | BALCONES DR | 0.0 | 0.0 |
| 8 | 16-370228 | 42399 | OVL1 | EVERGREEN CT | HANCOCK DR | 5108 | 0.0 | 0.0 |

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| 0 |  |  |  |  |  | Week Ending : | Jul 20, 2019 | 2.154 | 2.154 |
| 1 |  |  |  |  |  | Total at the end of Week | 41 | 86.632 | 86.633 |
| 2 | TUE | 16-44483 | 40772.0 | OVL1 | 43RD ST E | SPEEDWAY | DE (609) | 0.0 | 0.0 |
| 3 |  | 16-370260 | 39268.0 | OVL1 | ROSEDALE AVE | 46TH ST W | 48TH ST W | 0.0 | 0.0 |
| 4 |  | 16-370259 | 42092.0 | OVL1 | 56TH ST E | AVENUE F | AVENUE G | 0.0 | 0.0 |
| 5 |  | 16-370258 | 42270.0 | OVL1 | CHESTERFIELD AVE | KOENIG LN W | SKYVIEW RD W | 0.0 | 0.0 |
| 6 |  | 16-370233 | 39234.0 | OVL1 | MOUNTAIN LAUREL DR | EXPOSITION BLVD | 2809 | 0.0 | 0.0 |
| 7 |  | 16-370229 | 39168.0 | OVL1 | FAIRVIEW DR | 4500 | BIG BEND DR | 0.0 | 0.0 |
| 8 |  |  |  |  |  | Week Ending : | Jul 27, 2019 | 0.0 | 0.0 |
| 9 |  |  |  |  |  | Total at the end of Week | 42 | 86.632 | 86.633 |

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| 0 |  | SP19 - 74355 | 74355.0 | OVL1 | LAMPLIGHT VILLAGE | W Parmer Ln | Scofield Ridge Blvd | 3.905 | 3.905 |
| 1 |  |  |  |  | AVE |  |  |  |  |
| 2 | SAT | SP19-62661 | 62661.0 | OVL1 | SHOAL CREEK BLVD | 31ST ST W | 34TH ST W | 0.759 | 0.759 |
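The raw tabula output above mixes work order rows with weekly summary rows ("Week Ending :", "Total at the end of Week") and stray date fragments. Below is a minimal sketch of one way to keep only the work order rows. It assumes every work order number looks like the ones shown above (`SP19-62520`, `SP19 - 74355`, `16-354510`); the pattern and the `work_order_rows` helper are assumptions, not part of the SBO report format.

```python
import re

# Assumption: work order numbers follow the shapes seen in the extracted tables
WO_PATTERN = re.compile(r"^(?:SP\d{2}\s*-\s*\d+|\d{2}-\d+)$")


def work_order_rows(rows):
    """Keep rows that contain a work order number; drop summary and date rows.

    rows: list of rows, each a list of cell values from one extracted table.
    Cells are converted to stripped strings before matching.
    """
    kept = []
    for row in rows:
        cells = [str(c).strip() for c in row]
        if any(WO_PATTERN.match(c) for c in cells):
            kept.append(cells)
    return kept


sample = [
    ["", "WED", "SP18-43162", "43162", "OVL1", "CAMP CV", "CAMP FIRE TRL", "7311", "0.203", "0.203"],
    ["", "", "", "", "", "", "Week Ending :", "Oct 6, 2018", "0.575", "0.575"],
    ["", "", "", "", "", "", "Total at the end of Week", "0", "0.575", "0.575"],
    ["", "SP19 - 74355", "74355.0", "OVL1", "LAMPLIGHT VILLAGE", "W Parmer Ln", "Scofield Ridge Blvd", "3.905", "3.905"],
]
print(len(work_order_rows(sample)))  # 2
```

Rows split across two lines by the PDF layout (such as the stray "AVE" row above) would still need to be merged separately.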


## Methods
These functions are used to extract and transform the data into a usable format.

<i>The table below explains the purpose of each:</i>

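| Method | Description   |
|:--------:|----|
|<b>pdf_table_to_df</b> |Extracts the table from the PDF and converts it to a dataframe|
|<b>input_form</b> |Prompts the user to input segment IDs and long line comments|
|<b>query_df</b>   |Queries a feature layer by segment IDs and appends the markings found|
|<b>specialty_markings</b> |Replaces specialty point type and subtype codes with readable names|
|<b>specifications</b> |Builds the "Install ..." sentence listing the markings for each row|
|<b>location_in_df</b> |Counts markings per Location ID (cover) and per segment ID (pages)|
|<b>create_cover</b> |Builds the cover table with work groups and specifications|
|<b>create_pages</b> |Builds the pages table with one page per street segment|
|<b>create_ws</b> |Writes a dataframe to a worksheet in the Excel file, replacing any existing sheet|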
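The list cleaning inside `pdf_table_to_df` can be tried without a PDF. The hypothetical `clean_rows` below repeats the same steps on nested lists shaped like pdfplumber's `extract_tables` output (a list of tables, each a list of rows); the sample rows are made up for illustration.

```python
def clean_rows(tables, n_columns):
    """Apply the same cleaning steps as pdf_table_to_df to plain nested lists.

    tables: list of tables, each a list of rows whose cells may be None or "".
    Returns rows trimmed to at most n_columns cells.
    """
    rows = [row for table in tables for row in table]                 # flatten all tables
    rows = [[c for c in row if c not in (None, "")] for row in rows]  # drop blank cells
    rows = [row for row in rows if row and row[0] != "ID#"]           # drop empty and header rows
    cleaned = []
    for row in rows:
        if not row[0].isdigit():  # a leading non-numeric cell is treated as a label
            row = row[1:]
        cleaned.append(row[:n_columns])  # keep only the expected columns
    return cleaned


# Made-up sample shaped like pdfplumber output (one table)
sample_tables = [[
    ["ID#", "Street", "From", "To", None],
    ["62963", "HYMEADOW DR", "Woodlawn Village Dr", "12519", ""],
    [None, "", None],
    ["MON", "62964", "EXAMPLE ST", "A ST", "B ST", "extra"],
    ["SG-13247", "Pecan Park Blvd", "S Lake Line Blvd", "Lake Creek Blvd"],
]]
cleaned_sample = clean_rows(sample_tables, 4)
for r in cleaned_sample:
    print(r)
```

As in `pdf_table_to_df`, a leading cell that is not purely numeric is dropped, so the last sample row loses its Location ID `SG-13247`. Rows with non-numeric Location IDs may therefore need to be checked by hand.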

In [48]:
# Opens the day's PDF, extracts the table on the first page and converts it to a dataframe
# TODO: update for the new overlay report format
def pdf_table_to_df(columns):
    # The with-block closes the PDF automatically
    with pdfplumber.open(FILE_NAME + ".pdf") as pdf:
        data = pdf.pages[0].extract_tables(table_settings={})
    # Flatten the tables into a single list of rows
    l = [item for sublist in data for item in sublist]
    # Drop empty cells, empty rows and the "ID#" header row
    l = [[x for x in y if x is not None and x != ''] for y in l]
    l = [x for x in l if len(x) != 0]
    l = [x for x in l if x[0] != 'ID#']
    for i in l:
        # Drop a leading non-numeric cell, then keep only the expected columns
        if not i[0].isdigit():
            del i[0]
        del i[len(columns):]
    df = pd.DataFrame(l, columns=columns)
    return df

# Prompts the user to input segment IDs and comments for each street and adds them to the dataframe
def input_form(df, columns):
    segments, comments = [], []
    for index, row in df.iterrows():
        location = "{} from {} to {}".format(row["Street"], row["From"], row["To"])
        console = input(location + "\nSegment ID list: ").strip()
        if not console:
            # A blank entry means there are no segment IDs for this street
            print("Skipping input...")
            segments.append(None)
        else:
            # Tab-separated IDs (pasted from a spreadsheet row) become comma-separated
            segments.append(console.replace('\t', ','))
        comment = input("Comment: ").strip()
        # None lets fillna("N/A") mark streets without comments
        comments.append(comment if comment else None)
    df['Segment IDs'], df['Comments'] = segments, comments
    print("\nInput complete.")
    
# Queries a feature layer by the row's segment IDs and appends any markings found to df1
def query_df(fc, index, f, df, df1):
    q = "SEGMENT_ID IN({})".format(df["Segment IDs"][index])
    if q != "SEGMENT_ID IN(N/A)":
        c = fc.query(where=q, return_count_only=True)
        if c != 0:
            sdf = fc.query(where=q).sdf.filter(items=f)
            sdf["Location ID"] = df["Location ID"][index]
            sdf["Comments"] = df["Comments"][index]
            # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
            df1 = pd.concat([df1, sdf], sort=True)
    df1['COUNTS'] = 1
    return df1

# Replaces specialty point type/subtype domain codes with readable names, e.g. (2, 2) -> "Left arrow"
def specialty_markings(df, field):
    if field in df.columns:
        arrow = ["Through","Left","Right","Left/Right","Left/Right/Through",
                 "Left/Through","Right/Through","U-turn","Lane reduction","Wrong way","Bike"]
        other = ["Green pad", "Green launch pad", "Speed hump marking","Diagonal crosshatch", "Chevron crosshatch"]
        parking = ["Parking 'L'", "Parking 'T'", "Parking stall line", "Handicap symbol"]
        symbol = ["Bike","Shared lane (Sharrow)","Bicyclist","Railroad Crossing (RxR)","Chevron","Pedestrian","Diamond"]
        word = ["Stop","Yield","Ahead","Only","Merge","Ped", "X-ing","Bus Only","Keep Clear","Do Not Block","Ped X-ing"]
        rpm = ['blue','']
        # Type codes 1-6 index the subtype lists (st) and the name suffixes (t)
        t = ['word','arrow','symbol','','','rpm']
        st = [word,arrow,symbol,other,parking,rpm]
        names = []
        for type_code, sub_code in zip(df.SPECIALTY_POINT_TYPE, df.SPECIALTY_POINT_SUB_TYPE):
            x, y = int(type_code) - 1, int(sub_code) - 1
            # strip() removes the trailing space for types without a suffix
            names.append((st[x][y] + " " + t[x]).strip())
        df['SPECIALTY_POINT_TYPE'] = names
        return df.drop('SPECIALTY_POINT_SUB_TYPE', axis=1)
    return pd.DataFrame()

# Adds a SPECIFICATIONS column ("Install <comments>, <count> <marking>, ...") built from
# the marking columns at position i onward
def specifications(df, i):
    df["SPECIFICATIONS"] = ''
    values = list(df.columns)[i:]
    for index, row in df.iterrows():
        keys = list(row[i:])
        spec = []
        for k, v in zip(keys, values):
            if k != 'N/A' and k != '' and v != 'WORK GROUPS':
                spec.append('{} {}'.format(int(k), v.lower().replace('_', ' ')))
        # Build the sentence once all of the row's markings are collected
        if row['Comments'] != 'N/A':
            sentence = 'Install {}, '.format(row['Comments']) + ', '.join(spec)
        else:
            sentence = 'Install ' + ', '.join(spec)
        df.at[index, 'SPECIFICATIONS'] = sentence
    if 'WORK GROUPS' in df.columns:
        df['WORK GROUPS'] = df['WORK GROUPS'].apply(str)
    return df

# Returns (count, page): markings counts per Location ID for the cover, and per
# Location ID / SEGMENT_ID / Comments for the pages
def location_in_df(df, markings_type, workgroup):
    if 'Location ID' in df:
        count = df.groupby(['Location ID', markings_type]).count()[['SEGMENT_ID']].rename(columns={"SEGMENT_ID": 'COUNTS'})
        count = count.pivot_table(values='COUNTS', index='Location ID', columns=markings_type, aggfunc='first').reset_index()
        count[workgroup] = workgroup
        page = df.groupby(['Location ID', 'SEGMENT_ID', 'Comments', markings_type]).count()[['COUNTS']]
        page = page.pivot_table(values='COUNTS', index=['Location ID', 'SEGMENT_ID', 'Comments'], columns=markings_type, aggfunc='first')
        return count, page
    # No markings found: return empty frames so callers can check .empty
    return pd.DataFrame(), pd.DataFrame()

# Returns the cover dataframe: one row per Location ID with work groups and specifications
def create_cover(cover, sl_count, sp_count, wg):
    wg = list(wg)  # copy so the caller's list is not modified
    cover.loc[cover.Comments != 'N/A', wg[2]] = wg[2]
    cover.loc[cover.Comments == 'N/A', wg[2]] = 'N/A'
    cover['PAGE'] = 1
    if not sl_count.empty and not sp_count.empty:
        cover = reduce(lambda z, y: pd.merge_ordered(z, y, on='Location ID'), [cover, sl_count, sp_count])
    elif not sl_count.empty or not sp_count.empty:
        # Only one type of markings was found: merge it and drop the other work group
        count = sl_count if sp_count.empty else sp_count
        wg_remove = 'SPECIALTY MARKINGS' if sp_count.empty else 'SHORT LINE'
        cover = pd.merge_ordered(cover, count, on='Location ID')
        wg.remove(wg_remove)
    else:
        cover = specifications(cover, 6)
        return cover
    cover = cover.dropna(how='all', subset=list(cover.columns)[6:]).fillna('N/A')
    cover['WORK GROUPS'] = cover[wg].apply(','.join, axis=1).apply(lambda x: [s for s in x.split(',') if s != 'N/A'])
    cover = cover.drop(columns=wg).fillna('N/A')
    cover = specifications(cover, 6)
    return cover

# Returns the pages dataframe: one page per street segment, numbered after each location's cover.
# Relies on the global `streets` dataframe built in the feature layer query cell.
def create_pages(pages, sl_page, sp_page):
    keys = ['Location ID', 'SEGMENT_ID', 'Comments']
    if not sl_page.empty and not sp_page.empty:
        pages = pd.merge_ordered(sl_page, sp_page, on=keys).fillna("N/A")
        pages = specifications(pages, 3)
        pages = pd.merge_ordered(pages, streets, on=keys).drop(columns='BLOCK')
        pages = pages.sort_values(by=['Location ID', 'PAGE']).reset_index(drop=True)
    elif not sl_page.empty or not sp_page.empty:
        page = sl_page if sp_page.empty else sp_page
        # reset_index turns the pivot keys back into the first three columns
        pages = specifications(page.reset_index().fillna('N/A'), 3)
        pages = pd.merge_ordered(pages, streets, on=keys).sort_values(
            by=['BLOCK', 'Location ID']).reset_index(drop=True).drop(columns='BLOCK')
        pages = pages.dropna(subset=['SPECIFICATIONS']).reset_index(drop=True)
        # Renumber: restart at 2 (page 1 is the cover) whenever the Location ID changes
        page = 1
        for index, row in pages.iterrows():
            if index != 0 and row['Location ID'] != pages.at[index - 1, 'Location ID']:
                page = 2
            else:
                page += 1
            pages.at[index, 'PAGE'] = page
    else:
        pages['PAGE'] = 2
    return pages

# Writes df to a worksheet in the global workbook `wb`, replacing the sheet if it already exists, then saves EXCEL_FILE
def create_ws(df,sheet_name):
    if sheet_name in wb:
        del wb[sheet_name]
    ws = wb.create_sheet(sheet_name)
    for r in dataframe_to_rows(df, index=False, header=True):
        ws.append(r)
    wb.save(EXCEL_FILE)

## Loading and Transforming Data

### Email Attachment Extraction

Attachments are extracted from the Outlook inbox. The `getpass` module prompts the user for the email password without echoing it to the screen.

Since the attachments have already been exported to the folder, sign-in is not required here (`SIGN_IN = False`).

In [5]:
import getpass

# Email subject line used for Street and Bridge Whereabouts report
daily_subject = "S&B Whereabouts"

# This will try to prompt the user to input email and password if SIGN_IN is True
try:
    if SIGN_IN:
        email = input("Enter email: ")
        password = getpass.getpass("Enter password: ")
        credentials = Credentials(username = email,password = password)
        config = Configuration(server='outlook.office365.com', credentials=credentials)
        account = Account(primary_smtp_address=email,config=config,autodiscover=False,access_type=DELEGATE)
        print("\nFile attachments below are:")
        for item in account.inbox.filter(subject__contains=daily_subject):
            for attachment in item.attachments:
                if isinstance(attachment, FileAttachment):
                    file_path = "\\".join([FOLDER,attachment.name])
                    with open(file_path, 'wb') as f:
                        f.write(attachment.content)
                    print(file_path)
except Exception as e:
    # Covers a failed sign-in as well as connection or file errors
    print("\nSign-in or download failed:", e)

### PDF tables to Excel

Now that the PDFs have been downloaded to the folder, the next step is to extract the table from the day's PDF and export it to an Excel file.

An input form is generated so the user can enter segment IDs and comments for each listed street. The `columns` list keeps only the relevant columns from the extracted table. The `pdfplumber` package extracts the table from the PDF before the user is prompted for input.

The input is stored as a dataframe and saved to an Excel document. If the user already provided input in a previous session, the dataframe is read from that Excel file instead.
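`input_form` converts tab-separated IDs (as pasted from a spreadsheet row) into the comma-separated list stored in the Segment IDs column. A slightly stricter sketch is below; the `normalize_segment_ids` helper is hypothetical. It also accepts commas, spaces and newlines as separators, removes duplicates while keeping their order, and rejects non-numeric tokens.

```python
import re


def normalize_segment_ids(text):
    """Return pasted segment IDs as a comma-separated string, or None if blank.

    Hypothetical helper: IDs may be separated by tabs, commas, spaces or
    newlines. Raises ValueError if any token is not a whole number.
    """
    ids = [tok for tok in re.split(r"[\s,]+", text or "") if tok]
    if not ids:
        return None
    bad = [tok for tok in ids if not tok.isdigit()]
    if bad:
        raise ValueError(f"Not a segment ID: {bad}")
    return ",".join(dict.fromkeys(ids))  # dict keeps first occurrences in order


print(normalize_segment_ids("3272671\t3272712\t3272816"))  # 3272671,3272712,3272816
```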

In [6]:
from pathlib import Path

# Columns of extracted table
columns = ["Location ID", "Street", "From", "To"]

# Will prompt input and export to excel unless the excel file already exists. In that case it will read excel file instead
if Path(EXCEL_FILE).exists():
    df = pd.read_excel(EXCEL_FILE,index_col=0)
    df = df.fillna("N/A")
else:
    if INPUT:
        df = pdf_table_to_df(columns)
        input_form(df,columns)
        df = df.fillna("N/A")
        df.to_excel(EXCEL_FILE,sheet_name=" ".join((MONTH,DAY)))

In [7]:
display(df)

|   | Location ID | Street | From | To | Segment IDs | Comments |
|---|---|---|---|---|---|---|
| 0 | 62963 | HYMEADOW DR | Woodlawn Village Dr | 12519 | 319430520387082038719 |  |
| 1 | SG-13247 | Pecan Park Blvd | S Lake Line Blvd | Lake Creek Blvd | 3272671,3272712,3272816,3272915,3272705,327278... | turn bays, lane lines, bike lanes |


This file contains a table of the listed streets with the following columns:
- <i>Location ID</i>: unique identifier used for street paving
- <i>Street</i>: main street that is paved
- <i>From</i>: cross street (or address) where the paved section starts
- <i>To</i>: cross street (or address) where the paved section ends
- <i>Segment IDs</i>: comma-separated list of segment IDs where the street is paved
- <i>Comments</i>: notes on long line markings

### Feature Layer Data Query

The next task is to find the markings on the segment IDs the user entered. The `arcgis` package is used to query the markings on each segment ID, since the dataset is already publicly available.

Because the markings datasets are public, we can connect to ArcGIS Online anonymously.

To sign in through a federated ArcGIS Online (AGOL) account instead, pass a `client_id` to `GIS`; you will be prompted to enter a code by following the instructions shown. Signing in through a federated account is useful if you wish to add your own layers as a reference, such as [NearMap](https://go.nearmap.com/) aerial imagery.

The markings feature layers are then queried using the segment IDs from the Excel file.
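`query_df` turns each row's Segment IDs into a where clause of the form `SEGMENT_ID IN(...)`. Below is a minimal sketch of building that clause defensively, assuming segment IDs are integers; `segment_where_clause` is a hypothetical helper. The returned string could be passed as the `where` argument of `FeatureLayer.query`, as `query_df` does.

```python
def segment_where_clause(segment_ids, field="SEGMENT_ID"):
    """Return a where clause like "SEGMENT_ID IN(1,2)", or None if there is nothing to query.

    segment_ids: comma-separated IDs as stored in the "Segment IDs" column,
    a single integer ID, or "N/A". Each ID is converted with int(), so
    non-numeric text raises ValueError instead of ending up in the query.
    """
    if segment_ids is None or str(segment_ids).strip() in ("", "N/A"):
        return None
    ids = [int(s) for s in str(segment_ids).split(",") if s.strip()]
    if not ids:
        return None
    return "{} IN({})".format(field, ",".join(str(i) for i in ids))


print(segment_where_clause("3272671, 3272712"))  # SEGMENT_ID IN(3272671,3272712)
```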

In [8]:
# variables used to find and query feature layers in AGOL
gis = GIS("https://austin.maps.arcgis.com/home/index.html")
url = r"https://services.arcgis.com/0L95CJ0VTaxqcmED/arcgis/rest/services/TRANSPORTATION_{}/FeatureServer/0"
sl, sp, streets = (pd.DataFrame(), pd.DataFrame(), pd.DataFrame())

# Fields to keep: short line uses cols[:2], specialty point uses cols[1:], streets use s_col
cols = ['SHORT_LINE_TYPE','SEGMENT_ID','SPECIALTY_POINT_TYPE','SPECIALTY_POINT_SUB_TYPE']
s_col = ['LEFT_BLOCK_FROM','RIGHT_BLOCK_FROM','SEGMENT_ID']

# Feature layers for street segments, short line markings and specialty point markings
street_fl = FeatureLayer(url.format("street_segment"))
sl_fl = FeatureLayer(url.format("markings_short_line"))
sp_fl = FeatureLayer(url.format("markings_specialty_point"))

for index, row in df.iterrows():
    streets = query_df(street_fl, index, s_col, df, streets)
    sl = query_df(sl_fl, index, cols[:2], df, sl)
    sp = query_df(sp_fl, index, cols[1:], df, sp)
sp = specialty_markings(sp, cols[2])

# Order segments by block number, then Location ID
streets['BLOCK'] = np.maximum(streets[s_col[0]], streets[s_col[1]])
streets = streets.sort_values(by=['BLOCK','Location ID']).reset_index(drop=True)
streets = streets.rename(columns={'COUNTS':'PAGE'}).drop(s_col[:2], axis=1)

# Number pages: restart at 2 (page 1 is the cover) whenever the Location ID changes
page = 1
for index, row in streets.iterrows():
    if index != 0 and row['Location ID'] != streets.at[index - 1, 'Location ID']:
        page = 2
    else:
        page += 1
    streets.at[index, 'PAGE'] = page

### Plans Table Creation

#### Cover Table

In [49]:
wg = ['SHORT LINE','SPECIALTY MARKINGS','LONGLINE']
sl_count,sl_page = location_in_df(sl,'SHORT_LINE_TYPE',wg[0])
sp_count,sp_page = location_in_df(sp,'SPECIALTY_POINT_TYPE',wg[1])
cover = create_cover(df.copy(),sl_count,sp_count,wg)
pages = create_pages(df.copy(),sl_page,sp_page)

The cover dataframe lists the pavement markings found on each location's segment IDs, with the following columns:
- <i>LOCATION ID</i>: Unique identifier used for street paving
- <i>COMMENTS</i>: Notes on long line markings
- <i>WORK GROUPS</i>: Type of markings work group assigned to the work order
- <i>SPECIFICATIONS</i>: Lists all markings that need to be installed on the work order
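The `SPECIFICATIONS` sentence can be sketched for a single row as a mapping from marking column name to count. This hypothetical `specification_sentence` follows the same wording as `specifications` (lower-cased names, underscores replaced by spaces, comments first) and, as a small departure from the notebook's function, also skips zero counts.

```python
def specification_sentence(counts, comment=None):
    """Return an "Install ..." sentence for one row, or "" if there is nothing to install.

    counts: mapping of marking column name -> count; "N/A", "", None and 0
    are skipped. comment: optional long line note, listed first.
    """
    parts = [
        "{} {}".format(int(n), name.lower().replace("_", " "))
        for name, n in counts.items()
        if n not in (None, "", "N/A") and n != 0
    ]
    if comment and comment != "N/A":
        parts.insert(0, comment)
    return "Install " + ", ".join(parts) if parts else ""


print(specification_sentence({"CROSSWALK": "N/A", "STOP_LINE": 3.0}))  # Install 3 stop line
```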


The cover and pages dataframes are saved as worksheets in the Excel file so they can be reused to generate the template.

In [50]:
display(cover)
display(pages) 

|   | Location ID | Street | From | To | Segment IDs | Comments | PAGE | CROSSWALK | STOP_LINE | YIELD_LINE | Bicyclist symbol | Bike arrow | Diagonal crosshatch | Left arrow | Left/Through arrow | Only word | Right arrow | Shared lane (Sharrow) symbol | WORK GROUPS | SPECIFICATIONS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 62963 | HYMEADOW DR | Woodlawn Village Dr | 12519 | 319430520387082038719 |  | 1 |  | 3.0 |  |  |  |  |  |  |  |  |  | ['SHORT LINE'] | Install 1 page, 3 stop line |
| 1 | SG-13247 | Pecan Park Blvd | S Lake Line Blvd | Lake Creek Blvd | 3272671,3272712,3272816,3272915,3272705,327278... | turn bays, lane lines, bike lanes | 1 | 10.0 | 13.0 | 1.0 | 9.0 | 28.0 | 9.0 | 21.0 | 2.0 | 23.0 | 13.0 | 14.0 | ['SHORT LINE', 'SPECIALTY MARKINGS', 'LONGLINE'] | Install turn bays, lane lines, bike lanes, 1 p... |


|   | Location ID | SEGMENT_ID | Comments | CROSSWALK | STOP_LINE | YIELD_LINE | Bicyclist symbol | Bike arrow | Diagonal crosshatch | Left arrow | Left/Through arrow | Only word | Right arrow | Shared lane (Sharrow) symbol | SPECIFICATIONS | PAGE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 62963 | 2038719 |  |  | 1.0 |  |  |  |  |  |  |  |  |  | Install 1 stop line | 2 |
| 1 | 62963 | 2038708 |  |  | 1.0 |  |  |  |  |  |  |  |  |  | Install 1 stop line | 3 |
| 2 | 62963 | 3194305 |  |  | 1.0 |  |  |  |  |  |  |  |  |  | Install 1 stop line | 4 |
| 3 | SG-13247 | 3272912 | turn bays, lane lines, bike lanes |  |  |  |  |  |  |  |  |  |  | 4.0 | Install turn bays, lane lines, bike lanes, 4 s... | 2 |
| 4 | SG-13247 | 3272907 | turn bays, lane lines, bike lanes | 1.0 | 1.0 |  | 1.0 |  |  | 2.0 |  | 2.0 | 2.0 | 2.0 | Install turn bays, lane lines, bike lanes, 1 c... | 3 |
| 5 | SG-13247 | 3272914 | turn bays, lane lines, bike lanes |  |  |  |  |  |  |  |  |  |  | 4.0 | Install turn bays, lane lines, bike lanes, 4 s... | 4 |
| 6 | SG-13247 | 3272909 | turn bays, lane lines, bike lanes |  |  |  |  |  |  |  |  |  |  | 4.0 | Install turn bays, lane lines, bike lanes, 4 s... | 5 |
| 7 | SG-13247 | 3272915 | turn bays, lane lines, bike lanes | 2.0 | 1.0 |  | 1.0 | 1.0 | 4.0 |  |  | 1.0 | 1.0 |  | Install turn bays, lane lines, bike lanes, 2 c... | 6 |
| 8 | SG-13247 | 3272910 | turn bays, lane lines, bike lanes |  |  | 1.0 |  |  | 5.0 |  |  |  |  |  | Install turn bays, lane lines, bike lanes, 1 y... | 7 |
| 9 | SG-13247 | 2048238 | turn bays, lane lines, bike lanes |  |  |  |  |  |  |  |  |  |  |  |  | 8 |


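#### Pages Table

The pages dataframe has one row per street segment, listing the marking counts and `SPECIFICATIONS` for that segment. `PAGE` numbers start at 2 for each location because page 1 is the cover.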
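The page numbering rule used for `streets` and `pages` can be isolated in a small sketch; `assign_pages` is a hypothetical helper working on a list of Location IDs already in plan order. The counter restarts whenever the Location ID differs from the previous row, so rows for the same location need to be adjacent after sorting.

```python
def assign_pages(location_ids):
    """Return page numbers for rows in plan order.

    Page 1 of every location is its cover, so detail pages start at 2 and
    the counter restarts whenever the Location ID changes from the previous row.
    """
    pages = []
    for i, loc in enumerate(location_ids):
        if i == 0 or loc != location_ids[i - 1]:
            pages.append(2)
        else:
            pages.append(pages[-1] + 1)
    return pages


print(assign_pages(["62963", "62963", "62963", "SG-13247", "SG-13247"]))  # [2, 3, 4, 2, 3]
```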

## Create Worksheets of DataFrames

In [10]:
wb = load_workbook(filename = EXCEL_FILE)
create_ws(cover,'Cover')
create_ws(pages,'Pages')

## Generating Whereabouts Plans
To generate whereabouts plans, we have to use the `arcpy` package, which requires Python 2 and ArcMap 10.5. Eventually, this notebook should be able to use `arcpy` in Python 3.

[Click here to access notebook](PlansTemplate.ipynb)

## (Optional) Create Spreadsheet of Completed Streets
This optional step compiles the streets extracted from every whereabouts PDF in FOLDER into a single report spreadsheet.
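The `os.walk` loop in the next cell can also be written with `pathlib`. This hypothetical `find_report_pdfs` returns every PDF under a folder, including subfolders, and matches the `.pdf` extension case-insensitively (an assumption about how files may be named). The demo uses a temporary folder in place of `FOLDER`.

```python
import tempfile
from pathlib import Path


def find_report_pdfs(folder):
    """Return all PDF files under folder (subfolders included), sorted by path."""
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


# Demo on a temporary folder standing in for FOLDER
with tempfile.TemporaryDirectory() as demo:
    (Path(demo) / "sub").mkdir()
    for name in ("May 30.pdf", "sub/Jun 3.PDF", "notes.txt"):
        (Path(demo) / name).touch()
    found_names = [p.name for p in find_report_pdfs(demo)]

print(found_names)  # ['May 30.pdf', 'Jun 3.PDF']
```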

In [9]:
import os
import pandas as pd

# Columns of extracted table
columns = ["Location ID", "Street", "From", "To"]
report = os.path.join(FOLDER, "SBO Street List.xlsx")

try:
    # Reuse the report if it has already been created
    df = pd.read_excel(report, index_col=0)
except FileNotFoundError:
    frames = []
    for foldername, subfolders, files in os.walk(FOLDER):
        for file in files:
            if file.endswith('.pdf'):
                # pdf_table_to_df reads the global FILE_NAME (path without ".pdf")
                FILE_NAME = os.path.join(foldername, file[:-4])
                df1 = pdf_table_to_df(columns)
                df1["filename"] = file
                frames.append(df1)
    df = pd.concat(frames, sort=True) if frames else pd.DataFrame()
    df.to_excel(report, sheet_name="Report")