This notebook scrapes a text file for bearings and writes them to a CSV file that the Legal Description Tools Notebook code can use to identify vertices as a points layer.  It only works for descriptions made up of straight lines, i.e. no curve data.

Step 1: Export your PDF to a text file using Adobe.  Python cannot do this with its standard library alone; it requires an additional Python package that must be downloaded and installed. See: https://realpython.com/pdf-python/

In [1]:
##Import the packages you will need for this notebook

import re  ##package used for Regular Expressions ("re") -- re is a package for finding what you want to match in strings
import pandas as pd  ##pandas is a data analysis and manipulation tool, we'll use it here to place the scraped data into columns
import csv  ##Python's built-in csv package; the final export below uses the pandas to_csv method, so this import is optional

In [2]:
# Point "myfile" to the text file you exported from the PDF legal description you are working in.
# Update the path inside r"..." below so that it points to your own text file.
# A sample legal description txt file is included in the GitHub project.

myfile = open(r"F:\ROW\ROW_Park_Folders\GATE\Marine Forces Reserve\RW GATE-20-001.txt",'r')

# The 'r' argument above opens the text file in read mode, allowing the program to read the contents of the file.
# For more information on Python file methods see: https://pythonguides.com/python-file-methods/

f = myfile.read()
myfile.close()  # close the file now that its contents are stored in f
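
A minimal sketch of another way to read the file: a with-statement closes the file automatically, and an explicit encoding avoids surprises with symbols such as ° ’ ”. The sample text, the temporary file, and the utf-8 encoding are assumptions made for illustration; they are not details of the real export.

```python
import os
import tempfile

# Hypothetical sample text standing in for the exported legal description.
sample_text = "thence South 19°26’24” East, a distance of 15.01 feet;"

# Write the sample to a temporary file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", encoding="utf-8", delete=False) as tmp:
    tmp.write(sample_text)
    sample_path = tmp.name

# The with-statement closes the file as soon as the indented block ends.
# encoding="utf-8" is an assumption; use the encoding your own export was saved in.
with open(sample_path, "r", encoding="utf-8") as myfile:
    f_sample = myfile.read()

os.remove(sample_path)  # clean up the temporary file
print(f_sample)
```

If the degree or quote symbols come out garbled when you print the text, the encoding passed to open() is the first thing to check.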

In [4]:
##Useful documentation on regular expressions: https://docs.python.org/3/library/re.html#regular-expression-syntax
##Some samples of using RegEx in Python:  https://www.w3schools.com/python/python_regex.asp
##For a HOWTO on regular expressions: https://docs.python.org/3/howto/regex.html

#This cell finds every bearing in the text file and collects the matches in a list

#Reference - there are many more items in the regular expression references above, but here are the ones I used below:

#\d+: One or more digits.
#\s?: An optional whitespace character (\s+ would mean one or more whitespace characters).
#[A-Z][A-Za-z]+: An uppercase letter followed by at least one more letter (uppercase or lowercase).
# "." is a wild card that matches any single character - in the search text below, I used a wild card to find both the minutes and
## seconds symbols, because the text file uses typographic marks (’ and ”) rather than plain quotation marks
#\.? indicates an optional decimal point
#| separates two alternatives: bearings that start with North and bearings that start with South

Bearing = re.findall(r'North\s?\d+\°\d+.\d+\.?\d+.\s?[A-Z][A-Za-z]+|South\s?\d+\°\d+.\d+\.?\d+.\s?[A-Z][A-Za-z]+', f)
print(Bearing)  ##print out the list you've created to see if it looks correct based on the text file

['South 19°26’24” East', 'South 68°53’12” West', 'South 45°44’01” West', 'South 72°08’03” West', 'South 75°44’52” West', 'South 81°36’01” West', 'South 42°26’17” West', 'South 41°39’31” West', 'South 87°30’55” West', 'North 87°21’42” West', 'North 01°48’35” West', 'North 77°40’57” West', 'North 76°08’27” West', 'North 40°23’11” West', 'South 76°08’27” East', 'South 77°40’57” East', 'South 01°48’35” East', 'South 87°21’42” East', 'North 87°30’55” East', 'North 41°39’31” East', 'North 42°26’17” East', 'North 81°36’01” East', 'North 75°44’52” East', 'North 72°08’03” East', 'North 45°44’01” East', 'North 68°53’12” East']
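
Before running a pattern on the full file, it can help to try it on a short string where you already know the answer. The sketch below does that with a condensed form of the same pattern, in which (?:North|South) replaces the two spelled-out alternatives. The sample sentence and the names sample and bearing_pattern are made up for illustration and do not come from the real legal description.

```python
import re

# Hypothetical sentence written in the style of a legal description.
sample = ("thence South 19°26’24” East, a distance of 15.01 feet; "
          "thence North 01°48’35” West, a distance of 113.59 feet;")

# Raw string (r"...") so that \s and \d reach the regex engine unchanged.
# (?:...) groups the two alternatives without creating a capture group,
# so findall still returns the whole matched bearing.
bearing_pattern = re.compile(r"(?:North|South)\s?\d+°\d+.\d+\.?\d+.\s?[A-Z][A-Za-z]+")

found = bearing_pattern.findall(sample)
print(found)
```

One thing this kind of check can reveal: because the seconds part is written as \d+\.?\d+, it needs at least two digits, so a bearing with single-digit seconds such as 5” would be missed.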


In [52]:
#In this cell, we put the list of bearings we've generated into a DataFrame using the pandas package;
#this will allow us to work with the columns

#Converting lists to Pandas with examples: https://datatofish.com/list-to-dataframe/

df = pd.DataFrame(Bearing, columns = ['Bearing'])
print(df)

                 Bearing
0   South 19°26’24” East
1   South 68°53’12” West
2   South 45°44’01” West
3   South 72°08’03” West
4   South 75°44’52” West
5   South 81°36’01” West
6   South 42°26’17” West
7   South 41°39’31” West
8   South 87°30’55” West
9   North 87°21’42” West
10  North 01°48’35” West
11  North 77°40’57” West
12  North 76°08’27” West
13  North 40°23’11” West
14  South 76°08’27” East
15  South 77°40’57” East
16  South 01°48’35” East
17  South 87°21’42” East
18  North 87°30’55” East
19  North 41°39’31” East
20  North 42°26’17” East
21  North 81°36’01” East
22  North 75°44’52” East
23  North 72°08’03” East
24  North 45°44’01” East
25  North 68°53’12” East


In [54]:
#In this cell, we'll take the full bearing and split it into columns so that we can work with it in other notebooks, such as ones
#where we need to run calculations on the information in the bearing

#Reference for splitting string in df column: https://practicaldatascience.co.uk/data-science/how-to-split-strings-using-the-pandas-split-function

df[['NS','Degrees','MS','EW']] = df['Bearing'].str.split(pat = r'\s|\°',expand = True)
print(df)


                 Bearing     NS Degrees      MS    EW
0   South 19°26’24” East  South      19  26’24”  East
1   South 68°53’12” West  South      68  53’12”  West
2   South 45°44’01” West  South      45  44’01”  West
3   South 72°08’03” West  South      72  08’03”  West
4   South 75°44’52” West  South      75  44’52”  West
5   South 81°36’01” West  South      81  36’01”  West
6   South 42°26’17” West  South      42  26’17”  West
7   South 41°39’31” West  South      41  39’31”  West
8   South 87°30’55” West  South      87  30’55”  West
9   North 87°21’42” West  North      87  21’42”  West
10  North 01°48’35” West  North      01  48’35”  West
11  North 77°40’57” West  North      77  40’57”  West
12  North 76°08’27” West  North      76  08’27”  West
13  North 40°23’11” West  North      40  23’11”  West
14  South 76°08’27” East  South      76  08’27”  East
15  South 77°40’57” East  South      77  40’57”  East
16  South 01°48’35” East  South      01  48’35”  East
17  South 87°21’42” East  So

In [55]:
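##Note on why I didn't just immediately split out minutes and seconds: the regular expression would not recognize the quotation marks
#as symbols I could use to split --> Therefore, I had to isolate minutes and seconds and then strip off the symbols for minutes and seconds
df.keys()  #lists the column names (nothing is displayed because this is not the last line of the cell)

df['Minutes'] = df['MS'].str[0:2]  #characters 0-1 of MS are the minutes
df['Seconds'] = df['MS'].str[3:5]  #characters 3-4 are the seconds (character 2 is the minutes symbol)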

print(df)

                 Bearing     NS Degrees      MS    EW Minutes Seconds
0   South 19°26’24” East  South      19  26’24”  East      26      24
1   South 68°53’12” West  South      68  53’12”  West      53      12
2   South 45°44’01” West  South      45  44’01”  West      44      01
3   South 72°08’03” West  South      72  08’03”  West      08      03
4   South 75°44’52” West  South      75  44’52”  West      44      52
5   South 81°36’01” West  South      81  36’01”  West      36      01
6   South 42°26’17” West  South      42  26’17”  West      26      17
7   South 41°39’31” West  South      41  39’31”  West      39      31
8   South 87°30’55” West  South      87  30’55”  West      30      55
9   North 87°21’42” West  North      87  21’42”  West      21      42
10  North 01°48’35” West  North      01  48’35”  West      48      35
11  North 77°40’57” West  North      77  40’57”  West      40      57
12  North 76°08’27” West  North      76  08’27”  West      08      27
13  North 40°23’11” 
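
As an alternative to splitting and then slicing by position, the pieces of a bearing can be pulled out in one pass with named groups and the pandas str.extract method. This is only a sketch: the two sample rows and the names df_sketch and dms_pattern are illustrative. It sidesteps the quotation mark problem by using \D (any non-digit character), which matches the typographic ’ and ” symbols as well as plain quotes.

```python
import pandas as pd

# Two sample bearings in the same format as the list above.
df_sketch = pd.DataFrame({"Bearing": ["South 19°26’24” East", "North 01°48’35” West"]})

# Each (?P<name>...) group becomes a column with that name in the result.
dms_pattern = (
    r"(?P<NS>North|South)\s?"
    r"(?P<Degrees>\d+)°"
    r"(?P<Minutes>\d+)\D"               # \D stands in for the minutes symbol
    r"(?P<Seconds>\d+(?:\.\d+)?)\D\s?"  # optional decimal part, then the seconds symbol
    r"(?P<EW>East|West)"
)

parts = df_sketch["Bearing"].str.extract(dms_pattern)
print(parts)
```

Unlike the fixed slices [0:2] and [3:5], this does not depend on minutes and seconds always being exactly two characters wide. Rows that do not match the pattern come back as NaN, which makes them easy to spot.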

In [56]:
#Drop the 'MS' column - it's no longer useful

dfclean = df.drop(['MS'], axis = 1)
# Preview the first rows of the cleaned DataFrame

print(dfclean.head())

                Bearing     NS Degrees    EW Minutes Seconds
0  South 19°26’24” East  South      19  East      26      24
1  South 68°53’12” West  South      68  West      53      12
2  South 45°44’01” West  South      45  West      44      01
3  South 72°08’03” West  South      72  West      08      03
4  South 75°44’52” West  South      75  West      44      52
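
The Degrees, Minutes and Seconds columns hold text, not numbers. If a later notebook needs to run calculations on them, one possible approach is to convert them with pd.to_numeric and combine them as decimal degrees = degrees + minutes/60 + seconds/3600. The sketch below uses two sample rows; the names dms, numeric and DecimalDegrees are hypothetical and are not part of this notebook or the CSV it writes.

```python
import pandas as pd

# Sample rows shaped like the Degrees / Minutes / Seconds columns above (all strings).
dms = pd.DataFrame({
    "Degrees": ["19", "01"],
    "Minutes": ["26", "48"],
    "Seconds": ["24", "35"],
})

# Convert every column from text to numbers; "01" becomes 1.
numeric = dms.apply(pd.to_numeric)

# Angle within the quadrant, in decimal degrees.
dms["DecimalDegrees"] = (
    numeric["Degrees"] + numeric["Minutes"] / 60 + numeric["Seconds"] / 3600
)
print(dms)
```

This value is still measured from North or South toward East or West, so the NS and EW columns are needed alongside it to give a direction.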


In [57]:
#Cell to find list of distances in order to add a Distance Column

Distance = re.findall(r'\d+\.\d+\s?feet',f)
print(Distance)

['15.01 feet', '273.69 feet', '108.39 feet', '403.92 feet', '183.06 feet', '161.74 feet', '26.64 feet', '70.70 feet', '284.78 feet', '172.73 feet', '113.59 feet', '153.88 feet', '225.34 feet', '25.66 feet', '245.68 feet', '165.66 feet', '111.40 feet', '158.18 feet', '277.76 feet', '64.46 feet', '32.08 feet', '166.31 feet', '181.82 feet', '399.93 feet', '107.94 feet', '277.20 feet']


In [64]:
#We could have done this in one combined step, but since we created a separate list of distances, we'll need to insert that into
##the existing df, our dfclean

##For this, we'll use df.insert: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.insert.html
dfclean.insert(6,"Distance and feet",Distance)
print(dfclean)

                 Bearing     NS Degrees    EW Minutes Seconds Distance and feet
0   South 19°26’24” East  South      19  East      26      24        15.01 feet
1   South 68°53’12” West  South      68  West      53      12       273.69 feet
2   South 45°44’01” West  South      45  West      44      01       108.39 feet
3   South 72°08’03” West  South      72  West      08      03       403.92 feet
4   South 75°44’52” West  South      75  West      44      52       183.06 feet
5   South 81°36’01” West  South      81  West      36      01       161.74 feet
6   South 42°26’17” West  South      42  West      26      17        26.64 feet
7   South 41°39’31” West  South      41  West      39      31        70.70 feet
8   South 87°30’55” West  South      87  West      30      55       284.78 feet
9   North 87°21’42” West  North      87  West      21      42       172.73 feet
10  North 01°48’35” West  North      01  West      48      35       113.59 feet
11  North 77°40’57” West  North      77 
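
A rough sketch of what the combined step mentioned above could look like: one pattern with two capture groups, so that each bearing is paired with the distance that follows it. It assumes every bearing in the text is followed by its own distance before the next bearing appears; the sample sentence and the names pair_pattern and df_pairs are made up for illustration.

```python
import re
import pandas as pd

# Hypothetical sentence written in the style of a legal description.
sample = ("thence South 19°26’24” East, a distance of 15.01 feet; "
          "thence North 01°48’35” West, a distance of 113.59 feet;")

pair_pattern = re.compile(
    r"((?:North|South)\s?\d+°\d+.\d+\.?\d+.\s?[A-Z][A-Za-z]+)"  # group 1: the bearing
    r".*?"                                                      # shortest possible run of text in between
    r"(\d+\.\d+)\s?feet",                                       # group 2: the distance, without the unit
    re.DOTALL,  # let "." cross line breaks in case a call wraps onto the next line
)

# With two capture groups, findall returns a list of (bearing, distance) tuples.
pairs = pair_pattern.findall(sample)
df_pairs = pd.DataFrame(pairs, columns=["Bearing", "Distance"])
print(df_pairs)
```

Pairing at match time keeps each distance attached to its bearing, whereas two separate lists only line up if they happen to have the same length and order. If a bearing in the text had no distance after it, though, this pattern would run on to the next distance it finds, so the result still needs checking against the source document.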

In [67]:
#This cell uses a regular expression again to split the 'Distance and feet' column into 2 columns: Distance and Units

dfclean[['Distance','Units']] = dfclean['Distance and feet'].str.split(pat = r'\s',expand = True)
print(dfclean)

                 Bearing     NS Degrees  ... Distance and feet Distance Units
0   South 19°26’24” East  South      19  ...        15.01 feet    15.01  feet
1   South 68°53’12” West  South      68  ...       273.69 feet   273.69  feet
2   South 45°44’01” West  South      45  ...       108.39 feet   108.39  feet
3   South 72°08’03” West  South      72  ...       403.92 feet   403.92  feet
4   South 75°44’52” West  South      75  ...       183.06 feet   183.06  feet
5   South 81°36’01” West  South      81  ...       161.74 feet   161.74  feet
6   South 42°26’17” West  South      42  ...        26.64 feet    26.64  feet
7   South 41°39’31” West  South      41  ...        70.70 feet    70.70  feet
8   South 87°30’55” West  South      87  ...       284.78 feet   284.78  feet
9   North 87°21’42” West  North      87  ...       172.73 feet   172.73  feet
10  North 01°48’35” West  North      01  ...       113.59 feet   113.59  feet
11  North 77°40’57” West  North      77  ...       153.88 feet  

In [69]:
#Drop the 'Distance and feet' column - it's no longer useful

dfclean = dfclean.drop(['Distance and feet'], axis = 1)

print(dfclean.head())


                Bearing     NS Degrees    EW Minutes Seconds Distance Units
0  South 19°26’24” East  South      19  East      26      24    15.01  feet
1  South 68°53’12” West  South      68  West      53      12   273.69  feet
2  South 45°44’01” West  South      45  West      44      01   108.39  feet
3  South 72°08’03” West  South      72  West      08      03   403.92  feet
4  South 75°44’52” West  South      75  West      44      52   183.06  feet


In [70]:
#Save the resulting data frame into a csv using the to_csv method

#to_csv Pandas documentation:https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html

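#to_csv returns None when it is given a file path, so the result is not assigned to a variable
#(a variable named csv would also replace the csv package imported at the top of the notebook)
dfclean.to_csv(r'C:\Users\hdean\Documents\ArcGIS\Projects\PythonWorkSpace_ROW\bearings2.csv',index = False)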
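
One thing to watch when the CSV is opened again: values such as 01 are stored as text here, and pandas will infer them as numbers on reading unless told otherwise. The sketch below writes a one-row sample to a temporary folder and reads it back both ways. The sample row, the file name bearings_sketch.csv and the variable names are illustrative only.

```python
import os
import tempfile
import pandas as pd

# One sample row with a leading zero in Degrees, as in the table above.
out = pd.DataFrame({
    "Bearing": ["North 01°48’35” West"],
    "Degrees": ["01"],
    "Distance": ["113.59"],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "bearings_sketch.csv")
    out.to_csv(csv_path, index=False)

    as_text = pd.read_csv(csv_path, dtype=str)  # keep every column as text
    inferred = pd.read_csv(csv_path)            # let pandas guess the column types

print(as_text.loc[0, "Degrees"])   # text, leading zero kept
print(inferred.loc[0, "Degrees"])  # number, leading zero dropped
```

Which form is right depends on what the next notebook expects: text keeps the values exactly as they appear in the legal description, while numbers are ready for calculations.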