## Label Generation Code for Neural Networks
Project by <b><a href='mailto:ramavajjala@wisc.edu'>C S Siddharth Ramavajjala</a><sup>a</sup></b>, <b><a href='mailto:sgnamburi@wisc.edu'>G N V V Satya Sai Srinath</a><sup>b</sup></b>, <b><a href='mailto:gangaraju2@wisc.edu'>Ramakrishna Raju Gangaraju</a><sup>a</sup></b>

<i>a - Department of Geography, University of Wisconsin - Madison*</i><br>
<i>b - Department of Computer Science, University of Wisconsin - Madison*</i>

In [3]:
# importing libraries
import geopandas as gpd
import matplotlib.pyplot as plt
import sys, os
import json

In [4]:
#function to change current working directory
def change_os(path):                       
    print(os.getcwd())
    os.chdir(path)
    print(os.getcwd())

change_os(r'C:\Users\Sidrcs\Documents\Github\map_generalisation_ml')

C:\Users\Sidrcs\Documents\Github\map_generalisation_ml
C:\Users\Sidrcs\Documents\Github\map_generalisation_ml


In [5]:
# import Florida shapefile
fl_gdf = gpd.read_file(r'Data\Florida.shp')

In [6]:
fl_gdf.head()

Unnamed: 0,GISJOIN,REGION,DIVISION,STATEFP,STATENS,GEOID,STUSPS,NAME,LSAD,MTFCC,FUNCSTAT,ALAND,AWATER,INTPTLAT,INTPTLON,Shape_Leng,Shape_Le_1,Shape_Area,geometry
0,G120,3,5,12,294478,12,FL,Florida,0,G4000,A,138949100000.0,31361100000.0,28.4574302,-82.4091477,20681080.0,198.478218,13.54037,"MULTIPOLYGON (((-82.88482 24.62121, -82.88619 ..."


In [7]:
# convert into geoseries
fl_gs = gpd.GeoSeries(fl_gdf.geometry[0])
type(fl_gs) # Read more - https://geopandas.org/en/stable/docs/reference/geoseries.html

geopandas.geoseries.GeoSeries

<h3>Detailed breakdown of the code below</h3><br>
<b>For Srinath</b>: A GeoDataFrame (gdf) stores both attribute data and geometry. Print <code>fl_gdf.head(1)</code> to see the <code>geometry</code> column that holds the shapes.
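
As a small illustration of the point above, here is a minimal sketch that builds a toy GeoDataFrame by hand. The names `toy_gdf`, `NAME`, and the two squares are hypothetical and stand in for the Florida data; the sketch assumes geopandas and shapely are installed.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Two hypothetical unit squares standing in for real features
square_a = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
square_b = Polygon([(2, 0), (3, 0), (3, 1), (2, 1)])

# Attribute data goes in ordinary columns, shapes go in the geometry column
toy_gdf = gpd.GeoDataFrame({'NAME': ['A', 'B']}, geometry=[square_a, square_b])

# The geometry column is a GeoSeries; each element is a shapely geometry
geometry_column = toy_gdf.geometry
first_shape = toy_gdf.geometry.iloc[0]

print(list(toy_gdf.columns))
print(type(geometry_column).__name__, first_shape.geom_type)
```

Using `.iloc[0]` selects the first row by position, which also works when the index does not start at 0.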

In [9]:
fl_gs

0    MULTIPOLYGON (((-82.88482 24.62121, -82.88619 ...
dtype: geometry

Florida is a <b>MultiPolygon</b> feature, stored here in the GeoSeries <code>fl_gs</code>: a single feature made up of multiple polygons. India is another example, since its mainland, the Andaman and Nicobar Islands, and Lakshadweep together form one multi-polygon feature.

The individual polygons are accessed through the <code>.geoms</code> property.

We therefore first extract the <b>geometry</b> of the first feature from the gdf using <code>fl_gdf.geometry[0]</code> or <code>fl_gdf.geometry.iloc[0]</code>.

In [9]:
# get the geometry of the first feature (i.e., the MultiPolygon for Florida)
fl_multipolygon = fl_gdf.geometry[0]

# initialize a counter for the total number of vertices, feature count
num_vertices_total = 0
feature_count = 0

# iterate over each individual polygon in the MultiPolygon
for fl_polygon in fl_multipolygon.geoms:
    
    # get the number of vertices on the polygon's exterior ring
    # (holes are not counted; the ring's closing vertex repeats its first vertex)
    num_vertices = len(fl_polygon.exterior.coords)
    
    # add the number of vertices for this polygon to the total
    num_vertices_total += num_vertices
    feature_count += 1

# print the total number of vertices
print(f'Total number of vertices for Florida MultiPolygon: {num_vertices_total}')
# print number of polygons in the Multi polygon feature
print(f'Number of polygons: {feature_count}')

Total number of vertices for Florida MultiPolygon: 1087787
Number of polygons: 4225
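
The count above walks only each polygon's exterior ring, and shapely stores rings closed, so every ring's first vertex appears again at its end. The following is a minimal sketch on a hypothetical toy MultiPolygon (a square with one hole plus a triangle, not the Florida data) showing how those two details affect the totals.

```python
from shapely.geometry import MultiPolygon, Polygon

# Hypothetical shapes: a square with one square hole, and a triangle
square_with_hole = Polygon(
    [(0, 0), (4, 0), (4, 4), (0, 4)],
    [[(1, 1), (2, 1), (2, 2), (1, 2)]],
)
triangle = Polygon([(5, 0), (6, 0), (5, 1)])
toy_multipolygon = MultiPolygon([square_with_hole, triangle])

# Same counting rule as the cell above: exterior rings only, closing vertex included
exterior_total = sum(len(p.exterior.coords) for p in toy_multipolygon.geoms)

# Vertices on interior rings (holes), which the exterior-only count leaves out
interior_total = sum(
    len(ring.coords) for p in toy_multipolygon.geoms for ring in p.interiors
)

# Distinct exterior vertices: drop the repeated closing vertex of each ring
distinct_exterior = sum(len(p.exterior.coords) - 1 for p in toy_multipolygon.geoms)

print(exterior_total, interior_total, distinct_exterior)
```

Whether holes and closing vertices should count depends on how the labels will be used downstream; the sketch only makes the difference visible.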


### For NN Labels

In [14]:
# collect the exterior-ring coordinates of every polygon into one flat list
vertices_list = []

for fl_polygon in fl_multipolygon.geoms:
    vertices_list.extend(list(fl_polygon.exterior.coords))

In [18]:
import pandas as pd

In [20]:
fl_vertices_df = pd.DataFrame(vertices_list, columns = ['Longitude', 'Latitude'])
fl_vertices_df

Unnamed: 0,Longitude,Latitude
0,-82.884818,24.621208
1,-82.886194,24.620151
2,-82.887223,24.620801
3,-82.887196,24.622216
4,-82.886657,24.623717
...,...,...
1087782,-80.915639,29.013240
1087783,-80.915565,29.013235
1087784,-80.915439,29.013267
1087785,-80.915339,29.013268


In [21]:
# import the simplified Florida shapefile
fl_simplified_gdf = gpd.read_file(r'Florida_0.30/Florida_0.30.shp')

In [22]:
# get the geometry of the first feature (i.e., the MultiPolygon for Florida)
fl_multipoly_simplify = fl_simplified_gdf.geometry[0]

In [23]:
vertices_list_simplify = []

for fl_polygon in fl_multipoly_simplify.geoms:
    vertices_list_simplify.extend(list(fl_polygon.exterior.coords))

In [25]:
fl_vertices_simplify_df = pd.DataFrame(vertices_list_simplify, columns = ['Longitude', 'Latitude'])
fl_vertices_simplify_df

Unnamed: 0,Longitude,Latitude
0,-82.125002,24.597091
1,-82.106804,24.589948
2,-82.099419,24.575389
3,-82.105336,24.558989
4,-82.110735,24.559233
...,...,...
3385,-80.914467,29.056751
3386,-80.908516,29.046204
3387,-80.909045,29.033741
3388,-80.920265,29.028362


In [26]:
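# build the lookup set of simplified vertices once, instead of rebuilding it for every row
simplified_vertices = set(zip(fl_vertices_simplify_df['Longitude'], fl_vertices_simplify_df['Latitude']))
fl_vertices_df['case'] = ['yes' if xy in simplified_vertices else 'no' for xy in zip(fl_vertices_df['Longitude'], fl_vertices_df['Latitude'])]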

In [27]:
fl_vertices_df

Unnamed: 0,Longitude,Latitude,case
0,-82.884818,24.621208,no
1,-82.886194,24.620151,no
2,-82.887223,24.620801,no
3,-82.887196,24.622216,no
4,-82.886657,24.623717,no
...,...,...,...
1087782,-80.915639,29.013240,no
1087783,-80.915565,29.013235,no
1087784,-80.915439,29.013267,no
1087785,-80.915339,29.013268,no
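
The labelling step in cell `In [26]` marks an original vertex `yes` when its exact coordinates also appear among the simplified vertices. As an alternative sketch, and not the method used in this notebook, the same labels can be produced with a pandas left join and its `indicator` option. The frames `original` and `simplified` below are hypothetical toy stand-ins for `fl_vertices_df` and `fl_vertices_simplify_df`.

```python
import pandas as pd

# Hypothetical toy stand-ins for fl_vertices_df and fl_vertices_simplify_df
original = pd.DataFrame({'Longitude': [-82.0, -82.1, -82.2, -82.0],
                         'Latitude': [24.0, 24.1, 24.2, 24.0]})
simplified = pd.DataFrame({'Longitude': [-82.0, -82.2, -82.0],
                           'Latitude': [24.0, 24.2, 24.0]})

# Rings repeat their first vertex at the end, so the simplified table can
# contain duplicate rows; drop them so the left join cannot multiply rows
retained = simplified.drop_duplicates()

# indicator=True adds a '_merge' column: 'both' means the vertex was retained
merged = original.merge(retained, on=['Longitude', 'Latitude'],
                        how='left', indicator=True)

# A left join keeps the left row order, so labels line up by position
original['case'] = ['yes' if m == 'both' else 'no' for m in merged['_merge']]

print(original)
```

Both approaches rely on exact floating-point equality, so they only match vertices whose coordinates are stored identically in the two shapefiles.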


In [28]:
# write the labelled vertices to CSV (the DataFrame index is written as the first column)
fl_vertices_df.to_csv('vertices.csv')