**Testing directory operations**

In [None]:
import os

In [None]:
cwd = os.getcwd()

In [None]:
cwd_parts = cwd.split("/")

In [None]:
cwd_parts

In [None]:
cwd_parts[-1]

**Testing df index operations**

06/01/2021 context of these df operations:

* Manual remerging of segments shall be enabled via a CLI (> manualMergeCLIFlow_segs.py).
* Manual remerging is done by computing clustering solutions for two different buffer sizes and determining the differences between these solutions; segment clusters are compared based on strings containing the sorted ids of the individual segments contained in them.
* Segments are fractions of OSM ways; therefore, they inherit a range of properties from their 'parent' highways. 'id' is one of them. Hence, values in the 'id' column aren't unique. I thought that would cause a problem later on, as clusters are compared via sets of strings built from the ids they contain. (Non-unique ids can still be hashed; the risk is that two distinct clusters yield the same id string and collapse into a single set element.)
* The clustering solution comparison procedure goes like this: **(1)** map ids to lists; **(2)** if two segments end up in the same cluster and hence are geographically merged and property-wise aggregated, their id lists are concatenated; **(3)** in the end, those id lists are sorted (to ensure identical clusters are recognized as such when represented as strings) and converted to strings (so they can be put into sets, i.e. hashed); **(4)** then, two data sets that are based on clustering solutions obtained with different buffer sizes can be compared using their respective sets of segment clusters, with each segment cluster (whether it contains one segment or several) represented by a string of its id/s.
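
The four steps above can be sketched in plain Python. This is a minimal illustration, not the pipeline's code: the names `cluster_key` and `changed_clusters` and the toy id lists are assumptions, and the real data lives in data frame columns rather than in lists of lists.

```python
def cluster_key(ids):
    """Represent a cluster as a string of its sorted segment ids (step 3)."""
    return ",".join(str(i) for i in sorted(ids))


def changed_clusters(solution_a, solution_b):
    """Return the cluster keys that occur in only one of two clustering solutions (step 4)."""
    keys_a = {cluster_key(ids) for ids in solution_a}
    keys_b = {cluster_key(ids) for ids in solution_b}
    return keys_a - keys_b, keys_b - keys_a


# Toy data: each inner list holds the ids of the segments in one cluster.
# Ids may repeat across clusters because segments inherit 'id' from their parent way.
small_buffer = [[7], [3, 1], [5], [5]]
large_buffer = [[7], [1, 3, 5], [5]]

only_small, only_large = changed_clusters(small_buffer, large_buffer)
print(only_small, only_large)  # {'1,3'} {'1,3,5'}
```

Note that the two single-segment clusters with id 5 in `small_buffer` end up as one set element, which is the effect of non-unique ids described above.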

In [None]:
d = {'col1': [1, 2, 7, 4, 5, 6, 2], 'col2': [3, 4, 9, 3, 5, 3, 5]}

In [None]:
import pandas as pd

In [None]:
df = pd.DataFrame(data=d)

In [None]:
df

In [None]:
df.index

In [None]:
df['list_ind'] = df.index.map(lambda x: [x])

In [None]:
df

**Why aren't some segments merged when they should be?**

09/02/2021 Debugging - summary.

Problem: Particularly in Stuttgart, not all segments that should be merged are merged (i.e., if two segments overlap in a location where there's no junction, they should be merged in this location to avoid odd breaks in street segments).

What could cause this?
1. The geospatial overlap isn't detected (unlikely)
2. A junction is incorrectly assumed to be located in the segments' overlap.
3. The neighbour_param (number of rows that should be considered above and below each data point when checking for neighbouring segments) is too small.
4. Some logical criteria are either flawed or not implemented correctly.

***(1) Checking if geospatial overlap is detected***

In [None]:
import shapely
from shapely.geometry.polygon import Polygon
import geopandas as gpd
import folium

EXHIBIT 1: Cannstatter Str.

In [None]:
p1 = Polygon(zip([48.78901338983428, 48.78886814152739, 48.78877049992553, 48.78906349638634, 48.78905966057722, 48.78914987672392, 48.78919690583257, 48.78916972836784, 48.789093624606906, 48.78939944823923, 48.7894017882983, 48.78942689416001, 48.789474610295606, 48.7897707467031, 48.790328666069314, 48.790434461038586, 48.79017599748566, 48.78986868836214, 48.78922872316659, 48.78923069647896, 48.78919671456495, 48.789173606310015, 48.78916170119815, 48.78901338983428],[9.191734582980951, 9.191548775617719, 9.191647953182393, 9.191993709262423, 9.191998275769556, 9.19209908446235, 9.19216928314215, 9.192150933625328, 9.192270279054602, 9.19247676345146, 9.192475103920387, 9.192512578548284, 9.1924619459685, 9.192792852632342, 9.193696611191871, 9.193609584208584, 9.193158177295256, 9.192694009515112, 9.19197876279125, 9.19197655855987, 9.191938718974587, 9.19189096683244, 9.191899730844632, 9.191734582980951]))

In [None]:
p2 = Polygon(zip([48.794497702542316, 48.79259004665616, 48.79037350721767, 48.79026889839811, 48.79248615290849, 48.792841935479736, 48.79246156577028, 48.792332246513595, 48.79222812727993, 48.79234465363009, 48.79234378184505, 48.7926453788676, 48.79407010244168, 48.79429176598952, 48.79467434863853, 48.794700534648655, 48.79471158899031, 48.79471684439578, 48.794765430308814, 48.79481115324505, 48.79536532032411, 48.795371749744334, 48.795567361737305, 48.795561021629666, 48.79574835267022, 48.795859478661654, 48.79572020390558, 48.795721502720475, 48.79559606939611, 48.79544192354787, 48.795439234317854, 48.795390153937504, 48.794889052621116, 48.79468606856374, 48.79468698792446, 48.79468135554041, 48.79469059058135, 48.79465600090749, 48.79463201297187, 48.794631095047336, 48.794497702542316],[9.200098633305618, 9.197230751469688, 9.193505825520955, 9.19359487156822, 9.197320974040798, 9.197867823070421, 9.197306933977416, 9.197096678761154, 9.197186530829585, 9.197375986838134, 9.197376761522994, 9.19784627348296, 9.199922645184802, 9.20031154061775, 9.201053385871042, 9.201131022779395, 9.201125596616269, 9.201135787087821, 9.201099167925507, 9.201076724245153, 9.202321351487102, 9.202317473570055, 9.202817593266646, 9.202818556034984, 9.20324872881619, 9.203172324899375, 9.20285250557114, 9.202851571588427, 9.202582929361528, 9.20213726573322, 9.202138732139126, 9.201999778477635, 9.200855190103246, 9.200455246941896, 9.200454526939676, 9.200443936953262, 9.200433414544051, 9.200396004094689, 9.200348740231062, 9.200349437371056, 9.200098633305618]))

In [None]:
myMap = folium.Map(location=[48.7825,9.1831], zoom_start=15, tiles='cartodbpositron', prefer_canvas=True)

In [None]:
myMap

In [None]:
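def plotPoly(poly, mmaapp):

    # The polygons above are built from (lat, lon) pairs, but GeoJSON expects (lon, lat),
    # so swap the coordinate order before plotting.

    lats, lons = poly.exterior.coords.xy

    poly_swapped = Polygon(zip(lons, lats))

    poly_geoDf = gpd.GeoDataFrame(index=[0], crs="EPSG:4326", geometry=[poly_swapped])

    # The Leaflet path option for the outline colour is 'color'; 'lineColor' is not a Leaflet path option.

    folium.GeoJson(poly_geoDf, style_function=lambda x: {'fillColor': '#ff1493', 'color': '#F5FFFA'}).add_to(mmaapp)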

In [None]:
plotPoly(p1, myMap)

In [None]:
plotPoly(p2, myMap)

In [None]:
p1.intersects(p2)

EXHIBIT 2: Planckstr.

In [None]:
planck1 = Polygon(zip([48.77064220614648, 48.77027902282031, 48.770216964544325, 48.77113233850344, 48.77119235529778, 48.77064220614648],[9.196586155856163, 9.196367388998445, 9.196471711010851, 9.197003618675843, 9.196898098011804, 9.196586155856163]))

In [None]:
planck2 = Polygon(zip([48.77468928993937, 48.774268919442015, 48.77360000157447, 48.773511553093236, 48.774418340546156, 48.7748303604466, 48.774876706782905, 48.77468928993937],[9.20048566238255, 9.19998872394114, 9.198684661553976, 9.198762937210928, 9.200424985845693, 9.200660633352422, 9.200548999837249, 9.20048566238255]))

In [None]:
planck3 = Polygon(zip([48.77219820144769, 48.77111066268884, 48.771046967436426, 48.77235547948136, 48.772419801472786, 48.77219820144769],[9.197564801529339, 9.19684877415754, 9.196952073860398, 9.197812897853167, 9.197710004500362, 9.197564801529339]))

In [None]:
planck4 = Polygon(zip([48.77323221837213, 48.77312376272209, 48.77324130058953, 48.773311337288874, 48.77323221837213],[9.198341004462575, 9.198409381464053, 9.198502372117122, 9.198403599715435, 9.198341004462575]))

In [None]:
planck5 = Polygon(zip([48.77624773684407, 48.77563090211576, 48.775221393399406, 48.774784163788624, 48.77473711972361, 48.775147298335035, 48.77555304765636, 48.776323494047574, 48.77695086657948, 48.77695426476902, 48.777344100084086, 48.77742419372668, 48.77719998370119, 48.77700097491153, 48.776867070179186, 48.776411936668836, 48.77641318198939, 48.77638864067995, 48.776351730199316, 48.776349677912805, 48.77624773684407],[9.201738596798606, 9.201309339972696, 9.200742273975205, 9.200517248829359, 9.200628642955298, 9.200837846939482, 9.201401186724622, 9.202011424619691, 9.202824892640852, 9.202821576063716, 9.203292542175523, 9.203203240286982, 9.20295296228295, 9.20269429870298, 9.202512742932123, 9.201931414447523, 9.20193005985061, 9.201901659174382, 9.201854514516201, 9.20185656912174, 9.201738596798606]))

In [None]:
planck6 = Polygon(zip([48.77264564605393, 48.77234210924578, 48.772273434166856, 48.77296762935576, 48.772967831705195, 48.77317457837869, 48.773480092154735, 48.77357273526821, 48.77365588184199, 48.773573402081794, 48.77357542161313, 48.77345969763818, 48.77327026672514, 48.773270598350784, 48.77325911599957, 48.77323603723039, 48.77323532684282, 48.77299865403288, 48.77264564605393],[9.19788486893517, 9.197655255036794, 9.197755080776355, 9.198290286359532, 9.198289998787589, 9.198449334639406, 9.198732510231588, 9.198848514543, 9.19876282292163, 9.198659544902629, 9.198657276407232, 9.198529474562664, 9.19837105500656, 9.198370578971865, 9.198361729744029, 9.198342429154119, 9.198343395897226, 9.19816099670227, 9.19788486893517]))

In [None]:
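def plotPoly2(poly, poly_id, mmaapp):

    # Same as plotPoly, plus a tooltip showing the polygon's id.
    # Swap (lat, lon) to (lon, lat), the coordinate order GeoJSON expects.

    lats, lons = poly.exterior.coords.xy

    poly_swapped = Polygon(zip(lons, lats))

    poly_geoDf = gpd.GeoDataFrame(index=[0], crs="EPSG:4326", geometry=[poly_swapped])

    # The Leaflet path option for the outline colour is 'color'; 'lineColor' is not a Leaflet path option.

    folium.GeoJson(poly_geoDf, style_function=lambda x: {'fillColor': '#ff1493', 'color': '#F5FFFA'}, tooltip=f"Id: {poly_id}").add_to(mmaapp)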

In [None]:
for poly, poly_id in [(planck1,1), (planck2,2), (planck3,3), (planck4,4), (planck5,5), (planck6,6)]:
    
    plotPoly2(poly, poly_id, myMap)

In [None]:
myMap

In [None]:
planck_d = {'id': [1, 2, 3, 4, 5, 6], 'geometry': [planck1, planck2, planck3, planck4, planck5, planck6]}

In [None]:
planck_df = pd.DataFrame(data=planck_d)

In [None]:
planck_df

In [None]:
planck_df['index'] = planck_df.index

In [None]:
planck_df

Read data set for Stuttgart containing all junctions (large and small)

In [None]:
import os
import pandas as pd
from geopandas import GeoSeries
from shapely.geometry import Point
from itertools import starmap

In [None]:
junction_path = os.path.join("junctions", "stuttgart_junctions_for_segs.csv")

In [None]:
junction_data = pd.read_csv(junction_path)

In [None]:
junction_data

In [None]:
junctionlats = junction_data.lat.values
junctionlons = junction_data.lon.values
junctionpoints = GeoSeries(map(Point, zip(junctionlats, junctionlons)))

In [None]:
def findNeighbours(segs, junctionpoints):

    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    # Define some inner functions we'll need for determining the segments' neighbours.

    ## a) isIntersectionValid: a neighbouring segment is defined as a segment whose polygon intersects with a
    ##                         segment's polygon WITHOUT a junction being contained in that intersection; this
    ##                         function checks for junctions in intersections.

    def isIntersectionValid(polyOne, outerInd, polyTwo, innerInd):

        if polyOne == polyTwo:
            
            return False
        
        intersection = polyOne.intersection(polyTwo)

        junctions_in_intersection = junctionpoints[lambda x: x.within(intersection)]
        
        if junctions_in_intersection.empty:
                                
            return True
                                
        else:
    
            print(f"Not merging because of junctions in segment overlap: {junctions_in_intersection}")

            return False

    ## b) getNeighbours: unsurprisingly, this function assigns each segment its neighbours (definition of 'neighbour'
    ##                   in this context: see above)

    def getNeighbours(outerInd, outerPoly):
        
        neighbours = []
        
        # lower = max(outerInd-neighbourParam, 0)

        lower = 0
        
        # upper = min(outerInd+neighbourParam, len(unfoldedOddballs)-1)

        upper = len(segs)-1
        
        # Use buffer trick if polygon is invalid
        # https://stackoverflow.com/questions/13062334/polygon-intersection-error-in-shapely-shapely-geos-topologicalerror-the-opera
        
        if not(outerPoly.is_valid):
            
            outerPoly = outerPoly.buffer(0)
        
        # range() excludes its end value, so use upper + 1 to include the last row

        for i in range(lower, upper + 1):
        
            innerID = segs.at[i,'id']
            
            innerPoly = segs.at[i,'geometry']
            
            # Use buffer trick if polygon is invalid
            
            if not(innerPoly.is_valid):
            
                innerPoly = innerPoly.buffer(0)
                        
            if outerPoly.intersects(innerPoly): 
                    
                validIntersection = isIntersectionValid(outerPoly, outerInd, innerPoly, i)

                if validIntersection:
                        
                    neighbours.append(innerID)
                        
        return neighbours

    # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

    segs['neighbours'] = [x for x in starmap(getNeighbours, list(zip(segs['index'],segs['geometry'])))]

    return segs

In [None]:
findNeighbours(planck_df, junctionpoints)

In [None]:
planck_df

***(2) Checking if a junction is incorrectly assumed to lie in the two segments' intersection***

In [None]:
intersection = p1.intersection(p2)

In [None]:
junctions_in_intersection = junctionpoints[lambda x: x.within(intersection)]

In [None]:
junctions_in_intersection

In [None]:
junctions_in_intersection.empty

No junctions discovered in this intersection, so that can't be it.

***(3) The neighbour_param (number of rows that should be considered above and below each data point when checking for neighbouring segments) is too small.***

Testing by removing the neighbour_param, i.e. searching the entire data frame for potential neighbours.
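
A minimal sketch of what removing the parameter changes, assuming the window is applied to row positions as in the commented-out bounds of `getNeighbours` (the function name `candidate_rows` and the row numbers are hypothetical):

```python
def candidate_rows(outer_ind, n_rows, neighbour_param=None):
    """Return the row positions that are checked as potential neighbours of row outer_ind.

    With neighbour_param=None the whole data frame is searched.
    """
    if neighbour_param is None:
        lower, upper = 0, n_rows - 1
    else:
        lower = max(outer_ind - neighbour_param, 0)
        upper = min(outer_ind + neighbour_param, n_rows - 1)
    # range() excludes its end value, hence upper + 1
    return range(lower, upper + 1)


# A segment in row 3 whose true neighbour sits in row 950 of 1000 rows:
windowed = candidate_rows(3, 1000, neighbour_param=50)
full = candidate_rows(3, 1000)
print(950 in windowed, 950 in full)  # False True
```

If rows that are close in space are not guaranteed to be close in the data frame, a fixed window can miss neighbours no matter how the geometric checks behave.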

---------------------------------------------------------------------------------------------------------------------

Testing a function for path concatenation relative to the current working directory (`os.path.abspath('')` resolves to the working directory, which in a notebook is usually the notebook's own folder).

In [None]:
def getSubDirPath (file_, subdir1, subdir2):

    # Concatenate path using os library so system can tell which part of the
    # path is a directory and which is a file name.

    curr_dir = os.path.abspath('')

    file_path = os.path.join(curr_dir, subdir1, subdir2, file_)

    return file_path
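
For comparison, a minimal sketch of the same helper with `pathlib`. The name `get_subdir_path` and the optional `base_dir` parameter are assumptions for illustration, not part of the pipeline.

```python
from pathlib import Path


def get_subdir_path(file_, subdir1, subdir2, base_dir=None):
    """Join base_dir/subdir1/subdir2/file_; base_dir defaults to the current working directory."""
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    return base / subdir1 / subdir2 / file_


path = get_subdir_path("hannover_segments_buffer=1", "segments", "pickled_data")
print(path.name)          # hannover_segments_buffer=1
print(path.parent.name)   # pickled_data
```

In a plain script (not a notebook), `Path(__file__).resolve().parent` can be passed as `base_dir` to anchor the path at the script's folder instead of the working directory.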

In [None]:
file_path = getSubDirPath("hannover_segments_buffer=1", "segments", "pickled_data")

In [None]:
file_path

In [None]:
import pandas as pd

In [None]:
hannover_segments = pd.read_pickle(file_path)

In [None]:
hannover_segments['id']

### Testing multiprocessing etc - optimization of neighbour search (26./27.04.)

(1) Load oddball pickle for Wedding, our test data set

In [None]:
import pandas as pd

In [None]:
import os

In [None]:
import sys

In [None]:
from multiprocessing import set_start_method
set_start_method("spawn")

In [None]:
wedding_oddballs = pd.read_pickle("oddball_pickle")

In [None]:
region = "wedding"

(2) Read the respective junctions data set

In [None]:
def getSubDirPath (file_, subdir1, subdir2):

    # Concatenate path using os library so system can tell which part of the
    # path is a directory and which is a file name.

    curr_dir = os.path.abspath('')

    file_path = os.path.join(curr_dir, subdir1, subdir2, file_)

    return file_path

In [None]:
subdir_path = getSubDirPath(f"{region}_junctions_for_segs.csv", "junctions","csv_data")

In [None]:
subdir_path

In [None]:
try:
    junctionsdf = pd.read_csv(subdir_path)
except FileNotFoundError: 
    print("Junctions file wasn't found! Please execute OSM_jcts.py for this region to generate it.")
    sys.exit()

Grab the larger junctions (>= 2 larger highways intersecting)

In [None]:
larger_jcts = junctionsdf[junctionsdf['junction'] == 'large_junction']

larger_jctids = larger_jcts['id'].values 

(3) Functionality for determining whether two segments are neighbours, i.e., they share a node at either end that is not a junction (plainly speaking, there's an odd break where there shouldn't be one)

In [None]:
def sharedNonJunctionNode(outerNodes, innerNodes):

    # Scenario 1: the first node belonging to the outer segment is not a junction.
    #             Check if the inner segment contains this node too (at one of its ends).

    if outerNodes[0] not in larger_jctids:

        if (innerNodes[0] == outerNodes[0] or innerNodes[-1] == outerNodes[0]):

            return True

    # Scenario 2: the last node belonging to the outer segment is not a junction.
    #             Check if the inner segment contains this node too (at one of its ends).

    if outerNodes[-1] not in larger_jctids:

        if (innerNodes[0] == outerNodes[-1] or innerNodes[-1] == outerNodes[-1]):

            return True

    # Neither end of the outer segment is a shared non-junction node.
    # (Always return a bool; the function previously returned None on some paths.)

    return False
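
To make the criterion concrete, here is a small self-contained sketch with toy node ids. The function `shares_non_junction_end`, the node ids, and the junction set are made up for illustration; unlike the cell above, the junction ids are passed in explicitly instead of being read from the global `larger_jctids`.

```python
def shares_non_junction_end(outer_nodes, inner_nodes, junction_ids):
    """True if an end node of outer_nodes is also an end node of inner_nodes and is not a junction."""
    inner_ends = {inner_nodes[0], inner_nodes[-1]}
    outer_ends = (outer_nodes[0], outer_nodes[-1])
    return any(node in inner_ends and node not in junction_ids for node in outer_ends)


junctions = {100}

seg_a = [1, 2, 3]      # ends at node 3
seg_b = [3, 4, 100]    # starts at node 3 (not a junction): an odd break between a and b
seg_c = [100, 5, 6]    # touches seg_b only at junction 100

print(shares_non_junction_end(seg_a, seg_b, junctions))  # True
print(shares_non_junction_end(seg_b, seg_c, junctions))  # False
```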

In [None]:
def getNeighbours(outerNodes):

    neighbours = []

    for index, row in wedding_oddballs.iterrows(): 

        common_nodes = set(outerNodes).intersection(set(row['segment_nodes_ids']))
        
        no_jcts = [x for x in common_nodes if x not in larger_jctids] # super pythonic list comprehension
        
        if no_jcts: # using the implicit booleanness of a list is quite pythonic

            neighbours.append(index)

    return neighbours

In [None]:
def getNeighbours2(outerNodes):

    neighbours = []
    
    mylambda = lambda x: sharedNonJunctionNode(outerNodes, x)
    
    target = wedding_oddballs.loc[wedding_oddballs['segment_nodes_ids'].apply(mylambda)]

    neighbours = target.index.tolist()
    
    return neighbours

In [None]:
wedding_oddballs = wedding_oddballs.dropna(subset=['segment_nodes_ids'])

In [None]:
import time

In [None]:
start_time = time.time()
wedding_oddballs['neighbours'] = wedding_oddballs['segment_nodes_ids'].map(getNeighbours)
print("--- %s seconds ---" % (time.time() - start_time))

In [None]:
start_time = time.time()
wedding_oddballs['neighbours'] = wedding_oddballs['segment_nodes_ids'].map(getNeighbours2)
print("--- %s seconds ---" % (time.time() - start_time))

In [None]:
import pathos
import numpy as np

In [None]:
n_cores = 4 # is this correct?

# split the data frame into 4 chunks - what is the return value? A list containing the four chunks?
# Yes, it's a list with four elements (the four df chunks).

oddballs_split = np.array_split(wedding_oddballs, n_cores)

# Map each of the four data set chunks onto the only column we're interested in

outer_nodes_chunks = list(map(lambda x : x['segment_nodes_ids'], oddballs_split))

# To match up, put four copies of the unfoldedOddballs df into a list:

dfs = [wedding_oddballs, wedding_oddballs, wedding_oddballs, wedding_oddballs]

_pp = pathos.pools._ProcessPool(n_cores)

res = _pp.starmap(getNeighbours, zip(outer_nodes_chunks, dfs))

# What is the type of res ??? 

In [None]:
type(outer_nodes_chunks)

In [None]:
list(zip(outer_nodes_chunks, dfs))

In [None]:
len(outer_nodes_chunks)

In [None]:
outer_nodes_chunks[0]

In [None]:
from multiprocessing import Pool, get_context, Manager
import numpy as np

with get_context("spawn").Pool(4) as pool:
    
    mgr = Manager()
    ns = mgr.Namespace()
    ns.df = wedding_oddballs
    
    # split the data frame into 4 chunks - what is the return value? A list containing the four chunks?
    # Yes, it's a list with four elements (the four df chunks).

    oddballs_split = np.array_split(wedding_oddballs, 4)

    # Map each of the four data set chunks onto the only column we're interested in

    outer_nodes_chunks = list(map(lambda x : x['segment_nodes_ids'], oddballs_split))

    ns_arg = [ns, ns, ns, ns]

    args = list(zip(outer_nodes_chunks, ns_arg)) #list of args - df is the same object in each tuple

    start_time = time.time()
    
    res = pool.map(getNeighbours, args) #func is some arbitrary function
    pool.close()
    pool.join()

    print("--- %s seconds ---" % (time.time() - start_time))

## Edit 28/04: following best practices, let's try to optimize without using parallelism.

**(1) Import libraries**

In [None]:
import pandas as pd
import os
import sys
import time
import numpy as np

**(2) Import data (wedding segments & junctions)**

In [None]:
wedding_oddballs = pd.read_pickle("oddball_pickle")

In [None]:
region = "wedding"

In [None]:
def getSubDirPath (file_, subdir1, subdir2):

    # Concatenate path using os library so system can tell which part of the
    # path is a directory and which is a file name.

    curr_dir = os.path.abspath('')

    file_path = os.path.join(curr_dir, subdir1, subdir2, file_)

    return file_path

In [None]:
subdir_path = getSubDirPath(f"{region}_junctions_for_segs.csv", "junctions","csv_data")

In [None]:
try:
    junctionsdf = pd.read_csv(subdir_path)
except FileNotFoundError: 
    print("Junctions file wasn't found! Please execute OSM_jcts.py for this region to generate it.")
    sys.exit()

Grab the larger junctions (>= 2 larger highways intersecting)

In [None]:
larger_jcts = junctionsdf[junctionsdf['junction'] == 'large_junction']

larger_jctids = larger_jcts['id'].values 

**(3) Functionality for determining whether two segments are neighbours, i.e., they share a node at either end that is not a junction (plainly speaking, there's an odd break where there shouldn't be one)**

In [None]:
def getNeighbours(outerNodes, outerId):

    outer_node_set = set(outerNodes)

    # Nodes that each segment has in common with the outer segment
    common_nodes = wedding_oddballs['segment_nodes_ids'].map(lambda innerNodes: set(innerNodes).intersection(outer_node_set))

    # Drop shared nodes that are large junctions
    common_nodes_nojcts = common_nodes.map(lambda cns: [x for x in cns if x not in larger_jctids])

    # Collect index labels (not positions), so the result stays correct if the index has gaps
    neighbours = [idx for idx, cns in common_nodes_nojcts.items() if cns]

    neighbours_without_self = [x for x in neighbours if x != outerId]

    return neighbours_without_self

In [None]:
from itertools import starmap

In [None]:
start_time = time.time()
wedding_oddballs['neighbours'] = [x for x in starmap(getNeighbours, zip(wedding_oddballs['segment_nodes_ids'], wedding_oddballs.index))]
print("--- %s seconds ---" % (time.time() - start_time))
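
One way to avoid comparing every segment with every other segment is to invert the lookup: first map each non-junction node to the segments that contain it, then read each segment's neighbours off that map. The sketch below uses plain dictionaries and toy data; `find_neighbours`, `segment_nodes`, and `junction_ids` are hypothetical names, and this is an alternative technique, not the approach timed above.

```python
from collections import defaultdict


def find_neighbours(segment_nodes, junction_ids):
    """Map each segment id to the sorted ids of segments sharing a non-junction node with it.

    segment_nodes: dict of segment id -> list of node ids
    junction_ids:  set of node ids that count as (large) junctions
    """
    # Step 1: inverted index, node id -> ids of the segments containing that node
    segments_by_node = defaultdict(set)
    for seg_id, nodes in segment_nodes.items():
        for node in nodes:
            if node not in junction_ids:
                segments_by_node[node].add(seg_id)

    # Step 2: a segment's neighbours are all other segments listed under any of its nodes
    neighbours = {}
    for seg_id, nodes in segment_nodes.items():
        found = set()
        for node in nodes:
            found |= segments_by_node.get(node, set())
        found.discard(seg_id)
        neighbours[seg_id] = sorted(found)
    return neighbours


segments = {0: [1, 2, 3], 1: [3, 4, 100], 2: [100, 5, 6], 3: [7, 8]}
print(find_neighbours(segments, junction_ids={100}))
# {0: [1], 1: [0], 2: [], 3: []}
```

Passing the junction ids as a `set` also matters on its own: a membership test against a NumPy array scans the whole array, whereas a set lookup takes constant time on average.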

In [None]:
res

In [None]:
print(wedding_oddballs['neighbours'])

In [None]:
type(res[0][0])

### Testing df index resetting

In [None]:
import pandas as pd

In [None]:
df = pd.DataFrame({'poly_lats': [1, 4, 7, 10],
                   'poly_lons': [2012, 2014, 2013, 2014],
                   'highwaynames': [55, 40, 84, 31]})

In [None]:
df.set_index('poly_lats', inplace=True)  # the df has no 'month' column, so index on an existing one

In [None]:
df

In [None]:
df.index

In [None]:
df.reset_index(inplace = True, drop = False)

In [None]:
df

### Debugging the weirdest error ever

In [None]:
from shapely.geometry.polygon import Polygon

In [None]:
import folium

In [None]:
import geopandas as gpd

In [None]:
lats_1 = [51.33285120366603, 51.33265043471369, 51.33243619076528, 51.33263695878674, 51.33285120366603]

In [None]:
lons_1 = [12.371268724808056, 12.370926701682338, 12.371247275311967, 12.371589298228095, 12.371268724808056]

In [None]:
lats_2 = [51.332555164489754, 51.332493574527625, 51.332334405639976, 51.332170897230306, 51.33209882993941, 51.33216041938109, 51.33231958766371, 51.33248309659383, 51.332555164489754]

In [None]:
lons_2 = [12.37131869277576, 12.371057651418353, 12.370942617783772, 12.371040975825904, 12.371295107369345, 12.371556146791685, 12.371671182107923, 12.371572826000861, 12.37131869277576]

In [None]:
myMap = folium.Map(location=[51.3403333, 12.37475], zoom_start=15, tiles='cartodbpositron')

In [None]:
myMap

In [None]:
poly_1 = Polygon(zip(lons_1, lats_1))

In [None]:
style = {'fillColor': '#ff1493', 'color': '#F5FFFA'}  # 'color' is the Leaflet path option for the outline colour

In [None]:
poly1_geoDf = gpd.GeoDataFrame(index=[0], crs="EPSG:4326", geometry=[poly_1])
        
folium.GeoJson(poly1_geoDf, style_function=lambda x: style).add_to(myMap)

In [None]:
poly_2 = Polygon(zip(lons_2, lats_2))

In [None]:
poly2_geoDf = gpd.GeoDataFrame(index=[0], crs="EPSG:4326", geometry=[poly_2])
        
folium.GeoJson(poly2_geoDf, style_function=lambda x: style).add_to(myMap)

In [None]:
def largeIntersection(poly1, poly2):
    return poly1.intersects(poly2) and ((poly1.intersection(poly2).area/poly1.area)*100) > 8

In [None]:
largeIntersection(poly_2,poly_1)

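Incredibly, neighbour discovery is asymmetrical in cases where polygons are of different sizes: `largeIntersection` divides the intersection area by `poly1.area` only, so swapping the arguments changes the percentage that is compared against the threshold of 8.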
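
A minimal sketch of this effect, with axis-aligned rectangles standing in for shapely polygons. The rectangle helpers and the order-independent variant `large_overlap_symmetric`, which divides by the smaller of the two areas, are assumptions for illustration, not the pipeline's fix.

```python
def area(rect):
    """Area of a rectangle given as (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = rect
    return (xmax - xmin) * (ymax - ymin)


def intersection_area(a, b):
    """Area of the overlap of two axis-aligned rectangles (0 if they do not overlap)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return width * height if width > 0 and height > 0 else 0


def large_overlap(a, b, threshold=8):
    """Same rule as largeIntersection: overlap as a percentage of the FIRST shape's area."""
    return intersection_area(a, b) / area(a) * 100 > threshold


def large_overlap_symmetric(a, b, threshold=8):
    """Order-independent variant: overlap as a percentage of the smaller shape's area."""
    return intersection_area(a, b) / min(area(a), area(b)) * 100 > threshold


small = (0, 0, 2, 2)     # area 4
large = (1, 1, 11, 11)   # area 100, overlaps `small` in a 1 x 1 square

print(large_overlap(small, large))   # True:  1 / 4   = 25 %
print(large_overlap(large, small))   # False: 1 / 100 = 1 %
print(large_overlap_symmetric(small, large), large_overlap_symmetric(large, small))  # True True
```

Dividing by the smaller area is only one possible choice; dividing by the larger area or by the area of the union would also be symmetric but would flag fewer pairs.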

### 21/05 Testing dealing with missing highwaynames

In [1]:
import os

In [2]:
def getSubDirPath (file_, subdir1, subdir2):

    # Concatenate path using os library so system can tell which part of the
    # path is a directory and which is a file name.

    curr_dir = os.path.abspath('')

    file_path = os.path.join(curr_dir, subdir1, subdir2, file_)

    return file_path

In [3]:
subdir_path = getSubDirPath("bern_segments", "segments","pickled_data")

In [4]:
subdir_path

'/Users/theresatratzmuller/Library/Mobile Documents/com~apple~CloudDocs/Code/SimRa/Analyze_Pipeline/PyPipeline_/segments/pickled_data/bern_segments'

In [5]:
import pandas as pd

In [6]:
bern_segments = pd.read_pickle(subdir_path)

In [7]:
bern_segments

Unnamed: 0,neighbour_cluster,id,highwayname,highwaytype,highwaylanes,lanes:backward,segment_nodes_ids,seg_length,poly_geometry,poly_vertices_lats,poly_vertices_lons
0,0.0,"[1, 313, 958, 1490, 1901, 3229, 3238, 3239, 32...","Schützenmattstrasse, Lorrainebrücke, Lorraineb...","primary, secondary, secondary, primary, second...","2, 4, 3, 2, 2, unknown, 1, 1, 1, 1, 1, 1","unknown, 2, 1, unknown, 1, unknown, unknown, u...","[338958784, 3811147154, 2049637628, 5564049529...",254.330367,"POLYGON ((46.95337456559583 7.44015896270303, ...","[46.95337456559583, 46.95321863592993, 46.9531...","[7.4401589627030305, 7.4396840953471095, 7.439..."
1,0.0,"[1, 313, 958, 1490, 1901, 3229, 3238, 3239, 32...","Schützenmattstrasse, Lorrainebrücke, Lorraineb...","primary, secondary, secondary, primary, second...","2, 4, 3, 2, 2, unknown, 1, 1, 1, 1, 1, 1","unknown, 2, 1, unknown, 1, unknown, unknown, u...","[338958784, 3811147154, 2049637628, 5564049529...",254.330367,"POLYGON ((46.95337456559583 7.44015896270303, ...","[46.95337456559583, 46.95321863592993, 46.9531...","[7.4401589627030305, 7.4396840953471095, 7.439..."
2,1.0,"[10, 310, 311, 522, 1697, 3172, 3173, 3174, 31...","Marzilistrasse, Sulgeneckstrasse, Sandrainstra...","residential, residential, residential, residen...","2, 1, 2, unknown, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1...","unknown, unknown, unknown, unknown, unknown, u...","[564770755, 4663118624, 2860577434, 8563870939...",90.306859,"POLYGON ((46.94083638700465 7.440002127689091,...","[46.94083638700465, 46.94082081543674, 46.9407...","[7.440002127689091, 7.439466095667876, 7.43947..."
3,2.0,"[11, 1708]","Marzilistrasse, Marzilistrasse","residential, residential","2, 1","unknown, unknown","[564770865, 1015722344, 5071143424, 5071143424...",42.064801,"POLYGON ((46.94207754841979 7.442953375904438,...","[46.94207754841979, 46.9422683857725, 46.94226...","[7.4429533759044375, 7.443111569905079, 7.4431..."
4,3.0,"[12, 608, 1118]","Wasserwerkgasse, Wasserwerkgasse, Wasserwerkgasse","residential, residential, residential","unknown, unknown, unknown","unknown, unknown, unknown","[3920905, 1311078013, 3920905, 2896744316, 625...",12.908663,"POLYGON ((46.94821498446817 7.457734462788174,...","[46.94821498446817, 46.94826636884634, 46.9481...","[7.457734462788174, 7.457631067304516, 7.45758..."
...,...,...,...,...,...,...,...,...,...,...,...
2125,2128.0,[3290],unknown,pedestrian,unknown,unknown,"[7788528441, 7788528444]",2.630357,"POLYGON ((46.94447194674717 7.449176561304245,...","[46.94447194674717, 46.94436320741938, 46.9443...","[7.449176561304245, 7.449157804115087, 7.44930..."
2126,2129.0,[3291],unknown,pedestrian,unknown,unknown,"[7788528444, 7788528442]",4.953128,"POLYGON ((46.94446393619113 7.449112476473884,...","[46.944463936191134, 46.94435513999867, 46.944...","[7.449112476473884, 7.449093917062028, 7.44926..."
2127,2130.0,[3292],unknown,pedestrian,unknown,unknown,"[7788528442, 7788528439]",2.622865,"POLYGON ((46.94445974411904 7.449078640241551,...","[46.944459744119044, 46.94435099071409, 46.944...","[7.449078640241551, 7.449059932126568, 7.44920..."
2128,2131.0,[3293],unknown,cycleway,unknown,unknown,"[7842535098, 7842535097]",51.914426,"POLYGON ((46.95365931297508 7.423715375049648,...","[46.95365931297508, 46.95367731008164, 46.9532...","[7.423715375049648, 7.423704272948659, 7.42335..."


In [8]:
highways_no_names = bern_segments[bern_segments['highwayname'] == 'unknown'].copy()

In [9]:
highways_no_names

Unnamed: 0,neighbour_cluster,id,highwayname,highwaytype,highwaylanes,lanes:backward,segment_nodes_ids,seg_length,poly_geometry,poly_vertices_lats,poly_vertices_lons
362,366.0,[1957],unknown,pedestrian,unknown,unknown,"[495317262, 8538773417]",2.102635,"POLYGON ((46.94345671021104 7.450378763126369,...","[46.943456710211045, 46.94356234408482, 46.943...","[7.450378763126369, 7.450306023136671, 7.45023..."
363,367.0,[1966],unknown,pedestrian,unknown,unknown,"[2512585469, 16268104]",6.899291,"POLYGON ((46.94386955184175 7.450554789706517,...","[46.94386955184175, 46.94394897047203, 46.9437...","[7.4505547897065165, 7.450445391915476, 7.4504..."
376,380.0,[2319],unknown,cycleway,unknown,unknown,"[613042472, 613042468]",20.740621,"POLYGON ((46.96050869691224 7.460956730263994,...","[46.96050869691224, 46.96051458955345, 46.9606...","[7.460956730263994, 7.460975835492043, 7.46077..."
379,383.0,[2356],unknown,cycleway,unknown,unknown,"[6539506029, 6143536327]",4.302865,"POLYGON ((46.95818862994864 7.438664477553824,...","[46.95818862994864, 46.95820189435418, 46.9582...","[7.438664477553824, 7.4386771172124595, 7.4386..."
382,386.0,[2842],unknown,cycleway,unknown,unknown,"[5515123931, 7849477505]",4.382606,"POLYGON ((46.95352730125781 7.424030216388764,...","[46.953527301257814, 46.953538942004585, 46.95...","[7.424030216388764, 7.424044750412956, 7.42397..."
...,...,...,...,...,...,...,...,...,...,...,...
2125,2128.0,[3290],unknown,pedestrian,unknown,unknown,"[7788528441, 7788528444]",2.630357,"POLYGON ((46.94447194674717 7.449176561304245,...","[46.94447194674717, 46.94436320741938, 46.9443...","[7.449176561304245, 7.449157804115087, 7.44930..."
2126,2129.0,[3291],unknown,pedestrian,unknown,unknown,"[7788528444, 7788528442]",4.953128,"POLYGON ((46.94446393619113 7.449112476473884,...","[46.944463936191134, 46.94435513999867, 46.944...","[7.449112476473884, 7.449093917062028, 7.44926..."
2127,2130.0,[3292],unknown,pedestrian,unknown,unknown,"[7788528442, 7788528439]",2.622865,"POLYGON ((46.94445974411904 7.449078640241551,...","[46.944459744119044, 46.94435099071409, 46.944...","[7.449078640241551, 7.449059932126568, 7.44920..."
2128,2131.0,[3293],unknown,cycleway,unknown,unknown,"[7842535098, 7842535097]",51.914426,"POLYGON ((46.95365931297508 7.423715375049648,...","[46.95365931297508, 46.95367731008164, 46.9532...","[7.423715375049648, 7.423704272948659, 7.42335..."


In [None]:
highways_with_names = bern_segments[bern_segments['highwayname'] != 'unknown'].copy()