# Apache Sedona and OSM

In this tutorial, we use Apache Sedona to read geospatial data provided by OSM (OpenStreetMap).
OSM data is distributed in several formats (e.g. PBF, shapefile). Here we use the `PBF format` ([Protocolbuffer Binary Format](https://wiki.openstreetmap.org/wiki/PBF_Format)).

## 1. Get the sample data

The sample data used in this tutorial can be downloaded from this page:
https://download.geofabrik.de/europe/france.html . I use the `Ile-de-France` extract (`ile-de-france-latest.osm.pbf`).


In [1]:
from sedona.spark import *
from pathlib import Path
import json

from ipyleaflet import Map, basemaps, basemap_to_tiles, MarkerCluster, Marker, AwesomeIcon
from ipywidgets import Layout
import numpy as np

Skipping SedonaKepler import, verify if keplergl is installed
Skipping SedonaPyDeck import, verify if pydeck is installed


In [2]:
# build a sedona session (Python package sedona = 1.5.1; note that the jars below are pinned to 1.4.1)
config = SedonaContext.builder() \
    .appName("Sedona with pyspark") \
    .master("local[*]") \
    .config("spark.driver.memory", "6g") \
    .config('spark.jars.packages',
            'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.11,' 
            'org.apache.sedona:sedona-spark-shaded-3.0_2.12:1.4.1,' 
            'org.datasyslab:geotools-wrapper:1.4.0-28.2'). \
     getOrCreate()

# create a sedona context
sedona = SedonaContext.create(config)

24/04/17 13:42:27 WARN Utils: Your hostname, pengfei-Virtual-Machine resolves to a loopback address: 127.0.1.1; using 10.50.2.80 instead (on interface eth0)
24/04/17 13:42:27 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
:: loading settings :: url = jar:file:/home/pengfei/opt/spark/spark-3.3.0/jars/ivy-2.5.0.jar!/org/apache/ivy/core/settings/ivysettings.xml


Ivy Default Cache set to: /home/pengfei/.ivy2/cache
The jars for the packages stored in: /home/pengfei/.ivy2/jars
com.acervera.osm4scala#osm4scala-spark3-shaded_2.12 added as a dependency
org.apache.sedona#sedona-spark-shaded-3.0_2.12 added as a dependency
org.datasyslab#geotools-wrapper added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-06192f74-e84f-4ac2-b30e-924c47672d80;1.0
	confs: [default]
	found com.acervera.osm4scala#osm4scala-spark3-shaded_2.12;1.0.11 in central
	found org.apache.sedona#sedona-spark-shaded-3.0_2.12;1.4.1 in central
	found org.datasyslab#geotools-wrapper;1.4.0-28.2 in central
:: resolution report :: resolve 287ms :: artifacts dl 17ms
	:: modules in use:
	com.acervera.osm4scala#osm4scala-spark3-shaded_2.12;1.0.11 from central in [default]
	org.apache.sedona#sedona-spark-shaded-3.0_2.12;1.4.1 from central in [default]
	org.datasyslab#geotools-wrapper;1.4.0-28.2 from central in [default]
	---------------------------------------

24/04/17 13:42:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).


24/04/17 13:42:30 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.


                                                                                

## 2. Read the PBF format with Sedona(Spark)

By default, Sedona cannot read PBF files directly. However, there is a `spark polyglot connector` called [osm4scala](https://simplexspatial.github.io/osm4scala/) which adds this capability. You can also visit its GitHub page [here](https://github.com/simplexspatial/osm4scala).

Setting it up is quite simple:
- online mode: when creating the Spark session, add `.config('spark.jars.packages', 'com.acervera.osm4scala:osm4scala-spark3-shaded_2.12:1.0.11')`
- offline mode: download the jar file first, then add `.config('spark.jars', jar_path)` when creating the Spark session, where `jar_path` is the path of the downloaded jar

> You can find more information about the jar file [here](https://simplexspatial.github.io/osm4scala/docs/spark-connector)
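
For the offline mode, `spark.jars` takes a comma-separated list of jar paths. The helper below is only a sketch of one way to build that value from a local folder; the function name `build_spark_jars_value` and the folder layout are assumptions made for illustration, they are not part of Sedona or osm4scala.

```python
from pathlib import Path


def build_spark_jars_value(jar_dir: str) -> str:
    """Return a comma-separated list of every .jar file found in jar_dir."""
    jar_paths = sorted(Path(jar_dir).glob("*.jar"))
    if not jar_paths:
        raise FileNotFoundError(f"No .jar files found in {jar_dir}")
    # spark.jars expects the paths joined with commas
    return ",".join(str(p.resolve()) for p in jar_paths)
```

The returned string could then be passed to the session builder as `.config('spark.jars', build_spark_jars_value(jar_dir))`, with `jar_dir` pointing to the folder where you stored the downloaded jars.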



In [3]:
homePath = "/home/pengfei/data_set/geo_spatial"
pbfFilePath= f"{homePath}/ile-de-france-latest.osm.pbf"
parquetFilePath=f"{homePath}/ile-de-france-geo-parquet"


In [4]:
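def read_osm_data(usePbf:bool):
    """Load the OSM data from the raw .pbf file (usePbf=True) or from the parquet file (usePbf=False)."""
    if usePbf:
        # the "osm.pbf" format is provided by the osm4scala connector
        df = sedona.read.format("osm.pbf").load(pbfFilePath)
    else:
        # read raw_df from the parquet file
        df = sedona.read.parquet(parquetFilePath)
    return df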
        

## 3. Explore the OSM dataset

### 3.1 Understand the basic OSM data structure

OpenStreetMap uses a `topological data structure`, with `four core elements` (aka data primitives):

- **Nodes**: are points with a geographic position, stored as coordinates (pairs of a latitude and a longitude) according to `WGS 84`. Outside their usage in ways, they are used to represent map features without a size, such as `points of interest` or mountain peaks.
- **Ways**: are ordered lists of `nodes`, representing a `polyline, or possibly a polygon` if they form a closed loop. They are used both for representing linear features such as streets and rivers, and areas, like forests, parks, parking areas and lakes.
- **Relations**: are `ordered lists of nodes, ways and relations` (together called "members"), where each member can optionally have a "role" (a string). Relations are used for representing the relationship of existing nodes and ways. Examples include turn restrictions on roads, routes that span several existing ways (for instance, a long-distance motorway), and areas with holes.
- **Tags**: are `key-value pairs (both arbitrary strings)`. They are used to store metadata about the map objects (such as their type, their name and their physical properties). `Tags are not freestanding, but are always attached to an object: to a node, a way or a relation`. A recommended ontology of map features (the meaning of tags) is maintained on a wiki. New tagging schemes can be proposed through a vote on a written proposal in the OpenStreetMap wiki, but there is no requirement to follow this process. There are over 89 million different kinds of tags in use as of June 2017.
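
To make the four primitives more concrete, here is a minimal sketch that models them with plain Python dataclasses. The class and field names (`Node`, `Way`, `Member`, `Relation`, `node_ids`, ...) and the sample values are chosen for illustration only; they are not the OSM or osm4scala schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    id: int
    latitude: float
    longitude: float
    # tags are never freestanding: they live on the element they describe
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Way:
    id: int
    node_ids: List[int]  # ordered list of node ids
    tags: Dict[str, str] = field(default_factory=dict)

    def is_closed(self) -> bool:
        # a way whose first and last node are identical forms a closed loop
        return len(self.node_ids) > 2 and self.node_ids[0] == self.node_ids[-1]


@dataclass
class Member:
    ref_id: int
    member_type: str  # "node", "way" or "relation"
    role: str = ""    # optional role of the member inside the relation


@dataclass
class Relation:
    id: int
    members: List[Member]
    tags: Dict[str, str] = field(default_factory=dict)


# hypothetical sample elements
peak = Node(1, 45.8326, 6.8652, {"natural": "peak"})
park = Way(10, [1, 2, 3, 1], {"leisure": "park"})
street = Way(11, [4, 5, 6], {"highway": "residential"})
bus_route = Relation(100, [Member(11, "way", "forward")], {"type": "route"})

print(park.is_closed(), street.is_closed())
```

A closed way is only possibly a polygon: whether it represents an area or a circular line (a roundabout, for instance) depends on its tags.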

![OpenStreetMap_data_primitives_in_iD.png](../images/OpenStreetMap_data_primitives_in_iD.png)

In [5]:
raw_df = read_osm_data(usePbf=False)
raw_df.show()


+------+----+------------------+------------------+-----+---------+--------------------+--------------------+
|    id|type|          latitude|         longitude|nodes|relations|                tags|                info|
+------+----+------------------+------------------+-----+---------+--------------------+--------------------+
|122626|   0|49.115966300000004|         2.5549119|   []|       []|                  {}|{3, 2020-05-10 11...|
|122627|   0|49.110294100000004|         2.5521725|   []|       []|                  {}|{4, 2009-02-13 19...|
|122631|   0|        49.0834393|2.5511375000000003|   []|       []|                  {}|{15, 2021-06-30 1...|
|122632|   0|        49.0675225|2.5524679000000003|   []|       []|                  {}|{17, 2019-04-10 1...|
|122633|   0|         49.063616|2.5522412000000005|   []|       []|                  {}|{17, 2009-02-13 1...|
|122634|   0|        49.0597465|2.5509097000000005|   []|       []|                  {}|{2, 2009-02-13 19...|
|122635|  

                                                                                

In [6]:
raw_df.printSchema()

root
 |-- id: long (nullable = true)
 |-- type: byte (nullable = true)
 |-- latitude: double (nullable = true)
 |-- longitude: double (nullable = true)
 |-- nodes: array (nullable = true)
 |    |-- element: long (containsNull = true)
 |-- relations: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- id: long (nullable = true)
 |    |    |-- relationType: byte (nullable = true)
 |    |    |-- role: string (nullable = true)
 |-- tags: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)
 |-- info: struct (nullable = true)
 |    |-- version: integer (nullable = true)
 |    |-- timestamp: timestamp (nullable = true)
 |    |-- changeset: long (nullable = true)
 |    |-- userId: integer (nullable = true)
 |    |-- userName: string (nullable = true)
 |    |-- visible: boolean (nullable = true)


Here we tested the GeoParquet and Parquet formats. The .pbf file is smaller than the Parquet output (293 MB vs 856 MB), which surprised me a little.

> We can't write GeoParquet at this stage, because the dataframe has no geometry column yet.

### 3.2 Filter nodes and ways

As explained before, OSM has `four core elements`. Tags are always attached to another element, so they do not appear as rows of their own. In our dataset, each row therefore has one of three types:
- Type 0: Node
- Type 1: Way
- Type 2: Relation

In this tutorial, we only keep nodes and ways.
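
As a small illustration of the type codes listed above, the sketch below translates them into readable names and counts elements per type. It works on plain Python dictionaries rather than on the Spark dataframe, and the sample rows are hypothetical.

```python
from collections import Counter

# type codes as listed above
OSM_TYPE_NAMES = {0: "node", 1: "way", 2: "relation"}


def type_name(type_code: int) -> str:
    """Translate a numeric type code into a readable element name."""
    return OSM_TYPE_NAMES.get(type_code, "unknown")


# hypothetical rows, reduced to the two columns that matter here
sample_rows = [
    {"id": 1, "type": 0},
    {"id": 2, "type": 0},
    {"id": 3, "type": 1},
    {"id": 4, "type": 2},
]

# count how many elements of each type the sample contains
counts = Counter(type_name(row["type"]) for row in sample_rows)
print(counts)
```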

In [6]:
raw_df.select("type").distinct().show()



+----+
|type|
+----+
|   0|
|   1|
|   2|
+----+


                                                                                

In [7]:
# get all nodes 
node_df = raw_df.where("type = 0")

node_df.show(5)


+------+----+------------------+------------------+-----+---------+----+--------------------+
|    id|type|          latitude|         longitude|nodes|relations|tags|                info|
+------+----+------------------+------------------+-----+---------+----+--------------------+
|122626|   0|49.115966300000004|         2.5549119|   []|       []|  {}|{3, 2020-05-10 11...|
|122627|   0|49.110294100000004|         2.5521725|   []|       []|  {}|{4, 2009-02-13 19...|
|122631|   0|        49.0834393|2.5511375000000003|   []|       []|  {}|{15, 2021-06-30 1...|
|122632|   0|        49.0675225|2.5524679000000003|   []|       []|  {}|{17, 2019-04-10 1...|
|122633|   0|         49.063616|2.5522412000000005|   []|       []|  {}|{17, 2009-02-13 1...|
+------+----+------------------+------------------+-----+---------+----+--------------------+


In [8]:
# keep only the columns we need
node_simple_df = node_df.select("id","latitude", "longitude")
node_simple_df.show(5)

+------+------------------+------------------+
|    id|          latitude|         longitude|
+------+------------------+------------------+
|122626|49.115966300000004|         2.5549119|
|122627|49.110294100000004|         2.5521725|
|122631|        49.0834393|2.5511375000000003|
|122632|        49.0675225|2.5524679000000003|
|122633|         49.063616|2.5522412000000005|
+------+------------------+------------------+


In [9]:
# get all way rows
way_df = raw_df.where("type = 1")

way_df.show(5,truncate=False)



+------+----+--------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------+
|id    |type|latitude|longitude|nodes                                                                                                                                                                                                                                                                                |rel

                                                                                

In [14]:
relation_df = raw_df.where("type = 2")
relation_df.show(5,truncate=False)

+----+----+--------+---------+-----+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Since the latitude and longitude columns of ways do not contain any information, we can remove them. Notice also that each way contains a list of nodes (the values are node ids). Linking these nodes in order rebuilds the way: the first node is the starting point of the way and the last node is its end point. To keep things simple, we join each way with only the first node of its list, so every way is represented on the map by the coordinates of its starting point.
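
The idea of linking node ids back to coordinates can be sketched in plain Python as follows. This is only an illustration on in-memory dictionaries with hypothetical ids and coordinates; the helper names `way_start_point` and `way_polyline` are not part of the tutorial code, which uses a Spark join instead.

```python
from typing import Dict, List, Optional, Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude)


def way_start_point(node_ids: List[int],
                    nodes_by_id: Dict[int, Coordinate]) -> Optional[Coordinate]:
    """Return the coordinates of the first node of a way, or None if unknown."""
    if not node_ids:
        return None
    return nodes_by_id.get(node_ids[0])


def way_polyline(node_ids: List[int],
                 nodes_by_id: Dict[int, Coordinate]) -> List[Coordinate]:
    """Return the coordinates of every known node of a way, in order."""
    # node ids that are missing from the lookup are skipped
    return [nodes_by_id[n] for n in node_ids if n in nodes_by_id]


# hypothetical nodes and one way that references them
nodes_by_id = {
    1: (48.8500, 2.3500),
    2: (48.8510, 2.3520),
    3: (48.8520, 2.3540),
}
way_node_ids = [1, 2, 3, 99]  # node 99 is not in the lookup

print(way_start_point(way_node_ids, nodes_by_id))
print(way_polyline(way_node_ids, nodes_by_id))
```

Using only the start point keeps one row per way, at the cost of losing the shape of the way; the full polyline would be needed to draw it.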

In [10]:

way_simple_df = way_df.drop("id","latitude", "longitude")
way_with_gps_df = way_simple_df.join(
    node_simple_df, way_simple_df.nodes.getItem(0) == node_simple_df.id)

way_trans_df = way_with_gps_df.select("latitude", "longitude", "tags")


In [11]:
way_trans_df.show(5, truncate=False)



+------------------+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|latitude          |longitude         |tags                                                                                                                                                                                                                                                                           |
+------------------+------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|48.74317940000001 |2.3225308         |{name -> Autoroute du Sol

                                                                                

In [12]:
node_trans_df = node_df.select("latitude", "longitude", "tags")

In [13]:
hospital_and_food_df = way_trans_df.union(node_trans_df).\
    where("element_at(tags, 'amenity') in ('hospital', 'clinic','doctors')")
hospital_and_food_df.cache()
hospital_and_food_df.show()

                                                                                

+------------------+------------------+--------------------+
|          latitude|         longitude|                tags|
+------------------+------------------+--------------------+
|  48.8228839999944| 2.397936000000003|{website -> https...|
|49.101812299999956| 2.543690700000001|{name -> Maison d...|
|48.903260199999906| 2.305047599999987|{healthcare:speci...|
|48.885186399999995|2.4011061000000105|{name -> Clinique...|
|48.692460400000265|2.2671637000000016|{amenity -> docto...|
| 48.80802140000021|2.1349204000000053|{amenity -> hospi...|
|  48.8539673000001|         2.3481897|{operator:short -...|
| 48.80471840000002| 2.424477000000009|{website -> https...|
| 48.59296099999903|2.2486759999999957|{name -> Clinique...|
| 48.85146449999998|2.3714151999999813|{website -> http:...|
| 48.82344670000005|2.5382796000000027|{source -> cadast...|
|48.570080700000055| 2.432605999999993|{healthcare:speci...|
| 48.56786900000003| 2.431551999999988|{healthcare:speci...|
|48.764957100000224| 2.3

In [14]:
# filter on the amenity tag first, then keep only the coordinates
hospital_df = hospital_and_food_df.\
    where("element_at(tags, 'amenity') == 'hospital'").\
    select("latitude", "longitude")
hospital_df.createOrReplaceTempView("hospital")

clinic_df = hospital_and_food_df.\
    where("element_at(tags, 'amenity') == 'clinic'").\
    select("latitude", "longitude")
clinic_df.createOrReplaceTempView("clinic")

doctors_df = hospital_and_food_df.\
    where("element_at(tags, 'amenity') == 'doctors'").\
    select("latitude", "longitude")
doctors_df.createOrReplaceTempView("doctors")

In [15]:
hospital_number=hospital_df.count()
print(f"Total hospital number in Ile-de-France: {hospital_number}")



Total hospital number in Ile-de-France: 315


                                                                                

In [16]:
hospital_df.show(5,truncate=False)

+------------------+------------------+
|latitude          |longitude         |
+------------------+------------------+
|48.8228839999944  |2.397936000000003 |
|48.885186399999995|2.4011061000000105|
|48.80802140000021 |2.1349204000000053|
|48.8539673000001  |2.3481897         |
|48.59296099999903 |2.2486759999999957|
+------------------+------------------+


In [17]:
clinic_number=clinic_df.count()
print(f"Total clinic number in Ile-de-France: {clinic_number}")



Total clinic number in Ile-de-France: 219


                                                                                

In [18]:
doctors_number=doctors_df.count()
print(f"Total doctor office number in Ile-de-France: {doctors_number}")



Total doctor office number in Ile-de-France: 1294


                                                                                

In [19]:
icon_hospital = AwesomeIcon(
    name='h-square',
    marker_color='green',
    icon_color='darkgreen'
)

icon_clinic = AwesomeIcon(
    name='hospital-o',
    marker_color='red',
    icon_color='black'
)

icon_doctors = AwesomeIcon(
    name='user-md',
    marker_color='blue',
    icon_color='gray'
)

In [20]:
hospital_pos = tuple([Marker(location=tuple(row), icon=icon_hospital) for row in hospital_df.limit(250).collect()])
clinic_pos  = tuple([Marker(location=tuple(row), icon=icon_clinic ) for row in clinic_df.limit(250).collect()])
doctors_pos  = tuple([Marker(location=tuple(row), icon=icon_doctors ) for row in doctors_df.limit(250).collect()])

marker_hospital = MarkerCluster(markers=hospital_pos)
marker_clinic = MarkerCluster(markers=clinic_pos)
marker_doctors = MarkerCluster(markers=doctors_pos)

latitudes =  np.array([x.location[0] for x in hospital_pos]+[x.location[0] for x in doctors_pos])
longitudes = np.array([x.location[1] for x in hospital_pos]+[x.location[1] for x in doctors_pos])
ce = [latitudes.mean(), longitudes.mean()]

m = Map(
    basemap=basemap_to_tiles(basemaps.OpenStreetMap.Mapnik),
    center=ce,
    layout=Layout(width='50%', height='800px'),
    zoom=7
)

m.add_layer(marker_hospital)
m.add_layer(marker_clinic)
m.add_layer(marker_doctors)

display(m)

                                                                                

Map(center=[48.8168256002, 2.336495916399999], controls=(ZoomControl(options=['position', 'zoom_in_text', 'zoo…

## 4. Find doctors near a hospital

In previous sections, we have created three tables:
- hospital
- clinic
- doctors

Now we want to find the doctors' offices that are close to a hospital.

In [26]:
# You can change table_name to hospital, clinic or doctors
table_name="clinic"
sedona.sql(f"select * from {table_name}").show()

+------------------+------------------+
|          latitude|         longitude|
+------------------+------------------+
|48.903260199999906| 2.305047599999987|
| 48.80471840000002| 2.424477000000009|
| 48.88454769999999|2.1716105999999975|
| 48.91647889999997| 2.303190799999997|
| 48.83635030000024|2.2315512999999956|
|49.040539000000216|2.3091220999999966|
| 48.96485259999991| 2.873218499999991|
| 49.08164119999964| 2.035150200000009|
|49.005708799999944| 1.693478399999992|
| 48.83592900000021| 2.414462699999997|
|49.065623800000104| 2.677090399999996|
| 48.88902659999986|2.2385880999999994|
|48.864705600000086|2.3819127999999945|
|48.861411700001675| 2.508706400000011|
| 49.02780120000006|2.2247925000000004|
|48.910013299999875|         2.4354707|
|48.814878299999954|2.4875772000000014|
| 48.88551719999999|2.2400001000000036|
| 48.77585449999995| 2.294707799999999|
| 48.71357359999996|2.2612773999999956|
+------------------+------------------+


### 4.1 Convert coordinate columns to a geometry column

To compute distances with Sedona (and to be able to store the result as GeoParquet), we need to convert the latitude/longitude columns into a geometry column.

A CRS (coordinate reference system) is identified by its code in the official EPSG database (https://epsg.org/), written in the format EPSG:XXXX. The community tool EPSG.io can help you quickly identify a CRS code. For example, the code of WGS 84 is EPSG:4326.

In [27]:
from pyspark.sql import DataFrame


def geo_df_convertor(source_df:DataFrame,lat_col_name:str, long_col_name:str,  source_epsg_code:str,target_epsg_code:str, target_geo_col_name:str="location"):
    """
    This function takes a dataframe with gps coordinate columns (numeric or string type) and converts them to a geometry column. The returned dataframe can be stored as geoparquet.
    :param source_df: the dataframe which contains the coordinate columns
    :type source_df: DataFrame
    :param long_col_name: longitude column name
    :type long_col_name: str
    :param lat_col_name: latitude column name
    :type lat_col_name: str
    :param source_epsg_code: the CRS code of the input gps coordinates
    :type source_epsg_code: str
    :param target_epsg_code: the CRS code of the output gps coordinates
    :type target_epsg_code: str
    :param target_geo_col_name: name of the generated geometry column
    :type target_geo_col_name: str
    :return: a dataframe with a single geometry column
    :rtype: DataFrame
    """
    # register the source df as a temp view. f"{source_df}" would expand to the string
    # representation of the dataframe, which is not a valid view name, so we use a fixed name
    tmp_view_name = "geo_df_convertor_source"
    source_df.createOrReplaceTempView(tmp_view_name)
    # Note: the point is built as (latitude, longitude), i.e. x = latitude. The axis order expected
    # for EPSG:4326 depends on the Sedona version, so check the documentation of the version you run.
    target_df = sedona.sql(f"""
    SELECT 
    ST_Transform(ST_Point(CAST({lat_col_name} AS Decimal(24,20)), CAST({long_col_name} AS Decimal(24,20))), '{source_epsg_code}', '{target_epsg_code}') AS {target_geo_col_name} from {tmp_view_name}""")
    return target_df

def geo_table_convertor(source_table_name:str,lat_col_name:str, long_col_name:str,  source_epsg_code:str,target_epsg_code:str, target_geo_col_name:str="location"):
    """
    This function takes a spark temp view with gps coordinate columns (numeric or string type) and converts them to a geometry column. The returned dataframe can be stored as geoparquet.
    :param source_table_name: name of an existing temp view which contains the coordinate columns
    :type source_table_name: str
    :param target_geo_col_name: name of the generated geometry column
    :type target_geo_col_name: str
    :param long_col_name: longitude column name
    :type long_col_name: str
    :param lat_col_name: latitude column name
    :type lat_col_name: str
    :param source_epsg_code: the CRS code of the input gps coordinates
    :type source_epsg_code: str
    :param target_epsg_code: the CRS code of the output gps coordinates
    :type target_epsg_code: str
    :return: a dataframe with a single geometry column
    :rtype: DataFrame
    """
    # the temp view must already exist; we only query it here
    target_df = sedona.sql(f"""
    SELECT 
    ST_Transform(ST_Point(CAST({lat_col_name} AS Decimal(24,20)), CAST({long_col_name} AS Decimal(24,20))), '{source_epsg_code}', '{target_epsg_code}') AS {target_geo_col_name} from {source_table_name}""")
    return target_df

In [28]:
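# Set up the EPSG codes. OSM coordinates are in WGS 84 (epsg:4326)
source_epsg_code = "epsg:4326"
# We keep epsg:4326 as the target, because the distance query below uses ST_DistanceSphere on WGS 84 coordinates.
# A Europe-centered projected CRS such as epsg:25832 (https://epsg.io/25832) is a possible alternative.
target_epsg_code = "epsg:4326"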

hospital_geo_table_name = "hospital_geo"
hospital_geo_df = geo_table_convertor("hospital","latitude","longitude",source_epsg_code,target_epsg_code)
hospital_geo_df.cache()
hospital_geo_df.createOrReplaceTempView(hospital_geo_table_name)


print(f"converted hospital count:   {hospital_geo_df.count()}")
sedona.sql(f"select * from {hospital_geo_table_name}").show(5)


                                                                                

converted hospital count:   315
+--------------------+
|            location|
+--------------------+
|POINT (48.8228839...|
|POINT (48.8851863...|
|POINT (48.8080214...|
|POINT (48.8539673...|
|POINT (48.5929609...|
+--------------------+
only showing top 5 rows


In [29]:
clinic_geo_table_name = "clinic_geo"
clinic_geo_df = geo_table_convertor("clinic","latitude","longitude",source_epsg_code,target_epsg_code)
clinic_geo_df.cache()
clinic_geo_df.createOrReplaceTempView(clinic_geo_table_name)

print(f"converted clinic count:   {clinic_geo_df.count()}")
sedona.sql(f"select * from {clinic_geo_table_name}").show(5)



                                                                                

converted clinic count:   219
+--------------------+
|            location|
+--------------------+
|POINT (48.9032601...|
|POINT (48.8047184...|
|POINT (48.8845476...|
|POINT (48.9164788...|
|POINT (48.8363503...|
+--------------------+
only showing top 5 rows


In [30]:
doctors_geo_table_name = "doctors_geo"
doctors_geo_df = geo_table_convertor("doctors","latitude","longitude",source_epsg_code,target_epsg_code)
doctors_geo_df.cache()
doctors_geo_df.createOrReplaceTempView(doctors_geo_table_name)

print(f"converted doctors count:   {doctors_geo_df.count()}")
sedona.sql(f"select * from {doctors_geo_table_name}").show(5)

                                                                                

converted doctors count:   1294
+--------------------+
|            location|
+--------------------+
|POINT (49.1018122...|
|POINT (48.6924604...|
|POINT (48.8234467...|
|POINT (48.5700807...|
|POINT (48.5678690...|
+--------------------+
only showing top 5 rows


In [40]:
# The SQL query below keeps every (doctor, hospital) pair whose distance is at most 100 meters.
# It returns 3 columns: the doctor point (GeoJSON), the hospital point (GeoJSON) and the distance in meters.

doctors_near_hospital_df = sedona.sql(f"""
SELECT
ST_AsGeoJSON(
   ST_Transform(doctors_geo.location,     '{target_epsg_code}', 'epsg:4326')
) doctors_point, 
ST_AsGeoJSON(
   ST_Transform(hospital_geo.location, '{target_epsg_code}', 'epsg:4326')
) hospital_point, 
ST_DistanceSphere(
  doctors_geo.location, hospital_geo.location
) distance_meter
FROM doctors_geo, hospital_geo 
WHERE 
ST_DistanceSphere(doctors_geo.location, hospital_geo.location) <= 100
""").cache()

doctors_near_df = doctors_near_hospital_df.select("doctors_point").distinct()
doctors_near_df.cache()
hospital_near_df = doctors_near_hospital_df.select("hospital_point").distinct()
hospital_near_df.cache()

DataFrame[hospital_point: string]
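
To give an intuition of what the query above computes, here is a minimal pure-Python sketch of a great-circle (haversine) distance followed by the same "pairs within 100 meters" filter. It is not Sedona's implementation: the Earth radius constant is an assumed mean radius, the function names are hypothetical, and the nested loop is only suitable for small lists.

```python
import math
from typing import List, Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude) in degrees

EARTH_RADIUS_M = 6371008.0  # assumed mean Earth radius, in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float,
                radius: float = EARTH_RADIUS_M) -> float:
    """Great-circle distance in meters between two WGS 84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def pairs_within(doctors: List[Coordinate], hospitals: List[Coordinate],
                 max_distance_m: float = 100.0):
    """Return (doctor, hospital, distance) for every pair closer than the threshold."""
    result = []
    for d in doctors:
        for h in hospitals:
            dist = haversine_m(d[0], d[1], h[0], h[1])
            if dist <= max_distance_m:
                result.append((d, h, dist))
    return result


# hypothetical sample points
sample_doctors = [(48.8540, 2.3482)]
sample_hospitals = [(48.8539673, 2.3481897), (48.5929, 2.2486)]

near_pairs = pairs_within(sample_doctors, sample_hospitals)
print(len(near_pairs))
```

The nested loop compares every doctor with every hospital, which is what the comma join in the SQL query expresses; Spark distributes that work instead of running it in a single Python process.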

In [41]:
doctors_near_hospital_df.show()

                                                                                

+--------------------+--------------------+------------------+
|       doctors_point|      hospital_point|    distance_meter|
+--------------------+--------------------+------------------+
|{"type":"Point","...|{"type":"Point","...| 82.32033706113094|
|{"type":"Point","...|{"type":"Point","...| 53.00228892135059|
|{"type":"Point","...|{"type":"Point","...| 71.37191949917427|
|{"type":"Point","...|{"type":"Point","...| 30.11324175582708|
|{"type":"Point","...|{"type":"Point","...| 51.86854940196954|
|{"type":"Point","...|{"type":"Point","...| 40.82116425744196|
|{"type":"Point","...|{"type":"Point","...| 42.30815658687326|
|{"type":"Point","...|{"type":"Point","...|52.627921550669726|
|{"type":"Point","...|{"type":"Point","...|54.564668537257354|
|{"type":"Point","...|{"type":"Point","...| 91.45132268963458|
|{"type":"Point","...|{"type":"Point","...| 69.06425407009839|
|{"type":"Point","...|{"type":"Point","...| 89.08856109179862|
|{"type":"Point","...|{"type":"Point","...| 25.69032804

                                                                                

In [42]:
print(f"doctors number: {doctors_near_df.count()}")
print(f"hospital number: {hospital_near_df.count()}")

                                                                                

doctors number: 26




hospital number: 21


                                                                                

In [43]:
doctors_near_pos = tuple([Marker(location=tuple(json.loads(row["doctors_point"])["coordinates"]), icon=icon_doctors) for row in doctors_near_df.collect()])
hospital_near_pos  = tuple([Marker(location=tuple(json.loads(row["hospital_point"])["coordinates"]), icon=icon_hospital) for row in hospital_near_df.collect()])

doctors_near_marker = MarkerCluster(markers=doctors_near_pos)
hospital_near_marker = MarkerCluster(markers=hospital_near_pos)

latitudes =  np.array([x.location[0] for x in doctors_near_pos]+[x.location[0] for x in hospital_near_pos])
longitudes = np.array([x.location[1] for x in doctors_near_pos]+[x.location[1] for x in hospital_near_pos])

ce = [latitudes.mean(), longitudes.mean()]


doc_hospital_near_map = Map(
    basemap=basemap_to_tiles(basemaps.OpenStreetMap.Mapnik),
    center=ce,
    layout=Layout(width='50%', height='800px'),
    zoom=7
)

doc_hospital_near_map.add_layer(doctors_near_marker)
doc_hospital_near_map.add_layer(hospital_near_marker)

display(doc_hospital_near_map)

                                                                                

Map(center=[48.870568063829765, 2.3099355595744666], controls=(ZoomControl(options=['position', 'zoom_in_text'…