## Display locations of dense [AIS broadcast reports](https://marinecadastre.gov/ais/) at the port of Miami, FL.

Package the application using:

```bash
mvn clean install
```

Start PySpark:

```bash
export PATH=${SPARK_HOME}/bin:${PATH}
export SPARK_LOCAL_IP=localhost
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='lab --ip=0.0.0.0 --allow-root --no-browser --NotebookApp.token=""'
export PACKAGES="com.esri:filegdb:0.12.5"
pyspark\
  --master local[*]\
  --num-executors 1\
  --driver-memory 30G\
  --executor-memory 30G\
  --conf spark.ui.enabled=false\
  --packages ${PACKAGES}\
  --exclude-packages org.scala-lang:scala-reflect
```

Get a reference to the underlying JVM

In [1]:
jvm = spark._jvm

Get a reference to the `FileGDB` Scala object

In [2]:
gdb = jvm.com.esri.gdb.FileGDB

Print the list of tables in the geodatabase:

In [3]:
import os  # needed for os.path.join below

gdb_path = os.path.join('data','Miami.gdb')
tables = gdb.listTables(gdb_path)
for t in tables:
    print(t)

NameIndex(MiamiExtent,9)
NameIndex(Voyage,10)
NameIndex(Broadcast,11)
NameIndex(Vessel,12)
NameIndex(BaseStations,13)
NameIndex(AttributeUnits,14)
NameIndex(Extent,15)


Print the schema of `Broadcast` (note how `x` and `y` are subfields of `Shape`):

In [4]:
schema = gdb.schema(gdb_path,'Broadcast')
print(schema.treeString())

root
 |-- OBJECTID: integer (nullable = false)
 |-- Shape: struct (nullable = true)
 |    |-- x: double (nullable = true)
 |    |-- y: double (nullable = true)
 |-- SOG: integer (nullable = true)
 |-- COG: integer (nullable = true)
 |-- Heading: integer (nullable = true)
 |-- ROT: integer (nullable = true)
 |-- BaseDateTime: timestamp (nullable = true)
 |-- Status: integer (nullable = true)
 |-- VoyageID: integer (nullable = true)
 |-- MMSI: integer (nullable = true)
 |-- ReceiverType: string (nullable = true)
 |-- ReceiverID: string (nullable = true)



Create a Spark DataFrame and register it as a table for the upcoming SQL operations.

In [5]:
df = spark.read \
    .format("com.esri.gdb") \
    .options(path=gdb_path, name="Broadcast") \
    .load()

In [6]:
df.createOrReplaceTempView("Broadcast")  # registerTempTable is deprecated since Spark 2.0

Create temp table `QR` by mapping each `Broadcast` event to a cell location. The cell size is 0.001 degrees.

In [7]:
cell_1 = 0.001
cell_2 = cell_1 * 0.5
sql(f"select cast(floor(Shape.x/{cell_1}) as INT) as q,cast(floor(Shape.y/{cell_1}) as INT) as r from Broadcast")\
.createOrReplaceTempView("QR")
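
To make the mapping concrete, here is a minimal pure-Python sketch of the same floor-based binning, run outside Spark. The function name `to_cell` and the sample coordinates are made up for illustration; only the cell size of 0.001 degrees comes from the notebook.

```python
import math

CELL = 0.001  # cell size in degrees, as in the notebook


def to_cell(x, y, cell=CELL):
    """Return the integer (q, r) index of the cell containing the point (x, y)."""
    # floor (rather than int truncation) keeps negative longitudes in the
    # correct cell: floor(-80250.3) is -80251, while int(-80250.3) is -80250.
    return math.floor(x / cell), math.floor(y / cell)


# Hypothetical sample point near the port of Miami.
q, r = to_cell(-80.2503, 25.8007)
print(q, r)  # -80251 25800
```

The SQL `floor` followed by `cast(... as INT)` plays the same role as `math.floor` here, which matters because all longitudes in this dataset are negative.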

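Aggregate by cell and report only the cells with more than 100 `Broadcast` events.
Map each cell q/r to the x/y geo location of its center (`cell_2` is the half-cell offset). The cell below previews the result with `show()`; calling `toPandas()` on the same query instead yields the Pandas dataframe used for mapping.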

In [8]:
sql(f"select q*{cell_1}+{cell_2} as x,r*{cell_1}+{cell_2} as y,count(1) as pop from QR group by q,r having pop > 100").show()

+--------+-------+-----+
|       x|      y|  pop|
+--------+-------+-----+
|-80.2505|25.8005|15780|
|-80.1535|25.7655|  296|
|-80.1405|25.7565| 2091|
|-80.1155|25.7575|  159|
|-80.1215|25.7585|  119|
|-80.1685|25.7765| 1439|
|-80.1835|25.7795|  381|
|-80.1785|25.7725|  105|
|-80.0925|25.7775|  112|
|-80.1795|25.6945| 5439|
|-80.0905|25.7955|  609|
|-80.2165|25.7835|  280|
|-80.0925|25.7985|  144|
|-80.2165|25.7825| 3065|
|-80.1005|25.7745| 3236|
|-80.1755|25.7805|  150|
|-80.1815|25.7845|  155|
|-80.1395|25.7655|  182|
|-80.1565|25.7665|  916|
|-80.2445|25.7955| 5609|
+--------+-------+-----+
only showing top 20 rows
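
The aggregation above can be sketched without Spark as follows. This is an illustrative equivalent, not the notebook's code: the helper `dense_cells`, the sample points, and the threshold of 2 are assumptions chosen to keep the example small.

```python
import math
from collections import Counter

CELL = 0.001       # cell size in degrees
HALF = CELL * 0.5  # offset from a cell's lower-left corner to its center


def dense_cells(points, min_pop):
    """Count points per cell and return (x, y, pop) for cells with pop > min_pop."""
    counts = Counter(
        (math.floor(x / CELL), math.floor(y / CELL)) for x, y in points
    )
    return [
        (q * CELL + HALF, r * CELL + HALF, pop)  # cell center and its count
        for (q, r), pop in counts.items()
        if pop > min_pop
    ]


# Hypothetical sample: three points fall in one cell, one point in another.
points = [
    (-80.2503, 25.8007),
    (-80.2501, 25.8002),
    (-80.2509, 25.8009),
    (-80.1532, 25.7651),
]
for x, y, pop in dense_cells(points, min_pop=2):
    print(round(x, 4), round(y, 4), pop)  # -80.2505 25.8005 3
```

Reporting the cell center rather than the lower-left corner avoids shifting every plotted point by half a cell toward the southwest.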



Create a `dark-gray` map centered around Miami, FL.

In [None]:
# from arcgis.gis import GIS
# gis = GIS()
# m = gis.map('Miami, FL')
# m.basemap = 'dark-gray'
# m

Add to the map a layer whose content is imported from a Pandas dataframe.

In [None]:
# pdf: Pandas dataframe of the aggregated cells, e.g. the query result converted with .toPandas()
# m.add_layer(gis.content.import_data(pdf))