## Show tables with Scala and Spark



In [1]:
spark.catalog.listTables.show(false)

+--------------------------------+--------+-----------+---------+-----------+
|name                            |database|description|tableType|isTemporary|
+--------------------------------+--------+-----------+---------+-----------+
|btl_departures_arrivals_airports|default |null       |EXTERNAL |false      |
|btl_distances                   |default |null       |EXTERNAL |false      |
|int_airports                    |default |null       |EXTERNAL |false      |
|int_departures                  |default |null       |EXTERNAL |false      |
+--------------------------------+--------+-----------+---------+-----------+
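Because `listTables` returns a regular Dataset, the listing above can also be filtered before it is shown. The following is a minimal sketch, not part of the original notebook: the temporary view `btl_demo_view` is a hypothetical stand-in so the cell runs without the tables above, and `session` is just a local name for the Spark session (in a notebook, `getOrCreate()` should return the session that is already running).

```scala
import org.apache.spark.sql.SparkSession

// Reuse the running session if there is one, otherwise start a local one.
val session = SparkSession.builder()
  .master("local[*]")
  .appName("catalog-sketch")
  .getOrCreate()

// Hypothetical temporary view, only here to make the sketch self-contained.
session.range(3).createOrReplaceTempView("btl_demo_view")

// listTables() yields a Dataset[Table]; keep only entries whose name starts with "btl_".
val btlTables = session.catalog.listTables()
  .filter(_.name.startsWith("btl_"))

btlTables.show(false)

// Collect just the table names for further use in Scala code.
val btlTableNames = btlTables.collect().map(_.name)
```

With the tables from the listing above, the same filter would presumably return `btl_departures_arrivals_airports` and `btl_distances` as well.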



## Use SQL

To display the result, click the table symbol after "Out: DataFrame", then execute the new inspection cell.

I find this a bit cumbersome in the current Polynote version, so I prefer using the Spark DataFrame's show method.

In [3]:
select * from default.btl_distances where estarrivalairport = 'LEPA'

[estdepartureairport: string, estarrivalairport: string ... 8 more fields]

In [4]:
spark.sql("select * from default.btl_distances where estarrivalairport = 'LEPA'").show

+-------------------+-----------------+--------------------+----------------+-----------------+-----------------+----------------+-----------------+----------------+---------------------+
|estdepartureairport|estarrivalairport|            arr_name|arr_latitude_deg|arr_longitude_deg|         dep_name|dep_latitude_deg|dep_longitude_deg|        distance|could_be_done_by_rail|
+-------------------+-----------------+--------------------+----------------+-----------------+-----------------+----------------+-----------------+----------------+---------------------+
|               LSZB|             LEPA|Palma De Mallorca...|    39.551700592|    2.73881006241|Bern Belp Airport|    46.914100647|7.497149944309999|904.446224553409|                false|
+-------------------+-----------------+--------------------+----------------+-----------------+-----------------+----------------+-----------------+----------------+---------------------+
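Wide results like the one above can get hard to read, and by default `show` truncates long strings. The sketch below illustrates two `show` overloads that may help. It is not from the original notebook: it builds a one-row DataFrame from a subset of the columns printed above, so it runs without the table, and `session` and `distancesSample` are illustrative names.

```scala
import org.apache.spark.sql.SparkSession

// Reuse the running session if there is one, otherwise start a local one.
val session = SparkSession.builder()
  .master("local[*]")
  .appName("show-sketch")
  .getOrCreate()

// One row with a subset of the columns shown in the output above.
val distancesSample = session
  .createDataFrame(Seq(("LSZB", "LEPA", "Bern Belp Airport", 904.446224553409, false)))
  .toDF("estdepartureairport", "estarrivalairport", "dep_name", "distance", "could_be_done_by_rail")

// Print at most 5 rows without truncating long strings.
distancesSample.show(5, false)

// Print each row vertically (one line per column); truncate = 0 disables truncation.
distancesSample.show(5, 0, true)
```

The vertical layout tends to be easier to read in a notebook when a table has many columns, as `default.btl_distances` does.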



In [5]:
spark.table("default.int_departures").where($"estarrivalairport"==="LEPA").show

+-----------------------------+--------+-------------------------------+-----------------+------------------------------+-----------------------------+-------------------+--------------------------------+-------------------------------+----------+------+----------+--------+--------------------+
|arrivalairportcandidatescount|callsign|departureairportcandidatescount|estarrivalairport|estarrivalairporthorizdistance|estarrivalairportvertdistance|estdepartureairport|estdepartureairporthorizdistance|estdepartureairportvertdistance| firstseen|icao24|  lastseen|      dt|      dl_ts_captured|
+-----------------------------+--------+-------------------------------+-----------------+------------------------------+-----------------------------+-------------------+--------------------------------+-------------------------------+----------+------+----------+--------+--------------------+
|                            1|OAW24H  |                              0|             LEPA|                      
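The `$"..."` column syntax used in the cell above depends on the Spark implicits being in scope. As a hedged alternative, the same filter can be written with `col(...)` or with a SQL expression string. The sketch below uses a small sample DataFrame with two callsign/airport pairs taken from the outputs in this notebook; `session`, `departuresSample`, `viaCol` and `viaSqlString` are illustrative names, not part of the original.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Reuse the running session if there is one, otherwise start a local one.
val session = SparkSession.builder()
  .master("local[*]")
  .appName("filter-sketch")
  .getOrCreate()

// Small sample with two of the columns of default.int_departures.
val departuresSample = session
  .createDataFrame(Seq(("OAW24H", "LEPA"), ("GES061C", "EDDF")))
  .toDF("callsign", "estarrivalairport")

// Filter with an explicit Column built by col(...).
val viaCol = departuresSample.where(col("estarrivalairport") === "LEPA")

// The same filter expressed as a SQL condition string.
val viaSqlString = departuresSample.where("estarrivalairport = 'LEPA'")

viaCol.show()
viaSqlString.show()
```

Both variants should produce the same result as the `$"estarrivalairport"==="LEPA"` form; which one to use is mostly a matter of taste.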

## Select data by using DataObjects configured in SmartDataLake



In [7]:
// import smartdatalake
import io.smartdatalake.config.SdlConfigObject.stringToDataObjectId
import io.smartdatalake.config.ConfigToolbox
import io.smartdatalake.workflow.dataobject._
import io.smartdatalake.workflow.ActionPipelineContext
import io.smartdatalake.workflow.action.SDLExecutionId
import io.smartdatalake.app.SmartDataLakeBuilderConfig
import io.smartdatalake.workflow.ExecutionPhase
implicit val ss = spark // make Spark session available implicitly

In [8]:
// read config from mounted directory
val (registry, globalConfig) = ConfigToolbox.loadAndParseConfig(Seq("/mnt/config"), Some(this.getClass.getClassLoader))
// Create the context used by SDL objects
implicit val context = ConfigToolbox.getDefaultActionPipelineContext(spark, registry)

In [10]:
// get DataObjects from the registry by the id configured in the SDL config
val dataIntAirports = registry.get[DeltaLakeTableDataObject]("int-airports")
val dataIntDepartures = registry.get[DeltaLakeTableDataObject]("int-departures")

In [11]:
// caution: destructive, drops the table behind this DataObject; run only if intended
dataIntDepartures.dropTable

In [12]:
dataIntDepartures.getSparkDataFrame().show

+-----------------------------+--------+-------------------------------+-----------------+------------------------------+-----------------------------+-------------------+--------------------------------+-------------------------------+----------+------+----------+--------+--------------------+
|arrivalairportcandidatescount|callsign|departureairportcandidatescount|estarrivalairport|estarrivalairporthorizdistance|estarrivalairportvertdistance|estdepartureairport|estdepartureairporthorizdistance|estdepartureairportvertdistance| firstseen|icao24|  lastseen|      dt|      dl_ts_captured|
+-----------------------------+--------+-------------------------------+-----------------+------------------------------+-----------------------------+-------------------+--------------------------------+-------------------------------+----------+------+----------+--------+--------------------+
|                            4|GES061C |                              0|             EDDF|                      