## Show tables with Scala and Spark



In [3]:
spark.catalog.listTables.show(false)

+--------------------------------+--------+-----------+---------+-----------+
|name                            |database|description|tableType|isTemporary|
+--------------------------------+--------+-----------+---------+-----------+
|btl_departures_arrivals_airports|default |null       |EXTERNAL |false      |
|btl_distances                   |default |null       |EXTERNAL |false      |
|int_airports                    |default |null       |EXTERNAL |false      |
+--------------------------------+--------+-----------+---------+-----------+
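
The same catalog API can also describe a single table. The following is a minimal sketch, assuming the notebook's `spark` session and the `default.btl_distances` table listed above; the value name `distanceColumns` is only illustrative.

```scala
// Assumes `spark` is the SparkSession provided by the notebook.
// List the columns of one table as known to the Spark catalog.
val distanceColumns = spark.catalog.listColumns("default", "btl_distances")

// Show only the most useful metadata fields, without truncating values.
distanceColumns.select("name", "dataType", "nullable").show(false)
```

This is a quick way to check column names and types before writing a query against the table.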



In [4]:
spark.table("default.btl_distances").show

+-------------------+-----------------+--------------------+------------------+------------------+-----------------+----------------+-----------------+------------------+---------------------+
|estdepartureairport|estarrivalairport|            arr_name|  arr_latitude_deg| arr_longitude_deg|         dep_name|dep_latitude_deg|dep_longitude_deg|          distance|could_be_done_by_rail|
+-------------------+-----------------+--------------------+------------------+------------------+-----------------+----------------+-----------------+------------------+---------------------+
|               LSZB|             LEPA|Palma De Mallorca...|      39.551700592|     2.73881006241|Bern Belp Airport|    46.914100647|7.497149944309999|  904.446224553409|                false|
|               LSZB|             LKER|    Erpužice Airport|49.802799224853516|13.038100242614746|Bern Belp Airport|    46.914100647|7.497149944309999| 520.1315237975622|                false|
|               LSZB|             L

## Use SQL

To display the result, click the table symbol after "Out: DataFrame", then execute the new inspection cell.

I find this a bit cumbersome in the current Polynote version and prefer the `show` method of Spark DataFrames.

In [11]:
select * from default.btl_distances where estarrivalairport = 'LEPA'

[estdepartureairport: string, estarrivalairport: string ... 8 more fields]

In [12]:
{"type":"table","value":"Out","rowRange":[0,0]}

estdepartureairport: string?,estarrivalairport: string?,arr_name: string?,arr_latitude_deg: string?,arr_longitude_deg: string?,dep_name: string?,dep_latitude_deg: string?,dep_longitude_deg: string?,distance: float8?,could_be_done_by_rail: boolean?
LSZB,LEPA,Palma De Mallorca Airport,39.551700592,2.73881006241,Bern Belp Airport,46.914100647,7.497149944309999,904.446224553409,False


In [13]:
spark.sql("select * from default.btl_distances where estarrivalairport = 'LEPA'").show

+-------------------+-----------------+--------------------+----------------+-----------------+-----------------+----------------+-----------------+----------------+---------------------+
|estdepartureairport|estarrivalairport|            arr_name|arr_latitude_deg|arr_longitude_deg|         dep_name|dep_latitude_deg|dep_longitude_deg|        distance|could_be_done_by_rail|
+-------------------+-----------------+--------------------+----------------+-----------------+-----------------+----------------+-----------------+----------------+---------------------+
|               LSZB|             LEPA|Palma De Mallorca...|    39.551700592|    2.73881006241|Bern Belp Airport|    46.914100647|7.497149944309999|904.446224553409|                false|
+-------------------+-----------------+--------------------+----------------+-----------------+-----------------+----------------+-----------------+----------------+---------------------+
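
The same filter can also be written with the DataFrame API instead of a SQL string. The following is a sketch, assuming the notebook's `spark` session; the value name `lepaArrivals` and the choice of selected columns are only illustrative.

```scala
import org.apache.spark.sql.functions.col

// Assumes `spark` is the SparkSession provided by the notebook.
// Same filter as the SQL query above, expressed with the DataFrame API.
val lepaArrivals = spark.table("default.btl_distances")
  .where(col("estarrivalairport") === "LEPA")
  .select("estdepartureairport", "estarrivalairport", "arr_name", "distance")

// Passing false disables truncation, so long values such as arr_name are printed in full.
lepaArrivals.show(false)
```

Column names are checked when the query is analyzed, so a misspelled column fails before any data is read.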



## Select data by using DataObjects configured in SmartDataLake



In [5]:
// Import the SmartDataLake classes used in the following cells
import io.smartdatalake.config.SdlConfigObject.stringToDataObjectId
import io.smartdatalake.config.ConfigToolbox
import io.smartdatalake.workflow.dataobject._
import io.smartdatalake.workflow.ActionPipelineContext
import io.smartdatalake.workflow.action.SDLExecutionId
import io.smartdatalake.app.SmartDataLakeBuilderConfig
import io.smartdatalake.workflow.ExecutionPhase
implicit val ss = spark // make Spark session available implicitly

In [6]:
// read config from mounted directory
val (registry, globalConfig) = ConfigToolbox.loadAndParseConfig(Seq("/mnt/config"))
// Create the context used by SDL objects
implicit val context = ActionPipelineContext(
  "test", "app", SDLExecutionId.executionId1, registry,
  appConfig = SmartDataLakeBuilderConfig("test", Some("app")),
  phase = ExecutionPhase.Init
)

In [7]:
// get a dataobject
val dataIntAirports = registry.get[DeltaLakeTableDataObject]("int-airports")

In [8]:
dataIntAirports.getDataFrame()
  .where($"ident" === "LEPA")
  .show

+-----+--------------------+------------+-------------+
|ident|                name|latitude_deg|longitude_deg|
+-----+--------------------+------------+-------------+
| LEPA|Palma De Mallorca...|39.551700592|2.73881006241|
+-----+--------------------+------------+-------------+
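
A DataFrame obtained from a DataObject can also be queried with SQL by registering it as a temporary view. The following is a sketch, assuming `dataIntAirports`, the implicit `context`, and the `spark` session from the cells above; the view name `int_airports_sdl` is hypothetical.

```scala
// Assumes `dataIntAirports` and the implicit `context` defined in the cells above.
// Register the DataObject's DataFrame under a session-scoped view name (hypothetical name).
dataIntAirports.getDataFrame().createOrReplaceTempView("int_airports_sdl")

// Query the view with plain SQL; the columns are those shown in the output above.
spark.sql("select ident, name from int_airports_sdl where ident = 'LEPA'").show(false)
```

The temporary view exists only for the current Spark session and does not modify the underlying table.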

