## Spark Tables

This notebook shows how to use the Spark Catalog API to query databases, tables, and columns.

In [0]:
%scala
val file = "/databricks-datasets/learning-spark-v2/flights/departuredelays.csv"

#### Creating managed tables

In [0]:
%scala
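// Start from a clean slate: drop the database and everything in it.
spark.sql("DROP DATABASE IF EXISTS learn_spark_db CASCADE")
spark.sql("CREATE DATABASE learn_spark_db")
spark.sql("USE learn_spark_db")
// No LOCATION clause, so Spark manages both the metadata and the data.
spark.sql("CREATE TABLE us_delay_flights_tbl (date STRING, delay INT, distance INT, origin STRING, destination STRING)")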
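
To confirm that the statements above took effect, the catalog can be queried directly. The following is a minimal sketch, assuming it runs in the same session right after the cell above. The `session` value is a name introduced here for illustration; in a Databricks notebook the predefined `spark` should behave the same way.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a session is already running (as in a Databricks notebook),
// so getOrCreate() returns the existing one.
val session = SparkSession.builder().getOrCreate()

// Database currently selected by `USE learn_spark_db`.
val currentDb: String = session.catalog.currentDatabase

// Existence checks against the catalog; both return a Boolean.
val dbExists: Boolean = session.catalog.databaseExists("learn_spark_db")
val tableExists: Boolean = session.catalog.tableExists("learn_spark_db", "us_delay_flights_tbl")

println(s"current database: $currentDb, database exists: $dbExists, table exists: $tableExists")
```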

#### List the databases

In [0]:
%scala
display(spark.catalog.listDatabases())

| name | description | locationUri |
| --- | --- | --- |
| default | Default Hive database | dbfs:/user/hive/warehouse |
| learn_spark_db | | dbfs:/user/hive/warehouse/learn_spark_db.db |
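
`listDatabases()` returns a `Dataset` of `Database` objects, so the result can also be processed in code rather than only displayed. A small sketch, assuming the same running session; `session` and `dbLocations` are illustrative names that do not appear in the original notebook.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: reuses the session that is already running.
val session = SparkSession.builder().getOrCreate()

// Collect the catalog entries to the driver; the list of databases is small.
val dbLocations: Map[String, String] = session.catalog
  .listDatabases()
  .collect()
  .map(db => db.name -> db.locationUri)
  .toMap

dbLocations.foreach { case (name, uri) => println(s"$name -> $uri") }
```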


#### Read our _US Flights_ data

In [0]:
%scala
// Read the CSV with an explicit schema instead of inferring it.
// `file` is the dataset path defined in the first cell.
val df = spark.read.format("csv")
    .schema("`date` STRING, `delay` INT, `distance` INT, `origin` STRING, `destination` STRING")
    .option("header", "true")
    .load(file)

#### Save to our table

In [0]:
%scala
df.write.mode("overwrite").saveAsTable("us_delay_flights_tbl")
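
Once saved, the table can be read back by name. The sketch below assumes the write above succeeded and that `learn_spark_db` is still the current database; `session` and `delays` are illustrative names.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: reuses the session that is already running.
val session = SparkSession.builder().getOrCreate()

// Load the managed table as a DataFrame by its name.
val delays = session.table("us_delay_flights_tbl")

// Inspect the schema and a few rows.
delays.printSchema()
delays.show(5, truncate = false)

// The same data is reachable through SQL.
session.sql("SELECT origin, destination, delay FROM us_delay_flights_tbl LIMIT 5").show()
```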

#### Cache the table

In [0]:
%sql
CACHE TABLE us_delay_flights_tbl

Verify that the table is cached

In [0]:
%scala
spark.catalog.isCached("us_delay_flights_tbl")
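
The catalog also exposes caching calls from Scala. SQL `CACHE TABLE` is eager by default, whereas `spark.catalog.cacheTable` is lazy and only materializes the cache on the first action. A hedged sketch, assuming the table from the previous cells exists; `session` and `tableName` are illustrative names.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: reuses the session that is already running.
val session = SparkSession.builder().getOrCreate()
val tableName = "us_delay_flights_tbl"

// Drop the table from the cache, then check the cache status.
session.catalog.uncacheTable(tableName)
val cachedAfterUncache: Boolean = session.catalog.isCached(tableName)

// Mark the table for caching again (lazy), then run an action to materialize it.
session.catalog.cacheTable(tableName)
session.table(tableName).count()
val cachedAfterCache: Boolean = session.catalog.isCached(tableName)

println(s"after uncache: $cachedAfterUncache, after cache: $cachedAfterCache")
```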

#### List the tables in a database

The table is managed by Spark, as the `tableType` column in the output confirms.

In [0]:
%scala
display(spark.catalog.listTables("learn_spark_db"))

| name | database | description | tableType | isTemporary |
| --- | --- | --- | --- | --- |
| us_delay_flights_tbl | learn_spark_db | | MANAGED | False |
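
The table listing can also be inspected programmatically, for example to keep only managed tables or to fetch the metadata of a single table. This is a sketch under the assumption that `learn_spark_db` and its table exist as created above; `session`, `managedTables`, and `tbl` are illustrative names.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: reuses the session that is already running.
val session = SparkSession.builder().getOrCreate()

// Collect the table metadata to the driver, then keep only managed tables.
val managedTables: Array[String] = session.catalog
  .listTables("learn_spark_db")
  .collect()
  .filter(_.tableType == "MANAGED")
  .map(_.name)

// Metadata for a single table, looked up by database and table name.
val tbl = session.catalog.getTable("learn_spark_db", "us_delay_flights_tbl")
println(s"${tbl.name}: type=${tbl.tableType}, temporary=${tbl.isTemporary}")
```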


#### Show the columns of a table

In [0]:
%scala
display(spark.catalog.listColumns("us_delay_flights_tbl"))

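| name | description | dataType | nullable | isPartition | isBucket |
| --- | --- | --- | --- | --- | --- |
| date | | string | True | False | False |
| delay | | int | True | False | False |
| distance | | int | True | False | False |
| origin | | string | True | False | False |
| destination | | string | True | False | False |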
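
Column metadata can likewise be pulled into ordinary Scala collections, which is convenient for schema checks. A minimal sketch, assuming `learn_spark_db` is the current database so that the unqualified table name resolves; `session` and `columnTypes` are illustrative names.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: reuses the session that is already running.
val session = SparkSession.builder().getOrCreate()

// Collect column metadata to the driver and pair each name with its type.
val columnTypes: Seq[(String, String)] = session.catalog
  .listColumns("us_delay_flights_tbl")
  .collect()
  .map(c => (c.name, c.dataType))
  .toSeq

columnTypes.foreach { case (name, dataType) => println(s"$name: $dataType") }
```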
