## Spark Tables

This notebook shows how to use the Spark catalog API to query databases, tables, and columns.

In [0]:
file = "/databricks-datasets/learning-spark-v2/flights/departuredelays.csv"

#### Creating managed tables

In [0]:
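# Start from a clean database so the notebook can be re-run safely
spark.sql("DROP DATABASE IF EXISTS learn_spark_db CASCADE")
spark.sql("CREATE DATABASE learn_spark_db")
spark.sql("USE learn_spark_db")
# No LOCATION is given, so Spark manages both the metadata and the data
spark.sql("CREATE TABLE us_delay_flights_tbl(date STRING, delay INT, distance INT, origin STRING, destination STRING)")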

Out[2]: DataFrame[]

#### Listing the databases

In [0]:
display(spark.catalog.listDatabases())

| name | description | locationUri |
| --- | --- | --- |
| default | Default Hive database | dbfs:/user/hive/warehouse |
| learn_spark_db | | dbfs:/user/hive/warehouse/learn_spark_db.db |
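
As a small illustration of working with this listing programmatically, the sketch below checks whether a database is registered in the catalog. The helper name `database_exists` is hypothetical, and it assumes `spark` is the session that Databricks predefines in the notebook.

```python
def database_exists(spark, name):
    """Return True if a database called `name` is registered in the catalog."""
    # listDatabases() returns records that expose a `name` attribute.
    return any(db.name == name for db in spark.catalog.listDatabases())

# In the notebook this could be called as:
#   database_exists(spark, "learn_spark_db")
```

Passing `spark` in as an argument keeps the helper independent of the notebook's global session.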


#### Reading our _US Flights_ data

In [0]:
# Read the CSV with an explicit schema instead of inferring it
df = (spark.read.format("csv")
      .schema("date STRING, delay INT, distance INT, origin STRING, destination STRING")
      .option("header", "true")
      .option("path", file)  # `file` is the dataset path defined in the first cell
      .load())
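
The same column list appears in both the `CREATE TABLE` statement and the reader's `.schema(...)` call. One possible way to keep them in sync is to define the columns once and derive the DDL string from them. This is only a sketch: `FLIGHT_COLUMNS`, `to_ddl`, and `flight_schema` are hypothetical names that do not appear in the notebook.

```python
# Hypothetical single source of truth for the flight columns.
FLIGHT_COLUMNS = [
    ("date", "STRING"),
    ("delay", "INT"),
    ("distance", "INT"),
    ("origin", "STRING"),
    ("destination", "STRING"),
]

def to_ddl(columns):
    """Join (name, type) pairs into a DDL-style schema string."""
    return ", ".join(f"{name} {dtype}" for name, dtype in columns)

flight_schema = to_ddl(FLIGHT_COLUMNS)
# In the notebook, flight_schema could be passed to .schema(...) on the reader
# and interpolated into the CREATE TABLE statement.
```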

#### Saving to our table

In [0]:
df.write.mode("overwrite").saveAsTable("us_delay_flights_tbl")

#### Caching the table

In [0]:
%sql
CACHE TABLE us_delay_flights_tbl

We verify that the table is cached.

In [0]:
spark.catalog.isCached("us_delay_flights_tbl")

Out[6]: True
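
Cached tables hold on to memory, so it may be useful to release the cache once the table is no longer needed. The sketch below combines `isCached` with the catalog's `uncacheTable` call. The helper name `release_cache` is hypothetical, and `spark` is assumed to be the notebook's predefined session.

```python
def release_cache(spark, table_name):
    """Uncache `table_name` if it is cached; return whether it was cached."""
    was_cached = spark.catalog.isCached(table_name)
    if was_cached:
        # Drop the table's data from the in-memory cache.
        spark.catalog.uncacheTable(table_name)
    return was_cached

# In the notebook this could be called as:
#   release_cache(spark, "us_delay_flights_tbl")
```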

#### Listing the tables in a database

This table is managed by Spark.

In [0]:
spark.catalog.listTables(dbName="learn_spark_db")

Out[7]: [Table(name='us_delay_flights_tbl', database='learn_spark_db', description=None, tableType='MANAGED', isTemporary=False)]
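
Since each `Table` record carries a `tableType` field, the listing can be filtered to keep only managed tables. The following is a minimal sketch under the assumption that `spark` is the notebook's session; `managed_table_names` is a hypothetical helper name.

```python
def managed_table_names(spark, db_name):
    """Return the names of the tables in `db_name` whose type is MANAGED."""
    tables = spark.catalog.listTables(dbName=db_name)
    return [t.name for t in tables if t.tableType == "MANAGED"]

# In the notebook this could be called as:
#   managed_table_names(spark, "learn_spark_db")
```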

#### Listing the columns of a table

In [0]:
spark.catalog.listColumns("us_delay_flights_tbl")

Out[8]: [Column(name='date', description=None, dataType='string', nullable=True, isPartition=False, isBucket=False),
 Column(name='delay', description=None, dataType='int', nullable=True, isPartition=False, isBucket=False),
 Column(name='distance', description=None, dataType='int', nullable=True, isPartition=False, isBucket=False),
 Column(name='origin', description=None, dataType='string', nullable=True, isPartition=False, isBucket=False),
 Column(name='destination', description=None, dataType='string', nullable=True, isPartition=False, isBucket=False)]
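
The `Column` records above can be condensed into a name-to-type mapping, which is often easier to inspect or compare than the full listing. This is a sketch only: `column_types` is a hypothetical helper, and `spark` is assumed to be the notebook's session.

```python
def column_types(spark, table_name):
    """Map each column name of `table_name` to its data type string."""
    return {c.name: c.dataType for c in spark.catalog.listColumns(table_name)}

# In the notebook this could be called as:
#   column_types(spark, "us_delay_flights_tbl")
```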