# Spark Tables

This notebook shows how to use the Spark Catalog API to query metadata about databases, tables, and columns.

The full list of documented methods is in the [`pyspark.sql.Catalog` API reference](https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.Catalog).

In [0]:
us_flights_file = "/databricks-datasets/learning-spark-v2/flights/departuredelays.csv"

### Create Managed Tables

In [0]:
# Create the database and an empty managed table
spark.sql("DROP DATABASE IF EXISTS learn_spark_db CASCADE")
spark.sql("CREATE DATABASE learn_spark_db")
spark.sql("USE learn_spark_db")
spark.sql("CREATE TABLE us_delay_flights_tbl(date STRING, delay INT, distance INT, origin STRING, destination STRING)")

### Display the databases

In [0]:
display(spark.catalog.listDatabases())
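
When only the database names matter, the entries returned by `listDatabases()` can be reduced to a plain list. The following is a minimal sketch; `database_names` is a hypothetical helper introduced here for illustration, not part of the Catalog API. It relies only on the `name` attribute of each entry.

```python
def database_names(databases):
    """Return the sorted names of the given catalog Database entries."""
    # Each entry from spark.catalog.listDatabases() exposes a `name` attribute.
    return sorted(db.name for db in databases)

# In this notebook (needs the `spark` session that Databricks provides):
# database_names(spark.catalog.listDatabases())
```

After the cell above has run, the result should include `learn_spark_db` alongside any databases that already existed in the workspace.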

### Read the US flights data

In [0]:
df = (spark.read.format("csv")
      .schema("date STRING, delay INT, distance INT, origin STRING, destination STRING")
      .option("header", "true")
      .option("path", us_flights_file)
      .load())

### Save into our table

In [0]:
df.write.mode("overwrite").saveAsTable("us_delay_flights_tbl")

### Cache the table

In [0]:
%sql
CACHE TABLE us_delay_flights_tbl

Check whether the table is cached:

In [0]:
spark.catalog.isCached("us_delay_flights_tbl")
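
The same check can guard the release of a cached table once it is no longer needed. The sketch below assumes a hypothetical helper, `uncache_if_cached`, that takes any object with the `isCached` and `uncacheTable` methods of `spark.catalog`.

```python
def uncache_if_cached(catalog, table_name):
    """Uncache table_name if it is cached; return True if it had been cached."""
    if catalog.isCached(table_name):
        # Remove the table from the in-memory cache.
        catalog.uncacheTable(table_name)
        return True
    return False

# In this notebook (needs the `spark` session that Databricks provides):
# uncache_if_cached(spark.catalog, "us_delay_flights_tbl")
```

Passing the catalog in as an argument keeps the helper easy to exercise outside a Spark session.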

### Display tables within a Database

Note that the table's `tableType` is `MANAGED`: Spark manages both the table's metadata and its data.

In [0]:
spark.catalog.listTables(dbName="learn_spark_db")

### Display Columns for a table

In [0]:
spark.catalog.listColumns("us_delay_flights_tbl")
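
The entries returned by `listColumns()` carry more fields than are usually needed at a glance. As a rough sketch, the hypothetical helper `column_summary` below keeps only the `name`, `dataType`, and `nullable` attributes of each entry.

```python
def column_summary(columns):
    """Return (name, dataType, nullable) tuples for catalog Column entries."""
    return [(col.name, col.dataType, col.nullable) for col in columns]

# In this notebook (needs the `spark` session that Databricks provides):
# column_summary(spark.catalog.listColumns("us_delay_flights_tbl"))
```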

### Create Unmanaged Tables

In [0]:
# Recreate the database, then create an unmanaged table over the existing CSV file
spark.sql("DROP DATABASE IF EXISTS learn_spark_db CASCADE")
spark.sql("CREATE DATABASE learn_spark_db")
spark.sql("USE learn_spark_db")
spark.sql("CREATE TABLE us_delay_flights_tbl (date STRING, delay INT, distance INT, origin STRING, destination STRING) USING csv OPTIONS (path '/databricks-datasets/learning-spark-v2/flights/departuredelays.csv', header 'true')")

### Display Tables

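**Note**: The table type here is `tableType='EXTERNAL'`, which indicates that the table is unmanaged: Spark manages only its metadata, and the data stays at the path given in the table definition. For the managed table created earlier, the type was `tableType='MANAGED'`.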

In [0]:
spark.catalog.listTables(dbName="learn_spark_db")
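
To see managed and unmanaged tables side by side, the listing can be grouped by table type. This is a minimal sketch; `tables_by_type` is a hypothetical helper that reads only the `name` and `tableType` attributes of each entry.

```python
from collections import defaultdict

def tables_by_type(tables):
    """Group table names by their tableType (for example MANAGED or EXTERNAL)."""
    grouped = defaultdict(list)
    for table in tables:
        grouped[table.tableType].append(table.name)
    # Return a plain dict with each list of names sorted for stable output.
    return {table_type: sorted(names) for table_type, names in grouped.items()}

# In this notebook (needs the `spark` session that Databricks provides):
# tables_by_type(spark.catalog.listTables(dbName="learn_spark_db"))
```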

### Display Columns for a table

In [0]:
spark.catalog.listColumns("us_delay_flights_tbl")