
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning" style="width: 600px">
</div>

# Databases and Tables on Databricks
In this demonstration, you will create and explore databases and tables.

## Learning Objectives
By the end of this lesson, you will be able to:
* Use Spark SQL DDL to define databases and tables
* Describe how the `LOCATION` keyword impacts the default storage directory



**Resources**
* [Databases and Tables - Databricks Docs](https://docs.databricks.com/user-guide/tables.html)
* [Managed and Unmanaged Tables](https://docs.databricks.com/user-guide/tables.html#managed-and-unmanaged-tables)
* [Creating a Table with the UI](https://docs.databricks.com/user-guide/tables.html#create-a-table-using-the-ui)
* [Create a Local Table](https://docs.databricks.com/user-guide/tables.html#create-a-local-table)
* [Saving to Persistent Tables](https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html#saving-to-persistent-tables)

## Lesson Setup
The following two cells are for setting up the classroom.  
  
They install a Python library that is used to generate variables, configure a temporary directory, and import a dataset we will use later in the lesson.

In [0]:
%python
import subprocess
import sys

# Install the course setup library into the notebook's Python environment.
subprocess.check_call([sys.executable, "-m", "pip", "install", "git+https://github.com/databricks-academy/user-setup"])

# Configure per-user course settings and install the lesson datasets.
from dbacademy import LessonConfig
LessonConfig.configure(course_name="Databases Tables and Views on Databricks", use_db=False)
LessonConfig.install_datasets(silent=True)

## Important Note
To avoid conflicts with other users and to ensure the code below runs correctly, some cells use widgets to store and reuse variables.  
  
You should not have to change these in order to make the code work correctly.  
  
This next cell simply configures those widgets.  
  
You should run the following cell, but don't be too concerned about what's going on. If you want to learn more about widgets, you can [read the docs](https://docs.databricks.com/notebooks/widgets.html).

In [0]:
%python 
dbutils.widgets.text("username", LessonConfig.clean_username)
dbutils.widgets.text("working_directory", LessonConfig.working_dir)

## Databases
Let's start by creating two databases:
- One with no LOCATION specified
- One with LOCATION specified

You may be wondering about the strange way the databases are named.  
  
Because you may be working in a shared workspace, this course uses variables derived from your username so the databases don't conflict with other users.  
  
You can see the values being used in the boxes displayed after the query is executed.
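The database names that appear in the query results in this lesson (for example, a prefix such as `jaime_vera_palomino_gmail_com`) suggest that the username is lowercased and that characters such as `@` and `.` are replaced with underscores. The following is a minimal sketch of such a rule. `to_identifier` is a hypothetical helper written for illustration; it is not the actual `LessonConfig.clean_username` implementation, which is not shown in this lesson.

```python
import re


def to_identifier(username: str) -> str:
    """Hypothetical helper: make a username safe to embed in a database name."""
    # Lowercase, then replace every run of characters outside [a-z0-9] with "_".
    cleaned = re.sub(r"[^a-z0-9]+", "_", username.lower())
    # Trim any underscores left over at either end.
    return cleaned.strip("_")


# Example address chosen for illustration only.
prefix = to_identifier("Jane.Doe@example.com")
print(f"{prefix}_database_with_default_location")
```

Deriving the prefix from the username keeps each user's databases distinct in a shared workspace while still producing a valid SQL identifier.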

In [0]:
%sql
CREATE DATABASE IF NOT EXISTS ${username}_database_with_default_location;
CREATE DATABASE IF NOT EXISTS ${username}_database_with_custom_location LOCATION '${working_directory}';

Note that the first database is stored in the default location: a directory named after the database, with a `.db` suffix, under `dbfs:/user/hive/warehouse/`.

In [0]:
%sql
DESCRIBE DATABASE EXTENDED ${username}_database_with_default_location;

database_description_item,database_description_value
Database Name,jaime_vera_palomino_gmail_com_database_with_default_location
Comment,
Location,dbfs:/user/hive/warehouse/jaime_vera_palomino_gmail_com_database_with_default_location.db
Owner,root
Properties,


Note that the location of the second database is in the directory specified after the `LOCATION` keyword.

In [0]:
%sql
DESCRIBE DATABASE EXTENDED ${username}_database_with_custom_location;

database_description_item,database_description_value
Database Name,jaime_vera_palomino_gmail_com_database_with_custom_location
Comment,
Location,dbfs:/user/jaime_vera_palomino_gmail_com/dbacademy/databases_tables_and_views_on_databricks
Owner,root
Properties,


We will create a table in the database with default location and insert data. Note that the schema must be provided because there are no data from which to infer the schema.

In [0]:
%sql
USE ${username}_database_with_default_location;
CREATE OR REPLACE TABLE managed_table_in_database_with_default_location (width INT, length INT, height INT);
INSERT INTO managed_table_in_database_with_default_location VALUES (3, 2, 1);
SELECT * FROM managed_table_in_database_with_default_location;

width,length,height
3,2,1


We can look at the extended table description to find the location (you'll need to scroll down in the results).

In [0]:
%sql
DESCRIBE EXTENDED managed_table_in_database_with_default_location;

col_name,data_type,comment
width,int,
length,int,
height,int,
,,
# Partitioning,,
Not partitioned,,
,,
# Detailed Table Information,,
Name,jaime_vera_palomino_gmail_com_database_with_default_location.managed_table_in_database_with_default_location,
Location,dbfs:/user/hive/warehouse/jaime_vera_palomino_gmail_com_database_with_default_location.db/managed_table_in_database_with_default_location,


By default, managed tables in a database without the location specified will be created in the `dbfs:/user/hive/warehouse/<database_name>.db/` directory.
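The path rule can be written down as a small function. This is an illustrative sketch only: `managed_table_location` and `DEFAULT_WAREHOUSE` are hypothetical names, and the function simply reproduces the pattern visible in the `DESCRIBE EXTENDED` output rather than calling any Spark or Databricks API. The optional `database_location` argument models a database created with the `LOCATION` keyword.

```python
from typing import Optional

# Default warehouse root, as seen in the DESCRIBE output in this lesson.
DEFAULT_WAREHOUSE = "dbfs:/user/hive/warehouse"


def managed_table_location(
    database: str, table: str, database_location: Optional[str] = None
) -> str:
    """Illustrative only: where a managed table's files are expected to land."""
    if database_location is None:
        # No LOCATION given: the database lives in <warehouse>/<database>.db
        database_location = f"{DEFAULT_WAREHOUSE}/{database}.db"
    # A managed table gets a folder named after the table inside the database directory.
    return f"{database_location.rstrip('/')}/{table}"


print(managed_table_location("demo_db", "demo_table"))
print(managed_table_location("demo_db", "demo_table", "dbfs:/user/someone/work/"))
```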

We can see that, as expected, the data and metadata for our Delta table are stored in that location.

In [0]:
%python 
display(dbutils.fs.ls(f"dbfs:/user/hive/warehouse/{dbutils.widgets.get('username')}_database_with_default_location.db/managed_table_in_database_with_default_location"))

---------------------------------------------------------------------------
ExecutionError                            Traceback (most recent call last)
<command-813784095896220> in <module>
----> 1 display(dbutils.fs.ls(f"dbfs:/user/hive/warehouse/{dbutils.widgets.get('username')}_database_with_default_location.db/managed_table_in_database_with_default_location"))

/databricks/python_shell/dbruntime/dbutils.py in f_with_exception_handling(*args, **kwargs)
    379                     exc.__context__ = None
    380                     exc.__cause__ = None
--> 381

Drop the table.

In [0]:
%sql
DROP TABLE managed_table_in_database_with_default_location;

Note that the table's folder, along with its log and data files, has been deleted.

In [0]:
%python 
dbutils.fs.ls(f"dbfs:/user/hive/warehouse/{dbutils.widgets.get('username')}_database_with_default_location.db")

Out[6]: []

We now create a table in the database with the custom location and insert data. Note that the schema must be provided because there are no data from which to infer the schema.

In [0]:
%sql
USE ${username}_database_with_custom_location;
CREATE OR REPLACE TABLE managed_table_in_database_with_custom_location (width INT, length INT, height INT);
INSERT INTO managed_table_in_database_with_custom_location VALUES (3, 2, 1);
SELECT * FROM managed_table_in_database_with_custom_location;

width,length,height
3,2,1


Again, we'll look at the description to find the table location.

In [0]:
%sql
DESCRIBE EXTENDED managed_table_in_database_with_custom_location;

col_name,data_type,comment
width,int,
length,int,
height,int,
,,
# Partitioning,,
Not partitioned,,
,,
# Detailed Table Information,,
Name,jaime_vera_palomino_gmail_com_database_with_custom_location.managed_table_in_database_with_custom_location,
Location,dbfs:/user/jaime_vera_palomino_gmail_com/dbacademy/databases_tables_and_views_on_databricks/managed_table_in_database_with_custom_location,


As expected, this managed table is created in the path specified with the `LOCATION` keyword during database creation. As such, the data and metadata for the table are persisted in a directory here.

In [0]:
%python 
display(dbutils.fs.ls(f"{dbutils.widgets.get('working_directory')}/managed_table_in_database_with_custom_location"))

path,name,size
dbfs:/user/jaime_vera_palomino_gmail_com/dbacademy/databases_tables_and_views_on_databricks/managed_table_in_database_with_custom_location/_delta_log/,_delta_log/,0
dbfs:/user/jaime_vera_palomino_gmail_com/dbacademy/databases_tables_and_views_on_databricks/managed_table_in_database_with_custom_location/part-00000-cdc6e46e-edec-4c13-a2fc-2c0527e8ef19-c000.snappy.parquet,part-00000-cdc6e46e-edec-4c13-a2fc-2c0527e8ef19-c000.snappy.parquet,953
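The listing above contains two kinds of entries: the `_delta_log/` directory, which holds the table's transaction log (the metadata), and a Parquet file, which holds the data. As a rough sketch, the entry names can be separated like this. `split_delta_listing` is a hypothetical helper that works on plain name strings, not on the objects returned by `dbutils.fs.ls`.

```python
def split_delta_listing(names):
    """Separate the transaction log directory from data files in a table folder listing."""
    # The Delta transaction log lives in a directory named _delta_log.
    log_dirs = [n for n in names if n.rstrip("/") == "_delta_log"]
    # Data is stored as Parquet files alongside the log directory.
    data_files = [n for n in names if n.endswith(".parquet")]
    return log_dirs, data_files


# Entry names copied from the listing above.
listing = [
    "_delta_log/",
    "part-00000-cdc6e46e-edec-4c13-a2fc-2c0527e8ef19-c000.snappy.parquet",
]
log_dirs, data_files = split_delta_listing(listing)
print("log:", log_dirs)
print("data:", data_files)
```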


Let's drop the table.

In [0]:
%sql
DROP TABLE managed_table_in_database_with_custom_location;

Note that the table's folder, along with its log and data files, has been deleted.  
  
Only the "datasets" folder remains.

In [0]:
%python 
dbutils.fs.ls(dbutils.widgets.get('working_directory'))

Out[8]: [FileInfo(path='dbfs:/user/jaime_vera_palomino_gmail_com/dbacademy/databases_tables_and_views_on_databricks/datasets/', name='datasets/', size=0)]

## Tables
We will create an external (unmanaged) table from sample data. The source data are in CSV format. We want to create a Delta table whose `LOCATION` points to a directory of our choice.

In [0]:
%sql
USE ${username}_database_with_default_location;

-- mode "FAILFAST" will abort file parsing with a RuntimeException if any malformed lines are encountered
CREATE OR REPLACE TEMPORARY VIEW temp_delays USING CSV OPTIONS (
  path '${working_directory}/datasets/flights/departuredelays.csv',
  header "true",
  mode "FAILFAST"
);
CREATE OR REPLACE TABLE external_table LOCATION '${working_directory}/external_table' AS
  SELECT * FROM temp_delays;

SELECT * FROM external_table;

date,delay,distance,origin,destination
1011245,6,602,ABE,ATL
1020600,-8,369,ABE,DTW
1021245,-2,602,ABE,ATL
1020605,-4,602,ABE,ATL
1031245,-4,602,ABE,ATL
1030605,0,602,ABE,ATL
1041243,10,602,ABE,ATL
1040605,28,602,ABE,ATL
1051245,88,602,ABE,ATL
1050605,9,602,ABE,ATL


Let's note the location of the table's data in this lesson's working directory.

In [0]:
%sql
DESCRIBE TABLE EXTENDED external_table;

col_name,data_type,comment
date,string,
delay,string,
distance,string,
origin,string,
destination,string,
,,
# Partitioning,,
Not partitioned,,
,,
# Detailed Table Information,,


Now, we drop the table.

In [0]:
%sql
DROP TABLE external_table;

The table definition no longer exists in the metastore, but the underlying data remain intact.

In [0]:
%python 
display(dbutils.fs.ls(LessonConfig.working_dir + "/external_table"))

path,name,size
dbfs:/user/jaime_vera_palomino_gmail_com/dbacademy/databases_tables_and_views_on_databricks/external_table/_delta_log/,_delta_log/,0
dbfs:/user/jaime_vera_palomino_gmail_com/dbacademy/databases_tables_and_views_on_databricks/external_table/part-00000-459db80d-d202-4a5c-a9ad-aa39a19036f8-c000.snappy.parquet,part-00000-459db80d-d202-4a5c-a9ad-aa39a19036f8-c000.snappy.parquet,918778
dbfs:/user/jaime_vera_palomino_gmail_com/dbacademy/databases_tables_and_views_on_databricks/external_table/part-00001-b71f5830-bfd4-429f-9f90-7861bce5bceb-c000.snappy.parquet,part-00001-b71f5830-bfd4-429f-9f90-7861bce5bceb-c000.snappy.parquet,958643
dbfs:/user/jaime_vera_palomino_gmail_com/dbacademy/databases_tables_and_views_on_databricks/external_table/part-00002-aae4ae03-3410-47fe-a74f-63a11e15531a-c000.snappy.parquet,part-00002-aae4ae03-3410-47fe-a74f-63a11e15531a-c000.snappy.parquet,970111
dbfs:/user/jaime_vera_palomino_gmail_com/dbacademy/databases_tables_and_views_on_databricks/external_table/part-00003-a7d9d92e-61a6-45f3-8d29-0e96875a7487-c000.snappy.parquet,part-00003-a7d9d92e-61a6-45f3-8d29-0e96875a7487-c000.snappy.parquet,959353
dbfs:/user/jaime_vera_palomino_gmail_com/dbacademy/databases_tables_and_views_on_databricks/external_table/part-00004-75e498a9-f9dd-404a-88e4-17f85c09acfd-c000.snappy.parquet,part-00004-75e498a9-f9dd-404a-88e4-17f85c09acfd-c000.snappy.parquet,1006765
dbfs:/user/jaime_vera_palomino_gmail_com/dbacademy/databases_tables_and_views_on_databricks/external_table/part-00005-ae576b8d-a574-47d4-8710-f3c4bf67eed6-c000.snappy.parquet,part-00005-ae576b8d-a574-47d4-8710-f3c4bf67eed6-c000.snappy.parquet,895819
dbfs:/user/jaime_vera_palomino_gmail_com/dbacademy/databases_tables_and_views_on_databricks/external_table/part-00006-3112ff0f-e4ce-47fb-a687-d6081b697837-c000.snappy.parquet,part-00006-3112ff0f-e4ce-47fb-a687-d6081b697837-c000.snappy.parquet,882220
dbfs:/user/jaime_vera_palomino_gmail_com/dbacademy/databases_tables_and_views_on_databricks/external_table/part-00007-533baafd-7290-4d79-9cba-71e3331fc63e-c000.snappy.parquet,part-00007-533baafd-7290-4d79-9cba-71e3331fc63e-c000.snappy.parquet,119069
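The contrast with the managed tables dropped earlier in this lesson can be summarized with a toy model. The sketch below is an in-memory simulation written for illustration; `ToyMetastore` and the paths it uses are hypothetical, and it does not reflect how Spark or the metastore is actually implemented. It only mirrors the behavior observed above: dropping a managed table removes its files, while dropping an external table removes only the definition.

```python
class ToyMetastore:
    """Toy in-memory model of DROP TABLE semantics; not how Spark is implemented."""

    def __init__(self):
        self.tables = {}   # table name -> (location, is_managed)
        self.storage = {}  # location -> list of file names

    def create_table(self, name, location, files, managed):
        self.tables[name] = (location, managed)
        self.storage[location] = list(files)

    def drop_table(self, name):
        # The table definition is always removed.
        location, managed = self.tables.pop(name)
        if managed:
            # Managed table: its files are removed together with the definition.
            self.storage.pop(location, None)
        # External table: the files at its location are left untouched.


files = ["_delta_log/", "part-00000.parquet"]
metastore = ToyMetastore()
metastore.create_table("managed_table", "/warehouse/managed_table", files, managed=True)
metastore.create_table("external_table", "/work/external_table", files, managed=False)

metastore.drop_table("managed_table")
metastore.drop_table("external_table")

print(metastore.tables)           # both definitions are gone
print(sorted(metastore.storage))  # only the external table's location remains
```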


## Clean up
Drop both databases.

In [0]:
%sql
DROP DATABASE ${username}_database_with_default_location CASCADE;
DROP DATABASE ${username}_database_with_custom_location CASCADE;

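Delete the working directory and its contents. The call returns `False` when nothing was removed, which can happen if the directory no longer exists at this point (for example, if dropping the database with the custom location already removed it).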

In [0]:
%python 
dbutils.fs.rm(LessonConfig.working_dir, True)

Out[11]: False

&copy; 2021 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the <a href="http://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/>
<a href="https://databricks.com/privacy-policy">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use">Terms of Use</a> | <a href="http://help.databricks.com/">Support</a>