## Databases in Databricks

* In Databricks, a *database* corresponds to a *schema* in the Hive metastore.
* Creating a database defines a logical structure for organizing tables, views, and functions.
* You can use `CREATE DATABASE` or `CREATE SCHEMA`; the two statements are equivalent.
* The Hive metastore stores metadata about databases, tables, and partitions, including definitions, formats, and storage locations.

### Default Database
* Every Databricks workspace includes a central Hive metastore accessible by all clusters.
* By default, a `default` database exists in the metastore.
* Tables created without specifying a database are stored in this default schema.
* Data files are placed in the default Hive directory:
  `/user/hive/warehouse`

### Tables in Databricks

* There are two types of tables: **managed tables** and **external tables**.
* The table type determines who owns the data files and what happens to them when the table is dropped.

### Managed Tables
* Tables are managed by default: a `CREATE TABLE` statement without a `LOCATION` clause creates a managed table.
* Both the metadata and data are owned by the metastore (Hive or Unity Catalog).
* Data is stored inside the database directory.
* Dropping a managed table deletes both the metadata and the underlying data files.
* This simplifies lifecycle management but requires caution, because the deletion is permanent.

In [0]:
USE CATALOG hive_metastore;

DROP TABLE IF EXISTS country_managed;
CREATE TABLE country_managed (
  country_name STRING,
  iso_code STRING,
  currency STRING
);

INSERT INTO country_managed
VALUES ('France', 'FR', 'EUR');

- Running `DESCRIBE EXTENDED` on the table returns detailed metadata about it.
- The location entry in the output shows that the table resides in the default Hive warehouse directory under `dbfs:/user/hive/warehouse`.

In [0]:
DESCRIBE EXTENDED country_managed

### External Tables

* External tables store only metadata in the metastore.
* Data files remain outside the database directory in an external location.
* You define the file location with the `LOCATION` keyword:<br>
  `CREATE TABLE table_name`<br>
  `LOCATION '<path>'`<br>
* Dropping an external table only removes metadata; the data files are not deleted.
* Useful for working with data stored outside DBFS, such as S3 or Azure storage.

In [0]:
DROP TABLE IF EXISTS country_external;
CREATE TABLE country_external (
  country_name STRING,
  iso_code STRING,
  currency STRING
)
LOCATION 'dbfs:/table/country_external';

INSERT INTO country_external
VALUES ('France', 'FR', 'EUR');

In [0]:
DESCRIBE EXTENDED country_external

In [0]:
SHOW TABLES

In [0]:
SELECT * FROM country_external

In [0]:
%fs ls dbfs:/table/country_external

### Dropping Tables
* You can remove tables using the `DROP TABLE` command.
* Behavior differs for managed and external tables.
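
The difference in drop behavior can be sketched with a small in-memory model. This is a toy illustration only, not how the Hive metastore is implemented: `ToyMetastore`, `Table`, and the `storage` set are hypothetical names invented for this example, and the managed path assumes the default warehouse directory mentioned earlier. It is plain Python and could be run in a `%python` cell.

```python
from dataclasses import dataclass


@dataclass
class Table:
    name: str
    location: str   # directory that holds the table's data files
    managed: bool   # True for managed tables, False for external tables


class ToyMetastore:
    """Hypothetical stand-in for a metastore plus the storage it points at."""

    def __init__(self):
        self.tables = {}      # table name -> Table (the "metadata")
        self.storage = set()  # directories that currently hold data files

    def create(self, table: Table) -> None:
        self.tables[table.name] = table
        self.storage.add(table.location)

    def drop(self, name: str) -> None:
        # Metadata is always removed.
        table = self.tables.pop(name)
        # Data files are removed only when the table is managed.
        if table.managed:
            self.storage.discard(table.location)


metastore = ToyMetastore()
metastore.create(Table("country_managed", "dbfs:/user/hive/warehouse/country_managed", True))
metastore.create(Table("country_external", "dbfs:/table/country_external", False))

metastore.drop("country_managed")
metastore.drop("country_external")

print(sorted(metastore.tables))   # []
print(sorted(metastore.storage))  # ['dbfs:/table/country_external']
```

After both drops, no table metadata remains, but the external table's directory is still present in storage.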


**Dropping a Managed Table**

* Deletes metadata from the Hive metastore, including schema and table definitions.
* Also removes all associated data files from storage.

In [0]:
DROP TABLE country_managed

- Listing the table directory afterwards raises a `FileNotFoundException`, which confirms that the data files were deleted.


In [0]:
%fs ls 'dbfs:/user/hive/warehouse/country_managed'

**Dropping an External Table**
* Removes metadata from the metastore but **does not** delete the underlying data files.
* Data files remain in the external location.

In [0]:
DROP TABLE country_external

`SHOW TABLES` confirms that the table is gone from the metastore, while listing the directory shows that the data files are still present:

In [0]:
SHOW TABLES

In [0]:
%fs ls 'dbfs:/table/country_external'

In [0]:
SELECT * FROM DELTA.`dbfs:/table/country_external`

In [0]:
%python
dbutils.fs.rm('dbfs:/table/country_external', True)

### Creating New Databases
* You can create additional databases beyond the default using `CREATE DATABASE` or `CREATE SCHEMA`.
* These databases are stored in the Hive metastore.
* Their folders are saved under `/user/hive/warehouse` and have a `.db` extension.
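
As a rough illustration of this layout, the following sketch builds the directory where a managed table would be expected to live. It only models the naming convention described here and does not query the metastore; `WAREHOUSE_ROOT`, `default_schema_dir`, and `managed_table_dir` are hypothetical helper names, and the sketch assumes that tables in the `default` database sit directly under the warehouse directory.

```python
# Default Hive warehouse directory, as described in this section.
WAREHOUSE_ROOT = "dbfs:/user/hive/warehouse"


def default_schema_dir(schema_name: str) -> str:
    """Expected folder for a schema created without a LOCATION clause."""
    if schema_name == "default":
        # Assumption: the default database uses the warehouse root itself.
        return WAREHOUSE_ROOT
    # Other databases get their own folder with a .db suffix.
    return f"{WAREHOUSE_ROOT}/{schema_name}.db"


def managed_table_dir(schema_name: str, table_name: str) -> str:
    """Expected data directory of a managed table in the given schema."""
    return f"{default_schema_dir(schema_name)}/{table_name}"


print(managed_table_dir("default", "country_managed"))
# dbfs:/user/hive/warehouse/country_managed
print(managed_table_dir("new_db", "region_info_managed"))
# dbfs:/user/hive/warehouse/new_db.db/region_info_managed
```

The authoritative value is always the location reported by `DESCRIBE EXTENDED`; the helper is only a way to reason about where to look.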

In [0]:
DROP SCHEMA IF EXISTS new_db CASCADE;
CREATE SCHEMA new_db;

In [0]:
DESCRIBE DATABASE EXTENDED new_db

**Creating tables in the new database**

In [0]:
USE DATABASE new_db;

CREATE TABLE region_info_managed (
  region_name STRING,
  region_code STRING,
  timezone STRING
);

INSERT INTO region_info_managed
VALUES ('Western Europe', 'WE', 'CET');

-- ---------------------------------

CREATE TABLE region_info_ext (
  region_name STRING,
  region_code STRING,
  timezone STRING
)
LOCATION 'dbfs:/table/region_info_ext';

INSERT INTO region_info_ext
VALUES ('Western Europe', 'WE', 'CET');

In [0]:
DESCRIBE EXTENDED region_info_managed;

In [0]:
DESCRIBE EXTENDED region_info_ext;

**Dropping tables**

In [0]:
DROP TABLE region_info_managed;
DROP TABLE region_info_ext;

In [0]:
%fs ls 'dbfs:/user/hive/warehouse/new_db.db/region_info_managed'

In [0]:
%fs ls 'dbfs:/table/region_info_ext'

### Custom-Location Databases

* Databases can also be created in custom locations using the `LOCATION` keyword:<br>
  `CREATE SCHEMA my_db LOCATION 'path'`<br>
* Metadata stays in the Hive metastore, but the database folder is created at the specified path.
* Managed tables in these databases store their data within this custom directory; external tables still use their own `LOCATION`.
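
A minimal sketch of how a table's data directory follows from the database location is shown below. It is an illustration under stated assumptions rather than the metastore's actual resolution logic: `resolve_table_dir` is a hypothetical helper, and it assumes that a table without its own `LOCATION` is placed in a subfolder named after the table.

```python
from typing import Optional


def resolve_table_dir(schema_dir: str, table_name: str,
                      location: Optional[str] = None) -> str:
    """Return the directory where a table's data files are expected.

    schema_dir: folder of the database (default or custom LOCATION).
    location:   explicit LOCATION of an external table, if any.
    """
    if location is not None:
        # External table: data stays at the explicitly given path.
        return location
    # Managed table: data goes into a subfolder of the database directory.
    return f"{schema_dir.rstrip('/')}/{table_name}"


custom_schema_dir = "dbfs:/schemas/custom.db"

print(resolve_table_dir(custom_schema_dir, "language_info_managed"))
# dbfs:/schemas/custom.db/language_info_managed
print(resolve_table_dir(custom_schema_dir, "language_info_ext",
                        location="dbfs:/table/language_info_ext"))
# dbfs:/table/language_info_ext
```

Comparing these values with the location reported by `DESCRIBE EXTENDED` for each table is a quick way to check which tables live inside the custom database folder.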

In [0]:
DROP SCHEMA IF EXISTS custom CASCADE;
CREATE SCHEMA custom
LOCATION 'dbfs:/schemas/custom.db'

In [0]:
DESCRIBE DATABASE EXTENDED custom

**Creating tables**

In [0]:
USE DATABASE custom;
CREATE TABLE language_info_managed (
  language_name STRING,
  iso_language_code STRING,
  is_official BOOLEAN
);

INSERT INTO language_info_managed
VALUES ('French', 'fr', true);

-- ----------------------------------
CREATE TABLE language_info_ext (
  language_name STRING,
  iso_language_code STRING,
  is_official BOOLEAN
)
LOCATION 'dbfs:/table/language_info_ext';

INSERT INTO language_info_ext
VALUES ('French', 'fr', true);

In [0]:
DESCRIBE EXTENDED language_info_managed

In [0]:
DESCRIBE EXTENDED language_info_ext

**Dropping tables**

In [0]:
SELECT * FROM language_info_ext

In [0]:
DROP TABLE language_info_managed;
DROP TABLE language_info_ext;

### Clean up

In [0]:
%sh

rm -r /dbfs/table