
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning">
</div>


# Upgrading Tables to Unity Catalog

In this demo, you will learn essential techniques for upgrading tables to Unity Catalog. The demo covers analyzing existing data structures, applying migration techniques, evaluating transformation options, and upgrading metadata without moving data, using both SQL commands and user interface (UI) tools.

### Learning Objectives
By the end of this demo, you will be able to:
1. Analyze the current catalog, schema, and table structures in your data environment.
2. Execute methods to move data from the Hive metastore to Unity Catalog, including cloning and Create Table As Select (CTAS).
3. Assess and apply necessary data transformations during the migration process.
4. Utilize methods to upgrade table metadata while keeping data in its original location.
5. Perform table upgrades using both SQL commands and user interface tools for efficient data management.

## Prerequisites
To follow along with this demo, you will need:
* Account administrator capabilities
* Cloud resources to support the metastore
* Metastore admin capabilities, to create and manage a catalog

## REQUIRED - SELECT CLASSIC COMPUTE
### ---SERVERLESS COMPUTE WILL NOT WORK WITH THE HIVE_METASTORE---

Before executing cells in this notebook, please select your classic compute cluster in the lab. Be aware that **Serverless** is enabled by default.

Follow these steps to select the classic compute cluster:

1. Navigate to the top-right of this notebook and click the drop-down menu to select your cluster. By default, the notebook will use **Serverless**.

1. If your cluster is available, select it and continue to the next cell. If the cluster is not shown:

  - In the drop-down, select **More**.

  - In the **Attach to an existing compute resource** pop-up, select the first drop-down. You will see a unique cluster name in that drop-down. Please select that cluster.

**NOTE:** If your cluster has terminated, you might need to restart it in order to select it. To do this:

1. Right-click on **Compute** in the left navigation pane and select *Open in new tab*.

1. Find the triangle icon to the right of your compute cluster name and click it.

1. Wait a few minutes for the cluster to start.

1. Once the cluster is running, complete the steps above to select your cluster.

## A. Classroom Setup

Run the following cell to configure your working environment for this course. It will also set your default catalog to your specific catalog and the schema to the schema name shown below using the `USE` statements.
<br></br>


```
USE CATALOG <your catalog>;
USE SCHEMA <your catalog>.<schema>;
```

**NOTE:** The `DA` object is only used in Databricks Academy courses and is not available outside of these courses. It will dynamically reference the information needed to run the course.

In [0]:
%run ./Includes/Classroom-Setup-3

### B. Analyze the List of Available Tables and Views in the Custom Schema
1. Let us analyze the **example** schema within your catalog for the list of tables and views. This has already been set up for you using the setup script. Take note of the tables in your schema.


In [0]:
SELECT current_catalog(), current_schema();

In [0]:
-- Show the list of tables within the custom schema
SHOW TABLES FROM example;

2. Display a list of views in your **example** schema. Take note of the view(s) in your schema.


In [0]:
-- Show the list of views within the custom schema
SHOW VIEWS FROM example;

### B. Exploring the Hive Metastore Source Table

As part of the setup, we now have a table called *movies*, residing in a user-specific schema of the Hive metastore. To make things easier, the schema name in the hive_metastore is stored in a variable named `user_hive_schema` that was created in the classroom setup script.

In [0]:
-- View the value of the user_hive_schema SQL variable
SELECT user_hive_schema;

1. Let's preview the data stored in this table using that variable. Notice how the three-level namespace makes referencing data objects in the Hive metastore seamless.

    Here we will use the `IDENTIFIER()` clause, which enables SQL-injection-safe parameterization of SQL statements and lets you interpret a constant string as a:
    - table or view name
    - function name
    - column name
    - field name

    View the [documentation](https://docs.databricks.com/en/sql/language-manual/sql-ref-names-identifier-clause.html#identifier-clause) for more information.

In [0]:
--  Show the first 10 rows from the movies table residing in the user-specific schema of the Hive metastore

SELECT * 
FROM IDENTIFIER('hive_metastore.' || user_hive_schema || '.movies')
LIMIT 10
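
The `IDENTIFIER()` cell above assembles a three-level name from a string. When a name is assembled in a Python cell instead, a similar safeguard can be approximated by validating each part before quoting it. The following is a minimal sketch, not part of the course setup: `qualified_name` is a hypothetical helper, `my_user_schema` is a placeholder schema name, and the allow-list of characters is an assumption that is deliberately stricter than the identifier rules Databricks actually accepts.

```python
import re

# Hypothetical helper for illustration; it is not part of the DA object.
# Assumption: only letters, digits, and underscores are allowed in each part.
_SAFE_PART = re.compile(r"[A-Za-z0-9_]+")


def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Return a backtick-quoted catalog.schema.table name.

    Raises ValueError if any part contains characters outside the allow-list.
    """
    parts = (catalog, schema, table)
    for part in parts:
        if not _SAFE_PART.fullmatch(part):
            raise ValueError(f"unexpected characters in identifier part: {part!r}")
    return ".".join(f"`{part}`" for part in parts)


# "my_user_schema" stands in for the value of user_hive_schema.
name = qualified_name("hive_metastore", "my_user_schema", "movies")
print(name)
```

Rejecting unexpected characters outright, rather than trying to escape them, keeps the sketch simple; where the `IDENTIFIER()` clause is available, it remains the approach this demo uses.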

## C. Overview of Upgrade Methods

There are a few different ways to upgrade a table, but the method you choose will be driven primarily by how you want to treat the table data. If you wish to leave the table data in place, then the resulting upgraded table will be an external table. If you wish to move the table data into your Unity Catalog metastore, then the resulting table will be a managed table. Consult [this page](https://docs.databricks.com/en/data-governance/unity-catalog/index.html#managed-versus-external-tables-and-volumes) for tips on whether to choose a managed or external table.

### C1. Moving Table Data into the Unity Catalog Metastore

In this approach, table data will be copied from wherever it resides into the managed data storage area for the destination schema, catalog or metastore. The result will be a managed Delta table in your Unity Catalog metastore. 

This approach has two main advantages:
* Managed tables in Unity Catalog can benefit from product optimization features, such as predictive optimization, that may not work well (if at all) on tables that aren't managed
* Moving the data also gives you the opportunity to restructure your tables, in case you want to make any changes

The main disadvantage of this approach is the time and cost of copying the data, particularly for large datasets.

In this section, we cover two different options that will move table data into the Unity Catalog metastore.

#### C1.1 Cloning a Table

Cloning a table is optimal when the source table is Delta (see <a href="https://docs.databricks.com/delta/clone.html" target="_blank">documentation</a> for a full explanation). It's simple to use, it copies metadata, and it lets you either copy the data (deep clone) or leave it in place (shallow clone). Shallow clones can be useful when copying the data isn't worthwhile, for example for short-lived tests or experiments.

1. Run the following cell to check the format of the source table. View the results. Notice the following:

- Referring to the *Provider* row, we see the source is a Delta table. 
- Referring to the *Location* row, we see that the table is stored in DBFS.

In [0]:
-- Describe the properties of the "movies" table in the user-specific schema of the Hive metastore using the extended option for more details.
-- DESCRIBE EXTENDED hive_metastore.yourschema.movies

DESCRIBE EXTENDED IDENTIFIER('hive_metastore.' || user_hive_schema || '.movies')

2. Let's perform a deep clone operation to copy the table from the Hive metastore, creating a destination table named *movies_clone* in the **example** schema within your catalog.

In [0]:
%python
## Deep clone the "movies" table from the user-specific schema of the Hive metastore to create a new table named "movies_clone" in the example schema of the user-specific catalog.

results = spark.sql(f'''
CREATE OR REPLACE TABLE movies_clone 
DEEP CLONE hive_metastore.{DA.user_hive_schema}.movies
''')

display(results)

3. Let's manually view our **example** schema within our catalog.
    1. Select the catalog icon on the left. 

    1. Expand your unique catalog name.

    1. Expand the **example** schema.

    1. Expand **Tables**.

    1. Notice that the **movies** table from the hive metastore has been cloned into your schema as **movies_clone**.
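
The difference between a deep and a shallow clone comes down to a single keyword in the statement. The sketch below only builds the statement text so the two variants can be compared side by side. `clone_statement`, `my_user_schema`, and `movies_shallow` are hypothetical names used for illustration, and whether a shallow clone is allowed for a given source and target combination should be checked against the clone documentation linked above.

```python
def clone_statement(source: str, target: str, deep: bool = True) -> str:
    """Build a CREATE OR REPLACE TABLE ... CLONE statement.

    deep=True  -> DEEP CLONE: copies the data as well as the metadata.
    deep=False -> SHALLOW CLONE: copies metadata and leaves the data in place.
    """
    kind = "DEEP" if deep else "SHALLOW"
    return f"CREATE OR REPLACE TABLE {target} {kind} CLONE {source}"


# Placeholder names; substitute your own schema and table names.
source_table = "hive_metastore.my_user_schema.movies"
deep_sql = clone_statement(source_table, "movies_clone")
shallow_sql = clone_statement(source_table, "movies_shallow", deep=False)

print(deep_sql)
print(shallow_sql)
```

In a notebook cell, the returned string could be passed to `spark.sql(...)` in the same way as the deep clone cell above.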

#### C1.2 Create Table As Select (CTAS)

Using CTAS is a universally applicable technique that simply creates a new table based on the output of a **`SELECT`** statement. This will always copy the data, and no metadata will be copied.

1. Let's copy the table from the hive metastore using this approach, creating a destination table named *movies_ctas* in our catalog within the **example** schema.

In [0]:
-- Copy the "movies" table from the user-specific schema of the Hive metastore to create "movies_ctas" in the user-specific catalog's example schema using CTAS (Create Table As Select)

CREATE OR REPLACE TABLE movies_ctas AS 
SELECT * 
FROM IDENTIFIER('hive_metastore.' || user_hive_schema || '.movies');

2. Run the `SHOW TABLES` statement to view tables in your **example** schema. Notice that the **movies_ctas** table was created in your catalog from the **movies** table in the Hive metastore.

In [0]:
SHOW TABLES IN example;

#### C1.3 Applying Transformations during the Upgrade

CTAS offers an option that other methods do not: the ability to transform the data while copying it.

When migrating your tables to Unity Catalog, it's a great time to consider your table structures and whether they still address your organization's business requirements that may have changed over time.

Cloning, and the CTAS operation we just saw, make an exact copy of the source table. But CTAS can easily be adapted to apply transformations during the upgrade.

For example, the following cell renames columns and upper-cases the `original_language` values while migrating the table from the Hive metastore to Unity Catalog.

In [0]:
-- Copy the "movies" table from Hive metastore to create "movies_transformed" in the user-specific catalog using CTAS with the required transformations
CREATE OR REPLACE TABLE movies_transformed AS 
SELECT
  id AS Movie_ID,
  title AS Movie_Title,
  genres AS Genres,
  upper(original_language) AS Original_Language,
  vote_average AS Vote_Average
FROM IDENTIFIER('hive_metastore.' || user_hive_schema || '.movies');

In [0]:
-- Display the contents of the "movies_transformed" table in the example schema of the user-specific catalog
SELECT * 
FROM movies_transformed;
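
When many tables need the same kind of renaming, the CTAS statement above can be generated from a column mapping rather than written by hand. This is a minimal sketch under stated assumptions: `ctas_statement` is a hypothetical helper, `my_user_schema` is a placeholder, and the mapping is trusted input (the helper does no validation or quoting of its own).

```python
from typing import Mapping


def ctas_statement(target: str, source: str, columns: Mapping[str, str]) -> str:
    """Build a CTAS statement.

    `columns` maps each output column name to the source expression
    that produces it, in the order the columns should appear.
    """
    select_list = ",\n".join(
        f"  {expression} AS {alias}" for alias, expression in columns.items()
    )
    return (
        f"CREATE OR REPLACE TABLE {target} AS\n"
        f"SELECT\n{select_list}\n"
        f"FROM {source}"
    )


# Placeholder source name; the mapping mirrors part of the cell above.
sql = ctas_statement(
    "movies_transformed",
    "hive_metastore.my_user_schema.movies",
    {
        "Movie_ID": "id",
        "Movie_Title": "title",
        "Original_Language": "upper(original_language)",
    },
)
print(sql)
```

Keeping the mapping as data makes it easy to review which columns are renamed or transformed before any data is copied.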


### C2. Upgrade External Tables in Hive Metastore to External Tables in Unity Catalog

**NOTE: This lab environment does not have access to external tables. This is an example of what you can do in your environment.**

We have seen approaches that move table data from its current location into the Unity Catalog metastore. However, when upgrading external tables, some use cases call for leaving the data in place. For example:
* The data location is dictated by an internal or regulatory requirement
* The data format cannot be changed to Delta
* Outside writers must be able to modify the data
* Moving large datasets would take too much time or cost too much

Note the following constraints for this approach:

* Source table must be an external table
* There must be a storage credential referencing the storage container where the source table data resides

In this section, we cover two different options that will upgrade to an external table without moving any table data.

#### C2.1 Using SYNC to Export Hive External Tables to Unity Catalog

The **`SYNC`** SQL command allows us to upgrade **external tables** in Hive Metastore to **external tables** in Unity Catalog.

For more information, see the [SYNC statement](https://docs.databricks.com/en/sql/language-manual/sql-ref-syntax-aux-sync.html#sync) documentation.

**NOTE:** This lab workspace does not enable you to create external tables.
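
Since the lab cannot run `SYNC`, here is a sketch of what the statement could look like in an environment that does have external tables. It only builds the statement text. `sync_table_statement` is a hypothetical helper, and `my_catalog`, `my_user_schema`, and `movies_external` are placeholder names, not objects that exist in this lab. The `DRY RUN` option asks `SYNC` to check whether the table can be upgraded without actually performing the upgrade, so the sketch enables it by default.

```python
def sync_table_statement(target: str, source: str, dry_run: bool = True) -> str:
    """Build a SYNC TABLE statement.

    target: three-level Unity Catalog name for the upgraded external table.
    source: the external table in the Hive metastore.
    dry_run: when True, append DRY RUN so nothing is created or changed.
    """
    statement = f"SYNC TABLE {target} FROM {source}"
    if dry_run:
        statement += " DRY RUN"
    return statement


# Placeholder names for illustration only.
check_sql = sync_table_statement(
    "my_catalog.example.movies_external",
    "hive_metastore.my_user_schema.movies_external",
)
upgrade_sql = sync_table_statement(
    "my_catalog.example.movies_external",
    "hive_metastore.my_user_schema.movies_external",
    dry_run=False,
)
print(check_sql)
print(upgrade_sql)
```

Running the dry-run form first and reviewing its result before issuing the real statement is a cautious default when upgrading tables that other writers depend on.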

#### C2.2 Using Catalog Explorer to Upgrade Tables to Unity Catalog from the Hive Metastore

Let's try upgrading the table using the Catalog Explorer user interface.

1. Select the catalog icon on the left.

1. Expand the **hive_metastore**.

1. Expand your schema name in the hive metastore.

1. Right-click on your schema name and select **Open in Catalog Explorer**.

1. Select the **movies** table (it can be any available table).

1. Click **Upgrade**.

1. Select your destination catalog and schema. 

1. For **Select catalog** select your unique catalog name.

1. For **Select schema** select the **example** schema.

1. For this example, let's leave owner set to the default (your username).

1. Click **Next**.

From here you can run the upgrade, or open a notebook containing the upgrade operations that you can run interactively. For the purpose of the exercise, you don't need to actually run the upgrade since it uses `SYNC` behind the scenes.
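
The trade-offs described in section C can be summarized as a small decision function. This is only a sketch of the guidance given in this demo, not an official rule set: `suggest_upgrade_method` and its parameters are hypothetical, and real migrations may need to weigh factors that are not modeled here.

```python
def suggest_upgrade_method(
    is_external: bool,
    keep_data_in_place: bool,
    source_is_delta: bool,
    needs_transformation: bool,
) -> str:
    """Suggest an upgrade method based on the guidance in this demo."""
    if keep_data_in_place:
        # Upgrading without moving data requires an external source table.
        if not is_external:
            raise ValueError("leaving data in place requires an external source table")
        return "SYNC"
    if needs_transformation or not source_is_delta:
        # CTAS applies to any source and can transform data while copying it.
        return "CTAS"
    # Cloning is optimal for Delta sources and also copies metadata.
    return "DEEP CLONE"


# A managed Delta table copied as-is, like the movies table in this demo.
print(suggest_upgrade_method(
    is_external=False,
    keep_data_in_place=False,
    source_is_delta=True,
    needs_transformation=False,
))
```

The first check reflects the constraint that only external tables can be upgraded in place; the remaining branches follow the cloning and CTAS discussion in section C1.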

## Cleanup
Let's quickly clean up the data in the Hive metastore by running the command below.

In [0]:
%py
DA.cleanup_hive_metastore()

## Conclusion
In this demo, we explored techniques for upgrading tables to Unity Catalog. We analyzed existing data structures, applied migration techniques (cloning and CTAS), evaluated transformation options, and saw how to upgrade metadata without moving data. Using both SQL commands and user interface tools, we covered upgrades that result in either managed or external tables in Unity Catalog. With an understanding of these methods, you are now equipped to choose the upgrade path that fits each of your tables.


&copy; 2025 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the 
<a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/><a href="https://databricks.com/privacy-policy">Privacy Policy</a> | 
<a href="https://databricks.com/terms-of-use">Terms of Use</a> | 
<a href="https://help.databricks.com/">Support</a>