
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning">
</div>


# SQL UDFs


## Learning Objectives
By the end of this lesson, you should be able to:
* Define and register SQL UDFs
* Describe the security model used for sharing SQL UDFs
* Use **`CASE`** / **`WHEN`** expressions in SQL code
* Leverage **`CASE`** / **`WHEN`** expressions in SQL UDFs for custom control flow

## REQUIRED - SELECT CLASSIC COMPUTE

Before executing cells in this notebook, please select your classic compute cluster in the lab. Be aware that **Serverless** is enabled by default.

Follow these steps to select the classic compute cluster:

1. Navigate to the top-right of this notebook and click the drop-down menu to select your cluster. By default, the notebook will use **Serverless**.

1. If your cluster is available, select it and continue to the next cell. If the cluster is not shown:

   - In the drop-down, select **More**.

   - In the **Attach to an existing compute resource** pop-up, open the first drop-down and select the unique cluster name shown there.

**NOTE:** If your cluster has terminated, you might need to restart it in order to select it. To do this:

1. Right-click on **Compute** in the left navigation pane and select *Open in new tab*.

1. Find the triangle icon to the right of your compute cluster name and click it.

1. Wait a few minutes for the cluster to start.

1. Once the cluster is running, complete the steps above to select your cluster.

## Classroom Setup

Run the following cell to configure your working environment for this course. The setup also runs the `USE` statements below, which set your default catalog to **dbacademy** and your default schema to your unique schema name.


```
USE CATALOG dbacademy;
USE SCHEMA dbacademy.<your unique schema name>;
```

**NOTE:** The `DA` object is only used in Databricks Academy courses and is not available outside of these courses. It will dynamically reference the information needed to run the course.

```
%run ./Includes/Classroom-Setup-6
```



## User-Defined Functions

User-defined functions (UDFs) in Spark SQL let you register custom SQL logic as functions in a schema, making that logic reusable anywhere SQL can run on Databricks. Because these functions are written natively in SQL, Spark keeps its usual query optimizations when it applies the custom logic to large datasets.

At minimum, creating a SQL UDF requires a function name, a parameter list (which may be empty), a return type, and a body containing the custom logic.
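As a minimal sketch of those required pieces, the hypothetical function below takes no parameters, declares a return type, and uses a single expression as its body. The name `discount_rate` is an illustration only and is not part of the course setup.

```sql
-- Hypothetical example: a zero-parameter SQL UDF.
-- Name: discount_rate; parameters: none; return type: DOUBLE; body: the RETURN expression.
CREATE OR REPLACE FUNCTION discount_rate()
  RETURNS DOUBLE
  RETURN 0.8;

-- Call it like a built-in function; the parentheses are required even with no arguments.
SELECT discount_rate() AS rate;
```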

Below, a simple function named **`sale_announcement`** takes an **`item_name`** and an **`item_price`** as parameters. It returns a string announcing that the item is on sale at 80% of its original price, rounded to the nearest dollar.

```sql
-- Register the UDF in the current schema
CREATE OR REPLACE FUNCTION sale_announcement(item_name STRING, item_price INT)
  RETURNS STRING
  RETURN concat("The ", item_name, " is on sale for $", round(item_price * 0.8, 0));

-- Apply the UDF to every row of item_lookup
SELECT *, 
  sale_announcement(name, price) AS message 
FROM item_lookup;
```

| item_id | name | price | message |
| --- | --- | --- | --- |
| M_PREM_Q | Premium Queen Mattress | 1795.0 | The Premium Queen Mattress is on sale for $1436 |
| M_STAN_F | Standard Full Mattress | 945.0 | The Standard Full Mattress is on sale for $756 |
| M_PREM_F | Premium Full Mattress | 1695.0 | The Premium Full Mattress is on sale for $1356 |
| M_PREM_T | Premium Twin Mattress | 1095.0 | The Premium Twin Mattress is on sale for $876 |
| M_PREM_K | Premium King Mattress | 1995.0 | The Premium King Mattress is on sale for $1596 |
| P_DOWN_S | Standard Down Pillow | 119.0 | The Standard Down Pillow is on sale for $95 |
| M_STAN_Q | Standard Queen Mattress | 1045.0 | The Standard Queen Mattress is on sale for $836 |
| M_STAN_K | Standard King Mattress | 1195.0 | The Standard King Mattress is on sale for $956 |
| M_STAN_T | Standard Twin Mattress | 595.0 | The Standard Twin Mattress is on sale for $476 |
| P_FOAM_S | Standard Foam Pillow | 59.0 | The Standard Foam Pillow is on sale for $47 |


Note that the function is applied to every value in the column in parallel within the Spark processing engine. Because their logic is expressed in SQL, SQL UDFs are an efficient way to define custom logic that is optimized for execution on Databricks.

## Scoping and Permissions of SQL UDFs
SQL user-defined functions:
- Persist between execution environments (which can include notebooks, DBSQL queries, and jobs).
- Exist as objects in the metastore, inside a schema, and are governed by the same Unity Catalog privilege model as tables and views.

Creating and using them requires the following privileges:
- To **create** a SQL UDF, you need **`USE CATALOG`** on the catalog, and **`USE SCHEMA`** and **`CREATE FUNCTION`** on the schema.
- To **use** a SQL UDF, you need **`USE CATALOG`** on the catalog, **`USE SCHEMA`** on the schema, and **`EXECUTE`** on the function.
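The privileges listed above correspond to Unity Catalog `GRANT` statements. The sketch below assumes a hypothetical group named `data_analysts` and a placeholder schema named `your_schema`; substitute real names. Granting privileges generally requires owning the object or holding a privilege that allows managing it, so these statements may fail in a shared lab environment.

```sql
-- Hypothetical principal `data_analysts` and placeholder schema `your_schema`.
-- Privileges needed to *use* the function:
GRANT USE CATALOG ON CATALOG dbacademy TO `data_analysts`;
GRANT USE SCHEMA ON SCHEMA dbacademy.your_schema TO `data_analysts`;
GRANT EXECUTE ON FUNCTION dbacademy.your_schema.sale_announcement TO `data_analysts`;

-- Additional privilege needed to *create* functions in the schema:
GRANT CREATE FUNCTION ON SCHEMA dbacademy.your_schema TO `data_analysts`;

-- Review the privileges currently granted on the function.
SHOW GRANTS ON FUNCTION dbacademy.your_schema.sale_announcement;
```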

We can use **`DESCRIBE FUNCTION`** to see where a function was registered and basic information about expected inputs and what is returned (and even more information with **`DESCRIBE FUNCTION EXTENDED`**).

```sql
DESCRIBE FUNCTION EXTENDED sale_announcement;
```

```
function_desc
Function: dbacademy.labuser9051024_1738251370.sale_announcement
Type: SCALAR
Input: item_name STRING
       item_price INT
Returns: STRING
Deterministic: true
Data Access: CONTAINS SQL
Configs: spark.databricks.sql.expression.aiFunctions.repartition=0
         spark.databricks.sql.functions.aiForecast.enabled=false
         spark.databricks.sql.functions.aiFunctions.adaptiveThreadPool.clusterSizeBasedGlobalParallelism.scaleFactor=32.0
```


Note that the **`Body`** field near the bottom of the full function description shows the SQL logic used in the function itself.

## Viewing Functions in the Catalog Explorer
After we create a function, it is associated with a schema. We can view the functions associated with a schema in the Catalog Explorer. 
1. Follow [**this link**](explore/data) to open Catalog Explorer in a new tab, or use the **Catalog** link in the left sidebar.
1. Run the following cell to obtain the name of your current catalog and schema. Paste your catalog name into the field marked **Type to filter**.
1. Expand the catalog to reveal its list of schemas by clicking the disclosure triangle to the left of the catalog name.
1. Expand your schema.
1. Note that there is currently one function associated with the schema: **`sale_announcement`**. Select it and explore information about the function we created above.

```sql
SELECT current_catalog(), current_schema();
```

| current_catalog() | current_schema() |
| --- | --- |
| dbacademy | labuser9051024_1738251370 |
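If you prefer to stay in the notebook, `SHOW USER FUNCTIONS` lists user-defined functions from SQL. This is a minimal sketch, assuming your current schema is still the one set during classroom setup; the `'sale*'` pattern is only an illustration.

```sql
-- List user-defined functions in the current schema (built-in functions are excluded).
SHOW USER FUNCTIONS;

-- Narrow the list with a pattern; `*` matches any sequence of characters.
SHOW USER FUNCTIONS LIKE 'sale*';
```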


## Simple Control Flow Functions

Combining SQL UDFs with control flow in the form of **`CASE`** / **`WHEN`** clauses lets you package conditional logic while keeping its execution native and optimized within SQL workloads. The standard SQL **`CASE`** expression evaluates its **`WHEN`** conditions in order and returns the result of the first condition that is true, or the **`ELSE`** value if none match.
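Before wrapping it in a function, it can help to see **`CASE`** / **`WHEN`** on its own. The sketch below runs over a small inline table so it does not depend on `item_lookup`; the tier names and price thresholds are hypothetical.

```sql
-- Standalone CASE expression (no UDF involved).
-- Branches are checked top to bottom; the first true condition wins.
SELECT
  name,
  price,
  CASE
    WHEN price >= 1500 THEN 'premium'
    WHEN price >= 500 THEN 'mid-range'
    ELSE 'budget'
  END AS price_tier
FROM VALUES
  ('Premium King Mattress', 1995.0),
  ('Standard Twin Mattress', 595.0),
  ('Standard Foam Pillow', 59.0) AS t(name, price);
```

A price of 1995.0 also satisfies `price >= 500`, but it is labeled `premium` because that branch is checked first.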

Here, we demonstrate wrapping this control flow logic in a function that will be reusable anywhere we can execute SQL.

```sql
CREATE OR REPLACE FUNCTION item_preference(name STRING, price INT)
  RETURNS STRING
  RETURN CASE 
    WHEN name = "Standard Queen Mattress" THEN "This is my default mattress"
    WHEN name = "Premium Queen Mattress" THEN "This is my favorite mattress"
    WHEN price > 100 THEN concat("I'd wait until the ", name, " is on sale for $", round(price * 0.8, 0))
    ELSE concat("I don't need a ", name)
  END;

SELECT *, 
  item_preference(name, price) 
FROM item_lookup;
```

| item_id | name | price | `dbacademy.labuser9051024_1738251370.item_preference(name, price)` |
| --- | --- | --- | --- |
| M_PREM_Q | Premium Queen Mattress | 1795.0 | This is my favorite mattress |
| M_STAN_F | Standard Full Mattress | 945.0 | I'd wait until the Standard Full Mattress is on sale for $756 |
| M_PREM_F | Premium Full Mattress | 1695.0 | I'd wait until the Premium Full Mattress is on sale for $1356 |
| M_PREM_T | Premium Twin Mattress | 1095.0 | I'd wait until the Premium Twin Mattress is on sale for $876 |
| M_PREM_K | Premium King Mattress | 1995.0 | I'd wait until the Premium King Mattress is on sale for $1596 |
| P_DOWN_S | Standard Down Pillow | 119.0 | I'd wait until the Standard Down Pillow is on sale for $95 |
| M_STAN_Q | Standard Queen Mattress | 1045.0 | This is my default mattress |
| M_STAN_K | Standard King Mattress | 1195.0 | I'd wait until the Standard King Mattress is on sale for $956 |
| M_STAN_T | Standard Twin Mattress | 595.0 | I'd wait until the Standard Twin Mattress is on sale for $476 |
| P_FOAM_S | Standard Foam Pillow | 59.0 | I don't need a Standard Foam Pillow |


While the examples provided here are simple, these same basic principles can be used to add custom computations and logic for native execution in Spark SQL. 

SQL UDFs are especially useful for enterprises migrating users from systems with many stored procedures or custom-defined formulas: a handful of users can define the complex logic once, and everyone else can reuse it in common reporting and analytic queries.
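One way such shared logic builds up is by having SQL UDFs call other SQL UDFs. The sketch below assumes the `sale_announcement` function created earlier in this notebook exists in your current schema; the name `promo_message` and the 1000 price threshold are hypothetical.

```sql
-- Hypothetical UDF that reuses sale_announcement inside a CASE expression.
CREATE OR REPLACE FUNCTION promo_message(item_name STRING, item_price INT)
  RETURNS STRING
  RETURN CASE
    -- Higher-priced items get the existing sale announcement
    WHEN item_price >= 1000 THEN sale_announcement(item_name, item_price)
    -- Everything else is listed at full price
    ELSE concat("The ", item_name, " is available at full price")
  END;

SELECT name, price, promo_message(name, price) AS promo
FROM item_lookup;
```

Because `promo_message` delegates to `sale_announcement`, a change to the discount logic in one function carries through to every query that uses either function.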


&copy; 2025 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the 
<a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/><a href="https://databricks.com/privacy-policy">Privacy Policy</a> | 
<a href="https://databricks.com/terms-of-use">Terms of Use</a> | 
<a href="https://help.databricks.com/">Support</a>