# Functions in Databricks Unity Catalog
## Scalar & Table Functions (SQL & Python)

In this session, we will explore how to create and manage functions within Databricks Unity Catalog.

### Agenda
1.  **Built-in Functions:** Using standard functions provided by Databricks SQL.
2.  **User Defined Functions (UDFs):**
    *   **Scalar Functions:** Return a single value.
    *   **Table Functions:** Return a set of rows (table).
3.  **Language Support:** Writing functions using **SQL** and **Python**.
4.  **Integration:** Using SQL UDFs inside PySpark code.

### Prerequisites
*   A Unity Catalog enabled workspace.
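*   `USE CATALOG`, `USE SCHEMA`, and `CREATE FUNCTION` privileges on the target catalog and schema. The setup cell below also needs `CREATE TABLE` on the schema.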

In [None]:
# Setup: Prepare the Data
# We will create the 'emp' table in a schema to perform our tests.
# Adjust catalog.schema names as per your environment.

catalog_name = "dev"
schema_name = "bronze"

spark.sql(f"CREATE SCHEMA IF NOT EXISTS {catalog_name}.{schema_name}")
spark.sql(f"USE CATALOG {catalog_name}")
spark.sql(f"USE SCHEMA {schema_name}")

# Create dummy employee data
spark.sql(f"""
    CREATE OR REPLACE TABLE {catalog_name}.{schema_name}.emp (
        emp_id STRING,
        emp_name STRING,
        dept_code STRING,
        salary DOUBLE
    )
""")

spark.sql(f"""
    INSERT INTO {catalog_name}.{schema_name}.emp VALUES
    ('1001', 'Subham', 'D101', 15000),
    ('1002', 'Ramesh', 'D101', 12000),
    ('1003', 'Rohan', 'D102', 10000),
    ('1004', 'Rakesh', 'D101', 18000),
    ('1005', 'Satish', 'D103', 10000)
""")

display(spark.sql(f"SELECT * FROM {catalog_name}.{schema_name}.emp"))

## 1. Built-in Functions
Databricks SQL comes with many built-in functions like `upper`, `lower`, `concat`, etc.
A commonly used miscellaneous function is `current_user()`, which returns the name of the user running the query (in Databricks this is usually the user's email address).

In [None]:
-- Built-in function example
SELECT current_user();

## 2. Scalar Functions (SQL)
A **Scalar Function** accepts zero or more inputs and returns a single value (scalar) as output.

### Use Case: Calculate Tax
We will create a function `tax_sql` that takes `salary` as input and returns the tax amount, assuming a flat 30% rate.

**Syntax:**
```sql
CREATE OR REPLACE FUNCTION <function_name>(<param> <type>)
RETURNS <return_type>
LANGUAGE SQL
RETURN <expression>
```

In [None]:
-- Create a Scalar SQL Function to calculate Tax
CREATE OR REPLACE FUNCTION dev.bronze.tax_sql(sal DOUBLE)
RETURNS DOUBLE
LANGUAGE SQL
RETURN sal * 0.3;

In [None]:
-- Validate the function with a static value
SELECT dev.bronze.tax_sql(10000) as tax_amount;

In [None]:
-- Use the function on the table
SELECT 
    emp_id, 
    emp_name, 
    salary, 
    dev.bronze.tax_sql(salary) as tax 
FROM dev.bronze.emp;
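
As a sanity check on the query above, the expected `tax` column can be worked out by hand. The sketch below mirrors `tax_sql` in plain Python and applies it to the salaries inserted in the setup cell. The name `tax_py` is illustrative only and is not registered in Unity Catalog.

```python
# Illustrative local mirror of dev.bronze.tax_sql; not a registered UDF.
TAX_RATE = 0.3  # flat 30% rate assumed by this notebook


def tax_py(sal: float) -> float:
    """Return the tax owed on a salary at the flat rate."""
    return sal * TAX_RATE


# Salaries from the emp rows inserted during setup, keyed by emp_id
salaries = {"1001": 15000, "1002": 12000, "1003": 10000, "1004": 18000, "1005": 10000}

# Expected value of the tax column for each employee
expected_tax = {emp_id: tax_py(sal) for emp_id, sal in salaries.items()}

for emp_id, tax in expected_tax.items():
    print(emp_id, round(tax, 2))
```

If the SQL output differs from these values, the function definition or the table contents have probably changed since setup.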

### Using SQL UDF in PySpark
You can call SQL functions registered in Unity Catalog from PySpark code by passing the call as a SQL expression string to the `expr` function.

In [None]:
from pyspark.sql.functions import expr

# Load the dataframe
df = spark.read.table("dev.bronze.emp")

# Use expr to call the SQL UDF
df_with_tax = df.withColumn("tax", expr("dev.bronze.tax_sql(salary)"))

display(df_with_tax)
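
The setup cell stores the catalog and schema in variables, while the call above hard-codes `dev.bronze`. One way to keep the two in sync is to build the expression string from those variables. `udf_call` below is a hypothetical helper, not part of PySpark. It only formats a string, which would then be passed to `expr` as before.

```python
def udf_call(catalog: str, schema: str, function: str, *args: str) -> str:
    """Build a SQL expression that calls a function by its three-level name.

    Hypothetical helper: it only formats text and does no validation or
    escaping, so pass trusted identifiers and column names only.
    """
    return f"{catalog}.{schema}.{function}({', '.join(args)})"


# Same values as the setup cell
catalog_name = "dev"
schema_name = "bronze"

tax_expr = udf_call(catalog_name, schema_name, "tax_sql", "salary")
print(tax_expr)  # dev.bronze.tax_sql(salary)

# In the notebook, the string would be used exactly as in the cell above:
# df_with_tax = df.withColumn("tax", expr(tax_expr))
```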

## 3. Scalar Functions (Python)
You can also write functions in Python. This is useful when the logic needs Python libraries (such as `requests`) that have no SQL equivalent.

### Use Case: Fetch API Data
We will create a function that takes a fruit name and fetches nutrition info from an external API.

*Note: The compute that runs the function needs outbound internet access to reach the API.*

In [None]:
-- Create a Scalar Python Function
CREATE OR REPLACE FUNCTION dev.bronze.fruit_nutrition(fruit_name STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$
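    import json
    import requests
    try:
        # Call the public Fruityvice API; the timeout stops a slow endpoint from stalling the query
        api_url = f"https://www.fruityvice.com/api/fruit/{fruit_name}"
        response = requests.get(api_url, timeout=10)
        response.raise_for_status()
        data = response.json()
        # Return the nutrition data as a JSON string, or "N/A" if the key is missing
        nutritions = data.get("nutritions")
        return json.dumps(nutritions) if nutritions is not None else "N/A"
    except Exception:
        # Network errors, HTTP error statuses, and invalid JSON all land here
        return "Error fetching data"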
$$

In [None]:
-- Validate Python Function
SELECT dev.bronze.fruit_nutrition('banana') as nutrition_info;
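
Because the body between the `$$` markers is ordinary Python, its logic can be prototyped locally before the function is registered. The sketch below is a stand-in, not the registered UDF: `fruit_nutrition_local`, its `fetch` parameter, and `fake_fetch` are hypothetical names, and the nutrition values are made-up placeholders. A fake fetcher replaces the HTTP call so the fallback path can be exercised without network access, and the result is serialized with `json.dumps` so the returned string is valid JSON.

```python
import json
from typing import Callable


def fruit_nutrition_local(fruit_name: str, fetch: Callable[[str], dict]) -> str:
    """Local stand-in for the UDF body; `fetch` maps a URL to parsed JSON."""
    api_url = f"https://www.fruityvice.com/api/fruit/{fruit_name}"
    try:
        data = fetch(api_url)
        nutritions = data.get("nutritions")
        return json.dumps(nutritions) if nutritions is not None else "N/A"
    except Exception:
        # Same fallback string the UDF returns on any failure
        return "Error fetching data"


def fake_fetch(url: str) -> dict:
    """Hypothetical fetcher with placeholder values, used only for this check."""
    if url.endswith("/banana"):
        return {"name": "Banana", "nutritions": {"calories": 1, "sugar": 2}}
    raise ConnectionError("simulated network failure")


ok = fruit_nutrition_local("banana", fake_fetch)
failed = fruit_nutrition_local("unknown", fake_fetch)
print(ok)
print(failed)
```

Passing the fetcher in as an argument keeps the parsing and error handling testable on their own. In the registered function the HTTP call stays inline.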

## 4. Table Functions
A **Table Function** (or Tabular Function) returns a set of rows (a table) instead of a single value.

### Use Case: Get Employees by Department
We want a function that takes a `dept_code` as input and returns all employees belonging to that department.

**Syntax:**
```sql
CREATE OR REPLACE FUNCTION <name>(<params>)
RETURNS TABLE (<col_name> <type>, ...)
LANGUAGE SQL
RETURN SELECT ...
```

In [None]:
-- Create a Table Function
CREATE OR REPLACE FUNCTION dev.bronze.get_emp(dept STRING)
RETURNS TABLE (emp_id STRING, emp_name STRING)
LANGUAGE SQL
RETURN SELECT emp_id, emp_name FROM dev.bronze.emp WHERE dept_code = dept;

In [None]:
-- Validate Table Function
-- Note: A table function is invoked in the FROM clause, like a table, rather than as an expression in the SELECT list.
SELECT * FROM dev.bronze.get_emp('D101');
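
To see what the call above should return, the same filter can be replayed on the setup rows in plain Python. This is only a sketch of the semantics: `get_emp_local` is an illustrative name, and the real function runs as SQL inside Unity Catalog.

```python
from typing import List, Tuple

# Rows inserted during setup: (emp_id, emp_name, dept_code, salary)
EMP_ROWS = [
    ("1001", "Subham", "D101", 15000.0),
    ("1002", "Ramesh", "D101", 12000.0),
    ("1003", "Rohan", "D102", 10000.0),
    ("1004", "Rakesh", "D101", 18000.0),
    ("1005", "Satish", "D103", 10000.0),
]


def get_emp_local(dept: str) -> List[Tuple[str, str]]:
    """Mirror of get_emp: keep matching rows, project emp_id and emp_name."""
    return [
        (emp_id, emp_name)
        for emp_id, emp_name, dept_code, _salary in EMP_ROWS
        if dept_code == dept
    ]


result = get_emp_local("D101")
for row in result:
    print(row)
```

For `'D101'` this yields the three employees with IDs 1001, 1002, and 1004. A department code with no matches yields an empty result rather than an error.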

### Summary
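*   **Scalar Functions:** return a single value per call, so applying one to a column yields one output value per input row (e.g., tax calculation).
*   **Table Functions:** return a set of rows per call and are queried in the `FROM` clause (e.g., filtering employees by department).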
*   **Unity Catalog** allows you to register these functions centrally and use them across SQL and PySpark.