# Column Masking with UDFs on Materialized Views

## Overview
This notebook demonstrates how to implement **column-level security** on materialized views using masking UDFs (User-Defined Functions). This approach provides an alternative to row-level security, which is not supported on materialized views.

## Use Case
While row-level security is not available for materialized views, you can use column masking to:
* Hide sensitive data from specific users
* Show NULL or masked values based on user identity
* Avoid creating redundant copies of data just for security

## Solution
We'll create a masking UDF that checks `current_user()` and returns NULL for restricted users, then apply this UDF directly in the materialized view definition.

## Step 1: Create a Masking UDF

The UDF checks the current user and returns:
* **Actual value** for `jwneil17@gmail.com`
* **NULL** for all other users (including `john.neil@databricks.com`)

In [0]:
%sql
-- Masking UDF: return the real drop-off timestamp only to the authorized user; everyone else gets NULL
CREATE OR REPLACE FUNCTION mask_dropoff_datetime(dropoff_time TIMESTAMP)
RETURNS TIMESTAMP
RETURN CASE 
  WHEN current_user() = 'jwneil17@gmail.com' THEN dropoff_time
  ELSE NULL
END;
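Hard-coding a single email address is fine for a demo but becomes awkward as the list of authorized users grows. A minimal sketch of a group-based variant follows, assuming an account-level group exists; the group name `trip_data_admins` and the function name `mask_dropoff_datetime_by_group` are hypothetical and not part of the original notebook.

```sql
-- Hypothetical variant: reveal the value to members of an account-level group.
-- 'trip_data_admins' is an assumed group name; replace it with a real one.
CREATE OR REPLACE FUNCTION mask_dropoff_datetime_by_group(dropoff_time TIMESTAMP)
RETURNS TIMESTAMP
RETURN CASE
  WHEN is_account_group_member('trip_data_admins') THEN dropoff_time
  ELSE NULL
END;

-- Sanity check: call the original UDF directly as the current user
SELECT
  current_user() AS querying_user,
  mask_dropoff_datetime(current_timestamp()) AS masked_value;
```

With a group check, granting or revoking access becomes a matter of changing group membership rather than redefining the function.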

## Step 2: Create Materialized View with Column Masking

The materialized view attaches the masking UDF to the column with the `MASK` clause in the column definition, referencing the function by its fully qualified name. This ensures the masking logic is enforced at the view level for every query.

In [0]:
%sql
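-- Reference the UDF by the fully qualified name of the catalog and schema
-- where it was created in Step 1.
-- The MASK clause already applies the UDF to the column, so the SELECT returns
-- the raw column; calling the UDF here as well would mask the stored data itself.
CREATE MATERIALIZED VIEW `main`.`default`.`nyctaxi_trips_masked`
  (
    tpep_pickup_datetime TIMESTAMP,
    tpep_dropoff_datetime TIMESTAMP MASK `john_neil`.`default`.`mask_dropoff_datetime`,
    trip_distance DOUBLE,
    fare_amount DOUBLE,
    pickup_zip INT,
    dropoff_zip INT
  ) AS
SELECT
  tpep_pickup_datetime,
  tpep_dropoff_datetime,
  trip_distance,
  fare_amount,
  pickup_zip,
  dropoff_zip
FROM
  samples.nyctaxi.trips;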
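After the view is created, it can help to confirm what was actually registered before moving on to permissions. The following is a minimal sketch using standard Databricks SQL statements; the exact layout of the `DESCRIBE EXTENDED` output, and how mask metadata is presented in it, may differ between runtimes, so treat it as an inspection aid rather than a guaranteed format.

```sql
-- Recompute the view's contents on demand
REFRESH MATERIALIZED VIEW main.default.nyctaxi_trips_masked;

-- Inspect the column definitions and extended metadata of the view
DESCRIBE EXTENDED main.default.nyctaxi_trips_masked;
```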

## Step 3: Grant Permissions

Grant the necessary permissions to users:
* `USE CATALOG` on the catalog and `USE SCHEMA` on the schema (the Unity Catalog names for the legacy `USAGE` privilege; required to access objects)
* `SELECT` on the materialized view (to query the data)

The masking UDF will automatically apply based on who queries the view.

In [0]:
%sql
-- USE CATALOG / USE SCHEMA are the Unity Catalog names for the legacy USAGE privilege
GRANT USE CATALOG ON CATALOG main TO `john.neil@databricks.com`;
GRANT USE SCHEMA ON SCHEMA main.default TO `john.neil@databricks.com`;

In [0]:
%sql
-- Grant read access on the materialized view to john.neil@databricks.com
GRANT SELECT ON MATERIALIZED VIEW main.default.nyctaxi_trips_masked TO `john.neil@databricks.com`;
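Before testing the mask, it is worth checking that the grants landed as intended. A short sketch using `SHOW GRANTS` follows; it assumes the same principal and object names used in the cells above and that you have sufficient privileges to view grants on these objects.

```sql
-- List what the restricted user has been granted on the catalog and schema
SHOW GRANTS `john.neil@databricks.com` ON CATALOG main;
SHOW GRANTS `john.neil@databricks.com` ON SCHEMA main.default;
```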

## Step 4: Test the Materialized View

Query the materialized view to verify the masking works correctly.

In [0]:
%sql
-- Test the materialized view
SELECT * FROM main.default.nyctaxi_trips_masked LIMIT 10;
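Because `LIMIT 10` shows only a small sample, an aggregate can make the effect of the mask easier to confirm. The query below is a sketch; the column aliases are illustrative. Since `count(column)` skips NULLs, a masked user should see `0` visible drop-off values, while the authorized user should see a count equal to the number of non-NULL drop-off timestamps in the source data.

```sql
-- Run once as each user and compare the results
SELECT
  current_user()               AS querying_user,
  count(*)                     AS total_rows,
  count(tpep_dropoff_datetime) AS visible_dropoff_values
FROM main.default.nyctaxi_trips_masked;
```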

## Results

**When queried by `jwneil17@gmail.com`:**
* All columns visible including actual `tpep_dropoff_datetime` values

**When queried by `john.neil@databricks.com`:**
* The `tpep_dropoff_datetime` column shows NULL (masked)
* All other columns remain visible

---

## Key Takeaways

* **Column masking with UDFs** provides an alternative to row-level security for materialized views
* **No data duplication** required - security is enforced at the view level
* **User-based logic** can be implemented using `current_user()` in the UDF
* **For Delta Live Tables/Lakeflow Pipelines**: Use regular Delta tables instead of materialized views if you need full row-level security support