
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img
    src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png"
    alt="Databricks Learning"
  >
</div>

# 3.2 DEMO: Fine-Grained Access Control using Dynamic Views and Partition Filtering \[Recipient]

## Overview
This demo shows how recipients see different views of the same data under dynamic access controls. You'll see how:

- Row-level security filters data based on recipient identity
- Column-level masking protects sensitive information
- Partition filtering restricts regional or classification access
- Data transformations provide appropriate aggregation levels

**Provider Notebook:** Created source data with comprehensive security controls, dynamic views, and recipient-specific filtering.

**Recipient Notebook (This Notebook):** Access shared data as different recipient types and observe how the same underlying dataset appears differently based on access controls.

### Learning Objectives
By the end of this demo, you will be able to:
1. Mount and query shared data with dynamic access controls
2. Understand how recipient identity affects data visibility
3. Observe data masking and anonymization in action
4. Compare different access levels across recipient types
5. Analyze the impact of security controls on analytics workflows
6. Implement recipient-side data governance practices

## Background

**Scenario:**
You represent different types of organizations receiving customer data from SecureBank Corp:

1. **Regional Partner** (NA/EU/APAC): Analytics partner focused on specific geographic markets
2. **Marketing Agency**: Demographic insights without sensitive financial data
3. **Risk Analytics Firm**: Financial patterns with anonymized customer information
4. **Compliance Auditor**: Full access for regulatory oversight

Each persona will mount different shares and observe how the data appears with varying levels of detail and protection.

## Setup

Run the setup script to configure the demo environment.

In [None]:
%run ./Includes/Demo-Setup-3.2

## Step 1: Regional Partner Access (North America)

As a North America regional partner, you'll see customer data filtered to your geographic region, with contact details and financial fields visible. As the access control matrix in Step 5 shows, SSN stays hidden for this persona.

In [None]:
-- Create catalog for regional partner access
CREATE CATALOG IF NOT EXISTS na_partner_analytics
COMMENT 'North America regional partner analytics catalog';

USE CATALOG na_partner_analytics;

CREATE SCHEMA IF NOT EXISTS customer_insights
COMMENT 'Customer insights for North America region';

USE SCHEMA customer_insights;

In [None]:
-- Mount the regional customer share
CREATE CATALOG IF NOT EXISTS na_regional_customers
USING SHARE secure_bank_provider.regional_customer_data
COMMENT 'Regional customer data shared by SecureBank';

-- Verify the mounted share
SHOW SCHEMAS IN na_regional_customers;

In [None]:
-- Explore regional customer data
-- Note: This would show only North America customers when accessed as 'na_partner'
SELECT * 
FROM na_regional_customers.customer_data.regional_customers_view
LIMIT 10;

**Observation:** Notice that you can see:
- Full customer names
- Email addresses
- Phone numbers
- Account balances
- Credit scores
- But ONLY for North America region

In [None]:
-- Regional customer analytics
SELECT 
    region,
    country,
    COUNT(*) as customer_count,
    ROUND(AVG(account_balance), 2) as avg_balance,
    ROUND(AVG(credit_score), 2) as avg_credit_score,
    COUNT(CASE WHEN is_vip THEN 1 END) as vip_customers
FROM na_regional_customers.customer_data.regional_customers_view
GROUP BY region, country
ORDER BY customer_count DESC;

In [None]:
-- Transaction analysis for regional customers
SELECT 
    DATE(transaction_date) as transaction_date,
    COUNT(*) as transaction_count,
    ROUND(SUM(amount), 2) as total_amount,
    ROUND(AVG(amount), 2) as avg_amount,
    COUNT(DISTINCT customer_id) as unique_customers
FROM na_regional_customers.customer_data.regional_transactions_view
WHERE transaction_date >= CURRENT_DATE - INTERVAL 30 DAYS
GROUP BY DATE(transaction_date)
ORDER BY transaction_date DESC
LIMIT 10;

## Step 2: Marketing Agency Access

As a marketing agency, you'll see demographics and behavior patterns without sensitive PII or financial data.

In [None]:
-- Create catalog for marketing agency access
CREATE CATALOG IF NOT EXISTS marketing_insights
USING SHARE secure_bank_provider.marketing_safe_data
COMMENT 'Marketing-safe customer demographics';

-- Explore marketing data structure
DESCRIBE EXTENDED marketing_insights.customer_data.marketing_customers_view;

In [None]:
-- View marketing-safe customer data
SELECT * 
FROM marketing_insights.customer_data.marketing_customers_view
LIMIT 10;

**Observation:** Notice the data transformations:
- No SSN or phone numbers visible
- Email addresses are masked (e.g., j***@example.com)
- Exact dates of birth replaced with age groups (18-24, 25-34, etc.)
- Account balances replaced with tiers (Bronze, Silver, Gold, Platinum)
- Credit scores not visible
- Customer IDs are hashed for anonymity
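The transformations listed above can be tried without provider access. The following is a minimal sketch that applies comparable expressions to two made-up rows; the sample values are hypothetical, and the bucket labels and tier thresholds follow the conceptual provider view in Step 6 rather than any confirmed provider logic.

```sql
-- Hypothetical sample rows; nothing here reads from the share
WITH sample_customers AS (
  SELECT * FROM VALUES
    ('jane.doe@example.com', DATE'1990-06-15', 120000.00),
    ('raj@example.org',      DATE'2001-01-20',  18000.00)
  AS t(email, date_of_birth, account_balance)
)
SELECT
  -- Keep the first character and the domain, hide the rest of the local part
  CONCAT(LEFT(email, 1), '***@', SPLIT(email, '@')[1]) AS masked_email,
  -- Replace the exact date of birth with an age bucket (age in whole years)
  CASE
    WHEN FLOOR(MONTHS_BETWEEN(CURRENT_DATE, date_of_birth) / 12) < 25 THEN '18-24'
    WHEN FLOOR(MONTHS_BETWEEN(CURRENT_DATE, date_of_birth) / 12) < 35 THEN '25-34'
    WHEN FLOOR(MONTHS_BETWEEN(CURRENT_DATE, date_of_birth) / 12) < 45 THEN '35-44'
    WHEN FLOOR(MONTHS_BETWEEN(CURRENT_DATE, date_of_birth) / 12) < 55 THEN '45-54'
    ELSE '55+'
  END AS age_group,
  -- Replace the exact balance with a tier
  CASE
    WHEN account_balance >= 100000 THEN 'Platinum'
    WHEN account_balance >= 50000 THEN 'Gold'
    WHEN account_balance >= 25000 THEN 'Silver'
    ELSE 'Bronze'
  END AS account_tier
FROM sample_customers;
```

The first row's email comes back as `j***@example.com` and its balance as `Platinum`; the age bucket depends on the date the query runs.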

In [None]:
-- Marketing demographic analysis
SELECT 
    region,
    age_group,
    account_tier,
    COUNT(*) as customer_count,
    COUNT(CASE WHEN is_vip THEN 1 END) as vip_customers,
    ROUND(COUNT(CASE WHEN is_vip THEN 1 END) * 100.0 / COUNT(*), 2) as vip_percentage
FROM marketing_insights.customer_data.marketing_customers_view
GROUP BY region, age_group, account_tier
ORDER BY region, age_group, customer_count DESC;

In [None]:
-- Campaign effectiveness analysis
SELECT 
    region,
    account_tier,
    COUNT(*) as total_customers,
    COUNT(CASE WHEN last_activity >= CURRENT_DATE - INTERVAL 30 DAYS THEN 1 END) as active_last_30_days,
    ROUND(COUNT(CASE WHEN last_activity >= CURRENT_DATE - INTERVAL 30 DAYS THEN 1 END) * 100.0 / COUNT(*), 2) as activity_rate
FROM marketing_insights.customer_data.marketing_customers_view
GROUP BY region, account_tier
ORDER BY activity_rate DESC;

## Step 3: Risk Analytics Firm Access

As a risk analytics firm, you'll see financial patterns and credit information but with customer identities anonymized.

In [None]:
-- Mount risk analytics share
CREATE CATALOG IF NOT EXISTS risk_analytics_data
USING SHARE secure_bank_provider.risk_analytics_share
COMMENT 'Risk analytics with anonymized customer data';

-- Explore the risk data structure
SHOW TABLES IN risk_analytics_data.financial_data;

In [None]:
-- View anonymized customer financial data
SELECT * 
FROM risk_analytics_data.financial_data.risk_customers_view
LIMIT 10;

**Observation:** Risk analytics view shows:
- Anonymized customer IDs (hashed)
- No PII (names, emails, phone numbers, SSN)
- Full financial data (account balances, credit scores)
- Transaction patterns
- Risk indicators
- Geographic region (for regional risk analysis)
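Hashed IDs stay useful for analysis because the same input always produces the same hash, so customer and transaction rows can still be joined. The sketch below illustrates this on made-up rows; the CTE names and the unsalted `SHA2` call are assumptions for illustration, not the provider's confirmed implementation.

```sql
-- Hypothetical rows standing in for the provider's raw tables
WITH customers AS (
  SELECT * FROM VALUES
    (1001, 'EU', 712),
    (1002, 'NA', 640)
  AS t(customer_id, region, credit_score)
),
transactions AS (
  SELECT * FROM VALUES
    (1001, 250.00),
    (1001, 75.50),
    (1002, 1200.00)
  AS t(customer_id, amount)
),
-- Each "view" hashes the ID independently; the raw ID is never exposed
risk_customers AS (
  SELECT SHA2(CAST(customer_id AS STRING), 256) AS anonymized_customer_id,
         region,
         credit_score
  FROM customers
),
risk_transactions AS (
  SELECT SHA2(CAST(customer_id AS STRING), 256) AS anonymized_customer_id,
         amount
  FROM transactions
)
-- The join still lines up because hashing is deterministic
SELECT
  c.anonymized_customer_id,
  c.region,
  c.credit_score,
  COUNT(*) AS transaction_count,
  SUM(t.amount) AS total_amount
FROM risk_customers c
JOIN risk_transactions t
  ON c.anonymized_customer_id = t.anonymized_customer_id
GROUP BY c.anonymized_customer_id, c.region, c.credit_score;
```

An unsalted hash over a small, guessable ID range can be reversed by hashing every candidate ID, so a provider may well add a secret salt before hashing.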

In [None]:
-- Credit risk segmentation
SELECT 
    CASE 
        WHEN credit_score >= 750 THEN 'Excellent (750+)'
        WHEN credit_score >= 700 THEN 'Good (700-749)'
        WHEN credit_score >= 650 THEN 'Fair (650-699)'
        WHEN credit_score >= 600 THEN 'Poor (600-649)'
        ELSE 'Very Poor (<600)'
    END as credit_category,
    COUNT(*) as customer_count,
    ROUND(AVG(account_balance), 2) as avg_balance,
    ROUND(MIN(account_balance), 2) as min_balance,
    ROUND(MAX(account_balance), 2) as max_balance,
    ROUND(STDDEV(account_balance), 2) as balance_stddev
FROM risk_analytics_data.financial_data.risk_customers_view
GROUP BY credit_category
ORDER BY MIN(credit_score) DESC;

In [None]:
-- Transaction pattern analysis
SELECT 
    anonymized_customer_id,
    COUNT(*) as transaction_count,
    ROUND(SUM(amount), 2) as total_amount,
    ROUND(AVG(amount), 2) as avg_transaction,
    ROUND(MAX(amount), 2) as max_transaction,
    DATEDIFF(MAX(transaction_date), MIN(transaction_date)) as days_active
FROM risk_analytics_data.financial_data.risk_transactions_view
GROUP BY anonymized_customer_id
ORDER BY total_amount DESC
LIMIT 20;

In [None]:
-- Risk scoring by region and balance tier
SELECT 
    region,
    CASE 
        WHEN account_balance >= 100000 THEN 'High Net Worth'
        WHEN account_balance >= 50000 THEN 'Upper Middle'
        WHEN account_balance >= 25000 THEN 'Middle'
        WHEN account_balance >= 10000 THEN 'Lower Middle'
        ELSE 'Entry Level'
    END as wealth_segment,
    COUNT(*) as customers,
    ROUND(AVG(credit_score), 2) as avg_credit_score,
    COUNT(CASE WHEN credit_score < 650 THEN 1 END) as at_risk_customers,
    ROUND(COUNT(CASE WHEN credit_score < 650 THEN 1 END) * 100.0 / COUNT(*), 2) as risk_percentage
FROM risk_analytics_data.financial_data.risk_customers_view
GROUP BY region, wealth_segment
ORDER BY region, MIN(account_balance) DESC;

## Step 4: Compliance Auditor Access

As a compliance auditor, you have the highest level of access for regulatory oversight purposes.

In [None]:
-- Mount compliance auditor share
CREATE CATALOG IF NOT EXISTS compliance_audit_data
USING SHARE secure_bank_provider.compliance_full_access
COMMENT 'Full access for compliance and regulatory audit';

-- Explore audit data structure
SHOW SCHEMAS IN compliance_audit_data;

In [None]:
-- View full customer data with all PII and financial information
SELECT * 
FROM compliance_audit_data.audit_data.full_customer_view
LIMIT 5;

**Observation:** Compliance view provides:
- Complete PII (names, SSN, DOB, contact info)
- Full financial data (balances, credit scores)
- All regions and countries
- Audit trails and timestamps
- Data quality indicators
- Regulatory compliance flags

In [None]:
-- Compliance audit: Check data coverage by region
SELECT 
    region,
    COUNT(*) as total_customers,
    COUNT(CASE WHEN email IS NOT NULL AND email != '' THEN 1 END) as with_email,
    COUNT(CASE WHEN phone IS NOT NULL AND phone != '' THEN 1 END) as with_phone,
    COUNT(CASE WHEN ssn IS NOT NULL AND ssn != '' THEN 1 END) as with_ssn,
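    -- Use the same non-empty test as with_email / with_ssn so the counts and percentages agree
    ROUND(COUNT(CASE WHEN email IS NOT NULL AND email != '' THEN 1 END) * 100.0 / COUNT(*), 2) as email_coverage,
    ROUND(COUNT(CASE WHEN ssn IS NOT NULL AND ssn != '' THEN 1 END) * 100.0 / COUNT(*), 2) as ssn_coverage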
FROM compliance_audit_data.audit_data.full_customer_view
GROUP BY region
ORDER BY total_customers DESC;

In [None]:
-- Compliance audit: Identify high-risk accounts
SELECT 
    customer_id,
    full_name,
    region,
    country,
    credit_score,
    account_balance,
    last_transaction_date,
    DATEDIFF(CURRENT_DATE, last_transaction_date) as days_inactive
FROM compliance_audit_data.audit_data.full_customer_view
WHERE 
    credit_score < 600
    OR account_balance < 0
    OR DATEDIFF(CURRENT_DATE, last_transaction_date) > 180
ORDER BY credit_score ASC, account_balance ASC
LIMIT 20;

In [None]:
-- Compliance audit: Large transaction monitoring
SELECT 
    t.transaction_id,
    c.full_name,
    c.ssn,
    t.amount,
    t.transaction_date,
    t.transaction_type,
    c.region,
    c.country
FROM compliance_audit_data.audit_data.full_transactions_view t
JOIN compliance_audit_data.audit_data.full_customer_view c
    ON t.customer_id = c.customer_id
WHERE t.amount > 10000
ORDER BY t.amount DESC, t.transaction_date DESC
LIMIT 20;

## Step 5: Comparison of Access Levels

Let's create a comparison matrix to understand what each recipient type can see.

### Access Control Matrix

| Data Element | Regional Partner | Marketing Agency | Risk Analytics | Compliance Auditor |
|--------------|-----------------|------------------|----------------|--------------------|
| Customer ID | ✅ Original | 🔒 Hashed | 🔒 Hashed | ✅ Original |
| Full Name | ✅ Visible | ❌ Hidden | ❌ Hidden | ✅ Visible |
| Email | ✅ Visible | 🔒 Masked | ❌ Hidden | ✅ Visible |
| Phone | ✅ Visible | ❌ Hidden | ❌ Hidden | ✅ Visible |
| SSN | ❌ Hidden | ❌ Hidden | ❌ Hidden | ✅ Visible |
| Date of Birth | ✅ Visible | 🔒 Age Group | ❌ Hidden | ✅ Visible |
| Region | ✅ NA Only | ✅ All | ✅ All | ✅ All |
| Account Balance | ✅ Exact | 🔒 Tier | ✅ Exact | ✅ Exact |
| Credit Score | ✅ Visible | ❌ Hidden | ✅ Visible | ✅ Visible |
| VIP Status | ✅ Visible | ✅ Visible | ✅ Visible | ✅ Visible |
| Transactions | ✅ Full | 🔒 Aggregated | ✅ Full | ✅ Full |

**Legend:**
- ✅ Full access
- 🔒 Transformed/Masked
- ❌ No access
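One way to check the matrix against what is actually mounted is to list the columns each view exposes. This sketch uses the catalog and view names from the earlier steps; a column listed as hidden in the matrix should simply be absent from the output.

```sql
-- Compare the columns of each mounted view with the matrix above.
-- A notebook cell displays only the last statement's result,
-- so run each statement in its own cell.
SHOW COLUMNS IN na_regional_customers.customer_data.regional_customers_view;

SHOW COLUMNS IN marketing_insights.customer_data.marketing_customers_view;

SHOW COLUMNS IN risk_analytics_data.financial_data.risk_customers_view;

SHOW COLUMNS IN compliance_audit_data.audit_data.full_customer_view;
```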

## Step 6: Understanding Dynamic View Implementation

Let's explore how these access controls are implemented using dynamic views.

In [None]:
-- Example of what a marketing-safe view might look like (conceptual)
-- This would be created by the provider, so it is commented out here

/*
CREATE VIEW marketing_customers_view AS
SELECT 
    -- SHA2 expects string or binary input, so cast the ID first
    SHA2(CAST(customer_id AS STRING), 256) as anonymized_id,
    CONCAT(LEFT(first_name, 1), '***') as masked_name,
    CONCAT(LEFT(email, 1), '***@', SPLIT(email, '@')[1]) as masked_email,
    -- Age in whole years; a plain YEAR() difference overstates age before the birthday
    CASE 
        WHEN FLOOR(MONTHS_BETWEEN(CURRENT_DATE, date_of_birth) / 12) < 25 THEN '18-24'
        WHEN FLOOR(MONTHS_BETWEEN(CURRENT_DATE, date_of_birth) / 12) < 35 THEN '25-34'
        WHEN FLOOR(MONTHS_BETWEEN(CURRENT_DATE, date_of_birth) / 12) < 45 THEN '35-44'
        WHEN FLOOR(MONTHS_BETWEEN(CURRENT_DATE, date_of_birth) / 12) < 55 THEN '45-54'
        ELSE '55+'
    END as age_group,
    region,
    country,
    CASE 
        WHEN account_balance >= 100000 THEN 'Platinum'
        WHEN account_balance >= 50000 THEN 'Gold'
        WHEN account_balance >= 25000 THEN 'Silver'
        ELSE 'Bronze'
    END as account_tier,
    is_vip,
    last_activity
FROM customers
*/
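The view above applies the same transformations to every reader. A dynamic view goes one step further and varies its output by who is querying. The following provider-side sketch assumes the Databricks `CURRENT_RECIPIENT` function and a recipient property named `region`; the table, column, and property names are hypothetical, and the statement cannot be run from the recipient workspace.

```sql
-- Provider-side sketch only; names are assumptions, not the provider's actual objects.
-- Assumes each recipient carries a 'region' property, set with something like:
--   ALTER RECIPIENT na_partner SET PROPERTIES ('region' = 'North America');
CREATE OR REPLACE VIEW regional_customers_view AS
SELECT
    customer_id,
    full_name,
    email,
    region,
    country,
    account_balance,
    credit_score,
    is_vip
FROM customers
-- Resolves to the property value of whichever recipient runs the query,
-- so one view definition serves every regional partner
WHERE region = CURRENT_RECIPIENT('region');
```

With this pattern the provider maintains a single view, and onboarding a new regional partner only requires creating a recipient with the right property value.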

## Step 7: Partition Filtering in Action

Partition filtering limits a share to selected partitions of a table, so only data from those partitions is served to the recipient.

In [None]:
-- Regional partner can only query their region
-- Attempting to query other regions returns no data
SELECT region, COUNT(*) as count
FROM na_regional_customers.customer_data.regional_customers_view
GROUP BY region;

-- Result: Only 'North America' appears, even if you try to filter for 'Europe'

In [None]:
-- Verify partition filtering - this returns empty for non-NA regions
SELECT * 
FROM na_regional_customers.customer_data.regional_customers_view
WHERE region = 'Europe'
LIMIT 10;

-- Result: Empty result set, because partition filtering prevents access

**Key Insight:** Partition filtering is enforced by the provider's share definition, not by anything in the recipient's query. Data outside the shared partitions is never served to the recipient, so no recipient-side query, with or without a `WHERE` clause, can reach it.
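For context, here is a sketch of how a provider might declare such a restriction when adding a table to a share. It assumes the Databricks `ALTER SHARE ... ADD TABLE ... PARTITION` clause, a source table partitioned by `region`, and hypothetical schema and table names. Partition specifications apply to shared tables, so this shows a table rather than the view queried above.

```sql
-- Provider-side sketch only; cannot be run from the recipient workspace.
-- customer_data.customers is a hypothetical table partitioned by region.
ALTER SHARE regional_customer_data
ADD TABLE customer_data.customers
PARTITION (region = 'North America');
```

A static value like this needs one share per region. Databricks also documents a form of the clause driven by recipient properties, which lets one share serve several regions.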

## Step 8: Data Quality and Governance

Recipients should implement their own governance practices on shared data.

In [None]:
-- Create a local view that adds recipient-side business logic
CREATE OR REPLACE VIEW na_partner_analytics.customer_insights.high_value_customers AS
SELECT 
    customer_id,
    full_name,
    email,
    region,
    country,
    account_balance,
    credit_score,
    is_vip,
    CASE 
        WHEN account_balance >= 100000 AND credit_score >= 750 THEN 'Premier'
        WHEN account_balance >= 50000 AND credit_score >= 700 THEN 'Elite'
        WHEN account_balance >= 25000 THEN 'Preferred'
        ELSE 'Standard'
    END as customer_segment
FROM na_regional_customers.customer_data.regional_customers_view
WHERE account_balance >= 25000;

In [None]:
-- Query the recipient-side enhanced view
SELECT 
    customer_segment,
    COUNT(*) as customer_count,
    ROUND(AVG(account_balance), 2) as avg_balance,
    ROUND(AVG(credit_score), 2) as avg_credit_score
FROM na_partner_analytics.customer_insights.high_value_customers
GROUP BY customer_segment
ORDER BY MIN(account_balance) DESC;

## Step 9: Monitoring Data Freshness

Recipients should monitor the freshness of shared data.

In [None]:
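-- Inspect metadata for the shared object
-- Note: DESCRIBE DETAIL (which reports lastModified) is only supported for tables.
-- This object is a view, so use DESCRIBE EXTENDED and infer freshness from the data itself.
DESCRIBE EXTENDED na_regional_customers.customer_data.regional_customers_view;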

In [None]:
-- Monitor data volume trends
SELECT 
    DATE_TRUNC('week', last_activity) as week,
    COUNT(*) as active_customers,
    COUNT(DISTINCT country) as countries_active
FROM na_regional_customers.customer_data.regional_customers_view
WHERE last_activity >= CURRENT_DATE - INTERVAL 90 DAYS
GROUP BY DATE_TRUNC('week', last_activity)
ORDER BY week DESC;
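The weekly trend can be reduced to a single freshness signal by looking at the most recent activity in the share. This sketch assumes `last_activity` advances as the provider loads new data, which is worth confirming with the provider before alerting on it.

```sql
-- Freshness proxy: how old is the newest activity visible through the share?
SELECT
    MAX(last_activity) AS latest_activity,
    DATEDIFF(CURRENT_DATE, MAX(last_activity)) AS days_since_latest_activity,
    COUNT(*) AS row_count
FROM na_regional_customers.customer_data.regional_customers_view;
```

A scheduled job could compare `days_since_latest_activity` against an agreed threshold and notify the provider when it is exceeded.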

## Step 10: Best Practices for Recipients

Key best practices when working with fine-grained access controlled data:

### Best Practices

1. **Understand Your Access Level**
   - Always verify what data you can see
   - Document access restrictions in your code
   - Don't assume you have complete data

2. **Respect Data Masking**
   - Don't attempt to reverse-engineer masked data
   - Don't combine multiple masked datasets to re-identify individuals
   - Follow privacy regulations (GDPR, CCPA, etc.)

3. **Cache Appropriately**
   - Cache aggregated results, not raw PII
   - Set appropriate TTLs on cached data
   - Honor data retention policies

4. **Document Limitations**
   - Document which analyses are possible with your access level
   - Note any biases introduced by filtering
   - Track data lineage

5. **Monitor Quality**
   - Check for data freshness
   - Validate completeness within your scope
   - Report issues to provider

6. **Implement Recipient Governance**
   - Apply your own security controls
   - Create derived views with additional protection
   - Audit access to shared data
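As an illustration of practices 3 and 6, the sketch below creates a derived view without direct identifiers and grants access to that view only. The view name and the `na_analysts` group are hypothetical, and the column list assumes the regional view exposes the columns used in Step 8.

```sql
-- Derived view for downstream analysts: direct identifiers are dropped
-- and the customer ID is replaced with a hash
CREATE OR REPLACE VIEW na_partner_analytics.customer_insights.customers_deidentified AS
SELECT
    SHA2(CAST(customer_id AS STRING), 256) AS customer_key,
    region,
    country,
    account_balance,
    credit_score,
    is_vip
FROM na_regional_customers.customer_data.regional_customers_view;

-- Grant read access on the derived view only, not on the mounted share.
-- `na_analysts` is a hypothetical group; its members also need
-- USE CATALOG and USE SCHEMA on the containing catalog and schema.
GRANT SELECT ON TABLE na_partner_analytics.customer_insights.customers_deidentified
TO `na_analysts`;
```

Keeping analysts on the derived view means names and email addresses never reach downstream caches or exports.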

## Cleanup

Clean up the demo resources when finished.

In [None]:
-- Uncomment to drop catalogs when finished
-- DROP CATALOG IF EXISTS na_regional_customers CASCADE;
-- DROP CATALOG IF EXISTS marketing_insights CASCADE;
-- DROP CATALOG IF EXISTS risk_analytics_data CASCADE;
-- DROP CATALOG IF EXISTS compliance_audit_data CASCADE;
-- DROP CATALOG IF EXISTS na_partner_analytics CASCADE;

## Summary

In this demo, you experienced fine-grained access control from the recipient perspective:

- ✅ **Row-Level Security**: Regional partners saw only their geographic data
- ✅ **Column-Level Masking**: Marketing agencies saw masked PII
- ✅ **Data Anonymization**: Risk analysts saw financial data without identity
- ✅ **Full Access Control**: Compliance auditors saw complete data
- ✅ **Partition Filtering**: Share-level partition restrictions kept out-of-region data from being served
- ✅ **Dynamic Views**: Same source data, different recipient experiences

## Key Takeaways

1. **Flexible Security**: One dataset can serve multiple use cases securely
2. **Automatic Enforcement**: Access controls apply automatically at query time
3. **No Data Duplication**: Provider maintains single source of truth
4. **Privacy by Design**: Values that a view masks or filters out are never sent to the recipient
5. **Recipient Transparency**: Recipients understand their access level
6. **Compliance Ready**: Supports regulatory requirements (GDPR, HIPAA, etc.)

## Real-World Applications

**Healthcare**: Share patient data with researchers (anonymized) and providers (full PII)

**Financial Services**: Share transaction data with regulators (full) and analysts (anonymized)

**Retail**: Share sales data with vendors (product-level) and partners (aggregated)

**Government**: Share public records with citizens (redacted) and agencies (full)

## Next Steps

1. Learn about Change Data Feed (CDF) for tracking updates
2. Implement custom recipient-side data quality checks
3. Explore audit logging and monitoring
4. Study regulatory compliance patterns

---
&copy; 2025 Databricks, Inc. All rights reserved. Apache, Apache Spark, Spark, the Spark Logo, Apache Iceberg, Iceberg, and the Apache Iceberg logo are trademarks of the <a href="https://www.apache.org/" target="_blank">Apache Software Foundation</a>.<br/><br/><a href="https://databricks.com/privacy-policy" target="_blank">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use" target="_blank">Terms of Use</a> | <a href="https://help.databricks.com/" target="_blank">Support</a>