# D2O Delta Sharing - Recipient Demo

## Overview
This notebook demonstrates how to access data shared through Databricks Delta Sharing as an **open recipient** (a recipient who is not a Databricks user).

In this demo, we'll:
1. Load credentials from the config file
2. Connect to the Delta Share
3. List available shares, schemas, and tables
4. Query the shared data using pandas
5. Create a visualization using seaborn

**Prerequisites:**
- The provider has created a share and recipient
- You have received the credential file (`config.share`)
- This notebook is running in a Docker container with the necessary libraries

## Step 1: Import Required Libraries

In [None]:
import delta_sharing
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import warnings

# Configure matplotlib and seaborn
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
warnings.filterwarnings('ignore')  # Silence all library warnings to keep the demo output clean

print("✓ Libraries imported successfully")
print(f"Delta Sharing version: {delta_sharing.__version__}")
print(f"Pandas version: {pd.__version__}")

## Step 2: Load Delta Sharing Credentials

The credentials are mounted as a file at `/tmp/config.share` when running the Docker container.

In [None]:
# Path to the mounted credential file
config_file_path = '/tmp/config.share'

# Create a SharingClient using the mounted config file
client = delta_sharing.SharingClient(config_file_path)

print("✓ Delta Sharing client initialized successfully")
print(f"✓ Using config from: {config_file_path}")
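Before relying on the client, it can help to confirm that the mounted file is a well-formed profile. The sketch below assumes a bearer-token profile with the `shareCredentialsVersion`, `endpoint`, and `bearerToken` fields described by the Delta Sharing protocol. The helper name `check_profile` is hypothetical and is not part of the `delta_sharing` library.

```python
import json
from pathlib import Path

# Fields expected in a bearer-token profile file (assumption: no OAuth profile).
REQUIRED_FIELDS = ("shareCredentialsVersion", "endpoint", "bearerToken")


def check_profile(path):
    """Return a list of problems found in the profile file (empty if none)."""
    profile_path = Path(path)
    if not profile_path.is_file():
        return [f"profile file not found: {profile_path}"]
    try:
        profile = json.loads(profile_path.read_text())
    except json.JSONDecodeError as exc:
        return [f"profile is not valid JSON: {exc}"]
    if not isinstance(profile, dict):
        return ["profile must be a JSON object"]
    # Report missing or empty fields by name only; never print the token itself.
    return [f"missing field: {name}" for name in REQUIRED_FIELDS if not profile.get(name)]


problems = check_profile('/tmp/config.share')
print(problems or "✓ Profile file looks well-formed")
```

The check only inspects the file's structure. It does not contact the sharing server, so an expired or revoked token would still pass.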

## Step 3: List Available Shares

Let's discover what shares are available to us.

In [None]:
# List all available shares
shares = client.list_shares()

print(f"✓ Found {len(shares)} share(s)\n")
for share in shares:
    print(f"Share: {share.name}")
    if hasattr(share, 'id'):
        print(f"  ID: {share.id}")

## Step 4: List Schemas in the Share

Now let's see what schemas (databases) are available in the share.

In [None]:
# Get the first share (assuming it's the external_retail share)
share_name = shares[0].name
print(f"Working with share: {share_name}\n")

# List schemas in the share
schemas = client.list_schemas(delta_sharing.Share(name=share_name))

print(f"✓ Found {len(schemas)} schema(s)\n")
for schema in schemas:
    print(f"Schema: {schema.name}")
    if hasattr(schema, 'share'):
        print(f"  Share: {schema.share}")

## Step 5: List Tables in the Schema

Let's see what tables are available for us to query.

In [None]:
# Get the first schema
schema_name = schemas[0].name
print(f"Working with schema: {schema_name}\n")

# List all tables in the schema
tables = client.list_tables(delta_sharing.Schema(name=schema_name, share=share_name))

print(f"✓ Found {len(tables)} table(s)\n")
for i, table in enumerate(tables, 1):
    print(f"{i}. Table: {table.name}")
    if hasattr(table, 'share'):
        print(f"   Share: {table.share}")
    if hasattr(table, 'schema'):
        print(f"   Schema: {table.schema}")
    print()
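The share, schema, and table walk in Steps 3 to 5 can also be done in one call with `SharingClient.list_all_tables()`, which returns every table the recipient can access. As a minimal sketch, the hypothetical helper `all_table_urls` below turns that listing into the `<profile-file>#<share>.<schema>.<table>` URLs that `load_as_pandas` expects. It only assumes the client object has a `list_all_tables()` method whose results carry `share`, `schema`, and `name` attributes.

```python
def all_table_urls(client, profile_path):
    """Return one '<profile>#<share>.<schema>.<table>' URL per accessible table."""
    return [
        f"{profile_path}#{t.share}.{t.schema}.{t.name}"
        for t in client.list_all_tables()
    ]
```

In this notebook it could be called as `all_table_urls(client, config_file_path)`, which avoids assuming that the first share and first schema are the ones of interest.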

## Step 6: Query the Customers Table

Let's load the customers table into a pandas DataFrame and explore the data.

In [None]:
# Construct table URL for customers
customers_table_url = f"{config_file_path}#{share_name}.{schema_name}.customers"

# Load the table into a pandas DataFrame
print("Loading customers table...")
customers_df = delta_sharing.load_as_pandas(customers_table_url)

print(f"✓ Loaded {len(customers_df)} customer records\n")

print("\nFirst few records:")
customers_df.head()
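For large shared tables, `load_as_pandas` accepts an optional `limit` argument, so you can preview a table without loading all of it into memory. The sketch below wraps that call. The names `load_preview` and `loader` are hypothetical, and the `loader` parameter exists only so the function can be exercised without a live sharing server.

```python
def load_preview(url, rows=5, loader=None):
    """Load at most `rows` rows from the shared table at `url`."""
    if loader is None:
        # Imported lazily: requires the delta-sharing package and a reachable server.
        import delta_sharing
        loader = delta_sharing.load_as_pandas
    return loader(url, limit=rows)
```

With the variables defined above, `load_preview(customers_table_url)` would return a DataFrame of up to five rows.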

## Step 7: Query the Sales Transactions Table

Now let's load the sales transactions data.

In [None]:
# Construct table URL for sales transactions
sales_table_url = f"{config_file_path}#{share_name}.{schema_name}.sales_transactions"

# Load the table into a pandas DataFrame
print("Loading sales transactions table...")
sales_df = delta_sharing.load_as_pandas(sales_table_url)

print(f"✓ Loaded {len(sales_df)} transaction records\n")

print("\nFirst few records:")
sales_df.head()

## Step 8: Visualization - Revenue by Customer Segment

Join customer and sales data to show total revenue by segment.

In [None]:
# Merge customer and sales data
merged_df = sales_df.merge(customers_df, on='customer_id', how='left')

# Aggregate revenue by customer segment
segment_revenue = merged_df.groupby('customer_segment')['total_amount'].sum().sort_values(ascending=False).reset_index()

# Create a bar chart (the three-color palette assumes three customer segments)
plt.figure(figsize=(12, 6))
ax = sns.barplot(data=segment_revenue, x='customer_segment', y='total_amount',
                 palette=['#2ecc71', '#3498db', '#e74c3c'], edgecolor='white', linewidth=2)

plt.title('Total Revenue by Customer Segment', fontsize=16, fontweight='bold', pad=20)
plt.xlabel('Customer Segment', fontsize=13)
plt.ylabel('Total Revenue ($)', fontsize=13)
plt.grid(axis='y', alpha=0.3, linestyle='--')

# Add value labels on bars
for bar in ax.patches:
    height = bar.get_height()
    ax.text(bar.get_x() + bar.get_width() / 2, height,
            f'${height:,.0f}',
            ha='center', va='bottom', fontsize=11, fontweight='bold')

plt.tight_layout()
plt.show()

# Print summary
total_revenue = segment_revenue['total_amount'].sum()
print("\n✓ Visualization complete")
print(f"Total Revenue: ${total_revenue:,.2f}")
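One caveat with the left join above: a transaction whose `customer_id` has no match in the customers table gets a missing `customer_segment`, and `groupby` drops missing keys by default, so that revenue would silently disappear from the chart. The sketch below shows one way to detect this with the `validate` and `indicator` options of pandas `merge`. The toy frames and segment names are made up for illustration and are not the shared data.

```python
import pandas as pd

# Toy stand-ins for customers_df and sales_df (values are made up).
customers = pd.DataFrame({
    "customer_id": [1, 2],
    "customer_segment": ["Premium", "Standard"],
})
sales = pd.DataFrame({
    "customer_id": [1, 1, 2, 3],  # customer 3 has no row in `customers`
    "total_amount": [100.0, 50.0, 30.0, 20.0],
})

# validate= raises MergeError if customer_id is duplicated in `customers`;
# indicator= adds a _merge column marking rows that found no match.
merged = sales.merge(customers, on="customer_id", how="left",
                     validate="many_to_one", indicator=True)

unmatched = merged[merged["_merge"] == "left_only"]
print(f"Transactions without a matching customer: {len(unmatched)}")

# dropna=False keeps unmatched revenue as its own group instead of dropping it.
by_segment = merged.groupby("customer_segment", dropna=False)["total_amount"].sum()
print(by_segment)
```

If the unmatched count is nonzero on the real data, the per-segment totals in the chart will add up to less than the total of `sales_df['total_amount']`.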

## Summary

🎉 **Demo Complete!**

In this notebook, we successfully demonstrated D2O (Databricks-to-Open) Delta Sharing as a recipient:

✅ **What we accomplished:**
1. Loaded credentials from the mounted config file (`/tmp/config.share`)
2. Connected to the Delta Sharing endpoint
3. Listed available shares, schemas, and tables
4. Queried shared data using pandas
5. Joined customer and sales data and aggregated revenue by customer segment
6. Created a visualization using seaborn and matplotlib

**Key Benefits of D2O Delta Sharing:**
- 🚀 **No Data Duplication**: Access live data without copying
- 🔒 **Secure**: Token-based authentication
- ⚡ **Up to date**: Each query reads the provider's current version of the table
- 💰 **Cost-effective**: No storage costs for recipients
- 🛠️ **Tool Agnostic**: Use any tool that supports Delta Sharing (Python, Power BI, Tableau, etc.)
- 🌐 **Open Standard**: Based on open Delta Sharing protocol

**Next Steps:**
- Explore more complex queries and aggregations
- Integrate with your existing data pipelines
- Build dashboards using Power BI or other BI tools
- Set up automated reporting workflows

## 📚 Additional Resources

### External Links
- [Databricks Delta Sharing Docs](https://docs.databricks.com/delta-sharing/)
- [Delta Sharing Protocol](https://github.com/delta-io/delta-sharing)
- [Python delta-sharing Library](https://github.com/delta-io/delta-sharing/tree/main/python)

---
**© 2025 Databricks, Inc. All rights reserved.**