
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img
    src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png"
    alt="Databricks Learning"
  >
</div>


# 2.3 DEMO: Implementing Delta Sharing (Databricks-to-Open) \[Recipient]

## Overview
In this demo, you will access data shared via Delta Sharing as a recipient without using a Databricks workspace. You'll learn how to download and use credential files to access shared data using various tools.

**Your Role:** External Partner (non-Databricks recipient)

**Scenario:**
You are an external partner receiving shared data from Acme Corp. You don't have access to a Databricks workspace, but you can access the shared data using open-source tools like Python/pandas, or commercial tools like Power BI.

**Learning Objectives:**
By the end of this demo, you will:
1. Understand how to download and secure Delta Sharing credential files
2. Access shared data using Python and the `delta-sharing` library
3. Query shared data with pandas
4. Understand how to connect Power BI Desktop to Delta Sharing
5. Learn security best practices for managing credentials

## Prerequisites

Before starting, ensure you have:
- Received the activation link from your data provider (via secure channel)
- Python 3.7+ installed (if using Python/pandas)
- Power BI Desktop installed (if using Power BI)

**Note:** This demo can be run in a standard Python environment or Jupyter notebook outside of Databricks.

## Step 1: Download the Credential File

The data provider will share an **activation link** with you securely. This link allows you to download your credential file.

### Instructions:

1. **Access the Activation Link**
   - Click the activation link provided by your data provider
   - You'll be redirected to a Databricks page

2. **Download the Credential File**
   - Click the **Download credential file** button
   - Save the file (typically named `config.share`) to a secure location

<br />
<br />
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img
    src="https://github.com/stackql/databricks-data-sharing-and-collaboration/blob/main/images/recipient-activation-link.png?raw=true"
    alt="Delta Sharing activation page"
  >
</div>
<br />
<br />

3. **Secure the Credential File** ⚠️
   - **IMPORTANT**: The credential file contains authentication tokens
   - Store it securely (e.g., encrypted folder, password manager)
   - Do NOT commit it to version control (add `*.share` to `.gitignore`)
   - Do NOT share it outside your organization
   - Treat it like a password or API key

4. **Credential File Format**
   The downloaded file is a JSON file with the `.share` extension containing:
   ```json
   {
     "shareCredentialsVersion": 1,
     "endpoint": "https://<workspace-url>/api/2.0/delta-sharing",
     "bearerToken": "<long-token-string>",
     "expirationTime": "<timestamp>"
   }
   ```

**Note:** The activation link can only be used once to download the credential file. If you lose the file, contact your data provider to rotate the token and receive a new activation link.
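
As a quick sanity check after downloading, the sketch below validates that the file has the fields shown above and restricts it to the current user. The helper name `check_profile` is hypothetical (it is not part of any library), and the code assumes `expirationTime` may be absent from the file.

```python
import json
import os
import stat
from pathlib import Path

# Keys shown in the credential file format above (expirationTime treated as optional).
REQUIRED_KEYS = {"shareCredentialsVersion", "endpoint", "bearerToken"}


def check_profile(path):
    """Validate a credential file and restrict it to the current user.

    Hypothetical helper for illustration only.
    """
    profile_path = Path(path)
    profile = json.loads(profile_path.read_text(encoding="utf-8"))

    missing = REQUIRED_KEYS - set(profile)
    if missing:
        raise ValueError(f"credential file is missing keys: {sorted(missing)}")

    # Owner read/write only (0600). On Windows this only toggles the read-only flag.
    os.chmod(profile_path, stat.S_IRUSR | stat.S_IWUSR)

    # Return a summary that never exposes the full token.
    return {
        "endpoint": profile["endpoint"],
        "token_preview": profile["bearerToken"][:4] + "...",
        "expirationTime": profile.get("expirationTime"),
    }
```

Calling `check_profile(".creds/config.share")` returns the endpoint and a masked token preview, which is safer to print or log than the raw file contents.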

## Step 2: Analyze data in Python using the `delta-sharing` open client

> Run the subsequent steps outside of a Databricks environment.

Use the Jupyter notebook provided in the GitHub repo or a local Python environment to test open sharing (to a non-Databricks client).

### 1. Install the `delta-sharing` Python package

The `delta-sharing` library is an open-source Python connector for accessing Delta Sharing data. Install it with:

```bash
pip install delta-sharing
```

### 2. Instantiate a `SharingClient` instance

Using the `delta_sharing` package installed previously and the `config.share` file you downloaded from the provider's activation link, instantiate a `SharingClient`:

```python
import delta_sharing
CREDENTIAL_FILE = '.creds/config.share'

# Create a SharingClient
client = delta_sharing.SharingClient(CREDENTIAL_FILE)
```

### 3. List shares and assets accessible using your credential

Let's explore what data has been shared with you.

```python
# List all shares available to you
shares = client.list_shares()
print(f"Available shares: {len(shares)}\n")

for share in shares:
    print(f"Share: {share.name}")

# List schemas in the share
share_name = "external_retail"

schemas = client.list_schemas(delta_sharing.Share(share_name))
print(f"\nSchemas in '{share_name}': {len(schemas)}\n")

for schema in schemas:
    print(f"Schema: {schema.name}")

# List tables in a schema
schema_name = "external_retail"

tables = client.list_tables(delta_sharing.Schema(share_name, schema_name))
print(f"\nTables in '{share_name}.{schema_name}': {len(tables)}\n")

for table in tables:
    print(f"Table: {table.name}")
    print(f"  Share: {table.share}")
    print(f"  Schema: {table.schema}")
    print()
```
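
If you do not know the share and schema names in advance, `SharingClient.list_all_tables()` returns every table visible to your credential in one call. The sketch below turns that listing into the `<profile-path>#<share>.<schema>.<table>` URLs used when loading data. The helper name `all_table_urls` and its optional `client` argument are illustrative, not part of the library.

```python
def all_table_urls(profile_path, client=None):
    """Return a Delta Sharing table URL for every table the credential can see.

    Hypothetical helper. When no client is passed, one is created from
    the credential file at profile_path.
    """
    if client is None:
        import delta_sharing  # imported lazily so the helper is easy to test offline

        client = delta_sharing.SharingClient(profile_path)

    return [
        f"{profile_path}#{table.share}.{table.schema}.{table.name}"
        for table in client.list_all_tables()
    ]


# Example usage (requires network access and a valid credential file):
# for url in all_table_urls(".creds/config.share"):
#     print(url)
```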

### 4. Query Shared Data with pandas

Now let's query the shared data and load it into pandas DataFrames.

```python
# Load customers table into pandas DataFrame
customers_url = f"{CREDENTIAL_FILE}#{share_name}.{schema_name}.customers"
customers_df = delta_sharing.load_as_pandas(customers_url)

print("Customers Data:")
print(f"Shape: {customers_df.shape}")
print("\nFirst few rows:")
print(customers_df.head())  # print explicitly; a bare expression is only displayed when it ends a notebook cell

# Load sales_transactions table
sales_url = f"{CREDENTIAL_FILE}#{share_name}.{schema_name}.sales_transactions"
sales_df = delta_sharing.load_as_pandas(sales_url)

print("Sales Transactions Data:")
print(f"Shape: {sales_df.shape}")
print("\nFirst few rows:")
print(sales_df.head())
```
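
For large tables it can help to look at a few rows before downloading everything. `load_as_pandas` accepts a `limit` argument for this. The following is a minimal sketch: `preview_table` and its `loader` parameter are hypothetical names, with `loader` existing only so the function can be exercised without a network connection.

```python
def preview_table(profile_path, share, schema, table, rows=5, loader=None):
    """Load at most `rows` rows of a shared table.

    Hypothetical helper. `loader` defaults to delta_sharing.load_as_pandas.
    """
    if loader is None:
        import delta_sharing  # imported lazily; only needed for real requests

        loader = delta_sharing.load_as_pandas

    # Table URL format: <profile-path>#<share>.<schema>.<table>
    url = f"{profile_path}#{share}.{schema}.{table}"
    return loader(url, limit=rows)


# Example usage (requires network access and a valid credential file):
# preview = preview_table(".creds/config.share", "external_retail",
#                         "external_retail", "customers", rows=10)
# print(preview)
```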

### 5. Perform Analysis

With the data loaded into pandas dataframes, you can perform any analysis you need, including visualizations using `seaborn` or `matplotlib`.
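
As one possible starting point, the sketch below totals sales per customer segment. The column names are assumptions: `customer_segment` appears elsewhere in this demo, but `customer_id` and `total_amount` are hypothetical and should be replaced with the columns actually present in your shared tables.

```python
import pandas as pd


def revenue_by_segment(customers_df, sales_df):
    """Total sales per customer segment, largest first.

    Assumes (hypothetically) that both frames have a `customer_id` column,
    customers has `customer_segment`, and sales has `total_amount`.
    """
    merged = sales_df.merge(
        customers_df[["customer_id", "customer_segment"]],
        on="customer_id",
        how="left",
    )
    return (
        merged.groupby("customer_segment", as_index=False)["total_amount"]
        .sum()
        .sort_values("total_amount", ascending=False)
        .reset_index(drop=True)
    )
```

With the DataFrames loaded in the previous step, `revenue_by_segment(customers_df, sales_df)` returns a small summary frame that can be passed directly to a plotting library.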

## Step 3: Access Data with Power BI

You can also connect Power BI to Delta Sharing to create interactive dashboards.

### Instructions for Power BI:

1. **Open Power BI**

2. **Get Data**
   - Click **Get Data** on the Home ribbon
   - Search for and select the **Delta Sharing** connector
   - Click **Connect**

<div style="padding: 10px; border: 2px solid #0078d4; background-color: #e3f2fd; margin: 10px 0;">
  <img src="https://github.com/stackql/databricks-data-sharing-and-collaboration/blob/main/images/power-bi-delta-sharing-1.png?raw=true" alt="Power BI Delta Sharing Connector" style="max-width: 600px;"/>
  <p><i>Power BI Get Data dialog with Delta Sharing connector</i></p>
</div>

3. **Configure Connection**
   - In the **Delta Sharing Server URL** text box, paste the `endpoint` value from your `config.share` file
   - In the **Authentication** text box, paste the `bearerToken` value from your `config.share` file
   - Click **Next**

<div style="padding: 10px; border: 2px solid #0078d4; background-color: #e3f2fd; margin: 10px 0;">
  <img src="https://github.com/stackql/databricks-data-sharing-and-collaboration/blob/main/images/power-bi-delta-sharing-2.png?raw=true" alt="Select Credential File" style="max-width: 600px;"/>
  <p><i>Screenshot: Delta Sharing credential file selection dialog</i></p>
</div>

4. **Select Tables**
   - The Navigator window shows the available shares, schemas, and tables
   - Check the tables you want to import (e.g., `customers`, `sales_transactions`)
   - You can preview the data before loading
   - Click **Load** or **Transform Data** (to use Power Query Editor)

5. **Build Reports**
   - Once data is loaded, create visualizations using Power BI's tools
   - Create relationships between tables if needed
   - Build interactive dashboards

### Security Considerations:

- The bearer token from your credential file is saved with the connection in Power BI Desktop
- When publishing to Power BI Service, configure data source credentials
- Do not share `.pbix` files containing credentials
- Use Power BI Service gateway for enterprise deployments

## Step 4: Access with Apache Spark (Optional)

If you have Apache Spark available, you can also use it to access Delta Sharing data.

```python
# Example Spark code (requires a Spark environment with the delta-sharing connector)

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("DeltaSharingExample")
    .config("spark.jars.packages", "io.delta:delta-sharing-spark_2.12:1.0.0")
    .getOrCreate()
)

# Table URL format: <profile-path>#<share-name>.<schema-name>.<table-name>
# These are the same values used in Step 2.
customers_url = ".creds/config.share#external_retail.external_retail.customers"

# Read the shared table into a Spark DataFrame
customers_spark_df = (
    spark.read
    .format("deltaSharing")
    .option("responseFormat", "delta")
    .load(customers_url)
)

customers_spark_df.show()

# Perform Spark SQL operations
customers_spark_df.createOrReplaceTempView("customers")
spark.sql(
    "SELECT customer_segment, COUNT(*) AS count FROM customers GROUP BY customer_segment"
).show()
```

## Best Practices and Security

### Credential Management

✅ **DO:**
- Store credential files in secure, encrypted locations
- Use environment variables or secure credential stores
- Add `*.share` to `.gitignore`
- Rotate credentials if compromised
- Use service accounts for production applications
- Implement proper access controls on credential files

❌ **DON'T:**
- Commit credential files to version control
- Share credentials outside your organization
- Hardcode credentials in source code
- Email credentials in plain text
- Store credentials in publicly accessible locations
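
One way to keep the credential path out of source code is to read it from an environment variable. This is a minimal sketch: the variable name `DELTA_SHARING_PROFILE` and the helper `resolve_profile_path` are hypothetical conventions chosen for illustration, not something the `delta-sharing` library defines.

```python
import os
from pathlib import Path

# Hypothetical environment variable name; pick whatever fits your deployment.
ENV_VAR = "DELTA_SHARING_PROFILE"


def resolve_profile_path(default=".creds/config.share", environ=None):
    """Return the credential file path from the environment, or a default.

    Raises FileNotFoundError early so a missing file fails with a clear message.
    """
    environ = os.environ if environ is None else environ
    path = Path(environ.get(ENV_VAR, default)).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"Delta Sharing credential file not found: {path}")
    return str(path)


# Example usage:
# import delta_sharing
# client = delta_sharing.SharingClient(resolve_profile_path())
```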

### Performance Tips

- Use column pruning when possible (select only needed columns)
- Apply filters server-side when supported
- Consider caching frequently accessed data
- Use appropriate data formats for your use case
- Monitor query performance and optimize as needed
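
The caching tip can be sketched as a small file-based cache that reuses a local copy until it is older than a chosen age. The helper `cached_load`, the cache directory, and the one-hour default are all assumptions made for illustration. Note that the cache holds shared data at rest, so protect the directory as carefully as the credential file, and only unpickle cache files you wrote yourself.

```python
import hashlib
import pickle
import time
from pathlib import Path


def cached_load(url, loader, cache_dir=".cache/delta_sharing", max_age_seconds=3600):
    """Return loader(url), reusing a local pickle if it is recent enough.

    Hypothetical helper. `loader` is any callable taking a table URL,
    for example delta_sharing.load_as_pandas.
    """
    # Hash the URL so the file name is safe and does not reveal table names.
    cache_path = Path(cache_dir) / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".pkl")

    if cache_path.exists():
        age = time.time() - cache_path.stat().st_mtime
        if age < max_age_seconds:
            with cache_path.open("rb") as f:
                return pickle.load(f)

    # Cache miss or stale entry: fetch from the provider and store the result.
    data = loader(url)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as f:
        pickle.dump(data, f)
    return data


# Example usage:
# import delta_sharing
# df = cached_load(customers_url, delta_sharing.load_as_pandas)
```

A time-based cache trades freshness for speed, so choose `max_age_seconds` with the provider's update schedule in mind.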

### Monitoring and Troubleshooting

- Check token expiration dates regularly
- Monitor for authentication errors
- Contact provider if shares become unavailable
- Keep the `delta-sharing` library updated
- Review provider's data update schedule
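
Checking token expiration can be automated by reading `expirationTime` from the credential file. The sketch below assumes the value is an ISO 8601 timestamp (optionally with fractional seconds and a trailing `Z`), and the helper name `days_until_expiry` is hypothetical.

```python
import json
import re
from datetime import datetime, timezone


def days_until_expiry(profile_path, now=None):
    """Return days until the token expires, or None if no expiry is recorded.

    Hypothetical helper. Assumes `expirationTime` is an ISO 8601 timestamp.
    A negative result means the token has already expired.
    """
    with open(profile_path, encoding="utf-8") as f:
        profile = json.load(f)

    raw = profile.get("expirationTime")
    if raw is None:
        return None

    # Drop fractional seconds and normalise a trailing "Z" so that
    # datetime.fromisoformat accepts the value on older Python versions.
    text = re.sub(r"\.\d+", "", raw.strip())
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    expires = datetime.fromisoformat(text)
    if expires.tzinfo is None:
        expires = expires.replace(tzinfo=timezone.utc)  # assumption: UTC when unspecified

    now = now or datetime.now(timezone.utc)
    return (expires - now).total_seconds() / 86400


# Example usage:
# remaining = days_until_expiry(".creds/config.share")
# if remaining is not None and remaining < 14:
#     print(f"Token expires in {remaining:.1f} days; ask the provider to rotate it.")
```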

## Summary

Congratulations! You've successfully:

✅ Downloaded and secured your Delta Sharing credential file  
✅ Accessed shared data using Python and the `delta-sharing` library  
✅ Queried and analyzed shared data with pandas  
✅ Learned how to connect Power BI Desktop to Delta Sharing  
✅ Understood security best practices for credential management  

**Key Takeaways:**

- **Tool Flexibility**: Use Python, Power BI, Spark, Tableau, or other compatible tools
- **Zero Setup**: No complex infrastructure or ETL processes needed
- **Real-time Access**: Always query the latest data from the provider
- **Cost Effective**: Recipients do not need to store their own copy of the data or run a Databricks workspace
- **Secure**: Token-based authentication with provider-controlled access

**Compatible Tools:**
- Python (pandas, Spark)
- Power BI Desktop
- Tableau
- AWS Glue
- Presto/Trino
- Any tool supporting the open Delta Sharing protocol

---
&copy; 2025 Databricks, Inc. All rights reserved. Apache, Apache Spark, Spark, the Spark Logo, Apache Iceberg, Iceberg, and the Apache Iceberg logo are trademarks of the <a href="https://www.apache.org/" target="_blank">Apache Software Foundation</a>.<br/><br/><a href="https://databricks.com/privacy-policy" target="_blank">Privacy Policy</a> | <a href="https://databricks.com/terms-of-use" target="_blank">Terms of Use</a> | <a href="https://help.databricks.com/" target="_blank">Support</a>