
# B2B Data Exchange with Delta Sharing

In this notebook, we'll explore how to create a SHARE to share data with another organization.


## Discovering the data
To illustrate, let's imagine we are a company like **TripActions**, a Corporate Travel & Spend Management Platform. 

We have already adopted a <b> Delta Lakehouse Architecture </b> for servicing all of our data internally. 

Two of our largest airline partners, <b>American Airlines</b> and <b>Southwest</b>, just let us know that they want to partner with us to add reward and recommendation programs for their customers using TripActions data. To pilot this new feature, they need daily data on the schedules and outcomes of flights taken through TripActions.

We'll leverage Delta Sharing to grant data access to American Airlines and Southwest without duplicating or replicating the data. 


## Cluster setup for UC

<img src="https://github.com/QuentinAmbard/databricks-demo/raw/main/product_demos/uc/uc-cluster-setup-single-user.png" style="float: right"/>


To run this demo, make sure you create a cluster with the security mode enabled.

Go to the Compute page and create a new cluster.

Select the "Single User" access mode and your UC user (the user must exist at both the workspace and the account level).

**Make sure your cluster is using DBR 11.2+**

In [0]:
%run ./_resources/00-setup $reset_all_data=false

## Delta Sharing

Delta Sharing lets you share data with external recipients without creating a copy of the data. Once they're authorized, recipients can access and download your data directly.

In Delta Sharing, it all starts with a Delta Lake table registered in the Delta Sharing Server by the data provider. <br/>
This is done with the following steps:
- Create a RECIPIENT and share the activation link with your recipient
- Create a SHARE
- Add your Delta tables to the given SHARE
- GRANT SELECT on your SHARE to your RECIPIENT
 
Once this is done, your customer will be able to download the credential file and use it to access the data directly:

- Client authenticates to Sharing Server
- Client requests a table (including filters)
- Server checks access permissions
- Server generates and returns pre-signed short-lived URLs
- Client uses URLs to directly read files from object storage
<br>
<img src="https://raw.githubusercontent.com/QuentinAmbard/databricks-demo/main/product_demos/delta-sharing-flow.png" width="1000" />
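To make the five steps above concrete, here is a toy, in-memory model of the exchange. It is a sketch for intuition only, not the Delta Sharing server or protocol implementation: `ToySharingServer`, the bearer token, the storage host `storage.example.com` and the file name are all hypothetical, the "signature" is just random hex, and request filters (step 2) are not modeled.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class ToySharingServer:
    """In-memory stand-in for a Delta Sharing server; illustration only."""

    tokens: Dict[str, str]             # bearer token -> recipient name
    grants: Set[Tuple[str, str]]       # (recipient, table) pairs that may be read
    files: Dict[str, List[str]]        # table -> data files in object storage
    ttl_seconds: int = 3600            # lifetime of each pre-signed URL
    issued: Dict[str, float] = field(default_factory=dict)  # URL -> expiry timestamp

    def query_table(self, token: str, table: str) -> List[str]:
        # 1. Authenticate the client from its bearer token
        recipient = self.tokens.get(token)
        if recipient is None:
            raise PermissionError("unknown bearer token")
        # 2./3. The client asked for `table`; check that the recipient may read it
        if (recipient, table) not in self.grants:
            raise PermissionError(f"{recipient} has no access to {table}")
        # 4. Return one short-lived "pre-signed" URL per data file
        expires_at = time.time() + self.ttl_seconds
        urls = []
        for path in self.files[table]:
            url = f"https://storage.example.com/{path}?sig={secrets.token_hex(8)}"
            self.issued[url] = expires_at
            urls.append(url)
        return urls

    def storage_accepts(self, url: str, now: Optional[float] = None) -> bool:
        # 5. Object storage honors the URL only until it expires
        now = time.time() if now is None else now
        return url in self.issued and now < self.issued[url]


server = ToySharingServer(
    tokens={"token-wn": "dbdemos_southwestairlines_recipient"},
    grants={("dbdemos_southwestairlines_recipient", "flights_protected")},
    files={"flights_protected": ["part-00000.parquet"]},
)
urls = server.query_table("token-wn", "flights_protected")
print(server.storage_accepts(urls[0]))
```

The point of the design is that the server never streams the data itself: it only decides who may read what, and the client then reads the files directly from storage until the URLs expire.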

## Unity Catalog
Databricks Unity Catalog is the central place to administer your data governance and security.<br/>
Unity Catalog’s security model is based on standard ANSI SQL and lets you grant permissions at the level of databases, tables, views, rows and columns.<br/>
Using Databricks, we'll leverage Unity Catalog to easily share data with our customers.

In [0]:
-- The catalog and schema have been created for your user and set as the defaults; the demo tables live there.
-- (Shares and recipients themselves are metastore-level objects.) Make sure you ran the 00-setup cell above to initialize the catalog for your user.
SELECT CURRENT_CATALOG(), CURRENT_SCHEMA();

### Step 1: Create a Share

<img src="https://raw.githubusercontent.com/QuentinAmbard/databricks-demo/main/product_demos/delta-sharing-flow-1.png" width="700" style="float:right" />

We'll use Unity Catalog to create two shares:
- One for American Airlines data
- One for Southwest Airlines data

In [0]:
-- Note: you need to be a metastore admin to create shares and recipients, or an admin can grant these privileges to another principal:
-- GRANT CREATE SHARE ON metastore TO `<my_principal@xx.com>`;
-- GRANT CREATE RECIPIENT ON metastore TO `<my_principal@xx.com>`;

CREATE SHARE IF NOT EXISTS dbdemos_americanairlines 
COMMENT 'Daily Flight Data provided by Tripactions to American Airlines for Extended Rewards';

CREATE SHARE IF NOT EXISTS dbdemos_southwestairlines 
COMMENT 'Daily Flight Data provided by Tripactions to Southwest Airlines for Extended Rewards';

-- You can transfer ownership to other users. Typical deployments would assign ownership to an admin group or similar.
-- ALTER SHARE dbdemos_americanairlines OWNER TO `<my_principal@xx.com>`;
-- ALTER SHARE dbdemos_southwestairlines OWNER TO `<my_principal@xx.com>`;
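Every partner airline ends up with the same set of objects (a share, a recipient with a `carrier_id` property, and a grant), so the provisioning in this notebook can be scripted. The sketch below only builds the SQL strings used in this notebook, following its `dbdemos_<partner>` naming convention; `partner_statements` is a hypothetical helper, not a Databricks API. Inside a Databricks notebook you could pass each statement to `spark.sql(...)`, one statement per call. It is a one-shot sketch: re-running `ALTER SHARE ... ADD` for an object already in the share may fail.

```python
from typing import Dict, List


def partner_statements(partner: str, carrier_id: str) -> List[str]:
    """Build the provider-side SQL for one partner airline (hypothetical helper).

    Follows this notebook's naming: share dbdemos_<partner>,
    recipient dbdemos_<partner>_recipient.
    """
    share = f"dbdemos_{partner}"
    recipient = f"{share}_recipient"
    return [
        f"CREATE SHARE IF NOT EXISTS {share}",
        f"ALTER SHARE {share} ADD TABLE lookupcodes",
        f"ALTER SHARE {share} ADD VIEW pds.dbdemos_sharing_airlinedata.flights_protected",
        f"CREATE RECIPIENT IF NOT EXISTS {recipient}",
        f"ALTER RECIPIENT {recipient} SET PROPERTIES ('carrier_id' = '{carrier_id}')",
        f"GRANT SELECT ON SHARE {share} TO RECIPIENT {recipient}",
    ]


partners: Dict[str, str] = {"americanairlines": "AA", "southwestairlines": "WN"}
for partner, carrier_id in partners.items():
    for stmt in partner_statements(partner, carrier_id):
        print(stmt)  # in a Databricks notebook you could run: spark.sql(stmt)
```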

In [0]:
DESCRIBE SHARE dbdemos_southwestairlines;


<img src="https://github.com/QuentinAmbard/databricks-demo/raw/main/product_demos/delta-sharing-create-recipient.png" width="500" style="float:right" />

**Did you know?** Delta Sharing isn't limited to SQL. 

You can visualize all your Delta Sharing shares in the Databricks Data Explorer UI!

You can also create your shares and recipients with just a few clicks.<br/>
Select "Delta Sharing" in the Data Explorer menu, then "Create Share", "Create recipient" ...

### Step 2: Add the tables to the SHARES

<img src="https://raw.githubusercontent.com/QuentinAmbard/databricks-demo/main/product_demos/delta-sharing-flow-2.png" width="700" style="float:right" />

We'll add the `lookupcodes` table to both SHAREs:

In [0]:
ALTER SHARE dbdemos_americanairlines ADD TABLE lookupcodes;
ALTER SHARE dbdemos_southwestairlines ADD TABLE lookupcodes;

In [0]:
SELECT * FROM lookupcodes WHERE Description IN ('Southwest Airlines Co.', 'American Airlines Inc.');

### Sharing a subset of a table to a SHARE recipient based on dynamic properties
We shouldn't share all historical flights with every airline. This may be private information, and we don't want every consumer accessing the entire `flights` table. 
<br>
#### Customizing Consumer Experience
To restrict data access, we can set properties on each RECIPIENT, then create a dynamic view that filters rows based on these properties.

Note: before views could be shared, you could restrict access by sharing specific Delta partitions, but this is much less flexible:
```
-- Assumes the flights table is partitioned by UniqueCarrier
ALTER SHARE dbdemos_americanairlines
  ADD TABLE dbdemos_sharing_airlinedata.flights
  PARTITION (UniqueCarrier = 'AA') AS dbdemos_sharing_airlinedata.aa_flights;
```

In [0]:
-- current_recipient('carrier_id') will be resolved to 'WN' or 'AA' based on the current recipient properties (see below to set the property value)
CREATE VIEW pds.dbdemos_sharing_airlinedata.flights_protected AS
    SELECT * FROM pds.dbdemos_sharing_airlinedata.flights
    WHERE UniqueCarrier = current_recipient('carrier_id');
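One view serves every airline because the filter value comes from whoever is querying. The plain-Python sketch below mimics that row filter outside Databricks: `flights_protected` here is a stand-in function, and the sample rows and the `FlightNum` column are made up for illustration. In this sketch a recipient without a `carrier_id` property simply matches no rows.

```python
from typing import Dict, List

Row = Dict[str, str]


def flights_protected(flights: List[Row], recipient_properties: Dict[str, str]) -> List[Row]:
    """Mimic: SELECT * FROM flights WHERE UniqueCarrier = current_recipient('carrier_id')."""
    carrier_id = recipient_properties.get("carrier_id")
    return [row for row in flights if row["UniqueCarrier"] == carrier_id]


# Hypothetical sample data
flights = [
    {"FlightNum": "100", "UniqueCarrier": "AA"},
    {"FlightNum": "200", "UniqueCarrier": "WN"},
    {"FlightNum": "300", "UniqueCarrier": "WN"},
]
# Recipient properties as set with ALTER RECIPIENT ... SET PROPERTIES
recipients = {
    "dbdemos_americanairlines_recipient": {"carrier_id": "AA"},
    "dbdemos_southwestairlines_recipient": {"carrier_id": "WN"},
}
for name, props in recipients.items():
    print(name, [r["FlightNum"] for r in flights_protected(flights, props)])
```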

In [0]:
ALTER SHARE dbdemos_americanairlines 
  ADD VIEW pds.dbdemos_sharing_airlinedata.flights_protected;

ALTER SHARE dbdemos_southwestairlines 
  ADD VIEW pds.dbdemos_sharing_airlinedata.flights_protected;

In [0]:
SHOW ALL IN SHARE dbdemos_southwestairlines;

### Step 3: Create the Recipients

<img src="https://raw.githubusercontent.com/QuentinAmbard/databricks-demo/main/product_demos/delta-sharing-flow-3.png" width="700" style="float:right" />

Our next step is to create the RECIPIENTs.

We can have multiple RECIPIENTs and grant each of them access to multiple SHAREs.

In [0]:
CREATE RECIPIENT IF NOT EXISTS dbdemos_southwestairlines_recipient;
CREATE RECIPIENT IF NOT EXISTS dbdemos_americanairlines_recipient;

ALTER RECIPIENT dbdemos_southwestairlines_recipient SET PROPERTIES ('carrier_id' = 'WN');
ALTER RECIPIENT dbdemos_americanairlines_recipient SET PROPERTIES ('carrier_id' = 'AA');

-- You can transfer ownership to an admin group or similar.
-- ALTER RECIPIENT dbdemos_southwestairlines_recipient OWNER TO `<my_principal>`;
-- ALTER RECIPIENT dbdemos_americanairlines_recipient OWNER TO `<my_principal>`;

### Step 4: Share the activation link with external consumers


<img src="https://raw.githubusercontent.com/QuentinAmbard/databricks-demo/main/product_demos/delta-sharing-flow-5.png" width="700" style="float:right" />

Each Recipient has an activation link that the consumer can use to download its credentials.

<img src="https://raw.githubusercontent.com/QuentinAmbard/databricks-demo/main/product_demos/delta-sharing-credential.png" width=400>

The credentials are typically saved as a file containing the sharing server endpoint and a bearer token. The Delta Sharing Server identifies and authorizes the consumer based on these credentials.<br/>
Note that the activation link is single use. You can only access it once (it'll return null if already used)
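The downloaded credential file is a small JSON "profile" file. In the open Delta Sharing protocol it holds `shareCredentialsVersion`, `endpoint`, `bearerToken` and an optional `expirationTime`. The sketch below validates those fields and checks expiry; the endpoint value is made up and the token is redacted. It handles timestamps like `2030-01-01T00:00:00Z`; on Python versions before 3.11, `fromisoformat` is stricter about fractional seconds, so treat this as a sketch rather than a complete parser.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def load_profile(text: str) -> dict:
    """Parse a Delta Sharing profile (credential) file and check required fields."""
    profile = json.loads(text)
    for key in ("shareCredentialsVersion", "endpoint", "bearerToken"):
        if key not in profile:
            raise ValueError(f"profile file is missing '{key}'")
    return profile


def is_expired(profile: dict, now: Optional[datetime] = None) -> bool:
    """True if the optional expirationTime is in the past."""
    expiration = profile.get("expirationTime")
    if expiration is None:
        return False
    # fromisoformat() on Python < 3.11 does not accept a trailing 'Z'
    expires_at = datetime.fromisoformat(expiration.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return now >= expires_at


# Hypothetical profile content
example = """{
  "shareCredentialsVersion": 1,
  "endpoint": "https://sharing.example.com/delta-sharing/",
  "bearerToken": "<redacted>",
  "expirationTime": "2030-01-01T00:00:00Z"
}"""
profile = load_profile(example)
print(profile["endpoint"], is_expired(profile))
```

Since anyone holding this file can read the shared data, it should be stored and transmitted like any other secret.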

### Sharing data with customers using Databricks

Sharing data within Databricks is even simpler. All you need to do is get the sharing identifier (cloud, region and Metastore ID) from your recipient and create the RECIPIENT using it. <br/>
You won't need a credential file in this case; Databricks Unity Catalog handles the security for you.

`CREATE RECIPIENT IF NOT EXISTS southwestairlines_recipient USING ID 'aws:us-west-2:<receiver_metastore_id>' COMMENT 'Recipient for my external customer using Databricks';`

For more details, open the [Sharing data within Databricks]($./04-share-data-within-databricks) demo.
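The identifier passed to `USING ID` has the form `<cloud>:<region>:<metastore-uuid>`. A small sketch that splits it and fails fast on malformed values before building the statement; `parse_sharing_identifier` and `create_recipient_sql` are hypothetical helpers, and the UUID in the example is made up.

```python
from typing import NamedTuple


class SharingIdentifier(NamedTuple):
    cloud: str
    region: str
    metastore_id: str


def parse_sharing_identifier(value: str) -> SharingIdentifier:
    """Split '<cloud>:<region>:<metastore-uuid>' into its three parts."""
    parts = value.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected '<cloud>:<region>:<uuid>', got {value!r}")
    return SharingIdentifier(*parts)


def create_recipient_sql(name: str, sharing_id: str) -> str:
    """Build a Databricks-to-Databricks CREATE RECIPIENT statement."""
    parse_sharing_identifier(sharing_id)  # reject malformed identifiers early
    return f"CREATE RECIPIENT IF NOT EXISTS {name} USING ID '{sharing_id}'"


# Hypothetical identifier received from the partner
sid = parse_sharing_identifier("aws:us-west-2:11111111-2222-3333-4444-555555555555")
print(sid.cloud, sid.region, sid.metastore_id)
```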

In [0]:
DESCRIBE RECIPIENT dbdemos_southwestairlines_recipient

In [0]:
%python
# This helper downloads the credential file for the given RECIPIENT and saves it under the given location, as we'll need it next to access the data.
download_recipient_credential("dbdemos_southwestairlines_recipient", "/Volumes/pds/dbdemos_sharing_airlinedata/raw_data/southwestairlines.share")
download_recipient_credential("dbdemos_americanairlines_recipient", "/Volumes/pds/dbdemos_sharing_airlinedata/raw_data/americanairlines.share")
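As a preview of the receiver side, the open-source `delta-sharing` Python connector addresses a table with a URL of the form `<profile-file>#<share>.<schema>.<table>`. The sketch below builds that URL for the Southwest credential saved above. It assumes the view keeps its source schema name `dbdemos_sharing_airlinedata` inside the share (no alias was given); `delta_sharing.SharingClient(profile).list_all_tables()` shows the actual names. Reading only works once SELECT has been granted in Step 5 and the sharing server is reachable.

```python
def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build a Delta Sharing table URL: '<profile-file>#<share>.<schema>.<table>'."""
    return f"{profile_path}#{share}.{schema}.{table}"


def read_shared_table(url: str):
    """Load a shared table as pandas (requires: pip install delta-sharing)."""
    import delta_sharing  # imported lazily so table_url() works without the package

    return delta_sharing.load_as_pandas(url)


sw_profile_path = "/Volumes/pds/dbdemos_sharing_airlinedata/raw_data/southwestairlines.share"
url = table_url(
    sw_profile_path, "dbdemos_southwestairlines", "dbdemos_sharing_airlinedata", "flights_protected"
)
print(url)
# df = read_shared_table(url)  # needs network access to the sharing server
```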

### Step 5: Define which Data to Share, and Access Level 

<img src="https://raw.githubusercontent.com/QuentinAmbard/databricks-demo/main/product_demos/delta-sharing-flow-4.png" width="600" style="float:right" />

We now have our RECIPIENTs and SHAREs.

The next logical step is to give each RECIPIENT SELECT access to its SHARE.

As usual, this is done using standard SQL:

In [0]:
GRANT SELECT ON SHARE dbdemos_southwestairlines TO RECIPIENT dbdemos_southwestairlines_recipient;
GRANT SELECT ON SHARE dbdemos_americanairlines TO RECIPIENT dbdemos_americanairlines_recipient;

In [0]:
SHOW GRANT ON SHARE dbdemos_southwestairlines;

In [0]:
SHOW GRANT TO RECIPIENT dbdemos_southwestairlines_recipient;
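Grants form a many-to-many relationship: one SHARE can be granted to several RECIPIENTs, and one RECIPIENT can read several SHAREs. The toy model below mirrors the GRANT and SHOW GRANT cells above from both directions. `ShareGrants` is a hypothetical in-memory class, not the Unity Catalog implementation; in this model, revoking a grant that does not exist changes nothing.

```python
from typing import Dict, Set


class ShareGrants:
    """Toy, in-memory model of SELECT grants between SHAREs and RECIPIENTs."""

    def __init__(self) -> None:
        self._recipients_by_share: Dict[str, Set[str]] = {}

    def grant(self, share: str, recipient: str) -> None:
        self._recipients_by_share.setdefault(share, set()).add(recipient)

    def revoke(self, share: str, recipient: str) -> None:
        # In this model, revoking a grant that does not exist changes nothing
        self._recipients_by_share.get(share, set()).discard(recipient)

    def on_share(self, share: str) -> Set[str]:
        """Roughly what SHOW GRANT ON SHARE lists: recipients of one share."""
        return set(self._recipients_by_share.get(share, set()))

    def to_recipient(self, recipient: str) -> Set[str]:
        """Roughly what SHOW GRANT TO RECIPIENT lists: shares one recipient can read."""
        return {s for s, rs in self._recipients_by_share.items() if recipient in rs}


acl = ShareGrants()
acl.grant("dbdemos_southwestairlines", "dbdemos_southwestairlines_recipient")
acl.grant("dbdemos_americanairlines", "dbdemos_americanairlines_recipient")
# Mirrors the REVOKE cell below: this pair was never granted
acl.revoke("dbdemos_southwestairlines", "dbdemos_americanairlines_recipient")
print(acl.on_share("dbdemos_southwestairlines"))
print(acl.to_recipient("dbdemos_americanairlines_recipient"))
```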

In [0]:
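-- Revoking access uses standard SQL too. Note that dbdemos_americanairlines_recipient was never granted SELECT on this share, so this statement removes no existing access.
REVOKE SELECT ON SHARE dbdemos_southwestairlines FROM RECIPIENT dbdemos_americanairlines_recipient;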

In [0]:
SHOW ALL IN SHARE dbdemos_southwestairlines;

In [0]:
%python
# Clean up the demo for a fresh start: deletes all shares and recipients created in this notebook
#cleanup_demo()


## Let's now see how a Receiver can access the data

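We saw how to create SHAREs and RECIPIENTs, add tables and a dynamic view to the shares, and grant each recipient access to its share.
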
Next: Discover how an external [receiver can access your data]($./03-receiver-delta-sharing-demo) or easily [share data within Databricks with Unity Catalog]($./04-share-data-within-databricks)

[Back to Overview]($./01-Delta-Sharing-presentation)