# Accessing the data as a Consumer

In the previous notebook, we shared our data and granted read access to our RECIPIENT.

Let's now see how external consumers can directly access the data.

<img src="https://github.com/QuentinAmbard/databricks-demo/raw/main/delta-sharing/resources/images/delta-sharing-flow.png" width="900px"/>



## Delta Sharing Credentials

When a new Recipient entity is created for a Delta Share, an activation link is generated for that recipient. The URL leads to a page where the data recipient can download a credential file containing a long-term access token. The activation page looks similar to this:

<img src="https://raw.githubusercontent.com/databricks/tech-talks/master/images/kanonymity_share_activation.png" width=600>


From this page, the recipient can download the `.share` credential file. This file contains the information and authorization token needed to access the Share. The contents of the file will look similar to the following example.


<img src="https://raw.githubusercontent.com/databricks/tech-talks/master/images/delta_sharing_cred_file_3.png" width="800">

Due to the sensitive nature of the token, be sure to save it in a secure location and be careful when visualising or displaying the contents. 
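
As a minimal sketch of handling the file carefully, the snippet below masks the token before printing a profile. It assumes the credential file is JSON with `shareCredentialsVersion`, `endpoint` and `bearerToken` fields, as in the open Delta Sharing profile format. The helper name `redact_profile` and every sample value are hypothetical placeholders, not real credentials.

```python
import json


def redact_profile(profile: dict) -> dict:
    """Return a copy of a profile dict with the bearer token masked."""
    redacted = dict(profile)  # shallow copy so the original stays untouched
    if "bearerToken" in redacted:
        redacted["bearerToken"] = "***REDACTED***"
    return redacted


# Placeholder content for illustration only; a real file comes from the activation page.
sample_profile_text = """
{
  "shareCredentialsVersion": 1,
  "endpoint": "https://sharing.example.com/delta-sharing/",
  "bearerToken": "not-a-real-token"
}
"""

profile = json.loads(sample_profile_text)

# Safe to display: the token is masked, while the endpoint remains visible.
print(json.dumps(redact_profile(profile), indent=2))
```

Printing the redacted copy rather than the raw file keeps the token out of notebook output, which is often exported or shared.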

# Accessing the data using plain Python

`delta-sharing` is available as a Python package that can be installed via pip. <br>

This simplifies consumer-side integration: anyone who can run Python can consume shared data through a `SharingClient` object. <br>

In [0]:
%pip install delta-sharing

In [0]:
import delta_sharing
# American Airlines
# In the previous notebook, we saved the credential file; this notebook expects it at the Volume path below.
# Let's reuse it directly to access our data. If you get an access error, please re-run the previous notebook.
americanairlines_profile = '/Volumes/pds/dbdemos_sharing_airlinedata/raw_data/americanairlines.share'

# Create a SharingClient
client = delta_sharing.SharingClient(americanairlines_profile)

# List all shared tables.
client.list_all_tables()


It is possible to iterate through the list to view all of the tables along with their corresponding schemas and shares. <br>
The share file can also be stored on remote storage.

In [0]:
shares = client.list_shares()

for share in shares:
    schemas = client.list_schemas(share)
    for schema in schemas:
        tables = client.list_tables(schema)
        for table in tables:
            print(f'Table Name = {table.name}, share = {table.share}, schema = {table.schema}')

# Query the Shared Table Using the Ever so Popular Pandas

Delta Sharing lets us access data through a pandas connector. <br>
To access the shared data, we need a properly constructed URL. <br>
The expected format of the URL is: `<profile_file>#<share_id>.<database>.<table>` <br>
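
The URL format can be captured in a small helper, which avoids typos when several tables are read from the same profile. This is only a sketch: `build_table_url` is a hypothetical name and is not part of the `delta-sharing` package. The sample values are the ones used in this notebook.

```python
def build_table_url(profile_file: str, share_id: str, database: str, table: str) -> str:
    """Assemble '<profile_file>#<share_id>.<database>.<table>'."""
    parts = {
        "profile_file": profile_file,
        "share_id": share_id,
        "database": database,
        "table": table,
    }
    # Fail early on empty components instead of producing a malformed URL.
    missing = [name for name, value in parts.items() if not value]
    if missing:
        raise ValueError(f"Missing URL component(s): {', '.join(missing)}")
    return f"{profile_file}#{share_id}.{database}.{table}"


url = build_table_url(
    "/Volumes/pds/dbdemos_sharing_airlinedata/raw_data/americanairlines.share",
    "dbdemos_americanairlines",
    "dbdemos_sharing_airlinedata",
    "lookupcodes",
)
print(url)
```

The resulting string is what `delta_sharing.load_as_pandas` expects as its argument.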

In [0]:
table_url = f"{americanairlines_profile}#dbdemos_americanairlines.dbdemos_sharing_airlinedata.lookupcodes"

# Use delta sharing client to load data
flights_df = delta_sharing.load_as_pandas(table_url)

flights_df.head(10)

# Query Big Dataset using Spark

Like the pandas connector, Delta Sharing comes with a Spark connector. <br>
The way to specify the profile file location differs slightly between connectors. <br>
For the Spark connector, the profile file path needs to be HDFS-compliant. <br>
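
To illustrate the path difference, here is a hedged sketch of one case. It assumes a profile stored on DBFS, where Python code sees the file through the local `/dbfs/...` mount while Spark expects the `dbfs:/...` scheme. The helper `to_spark_profile_path` is hypothetical. Unity Catalog Volume paths such as the one used in this notebook are passed unchanged to both connectors, so the helper leaves them alone.

```python
def to_spark_profile_path(local_path: str) -> str:
    """Map a local '/dbfs/...' path to its 'dbfs:/...' form for Spark.

    Assumption: Databricks' DBFS local-mount convention. Any other path
    (for example a '/Volumes/...' path) is returned unchanged.
    """
    local_prefix = "/dbfs/"
    if local_path.startswith(local_prefix):
        return "dbfs:/" + local_path[len(local_prefix):]
    return local_path


# Hypothetical DBFS location, for illustration only.
print(to_spark_profile_path("/dbfs/FileStore/example.share"))

# The Volume path from this notebook is not modified.
print(to_spark_profile_path(
    "/Volumes/pds/dbdemos_sharing_airlinedata/raw_data/americanairlines.share"
))
```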

To load the data into Spark, we can use the Delta Sharing client.

In [0]:
spark_flights_df = delta_sharing.load_as_spark(f"{americanairlines_profile}#dbdemos_americanairlines.dbdemos_sharing_airlinedata.flights_protected")

# Import only what is used; importing `sum` here would shadow Python's built-in sum()
from pyspark.sql.functions import col, count

display(spark_flights_df.
        where('cancelled = 1').
        groupBy('UniqueCarrier', 'month', 'year').
        agg(count('FlightNum').alias('Total Cancellations')).
        orderBy(col('year').asc(), col('month').asc(), col('Total Cancellations').desc()))

Alternatively, we can use the `'deltaSharing'` format in the Spark reader.

In [0]:
spark_flights_df = spark.read.format('deltaSharing').load(f"{americanairlines_profile}#dbdemos_americanairlines.dbdemos_sharing_airlinedata.flights_protected")

display(spark_flights_df.
        where('cancelled = 1').
        groupBy('UniqueCarrier', 'month').
        agg(count('FlightNum').alias('Total Cancellations')).
        orderBy(col('month').asc(), col('Total Cancellations').desc()))

# Query your Delta Sharing table using plain SQL with Databricks!

As a Databricks user, you can experience Delta Sharing using plain SQL directly in your notebook, making data access even easier.

You can then run any query against the remote table, including joining a Delta Sharing table with a local one.

We can create a SQL table and use `'deltaSharing'` as the data source. <br>
As usual, we need to provide the URL as: `<profile_file>#<share_id>.<database>.<table>` <br>
Note that in this case we cannot use secrets since other parties would be able to see the token in clear text via table properties.
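
As a rough sketch, the statement can also be assembled in Python before being run. The helper `create_sharing_table_sql` and the `/path/to/` prefix are hypothetical placeholders; substitute the location of your own profile file. The statement shape mirrors the commented-out SQL in the next cell.

```python
def create_sharing_table_sql(table_name: str, table_url: str) -> str:
    """Build a CREATE TABLE statement that uses deltaSharing as the data source."""
    if '"' in table_url:
        # Keep the quoted LOCATION literal well-formed.
        raise ValueError("table_url must not contain double quotes")
    return (
        f"CREATE TABLE IF NOT EXISTS {table_name}\n"
        f"    USING deltaSharing\n"
        f'    LOCATION "{table_url}"'
    )


# '/path/to/' is a placeholder, not a real location.
statement = create_sharing_table_sql(
    "dbdemos_delta_sharing_demo_flights",
    "/path/to/americanairlines.share"
    "#dbdemos_americanairlines.dbdemos_sharing_airlinedata.flights_protected",
)
print(statement)
```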

In [0]:
%sql
DROP TABLE IF EXISTS dbdemos_delta_sharing_demo_flights;
-- CREATE TABLE IF NOT EXISTS dbdemos_delta_sharing_demo_flights
--     USING deltaSharing
--     LOCATION "/<ADD YOUR PATH>/americanairlines.share#dbdemos_americanairlines.dbdemos_sharing_airlinedata.flights_protected";

In [0]:
%sql 
-- select * from dbdemos_delta_sharing_demo_flights

In [0]:
# Clean up the demo for a fresh start: deletes all shares and recipients created
#cleanup_demo()

# Integration with external tools such as Power BI

Delta Sharing is natively integrated with many tools outside of Databricks. 

As an example, users can access a Delta Sharing table directly within Power BI:


<iframe width="560" height="315" src="https://www.youtube.com/embed/vZ1jcDh_tsw" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>


# Conclusion
To recap, Delta Sharing is a cloud- and platform-agnostic solution for sharing your data with external consumers.

It's simple (pure SQL), open (can be used on any system) and scalable.

All recipients can access your data, using Databricks or any other system on any cloud.

Delta Sharing enables critical use cases around Data Sharing and Data Marketplace.

When combined with Databricks Unity Catalog, it's the perfect tool to accelerate your data mesh deployment and improve your data governance.

Next: Discover how to easily [Share data within Databricks with Unity Catalog]($./04-share-data-within-databricks)


[Back to Overview]($./01-Delta-Sharing-presentation)