# The Delta Sharing protocol

The protocol is described in the Delta Sharing GitHub [repo](https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md).

This notebook walks through the protocol's REST endpoints with examples, using a server that shares the same tables from S3, Azure, and GCS storage.

## Concepts

- Share: A share is a logical grouping of schemas that is shared with recipients. A share can be shared with one or more recipients, and each recipient can access all resources in the share. A share may contain multiple schemas.
- Schema: A schema is a logical grouping of tables. A schema may contain multiple tables.
- Table: A table is a Delta Lake table or a view on top of a Delta Lake table.
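
The hierarchy above can be sketched as plain Python data classes. This is only an illustration of how shares, schemas, and tables nest; the class names and the `qualified_table_names` helper are hypothetical and not part of the protocol or of any Delta Sharing library. The example values are taken from the server configuration used in this notebook.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Table:
    name: str
    location: str  # storage path of the underlying Delta Lake table


@dataclass
class Schema:
    name: str
    tables: List[Table] = field(default_factory=list)


@dataclass
class Share:
    name: str
    schemas: List[Schema] = field(default_factory=list)


def qualified_table_names(share: Share) -> List[str]:
    """Return every table in the share as 'share.schema.table'."""
    return [
        f"{share.name}.{schema.name}.{table.name}"
        for schema in share.schemas
        for table in schema.tables
    ]


# Values mirror the "demo" share from the configuration file in this notebook
demo_share = Share(
    name="demo",
    schemas=[
        Schema("world", [Table("cities", "s3://demodata/silver/world/cities")]),
        Schema("sales", [Table("sample", "s3://demodata/silver/sales")]),
    ],
)

print(qualified_table_names(demo_share))
# ['demo.world.cities', 'demo.sales.sample']
```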

## Delta Sharing server configuration file

```yaml
# The format version of this config file
version: 1
# Config shares/schemas/tables to share
shares:
- name: "demo"
  schemas:
  - name: "world"
    tables:
    - name: "cities"
      location: "s3://demodata/silver/world/cities"
  - name: "sales"
    tables:
    - name: "sample"
      location: "s3://demodata/silver/sales"
- name: "azurite_demo"
  schemas:
  - name: "azworld"
    tables:
    - name: "cities"
      location: "wasbs://world@devstoreaccount1.blob.azserver:10000/cities/cities"
  - name: "azsales"
    tables:
    - name: "sample"
      location: "wasbs://sales@devstoreaccount1.blob.azserver:10000/sales"
- name: "gcs_demo"
  schemas:
  - name: "gcsworld"
    tables:
    - name: "cities"
      location: "gs://storage_bucket/cities"
  - name: "gcssales"
    tables:
    - name: "sample"
      location: "gs://storage_bucket/sales"
# Set the host name that the server will use
host: "0.0.0.0"
# Set the port that the server will listen on
port: 8080
# Set the url prefix for the REST APIs
endpoint: "/delta-sharing"
# Set the timeout of S3 presigned url in seconds
preSignedUrlTimeoutSeconds: 900
# How many tables to cache in the server
deltaTableCacheSize: 10
# Whether we can accept working with a stale version of the table. This is useful when sharing
# static tables that will never be changed.
stalenessAcceptable: false
# Whether to evaluate user provided `predicateHints`
evaluatePredicateHints: false
authorization:
  bearerToken: authTokenDeltaSharing432
```
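
As a rough sketch of how these settings relate to what a client needs, the snippet below derives a base URL and an `Authorization` header from the `port`, `endpoint`, and `bearerToken` values. The `client_settings` helper is hypothetical. It assumes plain HTTP and takes the hostname as a separate argument, because `host: "0.0.0.0"` is the address the server binds to, not one a client would call; `delta` is the hostname used later in this notebook.

```python
# Subset of the server configuration above, written as a Python dict
server_config = {
    "port": 8080,
    "endpoint": "/delta-sharing",
    "authorization": {"bearerToken": "authTokenDeltaSharing432"},
}


def client_settings(config, hostname):
    """Derive the client base URL and auth header from the server config.

    Assumes the server is reachable over plain HTTP under `hostname`.
    """
    base = f"http://{hostname}:{config['port']}{config['endpoint']}"
    auth_headers = {
        "Authorization": f"Bearer {config['authorization']['bearerToken']}"
    }
    return base, auth_headers


derived_url, derived_headers = client_settings(server_config, "delta")
print(derived_url)
print(derived_headers)
```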

In [2]:
import requests
import json
from IPython.display import JSON

In [3]:
headers = {
    'Authorization': 'Bearer authTokenDeltaSharing432',
}
base_url='http://delta:8080/delta-sharing/'
s3_share='demo'
s3_schema='world'
s3_sales_schema='sales'
az_share='azurite_demo'
az_schema='azworld'
az_sales_schema='azsales'
gcs_share='gcs_demo'
gcs_schema='gcsworld'
gcs_sales_schema='gcssales'
table='cities'
sales_table='sample'

In [4]:
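def call_api(path, method='POST'):
    """Send an authenticated request to the Delta Sharing server."""
    # Strip slashes on both sides so the joined URL never contains '//'
    url = f"{base_url.rstrip('/')}/{path.lstrip('/')}"
    return requests.request(method, url, headers=headers)

def split_response(response):
    """Split a newline-delimited JSON response into (protocol, metadata, files)."""
    # Skip empty lines
    lines = (line for line in response.iter_lines() if line)

    protocol = json.loads(next(lines))  # first line: protocol object
    metadata = json.loads(next(lines))  # second line: table metadata

    # Remaining lines: one JSON object per data file (none for /metadata)
    files = [json.loads(file) for file in lines]

    return (protocol, metadata, files)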
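
To make the line layout that `split_response` relies on concrete, here is a minimal offline sketch that applies the same splitting to a string instead of an HTTP response. The `split_ndjson` helper is hypothetical, and the payload is made up: it only mimics the order of lines (protocol, then metadata, then one object per file), not a real server reply.

```python
import json

# Made-up payload that only mimics the line layout of a table query response
sample_body = "\n".join([
    json.dumps({"protocol": {"minReaderVersion": 1}}),
    json.dumps({"metaData": {"id": "example-table-id", "partitionColumns": []}}),
    json.dumps({"file": {"id": "file-0", "url": "https://example.invalid/part-0.parquet", "size": 1024}}),
    json.dumps({"file": {"id": "file-1", "url": "https://example.invalid/part-1.parquet", "size": 2048}}),
])


def split_ndjson(text):
    """Split newline-delimited JSON text into (protocol, metadata, files)."""
    lines = [line for line in text.splitlines() if line.strip()]
    protocol_line = json.loads(lines[0])   # first line: protocol
    metadata_line = json.loads(lines[1])   # second line: metadata
    file_lines = [json.loads(line) for line in lines[2:]]  # rest: files
    return protocol_line, metadata_line, file_lines


sample_protocol, sample_metadata, sample_files = split_ndjson(sample_body)
print(sample_protocol["protocol"]["minReaderVersion"])  # 1
print([f["file"]["id"] for f in sample_files])          # ['file-0', 'file-1']
```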


## REST APIs

### List shares

In [5]:
path='shares'
response = call_api(path, method='GET')
JSON(response.json(), expanded=True)

<IPython.core.display.JSON object>

### List schemas in a share

#### S3 storage

In [6]:
path=f'shares/{s3_share}/schemas'
response = call_api(path, method='GET')
JSON(response.json(), expanded=True)

<IPython.core.display.JSON object>

#### Azure storage

In [7]:
path=f'shares/{az_share}/schemas'
response = call_api(path, method='GET')
JSON(response.json(), expanded=True)

<IPython.core.display.JSON object>

#### GCS storage

In [8]:
path=f'shares/{gcs_share}/schemas'
response = call_api(path, method='GET')
JSON(response.json(), expanded=True)

<IPython.core.display.JSON object>

### List tables in a schema

#### S3 storage

In [9]:
path=f'shares/{s3_share}/schemas/{s3_schema}/tables'
response = call_api(path, method='GET')
JSON(response.json(), expanded=True)

<IPython.core.display.JSON object>

In [10]:
path=f'shares/{s3_share}/schemas/{s3_sales_schema}/tables'
response = call_api(path, method='GET')
JSON(response.json(), expanded=True)

<IPython.core.display.JSON object>

#### Azure storage

In [11]:
path=f'shares/{az_share}/schemas/{az_schema}/tables'
response = call_api(path, method='GET')
JSON(response.json(), expanded=True)

<IPython.core.display.JSON object>

In [12]:
path=f'shares/{az_share}/schemas/{az_sales_schema}/tables'
response = call_api(path, method='GET')
JSON(response.json(), expanded=True)

<IPython.core.display.JSON object>

#### GCS storage

In [13]:
path=f'shares/{gcs_share}/schemas/{gcs_schema}/tables'
response = call_api(path, method='GET')
JSON(response.json(), expanded=True)

<IPython.core.display.JSON object>

In [14]:
path=f'shares/{gcs_share}/schemas/{gcs_sales_schema}/tables'
response = call_api(path, method='GET')
JSON(response.json(), expanded=True)

<IPython.core.display.JSON object>

### List all tables in a share

#### S3 storage

In [15]:
path=f'shares/{s3_share}/all-tables'
response = call_api(path, method='GET')
JSON(response.json(), expanded=True)

<IPython.core.display.JSON object>

#### Azure storage

In [16]:
path=f'shares/{az_share}/all-tables'
response = call_api(path, method='GET')
JSON(response.json(), expanded=True)

<IPython.core.display.JSON object>

#### GCS storage

In [17]:
path=f'shares/{gcs_share}/all-tables'
response = call_api(path, method='GET')
JSON(response.json(), expanded=True)

<IPython.core.display.JSON object>

### Query table version

#### S3 storage

In [18]:
path=f'shares/{s3_share}/schemas/{s3_schema}/tables/{table}'
response = call_api(path, method='HEAD')
JSON(dict(response.headers), expanded=True)

<IPython.core.display.JSON object>

In [19]:
path=f'shares/{s3_share}/schemas/{s3_sales_schema}/tables/{sales_table}'
response = call_api(path, method='HEAD')
JSON(dict(response.headers), expanded=True)

<IPython.core.display.JSON object>

#### Azure storage

In [20]:
path=f'shares/{az_share}/schemas/{az_schema}/tables/{table}'
response = call_api(path, method='HEAD')
JSON(dict(response.headers), expanded=True)

<IPython.core.display.JSON object>

In [21]:
path=f'shares/{az_share}/schemas/{az_sales_schema}/tables/{sales_table}'
response = call_api(path, method='HEAD')
JSON(dict(response.headers), expanded=True)

<IPython.core.display.JSON object>

#### GCS storage

In [22]:
path=f'shares/{gcs_share}/schemas/{gcs_schema}/tables/{table}'
response = call_api(path, method='HEAD')
JSON(dict(response.headers), expanded=True)

<IPython.core.display.JSON object>

In [23]:
path=f'shares/{gcs_share}/schemas/{gcs_sales_schema}/tables/{sales_table}'
response = call_api(path, method='HEAD')
JSON(dict(response.headers), expanded=True)

<IPython.core.display.JSON object>
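
The table version is returned in the `Delta-Table-Version` response header rather than in a body. The sketch below pulls it out of a plain dict such as `dict(response.headers)`, matching the header name case-insensitively because converting to a plain dict loses the case-insensitive lookup that `requests` provides. The `table_version` helper is hypothetical and the header values are made up.

```python
def table_version(response_headers):
    """Return the table version as an int, or None if the header is absent."""
    for key, value in response_headers.items():
        if key.lower() == "delta-table-version":
            return int(value)
    return None


# Made-up header values, for illustration only
example_headers = {"Delta-Table-Version": "3", "Content-Length": "0"}

print(table_version(example_headers))        # 3
print(table_version({"Content-Length": "0"}))  # None
```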

### Get table metadata

#### S3 storage

In [24]:
path=f'shares/{s3_share}/schemas/{s3_schema}/tables/{table}/metadata'
response = call_api(path, method='GET')
_, metadata, _ = split_response(response)
JSON(metadata, expanded=True)

<IPython.core.display.JSON object>

In [25]:
path=f'shares/{s3_share}/schemas/{s3_sales_schema}/tables/{sales_table}/metadata'
response = call_api(path, method='GET')
_, metadata, _ = split_response(response)
JSON(metadata, expanded=True)

<IPython.core.display.JSON object>

#### Azure storage

In [26]:
path=f'shares/{az_share}/schemas/{az_schema}/tables/{table}/metadata'
response = call_api(path, method='GET')
_, metadata, _ = split_response(response)
JSON(metadata, expanded=True)

<IPython.core.display.JSON object>

In [27]:
path=f'shares/{az_share}/schemas/{az_sales_schema}/tables/{sales_table}/metadata'
response = call_api(path, method='GET')
_, metadata, _ = split_response(response)
JSON(metadata, expanded=True)

<IPython.core.display.JSON object>

#### GCS storage

In [28]:
path=f'shares/{gcs_share}/schemas/{gcs_schema}/tables/{table}/metadata'
response = call_api(path, method='GET')
_, metadata, _ = split_response(response)
JSON(metadata, expanded=True)

<IPython.core.display.JSON object>

In [29]:
path=f'shares/{gcs_share}/schemas/{gcs_sales_schema}/tables/{sales_table}/metadata'
response = call_api(path, method='GET')
_, metadata, _ = split_response(response)
JSON(metadata, expanded=True)

<IPython.core.display.JSON object>

### Read data from a table

#### S3 storage

In [30]:
path=f'shares/{s3_share}/schemas/{s3_schema}/tables/{table}/query'
response = call_api(path)
protocol, metadata, files = split_response(response)

In [31]:
JSON(protocol, expanded=True)

<IPython.core.display.JSON object>

In [32]:
JSON(metadata, expanded=True)

<IPython.core.display.JSON object>

In [33]:
JSON(files, expanded=True)

<IPython.core.display.JSON object>

In [34]:
path=f'shares/{s3_share}/schemas/{s3_sales_schema}/tables/{sales_table}/query'
sales_response = call_api(path)
sales_protocol, sales_metadata, sales_files = split_response(sales_response)

In [35]:
JSON(sales_protocol, expanded=True)

<IPython.core.display.JSON object>

In [36]:
JSON(sales_metadata, expanded=True)

<IPython.core.display.JSON object>

In [37]:
JSON(sales_files, expanded=True)

<IPython.core.display.JSON object>

#### Azure storage

In [38]:
path=f'shares/{az_share}/schemas/{az_schema}/tables/{table}/query'
response = call_api(path)
protocol, metadata, files = split_response(response)

In [39]:
JSON(protocol, expanded=True)

<IPython.core.display.JSON object>

In [40]:
JSON(metadata, expanded=True)

<IPython.core.display.JSON object>

In [41]:
JSON(files, expanded=True)

<IPython.core.display.JSON object>

In [42]:
path=f'shares/{az_share}/schemas/{az_sales_schema}/tables/{sales_table}/query'
sales_response = call_api(path)
sales_protocol, sales_metadata, sales_files = split_response(sales_response)

In [43]:
JSON(sales_protocol, expanded=True)

<IPython.core.display.JSON object>

In [44]:
JSON(sales_metadata, expanded=True)

<IPython.core.display.JSON object>

In [45]:
JSON(sales_files, expanded=True)

<IPython.core.display.JSON object>

#### GCS storage

In [46]:
path=f'shares/{gcs_share}/schemas/{gcs_schema}/tables/{table}/query'
response = call_api(path)
protocol, metadata, files = split_response(response)

In [47]:
JSON(protocol, expanded=True)

<IPython.core.display.JSON object>

In [48]:
JSON(metadata, expanded=True)

<IPython.core.display.JSON object>

In [49]:
JSON(files, expanded=True)

<IPython.core.display.JSON object>

In [50]:
path=f'shares/{gcs_share}/schemas/{gcs_sales_schema}/tables/{sales_table}/query'
sales_response = call_api(path)
sales_protocol, sales_metadata, sales_files = split_response(sales_response)

In [51]:
JSON(sales_protocol, expanded=True)

<IPython.core.display.JSON object>

In [52]:
JSON(sales_metadata, expanded=True)

<IPython.core.display.JSON object>

In [53]:
JSON(sales_files, expanded=True)

<IPython.core.display.JSON object>
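
The query calls above send no request body, so the server returns every file of the latest table version. The protocol also allows an optional JSON body with `predicateHints`, `limitHint`, and `version`. The sketch below only builds such a body; the `build_query_body` helper and the `country` column are hypothetical. Two caveats: `call_api` as defined in this notebook does not forward a body, so it would need an extra argument (for example passing `json=body` to `requests.request`), and the server configuration above sets `evaluatePredicateHints: false`, so hints would likely be ignored in this setup.

```python
import json


def build_query_body(predicate_hints=None, limit_hint=None, version=None):
    """Build the optional JSON body for a table query, omitting unset fields."""
    body = {}
    if predicate_hints:
        body["predicateHints"] = list(predicate_hints)
    if limit_hint is not None:
        body["limitHint"] = int(limit_hint)
    if version is not None:
        body["version"] = int(version)
    return body


# 'country' is a hypothetical column name used only for illustration
example_body = build_query_body(predicate_hints=["country = 'BE'"], limit_hint=100)
print(json.dumps(example_body))
```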

## CDC API

In [62]:
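path=f'shares/{s3_share}/schemas/{s3_sales_schema}/tables/{sales_table}/changes?startingVersion=0'
response = call_api(path, method='GET')
# A successful response is newline-delimited JSON (protocol, metadata, then file actions),
# so parse it line by line; a single-line error response is handled the same way
changes = [json.loads(line) for line in response.iter_lines() if line]
JSON(changes, expanded=True)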

<IPython.core.display.JSON object>
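
In a change data feed response, each line after the protocol and metadata describes one file action, keyed by `add`, `cdf`, or `remove`. As a small sketch of how such lines could be summarised, the hypothetical `count_change_actions` helper below counts actions by type and ignores any other line. The sample lines are made up and contain only a fraction of the fields a real server returns.

```python
from collections import Counter

# File action keys used in change data feed responses
ACTION_KEYS = ("add", "cdf", "remove")


def count_change_actions(lines):
    """Count file actions by type; lines without an action key are ignored."""
    counts = Counter()
    for line in lines:
        for key in ACTION_KEYS:
            if key in line:
                counts[key] += 1
    return counts


# Made-up parsed lines, for illustration only
sample_changes = [
    {"protocol": {"minReaderVersion": 1}},
    {"add": {"id": "a0", "version": 1}},
    {"cdf": {"id": "c0", "version": 2}},
    {"remove": {"id": "r0", "version": 3}},
    {"add": {"id": "a1", "version": 3}},
]

print(dict(count_change_actions(sample_changes)))
# {'add': 2, 'cdf': 1, 'remove': 1}
```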