# Delta Sharing Release 0.5.0

We are excited to announce the release of Delta Sharing 0.5.0, which introduces the following improvements:

## Improvements

- Support for Change Data Feed, which allows clients to fetch incremental changes for the shared tables. (#135, #136, #137, #138, #140, #141, #142, #145, #146, #147, #148, #149, #150, #151, #152, #153, #155, #159)
- Include the response body in the HTTPError exception in the Python library (#124)
- Improve the error message for the /share/schema/table APIs (#120)
- Protocol and REST API documentation improvements (#121, #128, #131)
- Add query_table_version to the REST client (#111)


In [2]:
# Uncomment below and run to install the Python Delta Sharing connector

#import sys
#!{sys.executable} -m pip install delta-sharing==0.5.1 pandas requests

In [6]:
from IPython.display import display
import delta_sharing

# Replace the location of the file after downloading from:
# https://github.com/delta-io/delta-sharing/blob/main/examples/open-datasets.share
profile_file_path = '~/Downloads/sharing_profile.share' 
client = delta_sharing.SharingClient(profile_file_path)

# Display all tables
client.list_all_tables()

[Table(name='boston-housing', share='delta_sharing', schema='default')]

## Change Data Feed Demo

Delta Sharing release 0.5.0 introduces Change Data Feed, which allows sharing clients to fetch incremental changes for shared tables.

### Data Provider: Configuration

First, the data provider enables [Change Data Feed](https://docs.delta.io/2.0.0/delta-change-data-feed.html) (CDF) on the underlying table. CDF can be enabled on existing Delta tables by updating the table properties.

```python
spark.sql(f"""
ALTER TABLE delta.`{cloud_storage_path}`
SET TBLPROPERTIES (delta.enableChangeDataFeed=true)
""")
```

For new tables, the data provider can enable CDF by using the `DeltaTableBuilder` API:

```python
from delta import DeltaTable

# enable CDF for a new Delta table using the DeltaTableBuilder API
DeltaTable.createOrReplace(spark) \
  .addColumn("ID", "INT") \
  .addColumn("crim", "DOUBLE") \
  .addColumn("zn", "DOUBLE") \
  .addColumn("indus", "DOUBLE") \
  .addColumn("chas", "INT") \
  .addColumn("nox", "DOUBLE") \
  .addColumn("rm", "DOUBLE") \
  .addColumn("age", "DOUBLE") \
  .property("delta.enableChangeDataFeed", "true") \
  .location(cloud_storage_path) \
  .execute()
```

Lastly, the data provider sets the `cdfEnabled` attribute to `true` for the table in the sharing server config, which enables sharing the table's CDF:

```yaml
shares:
- name: "delta_sharing"
  schemas:
    - name: "default"
      tables:
        - name: "boston-housing"
          location: "abfss://datasets@deltasharing.dfs.core.windows.net/boston_housing"
          cdfEnabled: true
```
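As a quick sanity check on a config like the one above, a provider could list which tables have CDF sharing switched on. The following is a minimal sketch, not part of Delta Sharing itself: `cdf_enabled_tables` is a hypothetical helper, the `config` dict mirrors what a YAML parser would return for the file above, and treating a missing `cdfEnabled` flag as disabled is an assumption.

```python
# `config` mirrors the parsed form of the server config shown above.
config = {
    "shares": [
        {
            "name": "delta_sharing",
            "schemas": [
                {
                    "name": "default",
                    "tables": [
                        {
                            "name": "boston-housing",
                            "location": "abfss://datasets@deltasharing.dfs.core.windows.net/boston_housing",
                            "cdfEnabled": True,
                        }
                    ],
                }
            ],
        }
    ]
}


def cdf_enabled_tables(config):
    """Hypothetical helper: return 'share.schema.table' names with cdfEnabled set."""
    names = []
    for share in config.get("shares", []):
        for schema in share.get("schemas", []):
            for table in schema.get("tables", []):
                # Assumption: a missing flag means CDF sharing is disabled.
                if table.get("cdfEnabled", False):
                    names.append(f"{share['name']}.{schema['name']}.{table['name']}")
    return names


print(cdf_enabled_tables(config))
```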

### Data Recipient: Reading the Change Data Feed

Release 0.5.0 adds two new functions for reading a shared Delta table's Change Data Feed:

1. `load_table_changes_as_pandas()` - loads table changes as a Pandas DataFrame
2. `load_table_changes_as_spark()` - loads table changes as an Apache Spark DataFrame

In [8]:
import pandas as pd

table_url = f'{profile_file_path}#delta_sharing.default.boston-housing'

# Load the table changes as a Pandas DataFrame
table_changes_pdf = delta_sharing.load_table_changes_as_pandas(table_url, starting_version=1, ending_version=15)

# Display the first 5 table changes
table_changes_pdf.head()

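|   | ID | crim | zn | indus | chas | nox | rm | age | dis | rad | tax | ptratio | black | lstat | medv | _change_type | _commit_version | _commit_timestamp |
|---|----|------|----|-------|------|-----|----|-----|-----|-----|-----|---------|-------|-------|------|--------------|-----------------|-------------------|
| 0 | 1 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296 | 15.3 | 396.9 | 4.98 | 24.0 | update_preimage | 2 | 1664763221000 |
| 1 | 1 | 0.00651 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.09 | 1 | 296 | 15.3 | 396.9 | 4.98 | 24.0 | update_postimage | 2 | 1664763221000 |
| 2 | 2 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 396.9 | 9.14 | 21.6 | update_preimage | 3 | 1664763223000 |
| 3 | 2 | 0.02813 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242 | 17.8 | 396.9 | 9.14 | 21.6 | update_postimage | 3 | 1664763223000 |
| 4 | 4 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222 | 18.7 | 394.63 | 2.94 | 33.4 | update_preimage | 4 | 1664763225000 |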
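
Each change row carries a `_change_type` and a `_commit_version`, so a recipient can replay the rows to reconstruct the latest value per record. The sketch below illustrates the idea with plain Python dictionaries and only the `ID` and `crim` columns from the output above. `apply_changes` is a hypothetical helper, and it assumes `ID` uniquely identifies a row, which the source does not state.

```python
# A few columns from the change rows shown above.
changes = [
    {"ID": 1, "crim": 0.00632, "_change_type": "update_preimage", "_commit_version": 2},
    {"ID": 1, "crim": 0.00651, "_change_type": "update_postimage", "_commit_version": 2},
    {"ID": 2, "crim": 0.02731, "_change_type": "update_preimage", "_commit_version": 3},
    {"ID": 2, "crim": 0.02813, "_change_type": "update_postimage", "_commit_version": 3},
]


def apply_changes(state, changes):
    """Hypothetical helper: replay CDF rows onto a dict keyed by ID."""
    # Process commits in order so that later versions win.
    for row in sorted(changes, key=lambda r: r["_commit_version"]):
        kind = row["_change_type"]
        if kind in ("insert", "update_postimage"):
            state[row["ID"]] = row["crim"]
        elif kind == "delete":
            state.pop(row["ID"], None)
        # update_preimage rows hold the old value and are skipped.
    return state


latest = apply_changes({}, changes)
print(latest)
```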


## Querying Table Version

Release 0.5.0 adds a `query_table_version()` function to the Python REST client, which lets data recipients query the current version of a shared Delta table. This gives recipients a quick way to check which version of the table they are reading.

In [10]:
from delta_sharing.rest_client import DataSharingRestClient
from delta_sharing.protocol import DeltaSharingProfile, Schema, Table

# Create a new instance of the Python rest client
profile = DeltaSharingProfile.read_from_file(profile_file_path)
rest_client = DataSharingRestClient(profile)

# Check the table version of the `boston-housing` table
boston_housing_table = Table(name="boston-housing", share="delta_sharing", schema="default")
response = rest_client.query_table_version(boston_housing_table)

# The response is a new `QueryTableVersionResponse` object added to the rest client
print(f'The response is a new `QueryTableVersionResponse` object: {response}')
print(f'The current table version is: {response.delta_table_version}')

The response is a new `QueryTableVersionResponse` object: QueryTableVersionResponse(delta_table_version=36)
The current table version is: 36
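
One way a recipient might combine the table version with the Change Data Feed is to work out which versions still need to be fetched. The sketch below is illustrative only: `next_change_range` is a hypothetical helper, and it assumes that both `starting_version` and `ending_version` are inclusive bounds.

```python
def next_change_range(last_synced_version, current_version):
    """Hypothetical helper: return (starting_version, ending_version), or None if up to date."""
    if current_version <= last_synced_version:
        return None
    # Assumption: both bounds are inclusive, so resume one past the last synced version.
    return (last_synced_version + 1, current_version)


# Versions taken from the examples above: changes were read up to 15, the table is at 36.
print(next_change_range(15, 36))
print(next_change_range(36, 36))
```

The resulting pair could then be passed as `starting_version` and `ending_version` to `load_table_changes_as_pandas()`.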


## Improved Error Messages in the Python REST Client

Also new in this release, the Python REST client includes the sharing server's response body in its error messages. The response body often carries the detailed error message, so including it helps data recipients quickly diagnose problems that occur while the sharing server processes a request.

In [13]:
import json

# Add an invalid endpoint and API token in the request
invalid_profile_dict = {
 "shareCredentialsVersion": 1,
 "endpoint": "https://sharing.delta.io/invalid_uri/",
 "bearerToken": "bad_token"
}
invalid_profile = DeltaSharingProfile.from_json(json.dumps(invalid_profile_dict))
rest_client_invalid_profile = DataSharingRestClient(invalid_profile)
try:
    rest_client_invalid_profile.list_shares()
except Exception as e:
    print(e)

404 Client Error: Not Found for url: https://sharing.delta.io/invalid_uri/shares


## Improved Error Messages from the Sharing Server

This release also improves the error messages from the sharing server's `TableManager`. In prior releases, if the sharing server could not find a table, share, or schema, it returned a less descriptive message such as `schema 'schema2' not found`. In release 0.5.0, the message also tells data recipients to contact the data provider.

In [1]:
import requests
import json

response = requests.get(
    'https://sharing.delta.io/delta-sharing/shares/delta_share/schemas/default/tables/nyc_housing/metadata',
    headers={
        'Authorization': 'Bearer faaie590d541265bcab1f2de9813274bf233'
    }
)
print(response.status_code)
print(json.dumps(response.json(), indent=3))

404
{
   "errorCode": "RESOURCE_DOES_NOT_EXIST",
   "message": "[Share/Schema/Table] 'delta_share/default/nyc_housing' does not exist, please contact your share provider for further information."
}
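
A recipient-side script could turn an error body like the one above into a single readable line. The following is a minimal sketch using only the standard library. `describe_error` is a hypothetical helper, and falling back to the raw text when the body is not a JSON object is an assumption about how other errors might look.

```python
import json

# Error body rebuilt from the response shown above.
body = json.dumps({
    "errorCode": "RESOURCE_DOES_NOT_EXIST",
    "message": "[Share/Schema/Table] 'delta_share/default/nyc_housing' does not exist, "
               "please contact your share provider for further information.",
})


def describe_error(status_code, body_text):
    """Hypothetical helper: format a sharing server error response as one line."""
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        # Not JSON: fall back to the raw body text.
        return f"HTTP {status_code}: {body_text}"
    if not isinstance(payload, dict):
        return f"HTTP {status_code}: {body_text}"
    code = payload.get("errorCode", "UNKNOWN")
    message = payload.get("message", "")
    return f"HTTP {status_code} [{code}]: {message}"


print(describe_error(404, body))
```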
