### Introducing Kotosiro Sharing Server

I am excited to announce the release of [Kotosiro Sharing](https://github.com/kotosiro/sharing), a minimalistic Rust implementation of the Delta Sharing server that helps engineers host their own Delta Sharing service. In this article, I walk through how to share your data with cohorts of varying technical backgrounds, from data engineers to business intelligence analysts, using a self-hosted [Kotosiro Sharing](https://github.com/kotosiro/sharing) server. The implementation is currently in beta and does not provide a GUI yet; that feature is planned for the near future.

### Delta Table Structure

You have [historical data on avocado prices and sales volume in multiple US markets](https://www.kaggle.com/datasets/neuromusic/avocado-prices) stored in your Delta table. Your colleague has come to your desk and asked if they could use the data for further data analytics. The structure of the table is as follows:

In [1]:
%%bash

tree -a ../../data/avocado-table

../../data/avocado-table
├── .part-00000-04d10a18-acde-4d66-bb3b-39f5d0feb689-c000.snappy.parquet.crc
├── .part-00000-c5135c42-2c15-4da5-8cd6-f0fc527dff9c-c000.snappy.parquet.crc
├── .part-00000-c6c1e092-bef3-41a0-8a05-826a33ecff6f-c000.snappy.parquet.crc
├── .part-00000-d7afaec2-4373-4865-ab48-e9f60495b41e-c000.snappy.parquet.crc
├── _delta_log
│   ├── .00000000000000000000.json.crc
│   ├── .00000000000000000001.json.crc
│   ├── .00000000000000000002.json.crc
│   ├── .00000000000000000003.json.crc
│   ├── 00000000000000000000.json
│   ├── 00000000000000000001.json
│   ├── 00000000000000000002.json
│   └── 00000000000000000003.json
├── part-00000-04d10a18-acde-4d66-bb3b-39f5d0feb689-c000.snappy.parquet
├── part-00000-c5135c42-2c15-4da5-8cd6-f0fc527dff9c-c000.snappy.parquet
├── part-00000-c6c1e092-bef3-41a0-8a05-826a33ecff6f-c000.snap

The table is partitioned using the `year` column, and each partition is appended sequentially. Therefore, the table has [four different versions](https://docs.databricks.com/delta/history.html) in chronological order.
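
Each commit to a Delta table writes one zero-padded JSON file under `_delta_log`, so the version history can be read directly from the file names. The following is a minimal sketch, assuming the table directory is available on the local filesystem; the helper name `list_versions` is hypothetical.

```python
from pathlib import Path
import re

# Commit files are named with a 20-digit, zero-padded version number.
_COMMIT_FILE = re.compile(r"^(\d{20})\.json$")


def list_versions(table_dir: str) -> list[int]:
    """Return the committed table versions found in <table_dir>/_delta_log."""
    log_dir = Path(table_dir) / "_delta_log"
    versions = []
    for entry in log_dir.iterdir():
        match = _COMMIT_FILE.match(entry.name)
        if match:  # skips the .crc checksum files and anything else
            versions.append(int(match.group(1)))
    return sorted(versions)
```

For the directory listed above, `list_versions("../../data/avocado-table")` would return `[0, 1, 2, 3]`, one entry per appended partition.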

### Log in to Kotosiro Sharing Server and Get the Admin Access Token

Now let's get started with the interesting part. As the owner of the data and administrator of your [Kotosiro Sharing](https://github.com/kotosiro/sharing) server, you need to log in to the system and obtain the admin access token. This token will enable you to create a [share](https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md#concepts). Here's how you can obtain the token:

In [2]:
%%bash

curl -s -X POST http://localhost:8080/admin/login \
     -H "Content-Type: application/json" \
     -d '{"account": "kotosiro", "password": "password"}' \
     | jq '.'

{
  "profile": {
    "shareCredentialsVersion": 1,
    "endpoint": "http://127.0.0.1:8080",
    "bearerToken": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJuYW1lIjoia290b3Npcm8iLCJlbWFpbCI6ImtvdG9zaXJvQGVtYWlsLmNvbSIsIm5hbWVzcGFjZSI6ImFkbWluIiwicm9sZSI6ImFkbWluIiwiZXhwIjoxNjgxOTM3NzMyfQ.rVjA6S7EWq7CakpB0IHik0mvxl58ynZNxNM3a3RJibY",
    "expirationTime": "2023-04-19 20:55:32 UTC"
  }
}
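
The `bearerToken` in this response has the three-part shape of a JSON Web Token, so its payload can be inspected without contacting the server. The sketch below only base64-decodes the claims and does not verify the signature; the helper name `jwt_claims` is hypothetical, and the token is the one printed above.

```python
import base64
import json
from datetime import datetime, timezone


def jwt_claims(token: str) -> dict:
    """Decode the (unverified) payload segment of a JWT."""
    payload = token.split(".")[1]
    # JWTs strip base64 padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


token = (
    "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9."
    "eyJuYW1lIjoia290b3Npcm8iLCJlbWFpbCI6ImtvdG9zaXJvQGVtYWlsLmNvbSIs"
    "Im5hbWVzcGFjZSI6ImFkbWluIiwicm9sZSI6ImFkbWluIiwiZXhwIjoxNjgxOTM3NzMyfQ."
    "rVjA6S7EWq7CakpB0IHik0mvxl58ynZNxNM3a3RJibY"
)
claims = jwt_claims(token)
expires = datetime.fromtimestamp(claims["exp"], tz=timezone.utc)
# prints: admin 2023-04-19 20:55:32 UTC
print(claims["role"], expires.strftime("%Y-%m-%d %H:%M:%S UTC"))
```

The decoded `exp` claim matches the `expirationTime` field in the response, which is a quick way to check how long a token remains usable.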


### Register a New Share

Next, you need to register a new [share](https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md#concepts), which is simply a logical grouping used to share with [recipients](https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md#concepts). For example, you can name your share `share1`. Note that this [share](https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md#concepts) is currently empty, meaning that you haven't added any data to it yet. Here's how you can create the [share](https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md#concepts):

In [3]:
%%bash

curl -s -X POST "http://localhost:8080/admin/shares" \
     -H "Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJuYW1lIjoia290b3Npcm8iLCJlbWFpbCI6ImtvdG9zaXJvQGVtYWlsLmNvbSIsIm5hbWVzcGFjZSI6ImFkbWluIiwicm9sZSI6ImFkbWluIiwiZXhwIjoxNjgxOTM3NzMyfQ.rVjA6S7EWq7CakpB0IHik0mvxl58ynZNxNM3a3RJibY" \
     -H "Content-Type: application/json" \
     -d '{ "name": "share1" }' \
     | jq '.'

{
  "share": {
    "id": "78f84b5e-29e7-4adf-8df5-c40487a8da43",
    "name": "share1"
  }
}


### Register a New Table

So far, so good. Now it's time to register the Delta table stored on [AWS S3](https://aws.amazon.com/s3/) with your [Kotosiro Sharing](https://github.com/kotosiro/sharing) service via the API. Like the other operations, this one is simple: POST a JSON payload that specifies the table name and the S3 path to the Delta table. For example, you can name your table `table1`. Here's how you can register the [table](https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md#concepts):

In [4]:
%%bash

curl -s -X POST "http://localhost:8080/admin/tables" \
     -H "Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJuYW1lIjoia290b3Npcm8iLCJlbWFpbCI6ImtvdG9zaXJvQGVtYWlsLmNvbSIsIm5hbWVzcGFjZSI6ImFkbWluIiwicm9sZSI6ImFkbWluIiwiZXhwIjoxNjgxOTM3NzMyfQ.rVjA6S7EWq7CakpB0IHik0mvxl58ynZNxNM3a3RJibY" \
     -H "Content-Type: application/json" \
     -d '{ "name": "table1", "location": "s3://kotosiro-sharing-example/avocado" }' \
     | jq '.'

{
  "table": {
    "id": "8a040c74-4505-44e5-aeda-9db662f338eb",
    "name": "table1",
    "location": "s3://kotosiro-sharing-example/avocado"
  }
}


### Register the Table as Part of `schema1` in `share1`

You have created a new [share](https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md#concepts) and registered a new [table](https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md#concepts). Now, you need to associate the [table](https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md#concepts) with the [share](https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md#concepts) by creating a [schema](https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md#concepts). To do this, you can register the [table](https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md#concepts) as part of, for example, the `schema1` in `share1`. The API operation to register the table to the share is fairly straightforward. Here's an example of how to do it:

In [5]:
%%bash

curl -s -X POST "http://localhost:8080/admin/shares/share1/schemas/schema1/tables" \
     -H "Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJuYW1lIjoia290b3Npcm8iLCJlbWFpbCI6ImtvdG9zaXJvQGVtYWlsLmNvbSIsIm5hbWVzcGFjZSI6ImFkbWluIiwicm9sZSI6ImFkbWluIiwiZXhwIjoxNjgxOTM3NzMyfQ.rVjA6S7EWq7CakpB0IHik0mvxl58ynZNxNM3a3RJibY" \
     -H "Content-Type: application/json" \
     -d '{ "table": "table1" }' \
     | jq '.'

{
  "schema": {
    "id": "62bf785c-1764-4953-9986-a6708996e72c",
    "name": "schema1"
  }
}
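
The three registration calls so far share the same shape: a JSON body POSTed with the admin bearer token. A minimal Python sketch using only the standard library follows. The endpoint paths and payload fields are copied from the `curl` commands in this article, while the helper names `admin_post`, `send`, and `registration_requests` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # same host as the curl examples


def admin_post(path: str, token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and parse the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def registration_requests(token: str) -> list[urllib.request.Request]:
    """The share, table, and schema registrations, in order."""
    return [
        admin_post("/admin/shares", token, {"name": "share1"}),
        admin_post(
            "/admin/tables",
            token,
            {"name": "table1", "location": "s3://kotosiro-sharing-example/avocado"},
        ),
        admin_post(
            "/admin/shares/share1/schemas/schema1/tables",
            token,
            {"table": "table1"},
        ),
    ]
```

With a server running, the whole sequence would be `for request in registration_requests(token): print(send(request))`, where `token` is the `bearerToken` returned by the login step. Building the requests separately from sending them keeps the payloads easy to inspect before anything reaches the server.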


### Issue a New Recipient Profile

This is the final and most important step in sharing your Delta table with your cohorts. You need to issue a new recipient [profile](https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md#profile-file-format), which contains the necessary credentials for your cohorts to access the shared data. The resulting [profile](https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md#profile-file-format) JSON is a credential, so you must share it securely with your cohorts. As an administrator, you are responsible for ensuring that the [profile](https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md#profile-file-format) is shared only with authorized recipients. Here's how you can issue the [profile](https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md#profile-file-format):

In [6]:
%%bash

curl -s -X GET "http://localhost:8080/admin/profile" \
     -H "Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJuYW1lIjoia290b3Npcm8iLCJlbWFpbCI6ImtvdG9zaXJvQGVtYWlsLmNvbSIsIm5hbWVzcGFjZSI6ImFkbWluIiwicm9sZSI6ImFkbWluIiwiZXhwIjoxNjgxOTM3NzMyfQ.rVjA6S7EWq7CakpB0IHik0mvxl58ynZNxNM3a3RJibY" \
     -H "Content-Type: application/json" \
     | jq '.'

{
  "profile": {
    "shareCredentialsVersion": 1,
    "endpoint": "http://127.0.0.1:8080",
    "bearerToken": "eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJuYW1lIjoia290b3Npcm8iLCJlbWFpbCI6ImtvdG9zaXJvQGVtYWlsLmNvbSIsIm5hbWVzcGFjZSI6ImFkbWluIiwicm9sZSI6Imd1ZXN0IiwiZXhwIjoxNjgxOTM3ODA1fQ.Pwqa5ylTDnjyivNsyNTi0QNR1oKuHJhCPPxWiznomRE",
    "expirationTime": "2023-04-19 20:56:45 UTC"
  }
}
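
The Delta Sharing profile file format puts `shareCredentialsVersion`, `endpoint`, and `bearerToken` at the top level of the JSON document, whereas this response nests them under a `profile` key. A plausible way to turn the response into a profile file is to unwrap that object before saving it. The sketch below assumes that is the desired layout; the helper name `save_profile` is hypothetical.

```python
import json
import os
from pathlib import Path


def save_profile(response: dict, path: str) -> None:
    """Write the inner `profile` object of an admin response to `path`."""
    profile = response["profile"]
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(profile, indent=2))
    # The file holds a bearer token, so restrict it to the current user (POSIX).
    os.chmod(target, 0o600)
```

For example, `save_profile(response, "../../creds/profile.json")` would write the credential to the path the client code in the next section reads from, where `response` is the parsed JSON shown above.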


### Create Sharing Client

From now on, you are the [recipient](https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md#concepts) of the shared Delta table. To open it as a [pandas](https://pandas.pydata.org/) dataframe, first install the [delta-sharing](https://pypi.org/project/delta-sharing/) package. Then create a `delta_sharing.SharingClient` object from the shared [profile](https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md#profile-file-format) file, which gives you access to the shared Delta table.

In [7]:
import delta_sharing

profile = "../../creds/profile.json"
client = delta_sharing.SharingClient(profile)

### List Tables

Let us verify that we can access the shared [table](https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md#concepts) properly. The following call lists every table in the [shares](https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md#concepts) your cohort has made available to you:

In [8]:
client.list_all_tables()

[Table(name='table1', share='share1', schema='schema1')]

### Load Tables

Now it's time to access the shared data. The operation is incredibly simple: there's no need to prepare troublesome cloud service credentials, and you don't have to worry about what platform your cohort is using. All you have to do is specify the path to the [table](https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md#concepts). A [table](https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md#concepts) path consists of the [profile](https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md#profile-file-format) file path followed by `#` and the fully qualified name of a [table](https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md#concepts): `<share-name>.<schema-name>.<table-name>`.

In [9]:
url = profile + "#share1.schema1.table1"
delta_sharing.load_as_pandas(url, limit=10)

Unnamed: 0,row,date,average_price,total_volume,4046,4225,4770,total_bags,small_bags,large_bags,xlarge_bags,type,year,region
0,0,2015-12-26 15:00:00,1.33,64236.62,1036.74,54454.85,48.16,8696.87,8603.62,93.25,0.0,conventional,2015,Albany
1,1,2015-12-19 15:00:00,1.35,54876.98,674.28,44638.81,58.33,9505.56,9408.07,97.49,0.0,conventional,2015,Albany
2,2,2015-12-12 15:00:00,0.93,118220.22,794.7,109149.67,130.5,8145.35,8042.21,103.14,0.0,conventional,2015,Albany
3,3,2015-12-05 15:00:00,1.08,78992.15,1132.0,71976.41,72.58,5811.16,5677.4,133.76,0.0,conventional,2015,Albany
4,4,2015-11-28 15:00:00,1.28,51039.6,941.48,43838.39,75.78,6183.95,5986.26,197.69,0.0,conventional,2015,Albany
5,5,2015-11-21 15:00:00,1.26,55979.78,1184.27,48067.99,43.61,6683.91,6556.47,127.44,0.0,conventional,2015,Albany
6,6,2015-11-14 15:00:00,0.99,83453.76,1368.92,73672.72,93.26,8318.86,8196.81,122.05,0.0,conventional,2015,Albany
7,7,2015-11-07 15:00:00,0.98,109428.33,703.75,101815.36,80.0,6829.22,6266.85,562.37,0.0,conventional,2015,Albany
8,8,2015-10-31 15:00:00,1.02,99811.42,1022.15,87315.57,85.34,11388.36,11104.53,283.83,0.0,conventional,2015,Albany
9,9,2015-10-24 15:00:00,1.07,74338.76,842.4,64757.44,113.0,8625.92,8061.47,564.45,0.0,conventional,2015,Albany
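
The table path format used in this cell can be assembled and taken apart with plain string handling. The following is a small sketch; the helper names `table_url` and `split_table_url` are hypothetical and not part of the `delta-sharing` package.

```python
def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Join a profile file path and a fully qualified table name."""
    return f"{profile_path}#{share}.{schema}.{table}"


def split_table_url(url: str) -> tuple[str, str, str, str]:
    """Split a table URL back into (profile_path, share, schema, table)."""
    # Split on the last '#', since the profile path itself may contain one.
    profile_path, _, fragment = url.rpartition("#")
    parts = fragment.split(".")
    if not profile_path or len(parts) != 3 or not all(parts):
        raise ValueError(f"not a valid table URL: {url!r}")
    share, schema, table = parts
    return profile_path, share, schema, table
```

Calling `table_url(profile, "share1", "schema1", "table1")` produces the same string as the concatenation above, and `split_table_url` rejects inputs that lack the `#` separator or one of the three name components.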


### SQL Expressions for Filtering

Great! Now you can access the desired data from the data lake. Suppose you are only interested in the data between `2016-01-01` and `2017-12-31`. In this case, you can send [SQL snippets](https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md#sql-expressions-for-filtering) as hints to the sharing server so that it can skip the partitions you don't need. Here's how you can request the desired partitions:

In [24]:
url = profile + "#share1.schema1.table1"
delta_sharing.load_as_pandas(url, predicateHints=['year >= 2016', 'year <= 2017'])

Unnamed: 0,row,date,average_price,total_volume,4046,4225,4770,total_bags,small_bags,large_bags,xlarge_bags,type,year,region
0,0,2016-12-24 15:00:00,1.52,73341.73,3202.39,58280.33,426.92,11432.09,11017.32,411.83,2.94,conventional,2016,Albany
1,1,2016-12-17 15:00:00,1.53,68938.53,3345.36,55949.79,138.72,9504.66,8876.65,587.73,40.28,conventional,2016,Albany
2,2,2016-12-10 15:00:00,1.49,71777.85,2323.39,56545.79,86.65,12822.02,12176.75,645.27,0.00,conventional,2016,Albany
3,3,2016-12-03 15:00:00,1.48,113031.96,6530.78,99746.05,50.84,6704.29,6476.12,228.17,0.00,conventional,2016,Albany
4,4,2016-11-26 15:00:00,1.52,58171.89,2793.99,47106.18,18.14,8253.58,7973.98,279.60,0.00,conventional,2016,Albany
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11333,46,2017-01-28 15:00:00,1.30,17839.37,1486.34,4498.48,26.12,11828.43,11821.76,6.67,0.00,organic,2017,WestTexNewMexico
11334,47,2017-01-21 15:00:00,1.21,16430.64,1413.93,2820.53,20.25,12175.93,12073.07,102.86,0.00,organic,2017,WestTexNewMexico
11335,48,2017-01-14 15:00:00,1.19,17014.23,1203.87,2904.22,23.07,12883.07,12476.57,406.50,0.00,organic,2017,WestTexNewMexico
11336,49,2017-01-07 15:00:00,1.18,14375.39,1327.98,2617.20,5.75,10424.46,10283.85,140.61,0.00,organic,2017,WestTexNewMexico
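
Under the Delta Sharing protocol, predicate hints are best effort: the server may still return files outside the requested range, so it can be worth re-checking the result on the client. The sketch below builds the hint strings for an inclusive year range and applies the same bounds to plain row dictionaries; the helper names `year_range_hints` and `within_years` are hypothetical.

```python
def year_range_hints(first: int, last: int, column: str = "year") -> list[str]:
    """Build SQL predicate hints for an inclusive range on a partition column."""
    if first > last:
        raise ValueError("first must not exceed last")
    return [f"{column} >= {first}", f"{column} <= {last}"]


def within_years(
    rows: list[dict], first: int, last: int, column: str = "year"
) -> list[dict]:
    """Client-side re-check: keep only rows whose column lies in the range."""
    return [row for row in rows if first <= row[column] <= last]


hints = year_range_hints(2016, 2017)
print(hints)  # ['year >= 2016', 'year <= 2017']
```

With a pandas dataframe `df`, the equivalent re-check is `df[df["year"].between(2016, 2017)]`.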


### JSON Predicates for Filtering