<img src="https://store-images.s-microsoft.com/image/apps.22094.728e1f25-a784-458f-90e1-7729049edba2.144bf785-b784-41dd-bcef-c91792108c09.f0be1bc2-af8f-49fc-ac4c-dfd9d53d9e8d" alt="lakeFS logo" width=130/> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <img src="https://trino.io/assets/images/trino-logo/trino-ko_tiny-alt.svg" alt="Trino logo" width=100/>  

## lakeFS ❤️ Trino - an example using TPCH dataset

First, install the `trino` and `sqlalchemy-trino` packages.

In [9]:
pip install trino sqlalchemy-trino --quiet

Note: you may need to restart the kernel to use updated packages.


In [10]:
%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [11]:
%sql trino://user@trino-1:8080/minio

First, create a schema in the Trino Hive catalog named `minio`. This catalog used to point directly at MinIO, but it now goes through lakeFS, which adds a Git-like versioning layer on top of the file storage.

In [15]:
%%sql
CREATE SCHEMA minio.tpch_tiny
WITH (location = 's3a://demo/main/tpch_tiny')
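The schema location above appears to follow the pattern `s3a://<repository>/<branch>/<path>`, with `demo` as the repository and `main` as the branch. As a minimal sketch of that convention, the hypothetical helper below (`lakefs_location` is not part of Trino or lakeFS) builds such a location string so the branch segment is easy to swap later.

```python
def lakefs_location(repo: str, ref: str, path: str = "", scheme: str = "s3a") -> str:
    """Build a URI of the form scheme://repo/ref/path (hypothetical helper)."""
    # Strip stray slashes so the segments join cleanly.
    parts = [repo.strip("/"), ref.strip("/")]
    if path.strip("/"):
        parts.append(path.strip("/"))
    return f"{scheme}://" + "/".join(parts)


# Location used for the schema on the main branch.
schema_location = lakefs_location("demo", "main", "tpch_tiny")
print(schema_location)
```

Changing only the `ref` argument yields the equivalent location on another branch, which is the idea the rest of this notebook relies on.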

Now create two tables, `customer` and `orders`, setting each table's `external_location` to the schema location followed by the table name. The data is copied from the tiny TPCH data set.

In [18]:
%%sql
CREATE TABLE minio.tpch_tiny.customer
WITH (
  format = 'PARQUET',
  external_location = 's3a://demo/main/tpch_tiny/customer/'
) 
AS SELECT * FROM tpch.tiny.customer

rows
1500


In [19]:
%%sql
CREATE TABLE minio.tpch_tiny.orders
WITH (
  format = 'PARQUET',
  external_location = 's3a://demo/main/tpch_tiny/orders/'
) 
AS SELECT * FROM tpch.tiny.orders

rows
15000


List the tables in the schema `tpch_tiny`

In [20]:
%sqlcmd tables --schema tpch_tiny

Name
orders
customer


Verify that you can see the table directories in LakeFS once they exist. http://localhost:28220/repositories/demo/objects?ref=main&path=tpch_tiny%2F

In [23]:
%%sql
SELECT ORDERKEY, ORDERDATE, SHIPPRIORITY
FROM minio.tpch_tiny.customer c, minio.tpch_tiny.orders o
WHERE MKTSEGMENT = 'BUILDING' AND c.CUSTKEY = o.CUSTKEY AND
ORDERDATE < date'1995-03-15'
GROUP BY ORDERKEY, ORDERDATE, SHIPPRIORITY
ORDER BY ORDERDATE

ORDERKEY,ORDERDATE,SHIPPRIORITY
27137,1992-01-01,0
5607,1992-01-01,0
46085,1992-01-03,0
24167,1992-01-03,0
9379,1992-01-04,0
56033,1992-01-04,0
34145,1992-01-04,0
44646,1992-01-05,0
39619,1992-01-05,0
16036,1992-01-06,0


Open the LakeFS UI again and click on the **Unversioned Changes** tab. Click **Commit Changes**. Type a commit message on the popup and click **Commit Changes**.

Once the changes are committed on the `main` branch, click on the **Branches** tab. Click **Create Branch**. Name the new branch `sandbox` and branch it off of `main`. Now click **Create**.

Although the `sandbox` branch now exists, it exists only in lakeFS, and Trino does not know about it yet. Make Trino aware of it by adding another schema and tables that point to the new branch. Start by creating a new schema called `tpch_tiny_sandbox` whose location property points to the `sandbox` branch instead of the `main` branch.

In [26]:
%%sql
CREATE SCHEMA minio.tpch_tiny_sandbox
WITH (location = 's3a://demo/sandbox/tpch_tiny')
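Because a branch-specific schema differs from the original only in its name and in the branch segment of its location, the statement can be generated rather than typed by hand. The sketch below assumes this notebook's naming convention (`tpch_tiny` for `main`, `tpch_tiny_<branch>` otherwise); `branch_schema_sql` is a hypothetical helper, not a Trino or lakeFS function.

```python
def branch_schema_sql(catalog: str, base_schema: str, repo: str, branch: str,
                      default_branch: str = "main") -> str:
    """Render a CREATE SCHEMA statement for one branch (hypothetical helper)."""
    # Assumed convention: the default branch keeps the plain schema name,
    # any other branch gets a "_<branch>" suffix.
    schema = base_schema if branch == default_branch else f"{base_schema}_{branch}"
    location = f"s3a://{repo}/{branch}/{base_schema}"
    return (
        f"CREATE SCHEMA {catalog}.{schema}\n"
        f"WITH (location = '{location}')"
    )


print(branch_schema_sql("minio", "tpch_tiny", "demo", "sandbox"))
```

The rendered text matches the statement in the cell above and could be pasted into a `%%sql` cell.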

In [28]:
%%sql
CREATE TABLE minio.tpch_tiny_sandbox.customer (
   custkey bigint,
   name varchar(25),
   address varchar(40),
   nationkey bigint,
   phone varchar(15),
   acctbal double,
   mktsegment varchar(10),
   comment varchar(117)
)
WITH (
   external_location = 's3a://demo/sandbox/tpch_tiny/customer',
   format = 'PARQUET'
)

In [29]:
%%sql

CREATE TABLE minio.tpch_tiny_sandbox.orders (
   orderkey bigint,
   custkey bigint,
   orderstatus varchar(1),
   totalprice double,
   orderdate date,
   orderpriority varchar(15),
   clerk varchar(15),
   shippriority integer,
   comment varchar(79)
)
WITH (
   external_location = 's3a://demo/sandbox/tpch_tiny/orders',
   format = 'PARQUET'
)

Once these table definitions exist, run the same query as before, but against the `tpch_tiny_sandbox` schema instead of the `tpch_tiny` schema.

In [31]:
%%sql
SELECT ORDERKEY, ORDERDATE, SHIPPRIORITY
FROM minio.tpch_tiny_sandbox.customer c, minio.tpch_tiny_sandbox.orders o
WHERE MKTSEGMENT = 'BUILDING' AND c.CUSTKEY = o.CUSTKEY AND
ORDERDATE < date'1995-03-15'
ORDER BY ORDERDATE

ORDERKEY,ORDERDATE,SHIPPRIORITY
27137,1992-01-01,0
5607,1992-01-01,0
46085,1992-01-03,0
24167,1992-01-03,0
56033,1992-01-04,0
34145,1992-01-04,0
9379,1992-01-04,0
39619,1992-01-05,0
44646,1992-01-05,0
16036,1992-01-06,0


One last bit of functionality we want to test is the merging capabilities. To do this, create a table called `lineitem` in the `sandbox` branch using a CTAS statement.

In [35]:
%%sql
CREATE TABLE minio.tpch_tiny_sandbox.lineitem
WITH (
  format = 'PARQUET',
  external_location = 's3a://demo/sandbox/tpch_tiny/lineitem/'
) 
AS SELECT * FROM tpch.tiny.lineitem

rows
60175


Verify that you can see three table directories in LakeFS including lineitem in the **sandbox** branch. 
http://localhost:28220/repositories/demo/objects?ref=sandbox&path=tpch_tiny%2F

Verify that you do not see lineitem in the table directories in LakeFS in the **main** branch. 
http://localhost:28220/repositories/demo/objects?ref=main&path=tpch_tiny%2F
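The two verification links differ only in their `ref` query parameter. As a small sketch, the hypothetical helper below (`lakefs_objects_url` is an illustrative name) rebuilds those links with the standard library, assuming the UI stays reachable at `http://localhost:28220` as in the links above.

```python
from urllib.parse import urlencode


def lakefs_objects_url(base_url: str, repo: str, ref: str, path: str) -> str:
    """Build the object-browser URL for a repository, ref and path prefix."""
    # urlencode percent-encodes the trailing slash of the path as %2F.
    query = urlencode({"ref": ref, "path": path})
    return f"{base_url.rstrip('/')}/repositories/{repo}/objects?{query}"


for ref in ("sandbox", "main"):
    print(lakefs_objects_url("http://localhost:28220", "demo", ref, "tpch_tiny/"))
```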

To make the new `lineitem` table show up in the `main` branch, first commit the change to `sandbox`: go to the **Unversioned Changes** tab again, click **Commit Changes**, type a commit message in the popup, and click **Commit Changes**.

Once the `lineitem` addition is committed, click on the **Compare** tab. Set the base branch to `main` and the compared-to branch to `sandbox`. You should see the addition of `lineitem` in the diff view. Click **Merge** and then click **Yes**.
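To make the commit, compare, and merge sequence concrete, here is a toy in-memory model of the steps in this notebook. It is not the lakeFS API and not how lakeFS is implemented: `ToyRepo` and its methods are hypothetical, and the model tracks only added paths, not deletions or conflicts.

```python
class ToyRepo:
    """In-memory stand-in for a versioned repository (illustration only)."""

    def __init__(self):
        self.committed = {"main": frozenset()}  # branch -> committed paths
        self.staged = {"main": set()}           # branch -> uncommitted paths

    def write(self, branch, path):
        self.staged[branch].add(path)

    def commit(self, branch):
        # Fold the staged paths into the branch's committed state.
        self.committed[branch] = self.committed[branch] | self.staged[branch]
        self.staged[branch].clear()

    def create_branch(self, name, source):
        # A new branch starts from the source's committed state; nothing is copied.
        self.committed[name] = self.committed[source]
        self.staged[name] = set()

    def diff(self, base, compared):
        # Paths committed on `compared` that are missing from `base`.
        return sorted(self.committed[compared] - self.committed[base])

    def merge(self, source, destination):
        self.committed[destination] = self.committed[destination] | self.committed[source]


repo = ToyRepo()
repo.write("main", "tpch_tiny/customer/")
repo.write("main", "tpch_tiny/orders/")
repo.commit("main")
repo.create_branch("sandbox", source="main")
repo.write("sandbox", "tpch_tiny/lineitem/")
repo.commit("sandbox")

added = repo.diff(base="main", compared="sandbox")
print(added)  # ['tpch_tiny/lineitem/']

repo.merge(source="sandbox", destination="main")
print(repo.diff(base="main", compared="sandbox"))  # []
```

After the merge, the diff between `main` and `sandbox` is empty, which mirrors what the **Compare** tab should show once the merge completes.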