### Install liten
The latest tendb must be installed before running the commands below.
Install the released package from TestPyPI:
```bash
pip install -i https://test.pypi.org/simple/ liten
```
Or build and install from the local source directory, using its setup.py:
```bash
pip install /mnt/c/Users/hkver/Documents/dbai/dbaistuff/py/liten
```
Or install from a locally built wheel file:
```bash
pip install /mnt/c/Users/hkver/Documents/dbai/dbaistuff/py/liten/dist/liten-0.0.1-py3-none-any.whl
```
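After any of the three install routes, a quick check confirms which liten distribution Python actually sees. This is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_version` is hypothetical and not part of liten.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # None means liten is not installed in the current environment
    print("liten", installed_version("liten"))
```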

Import Apache Arrow and its CSV module.

```python
import pyarrow as pa
from pyarrow import csv
```

Import Liten: `ten` is the local module and `rten` is the remote (Ray) module. `rten` imports the pyarrow library as well.

```python
import liten as ten
import liten.rcliten as rten
```

Import Ray, which provides the cluster.

```python
import ray
```

Start a local Ray cluster limited to a single CPU.

```python
ray.init(num_cpus=1)
```

```
2021-07-26 22:16:05,886	INFO services.py:1171 -- View the Ray dashboard at http://127.0.0.1:8265

{'node_ip_address': '172.27.237.243',
 'raylet_ip_address': '172.27.237.243',
 'redis_address': '172.27.237.243:6379',
 'object_store_address': '/tmp/ray/session_2021-07-26_22-16-05_188877_1362/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2021-07-26_22-16-05_188877_1362/sockets/raylet',
 'webui_url': '127.0.0.1:8265',
 'session_dir': '/tmp/ray/session_2021-07-26_22-16-05_188877_1362',
 'metrics_export_port': 40120,
 'node_id': 'eb018d9139b333b05c0312ed33e4fa46cac682b6'}
```

```python
ray.cluster_resources()
```

```
{'object_store_memory': 42.0,
 'memory': 123.0,
 'CPU': 1.0,
 'node:172.27.237.243': 1.0}
```

Create a Liten Cache actor. The actor lives in its own Ray worker process (on a remote node when the cluster has several) and executes its methods there. `tc` is the handle to the Liten Cache actor.

```python
# Wrap the class as a Ray actor class without overwriting the module attribute
RCLiten = ray.remote(rten.RCLiten)
tc = RCLiten.remote()
```

```
(pid=1458) I20210726 22:16:14.485251  1458 TCache.cpp:24] Created a new TCache
```


These are the fact and dimension tables of TPC-H. The actor reads them remotely.

```python
fact_tables = ['lineitem']
dim_tables = ['customer', 'orders', 'supplier', 'nation', 'region']
tpch_dir = '/mnt/c/Users/hkver/Documents/dbai/tpch-kit/sf1g/'
```
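TPC-H `.tbl` files written by dbgen have no header row and end every line with a trailing `|`. pyarrow's default `ReadOptions` treat the first row as column names, so parsing with only `delimiter='|'` can turn the first data row into the header and add one empty column; whether `RCLiten.read_csv` already compensates is not shown here. The stdlib sketch below shows one way to handle such rows. `parse_tbl`, the sample text, and the column list (modelled on the TPC-H `REGION` table) are illustrative assumptions.

```python
import csv
import io
from typing import Dict, List

# Column names follow the TPC-H REGION table; other tables need their own lists.
REGION_COLUMNS = ["r_regionkey", "r_name", "r_comment"]


def parse_tbl(text: str, columns: List[str]) -> List[Dict[str, str]]:
    """Parse dbgen-style '|'-delimited rows (no header, trailing '|')."""
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter="|", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        if fields[-1] == "":
            fields = fields[:-1]  # drop the empty field after the trailing '|'
        if len(fields) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(fields)}")
        rows.append(dict(zip(columns, fields)))
    return rows


sample = "0|AFRICA|first comment|\n1|AMERICA|second comment|\n"
for row in parse_tbl(sample, REGION_COLUMNS):
    print(row)
```

In pyarrow itself, `csv.ReadOptions(column_names=[...])` tells the reader the file has no header row; the trailing empty column then still needs a name of its own before it can be dropped.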

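```python
def read_tables(tables, table_type, parse_options):
    """Submit remote reads for each table; returns ObjectRefs, not Arrow tables."""
    arrow_tables = []
    for table_name in tables:
        tpch_table = tpch_dir + table_name + '.tbl'
        print('Reading ', tpch_table)
        # A default (non-async) actor runs calls from one caller in submission
        # order, so set_table takes effect before the matching read_csv.
        tc.set_table.remote(table_name, table_type)
        pytable = tc.read_csv.remote(input_file=tpch_table, parse_options=parse_options)
        # pytable is an ObjectRef; ray.get(pytable) would wait for and fetch the result
        arrow_tables.append(pytable)
    return arrow_tables
```

```python
%%time
# table_type: 1 for fact tables, 0 for dimension tables
csv_options = csv.ParseOptions(delimiter='|')
pa_fact_tables = read_tables(fact_tables, 1, csv_options)
pa_dim_tables = read_tables(dim_tables, 0, csv_options)
```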

```
Reading  /mnt/c/Users/hkver/Documents/dbai/tpch-kit/sf1g/lineitem.tbl
Reading  /mnt/c/Users/hkver/Documents/dbai/tpch-kit/sf1g/customer.tbl
Reading  /mnt/c/Users/hkver/Documents/dbai/tpch-kit/sf1g/orders.tbl
Reading  /mnt/c/Users/hkver/Documents/dbai/tpch-kit/sf1g/supplier.tbl
Reading  /mnt/c/Users/hkver/Documents/dbai/tpch-kit/sf1g/nation.tbl
Reading  /mnt/c/Users/hkver/Documents/dbai/tpch-kit/sf1g/region.tbl
CPU times: user 11.9 ms, sys: 22.9 ms, total: 34.8 ms
Wall time: 19.5 ms
```


[2m[36m(pid=1458)[0m I20210726 22:16:26.249943  1458 TCatalog.cpp:17] Created a new TCatalog
[2m[36m(pid=1458)[0m I20210726 22:16:26.249975  1458 TCatalog.cpp:17] Created a new TCatalog
[2m[36m(pid=1458)[0m I20210726 22:16:26.250736  1458 TCatalog.cpp:17] Created a new TCatalog
[2m[36m(pid=1458)[0m I20210726 22:16:26.322634  1458 TCatalog.cpp:17] Created a new TCatalog
[2m[36m(pid=1458)[0m I20210726 22:16:26.322664  1458 TCatalog.cpp:17] Created a new TCatalog
[2m[36m(pid=1458)[0m I20210726 22:16:26.322723  1458 TCatalog.cpp:17] Created a new TCatalog


[2m[36m(pid=1458)[0m Added Table= b'lineitem'
[2m[36m(pid=1458)[0m Added Table= b'customer'


[2m[36m(pid=1458)[0m I20210726 22:16:26.959901  1458 TCatalog.cpp:17] Created a new TCatalog
[2m[36m(pid=1458)[0m I20210726 22:16:26.959933  1458 TCatalog.cpp:17] Created a new TCatalog
[2m[36m(pid=1458)[0m I20210726 22:16:26.960117  1458 TCatalog.cpp:17] Created a new TCatalog
[2m[36m(pid=1458)[0m I20210726 22:16:26.975929  1458 TCatalog.cpp:17] Created a new TCatalog
[2m[36m(pid=1458)[0m I20210726 22:16:26.975958  1458 TCatalog.cpp:17] Created a new TCatalog
[2m[36m(pid=1458)[0m I20210726 22:16:26.975998  1458 TCatalog.cpp:17] Created a new TCatalog
[2m[36m(pid=1458)[0m I20210726 22:16:26.983350  1458 TCatalog.cpp:17] Created a new TCatalog
[2m[36m(pid=1458)[0m I20210726 22:16:26.983371  1458 TCatalog.cpp:17] Created a new TCatalog
[2m[36m(pid=1458)[0m I20210726 22:16:26.983402  1458 TCatalog.cpp:17] Created a new TCatalog
[2m[36m(pid=1458)[0m I20210726 22:16:26.987301  1458 TCatalog.cpp:17] Created a new TCatalog
[2m[36m(pid=1458)[0m I20210726 22:16:

[2m[36m(pid=1458)[0m Added Table= b'orders'
[2m[36m(pid=1458)[0m Added Table= b'supplier'
[2m[36m(pid=1458)[0m Added Table= b'nation'
[2m[36m(pid=1458)[0m Added Table= b'region'


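Call the actor's `info` method. `.remote()` returns an `ObjectRef` (a future) immediately rather than the method's return value, so the cell shows only the reference; wrapping the call in `ray.get()` blocks until the value is available and fetches it.

```python
tc.info.remote()
```

```
ObjectRef(fafba2bafaed5dc3df5a1a820100000001000000)
```

```
(pid=1458) I20210726 22:17:07.673210  1458 TConfigs.cpp:16] Created a new TConfigs
(pid=1458) I20210726 22:17:07.673278  1458 TCatalog.cpp:17] Created a new TCatalog
```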


Read the tables into TCache.

```python
%%time
result = tc.make_dtensor.remote()
```

```
CPU times: user 532 µs, sys: 236 µs, total: 768 µs
Wall time: 417 µs
```


Run TPC-H Query 6 (Forecasting Revenue Change) against the cached `lineitem` table. The actor prints the query text and the resulting revenue to its log.

```python
result = tc.query6.remote()
```

```
(pid=1458)  TPCH QUERY 6
(pid=1458) SELECT
(pid=1458)   SUM(L_EXTENDEDPRICE * L_DISCOUNT) AS REVENUE
(pid=1458) FROM
(pid=1458)   LINEITEM
(pid=1458) WHERE
(pid=1458)   L_SHIPDATE >= DATE '1997-01-01'
(pid=1458)   AND L_SHIPDATE < DATE '1997-01-01' + INTERVAL '1' YEAR
(pid=1458)   AND L_DISCOUNT BETWEEN 0.07 - 0.01 AND 0.07 + 0.01
(pid=1458)   AND L_QUANTITY < 25;
(pid=1458)
(pid=1458) Revenue= 156594095.60960016
(pid=1458)
```


Run TPC-H Query 5 (Local Supplier Volume), which joins `lineitem` with all five dimension tables. The actor prints the query text to its log.

```python
result = tc.query5.remote()
```

```
(pid=1458)
(pid=1458) SELECT
(pid=1458) 	N_NAME,
(pid=1458) 	SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS REVENUE
(pid=1458) FROM
(pid=1458) 	CUSTOMER,
(pid=1458) 	ORDERS,
(pid=1458) 	LINEITEM,
(pid=1458) 	SUPPLIER,
(pid=1458) 	NATION,
(pid=1458) 	REGION
(pid=1458) WHERE
(pid=1458) 	C_CUSTKEY = O_CUSTKEY
(pid=1458) 	AND L_ORDERKEY = O_ORDERKEY
(pid=1458) 	AND L_SUPPKEY = S_SUPPKEY
(pid=1458) 	AND C_NATIONKEY = S_NATIONKEY
(pid=1458) 	AND S_NATIONKEY = N_NATIONKEY
(pid=1458) 	AND N_REGIONKEY = R_REGIONKEY
(pid=1458) 	AND R_NAME = 'EUROPE'
(pid=1458) 	AND O_ORDERDATE >= DATE '1995-01-01'
(pid=1458) 	AND O_ORDERDATE < DATE '1995-01-01' + INTERVAL '1' YEAR
(pid=1458) GROUP BY
(pid=1458) 	N_NAME
```


Kill the remote Liten Cache actor.

```python
ray.kill(tc)
```

Shut down Ray.

```python
ray.shutdown()
```