### Install liten
The latest tendb must be installed before running the commands below.
Install the released package from PyPI (test index)
```bash
pip install -i https://test.pypi.org/simple/ liten
```
Install tendb from the local source directory (pip reads the local setup.py)
```bash
pip install /mnt/c/Users/hkver/Documents/dbai/dbaistuff/py/liten
```
Install from a local wheel file
```bash
pip install /mnt/c/Users/hkver/Documents/dbai/dbaistuff/py/liten/dist/liten-0.0.1-py3-none-any.whl
```

Import Apache Arrow

In [1]:
import pyarrow as pa
from pyarrow import csv
import json
import pandas as pd

Import Liten. ten is the local module; rten is the remote one. rten imports the pyarrow library as well.

In [2]:
import liten as ten



Import Ray, which is used as the cluster

In [3]:
import ray

Start a cluster with a single worker.

In [4]:
ray.init(num_cpus=1)

2021-08-31 22:05:43,238 INFO services.py:1171 -- View the Ray dashboard at http://127.0.0.1:8265


{'node_ip_address': '172.31.213.225',
 'raylet_ip_address': '172.31.213.225',
 'redis_address': '172.31.213.225:6379',
 'object_store_address': '/tmp/ray/session_2021-08-31_22-05-42_583054_6958/sockets/plasma_store',
 'raylet_socket_name': '/tmp/ray/session_2021-08-31_22-05-42_583054_6958/sockets/raylet',
 'webui_url': '127.0.0.1:8265',
 'session_dir': '/tmp/ray/session_2021-08-31_22-05-42_583054_6958',
 'metrics_export_port': 60595,
 'node_id': 'fdf24a27f787bf84dbf16ffebf51dfc884039837'}

In [5]:
ray.cluster_resources()

{'node:172.31.213.225': 1.0,
 'object_store_memory': 28.0,
 'CPU': 1.0,
 'memory': 82.0}

Create a Liten Cache actor. It resides and executes on a remote node. tc is the handle to that actor.

In [6]:
ten.Cache = ray.remote(ten.Cache)
tc = ten.Cache.remote()

(pid=7048) I20210831 22:05:49.149718  7048 TCache.cpp:25] Created a new TCache


These are the fact and dimension tables of TPC-H. They are read remotely by the cache actor.

In [7]:
fact_tables = ['lineitem']
dim_tables = ['customer','orders','supplier','nation','region','partsupp','part']
tpch_dir = '/mnt/c/Users/hkver/Documents/dbai/tpch-kit/sf1g/'

In [8]:
def read_tables(tables, table_type):
    tc_tables = []
    for table_name in tables:
        tpch_table = tpch_dir+table_name+'.tbl'
        print('Reading ', tpch_table)
        csv_options = pa.csv.ParseOptions(delimiter='|')
        table = tc.read_csv.remote(input_file=tpch_table, parse_options=csv_options, table_name=table_name, ttype=table_type)
        # print(' Rows=', pytable.num_rows,' Cols=', pytable.num_columns)
        tc_tables.append(table)
    return tc_tables
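
For reference, the `.tbl` files written by the TPC-H dbgen tool are pipe-delimited, and each row ends with a trailing `|`. The sketch below is not part of Liten; it parses a made-up two-row sample with the standard library to show that layout. The sample rows and the `parse_tbl` helper are illustrative assumptions.

```python
import csv
import io

# Hypothetical sample in the dbgen .tbl layout: '|' separated, trailing '|'.
SAMPLE_REGION_TBL = "0|AFRICA|first comment|\n1|AMERICA|second comment|\n"


def parse_tbl(text):
    """Parse pipe-delimited .tbl text into a list of rows (lists of strings)."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="|"):
        if not fields:
            continue  # skip blank lines
        # The trailing '|' yields one empty field at the end; drop it.
        if fields[-1] == "":
            fields = fields[:-1]
        rows.append(fields)
    return rows


rows = parse_tbl(SAMPLE_REGION_TBL)
for row in rows:
    print(row)
```

A reader that does not drop the trailing delimiter will typically see one extra, empty column per row.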

In [9]:
%%time
fact_tables = read_tables(fact_tables, ten.Cache.FactTable)

Reading  /mnt/c/Users/hkver/Documents/dbai/tpch-kit/sf1g/lineitem.tbl
CPU times: user 5.11 ms, sys: 2.65 ms, total: 7.76 ms
Wall time: 4.84 ms


(pid=7048) I20210831 22:06:05.060601  7048 TCatalog.cpp:17] Created a new TCatalog
(pid=7048) I20210831 22:06:05.060652  7048 TCatalog.cpp:17] Created a new TCatalog
(pid=7048) I20210831 22:06:05.060664  7048 TCatalog.cpp:17] Created a new TCatalog
(pid=7048) I20210831 22:06:05.062120  7048 TCatalog.cpp:17] Created a new TCatalog


In [10]:
dim_tables = read_tables(dim_tables, ten.Cache.DimensionTable)

Reading  /mnt/c/Users/hkver/Documents/dbai/tpch-kit/sf1g/customer.tbl
Reading  /mnt/c/Users/hkver/Documents/dbai/tpch-kit/sf1g/orders.tbl
Reading  /mnt/c/Users/hkver/Documents/dbai/tpch-kit/sf1g/supplier.tbl
Reading  /mnt/c/Users/hkver/Documents/dbai/tpch-kit/sf1g/nation.tbl
Reading  /mnt/c/Users/hkver/Documents/dbai/tpch-kit/sf1g/region.tbl
Reading  /mnt/c/Users/hkver/Documents/dbai/tpch-kit/sf1g/partsupp.tbl
Reading  /mnt/c/Users/hkver/Documents/dbai/tpch-kit/sf1g/part.tbl


(pid=7048) I20210831 22:06:08.055871  7048 TCatalog.cpp:17] Created a new TCatalog
(pid=7048) I20210831 22:06:08.055918  7048 TCatalog.cpp:17] Created a new TCatalog
(pid=7048) I20210831 22:06:08.055928  7048 TCatalog.cpp:17] Created a new TCatalog
(pid=7048) I20210831 22:06:08.056032  7048 TCatalog.cpp:17] Created a new TCatalog
(pid=7048) I20210831 22:06:09.432011  7048 TCatalog.cpp:17] Created a new TCatalog
(pid=7048) I20210831 22:06:09.432070  7048 TCatalog.cpp:17] Created a new TCatalog
(pid=7048) I20210831 22:06:09.432085  7048 TCatalog.cpp:17] Created a new TCatalog
(pid=7048) I20210831 22:06:09.432575  7048 TCatalog.cpp:17] Created a new TCatalog
(pid=7048) I20210831 22:06:09.456665  7048 TCatalog.cpp:17] Created a new TCatalog
(pid=7048) I20210831 22:06:09.456727  7048 TCatalog.cpp:17] Created a new TCatalog
(pid=7048) I20210831 22:06:

In [11]:
tc.info.remote()

ObjectRef(3106d80c4e3c2369df5a1a820100000001000000)

(pid=7048) I20210831 22:06:14.240711  7048 TConfigs.cpp:16] Created a new TConfigs
(pid=7048) I20210831 22:06:14.240798  7048 TCatalog.cpp:17] Created a new TCatalog
(pid=7048) I20210831 22:06:14.240859  7048 TCatalog.cpp:17] Created a new TCatalog

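A `.remote()` call returns immediately with an ObjectRef, as the output above shows, so the small wall times reported for the reads measure task submission rather than the work itself. In Ray, `ray.get(ref)` blocks until the value is ready. The sketch below uses the standard library's `concurrent.futures` as a stand-in for the same submit-then-fetch pattern. It is an analogy, not Ray or Liten code, and `slow_info` is a hypothetical placeholder for a remote method.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_info():
    """Hypothetical stand-in for a remote method that takes a while."""
    time.sleep(0.2)
    return {"status": "ok"}


with ThreadPoolExecutor(max_workers=1) as pool:
    start = time.perf_counter()
    future = pool.submit(slow_info)      # returns at once, like tc.info.remote()
    submit_seconds = time.perf_counter() - start

    value = future.result()              # blocks, like ray.get(ref)
    total_seconds = time.perf_counter() - start

print(f"submit took {submit_seconds:.4f}s, result ready after {total_seconds:.4f}s")
print(value)
```

Timing a cell that only submits work therefore says little about how long the remote operation takes; timing the blocking fetch does.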

Build tensors from the tables now held in TCache

In [12]:
%%time
result = tc.make_tensor.remote()

CPU times: user 1.48 ms, sys: 669 µs, total: 2.15 ms
Wall time: 1.04 ms


(pid=7048) I20210831 22:06:16.575196  7048 TCatalog.cpp:17] Created a new TCatalog


Run TPC-H Query 6 on the cached tables

In [13]:
result = tc.query6.remote()

(pid=7048) I20210831 22:07:06.244240  7048 TCatalog.cpp:17] Created a new TCatalog
(pid=7048) I20210831 22:07:06.244343  7048 TpchDemo.cpp:98] Found table lineitem in cache
(pid=7048) I20210831 22:07:06.244443  7048 TCatalog.cpp:17] Created a new TCatalog
(pid=7048) I20210831 22:07:06.244508  7048 TpchDemo.cpp:98] Found table customer in cache
(pid=7048) I20210831 22:07:06.244535  7048 TCatalog.cpp:17] Created a new TCatalog
(pid=7048) I20210831 22:07:06.244565  7048 TpchDemo.cpp:98] Found table orders in cache
(pid=7048) I20210831 22:07:06.244601  7048 TCatalog.cpp:17] Created a new TCatalog
(pid=7048) I20210831 22:07:06.244644  7048 TpchDemo.cpp:98] Found table supplier in cache
(pid=7048) I20210831 22:07:06.244683  7048 TCatalog.cpp:17] Created a new TCatalog
(pid=7048) I20210831 22:07:06.244729  7048 TpchDemo.cpp:98] Found table nation in cache

(pid=7048)  TPCH QUERY 6
(pid=7048) SELECT
(pid=7048)   SUM(L_EXTENDEDPRICE * L_DISCOUNT) AS REVENUE
(pid=7048) FROM
(pid=7048)   LINEITEM
(pid=7048) WHERE
(pid=7048)   L_SHIPDATE >= DATE '1997-01-01'
(pid=7048)   AND L_SHIPDATE < DATE '1997-01-01' + INTERVAL '1' YEAR
(pid=7048)   AND L_DISCOUNT BETWEEN 0.07 - 0.01 AND 0.07 + 0.01
(pid=7048)   AND L_QUANTITY < 25;


(pid=7048) I20210831 22:07:06.717332  7048 TpchDemo.cpp:240] Completed Query6 Revenue=1.56594e+08


(pid=7048) Revenue= 156594095.60960016


In [14]:
result = tc.query5.remote()

(pid=7048) I20210831 22:07:12.807348  7048 TCatalog.cpp:17] Created a new TCatalog
(pid=7048) I20210831 22:07:12.807417  7048 TpchDemo.cpp:98] Found table lineitem in cache
(pid=7048) I20210831 22:07:12.807448  7048 TCatalog.cpp:17] Created a new TCatalog
(pid=7048) I20210831 22:07:12.807463  7048 TpchDemo.cpp:98] Found table customer in cache
(pid=7048) I20210831 22:07:12.807474  7048 TCatalog.cpp:17] Created a new TCatalog
(pid=7048) I20210831 22:07:12.807487  7048 TpchDemo.cpp:98] Found table orders in cache
(pid=7048) I20210831 22:07:12.807497  7048 TCatalog.cpp:17] Created a new TCatalog
(pid=7048) I20210831 22:07:12.807507  7048 TpchDemo.cpp:98] Found table supplier in cache
(pid=7048) I20210831 22:07:12.807516  7048 TCatalog.cpp:17] Created a new TCatalog
(pid=7048) I20210831 22:07:12.807528  7048 TpchDemo.cpp:98] Found table nation in cache

(pid=7048) SELECT
(pid=7048)     N_NAME,
(pid=7048)     SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS REVENUE
(pid=7048) FROM
(pid=7048)     CUSTOMER,
(pid=7048)     ORDERS,
(pid=7048)     LINEITEM,
(pid=7048)     SUPPLIER,
(pid=7048)     NATION,
(pid=7048)     REGION
(pid=7048) WHERE
(pid=7048)     C_CUSTKEY = O_CUSTKEY
(pid=7048)     AND L_ORDERKEY = O_ORDERKEY
(pid=7048)     AND L_SUPPKEY = S_SUPPKEY
(pid=7048)     AND C_NATIONKEY = S_NATIONKEY
(pid=7048)     AND S_NATIONKEY = N_NATIONKEY
(pid=7048)     AND N_REGIONKEY = R_REGIONKEY
(pid=7048)     AND R_NAME = 'EUROPE'
(pid=7048)     AND O_ORDERDATE >= DATE '1995-01-01'
(pid=7048)     AND O_ORDERDATE < DATE '1995-01-01' + INTERVAL '1' YEAR
(pid=7048) GROUP BY
(pid=7048)     N_NAME


(pid=7048) I20210831 22:07:22.078845  7048 TpchDemo.cpp:588] Query 5 Elapsed ms=9270


(pid=7048) b'RUSSIA' = 32382.172400000003
(pid=7048) b'FRANCE' = 45906.1421
(pid=7048) b'GERMANY' = 101655.74960000001


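The join conditions in the Query 5 text above can be read as a chain of key lookups followed by a grouped sum. The sketch below walks that chain in plain Python over tiny made-up tables. It is one possible reading of the SQL, not Liten's execution plan; every table, row, and the `query5` function here are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date

# Tiny made-up tables; only the columns Query 5 touches are kept.
region = [(0, "EUROPE"), (1, "ASIA")]            # (r_regionkey, r_name)
nation = [(10, "FRANCE", 0), (11, "JAPAN", 1)]   # (n_nationkey, n_name, n_regionkey)
customer = [(100, 10), (101, 11)]                # (c_custkey, c_nationkey)
supplier = [(200, 10), (201, 11)]                # (s_suppkey, s_nationkey)
orders = [                                       # (o_orderkey, o_custkey, o_orderdate)
    (300, 100, date(1995, 5, 1)),
    (301, 100, date(1996, 5, 1)),
    (302, 101, date(1995, 7, 1)),
]
lineitem = [                                     # (l_orderkey, l_suppkey, price, discount)
    (300, 200, 1000.0, 0.10),
    (300, 201, 500.0, 0.00),
    (301, 200, 700.0, 0.00),
    (302, 201, 400.0, 0.50),
]


def query5(region_name="EUROPE", year=1995):
    """Revenue per nation for orders where customer and supplier share a nation."""
    # Build the lookup side of each join, narrowing to the chosen region.
    region_keys = {rk for rk, name in region if name == region_name}
    nation_name = {nk: name for nk, name, rk in nation if rk in region_keys}
    cust_nation = {ck: nk for ck, nk in customer if nk in nation_name}
    supp_nation = {sk: nk for sk, nk in supplier if nk in nation_name}
    lo, hi = date(year, 1, 1), date(year + 1, 1, 1)
    order_nation = {ok: cust_nation[ck] for ok, ck, od in orders
                    if ck in cust_nation and lo <= od < hi}

    # Probe with lineitem and aggregate revenue per nation name.
    revenue = defaultdict(float)
    for ok, sk, price, disc in lineitem:
        nk = order_nation.get(ok)
        # C_NATIONKEY = S_NATIONKEY: customer and supplier must share a nation.
        if nk is not None and supp_nation.get(sk) == nk:
            revenue[nation_name[nk]] += price * (1 - disc)
    return dict(revenue)


result = query5()
print(result)
```

Filtering the small dimension tables first and probing with the large fact table last keeps the lookup dictionaries small, which is the usual motivation for ordering the joins this way.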
Kill the remote Liten Cache actor.

In [15]:
ray.kill(tc)

Shut down Ray

In [16]:
ray.shutdown()