# Deployment Insights Trivia

Before running the notebook, update the first code cell with your Trino connection details.

# Trino Setup
To execute the SQL queries in this notebook, you must have a Trino server running and connected to the Aerospike database through the Aerospike Trino Connector.

Set the following Trino server parameters.

In [149]:
TRINO_IP = "<IP>"
TRINO_PORT = "<port>"
TRINO_USER = "<user>"
TRINO_PASSWORD = "<password>"

## Install Trino Client

In [None]:
%%sh
wget -nc -nv https://repo1.maven.org/maven2/io/trino/trino-cli/379/trino-cli-379-executable.jar 
mv trino-cli-379-executable.jar trino
chmod +x trino

## Trino Command
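Define environment variables that hold a short form of the Trino command: `TRINO` produces tab-separated output with a header row, and `TRINO_VERTICAL` produces vertical output.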

You can also run the Trino command line tool in a separate shell tab.

In [150]:
%env TRINO=./trino --server $TRINO_IP:$TRINO_PORT --catalog aerospike --schema test --output-format=TSV_HEADER
%env TRINO_VERTICAL=./trino --server $TRINO_IP:$TRINO_PORT --catalog aerospike --schema test --output-format=VERTICAL

env: TRINO=./trino --server 184.169.149.193:8080 --catalog aerospike --schema test --output-format=TSV_HEADER
env: TRINO_VERTICAL=./trino --server 184.169.149.193:8080 --catalog aerospike --schema test --output-format=VERTICAL
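The same command line can also be assembled in Python, which avoids shell quoting problems with longer queries. The following is a minimal sketch, not part of the original notebook: the helper name `build_trino_args` and its defaults are illustrative assumptions, and only the CLI flags already used above appear in it.

```python
import shlex


def build_trino_args(ip, port, sql, output_format="TSV_HEADER",
                     catalog="aerospike", schema="test", cli="./trino"):
    """Return the argument list for one non-interactive Trino CLI query.

    Hypothetical helper: it mirrors the flags in the %env lines above.
    """
    return [
        cli,
        "--server", f"{ip}:{port}",
        "--catalog", catalog,
        "--schema", schema,
        f"--output-format={output_format}",
        "--execute", sql,
    ]


# Placeholders, as in the first cell; replace with real values before running.
args = build_trino_args("<IP>", "<port>", "select count(*) from insights")

# shlex.join shows the equivalent, safely quoted shell command.
print(shlex.join(args))
```

Passing `args` to `subprocess.run(args, capture_output=True, text=True)` would run the query and capture its output as a string, provided the `trino` executable and a reachable server are available.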


# Trino SQL Queries

## Get the range of case numbers and timeframe in the database.

`
select min(case_num) as from_case, max(case_num) as to_case, 
       substr(min(timestamp),1,10) as from_date, substr(max(timestamp),1,10) as to_date 
from insights
`


In [151]:
!$TRINO --execute "select min(case_num) as from_case, max(case_num) as to_case, substr(min(timestamp),1,10) as from_date, substr(max(timestamp),1,10) as to_date from insights";


from_case	to_case	from_date	to_date
20564	26997	2019-04-08	2022-05-06
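Output in `TSV_HEADER` format is easy to load back into Python for further processing. The sketch below parses the result shown above with the standard `csv` module; the function name `parse_tsv_header` is an illustrative assumption, and the sample string is copied from the cell output.

```python
import csv
import io


def parse_tsv_header(text):
    """Parse tab-separated text whose first line is a header row.

    Returns a list of dicts keyed by column name; all values stay strings.
    """
    reader = csv.DictReader(
        io.StringIO(text.strip()),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary data
    )
    return list(reader)


# Sample copied from the query output above.
sample = (
    "from_case\tto_case\tfrom_date\tto_date\n"
    "20564\t26997\t2019-04-08\t2022-05-06\n"
)

rows = parse_tsv_header(sample)
print(rows[0]["from_date"], rows[0]["to_date"])
```

Values are returned as strings, so numeric columns such as `from_case` need an explicit `int(...)` conversion before arithmetic.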


## Show the top 10 customers that have the largest clusters by the number of nodes.

`
select customer, max(cluster_size) as largest_cluster 
from insights 
group by customer 
order by largest_cluster desc 
limit 10
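`

The cell below runs the same query but replaces `customer` with its CRC32 hash so that the output is anonymized.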


In [136]:
!$TRINO --execute "select crc32(cast(customer as varbinary)) as anonymized, max(cluster_size) as largest_cluster from insights group by customer order by largest_cluster desc limit 10" ;


anonymized	largest_cluster
1111028921	236
1137622643	154
4096821281	128
3112961089	110
1090175353	105
3871927074	78
427820430	76
1710688683	75
1816270098	63
4004477108	54
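The executed query hides customer names behind `crc32(cast(customer as varbinary))`. The sketch below shows the same idea in Python with `zlib.crc32`. It assumes that Trino computes the standard CRC-32 over the UTF-8 bytes of the name, which has not been verified against the server here; the customer names are hypothetical.

```python
import zlib


def anonymize(customer: str) -> int:
    """Return the CRC-32 of the UTF-8 bytes of a name as an unsigned 32-bit integer."""
    return zlib.crc32(customer.encode("utf-8"))


# Hypothetical customer names, used only for illustration.
for name in ["example-customer-a", "example-customer-b"]:
    print(name, anonymize(name))
```

CRC-32 is a checksum, not a cryptographic hash. Anyone with a list of candidate names can hash them and compare, so this hides names from casual readers only.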


## Show the top 10 customers that have the largest clusters by the number of records.

`
select customer, round(max(total_objects)/pow(10,9),1) as billion_records 
from insights  
group by customer  
order by billion_records desc 
limit 10
`

The cell below hashes `customer` with CRC32 so that the output is anonymized.

In [137]:
!$TRINO --execute "select crc32(cast(customer as varbinary)) as anonymized, round(max(total_objects)/pow(10,9),1) as billion_records from insights  group by customer  order by billion_records desc limit 10";


anonymized	billion_records
1816270098	1123.5
3112961089	759.6
2658734021	600.6
1090175353	352.7
4096821281	273.7
1137622643	234.6
1347477773	211.5
1322757265	208.5
2789728449	197.2
877719067	128.3


## Show the top features in use by the number of customers using them.

`
select feature, count(*) as num_customers 
from (
    select distinct customer, feature 
    from insights 
        cross join 
            unnest(cast(features as array(VARCHAR))) as t(feature)) 
group by feature 
order by num_customers desc;
`

The cell below wraps `feature` in `rpad` to pad the names to a fixed width in the output.


In [138]:
!$TRINO --execute "select rpad(feature, 15, ' ') as feature, count(*) as num_customers from (select distinct customer, feature from insights cross join unnest(cast(features as array(VARCHAR))) as t(feature)) group by feature order by num_customers desc";


feature	num_customers
kvs            	174
scan           	153
batch          	124
query          	96
xdr_src        	83
xdr_dest       	79
sindex         	76
security       	64
rack_aware     	61
udf            	44
tls_service    	38
err_format     	30
tls_fabric     	30
sc             	23
index_on_device	16
aggregation    	11
index_on_pmem  	7


## Show the top dot releases by the number of customers using them.

`
select substr(a.server_release,1,3) as dot_release, count(a.customer) as num_customers 
from insights a, (
    select customer, cluster_name, max(timestamp) as latest 
    from insights 
    group by customer, cluster_name) b 
where a.customer = b.customer and 
    a.cluster_name = b.cluster_name and 
    a.timestamp = b.latest 
group by substr(a.server_release,1,3) 
order by num_customers desc
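`

Because the join keeps the latest report for each (customer, cluster_name) pair, `num_customers` counts one row per cluster, so a customer with several clusters is counted once per cluster.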


In [157]:
!$TRINO --execute "select substr(a.server_release,1,3) as dot_release, count(a.customer) as num_customers from insights a, (select customer, cluster_name, max(timestamp) as latest from insights group by customer, cluster_name) b where a.customer = b.customer and a.cluster_name = b.cluster_name and a.timestamp = b.latest group by substr(a.server_release,1,3) order by num_customers desc";


dot_release	num_customers
4.5	188
4.9	125
5.5	105
4.8	98
5.6	96
5.7	66
5.2	59
5.0	53
5.1	52
4.7	48
3.1	46
5.3	28
4.0	23
4.3	20
4.4	14
4.2	12
5.4	8
4.1	6
4.6	4
