# Deployment Insights Trivia

Before running the notebook, update the first code cell with the configuration details for your environment.

# Trino Setup
For the SQL queries in this notebook to execute, you must have a Trino server running and connected to the Aerospike database via the Aerospike Trino Connector.

Set the following Trino server parameters.

In [None]:
TRINO_IP = "<IP>"
TRINO_PORT = "<port>"
TRINO_USER = "<user>"
TRINO_PASSWORD = "<password>"

## Install Trino Client

In [None]:
%%sh
wget -nc -nv https://repo1.maven.org/maven2/io/trino/trino-cli/379/trino-cli-379-executable.jar 
mv trino-cli-379-executable.jar trino
chmod +x trino

## Trino Command
Define environment variables that hold a short form of the Trino command, one for each output format used in this notebook.

You can also run the Trino command line tool in a separate shell tab.

In [None]:
%env TRINO=./trino --server $TRINO_IP:$TRINO_PORT --catalog aerospike --schema test --output-format=TSV_HEADER
%env TRINO_VERTICAL=./trino --server $TRINO_IP:$TRINO_PORT --catalog aerospike --schema test --output-format=VERTICAL
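
The two `%env` lines differ only in the output format. As a rough sketch, the same command string could be assembled in Python. `build_trino_command` is a hypothetical helper introduced here for illustration, not part of the Trino CLI, and the address and port below are placeholders.

```python
def build_trino_command(ip, port, output_format="TSV_HEADER",
                        catalog="aerospike", schema="test"):
    """Return the Trino CLI invocation used in this notebook as a string."""
    return (
        f"./trino --server {ip}:{port} "
        f"--catalog {catalog} --schema {schema} "
        f"--output-format={output_format}"
    )


# Placeholder values for illustration only; substitute your own server details.
tsv_cmd = build_trino_command("127.0.0.1", "8080")
vertical_cmd = build_trino_command("127.0.0.1", "8080", output_format="VERTICAL")
print(tsv_cmd)
print(vertical_cmd)
```

Keeping the catalog and schema as parameters makes it easy to point the same queries at a different namespace.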

# Trino SQL Queries

## Get the range of case numbers and timeframe in the database.

`
select min(case_num) as from_case, max(case_num) as to_case, 
       substr(min(timestamp),1,10) as from_date, substr(max(timestamp),1,10) as to_date 
from insights
`


In [None]:
!$TRINO --execute "select min(case_num) as from_case, max(case_num) as to_case, substr(min(timestamp),1,10) as from_date, substr(max(timestamp),1,10) as to_date from insights";
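
With `--output-format=TSV_HEADER`, the result is tab-separated text with a header line. If you capture that text (for example with IPython's `output = !command` syntax), a minimal sketch for turning it into dictionaries might look like the following. `parse_tsv_header` is a hypothetical helper, and the sample text is made up rather than real query output.

```python
import csv
import io


def parse_tsv_header(text):
    """Parse tab-separated text whose first line is a header into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


# Made-up sample in the shape of the query result above, not real data.
sample = (
    "from_case\tto_case\tfrom_date\tto_date\n"
    "100\t250\t2021-01-05\t2022-03-31\n"
)
rows = parse_tsv_header(sample)
print(rows[0]["from_date"], rows[0]["to_date"])
```

All values come back as strings, so numeric columns need an explicit conversion such as `int(rows[0]["from_case"])`.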


## Show the top 10 customers that have the largest clusters by the number of nodes.

`
select customer, max(cluster_size) as largest_cluster 
from insights 
group by customer 
order by largest_cluster desc 
limit 10
`


In [None]:
!$TRINO --execute "select crc32(cast(customer as varbinary)) as anonymized, max(cluster_size) as largest_cluster from insights group by customer order by largest_cluster desc limit 10" ;
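
The executed query differs from the one displayed: it replaces `customer` with `crc32(cast(customer as varbinary))`, so customer names do not appear in the output. The sketch below shows the same idea in Python. It assumes that Trino's `crc32` is the standard CRC-32 that `zlib.crc32` also computes and that the cast yields the UTF-8 bytes of the name; this has not been checked against a Trino server here. `anonymize` is a hypothetical helper.

```python
import zlib


def anonymize(customer):
    """Return the CRC-32 checksum of the UTF-8 bytes of a customer name."""
    return zlib.crc32(customer.encode("utf-8"))


# "example-customer" is a made-up name used only for illustration.
print(anonymize("example-customer"))
```

The same name always maps to the same number, so grouping and ordering by customer still work on the anonymized value.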


## Show the top 10 customers that have the largest clusters by the number of records.

`
select customer, round(max(total_objects)/pow(10,9),1) as billion_records 
from insights  
group by customer  
order by billion_records desc 
limit 10
`

In [None]:
!$TRINO --execute "select crc32(cast(customer as varbinary)) as anonymized, round(max(total_objects)/pow(10,9),1) as billion_records from insights  group by customer  order by billion_records desc limit 10";


## Show the top features in use by the number of customers using them.
`
select feature, count(*) as num_customers 
from (
    select distinct customer, feature 
    from insights 
        cross join 
            unnest(cast(features as array(VARCHAR))) as t(feature)) 
group by feature 
order by num_customers desc
`


In [None]:
!$TRINO --execute "select rpad(feature, 15, ' ') as feature, count(*) as num_customers from (select distinct customer, feature from insights cross join unnest(cast(features as array(VARCHAR))) as t(feature)) group by feature order by num_customers desc";
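
The query works in two steps: `unnest` expands each record's `features` array into one row per feature, `distinct` collapses repeats so each customer counts once per feature, and the outer query counts customers per feature. A minimal Python sketch of that logic, using made-up customers and feature names and a hypothetical helper `count_customers_per_feature`:

```python
from collections import Counter


def count_customers_per_feature(rows):
    """rows: iterable of (customer, features) pairs, one per insights record."""
    # Inner query: unnest the features and keep distinct (customer, feature) pairs.
    pairs = {(customer, feature) for customer, features in rows for feature in features}
    # Outer query: count customers per feature, highest count first.
    counts = Counter(feature for _, feature in pairs)
    return counts.most_common()


# Made-up records; "acme" appears twice to show the effect of distinct.
records = [
    ("acme", ["xdr", "tls"]),
    ("acme", ["xdr"]),
    ("globex", ["xdr", "ldap"]),
]
result = count_customers_per_feature(records)
print(result)
```

Without the `distinct` step, a customer that reports the same feature in many records would be counted once per record rather than once overall.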


## Show the top dot releases by the number of customers using them.

`
select substr(a.server_release,1,3) as dot_release, count(a.customer) as num_customers 
from insights a, (
    select customer, cluster_name, max(timestamp) as latest 
    from insights 
    group by customer, cluster_name) b 
where a.customer = b.customer and 
    a.cluster_name = b.cluster_name and 
    a.timestamp = b.latest 
group by substr(a.server_release,1,3) 
order by num_customers desc
`


In [None]:
!$TRINO --execute "select substr(a.server_release,1,3) as dot_release, count(a.customer) as num_customers from insights a, (select customer, cluster_name, max(timestamp) as latest from insights group by customer, cluster_name) b where a.customer = b.customer and a.cluster_name = b.cluster_name and a.timestamp = b.latest group by substr(a.server_release,1,3) order by num_customers desc";
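
The subquery finds the latest timestamp for each customer and cluster, and the join keeps only the rows with that timestamp, so each cluster contributes its most recent server release. The sketch below mirrors that logic in plain Python. The records are made up, `count_by_dot_release` is a hypothetical helper, and timestamps are assumed to be strings that sort chronologically.

```python
from collections import Counter


def count_by_dot_release(rows):
    """rows: iterable of dicts with customer, cluster_name, timestamp, server_release."""
    rows = list(rows)
    # Subquery b: latest timestamp per (customer, cluster_name).
    latest = {}
    for r in rows:
        key = (r["customer"], r["cluster_name"])
        if key not in latest or r["timestamp"] > latest[key]:
            latest[key] = r["timestamp"]
    # Join and group: keep rows at their cluster's latest timestamp,
    # then count them by the first three characters of server_release.
    counts = Counter(
        r["server_release"][:3]
        for r in rows
        if r["timestamp"] == latest[(r["customer"], r["cluster_name"])]
    )
    return counts.most_common()


# Made-up records for illustration only.
records = [
    {"customer": "acme", "cluster_name": "c1", "timestamp": "2022-01-01", "server_release": "5.6.0.3"},
    {"customer": "acme", "cluster_name": "c1", "timestamp": "2022-03-01", "server_release": "5.7.0.8"},
    {"customer": "acme", "cluster_name": "c2", "timestamp": "2022-02-01", "server_release": "5.7.0.8"},
    {"customer": "globex", "cluster_name": "g1", "timestamp": "2022-02-15", "server_release": "5.6.0.3"},
]
result = count_by_dot_release(records)
print(result)
```

Because `count(a.customer)` counts rows rather than distinct customers, a customer with several clusters contributes once per cluster, as "acme" does in this sample.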
