## Module 6. Query Acceleration Service for Performance Tuning

Module 6 explores how the Query Acceleration Service (QAS) can improve a query’s performance.

### 6.1 Identify QAS Candidates & Data Preparation

In this step, we explore when the Query Acceleration Service can improve query performance.

First, let's identify candidates for QAS. You can run the following query against the account_usage view QUERY_ACCELERATION_ELIGIBLE:

```sql
SELECT 
    query_id, 
    eligible_query_acceleration_time
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_ACCELERATION_ELIGIBLE
WHERE
    start_time > DATEADD('day', -7, CURRENT_TIMESTAMP())
ORDER BY eligible_query_acceleration_time DESC;
```
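
To see which warehouses would benefit most, the same view can also be aggregated per warehouse. The following is a minimal sketch, assuming the view's `warehouse_name` column and the same 7-day window as above; the column aliases are illustrative, and the result is subject to the same view latency.

```sql
-- Sketch: rank warehouses by total time eligible for acceleration (last 7 days)
SELECT
    warehouse_name,
    COUNT(query_id)                       AS num_eligible_queries,   -- illustrative alias
    SUM(eligible_query_acceleration_time) AS total_eligible_time     -- illustrative alias
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_ACCELERATION_ELIGIBLE
WHERE
    start_time > DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY total_eligible_time DESC;
```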

However, the account_usage view QUERY_ACCELERATION_ELIGIBLE can have up to 24 hours of latency, which is not practical for our 90-minute hands-on lab (HOL). The information schema, by contrast, is available in real time or near real time. We therefore take query IDs from the information schema query history and pass them to the system function SYSTEM\$ESTIMATE_QUERY_ACCELERATION. Because SYSTEM\$ESTIMATE_QUERY_ACCELERATION accepts only a single query ID per call, we have developed a stored procedure (SP) that calls it once for each query.

Below is the SP's definition.

In [None]:
alter session set use_cached_result = false; 
use schema sql_perf_optimization.public;
use warehouse WH_SUMMIT25_PERF_OPS; -- for operations & analysis

CREATE OR REPLACE  PROCEDURE qas_eligibilty (WH_NAME VARCHAR)
RETURNS TEXT
LANGUAGE JAVASCRIPT
EXECUTE AS CALLER
AS 
$$    
    // Get the IDs of the base workload SELECT queries that ran on the given warehouse
    var query_ids_sql = `
        SELECT
            DISTINCT query_id, warehouse_name       
        FROM TABLE(
            INFORMATION_SCHEMA.QUERY_HISTORY_BY_WAREHOUSE(
                WAREHOUSE_NAME => '${WH_NAME}', 
                RESULT_LIMIT => 10000
            )
        )
        WHERE query_type = 'SELECT'
        AND query_text ilike '%BASE WORKLOAD QUERY%'
    `;
    // WH_NAME is already interpolated into the SQL text above, so no binds are needed
    var query_ids_stmt = snowflake.createStatement({
        sqlText: query_ids_sql
    });
    var query_ids_result = query_ids_stmt.execute();    

    var eligible_queries = {};
    while (query_ids_result.next()) {        
        var current_query_id = query_ids_result.getColumnValueAsString(1);

        var qas_eligible_query = `
            with qas_json as (
                select PARSE_JSON(
                    SYSTEM$ESTIMATE_QUERY_ACCELERATION(
                        '${current_query_id}'
                    )
                ) v
            )
            select 
                v:status::string status,
                v:queryUUID::string query_id
            from qas_json
            where
                status in ('eligible','accelerated')
        `;
        var qas_stmt = snowflake.createStatement({                
            sqlText: qas_eligible_query 
        });                        
        var eligible_result = qas_stmt.execute(); 
    
        if (eligible_result.next()) {
            eligible_queries[current_query_id] = eligible_result.getColumnValueAsString(1);
        }
        
    }

    return JSON.stringify(eligible_queries);
$$;

Now we can just call it.

In [None]:
CALL qas_eligibilty('WH_SUMMIT25_PERF_BASE');

The SP returns no queries, so none of our base reporting workloads are eligible. Recall which kinds of queries can benefit from QAS: the number one criterion is that the table is big enough, for example, 100+ GB. Since our biggest table, `TRAFFIC`, is only ~17 GB, we need to increase its size.

To save some time, we have prepared a table, `TRAFFIC_LARGE`, that is more than 100 GB in size. This simulates a scenario where QAS is beneficial.

In [None]:
SHOW TABLES LIKE 'TRAFFIC_LARGE';
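
To compare the two tables side by side in gigabytes, one option is to read their sizes from the information schema. This is a sketch, assuming both tables live in the `PUBLIC` schema of `sql_perf_optimization` (the schema selected earlier) and that `BYTES` and `ROW_COUNT` are populated for them; the `size_gb` alias is illustrative.

```sql
-- Sketch: compare the sizes of TRAFFIC and TRAFFIC_LARGE in GB
SELECT
    table_name,
    row_count,
    ROUND(bytes / POWER(1024, 3), 1) AS size_gb   -- illustrative alias
FROM sql_perf_optimization.information_schema.tables
WHERE table_schema = 'PUBLIC'
  AND table_name IN ('TRAFFIC', 'TRAFFIC_LARGE')
ORDER BY bytes DESC;
```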

Identify queries that perform heavy table scans (e.g., a full TableScan) and also output a large number of rows.

In [None]:
-- large tableScan and output is big 
select 
    query_id,
    query_tag,
    operator_type,
    operator_statistics:pruning:partitions_scanned as mp_scanned,
    operator_statistics:pruning:partitions_total as mp_total,
    operator_statistics:output_rows as output_rows
from base_query_stats
where 
    mp_total is not null
    and query_tag ilike '%BASE WORKLOAD QUERY%'
    and operator_type = 'TableScan'
order by output_rows desc, mp_total desc;

The top 3 queries are good candidates for testing with QAS. We are going to test Query 08 (from the original base queries that we ran in Module 2 Part 1), since Query 04 takes quite some time to run. Replace table `TRAFFIC` with `TRAFFIC_LARGE` in the query, then rerun it against `TRAFFIC_LARGE` to establish a baseline.

Change the query comment to “Query 08 - QAS” (versus the original Query 08) for reporting purposes.

In [None]:
USE WAREHOUSE WH_SUMMIT25_PERF_BASE;

-- Q08 - QAS: Analyzes yearly traffic hits by category 
SELECT   -- BASE WORKLOAD QUERY - 08 - QAS
  DATE_PART (YEAR, t.timestamp) AS Year,
  c.id AS Category_ID,
  c.name AS Category_Name,
  SUM(COUNT(*)) OVER (
    PARTITION BY DATE_PART (YEAR, t.timestamp),
    c.id
  ) AS Total_Hits
FROM
  traffic_large AS t,
  category AS c
WHERE
  t.category_id = c.id
  AND t.timestamp >= DATEADD (YEAR, -5, CURRENT_TIMESTAMP())
  AND MONTH (t.timestamp) = 12
GROUP BY
  DATE_PART (YEAR, t.timestamp),
  c.name,
  c.id
ORDER BY
  1,
  4,
  2
LIMIT
  200;

Now run our SP again. It should report that one of the queries, the query above, is now eligible for QAS.

In [None]:
CALL qas_eligibilty('WH_SUMMIT25_PERF_BASE');

Copy the query_id value from the result above and use it to replace the `<query_id>` placeholder below.

In [None]:
-- Run the estimate directly on a single query ID.
-- Replace <query_id> with the actual query ID (from the SP result above or from query history),
-- such as select SYSTEM$ESTIMATE_QUERY_ACCELERATION('01bc5406-0105-4191-0005-e2d7002672b2');
select SYSTEM$ESTIMATE_QUERY_ACCELERATION('<query_id>'); 
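
The function returns a JSON string, which can be easier to read as rows. The sketch below reuses the `PARSE_JSON` pattern from the SP and assumes the output contains the fields `status`, `originalQueryTime`, `upperLimitScaleFactor`, and an `estimatedQueryTimes` object keyed by scale factor, with times in seconds. The column aliases are illustrative. For an ineligible query, `estimatedQueryTimes` is expected to be empty, so no rows come back.

```sql
-- Sketch: one row per candidate scale factor with its estimated run time.
-- Replace <query_id> with the actual query ID, as in the cell above.
WITH qas_json AS (
    SELECT PARSE_JSON(
        SYSTEM$ESTIMATE_QUERY_ACCELERATION('<query_id>')
    ) AS v
)
SELECT
    v:status::string             AS status,
    v:originalQueryTime::float   AS original_query_time_s,     -- assumed to be seconds
    v:upperLimitScaleFactor::int AS upper_limit_scale_factor,
    f.key::int                   AS scale_factor,
    f.value::float               AS estimated_query_time_s     -- assumed to be seconds
FROM qas_json,
     LATERAL FLATTEN(input => qas_json.v:estimatedQueryTimes) f
ORDER BY scale_factor;
```

Comparing `estimated_query_time_s` across scale factors gives a rough idea of where additional acceleration stops paying off.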

### 6.2 Enable QAS

After enabling QAS on the new warehouse WH_SUMMIT25_PERF_QAS, we can rerun the query there and compare the performance.

In [None]:
-- Enable query acceleration service on warehouse WH_SUMMIT25_PERF_QAS
-- We use WH_SUMMIT25_PERF_OPS (Large warehouse) so that it can be faster
USE WAREHOUSE WH_SUMMIT25_PERF_OPS;

ALTER WAREHOUSE WH_SUMMIT25_PERF_QAS
    SET ENABLE_QUERY_ACCELERATION = true,
        QUERY_ACCELERATION_MAX_SCALE_FACTOR = 64;

Validate that the "ENABLE_QUERY_ACCELERATION" column is "true" using the SHOW command.

In [None]:
SHOW WAREHOUSES LIKE 'WH_SUMMIT25_PERF_QAS';

A screenshot is provided in the quickstart guide under the same section.
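
The SHOW output is wide, so it can help to pick out only the QAS-related columns. This is a sketch, assuming the SHOW command and the `RESULT_SCAN` query run back to back in the same session, so that `LAST_QUERY_ID()` refers to the SHOW statement. The lowercase column names must be double-quoted.

```sql
-- Sketch: narrow the SHOW WAREHOUSES output to the QAS settings
SHOW WAREHOUSES LIKE 'WH_SUMMIT25_PERF_QAS';

SELECT
    "name",
    "size",
    "enable_query_acceleration",
    "query_acceleration_max_scale_factor"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));
```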

### 6.3 Rerun Query and Validate

First, we need to warm the new warehouse WH_SUMMIT25_PERF_QAS by running Query 01 and Query 02, similar to the base workload run on WH_SUMMIT25_PERF_BASE.

Now that QAS is enabled, we can rerun the "Query 08 - QAS" mentioned above.

In [None]:
use warehouse WH_SUMMIT25_PERF_QAS;

-- rerun query 01 and 02 to warm the new warehouse WH_SUMMIT25_PERF_QAS 
with age_20_to_30 as ( -- BASE WORKLOAD QUERY - 01
    select distinct uuid
    from user_profile
    where question_id = 3 -- DOB question
        and value::date between dateadd(year, -30, current_date) 
            and dateadd(year, -20, current_date)
),
gender_male as (
    select distinct uuid
    from user_profile
    where question_id = 4 -- Gender question
        and value::string = 'M'
),
income_50K_to_100K as (
    select distinct uuid
    from user_profile
    where question_id = 10 -- Income question
        and value::int between 50000 and 100000
)
select
    c.name,
    url,
    count(1) as visits
from traffic_large t
join category c on (
    c.id = t.category_id
)
join age_20_to_30 a on (
    a.uuid = t.uuid
)
join gender_male g on (
    g.uuid = t.uuid
)
join income_50K_to_100K i on (
    i.uuid = t.uuid
)
where
    t.timestamp between '2025-01-01' and '2025-02-01'
group by all
qualify row_number() over (
    partition by c.name order by visits desc
) <= 100
order by c.name, visits desc;

WITH url_stats AS ( -- BASE WORKLOAD QUERY - 02
  SELECT
    c.name AS category_name,
    t.url,
    COUNT(DISTINCT t.uuid) AS unique_visitors,
    COUNT(*) AS total_visits,
    RANK() OVER (
      PARTITION BY c.name
      ORDER BY
        COUNT(*) DESC
    ) AS rank_in_category
  FROM
    traffic_large AS t
    JOIN category AS c ON t.category_id = c.id
  WHERE
    t.timestamp between '2025-01-01' and '2025-02-01'
  GROUP BY
    c.name,
    t.url
)
SELECT
  category_name,
  url,
  unique_visitors,
  total_visits,
  rank_in_category
FROM
  url_stats
WHERE
  rank_in_category <= 10
ORDER BY
  category_name,
  rank_in_category;
  
--- Q08: Analyzes yearly traffic hits by category 
SELECT   -- BASE WORKLOAD QUERY - 08 - QAS
  DATE_PART (YEAR, t.timestamp) AS Year,
  c.id AS Category_ID,
  c.name AS Category_Name,
  SUM(COUNT(*)) OVER (
    PARTITION BY DATE_PART (YEAR, t.timestamp),
    c.id
  ) AS Total_Hits
FROM
  traffic_large AS t,
  category AS c
WHERE
  t.category_id = c.id
  AND t.timestamp >= DATEADD (YEAR, -5, CURRENT_TIMESTAMP())
  AND MONTH (t.timestamp) = 12
GROUP BY
  DATE_PART (YEAR, t.timestamp),
  c.name,
  c.id
ORDER BY
  1,
  4,
  2
LIMIT
  200;

Check the query profile of the query above to validate that QAS was used: look for “**partition scanned by service**” on the TableScan node of the `TRAFFIC_LARGE` table. To find the query profile, follow the instructions from step 4.3 of the notebook MODULE4_MV_OPTIMIZATION; an example is provided in the quickstart guide.

Then rerun the eligibility check stored procedure to verify QAS has been applied.


In [None]:
USE WAREHOUSE WH_SUMMIT25_PERF_OPS;

CALL qas_eligibilty('WH_SUMMIT25_PERF_QAS');
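
As a SQL-only cross-check, the query history can also show how much work was offloaded to QAS. This is a sketch, assuming the `QUERY_ACCELERATION_*` columns are exposed by the information schema table function in your account, as they are in `ACCOUNT_USAGE.QUERY_HISTORY`. Non-zero values would suggest that the query was accelerated.

```sql
-- Sketch: inspect QAS usage for "Query 08 - QAS" on the QAS warehouse.
-- Run this from WH_SUMMIT25_PERF_OPS so that this check does not match its own query text.
SELECT
    query_id,
    start_time,
    total_elapsed_time,
    query_acceleration_bytes_scanned,
    query_acceleration_partitions_scanned,
    query_acceleration_upper_limit_scale_factor
FROM TABLE(
    INFORMATION_SCHEMA.QUERY_HISTORY_BY_WAREHOUSE(
        WAREHOUSE_NAME => 'WH_SUMMIT25_PERF_QAS',
        RESULT_LIMIT => 10000
    )
)
WHERE query_text ILIKE '%BASE WORKLOAD QUERY - 08 - QAS%'
  AND query_type = 'SELECT'
ORDER BY start_time DESC;
```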

### 6.4 Compare Performance and Cost

Compare the performance (execution time and credits) between `WH_SUMMIT25_PERF_BASE` and `WH_SUMMIT25_PERF_QAS`, and examine the QUERY_ACCELERATION_HISTORY table function in the information schema to understand the impact of QAS on cost.

In [None]:
-- comparing performance and Cost 
with query_noqas as (
    select 
        query_id,
        total_elapsed_time,
        query_parameterized_hash
    FROM TABLE(
        INFORMATION_SCHEMA.QUERY_HISTORY_BY_WAREHOUSE(
            WAREHOUSE_NAME =>'WH_SUMMIT25_PERF_BASE', 
            RESULT_LIMIT =>10000
        )
    )   
    where execution_time > 0
    and query_text ilike '%BASE WORKLOAD QUERY - 08 - QAS%'
    and warehouse_name = 'WH_SUMMIT25_PERF_BASE' 
    and error_code is null 
    and query_type = 'SELECT'
    qualify row_number() over (order by start_time desc) = 1
),
query_qas as (
    select  
        query_id,
        total_elapsed_time,
        query_parameterized_hash
    FROM TABLE(
        INFORMATION_SCHEMA.QUERY_HISTORY_BY_WAREHOUSE(
            WAREHOUSE_NAME =>'WH_SUMMIT25_PERF_QAS', 
            RESULT_LIMIT =>10000
        )
    ) 
    where execution_time > 0
    and query_text ilike '%BASE WORKLOAD QUERY - 08 - QAS%'
    and warehouse_name = 'WH_SUMMIT25_PERF_QAS'
    and error_code is null 
    and query_type = 'SELECT'
    qualify row_number() over (order by start_time desc) = 1
)
select 
    qn.total_elapsed_time as noqas_elapsed_time,
    qq.total_elapsed_time as qas_elapsed_time,
    (noqas_elapsed_time - qas_elapsed_time) / 36000  as simple_saved_credits
from query_noqas qn
cross join query_qas qq -- each CTE returns at most one row
;

In [None]:
-- Lastly, you can check the cost of QAS with the information_schema table function below. It may return nothing if the cost is too small (i.e., less than 0.001)
-- QUERY_ACCELERATION_HISTORY function in information schema
select 
    credits_used
from table(
    information_schema.QUERY_ACCELERATION_HISTORY(
        date_range_start=>dateadd(D, -7, current_date),
        date_range_end=>current_timestamp(), -- include today's usage; current_date would stop at midnight
        warehouse_name=>'WH_SUMMIT25_PERF_QAS'
    )
);