## Module 5. Search Optimization Service for Performance Tuning

Module 5 explores how the Search Optimization Service (SOS) can improve a query's performance.


### 5.1 Identify SOS Candidates

In this step, we identify Search Optimization Service (SOS) candidates by analyzing the reporting queries. We look for lookup-style queries: queries whose table scans read most of a table's micro-partitions (close to a full table scan) but output only a few rows.


```sql
alter session set use_cached_result = false; -- disable result caching
use schema sql_perf_optimization.public;
use warehouse WH_SUMMIT25_PERF_OPS;  -- for operations & analysis

-- lookup tableScan: scans more than 80% of the micro-partitions;
-- sorted by table size, then by fewest output rows
select
    query_tag,
    operator_type,
    operator_statistics:pruning:partitions_scanned::number as mp_scanned,
    operator_statistics:pruning:partitions_total::number as mp_total,
    operator_statistics:output_rows::number as output_rows
from base_query_stats
where 
    mp_total > 0  -- excludes NULLs and avoids division by zero
    and (mp_scanned/mp_total) > 0.8 
    and query_tag ilike 'BASE WORKLOAD QUERY%'
    and operator_type = 'TableScan'
order by mp_total desc, output_rows asc;
```

We can see that query 03 is a good candidate. It is a lookup query that scans a large number of micro-partitions but returns only a few rows.

Let's revisit what query 03 does. 

```sql
WITH user_profile_age as ( -- BASE WORKLOAD QUERY - 03
    select 
        uuid,
        timestampdiff(year, value::date, current_date()) as value
    from user_profile
    where question_id = 3
),
user_profile_gender as (
    select 
        uuid,
        value::string as value
    from user_profile
    where question_id = 4
),
user_lookup AS (-- Q03
  SELECT
    t.uuid,
    age.value AS age,
    gender.value AS gender,
    COUNT(DISTINCT t.url) AS visited_sites,
    MIN(t.timestamp) AS first_visit,
    MAX(t.timestamp) AS last_visit
  FROM
    traffic AS t
    LEFT JOIN user_profile_age AS age ON t.uuid = age.uuid
    LEFT JOIN user_profile_gender AS gender ON t.uuid = gender.uuid
  WHERE
    t.timestamp between '2025-01-01' and '2025-02-01'
  GROUP BY ALL
)
SELECT
  *
FROM
  user_lookup
WHERE
  uuid in ('64d91ddc-cad4-4bc3-993e-53f050984738', '2d738ba3-9c32-4fe5-92ba-e8fdb3f2f6e6',
  'a628176d-dec0-453d-8f7a-33d4682bd26f','a9adda58-5762-4e1b-85b8-02b4eb5e9970',
  '93d96769-4cc8-4521-9cfb-8b81472f70e5','cb4645ef-454c-4af3-9fcd-24c48d4ad0be',
  'aabf7d37-51ee-4ce6-abc7-c07c2533a0b7','c75a4784-eddb-4e9c-b3d5-89115855bce9',
  '9484fac2-fdf6-4fa7-8084-149ec921f1af','34c8e1e4-d05e-4d71-af07-e53ec3a0d87c',
  '6a524226-0c88-4874-bdcf-a489d3b7707b','67617602-28a7-4cc4-802c-a0a871d3a967',
  '1da9d743-c674-4dd8-8c41-66b45d877afb','6c0514c7-a44e-4e14-96cd-a5f93bf9f00c'
  )
ORDER BY
  last_visit DESC;
```

### 5.2 Design Search Optimized Object

The next step is to design the search optimization configuration. In query 03, look for high-cardinality filter columns. Here that is `uuid`, which is filtered with an `IN` list. The column needs enough distinct values for equality search optimization to pay off, ideally at least 100K-200K.

```sql
select 
    approx_count_distinct(uuid)
from traffic; -- 8809584, more than 200K 
```
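The same check can also cover several candidate filter columns in one pass. That makes it easy to see which ones clear the 100K-200K distinct-value guideline. The sketch below assumes only the `traffic` columns already used by queries 01-03 (`uuid`, `url`, `category_id`):

```sql
-- Approximate distinct counts for candidate filter columns on traffic.
-- APPROX_COUNT_DISTINCT (HyperLogLog-based) is much cheaper than COUNT(DISTINCT ...)
-- on a large table and is precise enough for a threshold check.
select
    count(*)                           as total_rows,
    approx_count_distinct(uuid)        as uuid_distinct,
    approx_count_distinct(url)         as url_distinct,
    approx_count_distinct(category_id) as category_distinct
from traffic;
```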

With about 8.8 million distinct values, `uuid` qualifies for SOS. Now we can add search optimization on `uuid` for equality lookups, which also covers `IN` lists.

```sql
ALTER TABLE traffic ADD SEARCH OPTIMIZATION ON EQUALITY(uuid);
```
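You can confirm which search method and column were registered with `DESCRIBE SEARCH OPTIMIZATION`. It returns one row per search optimization expression, showing the method, the target column, and whether the expression is active:

```sql
-- List the search optimization configuration on traffic.
-- After the ALTER TABLE above, expect an EQUALITY method targeting UUID.
DESCRIBE SEARCH OPTIMIZATION ON traffic;
```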

```sql
SHOW TABLES like 'traffic'; -- verify SOS is enabled (search_optimization column)
```
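`SHOW TABLES` returns many columns. The minimal sketch below keeps only the search optimization columns by reading the previous statement's output with `RESULT_SCAN`. The index is built in the background, so `search_optimization_progress` may stay below 100 for a while after the `ALTER TABLE`. Queries may not benefit fully until the build completes.

```sql
-- Run SHOW TABLES, then project only the search optimization columns from its output.
-- SHOW output column names are lowercase, so they must be double-quoted.
SHOW TABLES LIKE 'traffic';

select
    "name",
    "search_optimization",
    "search_optimization_progress",
    "search_optimization_bytes"
from table(result_scan(last_query_id()));
```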

We can also estimate the cost of search optimization on the `uuid` equality filter.

```sql
SELECT 
    SYSTEM$ESTIMATE_SEARCH_OPTIMIZATION_COSTS('traffic', 'EQUALITY(uuid)')
        AS estimate_for_columns_with_search_optimization;
```
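The function returns a JSON document. The sketch below flattens it into one row per cost item. It assumes the JSON contains a `costPositions` array whose entries have a `name` field and a `costs` object with `value` and `unit`. If the layout differs in your account, the query returns no rows or NULLs rather than an error, so inspect the raw JSON first.

```sql
-- Parse the estimate JSON and list each cost position.
-- ASSUMPTION: field names costPositions / name / costs.value / costs.unit.
with estimate as (
    select parse_json(
        SYSTEM$ESTIMATE_SEARCH_OPTIMIZATION_COSTS('traffic', 'EQUALITY(uuid)')
    ) as j
)
select
    p.value:name::string       as cost_item,
    p.value:costs:value::float as cost_value,
    p.value:costs:unit::string as cost_unit
from estimate,
    lateral flatten(input => estimate.j:costPositions) p;
```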

### 5.3 Rerun the Query

First, warm the new warehouse WH_SUMMIT25_PERF_SOS by running queries 01 and 02, just as the base workload did on WH_SUMMIT25_PERF_BASE.

Then rerun query 03 on WH_SUMMIT25_PERF_SOS.


```sql
USE WAREHOUSE WH_SUMMIT25_PERF_SOS;

-- rerun query 01 and 02 to warm the new warehouse WH_SUMMIT25_PERF_SOS
with age_20_to_30 as ( -- BASE WORKLOAD QUERY - 01
    select distinct uuid
    from user_profile
    where question_id = 3 -- DOB question
        and value::date between dateadd(year, -30, current_date) 
            and dateadd(year, -20, current_date)
),
gender_male as (
    select distinct uuid
    from user_profile
    where question_id = 4 -- Gender question
        and value::string = 'M'
),
income_50K_to_100K as (
    select distinct uuid
    from user_profile
    where question_id = 10 -- Income question
        and value::int between 50000 and 100000
)
select
    c.name,
    url,
    count(1) as visits
from traffic t
join category c on (
    c.id = t.category_id
)
join age_20_to_30 a on (
    a.uuid = t.uuid
)
join gender_male g on (
    g.uuid = t.uuid
)
join income_50K_to_100K i on (
    i.uuid = t.uuid
)
where
    t.timestamp between '2025-01-01' and '2025-02-01'
group by all
qualify row_number() over (
    partition by c.name order by visits desc
) <= 100
order by c.name, visits desc;

WITH url_stats AS ( -- BASE WORKLOAD QUERY - 02
  SELECT
    c.name AS category_name,
    t.url,
    COUNT(DISTINCT t.uuid) AS unique_visitors,
    COUNT(*) AS total_visits,
    RANK() OVER (
      PARTITION BY c.name
      ORDER BY
        COUNT(*) DESC
    ) AS rank_in_category
  FROM
    traffic AS t
    JOIN category AS c ON t.category_id = c.id
  WHERE
    t.timestamp between '2025-01-01' and '2025-02-01'
  GROUP BY
    c.name,
    t.url
)
SELECT
  category_name,
  url,
  unique_visitors,
  total_visits,
  rank_in_category
FROM
  url_stats
WHERE
  rank_in_category <= 10
ORDER BY
  category_name,
  rank_in_category;

-- rerun query 03 on the SOS warehouse
WITH user_profile_age as ( -- BASE WORKLOAD QUERY - 03
    select 
        uuid,
        timestampdiff(year, value::date, current_date()) as value
    from user_profile
    where question_id = 3
),
user_profile_gender as (
    select 
        uuid,
        value::string as value
    from user_profile
    where question_id = 4
),
user_lookup AS (
  SELECT
    t.uuid,
    age.value AS age,
    gender.value AS gender,
    COUNT(DISTINCT t.url) AS visited_sites,
    MIN(t.timestamp) AS first_visit,
    MAX(t.timestamp) AS last_visit
  FROM
    traffic AS t
    LEFT JOIN user_profile_age AS age ON t.uuid = age.uuid
    LEFT JOIN user_profile_gender AS gender ON t.uuid = gender.uuid
  WHERE
    t.timestamp between '2025-01-01' and '2025-02-01'
  GROUP BY ALL
)
SELECT
  *
FROM
  user_lookup
WHERE
  uuid in ('64d91ddc-cad4-4bc3-993e-53f050984738', '2d738ba3-9c32-4fe5-92ba-e8fdb3f2f6e6',
  'a628176d-dec0-453d-8f7a-33d4682bd26f','a9adda58-5762-4e1b-85b8-02b4eb5e9970',
  '93d96769-4cc8-4521-9cfb-8b81472f70e5','cb4645ef-454c-4af3-9fcd-24c48d4ad0be',
  'aabf7d37-51ee-4ce6-abc7-c07c2533a0b7','c75a4784-eddb-4e9c-b3d5-89115855bce9',
  '9484fac2-fdf6-4fa7-8084-149ec921f1af','34c8e1e4-d05e-4d71-af07-e53ec3a0d87c',
  '6a524226-0c88-4874-bdcf-a489d3b7707b','67617602-28a7-4cc4-802c-a0a871d3a967',
  '1da9d743-c674-4dd8-8c41-66b45d877afb','6c0514c7-a44e-4e14-96cd-a5f93bf9f00c'
  )
ORDER BY
  last_visit DESC;
```

Check the query profile of query 03 to see whether SOS was used (look at the TableScan on `traffic` and its partitions scanned). Why or why not?
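You can also answer this without the UI. The sketch below reads the operator statistics of the most recent query in the session with `GET_QUERY_OPERATOR_STATS(LAST_QUERY_ID())`. It assumes you run it right after query 03 in the same session. It uses the same `operator_statistics:pruning` fields as the candidate search in 5.1, so you can compare the partitions scanned on `traffic` with the base run. A much smaller `mp_scanned` suggests that search optimization pruned micro-partitions.

```sql
-- Must be the first statement after query 03: LAST_QUERY_ID() returns the latest query in this session.
select
    operator_id,
    operator_type,
    operator_attributes:table_name::string                 as table_name,
    operator_statistics:pruning:partitions_scanned::number as mp_scanned,
    operator_statistics:pruning:partitions_total::number   as mp_total,
    operator_statistics:output_rows::number                as output_rows
from table(get_query_operator_stats(last_query_id()))
where operator_type = 'TableScan';
```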

### 5.4 Compare Performance

Now let's compare query performance between WH_SUMMIT25_PERF_BASE (without SOS) and WH_SUMMIT25_PERF_SOS (with SOS).


```sql
-- for operations & analysis
USE WAREHOUSE WH_SUMMIT25_PERF_OPS; 

-- compare query performance on SOS: latest successful run of query 03 per warehouse
select 
    warehouse_name,
    query_text, 
    end_time,
    total_elapsed_time
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY(RESULT_LIMIT => 10000))   
where 
    execution_time > 0
    and query_text ilike '%BASE WORKLOAD QUERY - 03%'
    and warehouse_name in ( 'WH_SUMMIT25_PERF_BASE', 'WH_SUMMIT25_PERF_SOS')
    and error_code is null 
    and query_type = 'SELECT'
qualify row_number() over (partition by warehouse_name order by end_time desc) = 1;
```

Use `total_elapsed_time` (in milliseconds) to quantify the improvement from SOS. For highly selective filters, SOS lets Snowflake prune micro-partitions that cannot contain matching rows, so the query scans less data and runs faster.
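To turn the two rows into a single number, the sketch below reuses the same filters and pivots the latest run per warehouse into a speedup factor. The factor is base time divided by SOS time, so a value above 1 means the SOS run was faster. The two warehouses may differ in size or cache state, so treat the ratio as indicative rather than exact.

```sql
-- Latest successful run of query 03 on each warehouse.
with latest as (
    select
        warehouse_name,
        total_elapsed_time
    from table(information_schema.query_history(result_limit => 10000))
    where
        execution_time > 0
        and query_text ilike '%BASE WORKLOAD QUERY - 03%'
        and warehouse_name in ('WH_SUMMIT25_PERF_BASE', 'WH_SUMMIT25_PERF_SOS')
        and error_code is null
        and query_type = 'SELECT'
    qualify row_number() over (partition by warehouse_name order by end_time desc) = 1
),
-- Pivot the two rows into one row with base and SOS elapsed times (ms).
pivoted as (
    select
        max(iff(warehouse_name = 'WH_SUMMIT25_PERF_BASE', total_elapsed_time, null)) as base_ms,
        max(iff(warehouse_name = 'WH_SUMMIT25_PERF_SOS',  total_elapsed_time, null)) as sos_ms
    from latest
)
select
    base_ms,
    sos_ms,
    round(base_ms / nullif(sos_ms, 0), 2) as speedup_factor  -- NULL if no SOS run found
from pivoted;
```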

### 5.5 Compare Costs

Next, let's compare the cost of query execution with and without SOS, and then check the cost of the search optimization service itself.


```sql
-- compare warehouse credits with and without SOS
-- note: WH_SUMMIT25_PERF_BASE also ran other queries, so its total is not directly comparable
select 
    WAREHOUSE_NAME,
    SUM(CREDITS_USED) as credits_used
from table(information_schema.warehouse_metering_history(
    dateadd('days', -10, current_date())
))
where WAREHOUSE_NAME in ( 'WH_SUMMIT25_PERF_BASE', 'WH_SUMMIT25_PERF_SOS')
GROUP BY 1;
```
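Warehouse-level metering mixes in other queries, so the sketch below gives a rough per-query comparison based on execution time alone. It assumes a hypothetical `credits_per_hour` rate of 1 (the standard X-Small rate) and that both warehouses are the same size; set the variable to match your warehouses. The estimate ignores idle time, the 60-second minimum billed on each warehouse resume, and concurrency, so it is only an approximation.

```sql
-- HYPOTHETICAL rate: credits per hour for the warehouse size (1 = X-Small, standard warehouse).
set credits_per_hour = 1;

select
    warehouse_name,
    execution_time                               as execution_ms,
    execution_time / 3600000 * $credits_per_hour as approx_credits  -- ms -> hours, times hourly rate
from table(information_schema.query_history(result_limit => 10000))
where
    execution_time > 0
    and query_text ilike '%BASE WORKLOAD QUERY - 03%'
    and warehouse_name in ('WH_SUMMIT25_PERF_BASE', 'WH_SUMMIT25_PERF_SOS')
    and error_code is null
    and query_type = 'SELECT'
qualify row_number() over (partition by warehouse_name order by end_time desc) = 1;
```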

```sql
-- check the cost of the SOS service; the result may be empty due to latency
-- (rerun after a minute or so) or show 0 if the cost is below 0.001 credits
-- SEARCH_OPTIMIZATION_HISTORY is available both in ACCOUNT_USAGE and as an INFORMATION_SCHEMA table function
select 
    credits_used
from table(
    information_schema.search_optimization_history (
        date_range_start => dateadd(D, -7, current_date),
        date_range_end => current_date,
        table_name => 'SQL_PERF_OPTIMIZATION.PUBLIC.TRAFFIC'
    )
);
```
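The same data is also available in the `SNOWFLAKE.ACCOUNT_USAGE.SEARCH_OPTIMIZATION_HISTORY` view. It keeps a longer history (up to a year) but can lag by a few hours. The sketch below totals SOS credits per day for the `TRAFFIC` table. It assumes your role can read the `SNOWFLAKE` database.

```sql
-- Daily SOS credits for SQL_PERF_OPTIMIZATION.PUBLIC.TRAFFIC over the last 30 days.
select
    date_trunc('day', start_time) as usage_day,
    sum(credits_used)             as sos_credits
from snowflake.account_usage.search_optimization_history
where
    database_name = 'SQL_PERF_OPTIMIZATION'
    and schema_name = 'PUBLIC'
    and table_name = 'TRAFFIC'
    and start_time >= dateadd(day, -30, current_timestamp())
group by 1
order by 1;
```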