# Types of caching in Snowflake:

## Cloud Services Layer:

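* Result Cache
    * The results of a previous query are reused if:
        * The table data has not changed
        * The micro-partitions have not changed
        * The query uses no UDFs or external functions
        * The role has sufficient privileges on the underlying tables
    * Can be disabled with the `USE_CACHED_RESULT` parameter
    * Results are kept for 24 hours; each reuse resets the 24-hour window, up to a maximum of 31 days from the first execution.
* Metadata Cache
    * Stores row counts, distinct values, and min/max values.
    * Does not use a virtual warehouse
    * Used for DESCRIBE and system-defined functions
    * The VPS (Virtual Private Snowflake) edition has a dedicated metadata store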
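
The `USE_CACHED_RESULT` parameter mentioned above can be inspected and toggled per session. A minimal sketch, assuming the parameter has not been overridden at the account or user level:

```sql
-- Show the current value of USE_CACHED_RESULT for this session
SHOW PARAMETERS LIKE 'USE_CACHED_RESULT' IN SESSION;

-- Disable result reuse, e.g. while benchmarking warehouse performance
ALTER SESSION SET USE_CACHED_RESULT = FALSE;

-- Re-enable it (TRUE is the default value)
ALTER SESSION SET USE_CACHED_RESULT = TRUE;
```

Turning the parameter off only affects result reuse; it does not clear the metadata cache or the warehouse data cache.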


## Query Processing Layer:
* Data Cache
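    * Stored on the local SSD of the virtual warehouse
    * Available only to queries that run on the same virtual warehouse
    * Dropped when the warehouse is suspended. Resizing also affects it: compute resources that are removed lose their cache, and newly added ones start empty.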

    

In [None]:
%%sql -r dataframe_2
USE SCHEMA SNOWFLAKE_SAMPLE_DATA.TPCH_SF10;
USE ROLE ACCOUNTADMIN;

## Result Cache

In [None]:
%%sql -r dataframe_3
-- First run: the result cache is not expected to be used yet.
SELECT * FROM ORDERS;

-- Run again to check the difference


### last_query_id() / GET_QUERY_OPERATOR_STATS()

### last_query_id()
When running in a notebook, look at the second-to-last query, because the last one is a query the notebook issues to format the SQL result.

* `last_query_id(-1)` is the same as `last_query_id()`: it returns the most recently executed query.
* `last_query_id(-2)` returns the second-to-last query, and so on.
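
To make the indexing concrete, the different offsets can be compared side by side. A small sketch; the column aliases are illustrative only:

```sql
SELECT
    LAST_QUERY_ID()   AS latest,            -- most recent query in the session
    LAST_QUERY_ID(-1) AS also_latest,       -- identical to LAST_QUERY_ID()
    LAST_QUERY_ID(-2) AS second_from_last,  -- the query before that
    LAST_QUERY_ID(1)  AS first_in_session;  -- positive numbers count from the start of the session
```

In a notebook, `latest` is likely to point at the formatting query described above, which is why the cells here use `-2`.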

###  GET_QUERY_OPERATOR_STATS()
Returns statistics about individual query operators within a query that has completed.
You can run this function for any completed query that was executed in the past 14 days.

https://docs.snowflake.com/en/sql-reference/functions/get_query_operator_stats
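
The function returns one row per operator, with most of the detail packed into the semi-structured `OPERATOR_STATISTICS` column. The sketch below pulls out a few values that are useful when looking at caching. It assumes the `io` keys `bytes_scanned` and `percentage_scanned_from_cache` are present for scan operators, as described in the documentation linked above; operators that do not scan data return NULL for them.

```sql
SELECT
    operator_id,
    operator_type,
    operator_statistics:io:bytes_scanned::NUMBER                AS bytes_scanned,
    operator_statistics:io:percentage_scanned_from_cache::FLOAT AS pct_scanned_from_cache,
    operator_statistics:output_rows::NUMBER                     AS output_rows
FROM TABLE(GET_QUERY_OPERATOR_STATS(LAST_QUERY_ID(-2)))  -- use -1 outside a notebook
ORDER BY operator_id;
```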


In [None]:
%%sql -r dataframe_5

select operator_type, * 
from table(get_query_operator_stats(last_query_id(-2)));


In [None]:
%%sql -r dataframe_8
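-- Suspend and resume the warehouse to clear its local data cache.
-- The result cache lives in the cloud services layer and is not affected.
ALTER WAREHOUSE COMPUTE_WH SET AUTO_RESUME = FALSE;
ALTER WAREHOUSE COMPUTE_WH SUSPEND;
ALTER WAREHOUSE COMPUTE_WH RESUME;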

In [None]:
%%sql -r dataframe_7
-- Same query text as before; the warehouse was restarted, but the result cache can still serve it.
SELECT * FROM ORDERS;



 The ***result cache*** is still reused even after the virtual warehouse is suspended, resumed, or resized.
 This is because the result cache resides in the cloud services layer rather than in the warehouse.
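
One way to compare the runs is to look at the session's recent query history. This is a sketch that assumes the current database is still set, so that `INFORMATION_SCHEMA` resolves. A run served from the result cache would be expected to show few or no bytes scanned and a much shorter elapsed time, but verify this in your own account.

```sql
SELECT query_id,
       query_text,
       warehouse_size,
       bytes_scanned,
       total_elapsed_time   -- milliseconds
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY_BY_SESSION(RESULT_LIMIT => 10))
WHERE query_text ILIKE 'SELECT * FROM ORDERS%'
ORDER BY start_time DESC;
```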

In [None]:
%%sql -r dataframe_4

select operator_type, * 
from table(get_query_operator_stats(last_query_id(-2)));


In [None]:
%%sql -r dataframe_10
SELECT * FROM ORDERS LIMIT 4;

### Any change to the query text prevents the result cache from being used.
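
The match is made on the query text itself, so queries that return the same rows can still miss the cache. The statements below are a sketch of variations that, per the exact-match rule in Snowflake's documentation, would be treated as different queries; the alias `o` is only an illustration.

```sql
-- Baseline: populates the result cache for this exact text
SELECT * FROM ORDERS LIMIT 4;

-- Different letter case: treated as a different query text
select * from orders limit 4;

-- Added table alias: also treated as a different query text
SELECT * FROM ORDERS AS o LIMIT 4;
```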

In [None]:
%%sql -r dataframe_11

select operator_type, * 
from table(get_query_operator_stats(last_query_id(-2)));


In [None]:
%%sql -r dataframe_13
ALTER SESSION SET USE_CACHED_RESULT = FALSE; -- disables result cache reuse for this session
SELECT * FROM ORDERS;

select operator_type, * from table(get_query_operator_stats(last_query_id(-1)));


## Metadata Cache

In [None]:
%%sql -r dataframe_16
-- 1. Statistics about table objects
SELECT COUNT(*) FROM ORDERS;
select operator_type, * from table(get_query_operator_stats(last_query_id(-1)));



In [None]:
%%sql -r dataframe_17
SHOW ROLES;
select operator_type, * from table(get_query_operator_stats(last_query_id(-1)));



In [None]:
%%sql -r dataframe_19
DESC TABLE ORDERS;
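
Min/max lookups are another case the metadata cache is meant to cover. A minimal sketch, assuming `O_ORDERKEY` (a numeric column of the TPC-H `ORDERS` table) has its min/max values tracked in metadata. Check the operator list to see whether a table scan was needed.

```sql
-- Min/max of a numeric column; expected to be answerable from metadata
SELECT MIN(O_ORDERKEY), MAX(O_ORDERKEY) FROM ORDERS;

-- Inspect the operators of the statement above
SELECT operator_type, * FROM TABLE(GET_QUERY_OPERATOR_STATS(LAST_QUERY_ID(-1)));
```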

## Virtual Warehouse caching

A running warehouse maintains a cache of table data that can be accessed by queries running on the same warehouse. 
This can improve the performance of subsequent queries if they are able to read from the cache instead of from tables.
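
The effect can be observed by running the same query twice on a running warehouse with result reuse turned off, so that the second run has to execute and can only benefit from the warehouse cache. This is a sketch; the aggregate query is just an example, and the `io` key name is assumed from the `GET_QUERY_OPERATOR_STATS` documentation.

```sql
-- Turn off result reuse so the second run must actually execute on the warehouse
ALTER SESSION SET USE_CACHED_RESULT = FALSE;

-- First run: reads from remote storage if the warehouse cache is cold
SELECT O_ORDERSTATUS, SUM(O_TOTALPRICE) AS total_price
FROM ORDERS
GROUP BY O_ORDERSTATUS;

-- Second run on the same warehouse: can read the table data from the local cache
SELECT O_ORDERSTATUS, SUM(O_TOTALPRICE) AS total_price
FROM ORDERS
GROUP BY O_ORDERSTATUS;

-- Share of the scan served from the warehouse cache in the second run
SELECT operator_type,
       operator_statistics:io:percentage_scanned_from_cache::FLOAT AS pct_scanned_from_cache
FROM TABLE(GET_QUERY_OPERATOR_STATS(LAST_QUERY_ID(-1)))
WHERE operator_type = 'TableScan';

-- Restore the default
ALTER SESSION SET USE_CACHED_RESULT = TRUE;
```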

In [None]:
%%sql -r dataframe_23
SELECT warehouse_name
  ,COUNT(*) AS query_count
  ,SUM(bytes_scanned) AS bytes_scanned
  ,SUM(bytes_scanned*percentage_scanned_from_cache) AS bytes_scanned_from_cache
  ,SUM(bytes_scanned*percentage_scanned_from_cache) / SUM(bytes_scanned) AS percent_scanned_from_cache
FROM snowflake.account_usage.query_history
WHERE start_time >= dateadd(month,-1,current_timestamp())
  AND bytes_scanned > 0
GROUP BY 1
ORDER BY 5;

In [None]:
%%sql -r dataframe_18
-- Top 10 queries by bytes spilled (remote first, then local) in the last 45 days
SELECT query_id, SUBSTR(query_text, 1, 50) partial_query_text, user_name, warehouse_name,
  bytes_spilled_to_local_storage, bytes_spilled_to_remote_storage
FROM  snowflake.account_usage.query_history
WHERE (bytes_spilled_to_local_storage > 0
  OR  bytes_spilled_to_remote_storage > 0 )
  AND start_time::date > dateadd('days', -45, current_date)
ORDER BY bytes_spilled_to_remote_storage DESC, bytes_spilled_to_local_storage DESC
LIMIT 10;


