# **Module 3 - Lesson 5 - Demo 1 (M03\_L05\_Demo01)**

## **Objective:** The goal of this demo is to show how a clustered columnstore index (CCI) behaves at each stage of its lifecycle

## **Setup:**

Before starting, resume your Synapse SQL Pool if it is paused.

These demos are written to run at DW1000c; runtimes will be longer at smaller SLOs.

_Note: this demo uses data loaded in a previous demo - M03\_L02\_Demo01_

Drop any existing tables that have the same names as the tables created in this demo.

In [9]:
--Drop tables if they already exist
IF OBJECT_ID('fctTrip_CCI') IS NOT NULL		DROP TABLE [fctTrip_CCI] 
IF OBJECT_ID('fctTrip_CCI_Ordered') IS NOT NULL	DROP TABLE [fctTrip_CCI_Ordered] 

## Create the CCI version of the demo table, review the CCI health stats, then explain the relevance of the reported metrics

_Note - this query runs against the original trip table, not the fctTrip table_

This query runs for ~1min 45sec at DW1000c, ~3min at DW500c

In [10]:
--Create a new table copy of the trip table with a Columnstore index and distributed on DateID
CREATE TABLE [fctTrip_CCI] 
WITH 
(DISTRIBUTION = HASH(DateID), 
 CLUSTERED COLUMNSTORE INDEX ) 
AS 
SELECT * 
FROM trip 

## Check the CCI health of this table using the CCIHealth view we previously created. Take note of: 

- Open rows - rows that are in the delta store and have not yet reached the criteria to be compressed automatically into the columnstore
- Compressed rows - number of rows in compressed rowgroups
- Avg compressed rows - average number of rows in each compressed rowgroup. The ideal is around 1 million rows (a rowgroup holds at most 1,048,576 rows)
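
For reference, metrics like these can be derived from the rowgroup DMVs. The actual definition of vCCIHealth comes from an earlier demo and is not shown here, so the query below is only a minimal sketch of the idea, assuming a dedicated SQL pool. It groups the rowgroups of fctTrip_CCI by state (OPEN, CLOSED, COMPRESSED) and reports counts and row statistics for each state. The column aliases are illustrative and do not match the view's column names.

```sql
--Sketch only: summarize rowgroups by state for one table
--The real vCCIHealth view may be defined differently
SELECT  s.name                  AS Schema_Name,
        t.name                  AS Table_Name,
        rg.state_desc           AS Row_Group_State,
        COUNT(*)                AS Row_Groups,
        SUM(rg.total_rows)      AS Total_Rows,
        SUM(rg.deleted_rows)    AS Deleted_Rows,
        MIN(rg.total_rows)      AS Min_Rows,
        MAX(rg.total_rows)      AS Max_Rows,
        AVG(rg.total_rows)      AS Avg_Rows
FROM    sys.dm_pdw_nodes_db_column_store_row_group_physical_stats AS rg
JOIN    sys.pdw_nodes_tables    AS nt ON  rg.object_id       = nt.object_id
                                      AND rg.pdw_node_id     = nt.pdw_node_id
                                      AND rg.distribution_id = nt.distribution_id
JOIN    sys.pdw_table_mappings  AS mp ON  nt.name            = mp.physical_name
JOIN    sys.tables              AS t  ON  mp.object_id       = t.object_id
JOIN    sys.schemas             AS s  ON  t.schema_id        = s.schema_id
WHERE   t.name = 'fctTrip_CCI'
GROUP BY s.name, t.name, rg.state_desc
ORDER BY rg.state_desc;
```

One row per rowgroup state makes it easy to see at a glance how much of the table is still sitting in the delta store.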

In [11]:
--View CCI health of the fctTrip_CCI table
SELECT * 
FROM vCCIHealth 
WHERE Table_Name = 'fctTrip_CCI'

Schema_Name,Table_Name,Distribution_type,Total_Rows,Column_Count,OPEN_Row_Groups,OPEN_rows,MIN OPEN Row Group Rows,MAX OPEN_Row Group Rows,AVG OPEN_Row Group Rows,COMPRESSED_Row_Groups,COMPRESSED_Rows,Deleted_COMPRESSED_Rows,MIN COMPRESSED Row Group Rows,MAX COMPRESSED Row Group Rows,AVG_COMPRESSED_Rows,CLOSED_Row_Groups,CLOSED_Rows,MIN CLOSED Row Group Rows,MAX CLOSED Row Group Rows,AVG CLOSED Row Group Rows
dbo,fctTrip_CCI,HASH,170261325,23,7,333376,2715,81387,47625,347,169927949,0,107374,524289,489705,0,0,,,


## Add new data from the fctTrip table (300k rows) and review the CCI stats again

This small insert of 300k rows lands in open (delta store) rowgroups rather than in compressed rowgroups. For a CCI table to perform well, we want all of the rows to be in compressed rowgroups.

In [12]:
--Insert 300k rows into fctTrip_CCI
INSERT [fctTrip_CCI] 
SELECT TOP 300000 * 
FROM Trip 

In [13]:
--Review CCI health
SELECT * 
FROM vCCIHealth 
WHERE Table_Name = 'fctTrip_CCI' 

Schema_Name,Table_Name,Distribution_type,Total_Rows,Column_Count,OPEN_Row_Groups,OPEN_rows,MIN OPEN Row Group Rows,MAX OPEN_Row Group Rows,AVG OPEN_Row Group Rows,COMPRESSED_Row_Groups,COMPRESSED_Rows,Deleted_COMPRESSED_Rows,MIN COMPRESSED Row Group Rows,MAX COMPRESSED Row Group Rows,AVG_COMPRESSED_Rows,CLOSED_Row_Groups,CLOSED_Rows,MIN CLOSED Row Group Rows,MAX CLOSED Row Group Rows,AVG CLOSED Row Group Rows
dbo,fctTrip_CCI,HASH,170561325,23,59,633376,818,88543,10735,347,169927949,0,107374,524289,489705,0,0,,,
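
The jump from 7 to 59 open rowgroups follows from how the table is distributed. A dedicated SQL pool spreads a table over 60 distributions, so 300,000 rows average only about 5,000 rows per distribution (hash distribution on DateID will not spread them perfectly evenly). That is far below the 102,400 rows a bulk load needs per rowgroup to be compressed directly, so the rows land in the delta store. The query below is a sketch, assuming direct access to the system DMVs rather than the vCCIHealth view, that lists the open rowgroups per distribution.

```sql
--Sketch only: show open (delta store) rowgroups per distribution
SELECT  rg.distribution_id      AS Distribution_Id,
        COUNT(*)                AS Open_Row_Groups,
        SUM(rg.total_rows)      AS Open_Rows
FROM    sys.dm_pdw_nodes_db_column_store_row_group_physical_stats AS rg
JOIN    sys.pdw_nodes_tables    AS nt ON  rg.object_id       = nt.object_id
                                      AND rg.pdw_node_id     = nt.pdw_node_id
                                      AND rg.distribution_id = nt.distribution_id
JOIN    sys.pdw_table_mappings  AS mp ON  nt.name            = mp.physical_name
JOIN    sys.tables              AS t  ON  mp.object_id       = t.object_id
WHERE   t.name = 'fctTrip_CCI'
  AND   rg.state_desc = 'OPEN'
GROUP BY rg.distribution_id
ORDER BY rg.distribution_id;
```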


## Reorganize the index and review CCI stats

Open rowgroups remain unchanged. REORGANIZE performs a subset of the work that REBUILD does; it does not move open rows into compressed rowgroups.

_Note: an index REORGANIZE is an ONLINE operation_

In [14]:
--Run the index reorganize for fctTrip_CCI
ALTER INDEX ALL ON [fctTrip_CCI] 
REORGANIZE 

In [15]:
--Check CCI health again
SELECT * 
FROM vCCIHealth 
WHERE Table_Name = 'fctTrip_CCI'

Schema_Name,Table_Name,Distribution_type,Total_Rows,Column_Count,OPEN_Row_Groups,OPEN_rows,MIN OPEN Row Group Rows,MAX OPEN_Row Group Rows,AVG OPEN_Row Group Rows,COMPRESSED_Row_Groups,COMPRESSED_Rows,Deleted_COMPRESSED_Rows,MIN COMPRESSED Row Group Rows,MAX COMPRESSED Row Group Rows,AVG_COMPRESSED_Rows,CLOSED_Row_Groups,CLOSED_Rows,MIN CLOSED Row Group Rows,MAX CLOSED Row Group Rows,AVG CLOSED Row Group Rows
dbo,fctTrip_CCI,HASH,212064739,23,59,633376,818,88543,10735,296,169927949,0,498019,1018074,574080,0,0,,,


### What happened?
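The REORGANIZE command did make some changes, but the index would not be considered fully healthy yet.

_Note: Total\_Rows is higher than in the previous check even though no rows were added. COMPRESSED\_Rows plus OPEN\_rows (169,927,949 + 633,376) still equals 170,561,325, so the extra rows are probably counted from rowgroups in another state, such as rowgroups left behind by the merge that have not yet been cleaned up._

Here are the things that changed: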
- Number of compressed rowgroups decreased
    - Smaller rowgroups were combined to form fewer, larger rowgroups
- Min compressed rows increased
    - The smallest rowgroups are now larger thanks to combining smaller rowgroups together
- Max compressed rows increased
    - The largest rowgroup(s) are now close to the ideal size of around 1 million rows
- Avg compressed rows increased
    - This is the key metric we are looking at - on average the number of rows in compressed rowgroups increased, which will lead to better compression within each rowgroup

Here is what did not change:
- Rows in OPEN rowgroups
    - The REORGANIZE command does not impact rows that are still in the delta store
- Rows in CLOSED rowgroups
    - You will rarely see rows in closed rowgroups because these rowgroups are just waiting to be compressed

_NOTE: If we had deleted rows in compressed rowgroups, the REORGANIZE command would remove them, but only from rowgroups where more than 10% of the rows have been deleted._
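
To see which rowgroups would qualify under the 10% rule in the note above, one option is to compare deleted rows with total rows for each compressed rowgroup. This is a minimal sketch that assumes direct access to the system DMVs; the Deleted_Pct alias is illustrative. In this demo it should return no rows, because the health output reports 0 deleted compressed rows.

```sql
--Sketch only: list compressed rowgroups with more than 10% deleted rows
SELECT  rg.distribution_id      AS Distribution_Id,
        rg.row_group_id         AS Row_Group_Id,
        rg.total_rows           AS Total_Rows,
        rg.deleted_rows         AS Deleted_Rows,
        --NULLIF guards against division by zero on empty rowgroups
        CAST(100.0 * rg.deleted_rows / NULLIF(rg.total_rows, 0) AS DECIMAL(5,2)) AS Deleted_Pct
FROM    sys.dm_pdw_nodes_db_column_store_row_group_physical_stats AS rg
JOIN    sys.pdw_nodes_tables    AS nt ON  rg.object_id       = nt.object_id
                                      AND rg.pdw_node_id     = nt.pdw_node_id
                                      AND rg.distribution_id = nt.distribution_id
JOIN    sys.pdw_table_mappings  AS mp ON  nt.name            = mp.physical_name
JOIN    sys.tables              AS t  ON  mp.object_id       = t.object_id
WHERE   t.name = 'fctTrip_CCI'
  AND   rg.state_desc = 'COMPRESSED'
  AND   rg.deleted_rows * 10 > rg.total_rows
ORDER BY Deleted_Pct DESC;
```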

## Rebuild the index and review CCI stats

You should now see that no OPEN rowgroups remain. A rebuild takes all open rows and compresses them into compressed rowgroups, and it also combines smaller rowgroups into larger rowgroups where possible.

_Note: An index REBUILD is an OFFLINE operation_

In [17]:
--Run a rebuild on the index
ALTER INDEX ALL ON [fctTrip_CCI] 
REBUILD 

In [18]:
--Check CCI health again
SELECT * 
FROM vCCIHealth 
WHERE Table_Name = 'fctTrip_CCI'

Schema_Name,Table_Name,Distribution_type,Total_Rows,Column_Count,OPEN_Row_Groups,OPEN_rows,MIN OPEN Row Group Rows,MAX OPEN_Row Group Rows,AVG OPEN_Row Group Rows,COMPRESSED_Row_Groups,COMPRESSED_Rows,Deleted_COMPRESSED_Rows,MIN COMPRESSED Row Group Rows,MAX COMPRESSED Row Group Rows,AVG_COMPRESSED_Rows,CLOSED_Row_Groups,CLOSED_Rows,MIN CLOSED Row Group Rows,MAX CLOSED Row Group Rows,AVG CLOSED Row Group Rows
dbo,fctTrip_CCI,HASH,170561325,23,0,0,,,,191,170561325,0,36768,1048576,892991,0,0,,,


## Create an ordered CCI version of the demo table

This query runs for ~2min 10sec at DW1000c

The runtime for this query is longer than for the previous CTAS with an unordered index. Creating an ordered table takes longer because the data has to be sorted before it can be inserted. Queries that filter on the ordered columns can benefit from the ordering, so the extra load time may be worth it for the improved query performance.

  

_Note: this query runs against the original trip table, not the fctTrip table_

In [19]:
--Create the table with ordered CCI
CREATE TABLE [fctTrip_CCI_Ordered] 
WITH 
(DISTRIBUTION = HASH(DateID), 
 CLUSTERED COLUMNSTORE INDEX ORDER (DateID, MedallionID)) 
AS 
SELECT * 
FROM trip 
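
Once the table exists, the ordering columns can be confirmed from the catalog. The query below is a small sketch that relies on the column_store_order_ordinal column of sys.index_columns; the aliases are illustrative. For the CTAS above it should list DateID and MedallionID in the order given in the ORDER clause.

```sql
--Sketch only: list the ORDER columns of the ordered CCI
SELECT  OBJECT_NAME(c.object_id)        AS Table_Name,
        c.name                          AS Column_Name,
        ic.column_store_order_ordinal   AS Order_Ordinal
FROM    sys.index_columns AS ic
JOIN    sys.columns       AS c  ON  ic.object_id = c.object_id
                                AND ic.column_id = c.column_id
WHERE   ic.column_store_order_ordinal <> 0
  AND   OBJECT_NAME(c.object_id) = 'fctTrip_CCI_Ordered'
ORDER BY ic.column_store_order_ordinal;
```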

In [20]:
--Check CCI health of the table
SELECT * 
FROM vCCIHealth 
WHERE Table_Name = 'fctTrip_CCI_Ordered'

Schema_Name,Table_Name,Distribution_type,Total_Rows,Column_Count,OPEN_Row_Groups,OPEN_rows,MIN OPEN Row Group Rows,MAX OPEN_Row Group Rows,AVG OPEN_Row Group Rows,COMPRESSED_Row_Groups,COMPRESSED_Rows,Deleted_COMPRESSED_Rows,MIN COMPRESSED Row Group Rows,MAX COMPRESSED Row Group Rows,AVG_COMPRESSED_Rows,CLOSED_Row_Groups,CLOSED_Rows,MIN CLOSED Row Group Rows,MAX CLOSED Row Group Rows,AVG CLOSED Row Group Rows
dbo,fctTrip_CCI_Ordered,HASH,170261325,23,21,1163680,1125,101717,55413,656,169097645,0,107369,262145,257770,0,0,,,
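
The compressed rowgroups in this output top out at 262,145 rows, well short of the 1,048,576-row maximum. One possible way to investigate why rowgroups were closed early is to group them by trim reason. This is a sketch that assumes direct access to the system DMVs; the aliases are illustrative, and the reasons returned depend on the environment.

```sql
--Sketch only: count compressed rowgroups by the reason they were trimmed
SELECT  rg.trim_reason_desc     AS Trim_Reason,
        COUNT(*)                AS Row_Groups,
        AVG(rg.total_rows)      AS Avg_Rows
FROM    sys.dm_pdw_nodes_db_column_store_row_group_physical_stats AS rg
JOIN    sys.pdw_nodes_tables    AS nt ON  rg.object_id       = nt.object_id
                                      AND rg.pdw_node_id     = nt.pdw_node_id
                                      AND rg.distribution_id = nt.distribution_id
JOIN    sys.pdw_table_mappings  AS mp ON  nt.name            = mp.physical_name
JOIN    sys.tables              AS t  ON  mp.object_id       = t.object_id
WHERE   t.name = 'fctTrip_CCI_Ordered'
  AND   rg.state_desc = 'COMPRESSED'
GROUP BY rg.trim_reason_desc
ORDER BY Row_Groups DESC;
```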


## Rebuild the CCI index and notice the changes vs before the rebuild

In [21]:
--Run an index rebuild on the ordered table
ALTER INDEX ALL ON [fctTrip_CCI_Ordered] 
REBUILD 

In [22]:
--Check CCI health of the table again
SELECT * 
FROM vCCIHealth 
WHERE Table_Name = 'fctTrip_CCI_Ordered'

Schema_Name,Table_Name,Distribution_type,Total_Rows,Column_Count,OPEN_Row_Groups,OPEN_rows,MIN OPEN Row Group Rows,MAX OPEN_Row Group Rows,AVG OPEN_Row Group Rows,COMPRESSED_Row_Groups,COMPRESSED_Rows,Deleted_COMPRESSED_Rows,MIN COMPRESSED Row Group Rows,MAX COMPRESSED Row Group Rows,AVG_COMPRESSED_Rows,CLOSED_Row_Groups,CLOSED_Rows,MIN CLOSED Row Group Rows,MAX CLOSED Row Group Rows,AVG CLOSED Row Group Rows
dbo,fctTrip_CCI_Ordered,HASH,170261325,23,0,0,,,,191,170261325,0,32841,1048576,891420,0,0,,,


### What happened?