# Lab 1 - Dedupe & Eliminate (Sample Answers)

## Step 0. Run sp_BlitzIndex

In [None]:
EXEC dbo.sp_BlitzIndex @DatabaseName='StackOverflow', @SchemaName='dbo', @TableName='Posts';

<img src="C:\Users\hartleyg\Desktop\Training\sqltraining\Brent Ozar\Mastering Index Tuning\1.2 Lab - DEATH Method\Lab1.1.png" ></img>


## Step 1. Eliminate Unused

1. Hide the name column of the indexes. Only the names of the keys are important.
2. Review the usage stats. Add change / undo scripts for the indexes that have 0 reads.
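
The read counts reported by sp_BlitzIndex can be cross-checked against the `sys.dm_db_index_usage_stats` DMV. The query below is a minimal sketch, assuming it is run inside the StackOverflow database. These counters reset when SQL Server restarts, so "0 reads" only means 0 reads since the last reset.

```sql
-- Sketch: list nonclustered indexes on dbo.Posts with their read and write counts.
-- An index with no row in the DMV has not been touched since the counters were last reset.
SELECT  i.name AS index_name,
        COALESCE(us.user_seeks, 0)
          + COALESCE(us.user_scans, 0)
          + COALESCE(us.user_lookups, 0) AS total_reads,
        COALESCE(us.user_updates, 0)     AS total_writes
FROM    sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
        ON  us.object_id   = i.object_id
        AND us.index_id    = i.index_id
        AND us.database_id = DB_ID()
WHERE   i.object_id = OBJECT_ID(N'dbo.Posts')
  AND   i.index_id > 1          -- skip the heap / clustered index
ORDER BY total_reads ASC;
```

Indexes at the top of the result with `total_reads = 0` are the candidates for this step.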

**Side Notes**

    Q: In the case of duplicate indexes, we may find that one index is getting reads and the other one isn't. How does SQL Server know which one to choose?
    A: SQL Server can potentially use both, and could choose either one for different execution plans. The duplicates potentially compete with each other for space in the RAM cache, so the best idea is to drop one of them.
    
    Q: How do we choose which one to delete?
    A: Keep the one with the better (most descriptive) index name



In [None]:
/* Dropping this because it's a duplicate of IX_AcceptedAnswerId: */
DROP INDEX dbo.Posts._dta_index_Posts_5_85575343__K2;

/* Undo script: 
CREATE INDEX [_dta_index_Posts_5_85575343__K2] ON [dbo].[Posts] ( [AcceptedAnswerId] ) WITH (FILLFACTOR=100, ONLINE=?, SORT_IN_TEMPDB=?, DATA_COMPRESSION=?);
*/
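
Some of the `?` placeholders in the undo script can be looked up before the index is dropped. As a rough sketch, the query below reads the stored fill factor and compression setting from the catalog views. `ONLINE` and `SORT_IN_TEMPDB` are build-time options rather than stored index properties, so they stay a judgment call.

```sql
-- Sketch: capture the stored settings of the index before dropping it.
SELECT  i.name AS index_name,
        i.fill_factor,               -- 0 means the server default
        p.partition_number,
        p.data_compression_desc      -- e.g. NONE, ROW, PAGE
FROM    sys.indexes AS i
JOIN    sys.partitions AS p
        ON  p.object_id = i.object_id
        AND p.index_id  = i.index_id
WHERE   i.object_id = OBJECT_ID(N'dbo.Posts')
  AND   i.name = N'_dta_index_Posts_5_85575343__K2';
```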


## Step 1 - Drop Unused cont.

Note that you can use a comma-delimited list to drop multiple indexes in a single statement.

In [None]:
/* Drop these because they're unused: */
DROP INDEX dbo.Posts.IX_LastActivityDate_Includes,
dbo.Posts.IX_LastEditorUserId,
dbo.Posts.IX_ParentId,
dbo.Posts.IX_PostTypeId,
dbo.Posts.IX_ViewCount_Includes;


/* Undo script: 
CREATE INDEX [IX_LastActivityDate_Includes] ON [dbo].[Posts] ( [LastActivityDate] ) INCLUDE ( [ViewCount]) WITH (FILLFACTOR=100, ONLINE=?, SORT_IN_TEMPDB=?, DATA_COMPRESSION=?);
CREATE INDEX [IX_LastEditorUserId] ON [dbo].[Posts] ( [LastEditorUserId] ) WITH (FILLFACTOR=100, ONLINE=?, SORT_IN_TEMPDB=?, DATA_COMPRESSION=?);
CREATE INDEX [IX_ParentId] ON [dbo].[Posts] ( [ParentId] ) WITH (FILLFACTOR=100, ONLINE=?, SORT_IN_TEMPDB=?, DATA_COMPRESSION=?);
CREATE INDEX [IX_PostTypeId] ON [dbo].[Posts] ( [PostTypeId] ) WITH (FILLFACTOR=100, ONLINE=?, SORT_IN_TEMPDB=?, DATA_COMPRESSION=?);
CREATE INDEX [IX_ViewCount_Includes] ON [dbo].[Posts] ( [ViewCount] ) INCLUDE ( [LastActivityDate]) WITH (FILLFACTOR=100, ONLINE=?, SORT_IN_TEMPDB=?, DATA_COMPRESSION=?);
*/


## Step 2 - Deduplicate

Now we can deduplicate indexes which are subsets of existing indexes.

In this case, we have 2 indexes against the dbo.Posts table:

    Index 1 Key Columns: AcceptedAnswerId
    Index 2 Key Columns: AcceptedAnswerId, OwnerUserId

Index 1 is a narrower subset of Index 2, so we can simply drop Index 1:

In [None]:
/* Dropping this because it's a narrower subset of dbo.Posts._dta_index_Posts_5_85575343__K2_K14 (4) */
DROP INDEX dbo.Posts.IX_AcceptedAnswerId;
GO

/* Undo script: 
CREATE INDEX [IX_AcceptedAnswerId] ON [dbo].[Posts] ( [AcceptedAnswerId] ) WITH (FILLFACTOR=100, ONLINE=?, SORT_IN_TEMPDB=?, DATA_COMPRESSION=?);
GO
*/

/* Dropping this because it's a narrower subset of these: dbo.Posts._dta_index_Posts_5_85575343__K14_K16_K7_K1_K2_17 (9)
dbo.Posts._dta_index_Posts_5_85575343__K14_K16_K1_K2 (10) */
DROP INDEX dbo.Posts.IX_OwnerUserId;
GO

/* Undo script:
CREATE INDEX [IX_OwnerUserId] ON [dbo].[Posts] ( [OwnerUserId] ) WITH (FILLFACTOR=100, ONLINE=?, SORT_IN_TEMPDB=?, DATA_COMPRESSION=?);
*/
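
The subset relationships used in Step 2 can also be read straight from the catalog views. This is a minimal sketch, assuming SQL Server 2017 or later (for `STRING_AGG`) and that it is run inside the StackOverflow database.

```sql
-- Sketch: show each index's key columns in key order, so leading-prefix subsets are easy to spot.
SELECT  i.name AS index_name,
        STRING_AGG(CAST(c.name AS nvarchar(max)), N', ')
            WITHIN GROUP (ORDER BY ic.key_ordinal) AS key_columns
FROM    sys.indexes AS i
JOIN    sys.index_columns AS ic
        ON  ic.object_id = i.object_id
        AND ic.index_id  = i.index_id
JOIN    sys.columns AS c
        ON  c.object_id = ic.object_id
        AND c.column_id = ic.column_id
WHERE   i.object_id = OBJECT_ID(N'dbo.Posts')
  AND   ic.is_included_column = 0   -- key columns only, no INCLUDEs
GROUP BY i.name
ORDER BY key_columns;
```

Because the output is sorted by the key list, an index whose keys are a leading prefix of another index's keys tends to appear right next to it.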



## Step 3 - Harder Ones

In our case, we've picked off the low-hanging fruit, and are now down to our recommended 5x5 (roughly five or fewer indexes per table, with roughly five or fewer columns on each). Potentially we can stop here, unless there's a specific index at fault.

In the following example, sp_BlitzIndex has identified a potential duplicate:

<img src="C:\Users\hartleyg\Desktop\Training\sqltraining\Brent Ozar\Mastering Index Tuning\1.2 Lab - DEATH Method\Lab1.2.png"></img>

In this case, Index 2 contains a narrower subset of columns from Index 1, but both indexes have reads against them. Why is that?

After the shared leading columns, Index 1's key continues with CommunityOwnedDate, Id, and AcceptedAnswerId *in that order*. Because these fields are part of the index key rather than included columns, the two indexes are sorted very differently. Each index may perform better for different execution plans.
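
As an illustration of how key order can favor one index over the other, consider the two queries below. Both are hypothetical (they are not from the lab workload), the user id and Id cutoff are made up, and query B assumes that Index 2's key continues with Id directly after the leading columns.

```sql
-- Hypothetical query A: after the equality predicates, rows are wanted in
-- CommunityOwnedDate order, which matches the key order of Index 1.
SELECT TOP (100) Id, CommunityOwnedDate
FROM   dbo.Posts
WHERE  OwnerUserId = 12345      -- hypothetical user
  AND  PostTypeId  = 1
ORDER BY CommunityOwnedDate;

-- Hypothetical query B: a range on Id right after the leading columns.
-- Assuming Index 2's key continues with Id, the range can be a seek predicate there.
-- On Index 1, CommunityOwnedDate sits between PostTypeId and Id, so the Id range
-- would be checked against every row for that user and post type.
SELECT Id, AcceptedAnswerId
FROM   dbo.Posts
WHERE  OwnerUserId = 12345
  AND  PostTypeId  = 1
  AND  Id > 1000000;            -- hypothetical cutoff
```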

However, if these indexes are acting up, we can investigate whether a single combined index would provide enough coverage. This depends on the uniqueness of the leading columns, OwnerUserId and PostTypeId. Let's run a query to find the uniqueness of these fields:

In [None]:
SELECT TOP 100 OwnerUserId, PostTypeId, COUNT(*) AS recs
FROM dbo.Posts
GROUP BY OwnerUserId, PostTypeId
ORDER BY COUNT(*) DESC;

At a rough guess, if the maximum count is around 10,000, the index with just the two leading fields may be sufficient. However, our results tell a different story:

<img src="C:\Users\hartleyg\Desktop\Training\sqltraining\Brent Ozar\Mastering Index Tuning\1.2 Lab - DEATH Method\Lab1.3.png"></img>

The combination of OwnerUserId = 0 and PostTypeId = 1 returns a quarter of a million records! It would matter *a lot* if we were to dedupe these and someone then ran a query with these parameters.

If we really wanted to dig down, we'd need to run through the steps from Fundamentals of Index Tuning: trial certain index combinations and measure logical reads, CPU, I/O, etc.
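
A minimal sketch of that kind of measurement is below. It assumes the two indexes being compared are the `_dta_` indexes named in the Step 2 comments. The test query and its Id cutoff are hypothetical; only the OwnerUserId and PostTypeId values come from the results above.

```sql
-- Sketch: force the same query through each index and compare the Messages output.
SET STATISTICS IO, TIME ON;

SELECT COUNT(*)
FROM   dbo.Posts WITH (INDEX ([_dta_index_Posts_5_85575343__K14_K16_K7_K1_K2_17]))
WHERE  OwnerUserId = 0 AND PostTypeId = 1 AND Id > 1000000;   -- Id cutoff is hypothetical

SELECT COUNT(*)
FROM   dbo.Posts WITH (INDEX ([_dta_index_Posts_5_85575343__K14_K16_K1_K2]))
WHERE  OwnerUserId = 0 AND PostTypeId = 1 AND Id > 1000000;

SET STATISTICS IO, TIME OFF;
```

The Messages tab then shows logical reads and CPU time per statement, which gives a like-for-like comparison of the two indexes for that one query shape.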

Let's continue our work with the dbo.Badges table:

In [None]:
EXEC dbo.sp_BlitzIndex @DatabaseName='StackOverflow', @SchemaName='dbo', @TableName='Badges';

<img src="C:\Users\hartleyg\Desktop\Training\sqltraining\Brent Ozar\Mastering Index Tuning\1.2 Lab - DEATH Method\Lab1.4.png"></img>

Already, we have a couple that we can get rid of:
- The non-clustered index that contains only Id duplicates the clustered index, and can be dropped.
    - However, a COUNT(*) results in the optimizer using this index, as it's the smallest object which contains all rows. In this case, since its reads are 0, it's not required for our purposes.
- Badges has a wider index which uses UserId as its leading key column, so the non-clustered index which contains only UserId can be dropped.
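
To see why the optimizer would favor the narrow Id index for a COUNT(*), the page counts of each index can be compared. This is a minimal sketch using `sys.dm_db_partition_stats`, assuming it is run inside the StackOverflow database.

```sql
-- Sketch: compare the size of each index on dbo.Badges.
-- The index with the fewest pages is the cheapest one to scan for COUNT(*).
SELECT  i.name AS index_name,
        i.type_desc,
        SUM(ps.used_page_count) AS used_pages,
        SUM(ps.row_count)       AS total_rows
FROM    sys.indexes AS i
JOIN    sys.dm_db_partition_stats AS ps
        ON  ps.object_id = i.object_id
        AND ps.index_id  = i.index_id
WHERE   i.object_id = OBJECT_ID(N'dbo.Badges')
GROUP BY i.name, i.type_desc
ORDER BY used_pages ASC;
```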


In [None]:

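/* Drop these: IX_Id duplicates the clustered index and has 0 reads;
IX_UserId is a narrower subset of a wider index that leads on UserId. */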
DROP INDEX dbo.Badges.IX_Id,
dbo.Badges.IX_UserId;
GO
/* Undo script:
CREATE INDEX [IX_Id] ON [dbo].[Badges] ( [Id] ) WITH (FILLFACTOR=100, ONLINE=?, SORT_IN_TEMPDB=?, DATA_COMPRESSION=?);
CREATE INDEX [IX_UserId] ON [dbo].[Badges] ( [UserId] ) WITH (FILLFACTOR=100, ONLINE=?, SORT_IN_TEMPDB=?, DATA_COMPRESSION=?);
*/



EXEC dbo.sp_BlitzIndex @DatabaseName='StackOverflow', @SchemaName='dbo', @TableName='Comments';