# 1.5 The D.E.A.T.H. Method: Tuning Indexes for Specific Queries

We covered the D.E. parts of the D.E.A.T.H. Method, and if we were going in order, we’d tackle the A part next: using Clippy’s index recommendations from the missing index DMVs. However, Clippy can be a little misleading, so just for the purpose of training, we’re going to tackle the T first: tuning indexes for these specific queries.


## Reminder - D.E.A.T.H. Method
<img src="C:\Users\hartleyg\Desktop\Training\SQL Server\Brent Ozar\Mastering Index Tuning\1.5 DEATH - Tuning to Specific Queries\1.1.png" width=700></img>

**Dedupe & Eliminate** - Matter of hours of focused work

**Adding indexes** - Weekly, requires more thought
- Requires close examination of existing indexes
- Thinking about key order, selectivity
- Interpreting the ideas from SQL Server's recommendations (don't take the recommendations as gospel, but interpret the clues)


**Tuning indexes for specific queries** - Even more involved effort, typically 1-4 hours **per query**
- Finding the right queries to tune
- Ongoing monitoring (make sure it gets used)
- A/B testing for effectiveness
- Tuning the query itself
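
For the "make sure it gets used" part of ongoing monitoring, one option is the index usage DMV. The query below is a minimal sketch, assuming the Stack Overflow `dbo.Users` table used throughout this module. The counters reset when SQL Server restarts, so low numbers shortly after a restart don't prove an index is unused.

```sql
/* Sketch: reads vs. writes per index on dbo.Users since the last restart */
SELECT  i.name                        AS IndexName,
        ISNULL(us.user_seeks, 0)      AS UserSeeks,
        ISNULL(us.user_scans, 0)      AS UserScans,
        ISNULL(us.user_lookups, 0)    AS UserLookups,
        ISNULL(us.user_updates, 0)    AS UserUpdates   -- the write cost of keeping the index
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
       ON us.object_id   = i.object_id
      AND us.index_id    = i.index_id
      AND us.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID(N'dbo.Users')
ORDER BY i.index_id;
```

An index with plenty of `UserUpdates` but no seeks, scans, or lookups is paying its write cost without helping any reads.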


The following is an example using a query with only equality operators:

In [None]:
/* Both equality searches - key order doesn't matter */
SELECT Id
  FROM dbo.Users
  WHERE DisplayName = 'Brent Ozar'
  AND WebsiteUrl = 'https://www.brentozar.com';
GO

Note that because both filters are equality searches, the order doesn't matter for this query. 
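
As a rough illustration of that point, either of the two indexes below would let SQL Server seek straight to the matching rows. The index names are made up for this sketch and are not part of the original training material.

```sql
/* Both predicates are equality searches, so either key order supports a seek.
   Index names are hypothetical. Create one of these, not both. */
CREATE INDEX IX_DisplayName_WebsiteUrl
    ON dbo.Users(DisplayName, WebsiteUrl);

-- ...or the same two keys the other way around:
-- CREATE INDEX IX_WebsiteUrl_DisplayName
--     ON dbo.Users(WebsiteUrl, DisplayName);
```

Which one to keep would then depend on what the other queries in the workload filter on.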

However, let's run an example using inequality operators...

In [None]:
/* Turn on actual plans (control-M) and: */
SET STATISTICS IO, TIME ON;
GO

CREATE OR ALTER PROC [dbo].[usp_Q6925] @UserId INT AS
BEGIN
/* Source: http://data.stackexchange.com/stackoverflow/query/6925/newer-users-with-more-reputation-than-me */
 
    SELECT u.Id as [User Link], u.Reputation, u.Reputation - me.Reputation as Difference
    FROM dbo.Users me 
    INNER JOIN dbo.Users u 
        ON u.CreationDate > me.CreationDate
        AND u.Reputation > me.Reputation
    WHERE me.Id = @UserId
 
END
GO

EXEC usp_Q6925 @UserId = 26837
GO

<img src="C:\Users\hartleyg\Desktop\Training\SQL Server\Brent Ozar\Mastering Index Tuning\1.5 DEATH - Tuning to Specific Queries\1.3.png" width=900></img>

SQL Server starts with a Clustered Index Seek for the 'me' (PK_Users_Id) part of the join, directly finding the ID of the row that was specified.

<img src="C:\Users\hartleyg\Desktop\Training\SQL Server\Brent Ozar\Mastering Index Tuning\1.5 DEATH - Tuning to Specific Queries\1.4.png" width=900></img>

Now it scans the Users table, looking for everyone who has a higher Creation Date and higher Reputation than the specified user. 

The recommendation suggests that we add an index on CreationDate and Reputation to the Users table, but why is that? If we right-click the recommendation and scan the XML, we'll see the following:

<img src="C:\Users\hartleyg\Desktop\Training\SQL Server\Brent Ozar\Mastering Index Tuning\1.5 DEATH - Tuning to Specific Queries\1.5.png" width = 500></img>

The recommendation simply orders the key columns as they appear in the table... which may or may not be right. For an equality search, this doesn't matter so much, but for an INEQUALITY search? It matters a lot...
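
The same clues can be read straight from the missing index DMVs instead of the plan XML. This is a minimal sketch that lists the equality, inequality, and included columns for each recommendation in the current database. The column aliases are my own.

```sql
/* Sketch: what is Clippy asking for, and which columns are equality vs. inequality? */
SELECT  d.[statement]          AS TableName,
        d.equality_columns,
        d.inequality_columns,
        d.included_columns,
        gs.user_seeks,
        gs.avg_user_impact
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g
       ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS gs
       ON gs.group_handle = g.index_group_handle
WHERE d.database_id = DB_ID()
ORDER BY gs.avg_user_impact DESC;
```

For the query above, both CreationDate and Reputation should show up under `inequality_columns`, which is the hint that their order in the key deserves a second look.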

Now if we go with Clippy's recommendation:

In [None]:
-- Clippy's recommendation
CREATE INDEX IX_CreationDate_Reputation ON dbo.Users(CreationDate, Reputation);


We have the following outcomes:

1. Logical Reads with no indexes:<br>
<img src="C:\Users\hartleyg\Desktop\Training\SQL Server\Brent Ozar\Mastering Index Tuning\1.5 DEATH - Tuning to Specific Queries\1.6.png" width = 500></img>
2. Query plan with IX_CreationDate_Reputation<br>
<img src="C:\Users\hartleyg\Desktop\Training\SQL Server\Brent Ozar\Mastering Index Tuning\1.5 DEATH - Tuning to Specific Queries\1.7.png" width = 700></img>
 <---- It's doing a scan on our index, which is great, but check out that chunky arrow! 
3. Number of rows read against our index<br>
<img src="C:\Users\hartleyg\Desktop\Training\SQL Server\Brent Ozar\Mastering Index Tuning\1.5 DEATH - Tuning to Specific Queries\1.8.png" width = 300></img>
4. Yikes, 8.9 million rows read from a 9 million row table! That's quite a lot... No surprise the logical reads look like this:<br>
<img src="C:\Users\hartleyg\Desktop\Training\SQL Server\Brent Ozar\Mastering Index Tuning\1.5 DEATH - Tuning to Specific Queries\1.9.png" width = 500></img>
5. This is still a lot of logical reads, but fewer than before. However, we can do better! Let's flip this ish!

In [None]:
/* Joan Jett don't give a damn about her Reputation... but we do ;) */
CREATE INDEX IX_Reputation_CreationDate ON dbo.Users(Reputation, CreationDate);

1. Let's run the query again, and inspect the query plan:<br>
<img src="C:\Users\hartleyg\Desktop\Training\SQL Server\Brent Ozar\Mastering Index Tuning\1.5 DEATH - Tuning to Specific Queries\1.10.png" width = 800></img>
2. It decided to use our index (note: sometimes it doesn't...). Well, check out the logical reads on this bad boy! <br>
<img src="C:\Users\hartleyg\Desktop\Training\SQL Server\Brent Ozar\Mastering Index Tuning\1.5 DEATH - Tuning to Specific Queries\1.11.png" width = 500></img>
3. So why did it choose us over Clippy? Selectivity. In this instance, Reputation is the more selective of the two fields:

In [None]:
SELECT * FROM dbo.Users WHERE Id = 26837;                           -- look up this user's CreationDate and Reputation
SELECT COUNT(*) FROM dbo.Users WHERE CreationDate > '2008-10-10';   -- 8903829 rows
SELECT COUNT(*) FROM dbo.Users WHERE Reputation > 11825;            -- 11213 rows

## Which should go first? 

In this instance, narrowing down the search space using Reputation is more effective. Keep in mind that different parameters can result in different indexes making sense (eg. ORDER BY, TOP operations can make indexes more or less effective).
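
To repeat this selectivity check for any user without hard-coding the values, something like the following could be used. It is a sketch only: the variable names are my own, and it assumes `CreationDate` is a DATETIME column and `Reputation` an INT, as in the Stack Overflow sample database.

```sql
/* Sketch: how many rows does each inequality predicate leave for a given user? */
DECLARE @UserId INT = 26837;
DECLARE @CreationDate DATETIME, @Reputation INT;

SELECT  @CreationDate = CreationDate,
        @Reputation   = Reputation
FROM dbo.Users
WHERE Id = @UserId;

SELECT  COUNT(*)                                                        AS TotalRows,
        SUM(CASE WHEN CreationDate > @CreationDate THEN 1 ELSE 0 END)   AS NewerUsers,
        SUM(CASE WHEN Reputation   > @Reputation   THEN 1 ELSE 0 END)   AS HigherReputation
FROM dbo.Users;
```

The predicate that leaves fewer rows is the more selective one for that user, and the better candidate for the leading key.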

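Remember that only one query plan is generated for the stored proc (compiled for the parameter values of the first call) and then reused for every later call. Knowing which parameters are typically used, and which of them need to be tuned for, is also important.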
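
One way to see plan reuse in action is to force a fresh compile and then call the proc with different values. This is a sketch: `@UserId = 1` is an arbitrary example value that does not come from the original notes.

```sql
/* Sketch: compile the plan for one parameter, then reuse it for another */
EXEC sys.sp_recompile N'dbo.usp_Q6925';   -- mark the proc so its cached plan is thrown away
GO
EXEC dbo.usp_Q6925 @UserId = 1;           -- a new plan is compiled for this value...
GO
EXEC dbo.usp_Q6925 @UserId = 26837;       -- ...and reused as-is for this one
GO
```

Comparing the actual plans and logical reads of the two calls shows whether the index choice holds up across parameters or only for the one it was compiled for.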

## Exercise
Find ONE index that can best accommodate BOTH stored procedures: 

In [None]:


CREATE OR ALTER   PROC [dbo].[usp_PostsByCommentCount] @PostTypeId INT
AS
SELECT TOP 10 CommentCount, Score, ViewCount
FROM dbo.Posts
WHERE PostTypeId = @PostTypeId
ORDER BY CommentCount DESC;
GO

CREATE OR ALTER   PROC [dbo].[usp_PostsByScore] @PostTypeId INT, @CommentCountMinimum INT
AS
SELECT TOP 10 Id, CommentCount, Score
FROM dbo.Posts
WHERE CommentCount >= @CommentCountMinimum
AND PostTypeId = @PostTypeId
ORDER BY Score DESC;
GO

/* Create one index to improve both of these: */
EXEC usp_PostsByCommentCount @PostTypeId = 2;
GO
EXEC usp_PostsByScore @PostTypeId = 2, @CommentCountMinimum = 2;
GO

## Considerations
- The index we come up with may not be the best index for each individual query. Our goal is to find 1 index that improves the performance of both, as best it can
- Selectivity matters

## Strategy
1. Select the simplest stored proc of the two - in this instance, usp_PostsByCommentCount
2. Generate a script of the ideal index for this query
3. Review the next stored proc, and make adjustments to suit
4. Be careful if you need to change the key order. Placing a more selective field at the front of the index may have a downstream effect on the previous stored proc
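
As a starting point for step 2, a candidate index for usp_PostsByCommentCount on its own might look like the sketch below. The index name and the key order are assumptions to be tested, not the answer to the exercise.

```sql
/* Sketch for step 2: one candidate index for usp_PostsByCommentCount only.
   Keys cover the WHERE and ORDER BY columns, includes cover the rest of the SELECT list.
   The name is hypothetical and the key order still needs testing. */
CREATE INDEX IX_Candidate_PostsByCommentCount
    ON dbo.Posts(PostTypeId, CommentCount)
    INCLUDE (Score, ViewCount);
```

Step 3 then asks what usp_PostsByScore needs on top of this: Score appears in its ORDER BY, and CommentCount becomes an inequality filter rather than a sort column.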

In [None]:
/*  Possible Keys: CommentCount or PostTypeId
    Possible includes: Score, ViewCount
*/
CREATE OR ALTER   PROC [dbo].[usp_PostsByCommentCount] @PostTypeId INT
AS
SELECT TOP 10 CommentCount, Score, ViewCount
FROM dbo.Posts
WHERE PostTypeId = @PostTypeId
ORDER BY CommentCount DESC;
GO

/*  Possible Keys: CommentCount or PostTypeId or Score
    Possible includes: none
*/
CREATE OR ALTER   PROC [dbo].[usp_PostsByScore] @PostTypeId INT, @CommentCountMinimum INT
AS
SELECT TOP 10 Id, CommentCount, Score
FROM dbo.Posts
WHERE CommentCount >= @CommentCountMinimum
AND PostTypeId = @PostTypeId
ORDER BY Score DESC;
GO

/* Create one index to improve both of these: */
EXEC usp_PostsByCommentCount @PostTypeId = 2;
GO
EXEC usp_PostsByScore @PostTypeId = 2, @CommentCountMinimum = 2;
GO

## Results

Our final index leads with CommentCount, then PostTypeId and Score. ViewCount is an INCLUDE in the index:

In [None]:
-- Final Index
CREATE INDEX CommentCount_PostTypeId_Score_Includes
ON dbo.Posts(CommentCount, PostTypeId, Score) INCLUDE (ViewCount)

### [dbo].[usp_PostsByCommentCount]

For our first stored proc, it will:
1. Scan the index with a reverse order on the CommentCount field.
2. Keep scanning until it finds the 10 records that match the @PostTypeId specified in the WHERE clause.

So how did it perform?

<img src="C:\Users\hartleyg\Desktop\Training\sqltraining\Brent Ozar\Mastering Index Tuning\1.5 DEATH Method - Tuning Indexes to Specific Queries\1.12.png" width = 500></img>

<img src="C:\Users\hartleyg\Desktop\Training\sqltraining\Brent Ozar\Mastering Index Tuning\1.5 DEATH Method - Tuning Indexes to Specific Queries\1.13.png" width = 800></img>

17 rows read to return the 10 we asked for, 4 logical page reads - this isn't too bad! Given the parameters we provided, it didn't have to scan too far to find the results we wanted, despite the fact it ran a so-called 'evil' scan...

However, if we decided to change the parameters we used (eg. choosing a rarely-used PostTypeId), the order of the result set, or number of results we wanted, we start impacting the effectiveness of our index.
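
A quick way to test that claim is to look at how the PostTypeId values are distributed, and then re-run the proc with one of the rare ones. This is a sketch: `@PostTypeId = 8` is a placeholder, so substitute whichever value comes out rarest in your copy of the database.

```sql
/* Sketch: find the rarely-used PostTypeIds... */
SELECT  PostTypeId,
        COUNT(*) AS PostCount
FROM dbo.Posts
GROUP BY PostTypeId
ORDER BY PostCount;
GO

/* ...then re-run the proc with one of them and compare logical reads.
   8 is a placeholder value, not taken from the original notes. */
SET STATISTICS IO ON;
EXEC dbo.usp_PostsByCommentCount @PostTypeId = 8;
GO
```

With an index that leads on CommentCount, a rare PostTypeId means the scan has to travel much further before it finds 10 matching rows.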


### [dbo].[usp_PostsByScore]

This one is a little different. This time:
1. It will do an Index Seek to the first CommentCount that matches our parameter (ie. it will seek to the first Post that has 2 comments)
2. ... and then it reads ALL of the rest. Yes, all of them. Why you ask?
3. Since we have an ORDER BY on Score, SQL doesn't know where the highest Scores will be, so it has to read every matching row and sort them

How did our index do against this stored proc?

<img src="C:\Users\hartleyg\Desktop\Training\sqltraining\Brent Ozar\Mastering Index Tuning\1.5 DEATH Method - Tuning Indexes to Specific Queries\1.141.png" width = 800></img> <--- Check out that thicc-boi arrow on the index seek!

Number of Rows Read: 12 million, ouch!<br>
Actual Number of Rows going into the Sort operation: 7 million, double ouch!

<img src="C:\Users\hartleyg\Desktop\Training\sqltraining\Brent Ozar\Mastering Index Tuning\1.5 DEATH Method - Tuning Indexes to Specific Queries\1.5.15.png" width = 500></img>

Logical Reads: 40724... this is not great!

<img src="C:\Users\hartleyg\Desktop\Training\sqltraining\Brent Ozar\Mastering Index Tuning\1.5 DEATH Method - Tuning Indexes to Specific Queries\1.5.16.png" width = 400></img>

CPU Time = 5 seconds... this ain't great either!

Back to the drawing board! Let's hit the code again!

In [None]:
/*  [dbo].[usp_PostsByCommentCount] @PostTypeId INT

    Possible Keys: CommentCount or PostTypeId
    Possible includes: Score, ViewCount
*/

/*  [dbo].[usp_PostsByScore] @PostTypeId INT, @CommentCountMinimum INT

    Possible Keys: CommentCount or PostTypeId or Score
    Possible includes: none
*/

-- Index Attempt #1
CREATE INDEX CommentCount_PostTypeId_Score_Includes
ON dbo.Posts(CommentCount, PostTypeId, Score) INCLUDE (ViewCount)

-- Index Attempt #2
CREATE INDEX PostTypeId_CommentCount_Score_Includes
ON dbo.Posts(PostTypeId, CommentCount, Score) INCLUDE (ViewCount)

Note that both stored procs required CommentCount or PostTypeId as key columns. While the top query performed awesomely with CommentCount then PostTypeId, the second one didn't.

Let's create another index with these two fields swapped around - **PostTypeId_CommentCount_Score_Includes**

>**Really Interesting Note**: In order to create this new index, SQL will find and use the smallest copy of our Posts table that has the columns needed to build our new index. In our case, it will actually use the index we have already created! Trippy!

>**Really Really Interesting Note**: There is a way <a href="https://dba.stackexchange.com/questions/139191/sql-server-how-to-track-progress-of-create-index-command">to track progress of CREATE INDEX command</a>! Includes percent completed, number of rows processed, number of rows completed, estimated seconds etc. Hallelujah! 
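
A rough sketch of that progress tracking, based on `sys.dm_exec_query_profiles`, is below. The session id of 55 is a placeholder, and the percentage relies on the optimizer's row estimates, so treat it as an approximation rather than a precise progress bar.

```sql
/* Session A (the one building the index).
   On older versions this is needed so the profiling DMV gets populated: */
SET STATISTICS PROFILE ON;
-- CREATE INDEX ... runs here

/* Session B (monitoring). Replace 55 with session A's id
   (run SELECT @@SPID in session A to find it). */
DECLARE @SessionId INT = 55;

SELECT  node_id,
        physical_operator_name,
        SUM(row_count)            AS RowsProcessed,
        SUM(estimate_row_count)   AS RowsEstimated,
        CAST(SUM(row_count) * 100.0
             / NULLIF(SUM(estimate_row_count), 0) AS DECIMAL(9, 2)) AS PercentComplete
FROM sys.dm_exec_query_profiles
WHERE session_id = @SessionId
GROUP BY node_id, physical_operator_name
ORDER BY node_id;
```

Re-running the monitoring query every few seconds shows the row counts climbing per operator.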

Now to test our queries. Let's start with the first stored proc **[dbo].[usp_PostsByCommentCount]**

<img src="C:\Users\hartleyg\Desktop\Training\sqltraining\Brent Ozar\Mastering Index Tuning\1.5 DEATH Method - Tuning Indexes to Specific Queries\1.5.17.png" width = 800></img>

So far so good. Even though we haven't dropped our old index, it's chosen to use the index that leads with PostTypeId. How are our logical reads doing?

<img src="C:\Users\hartleyg\Desktop\Training\sqltraining\Brent Ozar\Mastering Index Tuning\1.5 DEATH Method - Tuning Indexes to Specific Queries\1.5.18.png" width = 500></img>

Same as last time, good! So let's see how Stored Proc #2 fares:

<img src="C:\Users\hartleyg\Desktop\Training\sqltraining\Brent Ozar\Mastering Index Tuning\1.5 DEATH Method - Tuning Indexes to Specific Queries\1.5.19.png" width = 800></img> <--- That chungus arrow still around back...

Still not great... however we're doing better:

|                           | Number of Rows Read   | Actual Number of Rows | Logical Reads |
| ---                       | ---                   | ---                   | ---           |
| **Index on CommentCount** | 12 million            | 7 million             | 40724         |
| **Index on PostTypeId**   | 7 million             | 7 million             | 25441         |

As a balance across the two, the index leading with PostTypeId seems to be faring better. However, we do have one last index combination to try for our second query:



In [None]:
CREATE INDEX PostTypeId_Score_CommentCount_Includes
ON dbo.Posts(PostTypeId, Score, CommentCount) INCLUDE (ViewCount);

/* Just to make sure SQL uses our new index */
DROP INDEX CommentCount_PostTypeId_Score_Includes ON dbo.Posts;
DROP INDEX PostTypeId_CommentCount_Score_Includes ON dbo.Posts;


First up, Stored Proc #1!

<img src="C:\Users\hartleyg\Desktop\Training\sqltraining\Brent Ozar\Mastering Index Tuning\1.5 DEATH Method - Tuning Indexes to Specific Queries\1.5.20.png" width = 800></img> <--- Oh no, Thiccy McThiccFace is in this one!

This isn't great... performance way down with this index

<img src="C:\Users\hartleyg\Desktop\Training\sqltraining\Brent Ozar\Mastering Index Tuning\1.5 DEATH Method - Tuning Indexes to Specific Queries\1.5.21.png" width = 500>

Okay, logical reads are waaaay up too... from 4 to 80k! At this rate, the second stored proc better reach lightning speeds! Let's see:

<img src="C:\Users\hartleyg\Desktop\Training\sqltraining\Brent Ozar\Mastering Index Tuning\1.5 DEATH Method - Tuning Indexes to Specific Queries\1.5.22.png" width = 500>

And as expected, it actually does! It seeks in to the right value, and there is no sort to perform thanks to the index ordering. As for logical reads:

<img src="C:\Users\hartleyg\Desktop\Training\sqltraining\Brent Ozar\Mastering Index Tuning\1.5 DEATH Method - Tuning Indexes to Specific Queries\1.5.23.png" width = 500>

Only 4. Looks like performance has flipped between the two...
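
An alternative to dropping indexes between tests is to keep the candidates side by side and force each one with an index hint. The sketch below assumes both named indexes still exist (so it would need to run before the DROP INDEX statements). It uses local variables rather than proc parameters, which can change the row estimates, so it's for comparing logical reads only.

```sql
/* Sketch: A/B test two candidate indexes against the usp_PostsByScore query */
SET STATISTICS IO ON;

DECLARE @PostTypeId INT = 2,
        @CommentCountMinimum INT = 2;

/* A: PostTypeId, CommentCount, Score */
SELECT TOP 10 Id, CommentCount, Score
FROM dbo.Posts WITH (INDEX (PostTypeId_CommentCount_Score_Includes))
WHERE CommentCount >= @CommentCountMinimum
  AND PostTypeId = @PostTypeId
ORDER BY Score DESC;

/* B: PostTypeId, Score, CommentCount */
SELECT TOP 10 Id, CommentCount, Score
FROM dbo.Posts WITH (INDEX (PostTypeId_Score_CommentCount_Includes))
WHERE CommentCount >= @CommentCountMinimum
  AND PostTypeId = @PostTypeId
ORDER BY Score DESC;
```

The Messages tab then shows the logical reads for each variant one after the other. The hints are for testing only and shouldn't be left in the stored procs.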






## Review

| Index                             | Stored Proc 1 Performance | Stored Proc 2 Performance |
| ---                               | ---                       | ---                       |
| **CommentCount_PostTypeId_Score** | Excellent                 | Poor                      |
| **PostTypeId_CommentCount_Score** | Excellent                 | Average                   |
| **PostTypeId_Score_CommentCount** | Poor                      | Excellent                 |

### Considerations
There isn't a single index that will result in Excellent performance for both stored procs - there needs to be a compromise here

- We could add 2 separate indexes for these queries, but would we start to violate the 5x5 rule?
- Which of these stored procs has bigger business value?
- Which of these stored procs is run the most?
- Who is running this query? Perhaps bias performance towards the human users, rather than the system queries (if application performance is acceptable)
- Can one of the queries be cached in the app?
- How many other read / write queries will be run?

In general, our index on **PostTypeId_CommentCount_Score** is the best option to satisfy both... think Prisoner's Dilemma here!
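
To judge how close a table already is to the 5x5 guideline (as I understand it: roughly 5 indexes per table, with roughly 5 columns or fewer each), a catalog query like this sketch can help. The column aliases are my own.

```sql
/* Sketch: how many indexes does dbo.Posts have, and how wide is each one? */
SELECT  i.name        AS IndexName,
        i.type_desc   AS IndexType,
        COUNT(*)      AS ColumnCount   -- key columns plus included columns
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
       ON ic.object_id = i.object_id
      AND ic.index_id  = i.index_id
WHERE i.object_id = OBJECT_ID(N'dbo.Posts')
GROUP BY i.index_id, i.name, i.type_desc
ORDER BY i.index_id;
```

The number of rows returned is the index count for the table, and `ColumnCount` shows which indexes are getting wide.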




## Field order guidelines

Remember, these are guidelines, NOT rules!

- Fields you use the most often should go first
- When doing range scans, selectivity matters
- Compromise involves:
    - prioritizing reads vs writes
    - prioritizing which queries need to be the fastest
    - caching data in the application
    - spending more money on hardware

Remember, the better you know your workload, the better your decisions become.

### So how do we get this so-called Workload?

- Ask the users
- Ask your gut
- Capture in Extended Events
- Use a Monitoring Tool
- Read the plan cache with sp_BlitzCache (more on this soon...)

### Note the blind spots

- Recent Server Restart
- Azure SQL DB; random, unpredictable restarts
- Memory pressure (1TB data on 16GB ram)
- Poison waits (resource_semaphore)
- Apps running unparameterized strings
- Option RECOMPILE
- Wildly different workloads at different times of day
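
A couple of these blind spots can be checked before trusting the plan cache. The sketch below assumes a regular SQL Server instance where these DMVs are accessible. It shows when the server last started and how old the oldest cached plan is.

```sql
/* Sketch: how much history does the plan cache actually hold? */
SELECT sqlserver_start_time
FROM sys.dm_os_sys_info;

SELECT  MIN(creation_time)   AS OldestCachedPlan,
        COUNT(*)             AS CachedStatements
FROM sys.dm_exec_query_stats;
```

If the oldest cached plan is only a few minutes or hours old on a server that has been up for weeks, something (memory pressure, recompiles, unparameterized strings) is churning the cache, and the top queries list won't tell the whole story.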

### Find Resource Intensive Queries using sp_BlitzCache 
- The sp_BlitzCache result set includes a Warnings column, and also provides a glossary of warnings with a reference link for each
- For troubleshooting indexes, in our case we want a SortOrder by 'reads' or 'avgreads'
- Review the Total Reads columns, and find the biggest culprits. Here's where we get our bang for buck
- Compare and contrast number of executions. Determine if:
    - the query is run frequently and produces a large number of total reads, or
    - the query is run infrequently but produces a large number of total reads
    - also use the Avg Total reads column to find the average reads per execution
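
A minimal call for this looks something like the sketch below, assuming sp_BlitzCache from the First Responder Kit is installed in the current database under the `dbo` schema.

```sql
/* Sketch: top 10 queries in the plan cache by total logical reads */
EXEC dbo.sp_BlitzCache @SortOrder = 'reads', @Top = 10;
GO

/* List the parameters and the accepted sort orders for the installed version */
EXEC dbo.sp_BlitzCache @Help = 1;
GO
```

The help output is worth checking for the exact spelling of the sort orders, since those depend on the installed version.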

### sp_BlitzCache also finds... ###
- Missing indexes
- Implicit conversion
- Forced serialization
- Table variables
- Expensive sorts
- Expensive key lookups
- Columnstore indexes not in batch mode


### Aside: Sort Order for different wait types

| Wait Type               | Sort Order    |
| ---                     | ---           |
| CXPACKET, CXCONSUMER    | reads, cpu    |
| SOS_SCHEDULER_YIELD     | cpu           |
| RESOURCE_SEMAPHORE      | memory grant  |
| PAGEIOLATCH             | reads         |
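
To decide which row of this table applies, the server's wait stats can be checked first. This is a sketch that only looks at the wait types listed above. The numbers are cumulative since the last restart.

```sql
/* Sketch: which of the waits from the table dominate on this server? */
SELECT TOP (10)
        wait_type,
        waiting_tasks_count,
        wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type IN (N'CXPACKET', N'CXCONSUMER', N'SOS_SCHEDULER_YIELD', N'RESOURCE_SEMAPHORE')
   OR wait_type LIKE N'PAGEIOLATCH%'
ORDER BY wait_time_ms DESC;
GO

/* Then pick the matching sort order, e.g. for SOS_SCHEDULER_YIELD: */
EXEC dbo.sp_BlitzCache @SortOrder = 'cpu';
GO
```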


### Query Tuning 
- Now we check our Warning column, and look for any cases of Missing Indexes
- Head to the Query Plan column and open up the current query. 
    - Note that the query plan only shows one index recommendation at a time, so view the XML to see them all
    - Potentially repeats the same query more than once, especially in cases where the index has been used more than once
- Once we start going down this path, we can get sucked down the wormhole of diminishing returns. Break out the hourglass for this one, we're aiming for bang for buck - stop if your time vs effort ratio starts to plateau

### The T part of D.E.A.T.H
1. Find the top resource consuming queries with sp_BlitzCache
2. Acknowledge recommendations, but don't take them as gospel
3. Equality fields first, inequality second (BUT TEST!)
4. ORDER BY and JOINs also affect the key order
5. As few indexes as possible to get 'most' queries running happily - aim for the best-case performance for as many as you can, but not necessarily all.
6. Bonus: SentryOne Plan Explorer

> **Prioritise** 
>- focus your workload on specific user issues / identified performance issues / business criticality / user-interaction<br>
>- a poorly performing but critical query that isn't performing at its best with our current indexes may just require its own index - use your judgement here, but remember to test your cases.


## G's Notes
Thinking about a possible process for identifying query performance. 

1. Map all functions, stored procs to their relevant query plans, and identify performance metrics (most used, total / average reads).
2. Map the query plans to their relevant indexes / tables, and identify all the fields that are being used
3. Identify all key columns for a query, notably foreign key fields, or those being used in a WHERE, JOIN, ORDER BY, GROUPING clause
4. Calculate the uniqueness of values for each of these fields, and the uniqueness based on the combinations of fields used per query
5. **Prioritise** - Assess query performance, start tuning the problematic / worst performing / business critical queries
6. Identify other objects or queries using this index - how is it performing?  
7. Repeat this process, and ensure we strike the lowest number of indexes that can best provide coverage for most queries! (max ~ 5-7)
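
Step 4 could be sketched like this, using the columns from the Posts exercise as an example. The column choice and aliases are illustrative only, and `COUNT(DISTINCT ...)` over a large table is not cheap, so this is something to run off-hours or against a restored copy.

```sql
/* Sketch: uniqueness of individual columns... */
SELECT  COUNT(*)                       AS TotalRows,
        COUNT(DISTINCT PostTypeId)     AS DistinctPostTypeId,
        COUNT(DISTINCT CommentCount)   AS DistinctCommentCount,
        COUNT(DISTINCT Score)          AS DistinctScore
FROM dbo.Posts;

/* ...and of a combination of columns used together in a query */
SELECT COUNT(*) AS DistinctPostTypeId_CommentCount
FROM (SELECT DISTINCT PostTypeId, CommentCount
      FROM dbo.Posts) AS combos;
```

The closer a distinct count is to `TotalRows`, the more unique that column or combination is. Keep in mind that, as the Users example showed, selectivity for a specific predicate value can matter more than overall uniqueness.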


