# Lab 1.5

### 1. Troubleshooting Index Query #1



In [None]:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

ALTER PROC [dbo].[usp_Q7521] @UserId INT AS
BEGIN
/* Source: http://data.stackexchange.com/stackoverflow/query/7521/how-unsung-am-i */

-- How Unsung am I?
-- Zero and non-zero accepted count. Self-accepted answers do not count.

select
    count(a.Id) as [Accepted Answers],
    sum(case when a.Score = 0 then 0 else 1 end) as [Scored Answers],  
    sum(case when a.Score = 0 then 1 else 0 end) as [Unscored Answers],
    sum(CASE WHEN a.Score = 0 then 1 else 0 end)*1000 / count(a.Id) / 10.0 as [Percentage Unscored]
from
    Posts q
  inner join
    Posts a
  on a.Id = q.AcceptedAnswerId
where
      a.CommunityOwnedDate is null
  and a.OwnerUserId = @UserId
  and q.OwnerUserId != @UserId
  and a.PostTypeId = 2
END
GO


Brent introduces a strategy to identify the selectivity of a JOIN query by doing the following:

1. Extract both tables into separate SELECT statements, each keeping the WHERE clause predicates that apply to that table
2. Move the JOIN condition into the WHERE clause: WHERE table.Id IN (Ids from the joined table)

Unpacking the above query, we have:

In [None]:
-- Query A
SELECT *
FROM dbo.Posts a
WHERE 
    a.CommunityOwnedDate IS NULL
    AND a.OwnerUserID = @UserId
    AND a.PostTypeId = 2
    AND a.Id IN (...)

-- Query B
SELECT * 
FROM dbo.Posts q
WHERE q.OwnerUserID != @UserId
AND q.AcceptedAnswerId IN (...)

Let's start with Query A and ask: how selective are the fields in the WHERE clause?

We want to find which filter is going to be applied first (i.e. the one with the least overhead), and how much work that creates downstream.

In [None]:
-- Find COUNT for the hardcoded values
SELECT COUNT(*) FROM dbo.Posts WHERE CommunityOwnedDate IS NULL; -- 40mil rows... Not very selective, lots of people match
SELECT COUNT(*) FROM dbo.Posts WHERE PostTypeId = 2; -- 24mil rows... Neither is this

-- Who are the users who own the most posts?
SELECT TOP 100 OwnerUserID, COUNT(*) AS recs -- by itself, not very selective
FROM dbo.Posts 
GROUP BY OwnerUserID
ORDER BY COUNT(*) DESC;
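
The row counts above can also be collected in a single pass over the table. This is only a sketch: the column aliases are names made up for illustration, and on a table this size it still has to scan everything once.

```sql
-- One scan of dbo.Posts instead of one scan per predicate.
-- The aliases (TotalRows, etc.) are illustrative, not from the lab.
SELECT
    COUNT(*) AS TotalRows,
    SUM(CASE WHEN CommunityOwnedDate IS NULL THEN 1 ELSE 0 END) AS NotCommunityOwned,
    SUM(CASE WHEN PostTypeId = 2 THEN 1 ELSE 0 END) AS Answers,
    COUNT(DISTINCT OwnerUserId) AS DistinctOwners
FROM dbo.Posts;
```

The closer a count sits to TotalRows, the less selective that predicate is. A large DistinctOwners value suggests an equality search on OwnerUserId narrows things down far more, at least for typical users.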


Now we create an index keyed on OwnerUserId and PostTypeId. Both are equality searches, so it doesn't matter which goes first.

We should make it a covering index by including CommunityOwnedDate (WHERE clause) and Score (used in the calculated fields in the SELECT). This saves SQL Server from having to go back to the table to look up these columns.


In [None]:
CREATE INDEX IX_OwnerUserId_PostTypeId_Includes
ON dbo.Posts (OwnerUserId, PostTypeId)
INCLUDE (CommunityOwnedDate, Score);

Now look at Query B. We have the following:

In [None]:
-- Query B
SELECT * 
FROM dbo.Posts q
WHERE q.OwnerUserID != @UserId
AND q.AcceptedAnswerId IN (...)

We might think to put AcceptedAnswerId in our key column due to its equality search, but hang on a sec...

Look at the filter on OwnerUserId. It's asking for everything that ISN'T @UserId. If we were to add AcceptedAnswerId to our current key, the ordering would be all sorts of messed up, and nowhere near optimal for this particular query.

If we wanted to make this run optimally, we would need a **SEPARATE** index for AcceptedAnswerId.

> **So would we need to add OwnerUserId to the key?** <br>
>  
>* Start by creating both indexes, then run the query against each one (using a WITH (INDEX ...) table hint). 
>* If there is a significant difference in reads or IO, AND the query is important enough to warrant optimal performance, then add OwnerUserId to the key.<br>
>* If it's not much of a difference, leave it off. We may find another query later down the track that actually needs an ordering applied. If you do go and change the index later, just remember to keep an eye on the other query plans that use this index.
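
A rough sketch of that comparison follows. The two index names and the @UserId value are placeholders invented for illustration, and the full join from the proc is used because the IN (...) list in Query B is elided.

```sql
-- Candidate 1: AcceptedAnswerId alone in the key, OwnerUserId only included.
CREATE INDEX IX_AcceptedAnswerId_Includes        -- hypothetical name
ON dbo.Posts (AcceptedAnswerId) INCLUDE (OwnerUserId);

-- Candidate 2: OwnerUserId promoted into the key.
CREATE INDEX IX_AcceptedAnswerId_OwnerUserId     -- hypothetical name
ON dbo.Posts (AcceptedAnswerId, OwnerUserId);
GO

SET STATISTICS IO ON;     -- logical reads are reported in the Messages tab
DECLARE @UserId INT = 1;  -- placeholder, substitute a real user id

-- Force candidate 1 on the question side of the join.
SELECT COUNT(a.Id)
FROM dbo.Posts q WITH (INDEX (IX_AcceptedAnswerId_Includes))
INNER JOIN dbo.Posts a ON a.Id = q.AcceptedAnswerId
WHERE a.CommunityOwnedDate IS NULL
  AND a.OwnerUserId = @UserId
  AND q.OwnerUserId != @UserId
  AND a.PostTypeId = 2;

-- Same query, forcing candidate 2.
SELECT COUNT(a.Id)
FROM dbo.Posts q WITH (INDEX (IX_AcceptedAnswerId_OwnerUserId))
INNER JOIN dbo.Posts a ON a.Id = q.AcceptedAnswerId
WHERE a.CommunityOwnedDate IS NULL
  AND a.OwnerUserId = @UserId
  AND q.OwnerUserId != @UserId
  AND a.PostTypeId = 2;

SET STATISTICS IO OFF;
```

Compare the logical reads reported for each statement, then drop whichever candidate loses.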

Now continue this for as many missing indexes as we can (identified by BlitzCache) for the next 30 mins.



## Process so far...

Note that at this stage:
- We did NOT look at their execution plans
- We did NOT look at existing indexes on the tables (does it make sense to merge?)
- We did NOT execute the queries to see their logical reads before / after (i.e. parameter sniffing)
- We did NOT compare the indexes
- We did NOT verify the change afterwards to make sure the indexes got picked up

Instead, we did the following:
- Identified the queries that need help (sp_BlitzCache)
- Decomposed the query
- Scripted an index based on what the query needs
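
For reference, the identification step might look something like this. It assumes the First Responder Kit's sp_BlitzCache is installed; sorting by reads and the top 10 cutoff are choices made here for illustration, not something the lab prescribes.

```sql
-- Show the 10 cached plans doing the most logical reads.
-- Assumes sp_BlitzCache (First Responder Kit) is installed on this instance.
EXEC sp_BlitzCache @SortOrder = 'reads', @Top = 10;
```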

Out in the wild, our 30 mins will be used up with the following process per query:
1. Find the query
2. Run the query with actual execution plans
3. Handcraft the index design
4. Create the index
5. Execute the query to see if the new index works, monitor its metrics
6. Compare other indexes that already exist, merge 'em
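
Steps 2 and 5 could be sketched as below. The @UserId value is a placeholder, and the DMV query is one assumed way of confirming that an index is being read; the lab itself compares BlitzCache output instead.

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Run once before and once after creating the index, then compare
-- the logical reads reported for dbo.Posts in the Messages tab.
EXEC dbo.usp_Q7521 @UserId = 1;  -- placeholder user id

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

-- Seek / scan / lookup counters per index on dbo.Posts
-- (these counters reset when the instance restarts).
SELECT i.name AS IndexName,
       s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON s.object_id   = i.object_id
      AND s.index_id    = i.index_id
      AND s.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID(N'dbo.Posts')
ORDER BY i.index_id;
```

A new index whose user_seeks stays at zero after the proc has run is a hint that the plan is not using it.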

> **Protip #1** - Use sp_who2 to find which index is currently building, plus the SPID to plug into the index progress query
>
> **Protip #2** - When comparing BlitzCache results, split the tab screens horizontally / vertically. Compare the Avg Reads columns of both result sets to ensure the new index has been picked up.

## Questions

Q: Is the proc cache cleared when you create a new index for that object?

A: No, it doesn't always pick up new indexes. You'll need to run sp_recompile against the object to clear its plans from the proc cache.
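
A minimal sketch of that call, using the proc from this lab. Which form to use is a judgment call; the table form affects more plans.

```sql
-- Mark the proc so its plan is rebuilt on the next execution.
EXEC sp_recompile N'dbo.usp_Q7521';

-- Or target the table: every proc, trigger and function that
-- references dbo.Posts is recompiled the next time it runs.
EXEC sp_recompile N'dbo.Posts';
```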



