# Demo 6 - Cohort Analysis

In the last notebook, we noticed that two beats had a significantly larger number of MISC incidents than any other beat.  Let's dig into this!

In [None]:
!pip install pyodbc

To load pyodbc, we can use the **import** statement.

In [None]:
import pyodbc

In [None]:
!pip install ipython-sql

To use SQL magic, we will need to run the following load command.

In [None]:
%load_ext sql

We next need to connect to the OutlierDetection ODBC source.

In [None]:
%sql mssql+pyodbc://OutlierDetection

This query returns the number of incidents by beat where the incident type falls into the "miscellaneous" (MISC) category.

In [None]:
%%sql
SELECT
	i.BeatID,
	COUNT(1) AS NumberOfIncidents
FROM Raleigh2014.Incident i
	INNER JOIN Raleigh2014.IncidentCode ic
		ON i.IncidentCode = ic.IncidentCode
	INNER JOIN Raleigh2014.IncidentType it
		ON ic.IncidentTypeID = it.IncidentTypeID
WHERE
	it.IncidentType = 'MISC'
GROUP BY
	i.BeatID
ORDER BY
	NumberOfIncidents DESC;

We can now see that beats 2403 and 2402 are huge outliers:  2403 has about 3 times as many incidents as the third-ranked beat, and 2402 has about 1.5 times as many.  This...seems odd.  The next question is, what is so special about beat 2403?  Let's take a closer look at the individual incident descriptions to see if we can figure something out.

In [None]:
%%sql
SELECT
	ic.IncidentDescription,
	COUNT(1) AS NumberOfIncidents
FROM Raleigh2014.Incident i
	INNER JOIN Raleigh2014.IncidentCode ic
		ON i.IncidentCode = ic.IncidentCode
	INNER JOIN Raleigh2014.IncidentType it
		ON ic.IncidentTypeID = it.IncidentTypeID
WHERE
	i.BeatID = 2403
	AND it.IncidentType = 'MISC'
GROUP BY
	ic.IncidentDescription
ORDER BY
	NumberOfIncidents DESC,
	IncidentDescription;


Now it makes sense:  almost all of the miscellaneous incidents involve mental commitment.  Those 4543 records are, by themselves, the difference between being an extreme outlier and being within the interquartile range.
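
To make the interquartile-range claim concrete, here is a minimal Python sketch of an outlier check.  It assumes the common Tukey convention (beyond 1.5 x IQR from the quartiles is a mild outlier, beyond 3 x IQR is an extreme one), which may not be the exact rule used earlier in this series.  The function name `classify_by_iqr` and the per-beat counts are hypothetical and are not the query results; only the 4543 figure comes from the output above.

```python
from statistics import quantiles


def classify_by_iqr(values, candidate):
    """Classify `candidate` against Tukey fences computed from `values`."""
    # Quartiles of the reference data (inclusive method treats it as a full population).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if q1 <= candidate <= q3:
        return "within IQR"
    if candidate < q1 - 3 * iqr or candidate > q3 + 3 * iqr:
        return "extreme outlier"
    if candidate < q1 - 1.5 * iqr or candidate > q3 + 1.5 * iqr:
        return "mild outlier"
    return "outside IQR, not an outlier"


# HYPOTHETICAL per-beat MISC counts, invented for illustration only.
hypothetical_counts = [900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 6000]

# The largest (made-up) beat, with and without 4543 mental commitment records.
with_commitments = classify_by_iqr(hypothetical_counts, 6000)
without_commitments = classify_by_iqr(hypothetical_counts, 6000 - 4543)
print(with_commitments, "|", without_commitments)
```

With these invented numbers, removing the mental commitment records moves the beat from beyond the outer fence to inside the interquartile range, which is the same kind of shift described above.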

So what about beat 2402?  Will we see something similar?

In [None]:
%%sql
SELECT
	ic.IncidentDescription,
	COUNT(1) AS NumberOfIncidents
FROM Raleigh2014.Incident i
	INNER JOIN Raleigh2014.IncidentCode ic
		ON i.IncidentCode = ic.IncidentCode
	INNER JOIN Raleigh2014.IncidentType it
		ON ic.IncidentTypeID = it.IncidentTypeID
WHERE
	i.BeatID = 2402
	AND it.IncidentType = 'MISC'
GROUP BY
	ic.IncidentDescription
ORDER BY
	NumberOfIncidents DESC,
	IncidentDescription;

As we can see here, mental commitment is definitely the cause of these two beats being extreme outliers.  My conjecture is that there is a mental health facility somewhere around beats 2402 and 2403.  To test this, we can get latitudes and longitudes for the incidents on these two beats.  We are going to focus only on the mental commitment cases.

In [None]:
%%sql
SELECT
	CAST(i.IncidentLocation.Lat AS DECIMAL(5,2)) AS Latitude,
	CAST(i.IncidentLocation.Long AS DECIMAL(5,2)) AS Longitude,
	COUNT(1) AS NumberOfIncidents
FROM Raleigh2014.Incident i
	INNER JOIN Raleigh2014.IncidentCode ic
		ON i.IncidentCode = ic.IncidentCode
	INNER JOIN Raleigh2014.IncidentType it
		ON ic.IncidentTypeID = it.IncidentTypeID
WHERE
	i.BeatID IN (2403, 2402)
	AND it.IncidentType = 'MISC'
	AND ic.IncidentDescription = 'MISC/MENTAL COMMITMENT'
GROUP BY
	CAST(i.IncidentLocation.Lat AS DECIMAL(5,2)),
	CAST(i.IncidentLocation.Long AS DECIMAL(5,2))
ORDER BY
	NumberOfIncidents DESC;
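
The query above buckets incidents by casting each coordinate to two decimal places, so every bucket covers a cell roughly a kilometer across.  The same idea can be sketched in Python.  This is only an illustration:  `bucket_coordinates` and the sample points are hypothetical, and Python's `round` may treat exact ties differently from SQL Server's `CAST`.

```python
from collections import Counter


def bucket_coordinates(points, places=2):
    """Round each (lat, long) pair and count incidents per rounded cell."""
    counts = Counter(
        (round(lat, places), round(lon, places)) for lat, lon in points
    )
    # most_common() sorts by count descending, like ORDER BY NumberOfIncidents DESC.
    return counts.most_common()


# HYPOTHETICAL incident coordinates, invented for illustration only.
hypothetical_points = [
    (35.7812, -78.5891),
    (35.7790, -78.5904),
    (35.7831, -78.5933),
    (35.9001, -78.7002),
]

buckets = bucket_coordinates(hypothetical_points)
for (lat, lon), count in buckets:
    print(lat, lon, count)
```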

Now let's plot a Google map centered around 35.78, -78.59.  If the conjecture holds, there should be a mental health facility near this point.
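
One way to open such a map is to build a link for the coordinates.  The sketch below assumes the documented Google Maps URLs format (`api=1`, `map_action=map`, `center`, `zoom`); the helper name `google_maps_url` and the zoom level of 14 are arbitrary choices for illustration, not part of the original notebook.

```python
from urllib.parse import urlencode


def google_maps_url(lat, lon, zoom=14):
    """Build a Google Maps link centered on (lat, lon)."""
    params = {
        "api": 1,
        "map_action": "map",
        "center": f"{lat},{lon}",  # urlencode escapes the comma as %2C
        "zoom": zoom,
    }
    return "https://www.google.com/maps/@?" + urlencode(params)


url = google_maps_url(35.78, -78.59)
print(url)
```

Pasting the printed link into a browser should show the area around the coordinate pair.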

![Google map showing pinpoints at (35.78, -78.59) and surroundings.](Images/MentalHealthMapSmall.png)

As we can see, Wake County Human Services is right between our two main lat-long pairs, and there is an addiction treatment center slightly north.  This explains the massive discrepancy between these two beats and the rest.