# Demo 3 - Durham 2017 Crime Analysis

In this demo, we will look at some basic date-related measures relating to crime in Durham, North Carolina.  Our data set comes from the city of Durham, and we will analyze the data in the rest of this notebook.

For this demo, we will use pyodbc and ipython-sql.  pyodbc is a Python module for connecting to databases through ODBC, whereas ipython-sql allows you to use "sql magic" in Jupyter.  You can just as easily run the queries in SQL Server Management Studio if you prefer.

First, let's use pip to install pyodbc and ipython-sql so that we can load them.

In [None]:
!pip install pyodbc

To load pyodbc, we can use the **import** statement.

In [None]:
import pyodbc

In [None]:
!pip install ipython-sql

To use SQL magic, we will need to run the following load command.

In [None]:
%load_ext sql

From here on out, I can use the *%sql* command to run a single-line SQL command.  I can also use the *%%sql* command to run multi-line SQL commands.

The first thing I want to do is connect to the OutlierDetection database.  I have already created an ODBC connection pointing to localhost.OutlierDetection.  You do not need to use a pre-defined ODBC connection, but when connecting to SQL Server, I've found it easier to use one.

In [None]:
%sql mssql+pyodbc://OutlierDetection
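
If you would rather not create a pre-defined ODBC connection, one option is to build a SQLAlchemy-style URL from a raw ODBC connection string.  The sketch below is only an illustration: the driver name, server, and Windows authentication setting are assumptions about your environment, not details taken from this demo's setup.

```python
from urllib.parse import quote_plus

# Hypothetical connection details; change the driver name, server,
# and authentication settings to match your own environment.
odbc_params = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;"
    "DATABASE=OutlierDetection;"
    "Trusted_Connection=yes;"
)

# SQLAlchemy's mssql+pyodbc dialect accepts a URL-encoded ODBC
# connection string through the odbc_connect query parameter.
connection_url = "mssql+pyodbc:///?odbc_connect=" + quote_plus(odbc_params)
print(connection_url)
```

The resulting string is the same kind of URL that the *%sql* command above receives, just without relying on a named connection.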

Let's start by looking at a few records from the data set to give us an idea of what's available.

**NOTE:** the geography data type is not supported within ipython-sql, so we will not include that column in this notebook.

In [None]:
%%sql
SELECT TOP 5
	c.IncidentID,
	c.DateReported,
	c.DateOccurred,
	c.DateFound,
	c.IncidentReportCategory,
	c.UCRCode,
	c.ReportedToUCR,
	c.ChargeDescription,
	c.CSStatus,
	c.CSStatusDate,
	c.District,
	c.Zone
FROM Durham2017.Crime c;

We see a few date columns in this data set, so we will focus on those next.  First up, I want to see if there are any strange date orders, starting with whether any events were reported before their occurrence date.

In [None]:
%%sql
SELECT COUNT(1) AS NumberOfOutliers
FROM Durham2017.Crime c
WHERE
	c.DateReported < c.DateOccurred;

No incidents were reported before their occurrence date, so how about found date?

In [None]:
%%sql
SELECT COUNT(1) AS NumberOfOutliers
FROM Durham2017.Crime c
WHERE
	c.DateFound < c.DateOccurred;

Similarly, we have no records where the date found is before the date occurred.  This is good because it tells us that the data set follows the chronological order we would expect.
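
The two ordering checks above can also be sketched in plain Python, which may help if you pull the rows into a notebook variable first.  The rows below are made up for illustration and are not from the Durham data set; the `out_of_order` helper is a hypothetical name.

```python
from datetime import datetime

# Made-up rows for illustration only; not taken from Durham2017.Crime.
incidents = [
    {"IncidentID": 1, "DateOccurred": datetime(2017, 1, 5),
     "DateFound": datetime(2017, 1, 6), "DateReported": datetime(2017, 1, 6)},
    {"IncidentID": 2, "DateOccurred": datetime(2017, 3, 10),
     "DateFound": datetime(2017, 3, 10), "DateReported": datetime(2017, 3, 9)},
    {"IncidentID": 3, "DateOccurred": datetime(2017, 6, 2),
     "DateFound": datetime(2017, 6, 1), "DateReported": datetime(2017, 6, 3)},
]

def out_of_order(row):
    """Return a list of the chronological rules that this row violates."""
    problems = []
    if row["DateReported"] < row["DateOccurred"]:
        problems.append("reported before occurred")
    if row["DateFound"] < row["DateOccurred"]:
        problems.append("found before occurred")
    return problems

# Keep only the rows that break at least one rule.
violations = {}
for row in incidents:
    problems = out_of_order(row)
    if problems:
        violations[row["IncidentID"]] = problems

print(violations)
```

With these sample rows, incident 2 breaks the first rule and incident 3 breaks the second, while incident 1 passes both.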

What about old incidents?  Let's look at incidents which were reported more than three months after they took place.

In [None]:
%%sql
SELECT COUNT(1) AS LateReportedCrimes
FROM Durham2017.Crime c
WHERE
	c.DateReported > DATEADD(MONTH, 3, c.DateOccurred);
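
A note on the filter above: as I understand the documented T-SQL behavior, `DATEADD(MONTH, ...)` adds calendar months and, when the starting day does not exist in the target month, returns the last day of that month.  The following Python sketch mimics that rule; `add_months` is a hypothetical helper written for this illustration, not part of any library.

```python
import calendar
from datetime import date

def add_months(d, months):
    """Add calendar months, clamping to the last day of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return d.replace(year=year, month=month, day=min(d.day, last_day))

occurred = date(2016, 11, 30)
cutoff = add_months(occurred, 3)  # February has no 30th, so this clamps

# A report counts as late only if it falls strictly after the cutoff.
print(cutoff, date(2017, 3, 1) > cutoff, date(2017, 2, 28) > cutoff)
```

So the query counts incidents reported strictly after the three-month mark from the occurrence date.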

We have nearly 1800 crimes which took place more than three months before they were reported.  This bears further investigation, so let's look at it by incident report category.

In [None]:
%%sql
SELECT
	c.IncidentReportCategory,
	COUNT(1) AS NumberOfIncidents
FROM Durham2017.Crime c
WHERE
	c.DateReported > DATEADD(MONTH, 3, c.DateOccurred)
GROUP BY
	c.IncidentReportCategory
HAVING
	COUNT(1) > 50
ORDER BY
	NumberOfIncidents DESC;

Based on this, fraud appears to be the most common category among late reports, perhaps because it can take more than three months for news of the fraud to get back to the victim.  We can also look at the eventual charged crime category to see if the results look similar.

In [None]:
%%sql
SELECT
	c.ChargeDescription,
	COUNT(1) AS NumberOfIncidents
FROM Durham2017.Crime c
WHERE
	c.DateReported > DATEADD(MONTH, 3, c.DateOccurred)
GROUP BY
	c.ChargeDescription
HAVING
	COUNT(1) > 50
ORDER BY
	NumberOfIncidents DESC;

Fraud, larceny, and burglary are the three most common at the 3+ month range.
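
If you have already pulled the late-reported rows into Python, the same group, filter, and sort pattern can be sketched with `collections.Counter`.  The labels and counts below are invented for illustration, and the threshold is lowered from 50 to 2 so the toy data has something to show.

```python
from collections import Counter

# Invented labels for illustration; these are not real counts from the data set.
late_reports = ["FRAUD"] * 4 + ["LARCENY"] * 3 + ["BURGLARY"] * 2 + ["VANDALISM"]
minimum_count = 2  # the queries above use 50

# GROUP BY + COUNT: tally the rows per category.
counts = Counter(late_reports)

# HAVING + ORDER BY ... DESC: keep groups above the threshold,
# relying on most_common() to return them from highest to lowest count.
frequent = [(category, n) for category, n in counts.most_common() if n > minimum_count]
print(frequent)
```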

Some reports come in years after the incident.  The next query lists the incidents reported more than 3650 days (roughly ten years) after they occurred.

In [None]:
%%sql
SELECT
	DATEPART(YEAR, c.DateReported) AS ReportYear,
	DATEDIFF(DAY, c.DateOccurred, c.DateReported) AS DaysBetweenOccurrenceAndReport,
	DATEDIFF(DAY, c.DateFound, c.DateReported) AS DaysBetweenFindingAndReport,
	c.IncidentReportCategory,
	c.ChargeDescription
FROM Durham2017.Crime c
WHERE
	DATEDIFF(DAY, c.DateOccurred, c.DateReported) > 3650;

In this data set, there are 55 such incidents.  Fraud is still the most common, but sexual assault and rape appear with high frequency.
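
One detail worth keeping in mind about the filter: `DATEDIFF(DAY, ...)` in T-SQL counts the number of day boundaries crossed rather than elapsed 24-hour periods.  The sketch below approximates that behavior in Python; `datediff_days` is a hypothetical helper, and the dates are examples rather than values from the data set.

```python
from datetime import datetime

def datediff_days(start, end):
    """Count calendar-day boundaries crossed, ignoring the time of day."""
    return (end.date() - start.date()).days

# Two minutes apart, but on either side of midnight.
late_night = datetime(2017, 1, 1, 23, 59)
just_after = datetime(2017, 1, 2, 0, 1)
boundary_days = datediff_days(late_night, just_after)
elapsed_days = (just_after - late_night).days

# Ten calendar years, including the leap days in 2008, 2012, and 2016.
ten_years = datediff_days(datetime(2007, 1, 1), datetime(2017, 1, 1))

print(boundary_days, elapsed_days, ten_years)
```

Because of leap days, ten calendar years usually span slightly more than 3650 days, so the threshold in the query is a close approximation of "more than ten years" rather than an exact one.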