# Demo 9 - Round Numbers

In this demo, we will perform round number analysis on the Wake County transactions data set.

For this demo, we will use pyodbc and ipython-sql.  pyodbc is a Python module for connecting to databases through ODBC, whereas ipython-sql allows you to use "sql magic" in Jupyter.  You can just as easily run the queries in SQL Server Management Studio if you prefer.

First, let's use pip to install pyodbc and ipython-sql, and then load each one.

In [None]:
!pip install pyodbc

To load pyodbc, we can use the **import** statement.

In [None]:
import pyodbc

In [None]:
!pip install ipython-sql

To use SQL magic, we will need to run the following load command.

In [None]:
%load_ext sql

From here on out, I can use the *%sql* command to run a single-line SQL command.  I can also use the *%%sql* command to run multi-line SQL commands.

The first thing I want to do is connect to the OutlierDetection database.  I have already created an ODBC connection pointing to localhost.OutlierDetection.  You do not need to use a pre-defined ODBC connection, but when connecting to SQL Server, I've found it easier to use one.

In [None]:
%sql mssql+pyodbc://OutlierDetection
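If you would rather not create a pre-defined ODBC connection, one common alternative is to hand SQLAlchemy a raw ODBC connection string through the `odbc_connect` query parameter. The sketch below only builds such a URL. The helper name `mssql_url`, the driver name, and the use of Windows authentication are assumptions for illustration, so adjust them to match your machine.

```python
from urllib.parse import quote_plus


def mssql_url(server, database, driver="ODBC Driver 17 for SQL Server"):
    """Build a SQLAlchemy URL that passes a raw ODBC connection string to pyodbc."""
    # Assumption: the named driver is installed and Windows authentication is in use.
    odbc = (
        f"DRIVER={{{driver}}};"
        f"SERVER={server};"
        f"DATABASE={database};"
        "Trusted_Connection=yes;"
    )
    # The connection string must be URL-encoded before it goes into the query string.
    return "mssql+pyodbc:///?odbc_connect=" + quote_plus(odbc)


print(mssql_url("localhost", "OutlierDetection"))
```

The printed URL can be pasted after *%sql* in place of the pre-defined connection name.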

Round number analysis counts the trailing zeroes before the decimal point.  The idea here is that people might be rounding off values and pocketing the remainder, so a bill of \$41.08 might be rounded up to \$50.

We will break down transactions into types:  type 0, 1, 2, 3, and 4+.  A type 0 has zero trailing 0s, whereas a 4+ would have at least four trailing 0s.

**Examples:**

\$58 is a type 0.

\$108 is a type 0.

\$110 is a type 1.

\$34,000 is a type 3.
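To make the classification concrete, here is a minimal Python sketch of the same rule: round to whole dollars, then count trailing zeroes, capping at 4 for the 4+ bucket. The function name `round_number_type` is hypothetical, and rounding half-up is an assumption about how amounts should be rounded.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_number_type(amount, cap=4):
    """Count trailing zeroes before the decimal point, capped at `cap` (4 means 4+)."""
    # Round to whole dollars first (assumption: round half up).
    whole = int(Decimal(str(amount)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))
    if whole <= 0:
        # Amounts that round to zero have no meaningful trailing-zero count.
        raise ValueError("amount must round to a positive whole number")
    zeroes = 0
    while zeroes < cap and whole % 10 == 0:
        whole //= 10
        zeroes += 1
    return zeroes


for amount in (58, 108, 110, 34000, 250000):
    print(amount, "is a type", round_number_type(amount))
```

Note that \$108 is a type 0 because only zeroes at the very end count; the zero in the middle does not.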

This particular query uses the CROSS APPLY operator to make the query a bit easier to understand.

In [None]:
%%sql
WITH records AS
(
	SELECT
		t.VendorName,
		ROUND(t.ActualAmount, 0) AS ActualAmount
	FROM Wake.WakeTransaction t
	WHERE
		t.ActualAmount > 0
)
SELECT
	r.VendorName,
	SUM(t4.IsType4) AS Type4,
	SUM(t3.IsType3) AS Type3,
	SUM(t2.IsType2) AS Type2,
	SUM(t1.IsType1) AS Type1,
	SUM(t0.IsType0) AS Type0,
	COUNT(1) AS NumberOfInvoices,
	CAST(100.0 * SUM(t0.IsType0) / COUNT(1) AS DECIMAL(5,2)) AS PercentType0
FROM records r
	CROSS APPLY(SELECT CASE WHEN r.ActualAmount % 10000 = 0 THEN 1 ELSE 0 END AS IsType4) t4
	CROSS APPLY(SELECT CASE WHEN t4.IsType4 = 0 AND r.ActualAmount % 1000 = 0 THEN 1 ELSE 0 END AS IsType3) t3
	CROSS APPLY(SELECT CASE WHEN t4.IsType4 = 0 AND t3.IsType3 = 0 AND r.ActualAmount % 100 = 0 THEN 1 ELSE 0 END AS IsType2) t2
	CROSS APPLY(SELECT CASE WHEN t4.IsType4 = 0 AND t3.IsType3 = 0 AND t2.IsType2 = 0 AND r.ActualAmount % 10 = 0 THEN 1 ELSE 0 END AS IsType1) t1
	CROSS APPLY(SELECT CASE WHEN t4.IsType4 = 0 AND t3.IsType3 = 0 AND t2.IsType2 = 0 AND t1.IsType1 = 0 THEN 1 ELSE 0 END AS IsType0) t0
GROUP BY
	r.VendorName
ORDER BY
	Type2 DESC;

This is sorted by the number of type 2 records.  In a realistic data set (like the one we have), there is a natural spread, and sometimes you will see "big round numbers" like in Fringe YTD.

Next up, let's look at the high-level stats.

In [None]:
%%sql
WITH records AS
(
	SELECT
		t.VendorName,
		ROUND(t.ActualAmount, 0) AS ActualAmount
	FROM Wake.WakeTransaction t
	WHERE
		t.ActualAmount > 0
)
SELECT
	SUM(t4.IsType4) AS Type4,
	SUM(t3.IsType3) AS Type3,
	SUM(t2.IsType2) AS Type2,
	SUM(t1.IsType1) AS Type1,
	SUM(t0.IsType0) AS Type0,
	COUNT(1) AS NumberOfInvoices,
	CAST(100.0 * SUM(t0.IsType0) / COUNT(1) AS DECIMAL(5,2)) AS PercentType0
FROM records r
	CROSS APPLY(SELECT CASE WHEN r.ActualAmount % 10000 = 0 THEN 1 ELSE 0 END AS IsType4) t4
	CROSS APPLY(SELECT CASE WHEN t4.IsType4 = 0 AND r.ActualAmount % 1000 = 0 THEN 1 ELSE 0 END AS IsType3) t3
	CROSS APPLY(SELECT CASE WHEN t4.IsType4 = 0 AND t3.IsType3 = 0 AND r.ActualAmount % 100 = 0 THEN 1 ELSE 0 END AS IsType2) t2
	CROSS APPLY(SELECT CASE WHEN t4.IsType4 = 0 AND t3.IsType3 = 0 AND t2.IsType2 = 0 AND r.ActualAmount % 10 = 0 THEN 1 ELSE 0 END AS IsType1) t1
	CROSS APPLY(SELECT CASE WHEN t4.IsType4 = 0 AND t3.IsType3 = 0 AND t2.IsType2 = 0 AND t1.IsType1 = 0 THEN 1 ELSE 0 END AS IsType0) t0;

Roughly 15% of records have at least one trailing zero, meaning they are not type 0.  If the final digit followed a true uniform distribution, we'd expect 10%.

Also interesting is the type 4, where we'd expect 1/10^4 = 1/10,000 = 0.01% if digit endings were strictly uniform.  0.01% of 77,578 is 7-8 transactions.  The fact that we have 273 of these might seem a bit weird, and we can investigate further.
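The arithmetic above extends to every type. As a rough sketch, assuming each digit is independently uniform on 0 through 9, a record has exactly k trailing zeroes with probability 9/10^(k+1), and at least four with probability 1/10^4. The function name `expected_share` is hypothetical; the record count is the one quoted above.

```python
from fractions import Fraction


def expected_share(round_type, cap=4):
    """Share of records expected in a type if every digit is uniform on 0-9."""
    if round_type >= cap:
        # At least `cap` trailing zeroes: the last `cap` digits are all 0.
        return Fraction(1, 10 ** cap)
    # Exactly k trailing zeroes: k zero digits preceded by one non-zero digit.
    return Fraction(9, 10 ** (round_type + 1))


total_records = 77_578  # record count quoted in the text

for t in range(5):
    share = expected_share(t)
    label = "4+" if t == 4 else str(t)
    expected = float(share * total_records)
    print(f"type {label}: {float(share):.4%} -> about {expected:.1f} records")
```

Under this assumption 90% of records are type 0, and the 4+ bucket works out to just under 8 records, which is why 273 stands out.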

In [None]:
%%sql
SELECT
	t.VendorName,
	ROUND(t.ActualAmount, 0) AS ActualAmount
FROM Wake.WakeTransaction t
WHERE
	t.ActualAmount > 0
	AND ROUND(t.ActualAmount, 0) % 10000 = 0
ORDER BY
	t.VendorName,
	ActualAmount;

Prior Year Expense pops up pretty regularly.  We also see a few vendors who have received several big-round-number payouts.

For a data set like this, where we are looking at county government payments, we can expect some big round numbers based on the way grant-writing works.