# Demo 7 - Time Series With Transactions

In this demo, we will look at the basics of time series, using Wake County transactions as a sample.  We want to look at some basic details by day, starting at a very high level and digging down as we see interesting things.

For this demo, we will use pyodbc and ipython-sql.  pyodbc is a Python module for connecting to databases over ODBC, whereas ipython-sql allows you to use "sql magic" in Jupyter.  You can just as easily run the queries in SQL Server Management Studio if you prefer.

First, let's use pip to install pyodbc and ipython-sql, and then load them.

In [None]:
!pip install pyodbc

To load pyodbc, we can use the **import** statement.

In [None]:
import pyodbc

In [None]:
!pip install ipython-sql

To use SQL magic, we will need to run the following load command.

In [None]:
%load_ext sql

From here on out, I can use the *%sql* command to run a single-line SQL command.  I can also use the *%%sql* command to run multi-line SQL commands.

The first thing I want to do is connect to the OutlierDetection database.  I have already created an ODBC connection pointing to localhost.OutlierDetection.  You do not need to use a pre-defined ODBC connection, but when connecting to SQL Server, I've found it easier to use one.
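
If you would rather not set up a pre-defined ODBC connection, one option is to build the connection URL yourself. The following is a minimal sketch, assuming SQLAlchemy's `odbc_connect` pass-through for the `mssql+pyodbc` dialect. The driver name and the Windows authentication setting are assumptions about your environment, not details from this demo.

```python
from urllib.parse import quote_plus

# Hypothetical settings: adjust the driver name, server, and authentication
# to match your own environment.
odbc_settings = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;"
    "DATABASE=OutlierDetection;"
    "Trusted_Connection=yes;"
)

# The mssql+pyodbc dialect accepts a URL-encoded ODBC connection string
# through the odbc_connect query parameter.
connection_url = "mssql+pyodbc:///?odbc_connect=" + quote_plus(odbc_settings)
print(connection_url)
```

The printed URL can then be used after `%sql` in place of the pre-defined connection name.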

In [None]:
%sql mssql+pyodbc://OutlierDetection

Our first query will get a simple count of transactions (those with a positive actual amount) by day, ordered by record date.  This is an easy query, but it gives us a good idea of how the data is distributed.  We can scan through the results to get a sense of magnitude.

A good idea here would be to think about how we could visualize this data!

In [None]:
%%sql
SELECT
	t.RecordDate,
	COUNT(1) AS NumberOfTransactions
FROM Wake.WakeTransaction t
WHERE
	t.ActualAmount > 0
GROUP BY
	t.RecordDate
ORDER BY
	t.RecordDate;
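
As a rough first pass at visualizing daily counts, here is a small sketch that prints a text bar per day. It assumes the query results have been copied into a list of `(date, count)` pairs. The `text_bars` helper and the sample values are made up for illustration, and in a notebook a plotting library would usually be a better choice.

```python
from datetime import date

def text_bars(daily_counts, width=40):
    """Return one line per day with a bar scaled to the largest count."""
    if not daily_counts:
        return []
    largest = max(count for _, count in daily_counts)
    lines = []
    for record_date, count in daily_counts:
        # Scale each bar relative to the busiest day.
        bar = "#" * round(width * count / largest) if largest else ""
        lines.append(f"{record_date.isoformat()} {count:>6} {bar}")
    return lines

# Made-up counts for illustration only, not values from the Wake County data.
sample = [
    (date(2016, 9, 1), 200),
    (date(2016, 9, 2), 1000),
    (date(2016, 9, 6), 500),
]
lines = text_bars(sample)
print("\n".join(lines))
```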

We can see that there is a pretty wide variance in the number of transactions by day, so let's look at the big days: those with more than 1000 positive-amount transactions in a single day.

In [None]:
%%sql
WITH records AS
(
	SELECT
		t.RecordDate,
		COUNT(1) AS NumberOfTransactions
	FROM Wake.WakeTransaction t
	WHERE
		t.ActualAmount > 0
	GROUP BY
		t.RecordDate
)
SELECT
	r.RecordDate,
	DATENAME(WEEKDAY, r.RecordDate) AS DayOfWeek,
	r.NumberOfTransactions
FROM records r
WHERE
	r.NumberOfTransactions > 1000
ORDER BY
	NumberOfTransactions DESC;
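
Rather than relying only on a visual scan, we could also tally the big days by weekday and by day of the month. This is a minimal sketch that assumes the daily counts are available as `(date, count)` pairs. The `tally_big_days` helper and the sample counts are hypothetical.

```python
from collections import Counter
from datetime import date

# Fixed English names so the output does not depend on the system locale.
WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

def tally_big_days(daily_counts, threshold=1000):
    """Count big days by weekday name and by day of the month."""
    big_days = [d for d, count in daily_counts if count > threshold]
    by_weekday = Counter(WEEKDAY_NAMES[d.weekday()] for d in big_days)
    by_day_of_month = Counter(d.day for d in big_days)
    return by_weekday, by_day_of_month

# Made-up counts for illustration only; the dates are real calendar dates.
sample = [
    (date(2016, 9, 2), 1500),   # a Friday
    (date(2016, 9, 6), 300),    # below the threshold, so ignored
    (date(2016, 12, 2), 1200),  # also a Friday
]
by_weekday, by_day_of_month = tally_big_days(sample)
print(by_weekday.most_common())
print(by_day_of_month.most_common())
```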

There are a couple of patterns we can eyeball.  First, Fridays are pretty popular.  Second, we can see a lot of results for the second of each month.  Let's look further into the cases on the second of the month, using September 2nd as an example.

In [None]:
%%sql
SELECT
	t.CostCenter,
	t.VendorName,
	et.ExpenditureTypeName,
	ecat.ExpenditureCategoryName,
	ec.ExpenditureClassName,
	eli.ExpenditureLineItemName,
	t.ActualAmount
FROM Wake.WakeTransaction t
	INNER JOIN Wake.ExpenditureClass ec
		ON t.ExpenditureClassCode = ec.ExpenditureClassCode
	INNER JOIN Wake.ExpenditureLineItem eli
		ON t.ExpenditureLineItemCode = eli.ExpenditureLineItemCode
	INNER JOIN Wake.ExpenditureCategory ecat
		ON eli.ExpenditureCategoryCode = ecat.ExpenditureCategoryCode
	INNER JOIN Wake.ExpenditureType et
		ON ecat.ExpenditureTypeCode = et.ExpenditureTypeCode
WHERE
	t.RecordDate = '2016-09-02'
	AND t.ActualAmount > 0
ORDER BY
	t.ActualAmount;

Looking at the vendor names, we can see a *lot* of PCard (payment card) records.  Is it possible that these are the big difference between the 2nd of the month and the other days?

In [None]:
%%sql
SELECT
	CASE WHEN t.VendorName LIKE '%PCard%' THEN 1 ELSE 0 END AS IsPaymentCard,
	COUNT(1) AS NumberOfTransactions
FROM Wake.WakeTransaction t
WHERE
	t.RecordDate = '2016-09-02'
	AND t.ActualAmount > 0
GROUP BY
	CASE WHEN t.VendorName LIKE '%PCard%' THEN 1 ELSE 0 END;
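
To put a number on the payment card share, the same classification can be sketched in Python. This assumes the vendor names for the day have been pulled into a list. The `payment_card_share` helper and the vendor names are hypothetical. The comparison is lowercased on the assumption that the database uses a case-insensitive collation, which is common for SQL Server but not guaranteed.

```python
def payment_card_share(vendor_names, marker="pcard"):
    """Return (card_count, other_count, card_percentage) for the given names."""
    # A missing name is treated as "not a payment card", matching the ELSE 0
    # branch of the CASE expression when VendorName is NULL.
    card = sum(1 for name in vendor_names if marker in (name or "").lower())
    other = len(vendor_names) - card
    percentage = 100.0 * card / len(vendor_names) if vendor_names else 0.0
    return card, other, percentage

# Placeholder vendor names for illustration only.
sample = ["PCard Vendor A", "Vendor B", "PCARD Vendor C", None]
print(payment_card_share(sample))
```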

It certainly looks like a significant percentage of that day's transactions are payment card records.  So let's look at the breakdown across all days.

In [None]:
%%sql
SELECT
	t.RecordDate,
	SUM(CASE WHEN t.VendorName LIKE '%PCard%' THEN 1 ELSE 0 END) AS IsPaymentCard,
	SUM(CASE WHEN t.VendorName LIKE '%PCard%' THEN 0 ELSE 1 END) AS NotPaymentCard,
	COUNT(1) AS NumberOfTransactions
FROM Wake.WakeTransaction t
WHERE
	t.ActualAmount > 0
GROUP BY
	t.RecordDate
ORDER BY
	t.RecordDate;

The pattern does appear to hold, at least for July through December.  But in the data set that we currently have (which is a work in progress), we do not have any details about payment card transactions after December.

How about negative amounts?

In [None]:
%%sql
SELECT
	t.RecordDate,
	COUNT(1) AS NumberOfTransactions
FROM Wake.WakeTransaction t
WHERE
	t.ActualAmount < 0
GROUP BY
	t.RecordDate
ORDER BY
	t.RecordDate;
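
One way to check where in the month these transactions fall is to bucket each date as mid-month, month-end, or other. The sketch below assumes the results are available as `(date, count)` pairs. The helper names, the two-day window, and the sample counts are all assumptions for illustration.

```python
import calendar
from collections import Counter
from datetime import date

def month_position(record_date, window=2):
    """Label a date as 'mid-month', 'month-end', or 'other'."""
    last_day = calendar.monthrange(record_date.year, record_date.month)[1]
    if abs(record_date.day - 15) <= window:
        return "mid-month"
    if last_day - record_date.day <= window:
        return "month-end"
    return "other"

def tally_positions(daily_counts, window=2):
    """Sum transaction counts per month position."""
    totals = Counter()
    for record_date, count in daily_counts:
        totals[month_position(record_date, window)] += count
    return totals

# Made-up counts for illustration only.
sample = [
    (date(2017, 2, 15), 40),
    (date(2017, 2, 26), 35),  # February 2017 has 28 days, so this is month-end
    (date(2017, 2, 7), 5),
]
totals = tally_positions(sample)
print(dict(totals))
```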

It's interesting that we often see negative transactions cluster around the 15th and the end of each month.  My first guess is that it is related to employee benefits.  So let's test that conjecture!  We will pick a day in February and check it out.

In [None]:
%%sql
SELECT
	t.CostCenter,
	t.VendorName,
	et.ExpenditureTypeName,
	ecat.ExpenditureCategoryName,
	ec.ExpenditureClassName,
	eli.ExpenditureLineItemName,
	t.ActualAmount
FROM Wake.WakeTransaction t
	INNER JOIN Wake.ExpenditureClass ec
		ON t.ExpenditureClassCode = ec.ExpenditureClassCode
	INNER JOIN Wake.ExpenditureLineItem eli
		ON t.ExpenditureLineItemCode = eli.ExpenditureLineItemCode
	INNER JOIN Wake.ExpenditureCategory ecat
		ON eli.ExpenditureCategoryCode = ecat.ExpenditureCategoryCode
	INNER JOIN Wake.ExpenditureType et
		ON ecat.ExpenditureTypeCode = et.ExpenditureTypeCode
WHERE
	t.RecordDate = '2017-02-26'
	AND t.ActualAmount < 0
ORDER BY
	t.CostCenter;

In fact, most of the results here are dental and medical plan benefits.  These are the employee and employer shares of dental and medical plans.  Let's break it out by share.

In [None]:
%%sql
SELECT
	t.RecordDate,
	SUM(CASE WHEN eli.ExpenditureLineItemName = 'Contra-Employer Share-Dental' THEN t.ActualAmount ELSE 0 END) AS EmployerDental,
	SUM(CASE WHEN eli.ExpenditureLineItemName = 'Contra-Employee Share-Dental' THEN t.ActualAmount ELSE 0 END) AS EmployeeDental,
	SUM(CASE WHEN eli.ExpenditureLineItemName = 'Contra-Employer Share-Health' THEN t.ActualAmount ELSE 0 END) AS EmployerHealth,
	SUM(CASE WHEN eli.ExpenditureLineItemName = 'Contra-Employee Share-Health' THEN t.ActualAmount ELSE 0 END) AS EmployeeHealth
FROM Wake.WakeTransaction t
	INNER JOIN Wake.ExpenditureClass ec
		ON t.ExpenditureClassCode = ec.ExpenditureClassCode
	INNER JOIN Wake.ExpenditureLineItem eli
		ON t.ExpenditureLineItemCode = eli.ExpenditureLineItemCode
	INNER JOIN Wake.ExpenditureCategory ecat
		ON eli.ExpenditureCategoryCode = ecat.ExpenditureCategoryCode
	INNER JOIN Wake.ExpenditureType et
		ON ecat.ExpenditureTypeCode = et.ExpenditureTypeCode
WHERE
	t.ActualAmount < 0
GROUP BY
	t.RecordDate
ORDER BY
	t.RecordDate;

It looks like the dental plan is split almost 53-47 between the two shares, while the health plan is split about 80-20.
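
A split like this can be computed from a pair of totals rather than estimated by eye. Here is a minimal sketch, assuming the employer and employee totals for one plan have been read from the query results. The `share_split` helper and the sample totals are hypothetical. Absolute values are used because the contra amounts are negative.

```python
def share_split(employer_total, employee_total):
    """Return (employer_percentage, employee_percentage) of the combined total."""
    combined = abs(employer_total) + abs(employee_total)
    if combined == 0:
        return 0.0, 0.0
    return (100.0 * abs(employer_total) / combined,
            100.0 * abs(employee_total) / combined)

# Made-up totals for illustration only, not values from the data set.
employer_pct, employee_pct = share_split(-800.0, -200.0)
print(f"{employer_pct:.0f}-{employee_pct:.0f}")
```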