## Budget Classifier

#### Setup and Context

In this notebook, I use transaction history data pulled from a Nigerian bank's API to train a budget classifier. The aim is to understand trends in spending and, if possible, eventually to provide a trained model per account that predicts those trends.

This seems fairly trivial in the Western world, where the majority of spending is done with credit and debit cards. Most transaction history data carries enough information to classify a transaction into buckets such as transportation, entertainment, groceries, etc. A couple of banks in Canada have even rolled out this feature as a kind of monthly account summary. The biggest challenge I presume they face is classifying purchases at megastores such as Walmart or Costco. These would generally be classified as groceries when, in fact, one could have been shopping for clothes or other household items.

Another challenge is the miscellaneous cash withdrawals, which cannot be classified because no one really knows what withdrawn cash is used for. The majority of the average Nigerian's spending is based on cash withdrawals, so one would doubt how well classification techniques can build a monthly summary for an account. Is there a way to extract more information from transaction histories? Could features like ATM location crossed with time of transaction be useful to predict trends in withdrawals? These are the questions I'm hoping to answer with this project.
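To make the "ATM location crossed with time of transaction" idea a bit more concrete, here is a minimal sketch of such a feature cross. It assumes ATM remarks look like `-<terminal id>- -<location text>` (the shape of the ATM rows that appear later in this notebook) and that dates use the `M/D/YYYY h:mm:ss AM/PM` format seen in the API output. The helper names, the bucket boundaries, and the sample rows are all made up for illustration.

```python
from collections import Counter
from datetime import datetime

def time_bucket(ts):
    """Map a timestamp to a coarse time-of-day bucket (boundaries are arbitrary)."""
    if ts.hour < 6:
        return "night"
    if ts.hour < 12:
        return "morning"
    if ts.hour < 18:
        return "afternoon"
    return "evening"

def atm_location(remark):
    """Return the text after the terminal id, or 'UNKNOWN' if the pattern is absent."""
    parts = remark.split("- -", 1)
    return parts[1].strip() if len(parts) == 2 else "UNKNOWN"

def cross_feature(remark, tra_date):
    """Combine location and time bucket into a single categorical feature."""
    ts = datetime.strptime(tra_date, "%m/%d/%Y %I:%M:%S %p")
    return "{}|{}".format(atm_location(remark), time_bucket(ts))

# Made-up withdrawals, NOT real rows from the account
withdrawals = [
    ("-000001- -EXAMPLE BRANCH A", "6/1/2015 7:43:22 AM"),
    ("-000001- -EXAMPLE BRANCH A", "6/8/2015 8:10:05 AM"),
    ("-000001- -EXAMPLE BRANCH A", "6/12/2015 9:30:00 PM"),
]

cross_counts = Counter(cross_feature(r, d) for r, d in withdrawals)
print(cross_counts.most_common())
```

A recurring cross such as "same branch, weekday morning" might hint at a routine expense like commuting, but whether that signal actually exists in this data is exactly what the project needs to find out.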

First, import the necessary dependencies.

In [6]:
import requests
import json
import math
from IPython import display
from matplotlib import cm
from matplotlib import gridspec
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
from sklearn import metrics
import tensorflow as tf
Dataset = tf.data.Dataset  # public alias; avoids importing from the private tensorflow.python package

pd.options.display.max_rows = 10

Import the JSON data from the REST API

In [2]:
payload = "{\"Envelope\":{\"xmlns\":\"http://schemas.xmlsoap.org/soap/envelope/\",\"Body\":{\"GetData\":{\"xmlns\":\"http://tempuri.org/\",\"AccountNumber\":\"0122863683\",\"StartDate\":\"2015-05-30T09:00:00\",\"EndDate\":\"2017-05-30T09:00:00\"}}}}"

headers = {
    'x-ibm-client-id': "6846f45b-129c-44cc-9a3a-6d37c35a7a89",
    'content-type': "application/json",
    'accept': "application/json"
    }

r = requests.post('https://api.gtbank.com/GetTransactionHistory', data=payload, headers=headers)
r.raise_for_status()  # fail early on an HTTP error instead of trying to parse an error page
trans_dict = r.json()

arrayy = trans_dict["Envelope"]["Body"]["GetDataResponse"]["GetDataResult"]["ResponseData"]["ALLTRANS"]["TransactionDetails"]["TransactionDetail"]
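The chained lookup above raises a `KeyError` if any level of the response is missing. As a hedged sketch, the nested path could be walked defensively instead. The `dig` and `as_list` helpers are hypothetical names, and the idea that a single transaction might come back as a dict rather than a list is my assumption (a common quirk of XML-to-JSON conversion), not something confirmed for this API.

```python
def dig(obj, path):
    """Follow a list of keys through nested dicts; return None if any key is missing."""
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj

def as_list(value):
    """Normalise None / a single record / a list of records into a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

DETAIL_PATH = [
    "Envelope", "Body", "GetDataResponse", "GetDataResult",
    "ResponseData", "ALLTRANS", "TransactionDetails", "TransactionDetail",
]

# Tiny stand-in for the real response, holding a single record as a dict
sample_response = {"Envelope": {"Body": {"GetDataResponse": {"GetDataResult": {
    "ResponseData": {"ALLTRANS": {"TransactionDetails": {
        "TransactionDetail": {"Dr_Amt": "200"}}}}}}}}}

records = as_list(dig(sample_response, DETAIL_PATH))
print(records)
```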

Pretty-print the JSON data

In [5]:
# Print the first five records, one field per line
for record in arrayy[:5]:
    for key, value in record.items():
        print("{}: \t{}".format(key, value))
    print("\n")

Acctno: 	0122863683
Channel: 	Spend to Save  Contribution
Cr_Amt: 	0
Dr_Amt: 	36
Remark: 	Savings from expenses from 205/152573/1/1
Tra_Bal: 	225006.35
Tra_Date: 	5/31/2017 4:03:01 AM
Trans_Type: 	Spend to Save  Contribution
Val_Date: 	5/30/2017 4:03:01 AM


Acctno: 	0122863683
Channel: 	MEDICAL BILL/REFUND
Cr_Amt: 	50466.97
Dr_Amt: 	0
Remark: 	Medrefund May 2017
Tra_Bal: 	225042.35
Tra_Date: 	5/30/2017 3:37:21 PM
Trans_Type: 	MEDICAL BILL/REFUND
Val_Date: 	5/31/2017 3:37:21 PM


Acctno: 	0122863683
Channel: 	MISC.
Cr_Amt: 	106161.85
Dr_Amt: 	0
Remark: 	May 2017 Allow
Tra_Bal: 	174575.38
Tra_Date: 	5/30/2017 3:36:03 PM
Trans_Type: 	MISC.
Val_Date: 	5/31/2017 3:36:03 PM


Acctno: 	0122863683
Channel: 	MTHLY SALARY
Cr_Amt: 	68353.56
Dr_Amt: 	0
Remark: 	May 2017 Sal
Tra_Bal: 	68413.53
Tra_Date: 	5/30/2017 3:34:13 PM
Trans_Type: 	MTHLY SALARY
Val_Date: 	5/31/2017 3:34:13 PM


Acctno: 	0122863683
Channel: 	USSD
Cr_Amt: 	0
Dr_Amt: 	200
Remark: 	USSD-101CT0000000000361857077-2348081467539
Tra

Convert the data to a pandas DataFrame

In [7]:
data = pd.DataFrame(arrayy)
data

| | Acctno | Channel | Cr_Amt | Dr_Amt | Remark | Tra_Bal | Tra_Date | Trans_Type | Val_Date |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0122863683 | Spend to Save Contribution | 0 | 36 | Savings from expenses from 205/152573/1/1 | 225006.35 | 5/31/2017 4:03:01 AM | Spend to Save Contribution | 5/30/2017 4:03:01 AM |
| 1 | 0122863683 | MEDICAL BILL/REFUND | 50466.97 | 0 | Medrefund May 2017 | 225042.35 | 5/30/2017 3:37:21 PM | MEDICAL BILL/REFUND | 5/31/2017 3:37:21 PM |
| 2 | 0122863683 | MISC. | 106161.85 | 0 | May 2017 Allow | 174575.38 | 5/30/2017 3:36:03 PM | MISC. | 5/31/2017 3:36:03 PM |
| 3 | 0122863683 | MTHLY SALARY | 68353.56 | 0 | May 2017 Sal | 68413.53 | 5/30/2017 3:34:13 PM | MTHLY SALARY | 5/31/2017 3:34:13 PM |
| 4 | 0122863683 | USSD | 0 | 200 | USSD-101CT0000000000361857077-2348081467539 | 59.97 | 5/29/2017 1:53:08 PM | Airtime Purchase | 5/30/2017 1:53:08 PM |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1466 | 0122863683 | CASH WITHDRAWAL FROM OUR ATM | 0 | 8000 | -009732- -GTBank 163/165 APAPA RDEBUTE METTA ... | 41889.89 | 6/1/2015 7:43:22 AM | CASH WITHDRAWAL FROM OUR ATM | 6/1/2015 7:43:22 AM |
| 1467 | 0122863683 | POS | 0 | 1300 | -000688- -SWEET SENSATION CONFECTTIONARY LIMIT... | 49889.89 | 6/1/2015 3:43:43 PM | POS/WEB PURCHASE TRANSACTION | 6/1/2015 3:43:43 PM |
| 1468 | 0122863683 | POS | 0 | 3389.84 | -031704- -MSFT *WINDOWS STORE BILL.MS.NET ... | 51189.89 | 6/1/2015 3:39:31 AM | POS/WEB PURCHASE TRANSACTION | 6/1/2015 3:39:31 AM |
| 1469 | 0122863683 | Mobile Banking | 0 | 500 | airtime recharge via MBANKING-101C000000000006... | 54579.73 | 6/1/2015 4:51:54 PM | Airtime Purchase | 6/1/2015 4:51:54 PM |
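Since the values come straight from JSON, the amount and date columns are probably stored as strings (an assumption; `data.dtypes` would confirm it). A minimal sketch of the conversion, shown on a two-row stand-in built from the first two records above rather than on the full `data` frame. The `Net_Amt` column is a hypothetical addition of mine.

```python
import pandas as pd

# Two-row stand-in using values from the first two records shown above
sample = pd.DataFrame([
    {"Cr_Amt": "0", "Dr_Amt": "36", "Tra_Bal": "225006.35",
     "Tra_Date": "5/31/2017 4:03:01 AM"},
    {"Cr_Amt": "50466.97", "Dr_Amt": "0", "Tra_Bal": "225042.35",
     "Tra_Date": "5/30/2017 3:37:21 PM"},
])

# Amounts: strings -> numbers (unparseable values become NaN)
for col in ["Cr_Amt", "Dr_Amt", "Tra_Bal"]:
    sample[col] = pd.to_numeric(sample[col], errors="coerce")

# Dates: strings -> timestamps, using the month/day/year 12-hour format seen above
sample["Tra_Date"] = pd.to_datetime(sample["Tra_Date"], format="%m/%d/%Y %I:%M:%S %p")

# Hypothetical convenience column: positive for credits, negative for debits
sample["Net_Amt"] = sample["Cr_Amt"] - sample["Dr_Amt"]

print(sample.dtypes)
```

With real timestamps in place, features such as hour of day or day of week can be derived through the `.dt` accessor on `Tra_Date`.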


The first thing to note is that this is near-perfect data: there is a remark for every transaction, which makes our task easier. In the real world, most people don't annotate every transaction with a remark (or at least I speak for myself).
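Because every row carries a `Trans_Type` and a `Remark`, a simple keyword lookup could serve as a first-pass labeller before any model is trained. This is only a sketch of one possible baseline: the keywords and category names below are hypothetical choices of mine, and the first matching rule wins.

```python
# (keyword, category) pairs checked in order; both columns are hypothetical choices
KEYWORD_RULES = [
    ("cash withdrawal", "cash"),
    ("airtime", "airtime"),
    ("salary", "income"),
    ("refund", "refund"),
    ("spend to save", "savings"),
    ("pos/web purchase", "purchase"),
]

def label_transaction(trans_type, remark):
    """Return the category of the first keyword found in Trans_Type or Remark."""
    text = "{} {}".format(trans_type, remark).lower()
    for keyword, category in KEYWORD_RULES:
        if keyword in text:
            return category
    return "uncategorized"

# (Trans_Type, Remark) pairs taken from the records shown above
examples = [
    ("MTHLY SALARY", "May 2017 Sal"),
    ("MEDICAL BILL/REFUND", "Medrefund May 2017"),
    ("Airtime Purchase", "USSD-101CT0000000000361857077-2348081467539"),
    ("MISC.", "May 2017 Allow"),
]
labels = [label_transaction(t, r) for t, r in examples]
print(labels)
```

Rows that fall through to `uncategorized`, along with the cash withdrawals, are where a learned model would have to do the real work.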
