## About

The dataset contains 3 months’ worth of transactions for 100 hypothetical customers, including purchases, recurring transactions, and salary transactions.

## Objectives

- Start by doing some basic checks – are there any data issues? Does the data need to be cleaned?

- Gather some interesting overall insights about the data. For example, what is the average transaction amount? How many transactions do customers make each month, on average?

- Segment the dataset by transaction date and time. Visualise transaction volume and spending over the course of an average day or week. Consider the effect of any outliers that may distort your analysis.

- For a challenge – what insights can you draw from the location information provided in the dataset?

- Put together 2-3 slides summarising your most interesting findings to ANZ management.

In [1]:
import pandas as pd

In [2]:
master_df = pd.read_excel("ANZ synthesised transaction dataset.xlsx")

In [3]:
pd.set_option('display.max_columns', None)
master_df.head()

Unnamed: 0,status,card_present_flag,bpay_biller_code,account,currency,long_lat,txn_description,merchant_id,merchant_code,first_name,balance,date,gender,age,merchant_suburb,merchant_state,extraction,amount,transaction_id,country,customer_id,merchant_long_lat,movement
0,authorized,1.0,,ACC-1598451071,AUD,153.41 -27.95,POS,81c48296-73be-44a7-befa-d053f48ce7cd,,Diana,35.39,2018-08-01,F,26,Ashmore,QLD,2018-08-01T01:01:15.000+0000,16.25,a623070bfead4541a6b0fff8a09e706c,Australia,CUS-2487424745,153.38 -27.99,debit
1,authorized,0.0,,ACC-1598451071,AUD,153.41 -27.95,SALES-POS,830a451c-316e-4a6a-bf25-e37caedca49e,,Diana,21.2,2018-08-01,F,26,Sydney,NSW,2018-08-01T01:13:45.000+0000,14.19,13270a2a902145da9db4c951e04b51b9,Australia,CUS-2487424745,151.21 -33.87,debit
2,authorized,1.0,,ACC-1222300524,AUD,151.23 -33.94,POS,835c231d-8cdf-4e96-859d-e9d571760cf0,,Michael,5.71,2018-08-01,M,38,Sydney,NSW,2018-08-01T01:26:15.000+0000,6.42,feb79e7ecd7048a5a36ec889d1a94270,Australia,CUS-2142601169,151.21 -33.87,debit
3,authorized,1.0,,ACC-1037050564,AUD,153.10 -27.66,SALES-POS,48514682-c78a-4a88-b0da-2d6302e64673,,Rhonda,2117.22,2018-08-01,F,40,Buderim,QLD,2018-08-01T01:38:45.000+0000,40.9,2698170da3704fd981b15e64a006079e,Australia,CUS-1614226872,153.05 -26.68,debit
4,authorized,1.0,,ACC-1598451071,AUD,153.41 -27.95,SALES-POS,b4e02c10-0852-4273-b8fd-7b3395e32eb0,,Diana,17.95,2018-08-01,F,26,Mermaid Beach,QLD,2018-08-01T01:51:15.000+0000,3.25,329adf79878c4cf0aeb4188b4691c266,Australia,CUS-2487424745,153.44 -28.06,debit
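The `long_lat` and `merchant_long_lat` columns in the preview hold a longitude and a latitude in a single space-separated string, so they need to be split before any location analysis. The following is a minimal sketch, not the notebook's own method: it builds a toy frame from coordinates shown in the preview (the missing merchant value in the third row is an assumption added for illustration), and the helper names `split_long_lat` and `haversine_km` are hypothetical. Distances use the haversine formula with an Earth radius of 6371 km.

```python
import numpy as np
import pandas as pd

# Toy rows: coordinate strings taken from the head() preview; the None in the
# third row is an illustrative assumption standing in for non-merchant rows.
toy = pd.DataFrame({
    "long_lat": ["153.41 -27.95", "153.41 -27.95", "151.23 -33.94"],
    "merchant_long_lat": ["153.38 -27.99", "151.21 -33.87", None],
})

def split_long_lat(col, prefix):
    """Split 'lon lat' strings into two numeric columns; missing stays NaN."""
    parts = col.str.split(" ", expand=True).apply(pd.to_numeric, errors="coerce")
    parts.columns = [f"{prefix}_lon", f"{prefix}_lat"]
    return parts

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two points given in degrees."""
    lon1, lat1, lon2, lat2 = (np.radians(v) for v in (lon1, lat1, lon2, lat2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

cust = split_long_lat(toy["long_lat"], "cust")
merch = split_long_lat(toy["merchant_long_lat"], "merch")
toy["distance_km"] = haversine_km(
    cust["cust_lon"], cust["cust_lat"], merch["merch_lon"], merch["merch_lat"]
)
print(toy)
```

On the real data the same helpers could be applied to `master_df` directly; rows without merchant coordinates would simply get a missing distance.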


In [4]:
master_df.tail()

Unnamed: 0,status,card_present_flag,bpay_biller_code,account,currency,long_lat,txn_description,merchant_id,merchant_code,first_name,balance,date,gender,age,merchant_suburb,merchant_state,extraction,amount,transaction_id,country,customer_id,merchant_long_lat,movement
12038,authorized,0.0,,ACC-3021093232,AUD,149.83 -29.47,POS,32aa73dc-b7c2-4161-b14d-6271b96ce792,,Melissa,14054.14,2018-10-31,F,30,Ringwood,VIC,2018-10-31T23:09:06.000+0000,9.79,f2e3e695c2ee4c50a4c8747f852cbe2e,Australia,CUS-55310383,145.23 -37.81,debit
12039,authorized,1.0,,ACC-1608363396,AUD,151.22 -33.87,SALES-POS,296a0500-8552-48ac-ac81-ec37065b568e,,Robert,9137.79,2018-10-31,M,20,Casula,NSW,2018-10-31T23:21:46.000+0000,63.87,56e147e5485f4683b9076fcaaed76640,Australia,CUS-2688605418,150.88 -33.96,debit
12040,authorized,1.0,,ACC-3827517394,AUD,151.12 -33.89,POS,e5975ab4-08f7-4725-a369-24cc0e35ed6e,,Craig,45394.57,2018-10-31,M,28,Kings Park,NSW,2018-10-31T23:34:25.000+0000,43.96,2fdd4681827343f6af2e6519644a684a,Australia,CUS-2663907001,150.92 -33.74,debit
12041,authorized,1.0,,ACC-2920611728,AUD,144.96 -37.76,SALES-POS,af49051a-591d-4b08-bd3c-27730b70ed37,,Tyler,11350.67,2018-10-31,M,69,Oakleigh,VIC,2018-10-31T23:47:05.000+0000,30.77,74aa9cd7e4af4c6d9cd7dbd28e9aedc9,Australia,CUS-1388323263,145.09 -37.91,debit
12042,authorized,1.0,,ACC-1443681913,AUD,150.92 -33.77,SALES-POS,f31f4b14-2040-40ec-a120-b141bb274cbd,,Ryan,5517.91,2018-10-31,M,31,Mascot,NSW,2018-10-31T23:59:44.000+0000,22.36,6d5218e04e8040b9996850ce11a19426,Australia,CUS-3129499595,151.19 -33.93,debit


In [5]:
master_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12043 entries, 0 to 12042
Data columns (total 23 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   status             12043 non-null  object        
 1   card_present_flag  7717 non-null   float64       
 2   bpay_biller_code   885 non-null    object        
 3   account            12043 non-null  object        
 4   currency           12043 non-null  object        
 5   long_lat           12043 non-null  object        
 6   txn_description    12043 non-null  object        
 7   merchant_id        7717 non-null   object        
 8   merchant_code      883 non-null    float64       
 9   first_name         12043 non-null  object        
 10  balance            12043 non-null  float64       
 11  date               12043 non-null  datetime64[ns]
 12  gender             12043 non-null  object        
 13  age                12043 non-null  int64         
 14  mercha

In [6]:
master_df.describe()

Unnamed: 0,card_present_flag,merchant_code,balance,age,amount
count,7717.0,883.0,12043.0,12043.0,12043.0
mean,0.802644,0.0,14704.195553,30.58233,187.933588
std,0.398029,0.0,31503.722652,10.046343,592.599934
min,0.0,0.0,0.24,18.0,0.1
25%,1.0,0.0,3158.585,22.0,16.0
50%,1.0,0.0,6432.01,28.0,29.0
75%,1.0,0.0,12465.945,38.0,53.655
max,1.0,0.0,267128.52,78.0,8835.98
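The objectives ask for the average transaction amount and the average number of transactions per customer per month. A rough sketch of one way to compute both is below, using a small made-up frame rather than `master_df`; the column names match the dataset, but the rows and the `month` helper column are assumptions for illustration. Note that months in which a customer has no transactions are not counted as zero here.

```python
import pandas as pd

# Made-up rows for illustration only; column names follow the dataset.
toy = pd.DataFrame({
    "customer_id": ["CUS-A"] * 4 + ["CUS-B"] * 2,
    "date": pd.to_datetime([
        "2018-08-01", "2018-08-15", "2018-09-03", "2018-09-20",
        "2018-08-05", "2018-10-11",
    ]),
    "amount": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# Average transaction amount over all rows.
avg_amount = toy["amount"].mean()

# Transactions per customer per calendar month, then averaged per customer.
toy["month"] = toy["date"].dt.to_period("M")
monthly_counts = toy.groupby(["customer_id", "month"]).size()
avg_per_customer = monthly_counts.groupby(level="customer_id").mean()

# Average of the per-customer monthly averages.
overall_monthly_avg = avg_per_customer.mean()

print(avg_amount, overall_monthly_avg)
print(avg_per_customer)
```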


In [7]:
master_df.isnull().sum()

status                   0
card_present_flag     4326
bpay_biller_code     11158
account                  0
currency                 0
long_lat                 0
txn_description          0
merchant_id           4326
merchant_code        11160
first_name               0
balance                  0
date                     0
gender                   0
age                      0
merchant_suburb       4326
merchant_state        4326
extraction               0
amount                   0
transaction_id           0
country                  0
customer_id              0
merchant_long_lat     4326
movement                 0
dtype: int64
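Several merchant-related columns share the same null count (4326), which suggests the gaps may depend on the transaction type rather than being random. One way to check this, sketched on a made-up frame that mimics the pattern, is to compute the share of missing values per `txn_description`, and to confirm that `transaction_id` has no duplicates. The toy rows are assumptions; only the column names come from the dataset.

```python
import pandas as pd

# Made-up rows mimicking the null pattern; not taken from the real file.
toy = pd.DataFrame({
    "txn_description": ["POS", "SALES-POS", "PAYMENT", "PAY/SALARY"],
    "merchant_id": ["m1", "m2", None, None],
    "card_present_flag": [1.0, 0.0, None, None],
    "transaction_id": ["t1", "t2", "t3", "t4"],
})

# Fraction of missing values in each column, per transaction type.
missing_share = (
    toy.drop(columns="txn_description")
       .isnull()
       .groupby(toy["txn_description"])
       .mean()
)

# Number of repeated transaction ids (0 means every id is unique).
duplicate_ids = int(toy["transaction_id"].duplicated().sum())

print(missing_share)
print("duplicate transaction ids:", duplicate_ids)
```

If the real data shows shares of exactly 0 or 1 per type, the nulls are structural (the field does not apply to that transaction type) and probably do not need imputing.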

In [8]:
master_df['account'].nunique()

100

In [9]:
master_df['account'].value_counts()

ACC-1598451071    578
ACC-1222300524    303
ACC-182446574     292
ACC-4258502723    260
ACC-1037050564    259
                 ... 
ACC-1998563091     40
ACC-3881031190     37
ACC-721712940      34
ACC-4059612845     31
ACC-1217063613     25
Name: account, Length: 100, dtype: int64

In [10]:
master_df['customer_id'].value_counts()

CUS-2487424745    578
CUS-2142601169    303
CUS-3026014945    292
CUS-3378712515    260
CUS-1614226872    259
                 ... 
CUS-3395687666     40
CUS-3201519139     37
CUS-1646183815     34
CUS-495599312      31
CUS-1739931018     25
Name: customer_id, Length: 100, dtype: int64
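The `account` and `customer_id` counts line up row for row, which hints that each customer has exactly one account. Matching counts alone do not prove it, so a direct check may be worthwhile. The sketch below uses made-up identifiers; on the real data, `toy` would be replaced by `master_df`.

```python
import pandas as pd

# Made-up identifiers for illustration.
toy = pd.DataFrame({
    "account": ["ACC-1", "ACC-1", "ACC-2", "ACC-3"],
    "customer_id": ["CUS-A", "CUS-A", "CUS-B", "CUS-C"],
})

# Distinct (account, customer) pairs; the mapping is one-to-one only if
# neither side repeats among those pairs.
pairs = toy[["account", "customer_id"]].drop_duplicates()
one_to_one = pairs["account"].is_unique and pairs["customer_id"].is_unique

print(pairs)
print("one-to-one mapping:", one_to_one)
```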

In [11]:
master_df['txn_description'].nunique()

6

In [12]:
master_df['txn_description'].unique()

array(['POS', 'SALES-POS', 'PAYMENT', 'INTER BANK', 'PAY/SALARY',
       'PHONE BANK'], dtype=object)

In [13]:
master_df['txn_description'].value_counts()

SALES-POS     3934
POS           3783
PAYMENT       2600
PAY/SALARY     883
INTER BANK     742
PHONE BANK     101
Name: txn_description, dtype: int64

In [14]:
salary_data = master_df[master_df['txn_description'] == 'PAY/SALARY']

In [15]:
salary_data["account"].value_counts()

ACC-1973887809    14
ACC-819621312     14
ACC-1598451071    14
ACC-3326339947    14
ACC-2776252858    14
                  ..
ACC-2153562714     3
ACC-3100725361     2
ACC-1217063613     2
ACC-2673069055     2
ACC-211792489      2
Name: account, Length: 100, dtype: int64

In [17]:
salary_data[(salary_data['account'] == 'ACC-819621312')].head()

Unnamed: 0,status,card_present_flag,bpay_biller_code,account,currency,long_lat,txn_description,merchant_id,merchant_code,first_name,balance,date,gender,age,merchant_suburb,merchant_state,extraction,amount,transaction_id,country,customer_id,merchant_long_lat,movement
97,posted,,0,ACC-819621312,AUD,145.04 -37.85,PAY/SALARY,,0.0,Michele,1244.45,2018-08-01,F,30,,,2018-08-01T17:00:00.000+0000,1158.15,148a346f380948cfa1ee8e5166b56a6e,Australia,CUS-1271030853,,credit
977,posted,,0,ACC-819621312,AUD,145.04 -37.85,PAY/SALARY,,0.0,Michele,1951.42,2018-08-08,F,30,,,2018-08-08T17:00:00.000+0000,1158.15,d6bd2c15b9b84ab093d22bb63cff37c1,Australia,CUS-1271030853,,credit
1880,posted,,0,ACC-819621312,AUD,145.04 -37.85,PAY/SALARY,,0.0,Michele,2869.72,2018-08-15,F,30,,,2018-08-15T17:00:00.000+0000,1158.15,1a3f9d04937b4569a0bd71a4aa8c4cde,Australia,CUS-1271030853,,credit
2670,posted,,0,ACC-819621312,AUD,145.04 -37.85,PAY/SALARY,,0.0,Michele,3533.73,2018-08-22,F,30,,,2018-08-22T17:00:00.000+0000,1158.15,7e24a84354ea4777aaa0a036835094ba,Australia,CUS-1271030853,,credit
3606,posted,,0,ACC-819621312,AUD,145.04 -37.85,PAY/SALARY,,0.0,Michele,4537.68,2018-08-29,F,30,,,2018-08-29T17:00:00.000+0000,1158.15,6213c383935345edac8126f91327f0e5,Australia,CUS-1271030853,,credit
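The rows above show this account being paid the same amount every seven days, while other accounts have far fewer salary credits over the same period. Average salary amounts are therefore only comparable across accounts once the pay interval is known. A minimal sketch for estimating that interval follows, using the five dates and amounts shown above; the variable names `salary_toy`, `gap_days` and `pay_interval_days` are hypothetical.

```python
import pandas as pd

# Dates and amounts copied from the salary rows shown for ACC-819621312.
salary_toy = pd.DataFrame({
    "account": ["ACC-819621312"] * 5,
    "date": pd.to_datetime([
        "2018-08-01", "2018-08-08", "2018-08-15", "2018-08-22", "2018-08-29",
    ]),
    "amount": [1158.15] * 5,
})

# Days between consecutive salary credits within each account.
gap_days = (
    salary_toy.sort_values("date")
              .groupby("account")["date"]
              .diff()
              .dt.days
)

# Median gap per account as a rough estimate of the pay interval.
pay_interval_days = gap_days.groupby(salary_toy["account"]).median()

print(pay_interval_days)
```

The median is used rather than the mean so that a single late or missed payment does not shift the estimate.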


In [18]:
# .loc needs a selector inside the brackets; for example, one account's salary dates and amounts
salary_data.loc[salary_data['account'] == 'ACC-819621312', ['date', 'amount']]

In [19]:
salary_data.groupby(["account"])['amount'].mean()

account
ACC-1037050564     892.09
ACC-1056639002    3195.01
ACC-1199531521    3785.78
ACC-1217063613    4863.62
ACC-1222300524    1002.13
                   ...   
ACC-854938045     2538.68
ACC-90814749      2308.67
ACC-958000567     3517.61
ACC-964839203     1560.20
ACC-966140392     2572.91
Name: amount, Length: 100, dtype: float64

In [None]:
# Each comparison needs its own parentheses because & binds tighter than ==
#master_df[(master_df['txn_description'] == 'PAY/SALARY') & (master_df['account'] == 'ACC-588564840')]

In [20]:
master_df.groupby(by="txn_description").count()

txn_description,status,card_present_flag,bpay_biller_code,account,currency,long_lat,merchant_id,merchant_code,first_name,balance,date,gender,age,merchant_suburb,merchant_state,extraction,amount,transaction_id,country,customer_id,merchant_long_lat,movement
INTER BANK,742,0,0,742,742,742,0,0,742,742,742,742,742,0,0,742,742,742,742,742,0,742
PAY/SALARY,883,0,883,883,883,883,0,883,883,883,883,883,883,0,0,883,883,883,883,883,0,883
PAYMENT,2600,0,0,2600,2600,2600,0,0,2600,2600,2600,2600,2600,0,0,2600,2600,2600,2600,2600,0,2600
PHONE BANK,101,0,0,101,101,101,0,0,101,101,101,101,101,0,0,101,101,101,101,101,0,101
POS,3783,3783,1,3783,3783,3783,3783,0,3783,3783,3783,3783,3783,3783,3783,3783,3783,3783,3783,3783,3783,3783
SALES-POS,3934,3934,1,3934,3934,3934,3934,0,3934,3934,3934,3934,3934,3934,3934,3934,3934,3934,3934,3934,3934,3934


In [21]:
master_df.groupby(["txn_description", "account"])['amount'].mean()

txn_description  account       
INTER BANK       ACC-1056639002     56.812500
                 ACC-1199531521    167.857143
                 ACC-1217063613     39.285714
                 ACC-1243371644     20.000000
                 ACC-1279356312    164.000000
                                      ...    
SALES-POS        ACC-854938045      55.727231
                 ACC-90814749       32.724889
                 ACC-958000567      89.348039
                 ACC-964839203      29.187813
                 ACC-966140392      25.216522
Name: amount, Length: 483, dtype: float64

In [23]:
master_df["movement"].unique()

array(['debit', 'credit'], dtype=object)

In [24]:
master_df["movement"].value_counts()

debit     11160
credit      883
Name: movement, dtype: int64

In [25]:
master_df.groupby(["movement","txn_description"])['txn_description'].count()

movement  txn_description
credit    PAY/SALARY          883
debit     INTER BANK          742
          PAYMENT            2600
          PHONE BANK          101
          POS                3783
          SALES-POS          3934
Name: txn_description, dtype: int64

In [26]:
master_df.groupby(["movement","txn_description"])['amount'].mean()

movement  txn_description
credit    PAY/SALARY         1898.728029
debit     INTER BANK           86.699461
          PAYMENT              77.613077
          PHONE BANK          106.099010
          POS                  40.407412
          SALES-POS            39.909789
Name: amount, dtype: float64

![](trans_count_daily.PNG)
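Judging by its filename, the image above shows transaction counts per day. Counts by day, hour and weekday can be derived from the `extraction` timestamp; the sketch below shows one way to do so on four timestamps copied from rows earlier in the notebook. The explicit `format` string is an assumption based on how those values are printed, and the timestamps are treated as UTC because of the `+0000` suffix, which may or may not reflect local time.

```python
import pandas as pd

# Timestamps and amounts copied from rows shown earlier in the notebook.
toy = pd.DataFrame({
    "extraction": [
        "2018-08-01T01:01:15.000+0000",
        "2018-08-01T01:13:45.000+0000",
        "2018-08-01T17:00:00.000+0000",
        "2018-10-31T23:09:06.000+0000",
    ],
    "amount": [16.25, 14.19, 1158.15, 9.79],
})

# Parse the ISO-style strings into timezone-aware timestamps.
ts = pd.to_datetime(toy["extraction"], format="%Y-%m-%dT%H:%M:%S.%f%z", utc=True)

# Transaction volume per calendar day.
daily_counts = toy.groupby(ts.dt.date).size()

# Volume and total amount per hour of day.
hourly = toy.groupby(ts.dt.hour)["amount"].agg(["count", "sum"])

# Volume per weekday name.
weekday_counts = ts.dt.day_name().value_counts()

print(daily_counts)
print(hourly)
print(weekday_counts)
```

On the full dataset, `daily_counts.plot()` would give a daily volume chart, and the hourly table shows whether a few large salary credits dominate particular hours.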
