<div class='alert' style='background-color: #1c1a1e; color: #f5f4f0; padding:16px 26px; border-radius:20px; font-size:40px;'><b>MONZO</b> - Bank Statement Rule-Based Classification </div>
<div style='margin:0px 26px; color:#1c1a1e; font-size:16px;'>
<center>
    <img src="https://github.com/janduplessis883/Money-Mate/raw/master/images/private.png" width="250">
</center> 
    
## Introduction

This notebook classifies the transactions in a Monzo bank statement using rule-based classification. Monzo is a UK digital bank, and its statement export records each transaction in detail (type, merchant name, amount, address, description), which makes the data well suited to categorization for financial management and analysis.

The primary objective is to assign each transaction to a predefined category such as "Groceries," "Travel," or "Eating Out." The resulting labels support analysis of spending patterns, budgeting, and financial planning.

### Key Steps in this Notebook:
1. **Data Loading**: Importing the Monzo Bank statement data.
2. **Data Preprocessing**: Cleaning and preparing the data for analysis.
3. **Rule-Based Categorization**: Applying predefined rules to classify transactions into different categories.
4. **Analysis and Visualization**: Summarizing and visualizing the categorized data to gain insights into spending behavior.

By the end of this notebook, you will have a clear understanding of how rule-based classification can be applied to bank transaction data to facilitate better financial insights and management. Let's get started!
</div>
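
To make the idea of rule-based categorization concrete before loading the data, here is a minimal sketch. The `RULES` mapping, the `classify` helper, and the `"General"` fallback are illustrative assumptions, not the rules this notebook goes on to apply. The two descriptions come from the first statement rows shown under Loading Data.

```python
import pandas as pd

# Hypothetical keyword rules: the keywords and target categories below are
# illustrative assumptions, not the notebook's actual rule set.
RULES = {
    "BOOTS": "Medical",
    "BARCLAYS": "Income",
}


def classify(description, rules=RULES, default="General"):
    """Return the category of the first rule whose keyword occurs in the description."""
    if not isinstance(description, str):  # missing descriptions are not strings
        return default
    text = description.upper()
    for keyword, category in rules.items():
        if keyword in text:
            return category
    return default


# Descriptions from the first two statement rows, plus one missing value.
sample = pd.DataFrame({"Description": ["BARCLAYS", "BOOTS FULHAM GBR", None]})
sample["Rule category"] = sample["Description"].apply(classify)
print(sample)
```

Rules are checked in insertion order and the first match wins, so more specific keywords should be listed before more general ones.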

# Libraries & Data

In [1]:
# Importing default Libraries
import matplotlib.pyplot as plt
import pandas as pd 
import numpy as np
import seaborn as sns
import warnings
import datetime 
import os 

from params import DATA_PATH

# Set the maximum number of rows and columns to be displayed
pd.options.display.max_rows = 1000
pd.options.display.max_columns = 1000

# Hi-resolution Plots and Matplotlib inline
%config InlineBackend.figure_format = 'retina'
%matplotlib inline

# Suppress warnings to keep the notebook output clean
warnings.filterwarnings('ignore')

# "magic commands" to enable autoreload of your imported packages
%load_ext autoreload
%autoreload 2

## Loading Data

In [4]:
data = pd.read_csv('../data/monzo.csv')
data.head(2)

Unnamed: 0,Transaction ID,Date,Time,Type,Name,Emoji,Category,Amount,Currency,Local amount,Local currency,Notes and #tags,Address,Receipt,Description,Category split
0,tx_00009jGereHTyV50ElCRLl,28/05/2019,11:30:19,Faster payment,DU PLESSIS J V B,,Income,150.0,GBP,150.0,GBP,BARCLAYS,,,BARCLAYS,
1,tx_00009jGsehBRGqJ8N1IJlp,28/05/2019,14:04:51,Card payment,Boots,💊,Medical,-2.79,GBP,-2.79,GBP,💊,198-200 Fulham Palace Road,,BOOTS FULHAM GBR,


In [5]:
data.shape

(6442, 16)

In [6]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6442 entries, 0 to 6441
Data columns (total 16 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Transaction ID   6442 non-null   object 
 1   Date             6442 non-null   object 
 2   Time             6442 non-null   object 
 3   Type             6442 non-null   object 
 4   Name             6437 non-null   object 
 5   Emoji            5394 non-null   object 
 6   Category         6442 non-null   object 
 7   Amount           6442 non-null   float64
 8   Currency         6442 non-null   object 
 9   Local amount     6442 non-null   float64
 10  Local currency   6442 non-null   object 
 11  Notes and #tags  1600 non-null   object 
 12  Address          4394 non-null   object 
 13  Receipt          7 non-null      object 
 14  Description      6238 non-null   object 
 15  Category split   0 non-null      float64
dtypes: float64(3), object(13)
memory usage: 805.4+ KB
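
The summary above shows that `Date` and `Time` are loaded as `object` (text) columns. A possible preprocessing step, sketched here on the two preview rows rather than on `data` itself, is to combine them into a single datetime column. The column name `Timestamp` is an assumption introduced for illustration.

```python
import pandas as pd

# Two rows copied from the preview of the statement; in the notebook the same
# operation would be applied to the full `data` frame.
sample = pd.DataFrame({
    "Date": ["28/05/2019", "28/05/2019"],
    "Time": ["11:30:19", "14:04:51"],
})

# Join the text columns and parse with an explicit day-first format so that
# a value such as 05/06/2019 cannot be misread as month/day.
# "Timestamp" is a hypothetical column name, not part of the Monzo export.
sample["Timestamp"] = pd.to_datetime(
    sample["Date"] + " " + sample["Time"],
    format="%d/%m/%Y %H:%M:%S",
)
print(sample.dtypes)
```

An explicit `format` also makes parsing fail loudly on malformed rows instead of silently guessing.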


# Exploratory Analysis

In [7]:
data['Type'].value_counts()

Type
Card payment            5457
Faster payment           557
Pot transfer             142
Direct Debit             110
Monzo-to-Monzo            62
Monzo Paid                32
Flex                      30
Account interest          23
overdraft                 18
Bacs (Direct Credit)       5
wise_cashback              5
ledger_adjustment          1
Name: count, dtype: int64

In [8]:
data['Category'].value_counts()

Category
Groceries              1548
Travel                 1329
General                 515
Subscriptions           515
Eating out              509
Eating Out              401
Smoking                 292
Income                  248
Transfers               218
Other                   174
Entertainment           134
Bills                    98
Shopping                 95
Medical                  70
Transport                59
Credit Cards             38
Savings                  31
Holidays                 29
Stuff                    28
Telephone                21
Revolut                  16
PayPal                   14
Rent                     12
Online Subscription       9
Family                    7
Tax                       7
Bank Charges              5
Finances                  4
Loans                     4
Personal care             3
Gifts                     3
Holiday                   3
Charity                   2
Expenses                  1
Name: count, dtype: int64
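
The counts above contain labels that differ only by case ("Eating out" and "Eating Out") or by number ("Holidays" and "Holiday"). One way to merge them before applying rules is sketched below on a few of those labels. Treating case variants as the same category, and the singular-to-plural mapping in `MANUAL_MERGES`, are assumptions made for illustration.

```python
import pandas as pd

# A handful of labels taken from the Category counts above.
labels = pd.Series(["Eating out", "Eating Out", "Holidays", "Holiday", "Groceries"])

# Assumption: labels that differ only by case or surrounding whitespace
# refer to the same category.
# Hypothetical manual mapping for variants that lower-casing cannot merge.
MANUAL_MERGES = {"holiday": "holidays"}

clean = labels.str.strip().str.lower().replace(MANUAL_MERGES)
print(clean.value_counts())
```

In the notebook the same chain could be applied to `data['Category']`. `Series.replace` with a dictionary matches whole values only, so "holidays" is left unchanged.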
