# Transaction Parser Example

This notebook shows how to use the `TransactionParser` class to parse PDF bank statements into a DataFrame of transactions.


In [1]:
import sys
# Add the parent directory to the import path so the `src` package can be found
sys.path.append('..')

from src.utils.transaction_parser import TransactionParser


## Basic Usage

Create a parser instance and process a PDF file:


In [7]:
# Initialize parser with default directories
parser = TransactionParser(input_dir="../data/raw", output_dir="../data/processed")

# Process a specific file
df = parser.process_file('dbs_unbilled.pdf')

# Display the results
print(f"Found {len(df)} transactions\n")
display(df.head())


Found 8 transactions



Unnamed: 0,transaction_date,description,amount_sgd
0,2025-08-12,MALAY AIR2322479723900 PTB 2 SINGAPO SG,14.0
1,2025-08-12,MALAY AIR2322479723901 PTB 2 SINGAPO SG,248.8
2,2025-08-12,MALAY AIR2322479723902 PTB 2 SINGAPO SG,248.8
3,2025-08-15,CARDUP-PRUDENTIAL ASSU SINGAPORE SG,125.72
4,2025-08-25,PRUDENTIAL 733 SINGAPORE SG,17.77


## Custom Directory Paths

Pass `input_dir` and `output_dir` to point the parser at your own input and output directories, and set `save_csv=False` to skip writing a CSV:


In [13]:
# Initialize a parser with explicit directories (here, the same paths as in Basic Usage)
custom_parser = TransactionParser(input_dir="../data/raw", output_dir="../data/processed")

# Process the file without writing a CSV
df = custom_parser.process_file('dbs_prev.pdf', save_csv=False)


In [14]:
df

Unnamed: 0,transaction_date,description,amount_sgd
0,2025-06-17,CARDUP-PRUDENTIAL ASSU SINGAPORE SG,125.72
1,2025-06-23,PASIR RIS-PUNGGOL TOWN SINGAPORE SG,69.4
2,2025-06-24,PRUDENTIAL 73373305 SINGAPORE SG,17.77
3,2025-06-27,CARDUP-PRUDENTIAL ASSU SINGAPORE SG,69.39
4,2025-06-27,CARDUP-PRUDENTIAL ASSU SINGAPORE SG,305.55
5,2025-06-30,PRUDENTIAL 73373207 SINGAPORE SG,28.82
6,2025-07-02,OPENAI *CHATGPT SUBSCR OPENAI.COM CA USD21.80,28.7
7,2025-07-12,02COURTS - MEGASTORE 12,163.53
8,2025-07-12,01CHALLENGER SUNTEC CITY 24,82.87


In [15]:
# Re-create the parser with the same directories
custom_parser = TransactionParser(input_dir="../data/raw", output_dir="../data/processed")

# Process a different statement, again without writing a CSV
df = custom_parser.process_file('dbs_current.pdf', save_csv=False)


In [16]:
df

Unnamed: 0,transaction_date,description,amount_sgd
0,2025-07-15,CARDUP-PRUDENTIAL ASSU SINGAPORE SG,125.72
1,2025-07-23,PASIR RIS-PUNGGOL TOWN SINGAPORE SG,34.7
2,2025-07-24,PRUDENTIAL 73373305 SINGAPORE SG,17.77
3,2025-07-29,CARDUP-PRUDENTIAL ASSU SINGAPORE SG,69.39
4,2025-07-29,CARDUP-PRUDENTIAL ASSU SINGAPORE SG,305.55
5,2025-07-30,PRUDENTIAL 73373207 SINGAPORE SG,28.82
6,2025-08-02,OPENAI *CHATGPT SUBSCR OPENAI.COM CA USD21.80,29.27
7,2025-08-12,02COURTS - MEGASTORE 12,163.53
8,2025-08-12,01CHALLENGER SUNTEC CITY 24,82.87
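
The two cells above repeat the same call for different files. A minimal sketch of a loop that does this for every PDF in a directory is shown below. The helper name `process_all` is hypothetical and not part of `TransactionParser`; it only assumes that `process_file` takes a file name relative to `input_dir`, as in the cells above.

```python
from pathlib import Path


def process_all(input_dir, process_fn):
    """Call process_fn(filename) for every PDF in input_dir.

    Returns a dict mapping each file name to whatever process_fn returned.
    `process_all` is an illustrative helper, not part of TransactionParser.
    """
    results = {}
    # sorted() gives a stable, alphabetical processing order
    for pdf_path in sorted(Path(input_dir).glob("*.pdf")):
        results[pdf_path.name] = process_fn(pdf_path.name)
    return results


# Hypothetical usage with the parser from this notebook:
# frames = process_all(
#     "../data/raw",
#     lambda name: custom_parser.process_file(name, save_csv=False),
# )
```

Passing the processing step in as a function keeps the loop independent of the parser, so the same helper works with `save_csv=True` or with any other per-file call.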


In [19]:

import pdfplumber

# Extract the raw text of each page to compare against the parsed transactions
with pdfplumber.open('../data/raw/dbs_unbilled.pdf') as pdf:
    text_content = []
    for page in pdf.pages:
        # extract_text() can return None for pages without text
        text = page.extract_text() or ""
        # Strip each line and drop blank ones
        lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
        text_content.extend(lines)


In [20]:
text_content

['This is a print preview page',
 'Close this window.',
 'View Transaction History',
 '30 Aug 2025 07:10 AM Singapore',
 'Modify Search',
 'DBS Altitude Visa Signature Card4119-1100-9323-8894',
 'Credit Limit Available Limit DBS Points',
 'S$19,100.00 S$16,351.00 19950',
 'Unbilled Transactionsas per statement printed on 30 Aug 2025',
 'Transaction Date Description Amount',
 '24 Aug 2025 PAYMENT - DBS INTERNET/WIRELESS S$857.62 cr',
 'Sub-Total -S$ 857.62',
 'Transaction Date Description Amount',
 'DBS Altitude Visa Signature Card4119-1100-9323-8894',
 '12 Aug 2025 MALAY AIR2322479723900 PTB 2 SINGAPO SG S$14.00',
 '12 Aug 2025 MALAY AIR2322479723901 PTB 2 SINGAPO SG S$248.80',
 '12 Aug 2025 MALAY AIR2322479723902 PTB 2 SINGAPO SG S$248.80',
 '15 Aug 2025 CARDUP-PRUDENTIAL ASSU SINGAPORE SG S$125.72',
 '25 Aug 2025 PRUDENTIAL 733 SINGAPORE SG S$17.77',
 '25 Aug 2025 PUNGGOL TOWN COUNCIL SINGAPORE SG S$69.40',
 '28 Aug 2025 CARDUP-PRUDENTIAL ASSU SINGAPORE SG S$305.55',
 '28 Aug 2025 CA
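
The raw lines above suggest how rows like those in the parsed output could be recovered from the text. The sketch below is an assumption based only on the sample lines shown here, not the actual `TransactionParser` implementation: it matches lines of the form `<DD Mon YYYY> <description> S$<amount>` with an optional trailing `cr`, and flags credits so they can be filtered out. The names `LINE_RE` and `parse_line` are hypothetical.

```python
import re
from datetime import datetime

# Pattern inferred from the sample lines above (an assumption, not the
# TransactionParser source): "<DD Mon YYYY> <description> S$<amount>[ cr]"
LINE_RE = re.compile(
    r"^(?P<date>\d{1,2} [A-Za-z]{3} \d{4}) "
    r"(?P<description>.+) "
    r"S\$(?P<amount>[\d,]+\.\d{2})"
    r"(?P<credit> cr)?$"
)


def parse_line(line):
    """Return a transaction dict for a matching line, else None."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    # %b expects English month abbreviations under the default C locale
    date = datetime.strptime(match["date"], "%d %b %Y").date()
    return {
        "transaction_date": date.isoformat(),
        "description": match["description"],
        "amount_sgd": float(match["amount"].replace(",", "")),
        "is_credit": match["credit"] is not None,
    }


sample_lines = [
    "Transaction Date Description Amount",
    "24 Aug 2025 PAYMENT - DBS INTERNET/WIRELESS S$857.62 cr",
    "12 Aug 2025 MALAY AIR2322479723900 PTB 2 SINGAPO SG S$14.00",
    "Sub-Total -S$ 857.62",
]
parsed = [row for row in map(parse_line, sample_lines) if row is not None]
# Keep only charges, dropping payments marked "cr"
charges = [row for row in parsed if not row["is_credit"]]
```

Header, sub-total, and timestamp lines do not match the pattern and are skipped. The payment row marked `cr` is kept in `parsed` but dropped from `charges`, which is consistent with it not appearing among the parsed rows displayed under Basic Usage.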

## Direct PDF Processing

You can also process a PDF file directly by providing its full path:


In [None]:
# Process a PDF file using its full path
df = parser.parse_pdf('/full/path/to/statement.pdf')
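
If you only have a relative path, such as the `../data/raw` paths used earlier, it can be turned into a full path first. The sketch below uses only the standard library; the helper name `to_full_path` is hypothetical, and whether `parse_pdf` expects a string or also accepts a `Path` object is an assumption, so the result is converted to `str`.

```python
from pathlib import Path


def to_full_path(relative_path):
    """Resolve a path against the current working directory and return it as a string."""
    # expanduser() handles "~"; resolve() makes the path absolute and removes ".."
    return str(Path(relative_path).expanduser().resolve())


full_path = to_full_path("../data/raw/dbs_unbilled.pdf")

# Hypothetical usage, assuming parse_pdf accepts a string path:
# df = parser.parse_pdf(full_path)
```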
