# Financial Accounting ETL

This notebook transforms the [Financial Accounting](https://www.kaggle.com/datasets/jazidesigns/financial-accounting) dataset into a format that complies with double-entry bookkeeping. In practice, financial data exported from an ERP system (Workday, SAP) would already arrive in this format.

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('../data/financial_accounting.csv')
df.head()

Unnamed: 0,Date,Account,Description,Debit,Credit,Category,Transaction_Type,Customer_Vendor,Payment_Method,Reference
0,2023-08-21,Accounts Payable,Transaction 1,112.56,112.56,Asset,Sale,Customer 39,Cash,67471
1,2023-08-13,Accounts Receivable,Transaction 2,775.86,775.86,Revenue,Purchase,Customer 3,Check,92688
2,2023-05-11,Accounts Receivable,Transaction 3,332.81,332.81,Revenue,Transfer,Customer 36,Check,72066
3,2023-02-26,Accounts Receivable,Transaction 4,203.71,203.71,Asset,Purchase,Customer 57,Check,27973
4,2023-11-06,Accounts Receivable,Transaction 5,986.26,986.26,Asset,Expense,Customer 92,Check,29758


## Unnecessary Columns: Description and Reference

In [None]:
df.sort_values(by=['Reference','Customer_Vendor'], inplace=True)
df

In [None]:
# Rows whose description does not follow the "Transaction #" placeholder pattern
df[~df['Description'].str.contains('Transaction', na=False)]

As the cell outputs above show, the `Reference` column is not a unique identifier for any transaction or bookkeeping entry. The `Description` column is equally uninformative: every value follows the placeholder format "Transaction #". Both columns can safely be dropped, as they contribute nothing to the bookkeeping process or to exploratory data analysis.
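The two claims can also be checked programmatically rather than by eye. The following is a minimal sketch on a hypothetical three-row sample (the values are illustrative, not taken from the full dataset); on the real data the same two checks would be run against `df`.

```python
import pandas as pd

# Hypothetical sample shaped like the dataset; values are illustrative only.
sample = pd.DataFrame({
    "Description": ["Transaction 1", "Transaction 2", "Transaction 3"],
    "Reference": [67471, 92688, 67471],
})

# True only if every Reference value identifies exactly one row.
reference_is_unique = sample["Reference"].is_unique

# True only if every Description is exactly "Transaction <number>".
is_placeholder = sample["Description"].str.fullmatch(r"Transaction \d+")
all_placeholders = bool(is_placeholder.all())

print(reference_is_unique, all_placeholders)
```

If `reference_is_unique` is false and `all_placeholders` is true, neither column carries information worth keeping.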

In [None]:
df.drop(['Reference', 'Description'], axis=1, inplace=True)

In [None]:
# Empty frame with the same columns as df; it collects the double-entry rows
new_df = df.iloc[0:0].copy()

# Enforcing Double-Entry Bookkeeping Compliant Format

The dataset does not comply with double-entry bookkeeping standards. First, each transaction is recorded as a single entry with equal debit and credit amounts, rather than as a debit entry and a credit entry posted to different accounts. Second, accounts are often miscategorized; for example, the first row of the preview above lists Accounts Payable under the Asset category. The transformations below, one for each of the four observed account types, enforce double-entry bookkeeping standards and flag any outliers for further analysis.
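To make the idea concrete, here is a minimal sketch of splitting one single-entry row into a debit line and a credit line. The helper `split_entry` is hypothetical and exists only for illustration; the row values come from the second row of the preview above, with the remaining columns left out for brevity.

```python
import pandas as pd

# One row in the dataset's single-entry shape (subset of columns).
row = pd.Series({
    "Account": "Accounts Receivable",
    "Debit": 775.86,
    "Credit": 775.86,
    "Category": "Revenue",
})

def split_entry(row, credit_account, credit_category):
    """Return a two-row DataFrame: a debit line and its matching credit line."""
    debit = row.copy()
    debit["Credit"] = 0.0          # the debit line carries no credit amount

    credit = row.copy()
    credit["Debit"] = 0.0          # the credit line carries no debit amount
    credit["Account"] = credit_account
    credit["Category"] = credit_category

    return pd.DataFrame([debit, credit]).reset_index(drop=True)

pair = split_entry(row, "Sales Revenue", "Revenue")
print(pair)
```

Each line now has an amount on exactly one side, and the two lines together still balance.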

In [3]:
print(df['Account'].unique())
print(df['Category'].unique())
print(df['Payment_Method'].unique())

['Accounts Payable' 'Accounts Receivable' 'Cash' 'Inventory']
['Asset' 'Revenue' 'Expense' 'Liability']
['Cash' 'Check' 'Credit Card' 'Bank Transfer']
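A cross-tabulation of account against category is one quick way to see how often accounts are miscategorized. The sketch below uses only the five rows shown in the `df.head()` output; on the full dataset the same call would be made with `df['Account']` and `df['Category']`.

```python
import pandas as pd

# The Account and Category values of the five preview rows shown earlier.
sample = pd.DataFrame({
    "Account": [
        "Accounts Payable",
        "Accounts Receivable",
        "Accounts Receivable",
        "Accounts Receivable",
        "Accounts Receivable",
    ],
    "Category": ["Asset", "Revenue", "Revenue", "Asset", "Asset"],
})

# Rows are accounts, columns are categories, cells are row counts.
counts = pd.crosstab(sample["Account"], sample["Category"])
print(counts)
```

Any non-zero cell that pairs an account with an implausible category (such as Accounts Payable under Asset) marks rows that the transformations have to recategorize.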


## Transformation 1: Accounts Receivable 

In [None]:
for index, row in df[df['Account'] == 'Accounts Receivable'].iterrows():
    # Debit entry: Accounts Receivable is always categorized as an asset
    debit_entry = row.copy()
    debit_entry['Credit'] = 0.0
    debit_entry['Category'] = 'Asset'
    # DataFrame.append was removed in pandas 2.0, so rows are added with pd.concat
    new_df = pd.concat([new_df, debit_entry.to_frame().T], ignore_index=True)

    # Credit entry: the offsetting account depends on the original category
    credit_entry = row.copy()
    if row['Category'] == 'Revenue':
        credit_entry['Account'] = 'Sales Revenue'
        credit_entry['Category'] = 'Revenue'
    elif row['Category'] == 'Asset':
        credit_entry['Account'] = 'Asset Account'
        credit_entry['Category'] = 'Asset'
    elif row['Category'] == 'Expense':
        credit_entry['Account'] = 'Expense Account'
        credit_entry['Category'] = 'Expense'
    elif row['Category'] == 'Liability':
        credit_entry['Account'] = 'Liability Account'
        credit_entry['Category'] = 'Liability'

    credit_entry['Debit'] = 0.0
    new_df = pd.concat([new_df, credit_entry.to_frame().T], ignore_index=True)

In [None]:
df = df[~(df['Account'] == 'Accounts Receivable')]
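
Row-by-row loops become slow on large frames. As an alternative, the same Accounts Receivable rule can be expressed with column operations. This is a sketch under stated assumptions, not the notebook's method: the two sample rows are hypothetical, and `CREDIT_ACCOUNT` is an illustrative lookup table that mirrors the `if`/`elif` chain above. Unlike the loop, an unknown category would produce a missing `Account` value here instead of leaving the account unchanged.

```python
import pandas as pd

# Hypothetical Accounts Receivable rows; values are illustrative only.
ar = pd.DataFrame({
    "Account": ["Accounts Receivable", "Accounts Receivable"],
    "Debit": [775.86, 203.71],
    "Credit": [775.86, 203.71],
    "Category": ["Revenue", "Asset"],
})

# Illustrative lookup: original category -> account used on the credit side.
CREDIT_ACCOUNT = {
    "Revenue": "Sales Revenue",
    "Asset": "Asset Account",
    "Expense": "Expense Account",
    "Liability": "Liability Account",
}

# Debit lines keep the account, zero the credit, and are always assets.
debits = ar.assign(Credit=0.0, Category="Asset")

# Credit lines zero the debit and switch to the offsetting account.
credits = ar.assign(Debit=0.0, Account=ar["Category"].map(CREDIT_ACCOUNT))

# A stable sort on the original index places each credit right after its debit.
entries = (
    pd.concat([debits, credits])
    .sort_index(kind="mergesort")
    .reset_index(drop=True)
)
print(entries)
```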

## Transformation 2: Accounts Payable

In [None]:
def is_cash_equivalent(payment_method):
    return payment_method in ["Cash", "Bank Transfer"]

In [None]:
for index, row in df[df['Account'] == 'Accounts Payable'].iterrows():
    # Debit entry: the debited account depends on the original category
    debit_entry = row.copy()
    debit_entry['Credit'] = 0.0

    if row['Category'] == 'Asset':
        debit_entry['Account'] = 'Inventory'

    elif row['Category'] == 'Liability':
        debit_entry['Account'] = 'Accounts Payable'

    elif row['Category'] == 'Revenue':
        debit_entry['Account'] = 'Accounts Payable'

    elif row['Category'] == 'Expense':
        debit_entry['Account'] = 'Expense Account'

    # DataFrame.append was removed in pandas 2.0, so rows are added with pd.concat
    new_df = pd.concat([new_df, debit_entry.to_frame().T], ignore_index=True)

    # Credit entry
    credit_entry = row.copy()
    credit_entry['Debit'] = 0.0

    if row['Category'] == 'Revenue':
        credit_entry['Account'] = 'Sales Revenue'

    elif row['Category'] == 'Expense' and is_cash_equivalent(row['Payment_Method']):
        # Expense paid with cash or a bank transfer: credit Cash
        credit_entry['Account'] = 'Cash'

    elif row['Category'] == 'Liability':
        # Liability rows: credit Accounts Payable
        credit_entry['Account'] = 'Accounts Payable'

    else:
        # Asset rows and expenses not paid in cash: credit Accounts Payable
        credit_entry['Account'] = 'Accounts Payable' if row['Category'] in ['Asset', 'Expense'] else 'Other Payable Account'

    new_df = pd.concat([new_df, credit_entry.to_frame().T], ignore_index=True)

In [None]:
df = df[~(df['Account'] == 'Accounts Payable')]

## Transformation 3: Cash

In [None]:
for index, row in df[df['Account'] == 'Cash'].iterrows():
    # DataFrame.append was removed in pandas 2.0, so rows are added with pd.concat
    if row['Category'] == 'Revenue':
        # Debit entry - increasing Cash (Asset) for revenue
        debit_entry = row.copy()
        debit_entry['Credit'] = 0.0
        debit_entry['Category'] = 'Asset'
        new_df = pd.concat([new_df, debit_entry.to_frame().T], ignore_index=True)

        # Credit entry - increasing Revenue
        credit_entry = row.copy()
        credit_entry['Debit'] = 0.0
        credit_entry['Account'] = 'Sales Revenue'
        credit_entry['Category'] = 'Revenue'
        new_df = pd.concat([new_df, credit_entry.to_frame().T], ignore_index=True)

    elif row['Category'] == 'Expense':
        # Debit entry - increasing Expense
        debit_entry = row.copy()
        debit_entry['Credit'] = 0.0
        debit_entry['Account'] = 'Expense Account'
        debit_entry['Category'] = 'Expense'
        new_df = pd.concat([new_df, debit_entry.to_frame().T], ignore_index=True)

        # Credit entry - decreasing Cash (Asset)
        credit_entry = row.copy()
        credit_entry['Debit'] = 0.0
        credit_entry['Account'] = 'Cash'
        credit_entry['Category'] = 'Asset'
        new_df = pd.concat([new_df, credit_entry.to_frame().T], ignore_index=True)

    elif row['Category'] == 'Liability':
        # Debit entry - decreasing Liability
        debit_entry = row.copy()
        debit_entry['Credit'] = 0.0
        debit_entry['Account'] = 'Liability Account'
        debit_entry['Category'] = 'Liability'
        new_df = pd.concat([new_df, debit_entry.to_frame().T], ignore_index=True)

        # Credit entry - decreasing Cash (Asset)
        credit_entry = row.copy()
        credit_entry['Debit'] = 0.0
        credit_entry['Account'] = 'Cash'
        credit_entry['Category'] = 'Asset'
        new_df = pd.concat([new_df, credit_entry.to_frame().T], ignore_index=True)

    elif row['Category'] == 'Asset':
        # Debit entry - acquiring an asset
        debit_entry = row.copy()
        debit_entry['Credit'] = 0.0
        debit_entry['Account'] = 'Asset Account'
        debit_entry['Category'] = 'Asset'
        new_df = pd.concat([new_df, debit_entry.to_frame().T], ignore_index=True)

        # Credit entry - decreasing Cash (Asset)
        credit_entry = row.copy()
        credit_entry['Debit'] = 0.0
        credit_entry['Account'] = 'Cash'
        credit_entry['Category'] = 'Asset'
        new_df = pd.concat([new_df, credit_entry.to_frame().T], ignore_index=True)

    else:
        # Outlier: report categories outside the four expected values
        print("Error, unknown category for cash account:", row['Category'])


In [None]:
df = df[~(df['Account'] == 'Cash')]
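
The four Cash branches differ only in which account and category land on each side, so the rule can also be written as a lookup table. The following is a sketch in plain Python, not the notebook's implementation: `CASH_RULES` and `split_cash_row` are hypothetical names, rows are modeled as dictionaries, and the sample amount is made up. An unknown category raises a `KeyError` here rather than printing a message.

```python
# Category -> ((debit account, debit category), (credit account, credit category)),
# mirroring the branches of the Cash transformation.
CASH_RULES = {
    "Revenue":   (("Cash", "Asset"),                  ("Sales Revenue", "Revenue")),
    "Expense":   (("Expense Account", "Expense"),     ("Cash", "Asset")),
    "Liability": (("Liability Account", "Liability"), ("Cash", "Asset")),
    "Asset":     (("Asset Account", "Asset"),         ("Cash", "Asset")),
}

def split_cash_row(row):
    """Split one single-entry Cash row (a dict) into [debit_line, credit_line]."""
    debit_side, credit_side = CASH_RULES[row["Category"]]
    debit_account, debit_category = debit_side
    credit_account, credit_category = credit_side

    # Copy the row twice, overriding only the fields that differ per side.
    debit = {**row, "Account": debit_account, "Category": debit_category, "Credit": 0.0}
    credit = {**row, "Account": credit_account, "Category": credit_category, "Debit": 0.0}
    return [debit, credit]

# Hypothetical row with an illustrative amount.
row = {"Account": "Cash", "Debit": 100.0, "Credit": 100.0, "Category": "Expense"}
debit, credit = split_cash_row(row)
print(debit)
print(credit)
```

Keeping the rules in one table makes them easier to review against accounting policy than a chain of branches.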

## Transformation 4: Inventory

In [None]:
for index, row in df[df['Account'] == 'Inventory'].iterrows():
    # DataFrame.append was removed in pandas 2.0, so rows are added with pd.concat
    if row['Category'] == 'Expense':
        # Debit entry - increasing Expense
        debit_entry = row.copy()
        debit_entry['Credit'] = 0.0
        debit_entry['Account'] = 'Expense Account'
        new_df = pd.concat([new_df, debit_entry.to_frame().T], ignore_index=True)

        # Credit entry - decreasing Inventory (Asset)
        credit_entry = row.copy()
        credit_entry['Debit'] = 0.0
        credit_entry['Account'] = 'Inventory'
        credit_entry['Category'] = 'Asset'
        new_df = pd.concat([new_df, credit_entry.to_frame().T], ignore_index=True)

    elif row['Category'] == 'Asset':
        # Debit entry - increasing Inventory (Asset)
        debit_entry = row.copy()
        debit_entry['Credit'] = 0.0
        new_df = pd.concat([new_df, debit_entry.to_frame().T], ignore_index=True)

        # Credit entry - decreasing Cash
        credit_entry = row.copy()
        credit_entry['Debit'] = 0.0
        credit_entry['Account'] = 'Cash'
        new_df = pd.concat([new_df, credit_entry.to_frame().T], ignore_index=True)

    elif row['Category'] == 'Liability':
        # Debit entry - decreasing Liability
        debit_entry = row.copy()
        debit_entry['Credit'] = 0.0
        debit_entry['Account'] = 'Liability Account'
        new_df = pd.concat([new_df, debit_entry.to_frame().T], ignore_index=True)

        # Credit entry - decreasing Inventory (Asset)
        credit_entry = row.copy()
        credit_entry['Debit'] = 0.0
        credit_entry['Account'] = 'Inventory'
        credit_entry['Category'] = 'Asset'
        new_df = pd.concat([new_df, credit_entry.to_frame().T], ignore_index=True)

    elif row['Category'] == 'Revenue':
        # Debit entry - posted to Inventory (Asset)
        debit_entry = row.copy()
        debit_entry['Credit'] = 0.0
        debit_entry['Account'] = 'Inventory'
        debit_entry['Category'] = 'Asset'
        new_df = pd.concat([new_df, debit_entry.to_frame().T], ignore_index=True)

        # Credit entry - increasing Revenue
        credit_entry = row.copy()
        credit_entry['Debit'] = 0.0
        credit_entry['Account'] = 'Revenue Account'
        credit_entry['Category'] = 'Revenue'
        new_df = pd.concat([new_df, credit_entry.to_frame().T], ignore_index=True)

In [None]:
df = df[~(df['Account'] == 'Inventory')]

In [None]:
# The original dataframe should now be empty: every row has been transformed
df

In [None]:
new_df.to_csv('../data/double_entry_financial_accounting.csv', index=False)
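
Before relying on the exported file, it may be worth confirming that the result really is double-entry compliant. The check below is a minimal sketch: `check_double_entry` and its `tolerance` parameter are hypothetical, and the sample frame is illustrative. On the real output it would be called on `new_df`.

```python
import pandas as pd

# Hypothetical transformed output; values are illustrative only.
entries = pd.DataFrame({
    "Account": ["Accounts Receivable", "Sales Revenue", "Expense Account", "Cash"],
    "Debit": [775.86, 0.0, 112.56, 0.0],
    "Credit": [0.0, 775.86, 0.0, 112.56],
})

def check_double_entry(entries, tolerance=0.005):
    """Return True if every line is one-sided and total debits equal total credits."""
    # Exactly one of Debit / Credit must be zero on each line.
    one_sided = ((entries["Debit"] == 0) ^ (entries["Credit"] == 0)).all()

    # Totals are compared with a small tolerance to absorb float rounding.
    balanced = abs(entries["Debit"].sum() - entries["Credit"].sum()) < tolerance

    return bool(one_sided and balanced)

print(check_double_entry(entries))
```

A failing check points either to a row that was never split or to a branch that forgot to zero out one side.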