# Trial Balance Formatting: Preparing QuickBooks Export Data for Import into CCH Engagement

**Part 1 of this notebook contains example code for formatting trial balance data, with limited comments. Part 1 is also available as a .py file in this repository (trial_balance_etl.py). For a more detailed explanation of the process and instructions, see Part 2.**

## Part 1: Example Code

### Import Libraries

In [11]:
import pandas as pd
import os

### Helper Functions

In [12]:
# Create a new dataframe column from splitting another column. 4 arguments:df, column to split, delimiter to split on, and index of the item we want from the .split() method
# Set index default to -1 if no argument is given to select the last item in the list

def new_col_from_split(df, split_col, delim, index = -1):
    return [x[index] for x in df[split_col].astype(str).str.split(delim)]

In [13]:
# This function creates a dictionary with files to format and account suffixes that will be added onto the account numbers in the file

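def create_entity_dict(df, entity_column, suffix_column, file_list=None):
    # List the data folder when the function is called; a default of os.listdir(...) is evaluated only once, at definition time
    if file_list is None:
        file_list = os.listdir('./quickbooks_data/')
    return {x + '.xlsx': '.' + y for x, y in zip(df[entity_column], df[suffix_column]) if x + '.xlsx' in file_list}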

In [18]:
# This function performs all of the standard formatting changes necessary to prepare quickbooks exports for CCH Engagement
# Minor Changes may be necessary from project to project

def format_tbs(entities, data_folder='./quickbooks_data/'):
    
    if 'ready_for_tb_import' not in os.listdir():
        os.mkdir('./ready_for_tb_import/')
        
    if 'processed_quickbooks_files' not in os.listdir():
        os.mkdir('./processed_quickbooks_files/')
        
    for entity, suffix in entities.items():
        
        # Print statement to help with debugging if one of the QuickBooks files is formatted differently
        print(f'formatting {entity}')
        
        # Create a dataframe from the QuickBooks export file
        df = pd.read_excel(f'{data_folder}{entity}', sheet_name='Sheet1', skiprows=4)
        
        # Drop the unneeded Total row
        if 'total' in df.iloc[len(df) - 1, 0].lower():
            df.drop(index=len(df) - 1, inplace=True)
            
        # Replace nan values with 0 
        df.fillna(0, inplace=True)
        
        # Split the combined account number and name column into separate columns, adding the suffix to the end of the account numbers.
       
        df['_col'] = new_col_from_split(df, 'Unnamed: 1', ':')
        df['account_number'] = [account + suffix for account  in new_col_from_split(df, '_col', ' · ', index=0)]
        df['account_name'] = new_col_from_split(df, '_col', ' · ')
        
        # Combine debit and credit columns into a single balance column
        df['balance'] = df['Debit'] - df['Credit']
        
        # Export account number, account name, and balance columns to a new excel file in the ready_for_tb_import folder
        df[['account_number', 'account_name', 'balance']].to_excel(f'./ready_for_tb_import/formatted_tb_{entity}', index=False)
        
        # Move QuickBooks excel file to processed_quickbooks_files folder
        os.rename(os.path.join(data_folder, entity), f'./processed_quickbooks_files/{entity}')
        
        # Print statement confirming successful formatting to help with debugging
        print(f'formatted_tb_{entity} successfully created')
        

### Create Dictionary of Files to Format

In [15]:
df_keys = pd.read_excel('account_keys.xlsx')
df_keys.head()

Unnamed: 0,Acronym,Trial Balance,Entity
0,ABC,34-ABC,ABC Subsidiary
1,DEF,34-DEF,DEF Subsidiary
2,GHI,34-GHI,GHI Subsidiary
3,JKL,34-GPD,JKL Subsidiary
4,MNO,34-MNO,MNO Subsidiary


In [16]:
entity_dict = create_entity_dict(df_keys, entity_column='Entity', suffix_column='Acronym')
entity_dict

{'ABC Subsidiary.xlsx': '.ABC', 'DEF Subsidiary.xlsx': '.DEF'}

### Format and Export TBs

In [19]:
format_tbs(entity_dict)

formatting ABC Subsidiary.xlsx
formatted_tb_ABC Subsidiary.xlsx successfully created
formatting DEF Subsidiary.xlsx
formatted_tb_DEF Subsidiary.xlsx successfully created


## Part 2: Detailed Explanation and Instructions

First, move the files processed in Part 1 back to the current directory.

In [12]:
for file in entity_dict:
    # Only move files that are not already in the current directory
    if file not in os.listdir():
        os.rename(f'./processed_quickbooks_files/{file}', f'./{file}')

### Create a DataFrame from the Excel File containing Account Suffixes (Acronyms)

This Excel file needs at least the following two columns:

|Suffix|Entity|
|---|---|
|ABC|ABC Subsidiary|
|DEF|DEF Subsidiary|
|...|...|

The **Suffix** column is used to append account suffixes to account numbers before consolidation. General ledger (GL) account numbers are typically repeated across the subsidiaries of a parent corporation, so a unique suffix must be appended to each existing GL account number. This lets accounts belonging to different subsidiaries be told apart while the root account numbers are preserved for consistent account grouping and sub-grouping after consolidation.
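
As a minimal sketch of why the suffix matters, the example below uses two made-up subsidiaries that share account number 1010. The balances and the column names `entity_suffix`, `consolidated_account`, and `root_account` are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical balances: both subsidiaries use the same GL account number
tb = pd.DataFrame({
    'entity_suffix': ['ABC', 'DEF'],
    'account_number': ['1010', '1010'],
    'balance': [700.0, 250.0],
})

# Append '.<suffix>' so each account is unique after consolidation
tb['consolidated_account'] = tb['account_number'] + '.' + tb['entity_suffix']

# The root account number is still recoverable for grouping
tb['root_account'] = tb['consolidated_account'].str.split('.').str[0]

print(tb[['consolidated_account', 'root_account', 'balance']])
```

The consolidated account numbers are distinct, while grouping on the root still treats both rows as account 1010.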

The **Entity** column is used to generate a dataframe for every trial balance Excel file that is in the same folder as your Jupyter notebook. ***It is imperative that the values in the 'Entity' column match the names of the Excel files EXACTLY (minus the .xlsx extension) for the code in this notebook to work properly.*** Best practice is to copy and paste directly from this list of entities when renaming your files as you gather QuickBooks export files for a given client.
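
Because the match must be exact, it can help to list the entities that have no matching file before running anything else. The helper below is a sketch of one way to do that; `find_missing_files` and its `folder` parameter are hypothetical names, not part of the original code.

```python
import os

def find_missing_files(entities, folder='.'):
    """Return entity names that have no '<entity>.xlsx' file in folder."""
    files = set(os.listdir(folder))
    # The comparison is an exact, case-sensitive string match
    return [name for name in entities if f'{name}.xlsx' not in files]

# Example usage with the Entity column described above:
# missing = find_missing_files(df_account_keys['Entity'])
```

Any name returned here points to a file that still needs to be gathered or renamed.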


In [13]:
# Create a DataFrame from the account_keys.xlsx file that contains my entity names and account suffixes
df_account_keys = pd.read_excel('account_keys.xlsx')

df_account_keys.head()

Unnamed: 0,Acronym,Trial Balance,Entity
0,ABC,34-ABC,ABC Subsidiary
1,DEF,34-DEF,DEF Subsidiary
2,GHI,34-GHI,GHI Subsidiary
3,JKL,34-GPD,JKL Subsidiary
4,MNO,34-MNO,MNO Subsidiary


As the first few rows of the dataframe show, if you are working with a pre-existing list used for formatting trial balances in Excel, you may have a column with the CCH Engagement Binder Index (the 'Trial Balance' column here). In fact, in many cases you may only have this index column, with data in the format '34-ABC'. It is good practice to use these same indexes as the suffixes for your GL account numbers for the sake of consistency. To extract the suffix from the end of the Trial Balance index, we will use the .split() method and assign the output to a new column.

In [14]:
# Call the .str.split() method on the column you want to split, and pass '-' as the argument
suffix_list = df_account_keys['Trial Balance'].str.split('-')

# .split() method will output a series of lists of strings split on every instance of the '-'
suffix_list

0     [34, ABC]
1     [34, DEF]
2     [34, GHI]
3     [34, GPD]
4     [34, MNO]
5     [34, PQR]
6     [34, STU]
7     [34, VWX]
8      [34, YZ]
9     [34, 123]
10    [34, 234]
11    [34, 345]
12    [34, 456]
13    [34, 567]
14    [34, 678]
15    [34, 789]
16    [34, 890]
17    [34, 999]
Name: Trial Balance, dtype: object

In [15]:
# Access the second item in each list by indexing x[1] for each list inside of the series (python indexes start at 0)
for x in suffix_list:
    print(x[1])

ABC
DEF
GHI
GPD
MNO
PQR
STU
VWX
YZ
123
234
345
456
567
678
789
890
999


In [16]:
# If you index the lists as they are created inside of a list comprehension, you can avoid storing the intermediate series and assign the second value directly to a new column in the dataframe
# We'll call that new column 'Suffixes'

df_account_keys['Suffixes'] = [x[1] for x in df_account_keys['Trial Balance'].str.split('-')]
df_account_keys.head()

Unnamed: 0,Acronym,Trial Balance,Entity,Suffixes
0,ABC,34-ABC,ABC Subsidiary,ABC
1,DEF,34-DEF,DEF Subsidiary,DEF
2,GHI,34-GHI,GHI Subsidiary,GHI
3,JKL,34-GPD,JKL Subsidiary,GPD
4,MNO,34-MNO,MNO Subsidiary,MNO


***We'll be doing lots of splitting later, so we can take this one step further and create a helper function that lets us choose a column to split, the delimiter to split on, and which item in the resulting list we want. The list it returns can then be assigned to a new column.***

In [25]:
# Define a function that takes 4 arguments:df, column to split, delimiter to split on, and index of the item we want from the .split() method
# Set index default to -1 if no argument is given to select the last item in the list

def new_col_from_split(df, split_col, delim, index = -1):
    return [x[index] for x in df[split_col].astype(str).str.split(delim)]
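
For comparison, pandas can also pick one element out of each split list with the `.str[index]` accessor. This is only a sketch of an equivalent approach on a small hypothetical frame (`sample`), not a change to the helper above.

```python
import pandas as pd

# Hypothetical stand-in for the account keys dataframe
sample = pd.DataFrame({'Trial Balance': ['34-ABC', '34-DEF']})

# List-comprehension approach used in this notebook
via_list = [x[-1] for x in sample['Trial Balance'].astype(str).str.split('-')]

# Vectorized alternative: .str[-1] indexes into each split list
via_accessor = sample['Trial Balance'].astype(str).str.split('-').str[-1]

print(via_list)
print(via_accessor.tolist())
```

One difference to keep in mind: when an index is out of range, the list comprehension raises an IndexError, while the `.str` accessor returns a missing value for that row.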

In [18]:
# Assign the output of the function to a new column
df_account_keys['Suffix_2'] = new_col_from_split(df_account_keys, 'Trial Balance', '-') 

df_account_keys.head()

Unnamed: 0,Acronym,Trial Balance,Entity,Suffixes,Suffix_2
0,ABC,34-ABC,ABC Subsidiary,ABC,ABC
1,DEF,34-DEF,DEF Subsidiary,DEF,DEF
2,GHI,34-GHI,GHI Subsidiary,GHI,GHI
3,JKL,34-GPD,JKL Subsidiary,GPD,GPD
4,MNO,34-MNO,MNO Subsidiary,MNO,MNO


### Create Dictionary of Files to Format and Suffixes to Add to Account Numbers

In [33]:
# create empty dictionary to store excel file names and suffixes
entity_dictionary = {}

# loop through Entity column and Suffixes column of df_account_keys using the built-in zip() function:
for entity, suffix in zip(df_account_keys['Entity'], df_account_keys['Suffixes']):
    
    # check if the excel file for each entity is in the working directory
    if entity + '.xlsx' in os.listdir():
        
        # add excel file names as keys and suffixes as values to entity_dictionary
        entity_dictionary.update({entity + '.xlsx': '.' + suffix})
        
        

In [34]:
# View entries in entity_dictionary
entity_dictionary

{'ABC Subsidiary.xlsx': '.ABC', 'DEF Subsidiary.xlsx': '.DEF'}

***Refactoring the above for loop as a function that takes a dataframe and the columns that contain the entities and suffixes will make the code easier to reuse. It is also a good opportunity to use a dictionary comprehension to make the code more concise.***

In [None]:
# Takes 4 arguments: df, entity_column ('string'), suffix_column ('string'), file_list (can take any iterable, but defaults to a list of the current directory)
def create_entity_dict(df, entity_column, suffix_column, file_list=os.listdir()):
    return {entity + '.xlsx': '.' + suffix for entity, suffix in zip(df[entity_column], df[suffix_column]) if entity + '.xlsx' in file_list}
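
One thing to be aware of: Python evaluates default argument values once, when the `def` statement runs, so `os.listdir()` in the signature above captures the directory contents at definition time. Files added afterwards are not seen unless the cell is re-run or `file_list` is passed explicitly. The variant below is a sketch that lists the directory on every call; the name `create_entity_dict_fresh` and the `folder` parameter are hypothetical.

```python
import os
import pandas as pd

def create_entity_dict_fresh(df, entity_column, suffix_column, file_list=None, folder='.'):
    # List the folder at call time rather than at definition time
    if file_list is None:
        file_list = os.listdir(folder)
    files = set(file_list)
    return {
        entity + '.xlsx': '.' + suffix
        for entity, suffix in zip(df[entity_column], df[suffix_column])
        if entity + '.xlsx' in files
    }
```

Called without `file_list`, it behaves like the original function but always reflects the current contents of `folder`.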

### Use os library to create a list of files to format and create folders to store finished files
#### Create New Folders:

os.mkdir() will be used to make new folders that store processed files and keep everything organized. If the folder already exists, os.mkdir() raises an error (FileExistsError). To avoid this, first use an if statement to check whether the folder name already appears in os.listdir().
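
An alternative to the check-then-create pattern is `os.makedirs` with `exist_ok=True`, which does nothing if the folder is already there. The sketch below runs inside a temporary directory so it leaves nothing behind; the variable names are illustrative.

```python
import os
import tempfile

# Work inside a throwaway directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, 'ready_for_import')

    # exist_ok=True makes the call safe to repeat
    os.makedirs(target, exist_ok=True)
    os.makedirs(target, exist_ok=True)  # no error on the second call

    created = os.path.isdir(target)

print(created)
```

The if-statement version used in this notebook works equally well; this is simply a shorter way to get the same result.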

In [20]:
# Check that folder is not in the current directory
if 'ready_for_import' not in os.listdir():
    
    # Create import folder
    os.mkdir('./ready_for_import')

In [21]:
# Check that folder is not in the current directory
if 'import_file_created' not in os.listdir():
    
    # Create folder for processed QuickBooks files
    os.mkdir('./import_file_created')

### Import one trial balance file to see data format

In [23]:
# Import ABC Subsidiary.xlsx
abc_df = pd.read_excel('ABC Subsidiary.xlsx', sheet_name='Sheet1', skiprows=4)

# View first five rows
abc_df.head()

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Debit,Unnamed: 3,Credit
0,,1010 · Example Bank - Business Banking,700.0,,
1,,1250 · Accounts Receivable,0.0,,
2,,12000 · Undeposited Funds,0.0,,
3,,1261 · Loan Receivable - ABC,10000000.0,,
4,,20100 · Due To/(From) Related Entities:20102 ·...,,,30000.0


In [24]:
# View last 5 rows
abc_df.iloc[-5:]

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Debit,Unnamed: 3,Credit
6,,30300 · Capital Account - Jeff:30302 · Member ...,100000.0,,
7,,60400 · Bank Service Charges,200.0,,
8,,67300 · Management Fee Exp.,30000.0,,
9,,70000 · Interest Income,,,100900.0
10,TOTAL,,10130900.0,,10130900.0
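
Note that rows for sub-accounts hold the full 'parent:child' path in a single cell. The sketch below shows how splitting on ':' and then on ' · ' isolates the sub-account's own number and name. The sample strings are hypothetical, since the real values are truncated in the output above.

```python
import pandas as pd

# Hypothetical account labels in the QuickBooks 'number · name' format
accounts = pd.Series([
    '1010 · Example Bank',
    '20100 · Parent Account:20102 · Child Account',
])

# Keep only the text after the last ':' (the sub-account itself)
leaf = accounts.str.split(':').str[-1]

# Split the leaf into number and name on the ' · ' separator
parts = leaf.str.split(' · ', n=1, expand=True)
account_number = parts[0]
account_name = parts[1]

print(account_number.tolist())
print(account_name.tolist())
```

Rows without a ':' pass through the first split unchanged, so the same two steps work for top-level accounts and sub-accounts alike.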


In [20]:

def create_entity_dict(df, entity_column, suffix_column, file_list=os.listdir()):
    return {x + '.xlsx': '.' + y for x, y in zip(df[entity_column], df[suffix_column]) if x + '.xlsx' in file_list}

In [21]:

def format_tbs(entities):
    for entity, suffix in entities.items():
        
        #Print statement to help with debugging if one of the QuickBooks files is formatted differently
        print(f'formatting {entity}')
        
        #Create a dataframe from the QuickBooks export file
        df = pd.read_excel(entity, sheet_name='Sheet1', skiprows=4)
        
        #Drop the unneeded Total row
        if 'total' in df.iloc[len(df) - 1, 0].lower():
            df.drop(index=len(df) - 1, inplace=True)
            
        #Replace nan values with 0 
        df.fillna(0, inplace=True)
        
        #Split the combined account number and name column into separate columns, adding the suffix to the end of the account numbers.
       
        df['_col'] = new_col_from_split(df, 'Unnamed: 1', ':')
        df['account_number'] = [account + suffix for account  in new_col_from_split(df, '_col', ' · ', index=0)]
        df['account_name'] = new_col_from_split(df, '_col', ' · ')
        
        #Combine debit and credit columns into a single balance column
        df['balance'] = df['Debit'] - df['Credit']
        
        #export account number, account name, and balance columns to a new excel file
        df[['account_number', 'account_name', 'balance']].to_excel(f'formatted_tb_{entity}', index=False)
        
        
        print(f'formatted_tb_{entity} successfully created')
        
        
        

In [22]:
entity_dict = create_entity_dict(df_account_keys, 'Entity', 'Suffixes')
entity_dict

{'ABC Subsidiary.xlsx': '.ABC', 'DEF Subsidiary.xlsx': '.DEF'}

In [23]:
format_tbs(entity_dict)

formatting ABC Subsidiary.xlsx
formatted_tb_ABC Subsidiary.xlsx successfully created
formatting DEF Subsidiary.xlsx
formatted_tb_DEF Subsidiary.xlsx successfully created
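
Since the TOTAL row in the sample export shows equal debit and credit totals, the combined balance column of a correctly formatted file should net to zero. A small check along these lines can catch a misread file early. `is_balanced` and its `tolerance` parameter are hypothetical, and the example figures are illustrative.

```python
import pandas as pd

def is_balanced(df, balance_col='balance', tolerance=0.005):
    """Return True when the balance column nets to (approximately) zero."""
    return abs(df[balance_col].sum()) < tolerance

# Illustrative balances: debits positive, credits negative
example = pd.DataFrame({'balance': [700.0, 10000000.0, -10000700.0]})
print(is_balanced(example))
```

The tolerance allows for small floating-point rounding differences; an appropriate value depends on the data.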


In [31]:
# Move the processed QuickBooks file into the import_file_created folder
os.rename('./ABC Subsidiary.xlsx', './import_file_created/ABC Subsidiary.xlsx')