# Import Modules

In [132]:
import pandas as pd
import os

# Preparing Directories
* Before running any other code, make sure output folders are created to help keep everything organized. 
* Use os.listdir() to list the contents of the working directory and check whether the folders already exist.
* Use os.mkdir() to make new folders.

***tip: use an if statement to check if output folders already exist to avoid raising an error when you try to create a folder that already exists***

In [133]:
if 'ready_for_tb_import' not in os.listdir():
    os.mkdir('./ready_for_tb_import/')

if 'processed_quickbooks_files' not in os.listdir():
    os.mkdir('./processed_quickbooks_files/')
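
As an alternative to checking os.listdir() first, os.makedirs() accepts exist_ok=True, which makes the call safe to repeat. The sketch below is illustrative only: the helper name `ensure_folders` is hypothetical, and it runs inside a temporary directory so it does not touch the real working directory.

```python
import os
import tempfile

def ensure_folders(base_dir, folder_names):
    """Create each folder under base_dir, skipping any that already exist."""
    paths = []
    for name in folder_names:
        path = os.path.join(base_dir, name)
        # exist_ok=True means no error is raised if the folder is already there
        os.makedirs(path, exist_ok=True)
        paths.append(path)
    return paths

# Demonstrate in a throwaway directory (an assumption made for this example)
with tempfile.TemporaryDirectory() as tmp:
    names = ['ready_for_tb_import', 'processed_quickbooks_files']
    ensure_folders(tmp, names)
    ensure_folders(tmp, names)  # running it a second time does nothing
    result = sorted(os.listdir(tmp))

print(result)
```

Either approach works; the if statement used in the cell above makes the check explicit, while exist_ok=True keeps it to one line per folder.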

# Creating a Dictionary of Files to Format
## You will need:
1) An Excel file with entity names that match the QuickBooks file names (minus the .xlsx extension) and the corresponding account suffixes to use for CCH Engagement
2) All of your QuickBooks data collected into a single folder within your working directory

*If you need a refresher on why these suffixes are needed, see the project README file on GitHub: [Trial Balance Formatting](https://github.com/jacxson/Trial-Balance-Formatting)*

## Steps:
1) Create a dataframe from the Excel file with entity names and suffixes (account_keys.xlsx)
2) Create a list of files in the folder that holds the Excel files exported from QuickBooks (./quickbooks_data/)
3) Loop through the entities in the dataframe and compare them to the list of QuickBooks files. Add any matches to a dictionary that contains the file name and the corresponding account suffix from the dataframe.



#### 1) Create a dataframe from account_keys.xlsx

In [134]:
entities_df = pd.read_excel('account_keys.xlsx')

# view first 5 rows
entities_df.head()

Unnamed: 0,Acronym,Trial Balance,Entity
0,ABC,34-ABC,ABC Subsidiary
1,DEF,34-DEF,DEF Subsidiary
2,GHI,34-GHI,GHI Subsidiary
3,JKL,34-GPD,JKL Subsidiary
4,MNO,34-MNO,MNO Subsidiary


#### 2) Create a list of files in the quickbooks_data folder

In [135]:
data = os.listdir('./quickbooks_data/')

#### 3) Loop through the entities in the dataframe and compare them to the list of QuickBooks files. Add any matches to a dictionary that contains the file name and the corresponding account suffix from the dataframe.

***tip: use the zip() function to loop through multiple dataframe columns simultaneously***

In [136]:
# Create an empty dictionary
entity_dict = {}

# use the zip() function to loop through entity names and suffixes in the entities dataframe simultaneously
for entity, suffix in zip(entities_df['Entity'], entities_df['Acronym']):
    
    # if a match occurs between the files in the data list and the entities column, add the filename as key and the suffix as value to the empty dictionary
    if entity + '.xlsx' in data:
        entity_dict.update({f'{entity}.xlsx':f'.{suffix}'})
        
# view the entity_dict
entity_dict

{'ABC Subsidiary.xlsx': '.ABC',
 'DEF Subsidiary.xlsx': '.DEF',
 'GHI Subsidiary.xlsx': '.GHI'}
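
In the output above, only three of the five entities were matched, so it can help to list the entities that had no file. The sketch below uses plain lists as stand-ins for the 'Entity' and 'Acronym' columns and for the os.listdir() result; the sample values and the `unmatched` name are assumptions made for illustration.

```python
# Stand-ins for entities_df['Entity'], entities_df['Acronym'], and os.listdir()
entities = ['ABC Subsidiary', 'DEF Subsidiary', 'JKL Subsidiary']
acronyms = ['ABC', 'DEF', 'JKL']
files = ['ABC Subsidiary.xlsx', 'DEF Subsidiary.xlsx', 'notes.txt']

# Same matching rule as the loop above, written as a dict comprehension
entity_dict = {
    f'{entity}.xlsx': f'.{acronym}'
    for entity, acronym in zip(entities, acronyms)
    if f'{entity}.xlsx' in files
}

# Entities with no exported file, worth reviewing before formatting anything
unmatched = [entity for entity in entities if f'{entity}.xlsx' not in files]

print(entity_dict)
print(unmatched)
```

A non-empty `unmatched` list usually means a file name does not exactly match the entity name in account_keys.xlsx, or the export is missing.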

# Applying Formatting to a Single Excel File
## Steps:
1) Import a single excel file as a dataframe
2) Drop the 'TOTAL' row
3) Replace all null values with zeros
4) Split account names and numbers into new separate columns
5) Add suffixes to account numbers
6) Combine debit and credit columns into a single balance column
7) Export a dataframe of only account numbers, account names, and balances to a new excel file
8) Move processed files to the appropriate output folder

#### 1) Import a single excel file as a dataframe

**Notes:**
* QuickBooks exports typically dedicate the default sheet of the workbook to tips on updating the report in Excel. To access the data itself, you must pass sheet_name='Sheet1' as a keyword argument to pandas.read_excel; otherwise it will return an empty dataframe.
* QuickBooks exports also typically have the table starting on row 6 with headers on row 5. Pass skiprows=4 as a keyword argument in order to read in the data correctly.
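
To see what skiprows=4 does without needing a workbook on hand, the sketch below uses pandas.read_csv on an in-memory string as a stand-in, since it accepts the same skiprows argument. The four preamble lines and the sample rows are invented for illustration and are not taken from a real export.

```python
import io
import pandas as pd

# Four preamble lines, then the header on line 5 and the data from line 6
raw = (
    "Example Company\n"
    "Trial Balance\n"
    "As of December 31\n"
    "Accrual Basis\n"
    ",Debit,Credit\n"
    "1010 · Example Bank,716.8,\n"
    "20400 · Notes Payable,,1000000.0\n"
)

# skiprows=4 discards the first four lines, so line 5 becomes the header row
preview = pd.read_csv(io.StringIO(raw), skiprows=4)

print(preview)
```

The blank header cell comes through as an 'Unnamed' column, which is why the real export shown below has column names like 'Unnamed: 1'.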

In [137]:
abc = pd.read_excel('./quickbooks_data/ABC Subsidiary.xlsx', sheet_name='Sheet1', skiprows=4)

# view first 5 rows of data
abc.head()

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Debit,Unnamed: 3,Credit
0,,1010 · Example Bank - Business Banking,716.8,,
1,,1210 · Cash In Transit,0.0,,
2,,13010 · Land Held for Investment:13011 · Land ...,3561613.12,,
3,,13010 · Land Held for Investment:13011 · Land ...,131260.0,,
4,,13010 · Land Held for Investment:13011 · Land ...,48155.45,,


In [138]:
# view last 5 rows of data
abc.iloc[-5:]

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Debit,Unnamed: 3,Credit
25,,20400 · Notes Payable:20401 · Note Payable - J...,,,1000000.0
26,,60400 · Bank Service Charges,515.0,,
27,,67300 · Management Fee Exp.,15400.0,,
28,,69000 · Political Contribution,0.0,,
29,TOTAL,,5879393.52,,5879393.52


#### 2) Drop the 'TOTAL' row.

* The total row should be excluded from the trial balance import into CCH Engagement as the program will automatically calculate the total as a test of whether the trial balance is in balance. 

***tip: use an if statement to double check whether 'TOTAL' is in the last row of the dataframe to avoid accidentally deleting a row of data***

In [139]:
# Check whether the last row is the total row; .lower() makes the comparison case-insensitive
if 'total' in str(abc.iloc[-1, 0]).lower():
    # Drop by the last row's own index label so this still works if the index is not 0..n-1
    abc.drop(index=abc.index[-1], inplace=True)

In [140]:
abc.iloc[-5:]

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Debit,Unnamed: 3,Credit
24,,20100 · Due To/(From) Related Entities:20101 ·...,1043861.84,,
25,,20400 · Notes Payable:20401 · Note Payable - J...,,,1000000.0
26,,60400 · Bank Service Charges,515.0,,
27,,67300 · Management Fee Exp.,15400.0,,
28,,69000 · Political Contribution,0.0,,
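
The same check can be wrapped in a small function so it is safe to call more than once. This is a minimal sketch: the function name `drop_total_row` and the tiny sample dataframe are assumptions made for illustration.

```python
import pandas as pd

def drop_total_row(df, label_col=0):
    """Return df without its last row when that row's label contains 'total'."""
    if len(df) and 'total' in str(df.iloc[-1, label_col]).lower():
        return df.iloc[:-1].copy()
    return df

# Small stand-in for a QuickBooks export (values are illustrative)
tb = pd.DataFrame({
    'label': [None, None, 'TOTAL'],
    'account': ['1010 · Example Bank', '60400 · Bank Service Charges', None],
    'Debit': [716.8, 515.0, 1231.8],
})

trimmed = drop_total_row(tb)
trimmed_again = drop_total_row(trimmed)  # no TOTAL row left, so nothing is dropped

print(len(tb), len(trimmed), len(trimmed_again))
```

Because the function only removes a row whose label actually contains 'total', running it twice cannot delete a row of account data.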


#### 3) Replace all null values with zeros
To help avoid issues with adding and subtracting later, we can go ahead and replace missing values with zeros.

In [141]:
abc.fillna(0, inplace=True)
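
Note that the call above fills every column, including the text columns. If you would rather leave text columns untouched, one option is to fill only the amount columns. The sketch below assumes the amounts live in columns named 'Debit' and 'Credit', as in the export above; the sample rows and the `amount_cols` name are illustrative.

```python
import pandas as pd

tb = pd.DataFrame({
    'account': ['1010 · Example Bank', '20400 · Notes Payable'],
    'Debit': [716.8, None],
    'Credit': [None, 1000000.0],
})

# Only the numeric amount columns are filled; other columns keep their values
amount_cols = ['Debit', 'Credit']
tb[amount_cols] = tb[amount_cols].fillna(0)

print(tb)
```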

#### 4) Split account names and numbers into new separate columns
*This is the most involved section of the notebook, and where the most time is saved*

1) Observe how account names and numbers are nested, and more importantly, how they are separated
2) Use .split() method with list comprehensions to select only account names and only account numbers
3) Create a function that makes it easier to apply splitting to make new columns

In [142]:
abc.head()

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Debit,Unnamed: 3,Credit
0,0,1010 · Example Bank - Business Banking,716.8,0.0,0.0
1,0,1210 · Cash In Transit,0.0,0.0,0.0
2,0,13010 · Land Held for Investment:13011 · Land ...,3561613.12,0.0,0.0
3,0,13010 · Land Held for Investment:13011 · Land ...,131260.0,0.0,0.0
4,0,13010 · Land Held for Investment:13011 · Land ...,48155.45,0.0,0.0


**Note:**
* Account names and numbers are separated by " · "
* But the accounts are also nested inside account groups and subgroups, separated by ":"
* To complicate things further, some accounts do not have account groups and/or subgroups included (so we cannot just choose the same index every time)

**See the account at index 2 as an example:**
* The account name is 'Land' and the account number is '13011.1'
* The account sub-group is 'Land Basis' and the sub-group account number is '13011'
* The account group is 'Land Held for Investment' and the group account number is '13010'

***tip: to consistently select only 'Land' and '13011.1' whether or not the account group or subgroup is present, use a list comprehension that selects only the last element of each list of splits***

In [143]:
print(f"String before splitting: {abc['Unnamed: 1'][2]}\n")
print(f"List after splitting: {abc['Unnamed: 1'][2].split(':')}\n")
print(f"Last element of split list: {abc['Unnamed: 1'][2].split(':')[-1]}")

String before splitting: 13010 · Land Held for Investment:13011 · Land Basis:13011.1 · Land

List after splitting: ['13010 · Land Held for Investment', '13011 · Land Basis', '13011.1 · Land']

Last element of split list: 13011.1 · Land
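
The reason the last element works for every row is that splitting a string with no ":" returns a one-item list, whose last element is the whole string. A minimal sketch in plain Python, with a hypothetical helper name `leaf_account`:

```python
def leaf_account(raw, group_delim=':'):
    """Return the innermost 'number · name' entry of an export string."""
    return raw.split(group_delim)[-1]

nested = '13010 · Land Held for Investment:13011 · Land Basis:13011.1 · Land'
flat = '1210 · Cash In Transit'

print(leaf_account(nested))  # innermost account of a nested entry
print(leaf_account(flat))    # an entry with no groups is returned unchanged
```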


In [144]:
# Create list comprehension of every last element of split lists and assign it to a new column
abc['accts_names'] = [x[-1] for x in abc['Unnamed: 1'].str.split(':')]

# View list of account names and numbers
abc['accts_names'].head()

0    1010 · Example Bank - Business Banking
1                    1210 · Cash In Transit
2                            13011.1 · Land
3                  13011.3 · Assignment Fee
4                      13012 · Closing Cost
Name: accts_names, dtype: object
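
The list comprehension above can also be written with the pandas string accessor, which indexes into each split list directly. This is an equivalent sketch on a small stand-in Series, not a change to the approach; the sample values are copied from the output shown earlier.

```python
import pandas as pd

raw = pd.Series([
    '1010 · Example Bank - Business Banking',
    '13010 · Land Held for Investment:13011 · Land Basis:13011.1 · Land',
])

# .str.split(':') yields a list per row; .str[-1] takes the last item of each list
leaf = raw.str.split(':').str[-1]

print(leaf.tolist())
```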

***tip: since we can use essentially the same process to access the account names and numbers from the new column, we can write a function to make repeating this process easier.***

In [145]:
def new_col_from_split(df, split_col, delim, index=-1):
    
    return [x[index] for x in df[split_col].astype(str).str.split(delim)]

In [146]:
# Compare the first 5 outputs of our function to the first 5 lines of the accts_names column already created
new_col_from_split(abc, 'Unnamed: 1', delim=':')[:5]


['1010 · Example Bank - Business Banking',
 '1210 · Cash In Transit',
 '13011.1 · Land',
 '13011.3 · Assignment Fee',
 '13012 · Closing Cost']

In [150]:
# View the first 5 rows of the dataframe (the account_number and account_name columns are assigned in the cells below)
abc[:5]

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Debit,Unnamed: 3,Credit,accts_names,account_number,account_name
0,0,1010 · Example Bank - Business Banking,716.8,0.0,0.0,1010 · Example Bank - Business Banking,1010.0,Example Bank - Business Banking
1,0,1210 · Cash In Transit,0.0,0.0,0.0,1210 · Cash In Transit,1210.0,Cash In Transit
2,0,13010 · Land Held for Investment:13011 · Land ...,3561613.12,0.0,0.0,13011.1 · Land,13011.1,Land
3,0,13010 · Land Held for Investment:13011 · Land ...,131260.0,0.0,0.0,13011.3 · Assignment Fee,13011.3,Assignment Fee
4,0,13010 · Land Held for Investment:13011 · Land ...,48155.45,0.0,0.0,13012 · Closing Cost,13012.0,Closing Cost


**Note:**
* Since our 'index' parameter is set to a default of -1, this function will always grab the last item in each list unless instructed otherwise
* To access the account name, no index argument needs to be passed. To access account numbers, the index=0 argument will need to be passed

In [151]:
# Access account numbers with index=0 (only viewing first 5 outputs)
new_col_from_split(abc, 'accts_names', delim=' · ', index=0)[:5]

['1010', '1210', '13011.1', '13011.3', '13012']

In [152]:
# Assign account name and number columns to be equal to respective outputs of the new_col_from_split function
abc['account_number'] = new_col_from_split(abc, 'accts_names', delim=' · ', index=0)
abc['account_name'] = new_col_from_split(abc, 'accts_names', delim=' · ')

abc.head()

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Debit,Unnamed: 3,Credit,accts_names,account_number,account_name
0,0,1010 · Example Bank - Business Banking,716.8,0.0,0.0,1010 · Example Bank - Business Banking,1010.0,Example Bank - Business Banking
1,0,1210 · Cash In Transit,0.0,0.0,0.0,1210 · Cash In Transit,1210.0,Cash In Transit
2,0,13010 · Land Held for Investment:13011 · Land ...,3561613.12,0.0,0.0,13011.1 · Land,13011.1,Land
3,0,13010 · Land Held for Investment:13011 · Land ...,131260.0,0.0,0.0,13011.3 · Assignment Fee,13011.3,Assignment Fee
4,0,13010 · Land Held for Investment:13011 · Land ...,48155.45,0.0,0.0,13012 · Closing Cost,13012.0,Closing Cost
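
One edge case to keep in mind: taking the last element of the split assumes the account name itself never contains the " · " delimiter. None of the accounts shown here do, so the sketch below uses an invented account name to show the difference, along with a hypothetical helper that splits only at the first delimiter.

```python
def split_number_and_name(entry, delim=' · '):
    """Split at the first delimiter only, so the name may contain it."""
    number, _, name = entry.partition(delim)
    return number, name

# Invented example: an account name that happens to contain the delimiter
entry = '1010 · Example Bank · Payroll'

last_piece = entry.split(' · ')[-1]
number, name = split_number_and_name(entry)

print(last_piece)    # only the final fragment of the name
print(number, name)  # the full name is preserved
```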


#### 5) Add suffixes to account numbers

In [153]:
# assign suffix variable to the value of the entity_dict belonging to the filename as key
suffix = entity_dict['ABC Subsidiary.xlsx']

# to avoid adding the suffix multiple times, check whether the suffix is already in the first entry of the account numbers column before appending it to all of the account numbers
if suffix not in abc['account_number'][0]:
    abc['account_number'] += suffix

In [154]:
abc['account_number'].head()

0       1010.ABC
1       1210.ABC
2    13011.1.ABC
3    13011.3.ABC
4      13012.ABC
Name: account_number, dtype: object
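
The check above looks at the first account number only. A stricter variant checks every value and adds the suffix just where it is missing. This is a sketch under that assumption; the function name `add_suffix` and the sample values are illustrative.

```python
import pandas as pd

def add_suffix(numbers, suffix):
    """Append suffix to each account number that does not already end with it."""
    numbers = numbers.astype(str)
    # Series.where keeps values where the condition is True and replaces the rest
    return numbers.where(numbers.str.endswith(suffix), numbers + suffix)

nums = pd.Series(['1010', '13011.1', '13012.ABC'])

once = add_suffix(nums, '.ABC')
twice = add_suffix(once, '.ABC')  # a second pass changes nothing

print(once.tolist())
```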

#### 6) Combine debit and credit columns into a single balance column
The balance column can be easily created by subtracting the credit column from the debit column

In [155]:
abc['balance'] = abc['Debit'] - abc['Credit']

abc.head()

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Debit,Unnamed: 3,Credit,accts_names,account_number,account_name,balance
0,0,1010 · Example Bank - Business Banking,716.8,0.0,0.0,1010 · Example Bank - Business Banking,1010.ABC,Example Bank - Business Banking,716.8
1,0,1210 · Cash In Transit,0.0,0.0,0.0,1210 · Cash In Transit,1210.ABC,Cash In Transit,0.0
2,0,13010 · Land Held for Investment:13011 · Land ...,3561613.12,0.0,0.0,13011.1 · Land,13011.1.ABC,Land,3561613.12
3,0,13010 · Land Held for Investment:13011 · Land ...,131260.0,0.0,0.0,13011.3 · Assignment Fee,13011.3.ABC,Assignment Fee,131260.0
4,0,13010 · Land Held for Investment:13011 · Land ...,48155.45,0.0,0.0,13012 · Closing Cost,13012.ABC,Closing Cost,48155.45
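
Since total debits equal total credits in a balanced trial balance, the new balance column should sum to zero. A quick sanity check before exporting might look like the sketch below; the sample amounts are invented, and rounding to 2 decimal places is an assumption to absorb floating-point noise.

```python
import pandas as pd

tb = pd.DataFrame({
    'Debit': [716.8, 0.0, 515.0],
    'Credit': [0.0, 1231.8, 0.0],
})

# Debits are positive and credits are negative in the combined column
tb['balance'] = tb['Debit'] - tb['Credit']

# Round before comparing so tiny floating-point differences do not fail the check
out_of_balance = round(tb['balance'].sum(), 2)
is_balanced = out_of_balance == 0

print(out_of_balance, is_balanced)
```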


#### 7) Export a dataframe of only account numbers, account names, and balances to a new excel file in the output folder
Trial Balance imports in CCH Engagement take only 3 columns: account numbers, account names, and balances.

In [160]:
# Assign account_number, account_name, and balance to a list that will serve as a list of columns to export
export_cols = ['account_number', 'account_name', 'balance']

# Filter the dataframe by those columns
abc[export_cols].head()

Unnamed: 0,account_number,account_name,balance
0,1010.ABC,Example Bank - Business Banking,716.8
1,1210.ABC,Cash In Transit,0.0
2,13011.1.ABC,Land,3561613.12
3,13011.3.ABC,Assignment Fee,131260.0
4,13012.ABC,Closing Cost,48155.45


In [159]:
# set index = False to avoid exporting the indexes along with the data
abc[export_cols].to_excel('./ready_for_tb_import/formatted_tb_ABC Subsidiary.xlsx', index=False)
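
When this step is repeated for many files, the output path can be built from the source file name instead of being typed out each time. A small sketch, assuming the same folder and 'formatted_tb_' prefix used above; the helper name `output_path` is hypothetical.

```python
import os

def output_path(source_file, out_dir='./ready_for_tb_import/', prefix='formatted_tb_'):
    """Build the export path for a given QuickBooks file name."""
    return os.path.join(out_dir, prefix + os.path.basename(source_file))

path = output_path('ABC Subsidiary.xlsx')
print(path)
```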

#### 8) Move processed files to the appropriate output folder

In [166]:
# Set filename, source folder, and destination folder variables
file = 'ABC Subsidiary.xlsx'
source_folder = './quickbooks_data/'
dest_folder = './processed_quickbooks_files/'

# Use os.rename to replace the source folder with the destination folder in the file path (use the if statement to avoid errors if you run this cell twice!)
if file in os.listdir(source_folder):    
    os.rename(source_folder + file, dest_folder + file)
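
The same move can be written with shutil.move, wrapped in a function that reports whether anything was moved. This is a sketch only: the helper name `move_if_present` is hypothetical, and the demonstration runs in a temporary directory with an empty placeholder file rather than real QuickBooks data.

```python
import os
import shutil
import tempfile

def move_if_present(filename, source_folder, dest_folder):
    """Move filename from source_folder to dest_folder; return True if it moved."""
    src = os.path.join(source_folder, filename)
    if not os.path.isfile(src):
        return False  # already moved, or never existed
    os.makedirs(dest_folder, exist_ok=True)
    shutil.move(src, os.path.join(dest_folder, filename))
    return True

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, 'quickbooks_data')
    dest = os.path.join(tmp, 'processed_quickbooks_files')
    os.makedirs(source)
    # Empty placeholder standing in for an exported workbook
    open(os.path.join(source, 'ABC Subsidiary.xlsx'), 'w').close()

    first = move_if_present('ABC Subsidiary.xlsx', source, dest)
    second = move_if_present('ABC Subsidiary.xlsx', source, dest)  # safe to repeat
    moved_files = os.listdir(dest)

print(first, second, moved_files)
```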

# Putting it All Together
* To make this process scalable, we can write a few functions that allow us to loop through the files in the dictionary of entities and apply all of the transformations that were performed on 'ABC Subsidiary.xlsx' to every file in our source folder. 
* Once these functions are written, there is very little difference in the time it takes to format 1 trial balance or 100. 
* The use of functions also helps make this code easily adaptable to trial balance formatting for other companies. QuickBooks data is generally exported in similar formats, usually just with different delimiters between account names and numbers. Adapting this approach to another client is often as easy as replacing the delimiter in your function and the names of the source folders, then running your script!
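
One possible shape for such a function is sketched below. It applies steps 2 through 6 to a dataframe that has already been read in and returns only the three import columns. The function name `format_trial_balance`, its parameters, and the small sample dataframe are assumptions made for illustration, not the notebook's final code.

```python
import pandas as pd

def format_trial_balance(raw, suffix, name_col='Unnamed: 1',
                         group_delim=':', number_delim=' · '):
    """Apply steps 2 through 6 to one raw export; return the 3 import columns."""
    df = raw.copy()

    # Step 2: drop the last row only if it is the TOTAL row
    if len(df) and 'total' in str(df.iloc[-1, 0]).lower():
        df = df.iloc[:-1].copy()

    # Step 3: treat missing debit and credit amounts as zero
    df[['Debit', 'Credit']] = df[['Debit', 'Credit']].fillna(0)

    # Step 4: keep the innermost account, then split number from name
    leaf = df[name_col].astype(str).str.split(group_delim).str[-1]
    # Step 5: the entity suffix is appended to each account number
    df['account_number'] = [x.partition(number_delim)[0] + suffix for x in leaf]
    df['account_name'] = [x.partition(number_delim)[2] for x in leaf]

    # Step 6: debits minus credits gives a single signed balance
    df['balance'] = df['Debit'] - df['Credit']

    return df[['account_number', 'account_name', 'balance']].reset_index(drop=True)

# Tiny stand-in for a QuickBooks export (values are illustrative)
raw = pd.DataFrame({
    'Unnamed: 0': [None, None, 'TOTAL'],
    'Unnamed: 1': ['1010 · Example Bank - Business Banking',
                   '20400 · Notes Payable:20401 · Note Payable',
                   None],
    'Debit': [716.8, None, 716.8],
    'Credit': [None, 716.8, 716.8],
})

formatted = format_trial_balance(raw, '.ABC')
print(formatted)
```

With a function like this, the remaining work is a loop over entity_dict.items() that reads each file, calls the function with that file's suffix, exports the result, and moves the source file, as in steps 1, 7, and 8 above.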