# Budget Analysis using Python and Google Spreadsheets

## Settings / configuration

In [1]:
# the folder in which we put our exported bank transaction files
clerkai_folder = "~/Documents/Clerk.ai"

# the name of the google spreadsheet we use to annotate transactions
transactions_gsheet_export_title = "Clerk.ai - Transactions"

# custom columns that you want to annotate your transactions with (optional: can be left empty)
additional_transactions_editable_columns = []
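
As an illustration of the optional setting above, the list might be filled in as follows. This is a sketch that assumes the setting takes a list of column names as strings; the two names shown are hypothetical and not columns defined by Clerk.ai.

```python
# Hypothetical column names, chosen only for illustration.
# Assumption: the setting expects a list of column-name strings.
additional_transactions_editable_columns = ["Project", "Reimbursable"]
```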

## Import libraries and set up some global helpers

In [2]:
# an authorized gspread client is required for gsheets import/export
import gspread
gsheets_client = gspread.oauth()

In [3]:
# import and init clerk.ai notebook helpers / functions
from clerkai.nb_helpers import init_notebook_and_return_helpers
helpers = init_notebook_and_return_helpers(clerkai_folder)
transactions = helpers["transactions"]
download_and_store_gsheets_edits = helpers["download_and_store_gsheets_edits"]
from clerkai.utils import export_to_gsheets

In [4]:
# a general notebook helper function
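import pandas as pd
from IPython.display import display

def display_full_df(df):
    """Display a DataFrame without truncating rows, columns, or cell contents."""
    # max_colwidth=None replaces the deprecated -1 ("no limit") value
    with pd.option_context('display.max_rows', None, 'display.max_columns', None, 'display.max_colwidth', None):
        display(df)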

## Import annotations and edits that we have done in Google Spreadsheets (if any)

In [5]:
from gspread import SpreadsheetNotFound, WorksheetNotFound
try:
    download_and_store_gsheets_edits(gsheets_client, transactions_gsheet_export_title, "Transactions", "transactions")
except SpreadsheetNotFound:
    print("Spreadsheet not found (it will be created later)")
except WorksheetNotFound:
    print("Worksheet not found (it will be created later)")

Creating '2020-05-29 1628 (bb49)/Transactions.gsheets.Clerk.ai - Transactions.Transactions.2020-05-29 182338062.xlsx'


## Scan transactions files

On the first run of the cell below, Clerk.ai creates a file called `Transaction Files.xlsx` under the Edits folder and then quits. Open the file in Excel or LibreOffice and fill in the following columns:

* `Account provider` - what you call the bank or service that exported the file (e.g. "Bank of America")
* `Account` - what you call the account
* `Content type` - one of the content types listed [here](https://github.com/clerklabs/python-clerkai/blob/master/clerkai/transactions/parse.py). If none matches, open an issue at https://github.com/clerklabs/python-clerkai/issues to request it, or add it yourself and submit a PR.

In [6]:
(
    transaction_files_df,
    possibly_edited_transaction_files_df,
    unsuccessfully_parsed_transaction_files,
    successfully_parsed_transaction_files,
    all_parsed_transactions_df,
    transactions_df,
    possibly_edited_transactions_df,
    transaction_files_editable_columns,
    transactions_editable_columns,
) = transactions(
    failfast=False,
    keep_unmerged_previous_edits=False,
    additional_transactions_editable_columns=additional_transactions_editable_columns,
)

print("")
print(".:: Parsing done")
print("Amount of non-ignored transaction files:", len(unsuccessfully_parsed_transaction_files)+len(successfully_parsed_transaction_files))
print("Unsuccessfully parsed files:", len(unsuccessfully_parsed_transaction_files))
print("Successfully parsed files:", len(successfully_parsed_transaction_files))
print("Transactions including duplicates:", len(all_parsed_transactions_df))
print("Transactions:", len(possibly_edited_transactions_df))
print("")

# show which files could not be parsed because of errors
if len(unsuccessfully_parsed_transaction_files) > 0:
    print("Some transaction files were not parsed:")
    display_full_df(unsuccessfully_parsed_transaction_files)
else:
    print("All transaction files parsed successfully")

Returning existing Transaction files.xlsx (ignoring currently parsed data)
Merging edits from 1 edit file(s) and Transactions.xlsx into a new Transactions.xlsx (ignoring currently parsed data)
Creating '2020-05-29 1628 (bb49)/Transactions.xlsx'

.:: Parsing done
Amount of non-ignored transaction files: 2
Unsuccessfully parsed files: 0
Successfully parsed files: 2
Transactions including duplicates: 62
Transactions: 62

All transaction files parsed successfully
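
The summary above reports two counts: transactions including duplicates, and transactions after duplicates are dropped. The sketch below shows one plausible way such counts can differ. It is not Clerk.ai's actual deduplication logic; the function name, the field names, and the choice of key are all assumptions made for illustration.

```python
def drop_duplicate_transactions(rows, key_fields=("date", "amount", "description")):
    """Keep the first occurrence of each transaction, comparing only key_fields."""
    seen = set()
    unique = []
    for row in rows:
        # Two rows count as duplicates when all key fields are equal
        key = tuple(row.get(field) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Made-up sample data: the first two rows describe the same transaction,
# as can happen when two exported files cover overlapping date ranges
parsed = [
    {"date": "2020-05-01", "amount": -12.5, "description": "Coffee"},
    {"date": "2020-05-01", "amount": -12.5, "description": "Coffee"},
    {"date": "2020-05-02", "amount": -40.0, "description": "Groceries"},
]
unique = drop_duplicate_transactions(parsed)

print("Transactions including duplicates:", len(parsed))
print("Transactions:", len(unique))
```

A key this simple would also merge two genuinely separate purchases with the same date, amount, and description, so a real implementation would likely need a stricter notion of identity.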


## (Optional) Run classifiers for automatic annotation of transactions

In [7]:
# run classifiers here
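
The cell above is left as a placeholder. As a minimal sketch of what an automatic classifier could look like, the example below applies keyword rules to plain dictionaries. The `Description` and `Category` field names, the rules, and both function names are hypothetical and not part of the Clerk.ai API; adapting this to `possibly_edited_transactions_df` would require the real column names.

```python
# Hypothetical keyword rules: lowercase substring -> category label
CATEGORY_RULES = {
    "grocery": "Food",
    "rent": "Housing",
    "metro": "Transport",
}


def classify_transaction(description, rules=CATEGORY_RULES):
    """Return the category of the first rule whose keyword occurs in the description."""
    text = (description or "").lower()
    for keyword, category in rules.items():
        if keyword in text:
            return category
    return None  # no rule matched; leave for manual annotation


def annotate(rows, rules=CATEGORY_RULES):
    """Return copies of rows with empty 'Category' values filled in by the rules."""
    annotated = []
    for row in rows:
        row = dict(row)  # copy, so the input rows are left untouched
        if not row.get("Category"):
            row["Category"] = classify_transaction(row.get("Description"), rules)
        annotated.append(row)
    return annotated


# Made-up sample rows
sample_rows = [
    {"Description": "CITY GROCERY 123", "Category": None},
    {"Description": "Monthly rent", "Category": "Home"},
    {"Description": "Unknown vendor", "Category": None},
]
annotated_rows = annotate(sample_rows)
```

The sketch only fills in empty values, so a category entered by hand in the spreadsheet is never overwritten when the notebook is re-run.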

## Export results to Google Spreadsheets for manual annotations

In [8]:
export_to_gsheets(gsheets_client, possibly_edited_transactions_df, transactions_gsheet_export_title, "Transactions", "transactions", create_if_not_exists=True, editable_columns=transactions_editable_columns)

'https://docs.google.com/spreadsheets/d/1r8iqQWH0VyAOzs7wVprgijw6WQlNQkxk8FM_XVq6IDs'

## Next steps

Now go to the Google Spreadsheet and annotate all editable columns (marked with a white background instead of grey), then re-run all cells in this notebook to import those changes back and re-run automatic classifiers.

You can also run this notebook via the command line (launching the notebook only if errors occurred):

```
./run.sh
```