# Financial Statements Analyzer

By Ken Burchfiel

Released under the MIT license

This notebook lets you analyze and visualize a coded set of transactions for a given time period. It reads an edited copy of the financial transactions data generated by the financial_statements_coding_tool Jupyter Notebook file. In other words, you'll want to modify and run that notebook **before** running this one.

In [None]:
import pandas as pd
import numpy as np
import os
import time
from IPython.display import Image
import plotly.express as px
start_time = time.time()
import matplotlib.pyplot as plt
import kaleido
# Note: the most recent version of kaleido, a package used with Plotly to create
# screenshots of graphs, didn't work for me. However,
# specifying an older version by entering conda install python-kaleido=0.1.0
# worked great. See https://github.com/plotly/Kaleido/issues/120
from generate_screenshot import generate_screenshot

This script first imports a record of your financial transactions. This is a manually updated copy of the .csv file that was created by the financial_statements_coding_tool script. (Manual edits to this file will normally be necessary in order to add codes for expenses that weren't picked up by that script.)

In [None]:
transactions_year = 2022 # Make sure this variable matches its corresponding
# variable in the financial_statements_coding_tool script.

In [None]:
coded_transactions_folder = 'coded_transactions' # The name of
# the folder that stores your coded financial data

expenses_path = coded_transactions_folder + '\\'+str(
    transactions_year)+'_finances_updated_in_python_edited.csv' 
# The above path points to an edited copy of finances_updated_in_python.csv
# in which I filled in missing subcodes.

summarized_data_folder = 'summarized_data' # This folder will store summarized
# transactions data. 

expense_codes_path = 'sample_finance_codes.csv' # The script will use the codes
# and subcodes stored in this file to categorize your transactions. Make sure
# to update this file so that the codes and subcodes match your own spending
# patterns.
interactive_charts_folder = 'interactive_charts' 
# Interactive .html charts created by this script will go here.
static_charts_folder = 'static_charts' # .png versions of the interactive
# charts will go here.
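
The paths in this notebook are joined with '\\', which works on Windows. As a hedged sketch (not the approach this notebook itself uses), the same paths could be built with os.path.join() so that they also resolve on macOS and Linux. The build_path helper below is a hypothetical name introduced only for illustration:

```python
import os

transactions_year = 2022
coded_transactions_folder = 'coded_transactions'

def build_path(folder, year, suffix):
    # Hypothetical helper: prefixes the file name with the year and joins
    # it to the folder using the current operating system's separator.
    return os.path.join(folder, str(year) + suffix)

expenses_path = build_path(coded_transactions_folder, transactions_year,
    '_finances_updated_in_python_edited.csv')
print(expenses_path)
```

The folder and file names are unchanged; only the separator is left to the operating system.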

In [None]:
# pd.set_option('display.max_rows', 300)

The script will now access the expense codes stored in the file located at expense_codes_path so that they can be merged with the edited version of your financial records. Although your edited financial records already contain subcode values, this merge operation will allow you to view each subcode's corresponding code, subcode description, and code description.

In [None]:
expense_codes = pd.read_csv(expense_codes_path).drop('Notes', axis = 1)

expense_codes

The script next imports the aforementioned edited copy of your financial transactions:

In [None]:
df_expenses = pd.read_csv(expenses_path)
df_expenses['Amount'] = df_expenses['Amount'].astype('float') # Ensures that
# transaction amounts are stored as numbers (which is essential for 
# our analyses)
df_expenses
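
astype('float') raises a ValueError if any entry can't be parsed as a number. If your own export contains currency symbols, thousands separators, or placeholder text (an assumption; the sample data may already be clean), a more forgiving conversion could look like the following sketch, which uses made-up values:

```python
import pandas as pd

# Toy data for illustration only
df_toy = pd.DataFrame({'Amount': ['12.50', '$1,200.00', 'n/a']})

# Strip dollar signs and commas, then convert. errors='coerce' turns
# anything that still can't be parsed into NaN instead of raising an error.
cleaned = df_toy['Amount'].str.replace(r'[$,]', '', regex=True)
df_toy['Amount'] = pd.to_numeric(cleaned, errors='coerce')

# Counting the NaN values shows how many rows need manual attention.
unparsed_count = int(df_toy['Amount'].isna().sum())
print(df_toy)
print(unparsed_count)
```

Checking unparsed_count before moving on helps ensure that no transaction silently drops out of later totals.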

### Merging in expense codes, subcodes, and descriptions:

In [None]:
df_expenses_merged = df_expenses.merge(expense_codes, 
    on = 'Subcode', how = 'left')
subcode_col = df_expenses_merged.columns.get_loc('Subcode') # get_loc returns
# an integer corresponding to the 'Subcode' column's index position.
df_expenses_merged.insert(subcode_col, 'Code', df_expenses_merged.pop('Code')) 
# The above line positions the Code column before the Subcode column.
df_expenses_merged
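
Because this is a left merge, any transaction whose subcode is missing from the codes file will be kept but will have an empty Code value. The following sketch, which uses toy tables with made-up codes, shows one way to list such rows using the indicator argument of merge():

```python
import pandas as pd

# Toy tables for illustration; 'X9' is deliberately absent from toy_codes.
toy_expenses = pd.DataFrame({
    'Subcode': ['A1', 'B2', 'X9'],
    'Amount': [50.0, 20.0, 5.0]})
toy_codes = pd.DataFrame({
    'Code': ['A', 'B'],
    'Subcode': ['A1', 'B2']})

# indicator=True adds a '_merge' column that records whether each row
# was found in both tables or only in the left one.
merged = toy_expenses.merge(toy_codes, on='Subcode', how='left',
    indicator=True)
unmatched = merged[merged['_merge'] == 'left_only']
print(unmatched)
```

Any rows that appear in unmatched point to subcodes that should be corrected in the transactions file or added to the codes file.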

## Data analysis:

The script is now ready to analyze and visualize this financial data. It will first use the pivot_table() function within pandas to calculate total transaction amounts by month and subcode.

In [None]:
df_expenses_by_subcode_by_month = df_expenses_merged.pivot_table(
    index = ['Month', 'Code', 'Subcode', 'Code_Description', 
    'Subcode_Description'], values = 'Amount', aggfunc = 'sum').reset_index()
df_expenses_by_subcode_by_month
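
To make the aggregation concrete, here is a small sketch with made-up rows. It also shows that, for a simple sum like this one, groupby() produces the same totals as pivot_table():

```python
import pandas as pd

# Toy transactions for illustration only
toy = pd.DataFrame({
    'Month': [1, 1, 1, 2],
    'Subcode': ['A1', 'A1', 'B1', 'A1'],
    'Amount': [10.0, 5.0, 7.0, 3.0]})

# One row per Month/Subcode pair, with amounts added together
by_pivot = toy.pivot_table(index=['Month', 'Subcode'],
    values='Amount', aggfunc='sum').reset_index()

# The same totals calculated with groupby()
by_groupby = toy.groupby(['Month', 'Subcode'],
    as_index=False)['Amount'].sum()
print(by_pivot)
```

The two month-1 transactions for subcode A1 are combined into a single row, while the other rows keep their original amounts.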

I'll also save this and other pivot tables to a .csv.

In [None]:
df_expenses_by_subcode_by_month.to_csv(summarized_data_folder + '\\' 
    + str(transactions_year)+'_monthly_spending_by_subcodes.csv')

I'll create a similar table that shows the annual spending amounts for all subcodes.

In [None]:
df_expenses_by_subcode_by_year = df_expenses_merged.pivot_table(
    index = ['Code', 'Subcode', 'Code_Description', 
    'Subcode_Description'], values = 'Amount', aggfunc = 'sum').reset_index()
df_expenses_by_subcode_by_year

In [None]:
df_expenses_by_subcode_by_year.to_csv(summarized_data_folder + '\\' 
    + str(transactions_year)+'_yearly_spending_by_subcodes.csv')

## Creating subsets of these lists that include only selected codes:

When analyzing your spending data, you'll likely want to exclude certain codes or subcodes from consideration. For instance, it makes sense to exclude credit card payments, since otherwise your credit card spending will be counted twice (once when you make a credit card transaction, and again when you make the payment on that transaction).

Therefore, the following cell selects specific subcodes for analysis. It does so by using the ~ symbol to query all rows that *do not* belong to certain codes.

In [None]:
df_expenses_by_selected_subcodes_by_month = \
df_expenses_by_subcode_by_month.query(
    "~Code.isin(['I', 'G', 'J', 'S', 'V', 'Z'])").dropna(
        subset = 'Code').copy()
df_expenses_by_selected_subcodes_by_month
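
The same kind of filter can also be written with boolean indexing instead of query(). The sketch below uses toy data and a made-up exclusion list, and it shows why the dropna() step is still needed: a row with a missing code is not in the exclusion list, so it passes the isin() filter.

```python
import pandas as pd

# Toy data for illustration; the third row has no code.
toy = pd.DataFrame({
    'Code': ['D', 'Z', None, 'T'],
    'Amount': [40.0, 500.0, 9.0, 60.0]})
excluded_codes = ['Z']  # Hypothetical exclusion list

# ~ inverts the boolean mask, keeping rows whose code is NOT excluded.
# dropna() then removes the row whose code is missing.
filtered = toy[~toy['Code'].isin(excluded_codes)].dropna(
    subset=['Code']).copy()
print(filtered)
```

Only the 'D' and 'T' rows remain in filtered.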

In [None]:
df_expenses_by_selected_subcodes_by_month.to_csv(summarized_data_folder + '\\' 
    + str(transactions_year)+'_monthly_spending_by_selected_subcodes.csv')

The following cell creates a table similar to the above table, except that it categorizes expenses by code rather than by subcode.

In [None]:
df_expenses_by_selected_codes_by_month = \
df_expenses_by_selected_subcodes_by_month.pivot_table(
    index = ['Month', 'Code', 'Code_Description'], 
    values = 'Amount', aggfunc = 'sum').reset_index()
df_expenses_by_selected_codes_by_month
# Useful for line charts

In [None]:
df_expenses_by_selected_codes_by_month.to_csv(summarized_data_folder + '\\' 
    + str(transactions_year)+'_monthly_spending_by_selected_codes.csv')

Next, the script generates annual transaction totals for each subcode of interest:

In [None]:
df_expenses_by_selected_subcodes_by_year = \
df_expenses_by_subcode_by_year.query(
    "~Code.isin(['I', 'G', 'J', 'S', 'V', 'Z'])").dropna(
        subset = 'Code').copy()
df_expenses_by_selected_subcodes_by_year

In [None]:
df_expenses_by_selected_subcodes_by_year.to_csv(summarized_data_folder + '\\' 
    + str(
        transactions_year)+'_annual_spending_by_selected_subcodes_by_year.csv')

In [None]:
sum(df_expenses_by_selected_subcodes_by_month['Amount']) # This line calculates
# the total amount of spending within the filtered list of subcodes.

## Visualizing spending through interactive Plotly bar charts

Plotly is a fantastic library for creating interactive bar charts. You can hover over each bar to see the actual value corresponding to that bar, and you can also zoom and pan within the charts to get a closer look at your data. 

Static charts are easier to share, but Plotly allows you to convert interactive charts to static ones, which I'll demonstrate soon.

First, I'll create a chart showing my annual spending for each subcode in my filtered subcode list. Each bar represents a particular code, and the bars themselves are made of 'stacked' sums for each transaction subcode.

In [None]:
# See https://plotly.com/python/bar-charts/
fig_yearly_spending_by_selected_subcodes = px.bar(
    df_expenses_by_selected_subcodes_by_year, 
    x = 'Code_Description', y = 'Amount', color = 'Subcode_Description')
fig_yearly_spending_by_selected_subcodes.show()

Note that the HTML output of this chart likely won't appear within GitHub. You'll need to download this project and run it locally on your computer in order to display the HTML chart output.

It's easy to export this chart to a locally stored .html file:

In [None]:
fig_yearly_spending_by_selected_subcodes.write_html(
    interactive_charts_folder + '\\'+str(
        transactions_year)+'_fig_yearly_spending_by_selected_subcodes.html') 

This HTML chart can also be rendered as a static .png image. I'll show two methods of creating this image file.

The first method uses kaleido, a library recommended by Plotly for image exports. As noted above, I wasn't able to get the latest version of kaleido to work on my computer, so I instead switched to an older version.

In [None]:
image_width = 1500 # Larger image widths produce a higher-quality graph, 
# but they also make the text of the HTML page appear smaller by contrast. 
# I found 1500 pixels to be a good compromise. Interestingly, when I tried 
# setting the image width as 3840 (i.e. UHD resolution) at one point, 
# the x axis did not line up properly with the chart.
image_height = image_width * 9/16 # Maintains the 16:9 aspect ratio 
# used by most monitors
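
Note that 1500 * 9/16 equals 843.75, which is not a whole number of pixels. The value above worked for me, but if an export tool expects integer dimensions (an assumption on my part), the height could be rounded as in this sketch. height_for_width is a hypothetical helper name:

```python
def height_for_width(width, aspect_w=16, aspect_h=9):
    # Returns the whole-pixel height that most closely matches the
    # given width at the requested aspect ratio (16:9 by default).
    return round(width * aspect_h / aspect_w)

image_width = 1500
image_height = height_for_width(image_width)
print(image_width, image_height)
```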

The next cell converts the HTML chart created above to a static .png file using the width and height parameters specified earlier.

In [None]:
file_path = static_charts_folder+'\\'+str(
    transactions_year)+'_yearly_spending_by_selected_subcodes.png'

fig_yearly_spending_by_selected_subcodes.write_image(file_path,
    width = image_width, height = image_height, engine = 'kaleido')
# See https://plotly.com/python/static-image-export/

And here's a copy of this image:

In [None]:
Image(filename = file_path)

If you have trouble getting kaleido to work, you can also use the Selenium library to automatically load the HTML charts within a web browser and take screenshots of them.

In [None]:
generate_screenshot(
path_to_html = os.path.join(os.getcwd(), interactive_charts_folder),
html_name = str(
    transactions_year)+'_fig_yearly_spending_by_selected_subcodes.html', 
path_to_image = os.path.join(os.getcwd(), static_charts_folder), 
image_name = str(transactions_year)+\
'_yearly_spending_by_selected_subcodes_using_selenium', 
image_extension = '.png',
window_width = 1500) 
# See https://docs.python.org/3/library/os.path.html for the use of os.path.join().

This image looks very similar to the one generated using kaleido:

In [None]:
Image(filename = static_charts_folder+'\\'+str(
    transactions_year
    )+'_yearly_spending_by_selected_subcodes_using_selenium.png')

To simplify my code, I'll use just kaleido to create images of later charts within this project.

The next chart displays transaction amounts for each subcode on a monthly basis, allowing you to analyze changes in your spending over time.

In [None]:
fig_monthly_spending_by_selected_subcodes = px.bar(
    df_expenses_by_selected_subcodes_by_month, x = 'Month', y = 'Amount', 
    color = 'Subcode_Description')
fig_monthly_spending_by_selected_subcodes.write_html(
    interactive_charts_folder + '\\'+str(
        transactions_year)+'_monthly_spending_by_subcode.html') 
fig_monthly_spending_by_selected_subcodes.show()

In [None]:
# Saving the above chart as a .png image using kaleido:

file_path =  static_charts_folder+'\\'+str(
    transactions_year)+'_monthly_spending_by_selected_subcodes.png'

fig_monthly_spending_by_selected_subcodes.write_image(
    file_path, width = image_width, height = image_height, engine = 'kaleido')
# See https://plotly.com/python/static-image-export/
Image(filename = file_path)

Here's a similar chart that visualizes monthly expenses by code rather than by subcode.

One benefit of Plotly is that it makes filtering charts pretty easy. Suppose that you want to see what your monthly expenses look like without certain discretionary categories (such as travel and dining out). On the HTML version of this chart, you can simply click on the 'Dining' and 'Travel' categories to exclude them. The chart will then update to display only your non-dining and non-travel expenses.

In [None]:
fig_monthly_spending_by_selected_codes = px.histogram(
    df_expenses_by_selected_subcodes_by_month, x = 'Month', y = 'Amount', 
    color = 'Code_Description', histfunc = 'sum')
# See https://plotly.com/python/histograms/ . I created a histogram instead 
# of a bar chart here in order to eliminate the lines in between different
# subcodes, which can be distracting.
fig_monthly_spending_by_selected_codes.update_layout(bargroupgap = 0.2)
fig_monthly_spending_by_selected_codes.write_html(
    interactive_charts_folder + '\\'+str(
        transactions_year)+'_fig_monthly_spending_by_selected_codes.html') 
fig_monthly_spending_by_selected_codes.show()
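
With histfunc = 'sum', each colored segment in the histogram represents the total of all rows that share a month and a code description. As a rough check on what the chart should display, those totals can be reproduced in pandas. The rows below are made up for illustration:

```python
import pandas as pd

# Toy subcode-level rows for illustration only
toy = pd.DataFrame({
    'Month': [1, 1, 1, 2],
    'Code_Description': ['Dining', 'Dining', 'Travel', 'Dining'],
    'Subcode_Description': ['Restaurants', 'Coffee', 'Flights',
        'Restaurants'],
    'Amount': [30.0, 10.0, 200.0, 25.0]})

# Summing over subcodes yields one total per month and code, which
# corresponds to one colored segment per month in the chart.
segment_totals = toy.groupby(['Month', 'Code_Description'],
    as_index=False)['Amount'].sum()
print(segment_totals)
```

Here, the two month-1 Dining rows are combined into a single total of 40.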

In [None]:
file_path = static_charts_folder+'\\'+str(
    transactions_year)+'_monthly_spending_by_selected_codes.png'

fig_monthly_spending_by_selected_codes.write_image(file_path,
    width = image_width, height = image_height, engine = 'kaleido')
# See https://plotly.com/python/static-image-export/
Image(filename = file_path)

This final chart visualizes monthly spending by code as a line chart.

In [None]:
fig_monthly_spending_by_selected_codes_line_chart = px.line(
    df_expenses_by_selected_codes_by_month, x = 'Month', y = 'Amount', 
    color = 'Code_Description')
# See https://plotly.com/python/line-charts/

fig_monthly_spending_by_selected_codes_line_chart.write_html(
    interactive_charts_folder + '\\'+str(
        transactions_year
        )+'_fig_monthly_spending_by_selected_codes_line_chart.html') 
fig_monthly_spending_by_selected_codes_line_chart.show()

In [None]:
file_path = static_charts_folder+'\\'+str(
    transactions_year)+'_monthly_spending_by_selected_codes_line_chart.png'

fig_monthly_spending_by_selected_codes_line_chart.write_image(file_path, 
    width = image_width, height = image_height, engine = 'kaleido')
# See https://plotly.com/python/static-image-export/
Image(filename = file_path)

I may update this project in the future to show how to visualize transactions over multiple years. One option would be to use the monthly expense bar and line charts as a starting point (with the x axis representing years instead of months).

But for now, that's it for this script. I hope that you find this project to be a useful starting point for coding and visualizing your own financial transactions!

