As the year wraps up, your role as a business analyst at a rapidly growing start-up takes a new turn. The VP of Finance has reached out, seeking help with the massive general ledger (GL) data that has grown beyond what the financial analysts can manage in Excel. Your task? Automate the workflow to create a consolidated Profit and Loss (P&L) statement for monthly reporting.

You have been provided with 2024’s GL data and the chart of accounts. Now it is your job to organize the transaction data by month and GL class to streamline the financial reporting process.

In [2]:
import pandas as pd
import numpy as np
import os

path = r'G:\My Drive\Python Challenges\Alteryx Challenges in Python\457'

gl = pd.read_csv(os.path.join(path, 'General Ledger Data.csv'), index_col = False)

accounts = pd.read_csv(os.path.join(path, 'Chart of Accounts.csv'), index_col = False)

In [3]:
gl.head()

Unnamed: 0,EntryNo,Date,Territory_key,Account_key,Details,Amount
0,1.1,2024-01-01,1,230,Cost of Sales,-884
1,2.1,2024-01-01,1,230,Cost of Sales,-1120
2,3.1,2024-01-01,1,280,Credit Expenses,-2394
3,4.1,2024-01-01,1,210,Credit Sales,2948
4,5.1,2024-01-01,1,210,Cash Sales,3734


In [17]:
gl.value_counts()

EntryNo  Date        Territory_key  Account_key  Details                     Amount
1.1      2024-01-01  1              230          Cost of Sales               -884      1
1321.1   2019-12-31  4              380          Depreciation for the month  -5950     1
                     6              380          Depreciation for the month  -4900     1
                     7              380          Depreciation for the month  -4900     1
         2020-01-01  1              210          Cash Sales                   8065     1
                                                                                      ..
657.1    2019-01-01  1              230          Cost of Sales               -1370     1
                     2              230          Cost of Sales               -343      1
                     5              230          Cost of Sales               -445      1
                     6              230          Cost of Sales               -856      1
1992.3   2020-12-30  7    

In [7]:
accounts.head(20)

Unnamed: 0,Account_key,Report,Class,Account
0,210,Profit and Loss,Sales,Sales
1,220,Profit and Loss,Sales,Sales Return
2,230,Profit and Loss,Cost of Sales,Cost of Sales
3,240,Profit and Loss,Operating Expenses,Staff Costs
4,250,Profit and Loss,Operating Expenses,Bad Debt Expense
5,260,Profit and Loss,Operating Expenses,Commissions
6,270,Profit and Loss,Operating Expenses,Conferences
7,280,Profit and Loss,Operating Expenses,Advertisements
8,290,Profit and Loss,Operating Expenses,Travel
9,300,Profit and Loss,Operating Expenses,Entertainment


In [18]:
df = gl.merge(accounts, how='left', on=['Account_key'])
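A left merge keeps every ledger row, but it does not say whether an account key failed to match or whether the chart of accounts contains duplicate keys. The sketch below shows one way to check both, using the `validate` and `indicator` parameters of `DataFrame.merge`. The frames `gl_demo` and `accounts_demo` are made-up stand-ins, not the real files, and account `999` is a hypothetical unmatched key.

```python
import pandas as pd

# Toy stand-ins for the real files; values are illustrative only.
gl_demo = pd.DataFrame({
    "Account_key": [230, 210, 999],   # 999 is deliberately absent from the chart
    "Amount": [-884, 2948, 100],
})
accounts_demo = pd.DataFrame({
    "Account_key": [210, 230],
    "Class": ["Sales", "Cost of Sales"],
})

# validate="many_to_one" raises MergeError if Account_key repeats in the chart,
# which would otherwise silently duplicate ledger rows.
# indicator=True adds a "_merge" column recording where each row found a match.
merged = gl_demo.merge(
    accounts_demo,
    how="left",
    on="Account_key",
    validate="many_to_one",
    indicator=True,
)

# Ledger rows whose account has no entry in the chart of accounts.
unmatched = merged.loc[merged["_merge"] == "left_only", "Account_key"].tolist()
print(unmatched)
```

If `unmatched` is empty on the real data, every transaction has a `Class` and none will drop out of the pivot later on.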

In [19]:
df.head()

Unnamed: 0,EntryNo,Date,Territory_key,Account_key,Details,Amount,Report,Class,Account
0,1.1,2024-01-01,1,230,Cost of Sales,-884,Profit and Loss,Cost of Sales,Cost of Sales
1,2.1,2024-01-01,1,230,Cost of Sales,-1120,Profit and Loss,Cost of Sales,Cost of Sales
2,3.1,2024-01-01,1,280,Credit Expenses,-2394,Profit and Loss,Operating Expenses,Advertisements
3,4.1,2024-01-01,1,210,Credit Sales,2948,Profit and Loss,Sales,Sales
4,5.1,2024-01-01,1,210,Cash Sales,3734,Profit and Loss,Sales,Sales


In [23]:
# Lazy way of taking month and year
df['Year'] = df['Date'].str[0:4]
df['Month'] = df['Date'].str[5:7]
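String slicing works as long as every date is formatted exactly as `YYYY-MM-DD`. As an alternative sketch, the dates can be parsed once and the parts read from the `.dt` accessor. The frame `dates_demo` is invented for illustration. Note that `Year` and `Month` come out as integers here, so a later filter would compare against `2024` rather than `'2024'`.

```python
import pandas as pd

# Invented sample dates; the real column is gl['Date'].
dates_demo = pd.DataFrame({"Date": ["2024-01-01", "2024-02-15", "2023-12-31"]})

# Parse once; a malformed date raises here instead of slicing into garbage.
parsed = pd.to_datetime(dates_demo["Date"], format="%Y-%m-%d")

dates_demo["Year"] = parsed.dt.year                # integer year
dates_demo["Month"] = parsed.dt.month              # integer month, 1 to 12
dates_demo["Month Name"] = parsed.dt.month_name()  # English names by default

# Filter to one year; .copy() gives an independent frame that is safe to modify.
only_2024 = dates_demo.loc[dates_demo["Year"] == 2024].copy()
print(only_2024)
```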

In [29]:
# Filter results to just 2024 data
df24 = df.loc[df['Year']=='2024'].copy()  # .copy() avoids SettingWithCopyWarning if df24 is modified later

In [31]:
df24.head()

Unnamed: 0,EntryNo,Date,Territory_key,Account_key,Details,Amount,Report,Class,Account,Month Year,Year,Month
0,1.1,2024-01-01,1,230,Cost of Sales,-884,Profit and Loss,Cost of Sales,Cost of Sales,2024-01,2024,1
1,2.1,2024-01-01,1,230,Cost of Sales,-1120,Profit and Loss,Cost of Sales,Cost of Sales,2024-01,2024,1
2,3.1,2024-01-01,1,280,Credit Expenses,-2394,Profit and Loss,Operating Expenses,Advertisements,2024-01,2024,1
3,4.1,2024-01-01,1,210,Credit Sales,2948,Profit and Loss,Sales,Sales,2024-01,2024,1
4,5.1,2024-01-01,1,210,Cash Sales,3734,Profit and Loss,Sales,Sales,2024-01,2024,1


In [36]:
from datetime import datetime

In [42]:
# Deriving month names
df['Month Name'] = pd.to_datetime(df['Date'], format='%Y-%m-%d').dt.strftime('%B')

In [43]:
df.head()

Unnamed: 0,EntryNo,Date,Territory_key,Account_key,Details,Amount,Report,Class,Account,Month Year,Year,Month,Month Name
0,1.1,2024-01-01,1,230,Cost of Sales,-884,Profit and Loss,Cost of Sales,Cost of Sales,2024-01,2024,1,January
1,2.1,2024-01-01,1,230,Cost of Sales,-1120,Profit and Loss,Cost of Sales,Cost of Sales,2024-01,2024,1,January
2,3.1,2024-01-01,1,280,Credit Expenses,-2394,Profit and Loss,Operating Expenses,Advertisements,2024-01,2024,1,January
3,4.1,2024-01-01,1,210,Credit Sales,2948,Profit and Loss,Sales,Sales,2024-01,2024,1,January
4,5.1,2024-01-01,1,210,Cash Sales,3734,Profit and Loss,Sales,Sales,2024-01,2024,1,January


In [45]:
solution = df.pivot_table(values='Amount', index='Class', columns='Month Name', aggfunc='sum')

In [46]:
# Yeowch, they're totally out of order
solution

Month Name,April,August,December,February,January,July,June,March,May,November,October,September
Class,,,,,,,,,,,,
Cost of Sales,-362082.0,-539100.0,-588184.0,-212342.0,-333820.0,-558773.0,-339505.0,-286774.0,-345751.0,-735648.0,-591033.0,-522478.0
Depreciation & Amortization,-137825.0,-137825.0,-137825.0,-137825.0,-137825.0,-137825.0,-137825.0,-137825.0,-137825.0,-137825.0,-137825.0,-137825.0
Dividend,,,,,,,,,,67698.0,,
Exchange Loss/Gain,1163.0,-348.0,456.0,317.0,988.0,1072.0,487.0,1699.0,1368.0,1555.0,1758.0,420.0
Gain/Loss on Sales of Asset,,,16286.0,,,,,,,,,
Interest,637.0,-1889.0,110.0,3180.0,-657.0,-1203.0,-424.0,1794.0,-183.0,-798.0,-1161.0,-1265.0
Operating Expenses,-481339.0,-576509.0,-399224.0,-445218.0,-466506.0,-618477.0,-512179.0,-523234.0,-569877.0,-590561.0,-554415.0,-563058.0
Sales,1103670.0,1713521.0,1867480.0,651190.0,1046468.0,1777538.0,1067888.0,894417.0,1120884.0,2384673.0,1831216.0,1649697.0
Tax,-19486.0,-79868.0,-129231.0,33171.0,-9846.0,-86142.0,-8001.0,22898.0,-5809.0,-167649.0,-96507.0,-68297.0


In [56]:
# Taking the unique months in order of first appearance, which happens to be calendar order here (see the output below)
month_order = df['Month Name'].unique()

In [58]:
month_order

array(['January', 'February', 'March', 'April', 'May', 'June', 'July',
       'August', 'September', 'October', 'November', 'December'],
      dtype=object)

In [59]:
# Converting the column to an ordered Categorical, so the order of month_order becomes part of the column itself
df['Month Name'] = pd.Categorical(df['Month Name'], categories=month_order, ordered=True)

In [60]:
# Now that the column has an "intrinsic" order, pivot_table sorts the columns by that order instead of alphabetically (the default for plain strings)
solution = df.pivot_table(values='Amount', index='Class', columns='Month Name', aggfunc='sum')
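The ordering above depends on `unique()` returning the months in calendar order, which holds only when the rows are already sorted by date. The sketch below is one alternative that does not depend on row order: it builds the twelve month names from a date range and reindexes the pivoted columns against that list. The frame `demo` is invented toy data, and filling missing months with 0 is an assumption about how gaps should be reported.

```python
import pandas as pd

# Twelve English month names in calendar order, independent of the data.
month_names = (
    pd.date_range("2024-01-01", periods=12, freq="MS").month_name().tolist()
)

# Toy data, deliberately out of calendar order and missing most months.
demo = pd.DataFrame({
    "Class": ["Sales", "Sales", "Tax", "Sales"],
    "Month Name": ["March", "January", "January", "March"],
    "Amount": [300, 100, -10, 50],
})

table = demo.pivot_table(
    values="Amount", index="Class", columns="Month Name", aggfunc="sum"
)

# Put the columns in calendar order, add any missing months,
# and show empty cells as 0 (an assumption about the desired report).
table = table.reindex(columns=month_names).fillna(0)
print(table)
```

Reindexing also guarantees that all twelve columns appear, even for a month with no transactions at all.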

In [61]:
solution

Month Name,January,February,March,April,May,June,July,August,September,October,November,December
Class,,,,,,,,,,,,
Cost of Sales,-333820,-212342,-286774,-362082,-345751,-339505,-558773,-539100,-522478,-591033,-735648,-588184
Depreciation & Amortization,-137825,-137825,-137825,-137825,-137825,-137825,-137825,-137825,-137825,-137825,-137825,-137825
Dividend,0,0,0,0,0,0,0,0,0,0,67698,0
Exchange Loss/Gain,988,317,1699,1163,1368,487,1072,-348,420,1758,1555,456
Gain/Loss on Sales of Asset,0,0,0,0,0,0,0,0,0,0,0,16286
Interest,-657,3180,1794,637,-183,-424,-1203,-1889,-1265,-1161,-798,110
Operating Expenses,-466506,-445218,-523234,-481339,-569877,-512179,-618477,-576509,-563058,-554415,-590561,-399224
Sales,1046468,651190,894417,1103670,1120884,1067888,1777538,1713521,1649697,1831216,2384673,1867480
Tax,-9846,33171,22898,-19486,-5809,-8001,-86142,-79868,-68297,-96507,-167649,-129231


Per Claude:
Is a DataFrame a collection of arrays?
Conceptually, yes! Under the hood, pandas stores each column as a NumPy array (or similar structure), but wraps them with additional functionality like labels, indexing, and data type handling. So a DataFrame is like a collection of Series (1D structures), each backed by arrays, all sharing the same index.
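A small sketch makes the description above concrete: selecting a column gives a Series, the Series exposes its data as a NumPy array, and every column shares the frame's index. The frame `frame` and its index labels are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up two-row frame with a custom index.
frame = pd.DataFrame(
    {"Amount": [-884, 2948], "Class": ["Cost of Sales", "Sales"]},
    index=["a", "b"],
)

amount = frame["Amount"]      # selecting one column returns a Series
values = amount.to_numpy()    # the Series hands back its data as a NumPy array

print(type(amount).__name__)
print(type(values).__name__)

# Each column carries the same index as the frame it came from.
print(amount.index.equals(frame.index))
print(frame["Class"].index.equals(frame.index))
```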