### Problem Description

Given a set of transaction records (CSV for now), create a separate ledger for each distinct entity named in the records.

#### Example

Date; Description; Credit; Debit; Entity; Ledger

12-03-2022; GOOGLE SERVICES; ; 50; ANT; IRS  
11-03-2022; GOOGLE SERVICES; ; 50; ANT; IRS  
10-03-2022; GOOGLE SERVICES; ; 50; ANT; IRS  
09-03-2022; GOOGLE SERVICES; ; 50; ANT; IRS  
12-03-2022; ADOBE CREATIVE; ; 75; ANT; IRS  
11-03-2022; ADOBE CREATIVE; ; 75; ANT; IRS  
10-03-2022; ADOBE CREATIVE; ; 75; ANT; IRS  
09-03-2022; ADOBE CREATIVE; ; 75; ANT; IRS  
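
As a rough illustration of the goal, the records can be split into one table per entity with a pandas `groupby`. This is only a sketch: the first three rows are copied from the example above, while the `BEE` row and the `ledgers` name are hypothetical and exist only so that more than one ledger is produced. Dates are left as strings because the example does not say whether they are day-first or month-first.

```python
import io

import pandas as pd

# First three rows come from the example above; the BEE row is hypothetical.
raw = """Date; Description; Credit; Debit; Entity; Ledger
12-03-2022; GOOGLE SERVICES; ; 50; ANT; IRS
11-03-2022; GOOGLE SERVICES; ; 50; ANT; IRS
12-03-2022; ADOBE CREATIVE; ; 75; ANT; IRS
12-03-2022; EXAMPLE VENDOR; ; 20; BEE; IRS
"""

# skipinitialspace drops the blank that follows each ';' delimiter.
records = pd.read_csv(io.StringIO(raw), sep=";", skipinitialspace=True)

# One DataFrame (ledger) per distinct entity.
ledgers = {
    entity: rows.reset_index(drop=True)
    for entity, rows in records.groupby("Entity")
}

for entity, ledger in ledgers.items():
    print(entity, "total debit:", ledger["Debit"].sum())
```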

### Use Case

- The user can upload a CSV file or point to ('target') a data source
- Transactions that are 'similar' are grouped together automatically
- A label applied once to any member of a group propagates to the other members of that group
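
The label propagation described in the last bullet could be sketched as follows. This assumes that groups are keyed by the description column and that the `Type` column holds the label; `propagate_labels` is a hypothetical helper, not something taken from the project.

```python
import pandas as pd


def propagate_labels(df, group_col, label_col):
    """Copy the first non-null label found in each group to every row of that group."""
    labels = (
        df.dropna(subset=[label_col])
        .drop_duplicates(subset=[group_col])
        .set_index(group_col)[label_col]
    )
    # Groups without any labeled member map to NaN and stay unlabeled.
    return df.assign(**{label_col: df[group_col].map(labels)})


# Small hypothetical sample: only one GITHUB row has been labeled by the user.
sample = pd.DataFrame({
    "TRANSACTION DESCRIPTION": ["GITHUB", "GITHUB", "BUNDLE FEE WAIVER", "GITHUB"],
    "Type": [None, "Service – Software", None, None],
})
labeled = propagate_labels(sample, "TRANSACTION DESCRIPTION", "Type")
print(labeled)
```

If a group carries conflicting labels, this sketch keeps whichever appears first; a real implementation would probably need to surface that conflict to the user.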

Example transaction files are in biz-docs -> finances.

Helper methods are available at https://github.com/omegahorizontech/psihesion/blob/main/app/backend/helpers/readers.py

In [1]:
import pandas as pd


In [47]:
transactions = pd.read_excel("OmegaHorizon-checking-2022Transactions.ods", engine="odf")
print(transactions.head())

  POSTING DATE  DEPOSITS & OTHER CREDITS (+)  WITHDRAWALS & OTHER DEBITS (-)  \
0   2022-01-04                           NaN                            6.81   
1   2022-02-04                           NaN                            1.81   
2   2022-03-04                           NaN                            1.81   
3   2022-04-04                           NaN                            1.81   
4   2022-05-03                           NaN                            1.81   

               TRANSACTION DESCRIPTION                Type  
0  Amazon web services   aws.amazon.co  Service – Software  
1  Amazon web services   aws.amazon.co  Service – Software  
2  Amazon web services   aws.amazon.co  Service – Software  
3  Amazon web services   aws.amazon.co  Service – Software  
4  Amazon web services   aws.amazon.co  Service – Software  


In [48]:
# Group similar transactions (for now, "similar" means an identical description)
groups = transactions.groupby("TRANSACTION DESCRIPTION")
group_info = pd.DataFrame()

# Select info to display about groups.
# Only the needed columns are aggregated: groups.sum() over every column
# can raise on non-numeric columns such as dates in recent pandas versions.
group_info["COUNTS"] = groups["POSTING DATE"].count()
group_info["DEPOSITS"] = groups["DEPOSITS & OTHER CREDITS (+)"].sum()
group_info["DEBITS"] = groups["WITHDRAWALS & OTHER DEBITS (-)"].sum()
group_info["GROUP_ID"] = range(len(group_info))

print(group_info)

                                            COUNTS  DEPOSITS  DEBITS  GROUP_ID
TRANSACTION DESCRIPTION                                                       
Amazon web services   aws.amazon.co             14       0.0  109.72         0
BUNDLE FEE WAIVER                               12     300.0    0.00         1
GITHUB                HTTPSGITHUB.C             12       0.0   96.00         2
INTUIT *QBooks Online CL.INTUIT.COM              6       0.0  180.00         3
INTUIT *QuickBooks OnlCL.INTUIT.COM              6       0.0  150.00         4
INTUIT *TURBOTAX      CL.INTUIT.COM              2       0.0  290.00         5
NOUNPROJECT.COM       THENOUNPROJEC              1       0.0   39.99         6
SERVICE CHARGE FOR ACCOUNT 000009872519609      12       0.0  300.00         7
WARREN COUUNTY ONLINE 540-6352215                2       0.0   19.99         8
WEB XFER FROM CHK 00009875147531                 2    1120.0    0.00         9
WEB XFER TO CHK   00009875147531                 1  
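
Grouping on the exact description keeps near-duplicates apart, for example the two `INTUIT *Q...` descriptions in the table above. One possible way to loosen 'similar' is fuzzy string matching. The sketch below uses `difflib.SequenceMatcher` from the standard library with a greedy pass; the `normalize` and `cluster_descriptions` names and the `0.8` threshold are assumptions chosen for illustration, and whether two descriptions really belong together remains a judgment call.

```python
from difflib import SequenceMatcher


def normalize(description):
    """Upper-case the text and collapse runs of whitespace."""
    return " ".join(description.upper().split())


def cluster_descriptions(descriptions, threshold=0.8):
    """Greedily assign each description to the first cluster whose
    representative is at least `threshold` similar, else open a new cluster."""
    representatives = []  # normalized text of the first member of each cluster
    assignment = {}       # original description -> cluster id
    for description in descriptions:
        key = normalize(description)
        for cluster_id, representative in enumerate(representatives):
            if SequenceMatcher(None, key, representative).ratio() >= threshold:
                assignment[description] = cluster_id
                break
        else:
            representatives.append(key)
            assignment[description] = len(representatives) - 1
    return assignment


# Descriptions copied from the grouped output above.
ids = cluster_descriptions([
    "INTUIT *QBooks Online CL.INTUIT.COM",
    "INTUIT *QuickBooks OnlCL.INTUIT.COM",
    "INTUIT *TURBOTAX      CL.INTUIT.COM",
    "GITHUB                HTTPSGITHUB.C",
])
for description, cluster_id in ids.items():
    print(cluster_id, description)
```

The greedy pass depends on input order and compares against one representative per cluster, so it is a starting point rather than a robust clustering method.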

In [59]:
print(groups.groups)
print(dir(groups))

{'Amazon web services   aws.amazon.co': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13], 'BUNDLE FEE WAIVER': [14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25], 'GITHUB                HTTPSGITHUB.C': [26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37], 'INTUIT *QBooks Online CL.INTUIT.COM': [38, 39, 40, 41, 42, 43], 'INTUIT *QuickBooks OnlCL.INTUIT.COM': [44, 45, 46, 47, 48, 49], 'INTUIT *TURBOTAX      CL.INTUIT.COM': [50, 51], 'NOUNPROJECT.COM       THENOUNPROJEC': [52], 'SERVICE CHARGE FOR ACCOUNT 000009872519609': [53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64], 'WARREN COUUNTY ONLINE 540-6352215': [65, 66], 'WEB XFER FROM CHK 00009875147531': [67, 68], 'WEB XFER TO CHK   00009875147531': [69]}
['Type', '__annotations__', '__class__', '__class_getitem__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattr__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__m

In [66]:
# group_info.assign(ID=pd.RangeIndex(len(group_info.index)))

# grouped_transactions = transactions.join(groups.index)
# grouped_transactions = transactions.assign(GROUP_ID=group_info[group_info.index == transactions["TRANSACTION DESCRIPTION"]]["GROUP_ID"])
# Attach each transaction's GROUP_ID by joining on the description.
grouped_transactions = transactions.join(group_info["GROUP_ID"], on="TRANSACTION DESCRIPTION")

print(grouped_transactions)

   POSTING DATE  DEPOSITS & OTHER CREDITS (+)  WITHDRAWALS & OTHER DEBITS (-)  \
0    2022-01-04                           NaN                            6.81   
1    2022-02-04                           NaN                            1.81   
2    2022-03-04                           NaN                            1.81   
3    2022-04-04                           NaN                            1.81   
4    2022-05-03                           NaN                            1.81   
..          ...                           ...                             ...   
65   2022-01-06                           NaN                           10.20   
66   2022-06-21                           NaN                            9.79   
67   2022-03-02                         620.0                             NaN   
68   2022-12-29                         500.0                             NaN   
69   2022-12-19                           NaN                          250.00   

                TRANSACTION
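
A shorter route to the same per-row group number may be `GroupBy.ngroup()`, which numbers groups in sorted key order by default and so should match the `GROUP_ID` values produced via `group_info`. The sketch below uses a small hypothetical frame rather than the real transaction file.

```python
import pandas as pd

# Hypothetical stand-in for the transactions table.
transactions = pd.DataFrame({
    "TRANSACTION DESCRIPTION": ["GITHUB", "BUNDLE FEE WAIVER", "GITHUB"],
})

# ngroup() returns, for every row, the integer id of the group it belongs to.
transactions["GROUP_ID"] = (
    transactions.groupby("TRANSACTION DESCRIPTION").ngroup()
)
print(transactions)
```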

In [52]:
# grouped_transactions.index = grouped_transactions["GROUP_ID"]

print(grouped_transactions.head(10))

  POSTING DATE  DEPOSITS & OTHER CREDITS (+)  WITHDRAWALS & OTHER DEBITS (-)  \
0   2022-01-04                           NaN                            6.81   
1   2022-02-04                           NaN                            1.81   
2   2022-03-04                           NaN                            1.81   
3   2022-04-04                           NaN                            1.81   
4   2022-05-03                           NaN                            1.81   
5   2022-06-06                           NaN                            1.81   
6   2022-07-05                           NaN                            1.81   
7   2022-08-04                           NaN                            1.81   
8   2022-09-06                           NaN                            1.81   
9   2022-09-30                           NaN                           71.00   

               TRANSACTION DESCRIPTION                Type  COUNTS  DEPOSITS  \
0  Amazon web services   aws.amazon.co 

In [77]:
{k: 0 for k in grouped_transactions.itertuples()}

AttributeError: 'DataFrame' object has no attribute 'records'
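
A pandas `DataFrame` has no `records` attribute, which is what the error message reports. Row records are available through `to_dict("records")` (a list of dicts) or `itertuples()` (namedtuples). The sketch below shows both on a small hypothetical frame; note that `itertuples()` renames columns that are not valid Python identifiers, such as `TRANSACTION DESCRIPTION`, to positional names.

```python
import pandas as pd

# Hypothetical stand-in for grouped_transactions.
grouped_transactions = pd.DataFrame({
    "TRANSACTION DESCRIPTION": ["GITHUB", "BUNDLE FEE WAIVER"],
    "GROUP_ID": [1, 0],
})

# One dict per row, keyed by the original column names.
records = grouped_transactions.to_dict("records")
print(records[0]["TRANSACTION DESCRIPTION"])

# One namedtuple per row; the first column contains a space,
# so it is exposed under the positional name _0.
rows = list(grouped_transactions.itertuples(index=False))
print(rows[0]._0, rows[0].GROUP_ID)
```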