<a id='home'></a>
# Adventures in Data: Making Fake Balance Sheet and P&L Data

### Original Goal: I wanted to design cool applications for accounting/finance data.

I originally wanted to set up some pipelines feeding dashboards in Tableau and D3, and to build some interesting visualizations and data automation along the way. We had one client with several billion rows of data, which was perfect for gaining insights and optimizing for large-scale processing. No finance person working in Excel is going to be able to extract decent insights from a dataset that big. But it didn't happen...

### Problem: Companies are shy - no company wants to publish their transactional data. 

Essentially, the company said no. In fact, 2-3 other clients also said no. Understandably, companies are sensitive, especially about profitability information. When I worked on a few projects with pre-earnings-release data, they locked us in a conference room and covered the glass entrance door with paper and tape to ensure the revenue numbers didn't leak before the announcement. All the data was heavily password protected and segregated from the rest of the analytics team.

### Solution: Make a random data generator to generate detailed 10-K-style balance sheets and P&L. 

So instead of hoping and waiting for a company nice enough to open its innards for experimentation, I decided to see if I could design generic company data myself. This comes with trade-offs:

Pros: 
- could generate data at will
- no privacy issues
- could share the piping / visualization code and the source data

Cons:
- have to design it
- have to consider some "reasonableness" to the data
- accounting data follows some specific rules
- have to make it varied and interesting enough: 1000 dollars shouldn't always be split as 100 x 10
- stretch: design time-series incidents to the data
- stretch: auto generate accounting transactions at a document level

### Current Project Status:

- Oct 2016, prototyped the initial class, will generate annual data 

### To do

- Implement time-series by month
- implement performance events
- implement generation of document-level transactional data
- implement "bad data" factor - arbitrarily remove different fields to simulate real data issues

###  TLDR: Jump to the classes: RCU , FakeCompanyClass <a href=#class>link</a>

The completed class is long, and is located at the bottom of this page

# How I designed my class in Python

<a href='#home'>Back to the top</a>
## 1. A primer: Balance Sheets on company financials

Every company has one. Check www.sec.gov to look up your favorite company: find either the 10-Q or the 10-K report and look under "balance sheet". For an example from a famous fruit company, take a look here:
http://investor.apple.com/financials.cfm


#### Definition
From investopedia: http://www.investopedia.com/terms/b/balancesheet.asp

"What is a 'Balance Sheet'
A balance sheet is a financial statement that summarizes a company's assets, liabilities and shareholders' equity at a specific point in time. These three balance sheet segments give investors an idea as to what the company owns and owes, as well as the amount invested by shareholders.

The balance sheet adheres to the following formula:

Assets = Liabilities + Shareholders' Equity"

<a href='#home'>Back to the top</a>
## 2. Some Target Goals

1. Make it scalable: want to generate anywhere from 1000 rows up to 100M rows of randomized data
2. Data must still follow the accounting rules: specific accounts must add up, and certain line items should be negative
3. Will need to invent hierarchies for products (for doing hierarchical analysis)
4. Will need to generate account hierarchies, e.g. Accounts Receivable and Allowance for Doubtful Accounts roll up under "receivables"
5. Ideally the numbers should vary. While Assets = Liabilities + Equity, it shouldn't be 0.5 = 0.25 + 0.25 every single time


<a href='#home'>Back to the top</a>
## 3. Import Python Libraries

As a note, pycountry is a great library for international country listings and their standard codes. You may not know this, but these codes are used a lot in business for coding data and identifying legal entities.

In [31]:
# import libraries
import os
import pycountry
import random 
import string
import numpy as np
import pandas as pd
from random_words import RandomWords
from decimal import *

####  Key python library: random_words

Instead of making everything boring with random numbers for IDs, I will use random words to make things more interesting. A sample is below.

In [32]:
for x in range (10):
    print RandomWords().random_word()

hums
official
picks
dents
prisons
date
peg
lighters
guys
goal


# Designing Fake Data: easier said than done

<a href='#home'>Back to the top</a>
## 4. Setting up some general parameters common in company financials (how random):
#### transactional detail, row count, legal entities, and accounts

From my consulting career, I know it's never certain which fields will be available to plot. Ideally, the available transactional detail should be randomized too. I've seeded a listing of some common fields that companies keep in their data. We will later randomly sample from this "field options" list.

In [33]:
maxfields = random.randint(2,10)
fieldOptions = ('COSTCTR','PROFITCTR','PRODCODE','SEGMENT','BUSAREA','PROGRAM','PROJECT','REFE','BGROUP','SKU','SYSTEM')
maxrows = random.randint(100,400)
maxentities = random.randint(4,30)

# will have a minimum of 2 accounts per balance sheet line item
maxaccounts = random.randint(40,120)


### Setup standard balance sheet hierarchy

#### These are the standard financial statement line items : FSLI and sub-level items

    Balancesheet --> FSLI --> subFSLI --> Accounts --> Legal Entity --> Additional Product Detail

From http://www.accountingtools.com/balance-sheet

    Current Assets:

    1-Cash and cash equivalents (+)
    2-Trade and other receivables (+)
    3-Investments (+)
    4-Inventories (+)
    5-Assets held for sale (+)

    Non-Current Assets:
    6-Property, plant, and equipment (+)
    7-Intangible assets (+)
    8-Goodwill (+)

    Current Liabilities:

    9-Trade and other payables (-) 
    10-Accrued expenses (-)
    11-Current tax liabilities (-)
    12-Current portion of loans payable (-)
    13-Other financial liabilities (-)
    14-Liabilities held for sale (-)

    Non-Current Liabilities:
    15-Loans payable (-)
    16-Deferred tax liabilities (-)
    17-Other non-current liabilities (-)

    Equity:
    18-Capital stock (-)
    19-Additional paid-in capital (-)
    20-Retained earnings (-)

#### Make a python dictionary of the above structure

In [34]:
#make a dictionary:
bs_def ={
    'assets':{
        'current':{
            'cash':1
            ,'trade_receivables':2
            ,'investments':3
            ,'inventories':4
            ,'assets_for_sale':5
        }
        ,'non-current':{
            'ppe':6
            ,'intangible':7
            ,'goodwill':8
        }
    }
    , 'liabilities':{
        'current':{
            'trade_payables':9
            ,'accrued_expenses':10
            ,'current_tax':11
            ,'current_loans':12
            ,'other_liab':13
            ,'liab_for_sale':14
        }
        ,'non-current':{
            'loans':15
            , 'def_tax':16
            , 'Other_non_liab':17
        
        }
        
    }
    , 'equity':{
        'equity':{
            'capital_stock':18
            , 'paid_capital':19
            ,'retained_earnings':20
           }
    }
}

In [35]:
bs_def

{'assets': {'current': {'assets_for_sale': 5,
   'cash': 1,
   'inventories': 4,
   'investments': 3,
   'trade_receivables': 2},
  'non-current': {'goodwill': 8, 'intangible': 7, 'ppe': 6}},
 'equity': {'equity': {'capital_stock': 18,
   'paid_capital': 19,
   'retained_earnings': 20}},
 'liabilities': {'current': {'accrued_expenses': 10,
   'current_loans': 12,
   'current_tax': 11,
   'liab_for_sale': 14,
   'other_liab': 13,
   'trade_payables': 9},
  'non-current': {'Other_non_liab': 17, 'def_tax': 16, 'loans': 15}}}
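To sanity-check the dictionary, it helps to walk it back into flat rows and confirm that all 20 line items are present and numbered in order. Below is a minimal sketch (Python 3 syntax, unlike the Python 2 cells here); `bs_def_sketch` is a compact copy of the dictionary above and `line_items` is a hypothetical helper name, not part of the final class.

```python
# Compact copy of the balance sheet definition above.
bs_def_sketch = {
    'assets': {
        'current': {'cash': 1, 'trade_receivables': 2, 'investments': 3,
                    'inventories': 4, 'assets_for_sale': 5},
        'non-current': {'ppe': 6, 'intangible': 7, 'goodwill': 8},
    },
    'liabilities': {
        'current': {'trade_payables': 9, 'accrued_expenses': 10,
                    'current_tax': 11, 'current_loans': 12,
                    'other_liab': 13, 'liab_for_sale': 14},
        'non-current': {'loans': 15, 'def_tax': 16, 'Other_non_liab': 17},
    },
    'equity': {
        'equity': {'capital_stock': 18, 'paid_capital': 19,
                   'retained_earnings': 20},
    },
}


def line_items(definition):
    """Flatten FSLI -> subFSLI -> line item into sorted (number, fsli, sub, name) rows."""
    rows = []
    for fsli, subs in definition.items():
        for sub, items in subs.items():
            for name, number in items.items():
                rows.append((number, fsli, sub, name))
    return sorted(rows)


rows = line_items(bs_def_sketch)
for row in rows:
    print(row)
```

Sorting by the line number restores the statement order, which a plain dictionary printout (like the one above) does not guarantee.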

<a href='#home'>Back to the top</a>
## 5. Seeding Initial Balances into the company structure

Remember that Assets = Liabilities + Equity. We assign a value to Assets, then determine Liabilities and Equity by picking a random percentage split. Liabilities and Equity are stored as negative numbers, so the three totals sum to zero.

In [36]:
def seedCompany():
    # start with 1-50 billion dollars as the initial company balance
    total_company_assets = random.randint(1,50)*1e9

    # back into liabilities with random % distribution
    total_company_liabilities = -random.random()*total_company_assets
    total_company_equity = -(total_company_assets + total_company_liabilities)

    # print the current total balances
    print 'Assets \t\t', "{:,}".format(total_company_assets)
    print 'Liabilities \t', "{:,}".format(total_company_liabilities)
    print 'Equity \t\t', "{:,}".format(total_company_equity)

    company = {
        'assets' :total_company_assets
        , 'liabilities' : total_company_liabilities
        , 'equity': total_company_equity
    }
    return company

company = seedCompany()

Assets 		5,000,000,000.0
Liabilities 	-2,520,892,265.82
Equity 		-2,479,107,734.18
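Since liabilities and equity carry a negative sign, the accounting identity becomes "the three totals sum to zero". A small check like the one below makes that explicit. This is a sketch in Python 3 syntax; `seed_company_sketch` and `is_balanced` are hypothetical names, and the seeded `random.Random` is only there to make the run repeatable.

```python
import random


def seed_company_sketch(rng):
    """Same logic as seedCompany, without the printing and with an explicit RNG."""
    assets = rng.randint(1, 50) * 1e9
    liabilities = -rng.random() * assets      # somewhere between -assets and 0
    equity = -(assets + liabilities)          # whatever is left, also negative
    return {'assets': assets, 'liabilities': liabilities, 'equity': equity}


def is_balanced(company, tol=1e-9):
    """True when assets + liabilities + equity is zero, relative to total assets."""
    return abs(sum(company.values())) <= tol * abs(company['assets'])


company_sketch = seed_company_sketch(random.Random(42))
print(is_balanced(company_sketch))
```

The tolerance is relative to total assets because these are floats in the billions; an exact `== 0` comparison can be fragile once balances are summed in a different order.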


<a href='#home'>Back to the top</a>
## 6. Setting up common randomized split functions

### Create a random splitting function - takes in total balances and splits randomly n times

#### Definition: Pct Split functions

The first returns all-positive percentages that add up to 1.
The second returns positive or negative percentages that still add up to 1, but individual values may swing well outside the 0-1 range.

In [37]:
def splitPosPct(n,swing=100):
    rand_nums = np.array([random.randint(1,swing) for i in range(n)])
    return rand_nums*1.0 / sum(rand_nums)

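def splitSwingPct(n,swing=100):
    # assumes swing >= 2; redraw when the signed draws sum to zero (would divide by zero)
    rand_nums = np.zeros(n)
    while sum(rand_nums) == 0:
        rand_nums = np.array([random.randint(-int(swing/2),int(swing/2)) for i in range(n)])
    return rand_nums*1.0 / sum(rand_nums)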

#### Sample

In [38]:
print splitPosPct(6)
print splitSwingPct(6)

[ 0.13986014  0.26223776  0.04195804  0.21328671  0.25174825  0.09090909]
[-7.    8.75 -7.25  0.75  6.5  -0.75]
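The swing sample above has values like -7 and 8.75, which look odd for "percentages". That happens because the divisor is a sum of signed integers, and that sum can land close to zero. The sample line is consistent with draws of [-28, 35, -29, 3, 26, -3], which sum to only 4. The sketch below (Python 3 syntax) reproduces it from fixed draws; `swing_pct_from_draws` is a hypothetical helper for illustration.

```python
import numpy as np


def swing_pct_from_draws(draws):
    """Normalize signed draws so they sum to 1; refuse a zero sum."""
    draws = np.asarray(draws, dtype=float)
    total = draws.sum()
    if total == 0:
        raise ValueError("draws sum to zero; percentages are undefined")
    return draws / total


draws = [-28, 35, -29, 3, 26, -3]   # sums to 4, so every value is divided by 4
pcts = swing_pct_from_draws(draws)
print(pcts)
print(pcts.sum())
```

The closer the sum of the draws gets to zero, the larger the individual swings, so the size of the "percentages" is itself random.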


#### Definition: Sum split
For an input sum, returns n random integers that add up to the original sum 

In [39]:
def intSplit(count,n, minm=0,swing=100):
    base = [minm for i in range(n)]
    subcount = count - sum(base)
    first = [int(x*int(subcount)) for x in splitPosPct(n,swing)][:-1]
    first.append(subcount-sum(first))
    
    return list(np.array(first)+np.array(base))

#### Sample

In [40]:
print intSplit(17,5,2), sum(intSplit(17,5,1))

[3, 3, 3, 3, 5] 17
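Two properties are worth checking for the integer split: the parts always add up to the requested count, and no part falls below the minimum. Here is a self-contained restatement in Python 3 syntax with a quick loop over many random runs. The snake_case names are mine, and the check assumes `count >= minm * n`.

```python
import random

import numpy as np


def split_pos_pct(n, swing=100):
    """n positive fractions that add up to 1."""
    nums = np.array([random.randint(1, swing) for _ in range(n)])
    return nums / nums.sum()


def int_split(count, n, minm=0, swing=100):
    """n integers >= minm that add up to count (assumes count >= minm * n)."""
    spare = count - minm * n
    # floor every share except the last, which absorbs the rounding remainder
    parts = [int(p * spare) for p in split_pos_pct(n, swing)][:-1]
    parts.append(spare - sum(parts))
    return [p + minm for p in parts]


random.seed(0)
samples = [int_split(17, 5, minm=2) for _ in range(1000)]
print(samples[0])
print(all(sum(s) == 17 and min(s) >= 2 for s in samples))
```

Because the shares are floored, the last element tends to come out a little larger than the others, which is visible in the sample output above.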


#### Definition: Sublist split by n, with minimum counts

Takes in a list, sorts it, and returns n consecutive sublists of random length. A minimum length per sublist can be set.

In [41]:
def listSplit(inputlist,n,minm=2,swing=100):
    inlist = inputlist[:]
    inlist.sort()
    divs = intSplit(len(inlist),n,minm,swing=swing)
    outlist = []
    print divs
    for x in divs:
        divlist = []
        for i in range(x):
            divlist.append(inlist.pop(0))
        outlist.append(divlist)
    return outlist

#### Sample

In [42]:
print listSplit(['a','b','c','d','e','f','g','h','j','k','l','m','n','o','p','q'],4)

[2, 4, 4, 6]
[['a', 'b'], ['c', 'd', 'e', 'f'], ['g', 'h', 'j', 'k'], ['l', 'm', 'n', 'o', 'p', 'q']]


#### Definition: Balance split (same direction)

Takes in a total and uses the positive pct split to make arbitrary balances that all share the sign of the total

In [43]:
def splitBalance(total, n, swing=100):
    return splitPosPct(n,swing)*total

splitBalance(10, 5,swing=100)

array([ 1.16883117,  2.53246753,  0.45454545,  4.02597403,  1.81818182])

#### Positive / negative split (for lower level) - this is to mimic how accounting transactions open and close

In [44]:
def splitSwingBalance(total, n, swing=100):
    return splitSwingPct(n,swing) * total
# sample of 10 dollars
splitSwingBalance(10, 5,swing=100)

array([  5.8974359 ,   8.46153846,   9.74358974,  -3.84615385, -10.25641026])

<a href='#home'>Back to the top</a>
## 7. Design a Class for Financial Units

Each balance will have the following:

    RCU Class
        Name:
        Total: USD Balance
        Items List: of other RCUs

It's fairly straightforward and is designed for nesting. So if we had a "bank account" of $100, it might be made up of "saving 50", "checking 20", and "stock 30" RCU units. Many of the functions are designed for recursive calls, either for splitting end nodes or for printing trees of arbitrary depth.


**Sample of Nesting**

    RCU Class
        Name: RCU_BASE
        Total: USD Balance
        Items List: RCU_A, RCU_B, ...
        
            RCU Class
                Name: RCU_A
                Total: USD Balance
                Items List: RCU_AA, RCU_BB,..
                
                ..

In [84]:
class RCU(object):
    def __init__(self, name, total):
        self.name = name
        self.total = total
        self.delimiter = '|'
        self.items = {}
        
    def __str__(self):
        return 'name: %s , total: %s' % (self.name, "{:,}".format(self.total))
    
    def getItems(self):
        return self.items.values()
    
    #using recursive print
    def printItems(self,n=0):
        g = self.getItems()
        print n*'\t',self
        n+=1
        for x in g:
            x.printItems(n)
    
    def splitItems(self, labels,swing=100):
        n = len(labels)
        self.items = { j: RCU(j,k) for j,k in zip(labels, splitBalance(self.total,n,swing=swing))}
        
    def explictItems(self, in_dict):
        if sum(in_dict.values()) == self.total:
            self.items = { j: RCU(j,k) for j,k in in_dict.items()}
        else:
            print 'total balance doesnt match submitted breakout of items'
    def splitEnds(self,labels):
        if len(self.getItems()) == 0:
            self.splitItems(random.sample(labels,int(len(labels)*.75)))
        else:
            for x in self.getItems():
                x.splitEnds(labels)
                
    def flatten(self):
        if len(self.getItems()) == 0:
            return [self.name+self.delimiter+str(self.total)]
        else:
            return_list = []
            for x in self.getItems():
                for y in x.flatten():
                    return_list.append(self.name +self.delimiter+ y)
            return return_list
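A nested structure like this is only useful if every parent total equals the sum of its children, at every depth. A recursive check for that could look like the sketch below (Python 3 syntax). `Node` is a hypothetical stripped-down stand-in for `RCU` with the same `name`, `total` and `items` attributes, and `totals_consistent` is not part of the class above.

```python
class Node(object):
    """Hypothetical stand-in for RCU: a name, a total, and a dict of child nodes."""

    def __init__(self, name, total, children=None):
        self.name = name
        self.total = total
        self.items = {c.name: c for c in (children or [])}


def totals_consistent(node, tol=1e-6):
    """True when each parent's total matches the sum of its children, recursively."""
    children = list(node.items.values())
    if not children:
        return True   # an end node has nothing to reconcile
    child_sum = sum(c.total for c in children)
    if abs(child_sum - node.total) > tol * max(1.0, abs(node.total)):
        return False
    return all(totals_consistent(c, tol) for c in children)


bank = Node('bank account', 100,
            [Node('saving', 50), Node('checking', 20), Node('stock', 30)])
broken = Node('bank account', 100,
              [Node('saving', 50), Node('checking', 20)])

print(totals_consistent(bank))
print(totals_consistent(broken))
```

The comparison uses a tolerance instead of `==` because the random splits produce floats, and float sums rarely match to the last digit.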

### Testing the RCU class 

In [46]:
#create a generic RCU
newrcu = RCU('fakecompany', 100)
print newrcu

# split the initial balance of 100 by 3 different labels listed below
newrcu.splitItems(['apple','banana','pirate'])

# test the nested print function
newrcu.printItems()

# test explicit assignment of item balances (replaces the random split above)
newrcu.explictItems({'apple':20 ,'banana':30,'pirate':50})

# test the recursive split function 
newrcu.splitEnds(['half','other_half','other_other_half'])
newrcu.printItems()

name: fakecompany , total: 100
 name: fakecompany , total: 100
	name: pirate , total: 62.0689655172
	name: apple , total: 4.31034482759
	name: banana , total: 33.6206896552
 name: fakecompany , total: 100
	name: pirate , total: 50
		name: other_other_half , total: 21.1409395973
		name: half , total: 28.8590604027
	name: banana , total: 30
		name: other_half , total: 6.71052631579
		name: half , total: 23.2894736842
	name: apple , total: 20
		name: half , total: 14.8275862069
		name: other_other_half , total: 5.1724137931


In [47]:
# test the recursive flatten function.
# should turn the nested dictionary into 
# pipe delimited lines with repeated values
print newrcu.flatten()

['fakecompany|pirate|other_other_half|21.1409395973', 'fakecompany|pirate|half|28.8590604027', 'fakecompany|banana|other_half|6.71052631579', 'fakecompany|banana|half|23.2894736842', 'fakecompany|apple|half|14.8275862069', 'fakecompany|apple|other_other_half|5.1724137931']


## Pilot: Use my data structure to setup my first company!

<a href='#home'>Back to the top</a>
### 8. Initialize highest level items ( Asset / Liab)

    Fakecompany --> FSLI Split
    Fakecompany = 0 dollars (accounting rule), balance sheets must "balance" which means the total = 0

In [48]:
mycompany = seedCompany()

fakecompany = RCU('fakecompany', 0)
fakecompany.explictItems(mycompany)

Assets 		49,000,000,000.0
Liabilities 	-276,133,861.848
Equity 		-48,723,866,138.2


<a href='#home'>Back to the top</a>
### 9. First Split: initialize below level items (current / non current)

    Fakecompany --> FSLI --> subFSLI split

In [49]:
for x in fakecompany.getItems():
    x.splitItems(bs_def[x.name].keys())
    for y in x.getItems():
        y.splitItems(bs_def[x.name][y.name].keys(),swing=10000)

In [50]:
fakecompany.printItems()

 name: fakecompany , total: 0
	name: liabilities , total: -276,133,861.848
		name: current , total: -201,790,129.812
			name: trade_payables , total: -26,780,788.8331
			name: accrued_expenses , total: -53,561,577.6662
			name: liab_for_sale , total: -28,026,406.9184
			name: other_liab , total: -19,929,889.3642
			name: current_tax , total: -39,236,969.6857
			name: current_loans , total: -34,254,497.3447
		name: non-current , total: -74,343,732.0361
			name: def_tax , total: -21,706,199.1346
			name: Other_non_liab , total: -51,552,222.9447
			name: loans , total: -1,085,309.95673
	name: assets , total: 49,000,000,000.0
		name: current , total: 24,752,577,319.6
			name: trade_receivables , total: 11,078,076,562.6
			name: inventories , total: 5,192,848,388.72
			name: assets_for_sale , total: 865,474,731.454
			name: investments , total: 6,750,702,905.34
			name: cash , total: 865,474,731.454
		name: non-current , total: 24,247,422,680.4
			name: intangible , total: 9,041,411,846.93


<a href='#home'>Back to the top</a>
### 10. Randomly generate the account numbers - will use 4-digit accounts to keep it simple

In [51]:
asset_accounts = [str(x).zfill(4) for x in random.sample(xrange(1000,1999), maxaccounts)]
liab_accounts = [str(x).zfill(4) for x in random.sample(xrange(2000,2899), maxaccounts)]
equity_accounts = [str(x).zfill(4) for x in random.sample(xrange(2900,2999), 10)]

asset_accounts.sort()
liab_accounts.sort()
equity_accounts.sort()
print asset_accounts
print liab_accounts
print equity_accounts

['1000', '1016', '1021', '1023', '1039', '1042', '1050', '1060', '1090', '1096', '1127', '1172', '1185', '1203', '1210', '1232', '1238', '1241', '1287', '1309', '1329', '1347', '1363', '1364', '1397', '1418', '1425', '1431', '1450', '1463', '1468', '1487', '1497', '1504', '1511', '1515', '1517', '1520', '1522', '1528', '1551', '1554', '1574', '1577', '1585', '1606', '1630', '1687', '1689', '1752', '1754', '1758', '1777', '1783', '1798', '1810', '1814', '1873', '1903', '1904', '1905', '1924', '1934', '1968', '1983', '1994']
['2011', '2017', '2025', '2031', '2035', '2038', '2040', '2049', '2071', '2080', '2085', '2090', '2093', '2097', '2098', '2123', '2124', '2135', '2148', '2175', '2193', '2224', '2244', '2256', '2266', '2270', '2283', '2287', '2299', '2315', '2332', '2343', '2350', '2358', '2383', '2394', '2412', '2414', '2439', '2452', '2459', '2464', '2468', '2471', '2526', '2553', '2597', '2599', '2643', '2664', '2699', '2707', '2711', '2744', '2760', '2763', '2770', '2783', '2786'

In [52]:
print listSplit(asset_accounts,5,2)

[2, 21, 5, 23, 15]
[['1000', '1016'], ['1021', '1023', '1039', '1042', '1050', '1060', '1090', '1096', '1127', '1172', '1185', '1203', '1210', '1232', '1238', '1241', '1287', '1309', '1329', '1347', '1363'], ['1364', '1397', '1418', '1425', '1431'], ['1450', '1463', '1468', '1487', '1497', '1504', '1511', '1515', '1517', '1520', '1522', '1528', '1551', '1554', '1574', '1577', '1585', '1606', '1630', '1687', '1689', '1752', '1754'], ['1758', '1777', '1783', '1798', '1810', '1814', '1873', '1903', '1904', '1905', '1924', '1934', '1968', '1983', '1994']]


<a href='#home'>Back to the top</a>
### 11. Do random account assignment to further split the values

    Fakecompany --> FSLI --> subFSLI --> account split

In [53]:

for x in fakecompany.items['assets'].getItems():
    print len(x.getItems())
    acct_ct = listSplit(asset_accounts,len(x.getItems()),2)
    print acct_ct
    for y,accts in zip(x.getItems(),acct_ct):
        y.splitItems(accts)

        
for x in fakecompany.items['liabilities'].getItems():
    print len(x.getItems())
    acct_ct = listSplit(liab_accounts,len(x.getItems()),2)
    print acct_ct
    for y,accts in zip(x.getItems(),acct_ct):
        y.splitItems(accts)
        
for x in fakecompany.items['equity'].getItems():
    print len(x.getItems())
    acct_ct = listSplit(equity_accounts,len(x.getItems()),2)
    print acct_ct
    for y,accts in zip(x.getItems(),acct_ct):
        y.splitItems(accts)
        
fakecompany.printItems()

5
[28, 2, 25, 2, 9]
[['1000', '1016', '1021', '1023', '1039', '1042', '1050', '1060', '1090', '1096', '1127', '1172', '1185', '1203', '1210', '1232', '1238', '1241', '1287', '1309', '1329', '1347', '1363', '1364', '1397', '1418', '1425', '1431'], ['1450', '1463'], ['1468', '1487', '1497', '1504', '1511', '1515', '1517', '1520', '1522', '1528', '1551', '1554', '1574', '1577', '1585', '1606', '1630', '1687', '1689', '1752', '1754', '1758', '1777', '1783', '1798'], ['1810', '1814'], ['1873', '1903', '1904', '1905', '1924', '1934', '1968', '1983', '1994']]
3
[28, 2, 36]
[['1000', '1016', '1021', '1023', '1039', '1042', '1050', '1060', '1090', '1096', '1127', '1172', '1185', '1203', '1210', '1232', '1238', '1241', '1287', '1309', '1329', '1347', '1363', '1364', '1397', '1418', '1425', '1431'], ['1450', '1463'], ['1468', '1487', '1497', '1504', '1511', '1515', '1517', '1520', '1522', '1528', '1551', '1554', '1574', '1577', '1585', '1606', '1630', '1687', '1689', '1752', '1754', '1758', '1777
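One thing to notice in the output above: `listSplit` is called on the full account list once per sub-level, so the same account numbers (for example '1000' and '1016') land under both `current` and `non-current`. If each account should belong to exactly one line item, one option is to partition the list a single time across all line items. A minimal sketch in Python 3 syntax, assuming made-up account numbers and a fixed chunk size; `partition` is an illustrative helper, not something from this notebook.

```python
def partition(labels, sizes):
    """Cut a list into consecutive chunks with the given sizes."""
    if sum(sizes) != len(labels):
        raise ValueError("sizes must add up to the number of labels")
    chunks, start = [], 0
    for size in sizes:
        chunks.append(labels[start:start + size])
        start += size
    return chunks


asset_line_items = ['cash', 'trade_receivables', 'investments', 'inventories',
                    'assets_for_sale', 'ppe', 'intangible', 'goodwill']
accounts = [str(n) for n in range(1000, 1016)]     # 16 hypothetical accounts
sizes = [2] * len(asset_line_items)                # fixed here; could be random

assignment = dict(zip(asset_line_items, partition(accounts, sizes)))
for item in asset_line_items:
    print(item, assignment[item])
```

In the real generator the `sizes` list would come from the random integer split, so the chunk lengths vary while each account still appears only once.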

<a href='#home'>Back to the top</a>
### 12. Adding Legal Entity, anywhere from 0-4 subentities per country

    Fakecompany --> FSLI --> subFSLI --> account --> legal Entity

In [54]:
alpha3 = [x.alpha3 for x in list(pycountry.countries)]
alpha3_entity = []

#typical transfer pricing legal entities
le_functions = ['factory', 'distributor', 'sales','rd_center','tax_entity','hq']
for y in alpha3:
    alpha3_entity.extend([str(y)+str(z)+'-'+random.sample(le_functions,1)[0] for z in range(1,random.randint(1,5))])

#total dictionary of possible entities    
print len(alpha3_entity)


501
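A note on the counts: `range(1, k)` has `k - 1` elements, so with `k` drawn from 1 to 5 each country ends up with between 0 and 4 entities, and some countries get none at all. The sketch below (Python 3 syntax) repeats the naming scheme on a few hand-picked ISO alpha-3 codes so it runs without pycountry. `entity_codes` is a hypothetical helper, and `rng.choice` stands in for `random.sample(..., 1)[0]`.

```python
import random
from collections import Counter

le_functions = ['factory', 'distributor', 'sales', 'rd_center', 'tax_entity', 'hq']


def entity_codes(country_codes, rng):
    """Build codes like 'USA1-hq' using the same scheme as the cell above."""
    codes = []
    for country in country_codes:
        # randint(1, 5) gives k in 1..5, and range(1, k) then yields 0..4 suffixes
        for z in range(1, rng.randint(1, 5)):
            codes.append('%s%d-%s' % (country, z, rng.choice(le_functions)))
    return codes


sample_countries = ['USA', 'DEU', 'JPN', 'BRA']    # hand-picked for the example
codes = entity_codes(sample_countries, random.Random(7))
per_country = Counter(code[:3] for code in codes)
print(codes)
print(dict(per_country))
```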


In [55]:
# will choose only 30-80 legal entities
random_le = random.sample(alpha3_entity,random.randint(30,80))
random_le.sort()
fakecompany.splitEnds(random_le)

In [56]:
fakecompany.printItems()

 name: fakecompany , total: 0
	name: liabilities , total: -276,133,861.848
		name: current , total: -201,790,129.812
			name: trade_payables , total: -26,780,788.8331
				name: 2025 , total: -3,997,132.66166
					name: DMA2-sales , total: -168,638.444791
					name: GRD1-sales , total: -98,372.4261282
					name: ARM1-factory , total: -94,357.2250618
					name: AGO3-sales , total: -180,684.047991
					name: MYS3-hq , total: -194,737.251723
					name: NPL2-tax_entity , total: -114,433.230394
					name: MAC2-factory , total: -60,228.0159969
					name: TLS3-distributor , total: -18,068.4047991
					name: GUM2-factory , total: -108,410.428794
					name: ATA1-tax_entity , total: -18,068.4047991
					name: HND2-tax_entity , total: -66,250.8175966
					name: COD1-hq , total: -170,646.045324
					name: FLK1-distributor , total: -74,281.2197295
					name: SXM3-factory , total: -62,235.6165301
					name: HKG2-factory , total: -98,372.4261282
					name: BFA3-distributor , total: -62,235.6165301
			

<a href='#home'>Back to the top</a>
### 13. Turn existing nested structure to a flat format - then to DataFrame format

In [57]:
fakeco_list = [x.split('|') for x in fakecompany.flatten()]

In [58]:
fakeco_list

[['fakecompany',
  'liabilities',
  'current',
  'trade_payables',
  '2025',
  'DMA2-sales',
  '-168638.444791'],
 ['fakecompany',
  'liabilities',
  'current',
  'trade_payables',
  '2025',
  'GRD1-sales',
  '-98372.4261282'],
 ['fakecompany',
  'liabilities',
  'current',
  'trade_payables',
  '2025',
  'ARM1-factory',
  '-94357.2250618'],
 ['fakecompany',
  'liabilities',
  'current',
  'trade_payables',
  '2025',
  'AGO3-sales',
  '-180684.047991'],
 ['fakecompany',
  'liabilities',
  'current',
  'trade_payables',
  '2025',
  'MYS3-hq',
  '-194737.251723'],
 ['fakecompany',
  'liabilities',
  'current',
  'trade_payables',
  '2025',
  'NPL2-tax_entity',
  '-114433.230394'],
 ['fakecompany',
  'liabilities',
  'current',
  'trade_payables',
  '2025',
  'MAC2-factory',
  '-60228.0159969'],
 ['fakecompany',
  'liabilities',
  'current',
  'trade_payables',
  '2025',
  'TLS3-distributor',
  '-18068.4047991'],
 ['fakecompany',
  'liabilities',
  'current',
  'trade_payables',
  '2025',

In [59]:
df = pd.DataFrame(fakeco_list)

In [60]:
df[6] = df[6].astype(float)

In [61]:
print df.info()
print df.shape
df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10138 entries, 0 to 10137
Data columns (total 7 columns):
0    10138 non-null object
1    10138 non-null object
2    10138 non-null object
3    10138 non-null object
4    10138 non-null object
5    10138 non-null object
6    10138 non-null float64
dtypes: float64(1), object(6)
memory usage: 554.5+ KB
None
(10138, 7)


|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| 0 | fakecompany | liabilities | current | trade_payables | 2025 | DMA2-sales | -168638.444791 |
| 1 | fakecompany | liabilities | current | trade_payables | 2025 | GRD1-sales | -98372.426128 |
| 2 | fakecompany | liabilities | current | trade_payables | 2025 | ARM1-factory | -94357.225062 |
| 3 | fakecompany | liabilities | current | trade_payables | 2025 | AGO3-sales | -180684.047991 |
| 4 | fakecompany | liabilities | current | trade_payables | 2025 | MYS3-hq | -194737.251723 |
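The integer column labels 0 to 6 correspond to company, FSLI, sub-FSLI, line item, account, legal entity and balance. The later cells keep the integer labels, so the sketch below is only a side illustration (Python 3 syntax) of how named columns make a balance check easy. The four rows and the column names are made up for the example.

```python
import pandas as pd

# Made-up rows in the same shape as fakeco_list (all strings, balance last).
rows = [
    ['fakecompany', 'assets', 'current', 'cash', '1000', 'USA1-hq', '60.0'],
    ['fakecompany', 'assets', 'current', 'cash', '1000', 'DEU1-sales', '40.0'],
    ['fakecompany', 'liabilities', 'current', 'trade_payables', '2025', 'USA1-hq', '-70.0'],
    ['fakecompany', 'equity', 'equity', 'capital_stock', '2900', 'USA1-hq', '-30.0'],
]
columns = ['company', 'fsli', 'sub_fsli', 'line_item',
           'account', 'legal_entity', 'balance']

named = pd.DataFrame(rows, columns=columns)
named['balance'] = named['balance'].astype(float)

# Totals per top-level section; the grand total should be zero.
by_fsli = named.groupby('fsli')['balance'].sum()
print(by_fsli)
print(named['balance'].sum())
```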


### Note: Performance / Splitting:

    Fakecompany --> FSLI --> subFSLI --> account --> legal Entity

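Up to this point, every split at a given level draws from the same pool of labels. So an account such as 2050 is split across the same pool of legal entities as every other account (each account gets a random 75% of that pool). For the next section, there will be limited coverage. Some balances will be split to, say, A, B, C, and others to D, E, F. See below for a sample: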

    100 dollars in legal entity --> 60 group 1 ---> A 30
                                               ---> B 20
                                               ---> C 10
                                --> 40 group 2 ---> D 15
                                               ---> E 15
                                               ---> F 10
                                               
As a result, splitting focuses on smaller and smaller sets of data. This is supposed to represent a typical product hierarchy: Apple Hardware would have iMac, MacBook, and iPhone, and iPhone would not also show up under Apple Services.
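The 100-dollar diagram can be reproduced with a two-level weighted split, where each group only distributes to its own children. This is a minimal sketch in Python 3 syntax with fixed weights chosen to match the diagram. In the generator the weights would be random, and `split_by_weights` is a hypothetical helper.

```python
def split_by_weights(total, weights):
    """Split a total across labels in proportion to their weights."""
    denom = float(sum(weights.values()))
    return {label: total * w / denom for label, w in weights.items()}


group_weights = {'group 1': 3, 'group 2': 2}
child_weights = {
    'group 1': {'A': 3, 'B': 2, 'C': 1},   # only group 1 can reach A, B, C
    'group 2': {'D': 3, 'E': 3, 'F': 2},   # only group 2 can reach D, E, F
}

groups = split_by_weights(100.0, group_weights)
leaves = {}
for group, balance in groups.items():
    leaves.update(split_by_weights(balance, child_weights[group]))

print(groups)
print(leaves)
```

Each level only ever sees its own subset of labels, which is what keeps the number of combinations from exploding the way a full cross product would.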

## Product Data Detail 

<a href='#home'>Back to the top</a>
### 14. Data Detail - randomly select what product detail will be available 

In [63]:
maxfields = random.randint(2,4)
fieldPrefix = random.sample(fieldOptions,maxfields+1)
print fieldPrefix

['SYSTEM', 'SEGMENT', 'PRODCODE', 'PROFITCTR']


In [64]:
fieldstack = fieldPrefix[:]

def cartesianRows(inlist):
    if len(inlist) == 1:
        seed = random.randint(2,5)
        return [inlist[0]+'_'+RandomWords().random_words()[0] for i in range(seed)]
    else:
        seed = random.randint(2,5)
        local_list = []
        for y in [inlist[0]+'_'+RandomWords().random_words()[0] for i in range(seed)]:
            local_list.extend( y +'-'+ z for z in cartesianRows(inlist[1:]))
        return local_list

# for fieldname in inlist:
#     seed = random.randint(2,6)
#     print [RandomWords().random_words() for i in range(seed)]

product_data = cartesianRows(fieldstack)
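Despite its name, `cartesianRows` builds a tree: every parent label gets its own freshly drawn children, so the result is not a full cross product. With 2 to 5 branches per level, k fields produce between 2^k and 5^k rows. The sketch below (Python 3 syntax) restates the recursion with a fixed word pool so it runs without the random_words package. `hierarchy_rows` and the word list are assumptions for the example, and repeated words are possible.

```python
import random

# Hypothetical word pool standing in for RandomWords(); no '-' or '_' inside words.
words = ['course', 'wash', 'photo', 'capital', 'grain', 'ore', 'tire', 'star']


def hierarchy_rows(fields, rng, pool):
    """Return labels like 'SYSTEM_x-SEGMENT_y', branching 2-5 ways per field."""
    branches = rng.randint(2, 5)
    labels = ['%s_%s' % (fields[0], rng.choice(pool)) for _ in range(branches)]
    if len(fields) == 1:
        return labels
    rows = []
    for label in labels:
        # each parent gets its own independent set of children
        rows.extend(label + '-' + tail
                    for tail in hierarchy_rows(fields[1:], rng, pool))
    return rows


fields = ['SYSTEM', 'SEGMENT', 'PRODCODE']
rows = hierarchy_rows(fields, random.Random(1), words)
print(len(rows))
print(rows[:3])
```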

<a href='#home'>Back to the top</a>
### 15. Do the final Split (may take a while)

In [68]:
product_data = [str(x) for x in product_data]

In [69]:
df.shape[0]

10138

In [73]:
len(product_data)
df.shape

t_matrix = np.zeros([df.shape[0],len(product_data)])
for i in range(df.shape[0]):
    t_matrix[i,:] = splitSwingPct(len(product_data))
t_matrix

array([[-0.10625   , -0.0875    , -0.1125    , ...,  0.2875    ,
         0.0125    ,  0.2       ],
       [ 0.0387931 , -0.19396552, -0.03448276, ..., -0.04741379,
        -0.06034483, -0.04741379],
       [-0.07729469,  0.06280193, -0.07729469, ..., -0.16425121,
        -0.06280193,  0.03864734],
       ..., 
       [-0.14728682,  0.18604651,  0.01162791, ..., -0.03875969,
        -0.0503876 ,  0.13565891],
       [-0.12121212,  0.15824916,  0.05387205, ...,  0.10774411,
        -0.15488215, -0.12794613],
       [ 1.13636364,  1.86363636,  1.13636364, ...,  1.40909091,
        -0.36363636, -0.09090909]])

In [74]:
(df[6].values * t_matrix.transpose()).transpose()

array([[  1.79178348e+04,   1.47558639e+04,   1.89718250e+04, ...,
         -4.84835529e+04,  -2.10798056e+03,  -3.37276890e+04],
       [ -3.81617170e+03,   1.90808585e+04,   3.39215263e+03, ...,
          4.66420986e+03,   5.93626709e+03,   4.66420986e+03],
       [  7.29331208e+03,  -5.92581607e+03,   7.29331208e+03, ...,
          1.54982882e+04,   5.92581607e+03,  -3.64665604e+03],
       ..., 
       [  2.16858229e+07,  -2.73926184e+07,  -1.71203865e+06, ...,
          5.70679550e+06,   7.41883416e+06,  -1.99737843e+07],
       [  2.04712215e+07,  -2.67263170e+07,  -9.09832068e+06, ...,
         -1.81966414e+07,   2.61576719e+07,   2.16085116e+07],
       [ -2.43587852e+08,  -3.99484078e+08,  -2.43587852e+08, ...,
         -3.02048937e+08,   7.79481127e+07,   1.94870282e+07]])
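The double transpose above scales each row of the percentage matrix by that row's balance. The same result can be written with NumPy broadcasting by turning the balances into a column vector. A small sketch with made-up numbers (Python 3 syntax), meant only to show that the two forms agree:

```python
import numpy as np

balances = np.array([100.0, -40.0, 10.0])          # one balance per row
t = np.array([[0.5, 0.5],
              [0.25, 0.75],
              [2.0, -1.0]])                        # each row sums to 1

via_transpose = (balances * t.transpose()).transpose()
via_broadcast = balances[:, None] * t              # (3, 1) * (3, 2) -> (3, 2)

print(via_broadcast)
print(via_broadcast.sum(axis=1))                   # row sums give back the balances
```

Since every row of percentages adds up to 1, the row sums of the product recover the original balances, which is a handy check after the split.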

In [75]:
df2 = pd.DataFrame((df[6].values * t_matrix.transpose()).transpose())
df2.columns = product_data
df2.head()

Unnamed: 0,SYSTEM_course-SEGMENT_wash-PRODCODE_photo-PROFITCTR_capital,SYSTEM_course-SEGMENT_wash-PRODCODE_photo-PROFITCTR_furnaces,SYSTEM_course-SEGMENT_wash-PRODCODE_photo-PROFITCTR_tire,SYSTEM_course-SEGMENT_wash-PRODCODE_photo-PROFITCTR_anticipation,SYSTEM_course-SEGMENT_wash-PRODCODE_components-PROFITCTR_star,SYSTEM_course-SEGMENT_wash-PRODCODE_components-PROFITCTR_printout,SYSTEM_course-SEGMENT_wash-PRODCODE_pans-PROFITCTR_supply,SYSTEM_course-SEGMENT_wash-PRODCODE_pans-PROFITCTR_pipe,SYSTEM_course-SEGMENT_wash-PRODCODE_leapers-PROFITCTR_chart,SYSTEM_course-SEGMENT_wash-PRODCODE_leapers-PROFITCTR_chimney,...,SYSTEM_grain-SEGMENT_statement-PRODCODE_fruits-PROFITCTR_drips,SYSTEM_grain-SEGMENT_statement-PRODCODE_fruits-PROFITCTR_maximums,SYSTEM_grain-SEGMENT_statement-PRODCODE_contraband-PROFITCTR_circumference,SYSTEM_grain-SEGMENT_statement-PRODCODE_contraband-PROFITCTR_coordinate,SYSTEM_grain-SEGMENT_statement-PRODCODE_contraband-PROFITCTR_atom,SYSTEM_grain-SEGMENT_ore-PRODCODE_raise-PROFITCTR_churns,SYSTEM_grain-SEGMENT_ore-PRODCODE_raise-PROFITCTR_load,SYSTEM_grain-SEGMENT_ore-PRODCODE_raise-PROFITCTR_bricks,SYSTEM_grain-SEGMENT_ore-PRODCODE_semicolons-PROFITCTR_prop,SYSTEM_grain-SEGMENT_ore-PRODCODE_semicolons-PROFITCTR_desks
0,17917.834759,14755.863919,18971.825039,37943.650078,28457.737558,36889.659798,40051.630638,11593.893079,10539.902799,-11593.893079,...,24241.776439,-38997.640358,48483.552877,-43213.601478,38997.640358,-52699.513997,-9485.912519,-48483.552877,-2107.98056,-33727.688958
1,-3816.171703,19080.858516,3392.152625,-8056.362485,13144.591422,11872.534188,-8904.400641,-19928.896673,4240.190781,-13568.6105,...,12720.572344,3392.152625,-21200.953907,-12720.572344,5088.228938,-15264.686813,12720.572344,4664.20986,5936.267094,4664.20986
2,7293.312082,-5925.816067,7293.312082,-7749.144087,0.0,-2279.160026,-19144.944215,7749.144087,-20056.608226,12307.464138,...,14130.792159,11395.800128,5469.984062,-20056.608226,10484.136118,20056.608226,-9116.640103,15498.288174,5925.816067,-3646.656041
3,80097.052202,-26078.110019,-85685.218635,39117.165029,-14901.777154,-26078.110019,-68920.719337,72646.163625,29803.554308,-1862.722144,...,70783.441481,-13039.05501,-89410.662923,-76371.607914,-37254.442885,31666.276452,-68920.719337,93136.107212,76371.607914,-26078.110019
4,0.0,23861.086179,8466.837031,6157.699659,-23091.373722,33867.348126,-16163.961605,-32327.923211,-19242.811435,11545.686861,...,36946.197955,26170.223552,5387.987202,28479.360924,-38485.62287,31558.210754,10775.974404,-18473.098978,-37715.910413,-31558.210754


In [76]:
df = pd.merge(df,df2, how='inner', left_index=True, right_index=True)

In [77]:
# def addProductData(x):
#     if int(x.name) % 1000 == 0:
#         print x.name,
#     select_products = random.sample(product_data, int(len(product_data)*.75))
#     bals = splitSwingBalance(x[6],len(select_products))
#     for j,k in zip(select_products,bals ):
#         x[j] = k
#     return x

# df = df.apply(addProductData,axis=1)

In [78]:
flat_df = pd.melt(df,id_vars=[0,1,2,3,4,5,6,], var_name='variable')

In [79]:
fieldlkup = {x:i for i,x in enumerate(fieldstack)}
for field in fieldstack:
    flat_df[field] = flat_df['variable'].map(lambda x : x.split('-')[fieldlkup[field]].split('_')[1])


In [80]:
flat_df

Unnamed: 0,0,1,2,3,4,5,6,variable,value,SYSTEM,SEGMENT,PRODCODE,PROFITCTR
0,fakecompany,liabilities,current,trade_payables,2025,DMA2-sales,-1.686384e+05,SYSTEM_course-SEGMENT_wash-PRODCODE_photo-PROF...,1.791783e+04,course,wash,photo,capital
1,fakecompany,liabilities,current,trade_payables,2025,GRD1-sales,-9.837243e+04,SYSTEM_course-SEGMENT_wash-PRODCODE_photo-PROF...,-3.816172e+03,course,wash,photo,capital
2,fakecompany,liabilities,current,trade_payables,2025,ARM1-factory,-9.435723e+04,SYSTEM_course-SEGMENT_wash-PRODCODE_photo-PROF...,7.293312e+03,course,wash,photo,capital
3,fakecompany,liabilities,current,trade_payables,2025,AGO3-sales,-1.806840e+05,SYSTEM_course-SEGMENT_wash-PRODCODE_photo-PROF...,8.009705e+04,course,wash,photo,capital
4,fakecompany,liabilities,current,trade_payables,2025,MYS3-hq,-1.947373e+05,SYSTEM_course-SEGMENT_wash-PRODCODE_photo-PROF...,0.000000e+00,course,wash,photo,capital
5,fakecompany,liabilities,current,trade_payables,2025,NPL2-tax_entity,-1.144332e+05,SYSTEM_course-SEGMENT_wash-PRODCODE_photo-PROF...,1.123271e+04,course,wash,photo,capital
6,fakecompany,liabilities,current,trade_payables,2025,MAC2-factory,-6.022802e+04,SYSTEM_course-SEGMENT_wash-PRODCODE_photo-PROF...,3.725444e+03,course,wash,photo,capital
7,fakecompany,liabilities,current,trade_payables,2025,TLS3-distributor,-1.806840e+04,SYSTEM_course-SEGMENT_wash-PRODCODE_photo-PROF...,-8.699602e+03,course,wash,photo,capital
8,fakecompany,liabilities,current,trade_payables,2025,GUM2-factory,-1.084104e+05,SYSTEM_course-SEGMENT_wash-PRODCODE_photo-PROF...,-1.211646e+05,course,wash,photo,capital
9,fakecompany,liabilities,current,trade_payables,2025,ATA1-tax_entity,-1.806840e+04,SYSTEM_course-SEGMENT_wash-PRODCODE_photo-PROF...,-9.441148e+02,course,wash,photo,capital
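
The `variable` column packs every field into one label: fields are joined with `-`, and each one is written as `PREFIX_value`. Here is a minimal sketch of that decoding step on its own, in plain Python. `decode_label` is a hypothetical helper (not part of the notebook), and it assumes the random words never contain a `-`:

```python
def decode_label(label, field_sep='-', value_sep='_'):
    """Turn 'SYSTEM_course-SEGMENT_wash' into {'SYSTEM': 'course', 'SEGMENT': 'wash'}."""
    fields = {}
    for part in label.split(field_sep):
        # Split only on the first '_' so a value containing '_' stays in one piece.
        name, value = part.split(value_sep, 1)
        fields[name] = value
    return fields


label = 'SYSTEM_course-SEGMENT_wash-PRODCODE_photo-PROFITCTR_capital'
decoded = decode_label(label)
```

Splitting on the first `_` only is a small design choice: the `.split('_')[1]` form used in the cell above would cut off any value that itself contains an underscore.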


<a href='#home'>Back to the top</a>
## Format the dataframe - rename the columns and drop the helper columns

In [81]:
flat_df.rename(columns = {
        0:'company_name'
        , 1:'fsli_l1'
        , 2:'fsli_l2'
        , 3:'fsli_l3'
        , 4:'acct'
        , 5:'legal_entity'
        , 6:'total_amt'
        , 'value':'amount'
    }, inplace=True)
flat_df.drop(['variable','total_amt'],axis = 1,inplace=True)
flat_df.head()

Unnamed: 0,company_name,fsli_l1,fsli_l2,fsli_l3,acct,legal_entity,amount,SYSTEM,SEGMENT,PRODCODE,PROFITCTR
0,fakecompany,liabilities,current,trade_payables,2025,DMA2-sales,17917.834759,course,wash,photo,capital
1,fakecompany,liabilities,current,trade_payables,2025,GRD1-sales,-3816.171703,course,wash,photo,capital
2,fakecompany,liabilities,current,trade_payables,2025,ARM1-factory,7293.312082,course,wash,photo,capital
3,fakecompany,liabilities,current,trade_payables,2025,AGO3-sales,80097.052202,course,wash,photo,capital
4,fakecompany,liabilities,current,trade_payables,2025,MYS3-hq,0.0,course,wash,photo,capital


<a id='class'></a>
<a href='#home'>Back to the top</a>
## Grand Finale: Putting it all together as two classes, RCU and FakeCompanyFinancials


In [86]:
# import libraries
import os
import pycountry
import random 
import string
import numpy as np
import pandas as pd
from random_words import RandomWords
from decimal import *
       
def splitPosPct(n,swing=100):
    rand_nums = np.array([random.randint(1,swing) for i in range(n)])
    return rand_nums*1.0 / sum(rand_nums)

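def splitSwingPct(n,swing=100):
    # signed weights can cancel out; redraw until the sum is non-zero
    # so the division below never produces inf/nan (assumes swing >= 2)
    total = 0
    while total == 0:
        rand_nums = np.array([random.randint(-int(swing/2),int(swing/2)) for i in range(n)])
        total = sum(rand_nums)
    return rand_nums*1.0 / total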

def intSplit(count,n, minm=0,swing=100):
    base = [minm for i in range(n)]
    subcount = count - sum(base)
    first = [int(x*int(subcount)) for x in splitPosPct(n,swing)][:-1]
    first.append(subcount-sum(first))

    return list(np.array(first)+np.array(base))


def listSplit(inputlist,n,minm=2,swing=100):
    inlist = inputlist[:]
    inlist.sort()
    divs = intSplit(len(inlist),n,minm,swing=swing)
    outlist = []
    for x in divs:
        divlist = []
        for i in range(x):
            divlist.append(inlist.pop(0))
        outlist.append(divlist)
    return outlist


def splitBalance(total, n, swing=100):
    return splitPosPct(n,swing)*total

def splitSwingBalance(total, n, swing=100):
    return splitSwingPct(n,swing) * total


class RCU(object):
    def __init__(self, name, total):
        self.name = name
        self.total = total
        self.delimiter = '|'
        self.items = {}
        
    def __str__(self):
        return 'name: %s , total: %s' % (self.name, "{:,}".format(self.total))
    
    def getItems(self):
        return self.items.values()
    
    # recursively print this node and its children, indented one tab per level
    def printItems(self,n=0):
        g = self.getItems()
        print n*'\t',self
        n+=1
        for x in g:
            x.printItems(n)
    
    def splitItems(self, labels,swing=100):
        n = len(labels)
        self.items = { j: RCU(j,k) for j,k in zip(labels, splitBalance(self.total,n,swing=swing))}
        
    def explictItems(self, in_dict):
        if sum(in_dict.values()) == self.total:
            self.items = { j: RCU(j,k) for j,k in in_dict.items()}
        else:
            print 'total balance does not match the submitted breakout of items'

    def splitEnds(self,labels):
        if len(self.getItems()) == 0:
            self.splitItems(random.sample(labels,int(len(labels)*.75)))
        else:
            for x in self.getItems():
                x.splitEnds(labels)
                
    def flatten(self):
        if len(self.getItems()) == 0:
            return [self.name+self.delimiter+str(self.total)]
        else:
            return_list = []
            for x in self.getItems():
                for y in x.flatten():
                    return_list.append(self.name +self.delimiter+ y)
            return return_list

#================================================================================================
#================================================================================================
#================================================================================================
        
# class construction
class FakeCompanyFinancials(object):
    def __struct__(self):
        self.bs_def ={
            'assets':{
                'current':{
                    'cash':1
                    ,'trade_receivables':2
                    ,'investments':3
                    ,'inventories':4
                    ,'assets_for_sale':5
                }
                ,'non-current':{
                    'ppe':6
                    ,'intangible':7
                    ,'goodwill':8
                }
            }
            , 'liabilities':{
                'current':{
                    'trade_payables':9
                    ,'accrued_expenses':10
                    ,'current_tax':11
                    ,'current_loans':12
                    ,'other_liab':13
                    ,'liab_for_sale':14
                }
                ,'non-current':{
                    'loans':15
                    , 'def_tax':16
                    , 'other_non_liab':17

                }

            }
            , 'equity':{
                'equity':{
                    'capital_stock':18
                    , 'paid_capital':19
                    ,'retained_earnings':20
                   }
            }
        }
        self.fieldOptions = ('COSTCTR','PROFITCTR','PRODCODE','SEGMENT','BUSAREA', \
                             'PROGRAM','PROJECT','REFE','BGROUP','SKU','SYSTEM')
 

    def _updatedDataFrame(self):
        list_format = [x.split('|') for x in self.rootRCU.flatten()]
        self.dataf = pd.DataFrame(list_format)    
    
        
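    def __init__(self
                 , name = 'fakeCompany'
                 , maxfields = None
                 , maxrows = None
                 , maxentities = None
                 , maxaccounts = None
                ):
        self.__struct__()
        
        # default arguments are evaluated only once, when the function is defined,
        # so the random sizes are drawn here to give each new company its own values
        self.name = name
        self.maxfields = random.randint(2,5) if maxfields is None else maxfields
        self.maxrows = random.randint(100,400) if maxrows is None else maxrows
        self.maxentities = random.randint(4,30) if maxentities is None else maxentities
        self.maxaccounts = random.randint(40,120) if maxaccounts is None else maxaccounts
        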
        # start with 1 to 50 billion dollars as the initial company balance
        total_company_assets = random.randint(1,50)*1e9

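        # back into liabilities as a random fraction of assets (credits are stored as negatives);
        # equity is the balancing figure, so the three totals net to zero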
        total_company_liabilities = -random.random()*total_company_assets
        total_company_equity = -(total_company_assets + total_company_liabilities)

        # print the current total balances
        print 'Assets \t\t', "{:,}".format(total_company_assets)
        print 'Liabilities \t', "{:,}".format(total_company_liabilities)
        print 'Equity \t\t', "{:,}".format(total_company_equity)

        self.company = {
            'assets' :total_company_assets
            , 'liabilities' : total_company_liabilities
            , 'equity': total_company_equity
        }
        
        self.rootRCU = RCU(self.name, 0)
        
        # assigns basic FSLI balances
        self.rootRCU.explictItems(self.company)
        self._updatedDataFrame()
    

    
    
    def generateBalanceSheet(self, verbose=0):
        if verbose==1:
            print 'Splitting by FSLI'
        self._splitBySubFSLI()
        
        if verbose==1:
            print 'Making Fake Accounts'        
        self._generateAccounts()
        
        if verbose==1:
            print 'Splitting by Accounts'        
        self._splitByAccount()
        
        if verbose==1:
            print 'Splitting by Legal Entity'
        self._addLegalEntity()
        
        if verbose==1:
            print 'Updating stored dataframe'        
        self._updatedDataFrame()
        
        if verbose==1:
            print 'generate Data Detail'        
        self._generateDataDetail(verbose)
        

        if verbose==1:
            print 'storingCSV data'        
        self._generateCSV()
        
        
       


       
                
    def _splitBySubFSLI(self):
        for x in self.rootRCU.getItems():
            x.splitItems(self.bs_def[x.name].keys())
            for y in x.getItems():
                y.splitItems(self.bs_def[x.name][y.name].keys(),swing=10000)     
    
    def _generateAccounts(self):
        self.asset_accounts = [str(x).zfill(4) for x in random.sample(xrange(1000,1999), self.maxaccounts)]
        self.liab_accounts = [str(x).zfill(4) for x in random.sample(xrange(2000,2899), self.maxaccounts)]
        self.equity_accounts = [str(x).zfill(4) for x in random.sample(xrange(2900,2999), 10)]

        self.asset_accounts.sort()
        self.liab_accounts.sort()
        self.equity_accounts.sort()
    
    def _splitByAccount(self):
        for x in self.rootRCU.items['assets'].getItems():
            acct_ct = listSplit(self.asset_accounts,len(x.getItems()),2)
            for y,accts in zip(x.getItems(),acct_ct):
                y.splitItems(accts)

       
        for x in self.rootRCU.items['liabilities'].getItems():
            acct_ct = listSplit(self.liab_accounts,len(x.getItems()),2)
            for y,accts in zip(x.getItems(),acct_ct):
                y.splitItems(accts)

        for x in self.rootRCU.items['equity'].getItems():
            acct_ct = listSplit(self.equity_accounts,len(x.getItems()),2)
            for y,accts in zip(x.getItems(),acct_ct):
                y.splitItems(accts)
                
                
    def _addLegalEntity(self):
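        # later pycountry releases renamed `alpha3` to `alpha_3`; accept either spelling
        self.alpha3 = [getattr(x, 'alpha_3', None) or x.alpha3 for x in list(pycountry.countries)]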
        self.alpha3_entity = []

        #typical transfer pricing legal entities
        le_functions = ['factory', 'distributor', 'sales','rd_center','tax_entity','hq']
        for y in self.alpha3:
            self.alpha3_entity.extend([str(y)+str(z)+'-'+random.sample(le_functions,1)[0] for z in range(1,random.randint(1,5))])

        # draw 30 to 80 legal entities from the full list of candidates
    
        random_le = random.sample(self.alpha3_entity,random.randint(30,80))
        random_le.sort()
        self.rootRCU.splitEnds(random_le)

    
    def _generateDataDetail(self,verbose=0):
   
        self.fieldPrefix = random.sample(self.fieldOptions,self.maxfields+1)
        if verbose !=0:
            print 'fields to generate', self.fieldPrefix
        
        #formatting amounts as float
        self.dataf[6] = self.dataf[6].astype(float)
        
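        # recursively build every combination of field values,
        # encoded as PREFIX_word joined with '-' (e.g. 'SYSTEM_x-SEGMENT_y')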
        def cartesianRows(inlist):
            if len(inlist) == 1:
                seed = random.randint(2,5)
                return [inlist[0]+'_'+RandomWords().random_words()[0] for i in range(seed)]
            else:
                seed = random.randint(2,5)
                local_list = []
                for y in [inlist[0]+'_'+RandomWords().random_words()[0] for i in range(seed)]:
                    local_list.extend( y +'-'+ z for z in cartesianRows(inlist[1:]))
                return local_list

        
        
        self.product_data = [str(x) for x in cartesianRows(self.fieldPrefix)]
        
        if verbose !=0:
            print 'creating partition matrix'
        #create percentage split matrix
        rowcount_df =self.dataf.shape[0]
        t_matrix = np.zeros([rowcount_df,len(self.product_data)])
        for i in range(rowcount_df):
            t_matrix[i,:] = splitSwingPct(len(self.product_data))

        # pull all the amounts
        current_amounts = self.dataf[6].values
        
        
        if verbose !=0:
            print 'allocating balances'
        # create the allocated-balance matrix: scale each row of percentages by that row's amount
        alloc_amounts = (current_amounts*t_matrix.T).T
        
        # create a dataframe of the allocated balances.
        # update the columns
        add_dataf = pd.DataFrame(alloc_amounts)
        add_dataf.columns = self.product_data
        

        # merge existing dataframe with product data added
        self.dataf = pd.merge(self.dataf, add_dataf, how = 'inner',left_index=True,right_index=True)        
        
        flat_df = pd.melt(self.dataf,id_vars=[0,1,2,3,4,5,6], var_name='variable')
        
        # split the 'variable' column back into one column per field:
        # fields are joined with '-', and each field is encoded as PREFIX_value
        fieldlkup = {x:i for i,x in enumerate(self.fieldPrefix)}
        for field in self.fieldPrefix:
            flat_df[field] = flat_df['variable'].map(lambda x : x.split('-')[fieldlkup[field]].split('_')[1])
        
        
        # format the column names and drop the helper columns that are no longer needed
        flat_df.rename(columns = {
                0:'company_name'
                , 1:'fsli_l1'
                , 2:'fsli_l2'
                , 3:'fsli_l3'
                , 4:'acct'
                , 5:'legal_entity'
                , 6:'total_amt'
                , 'value':'amount'
            }, inplace=True)
        flat_df.drop(['variable','total_amt'],axis = 1,inplace=True)
        self.dataf = flat_df
        
    def _generateCSV(self):
        self.CSVHeader = self.dataf.columns
        self.CSVList = self.dataf.to_csv()
        

#### Test new class - make a new company

In [87]:
sammys_bakery = FakeCompanyFinancials()

Assets 		37,000,000,000.0
Liabilities 	-32,644,178,495.6
Equity 		-4,355,821,504.39
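
The printed totals use a debit-positive, credit-negative convention, so assets plus liabilities plus equity should net to zero. A small sketch of that check, mirroring the arithmetic in `__init__`. The names `make_opening_balances` and `is_balanced`, and the one-cent tolerance, are assumptions for illustration:

```python
import random


def make_opening_balances(rng=random):
    """Draw assets, then back into liabilities and equity so the three net to zero."""
    assets = rng.randint(1, 50) * 1e9
    liabilities = -rng.random() * assets      # credit balance, stored as a negative
    equity = -(assets + liabilities)          # balancing figure
    return {'assets': assets, 'liabilities': liabilities, 'equity': equity}


def is_balanced(balances, tolerance=0.01):
    """True when the signed balances net to zero, within a rounding tolerance."""
    return abs(sum(balances.values())) <= tolerance


balances = make_opening_balances(random.Random(42))
balanced = is_balanced(balances)
```

A tolerance is used because float sums can differ in the last digits depending on the order of addition, which an exact `==` comparison would reject.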


#### Generate the full detail

In [None]:
sammys_bakery.generateBalanceSheet(verbose=1)

Splitting by FSLI
Making Fake Accounts
Splitting by Accounts
Splitting by Legal Entity
Updating stored dataframe
generate Data Detail
fields to generate ['BUSAREA', 'SKU', 'PROJECT', 'PROFITCTR', 'SEGMENT', 'PROGRAM']
creating partition matrix


#### Check the dataframe output

In [518]:
sammys_bakery.dataf.head()

Unnamed: 0,company_name,fsli_l1,fsli_l2,fsli_l3,acct,legal_entity,amount,SYSTEM,PROGRAM,SKU,BGROUP
0,fakeCompany,liabilities,current,trade_payables,2050,CHL1-factory,653365.8,coordinate,principle,radars,boiler
1,fakeCompany,liabilities,current,trade_payables,2050,CYP2-rd_center,279467.7,coordinate,principle,radars,boiler
2,fakeCompany,liabilities,current,trade_payables,2050,IMN3-hq,-1216944.0,coordinate,principle,radars,boiler
3,fakeCompany,liabilities,current,trade_payables,2050,SPM3-distributor,-253886.3,coordinate,principle,radars,boiler
4,fakeCompany,liabilities,current,trade_payables,2050,IDN2-factory,843812.2,coordinate,principle,radars,boiler
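
Each `amount` here is a slice of one account and legal entity balance: the class multiplies every balance by its own row of percentages, and each row of percentages sums to 1. Here is a plain-Python sketch of that allocation (the class does the same thing with a numpy broadcast). `allocate` and the hand-picked percentages are illustrative assumptions:

```python
def allocate(amounts, pct_rows):
    """Spread each amount across columns using its own row of fractions."""
    return [[amount * pct for pct in pcts]
            for amount, pcts in zip(amounts, pct_rows)]


amounts = [-1000.0, 200.0]
pct_rows = [
    [1.5, -0.25, -0.25],   # signed shares, as a swing split can produce
    [0.5, 0.25, 0.25],     # all-positive shares
]
allocated = allocate(amounts, pct_rows)

# Because every row of fractions sums to 1, each row adds back to its balance.
row_totals = [sum(row) for row in allocated]
```

With signed shares, a single slice can be larger than the balance it came from, which is why some detail amounts swing well past the account total.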


#### Generate a second company

In [519]:
mikes_fish = FakeCompanyFinancials('mikes_fish')
mikes_fish.generateBalanceSheet(verbose=1)

Assets 		33,000,000,000.0
Liabilities 	-13,406,276,289.1
Equity 		-19,593,723,710.9
Splitting by FSLI
Making Fake Accounts
Splitting by Accounts
Splitting by Legal Entity
Updating stored dataframe
generate Data Detail
fields to generate ['BGROUP', 'SKU', 'COSTCTR', 'BUSAREA']
creating partition matrix
allocating balances
storingCSV data
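
The log lines follow the balance hierarchy from the top down: FSLI, account, legal entity, then the detail fields. The first three levels live in a tree of `RCU` objects that is split level by level and then flattened into rows. A stripped-down sketch of that split-then-flatten pattern follows. `Node` and the fixed fractions are hypothetical stand-ins for `RCU` and its random splits:

```python
class Node(object):
    """One balance in the hierarchy; leaves become rows when flattened."""

    def __init__(self, name, total):
        self.name = name
        self.total = total
        self.children = []

    def split(self, shares):
        """Create one child per label; `shares` maps label -> fraction (assumed to sum to 1)."""
        self.children = [Node(label, self.total * pct)
                         for label, pct in sorted(shares.items())]
        return self.children

    def flatten(self, prefix=()):
        """Return one tuple per leaf: the path of names followed by the leaf balance."""
        path = prefix + (self.name,)
        if not self.children:
            return [path + (self.total,)]
        rows = []
        for child in self.children:
            rows.extend(child.flatten(path))
        return rows


root = Node('fakecompany', 1000.0)
assets, liabilities = root.split({'assets': 0.75, 'liabilities': 0.25})
assets.split({'cash': 0.5, 'inventories': 0.5})
rows = root.flatten()
```

Leaves at different depths produce rows of different lengths, so the numbered-column renames only line up when every branch has been split to the same depth, as the steps in `generateBalanceSheet` appear to do.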


#### Check the dataframe output

In [None]:
mikes_fish.dataf.head()

<a href='#home'>Back to the top</a>