# How to build an interactive dashboard using Quickbooks data # 
*Written by Hassan Syyid @ [hotglue](https://hotglue.xyz)* 

Check out the corresponding [Article]()

## Introduction ##
In this article, I'll show you how to leverage hotglue and cumul.io together to analyze Quickbooks data in an interactive dashboard. This Python script handles processing the data from Quickbooks to produce MRR and Churn metrics. 

In [1]:
import ast
import gluestick as gs
import pandas as pd

from dateutil.relativedelta import relativedelta

### Step 1: Read the data ###
Let's start by reading the data.

This example is built on a hotglue environment with data coming from Quickbooks. In hotglue, the synced data is placed in the local `sync-output` folder as CSV files. We will use the `read_csv_folder` function from the gluestick package to read the raw data in that folder into a dictionary of pandas dataframes.

By specifying `index_cols={'Invoice': 'DocNumber'}`, the `Invoice` dataframe will use the `DocNumber` column as its index. By specifying `converters`, we can use `ast.literal_eval` to parse the serialized data in the `Line`, `CustomField`, and `Categories` columns into Python objects.

In [2]:
ROOT_DIR = "./sync-output"

qb_lookup_keys = {'key_prop': 'name', 'value_prop': 'value'}

##
# Read and Process invoice data
##
input_data = gs.read_csv_folder(ROOT_DIR,
                                index_cols={'Invoice': 'DocNumber'},
                                converters={'Invoice': {'Line': ast.literal_eval, 'CustomField': ast.literal_eval,
                                                        'Categories': ast.literal_eval}})
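
To see what the `converters` argument does in isolation, here is a minimal sketch using plain pandas rather than gluestick. The sample row is made up for illustration, and `read_csv_folder` may well do more than this under the hood.

```python
import ast
import io

import pandas as pd

# Hypothetical one-row export: the Line column holds a stringified Python list.
raw = pd.DataFrame({
    "DocNumber": [1037],
    "Line": [str([{"Id": "1", "Amount": 275.0}])],
})
csv_buffer = io.StringIO(raw.to_csv(index=False))

# Without a converter the cell would stay a plain string; literal_eval
# turns it back into a list of dicts while the file is being read.
parsed = pd.read_csv(
    csv_buffer,
    index_col="DocNumber",
    converters={"Line": ast.literal_eval},
)

first_line = parsed.loc[1037, "Line"][0]
print(type(parsed.loc[1037, "Line"]).__name__, first_line["Amount"])
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for data coming from a file.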

##### Take a peek #####
Let's take a look at what data we're working with.

In [3]:
input_data.get("Invoice").head()

DocNumber,Id,MetaData.LastUpdatedTime,AllowIPNPayment,AllowOnlinePayment,AllowOnlineCreditCardPayment,AllowOnlineACHPayment,MetaData,CustomField,TxnDate,CurrencyRef,CustomerRef,Line,FreeFormAddress,ShipFromAddr,DueDate,TotalAmt,ApplyTaxAfterDiscount,PrintStatus,Balance
1037,130,,False,False,False,False,"{""CreateTime"": ""2020-06-20T13:16:17-07:00"", ""L...","[{'DefinitionId': '1', 'Name': 'Crew #', 'Type...",2020-06-20T00:00:00.000000Z,"{""value"": ""USD"", ""name"": ""United States Dollar""}","{""value"": ""24"", ""name"": ""Sonnenschein Family S...","[{'Id': '1', 'LineNum': '1', 'Amount': 275.0, ...",True,,2020-07-20T00:00:00.000000Z,362.07,False,NeedToPrint,362.07
1036,129,,False,False,False,False,"{""CreateTime"": ""2020-06-20T13:15:36-07:00"", ""L...","[{'DefinitionId': '1', 'Name': 'Crew #', 'Type...",2020-06-20T00:00:00.000000Z,"{""value"": ""USD"", ""name"": ""United States Dollar""}","{""value"": ""8"", ""name"": ""0969 Ocean View Road""}","[{'Id': '1', 'LineNum': '1', 'Amount': 50.0, '...",True,,2020-07-20T00:00:00.000000Z,477.5,False,NeedToPrint,477.5
1031,96,,False,False,False,False,"{""CreateTime"": ""2020-06-19T13:30:49-07:00"", ""L...","[{'DefinitionId': '1', 'Name': 'Crew #', 'Type...",2020-04-05T00:00:00.000000Z,"{""value"": ""USD"", ""name"": ""United States Dollar""}","{""value"": ""8"", ""name"": ""0969 Ocean View Road""}","[{'Id': '1', 'LineNum': '1', 'Amount': 90.0, '...",True,,2020-05-05T00:00:00.000000Z,387.0,False,NeedToPrint,0.0
1004,12,,False,False,False,False,"{""CreateTime"": ""2020-06-17T15:04:04-07:00"", ""L...","[{'DefinitionId': '1', 'Name': 'Crew #', 'Type...",2020-06-08T00:00:00.000000Z,"{""value"": ""USD"", ""name"": ""United States Dollar""}","{""value"": ""3"", ""name"": ""Cool Cars""}","[{'Id': '1', 'LineNum': '1', 'Amount': 20.0, '...",False,,2020-07-08T00:00:00.000000Z,2369.52,False,NotSet,0.0
1035,119,,False,False,False,False,"{""CreateTime"": ""2020-06-20T12:57:24-07:00"", ""L...","[{'DefinitionId': '1', 'Name': 'Crew #', 'Type...",2020-06-20T00:00:00.000000Z,"{""value"": ""USD"", ""name"": ""United States Dollar""}","{""value"": ""17"", ""name"": ""Mark Cho""}","[{'Id': '1', 'LineNum': '1', 'Amount': 275.0, ...",True,,2020-07-20T00:00:00.000000Z,314.28,False,NeedToPrint,314.28


### Step 2: Clean the data ###

#### Extract information ####

The `Line`, `MetaData`, `CurrencyRef`, and `CustomerRef` columns are actually serialized JSON objects provided by Quickbooks, each containing several useful elements. We'll need to start by **flattening** the JSON and then **exploding** it into unique columns so we can work with the data.

Again, we'll use the [gluestick](https://pypi.org/project/gluestick/) package to accomplish this. The `explode_json_to_rows` function handles the flattening and exploding in one step. To avoid exploding too many levels of this object, we'll specify `max_level=1`.

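Here is a snippet of one `Line` value to give you an idea. Note that it uses Python literal syntax (single quotes, `None`), which is why we parsed it with `ast.literal_eval` earlier.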
```json
[{
	'Id': '1',
	'LineNum': '1',
	'Amount': 275.0,
	'DetailType': 'SalesItemLineDetail',
	'SalesItemLineDetail': {
		'ItemRef': {
			'value': '5',
			'name': 'Rock Fountain'
		},
		'ItemAccountRef': {
			'value': '79',
			'name': 'Sales of Product Income'
		},
		'TaxCodeRef': {
			'value': 'TAX',
			'name': None
		}
	},
	'SubTotalLineDetail': None,
	'DiscountLineDetail': None
}]
```
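
For intuition, the flatten-and-explode step can be approximated in plain pandas. This is only a sketch with made-up data, not gluestick's actual implementation: it uses `DataFrame.explode` to get one row per line item and `pd.json_normalize` with `max_level=1` to expand each item one level deep.

```python
import pandas as pd

# Made-up invoice with two line items, shaped like the snippet above.
sample = pd.DataFrame({
    "DocNumber": [1037],
    "Line": [[
        {"Id": "1", "Amount": 275.0,
         "SalesItemLineDetail": {"ItemRef": {"value": "5", "name": "Rock Fountain"}}},
        {"Id": "2", "Amount": 60.0,
         "SalesItemLineDetail": {"ItemRef": {"value": "6", "name": "Gardening"}}},
    ]],
}).set_index("DocNumber")

# Explode: one row per line item, repeating the invoice index.
exploded = sample.explode("Line")

# Flatten: expand each dict one level deep, so ItemRef itself stays a dict.
flat = pd.json_normalize(exploded["Line"].tolist(), max_level=1).add_prefix("Line.")
flat.index = exploded.index

print(flat.columns.tolist())
```

Because of `max_level=1`, the nested `ItemRef` is left as a dict in the `Line.SalesItemLineDetail.ItemRef` column, which is the kind of column the pipeline later splits into separate name and value columns.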

#### Rename the columns ####

Once we finish this process, we'll rename the generated columns to more readable names.
```
CustomerRef.value -> CustomerId
CustomerRef.name -> Customer
MetaData.LastUpdatedTime -> LastUpdated
MetaData.CreateTime -> CreatedOn
CurrencyRef.name -> Currency
CurrencyRef.value -> CurrencyCode
```

In [6]:
invoices = (input_data['Invoice']
            .pipe(lambda x: x.drop(columns='MetaData.LastUpdatedTime'))
            .pipe(gs.explode_json_to_rows, "MetaData", max_level=1)
            .pipe(gs.explode_json_to_rows, "Line", max_level=1)
            .pipe(gs.explode_json_to_rows, "CurrencyRef", max_level=1)
            .pipe(gs.explode_json_to_rows, "CustomerRef", max_level=1)
            .pipe(gs.json_tuple_to_cols, 'Line.SalesItemLineDetail.ItemRef',
                  col_config={'cols': {'key_prop': 'Item', 'value_prop': 'Item Id'},
                              'look_up': qb_lookup_keys})
            .pipe(gs.json_tuple_to_cols, 'Line.SalesItemLineDetail.ItemAccountRef',
                  col_config={'cols': {'key_prop': 'Item Ref', 'value_prop': 'Item Ref Id'},
                              'look_up': qb_lookup_keys})
            .pipe(gs.json_tuple_to_cols, 'Line.SalesItemLineDetail.TaxCodeRef',
                  col_config={'cols': {'key_prop': 'Tax Desc', 'value_prop': 'Tax Code'},
                              'look_up': qb_lookup_keys})
            .pipe(gs.json_tuple_to_cols, 'Line.DiscountLineDetail.DiscountAccountRef',
                  col_config={'cols': {'key_prop': 'Discount Details', 'value_prop': 'Discount %'},
                              'look_up': qb_lookup_keys})
            .pipe(gs.explode_json_to_cols, 'CustomField', reducer=gs.array_to_dict_reducer('Name', 'StringValue'))
            .pipe(lambda x: x.rename(columns={'CustomerRef.value': 'CustomerId', 'CustomerRef.name': 'Customer',
                                              'MetaData.LastUpdatedTime': 'LastUpdated',
                                              'MetaData.CreateTime': 'CreatedOn', 'CurrencyRef.name': 'Currency',
                                              'CurrencyRef.value': 'CurrencyCode'}))
            .pipe(lambda x: x[x['Line.DetailType'] == 'SalesItemLineDetail'])
            .pipe(lambda x: x.loc[:, 'Id':])
            )
invoices.columns

Index(['Id', 'AllowIPNPayment', 'AllowOnlinePayment',
       'AllowOnlineCreditCardPayment', 'AllowOnlineACHPayment', 'TxnDate',
       'FreeFormAddress', 'ShipFromAddr', 'DueDate', 'TotalAmt',
       'ApplyTaxAfterDiscount', 'PrintStatus', 'Balance', 'CreatedOn',
       'LastUpdated', 'Line.Id', 'Line.LineNum', 'Line.Amount',
       'Line.DetailType', 'Line.SubTotalLineDetail', 'Line.DiscountLineDetail',
       'Line.SalesItemLineDetail', 'CurrencyCode', 'Currency', 'CustomerId',
       'Customer', 'Item', 'Item Id', 'Item Ref', 'Item Ref Id', 'Tax Desc',
       'Tax Code', 'Discount Details', 'Discount %', 'CustomField.Crew #'],
      dtype='object')
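
The `json_tuple_to_cols` calls above turn each `{value, name}` reference into two named columns. A rough plain-pandas equivalent is sketched below, assuming each cell holds a dict or a missing value. The sample data is invented, and gluestick's own logic may differ.

```python
import pandas as pd

refs = pd.DataFrame({
    "Line.SalesItemLineDetail.ItemRef": [
        {"value": "5", "name": "Rock Fountain"},
        {"value": "6", "name": "Gardening"},
        None,  # e.g. a line without a sales item
    ]
})


def pick(key):
    """Return a function that reads `key` from a dict cell, or None otherwise."""
    return lambda cell: cell.get(key) if isinstance(cell, dict) else None


col = refs["Line.SalesItemLineDetail.ItemRef"]
refs["Item"] = col.apply(pick("name"))
refs["Item Id"] = col.apply(pick("value"))
refs = refs.drop(columns="Line.SalesItemLineDetail.ItemRef")

print(refs)
```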

#### Drop duplicates ####
To avoid counting the same invoice multiple times, we drop any duplicates.

In [7]:
invoices = invoices.drop_duplicates(subset='Id')
invoices.shape

(30, 35)
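
One thing to keep in mind: after exploding `Line`, an invoice with several line items appears on several rows that share the same `Id`. The toy example below (invented values) shows that `drop_duplicates(subset='Id')` keeps only the first row for each `Id`, so later line items of the same invoice are discarded.

```python
import pandas as pd

lines = pd.DataFrame({
    "Id": ["130", "130", "129"],
    "Line.Amount": [275.0, 60.0, 50.0],
})

# keep='first' is the default: the second line of invoice 130 is dropped.
deduped = lines.drop_duplicates(subset="Id")

print(deduped)
```

If every line item should count towards revenue, deduplicating on both `Id` and `Line.Id` would be one alternative to consider.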

In [9]:
invoices.head()

DocNumber,Id,AllowIPNPayment,AllowOnlinePayment,AllowOnlineCreditCardPayment,AllowOnlineACHPayment,TxnDate,FreeFormAddress,ShipFromAddr,DueDate,TotalAmt,...,Customer,Item,Item Id,Item Ref,Item Ref Id,Tax Desc,Tax Code,Discount Details,Discount %,CustomField.Crew #
1001,9,False,False,False,False,2020-06-17T00:00:00.000000Z,True,,2020-07-17T00:00:00.000000Z,108.0,...,Amy's Bird Sanctuary,Gardening,6,Landscaping Services,45,,TAX,,,
1002,10,False,False,False,False,2020-03-05T00:00:00.000000Z,False,,2020-04-04T00:00:00.000000Z,175.0,...,Bill's Windsurf Shop,Gardening,6,Landscaping Services,45,,NON,,,103.0
1004,12,False,False,False,False,2020-06-08T00:00:00.000000Z,False,,2020-07-08T00:00:00.000000Z,2369.52,...,Cool Cars,Sprinkler Heads,16,Sales of Product Income,79,,TAX,,,
1005,13,False,False,False,False,2020-06-11T00:00:00.000000Z,True,,2020-07-11T00:00:00.000000Z,54.0,...,55 Twin Lane,Gardening,6,Landscaping Services,45,,TAX,,,
1006,14,False,False,False,False,2020-05-12T00:00:00.000000Z,True,,2020-06-11T00:00:00.000000Z,86.4,...,55 Twin Lane,Gardening,6,Landscaping Services,45,,TAX,,,


### Step 3: Transform the data ###

#### Get start/end dates ####
We'll convert the `TxnDate` to a `datetime` and name it `RevStartDate` so we can compute a `RevEndDate` (we are assuming it's 12 months after the `RevStartDate`). This is needed to compute MRR.

In [10]:
lineitems = invoices.loc[:, ['TxnDate', 'CustomerId', 'Line.Id', 'Line.Amount',  'Item Ref', 'Discount %']].astype({'TxnDate':'datetime64[ns]'})
lineitems.dtypes

TxnDate        datetime64[ns]
CustomerId             object
Line.Id                object
Line.Amount           float64
Item Ref               object
Discount %             object
dtype: object

In [11]:
lineitems['RevEndDate'] = lineitems['TxnDate'].apply(lambda x: x + relativedelta(months=+12))
lineitems = lineitems.rename(columns={'TxnDate': 'RevStartDate'})
lineitems.head()

DocNumber,RevStartDate,CustomerId,Line.Id,Line.Amount,Item Ref,Discount %,RevEndDate
1001,2020-06-17,1,1,100.0,Landscaping Services,,2021-06-17
1002,2020-03-05,2,1,140.0,Landscaping Services,,2021-03-05
1004,2020-06-08,3,1,20.0,Sales of Product Income,,2021-06-08
1005,2020-06-11,9,1,50.0,Landscaping Services,,2021-06-11
1006,2020-05-12,9,1,80.0,Landscaping Services,,2021-05-12
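
As a quick illustration of how `relativedelta(months=+12)` behaves, here is a small standalone example. The leap-day case is an extra data point that does not appear in the dataset above.

```python
from datetime import datetime

from dateutil.relativedelta import relativedelta

# Same calculation as RevEndDate for invoice 1001.
start = datetime(2020, 6, 17)
end = start + relativedelta(months=+12)

# relativedelta clips to the last valid day of the target month.
leap_start = datetime(2020, 2, 29)
leap_end = leap_start + relativedelta(months=+12)

print(end.date(), leap_end.date())  # 2021-06-17 2021-02-28
```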


In [12]:
def compute_date_range(x):
    return pd.date_range(start=x.RevStartDate, end=x.RevEndDate, freq='D')

#### Compute the daily revenue ####
Before we compute MRR, we'll compute the daily revenue based on the active days for each item. 

In [13]:
lineitems['ActiveDays'] = lineitems.apply(compute_date_range, axis=1)
lineitems.head()

DocNumber,RevStartDate,CustomerId,Line.Id,Line.Amount,Item Ref,Discount %,RevEndDate,ActiveDays
1001,2020-06-17,1,1,100.0,Landscaping Services,,2021-06-17,"DatetimeIndex(['2020-06-17', '2020-06-18', '20..."
1002,2020-03-05,2,1,140.0,Landscaping Services,,2021-03-05,"DatetimeIndex(['2020-03-05', '2020-03-06', '20..."
1004,2020-06-08,3,1,20.0,Sales of Product Income,,2021-06-08,"DatetimeIndex(['2020-06-08', '2020-06-09', '20..."
1005,2020-06-11,9,1,50.0,Landscaping Services,,2021-06-11,"DatetimeIndex(['2020-06-11', '2020-06-12', '20..."
1006,2020-05-12,9,1,80.0,Landscaping Services,,2021-05-12,"DatetimeIndex(['2020-05-12', '2020-05-13', '20..."


In [10]:
lineitems['Days'] =  lineitems['RevEndDate'] - lineitems['RevStartDate']
lineitems['Days'] = lineitems['Days'].apply(lambda x: x.days)
lineitems.head()


DocNumber,RevStartDate,CustomerId,Line.Id,Line.Amount,Item Ref,Discount %,RevEndDate,ActiveDays,Days
1001,2020-06-17,1,1,100.0,Landscaping Services,,2021-06-17,"DatetimeIndex(['2020-06-17', '2020-06-18', '20...",365
1002,2020-03-05,2,1,140.0,Landscaping Services,,2021-03-05,"DatetimeIndex(['2020-03-05', '2020-03-06', '20...",365
1004,2020-06-08,3,1,20.0,Sales of Product Income,,2021-06-08,"DatetimeIndex(['2020-06-08', '2020-06-09', '20...",365
1005,2020-06-11,9,1,50.0,Landscaping Services,,2021-06-11,"DatetimeIndex(['2020-06-11', '2020-06-12', '20...",365
1006,2020-05-12,9,1,80.0,Landscaping Services,,2021-05-12,"DatetimeIndex(['2020-05-12', '2020-05-13', '20...",365


In [11]:
lineitems['Revenue'] = lineitems['Line.Amount'] / lineitems['Days']
lineitems.head()

DocNumber,RevStartDate,CustomerId,Line.Id,Line.Amount,Item Ref,Discount %,RevEndDate,ActiveDays,Days,Revenue
1001,2020-06-17,1,1,100.0,Landscaping Services,,2021-06-17,"DatetimeIndex(['2020-06-17', '2020-06-18', '20...",365,0.273973
1002,2020-03-05,2,1,140.0,Landscaping Services,,2021-03-05,"DatetimeIndex(['2020-03-05', '2020-03-06', '20...",365,0.383562
1004,2020-06-08,3,1,20.0,Sales of Product Income,,2021-06-08,"DatetimeIndex(['2020-06-08', '2020-06-09', '20...",365,0.054795
1005,2020-06-11,9,1,50.0,Landscaping Services,,2021-06-11,"DatetimeIndex(['2020-06-11', '2020-06-12', '20...",365,0.136986
1006,2020-05-12,9,1,80.0,Landscaping Services,,2021-05-12,"DatetimeIndex(['2020-05-12', '2020-05-13', '20...",365,0.219178
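
To make the arithmetic concrete, the snippet below reproduces the figures for invoice 1001 by hand. It also shows a detail worth being aware of: `pd.date_range` includes both endpoints, so `ActiveDays` holds one more entry than `Days`.

```python
import pandas as pd

start = pd.Timestamp("2020-06-17")
end = pd.Timestamp("2021-06-17")
amount = 100.0  # Line.Amount for invoice 1001

days = (end - start).days      # elapsed days between the two dates
daily_revenue = amount / days  # same formula as the Revenue column
active_days = pd.date_range(start, end, freq="D")  # includes both endpoints

print(days, round(daily_revenue, 6), len(active_days))  # 365 0.273973 366
```

Summing the daily revenue over every active day would therefore slightly exceed the invoice amount. Whether to exclude the end date is a modelling choice; the notebook keeps it.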


In [12]:
revenue = lineitems.loc[: , ['ActiveDays', 'Revenue', 'CustomerId', 'Item Ref']]
revenue = revenue.explode('ActiveDays')


In [13]:
daily_revenue = revenue.set_index('ActiveDays')
daily_revenue = daily_revenue.loc[:'2020-05-31',:]
daily_revenue.head()

ActiveDays,Revenue,CustomerId,Item Ref
2020-03-05,0.383562,2,Landscaping Services
2020-03-06,0.383562,2,Landscaping Services
2020-03-07,0.383562,2,Landscaping Services
2020-03-08,0.383562,2,Landscaping Services
2020-03-09,0.383562,2,Landscaping Services
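
The `explode` call is what turns one row per line item into one row per active day. A tiny sketch with invented numbers (the cell is a plain list of timestamps here, whereas the notebook stores a `DatetimeIndex`):

```python
import pandas as pd

item = pd.DataFrame({
    "Revenue": [0.5],
    "CustomerId": ["2"],
    # One cell holding three consecutive days.
    "ActiveDays": [list(pd.date_range("2020-03-05", "2020-03-07", freq="D"))],
})

# One output row per day; the other columns are repeated on each row.
daily = item.explode("ActiveDays").set_index("ActiveDays")

print(daily)
```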


#### Compute the monthly revenue ####
Now that we have the daily revenue, we'll compute the monthly revenue (MRR) by summing the daily revenue figures for each Customer and Item within each month.

In [16]:
monthly_revenue = daily_revenue.groupby(by=['CustomerId', 'Item Ref']).resample("M").sum()['Revenue']
monthly_revenue

CustomerId  Item Ref                                                          ActiveDays
1           Landscaping Services                                              2020-05-31     9.534247
            Landscaping Services:Job Materials:Plants and Soil                2020-05-31     1.232877
2           Landscaping Services                                              2020-03-31    10.356164
                                                                              2020-04-30    11.506849
                                                                              2020-05-31    15.863014
5           Landscaping Services:Job Materials:Fountains and Garden Lighting  2020-05-31     5.958904
8           Landscaping Services                                              2020-03-31     3.698630
                                                                              2020-04-30    10.520548
                                                                              2020-05-31    11.

In [17]:
monthly_revenue = monthly_revenue.reset_index(level=[0,1])
monthly_revenue

ActiveDays,CustomerId,Item Ref,Revenue
2020-05-31,1,Landscaping Services,9.534247
2020-05-31,1,Landscaping Services:Job Materials:Plants and ...,1.232877
2020-03-31,2,Landscaping Services,10.356164
2020-04-30,2,Landscaping Services,11.506849
2020-05-31,2,Landscaping Services,15.863014
2020-05-31,5,Landscaping Services:Job Materials:Fountains a...,5.958904
2020-03-31,8,Landscaping Services,3.69863
2020-04-30,8,Landscaping Services,10.520548
2020-05-31,8,Landscaping Services,11.890411
2020-05-31,9,Landscaping Services,10.342466
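
Here is a stripped-down version of the group-then-resample step on invented data. As an assumption for portability, it passes a `pd.offsets.MonthEnd()` object instead of the `"M"` alias, since the preferred spelling of that alias varies between pandas versions.

```python
import pandas as pd

daily = pd.DataFrame(
    {
        "CustomerId": ["8"] * 4,
        "Item Ref": ["Landscaping Services"] * 4,
        "Revenue": [1.0, 1.0, 2.0, 2.0],
    },
    index=pd.DatetimeIndex(
        ["2020-03-30", "2020-03-31", "2020-04-01", "2020-04-02"],
        name="ActiveDays",
    ),
)

# Sum the daily revenue into month-end buckets per customer and item.
monthly = (
    daily.groupby(["CustomerId", "Item Ref"])["Revenue"]
    .resample(pd.offsets.MonthEnd())
    .sum()
)

print(monthly)
```

Each bucket is labelled with the last day of its month, which matches the `2020-03-31`, `2020-04-30`, ... labels in the output above.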


#### Get prior MRR ####
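Since we want to analyze MRR trends, we'll also create a prior MRR: the same series with its dates shifted forward by one month, so that each month can be paired with the previous month's value. This will be useful when creating our cumul.io dashboard.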

In [18]:
monthly_revenue.index.rename("Month", inplace=True)
monthly_revenue.rename(columns={"Revenue": "MRR"}, inplace=True)
monthly_revenue[monthly_revenue['CustomerId']==8].sort_values(by='Item Ref').sort_index()

Month,CustomerId,Item Ref,MRR
2020-03-31,8,Landscaping Services,3.69863
2020-04-30,8,Landscaping Services,10.520548
2020-05-31,8,Landscaping Services,11.890411


In [19]:
prior_month_revenue = monthly_revenue.shift(freq='M')
prior_month_revenue[prior_month_revenue['CustomerId']==8].sort_values(by='Item Ref').sort_index()

Month,CustomerId,Item Ref,MRR
2020-04-30,8,Landscaping Services,3.69863
2020-05-31,8,Landscaping Services,10.520548
2020-06-30,8,Landscaping Services,11.890411
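
The key point about `shift(freq=...)` is that it moves the index labels and leaves the values untouched. A minimal sketch with rounded, made-up values, again using a `pd.offsets.MonthEnd()` object in place of the `"M"` alias:

```python
import pandas as pd

current = pd.DataFrame(
    {"MRR": [3.7, 10.5]},
    index=pd.DatetimeIndex(["2020-03-31", "2020-04-30"], name="Month"),
)

# Each month-end label moves to the next month-end; the values stay put.
prior = current.shift(freq=pd.offsets.MonthEnd())

print(prior)
```

After the shift, the row labelled `2020-04-30` carries March's MRR, which is exactly what is needed to compare a month with the one before it.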


#### Combine into one table ####
Now that we have all our MRR metrics, we'll combine them into one table (dataframe).

In [20]:
mrr = monthly_revenue.merge(prior_month_revenue, how='outer', on=['Month', 'CustomerId','Item Ref'], suffixes=['_Curr','_Pri'], sort=True).fillna(0)
mrr[mrr['CustomerId']==1]

Month,CustomerId,Item Ref,MRR_Curr,MRR_Pri
2020-05-31,1,Landscaping Services,9.534247,0.0
2020-05-31,1,Landscaping Services:Job Materials:Plants and ...,1.232877,0.0
2020-06-30,1,Landscaping Services,0.0,9.534247
2020-06-30,1,Landscaping Services:Job Materials:Plants and ...,0.0,1.232877
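
The outer merge is what produces rows where only one side exists, such as the first month (no prior value) and the month after the last one (no current value). The following sketch replays that on a single invented customer:

```python
import pandas as pd

current = pd.DataFrame(
    {"CustomerId": ["8", "8"], "MRR": [3.7, 10.5]},
    index=pd.DatetimeIndex(["2020-03-31", "2020-04-30"], name="Month"),
)
prior = current.shift(freq=pd.offsets.MonthEnd())

# 'Month' is an index level and 'CustomerId' a column; merge accepts both in `on`.
combined = current.merge(
    prior,
    how="outer",
    on=["Month", "CustomerId"],
    suffixes=("_Curr", "_Pri"),
    sort=True,
).fillna(0)

print(combined)
```

Filling the gaps with 0 means that "no invoice that month" is treated as zero MRR, which the later New and Loss calculations rely on.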


In [21]:
mrr['Period'] = mrr.index 
customer_first = mrr.sort_values('Month').groupby(by='CustomerId')[['CustomerId', 'Period']].first()
customer_first.rename(columns={'Period': 'First_Period'}, inplace=True)
customer_first

CustomerId,CustomerId,First_Period
1,1,2020-05-31
2,2,2020-03-31
5,5,2020-05-31
8,8,2020-03-31
9,9,2020-05-31
12,12,2020-05-31
13,13,2020-05-31
16,16,2020-05-31
20,20,2020-04-30


In [22]:
mrr = mrr.join(customer_first,on='CustomerId', how='left', rsuffix='_F')
mrr.drop(columns=['Period', 'CustomerId_F'], inplace=True)


In [23]:
mrr[mrr['CustomerId']==8]

Month,CustomerId,Item Ref,MRR_Curr,MRR_Pri,First_Period
2020-03-31,8,Landscaping Services,3.69863,0.0,2020-03-31
2020-04-30,8,Landscaping Services,10.520548,3.69863,2020-03-31
2020-05-31,8,Landscaping Services,11.890411,10.520548,2020-03-31
2020-06-30,8,Landscaping Services,0.0,11.890411,2020-03-31
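
As an aside, the same "first period per customer" column can be obtained without a separate join. The sketch below uses `groupby(...).transform('min')` on invented data. It is an alternative formulation, not the approach used in this notebook.

```python
import pandas as pd

periods = pd.DataFrame({
    "CustomerId": ["8", "8", "2"],
    "Period": pd.to_datetime(["2020-04-30", "2020-03-31", "2020-05-31"]),
})

# transform returns one value per original row, aligned with the input.
periods["First_Period"] = periods.groupby("CustomerId")["Period"].transform("min")

print(periods)
```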


#### Save the MRR metrics ####
Now that we have the final metrics, we'll write this to a CSV.

In [24]:
# Sum current and prior MRR across all customers and items for each month
f = mrr[['MRR_Curr', 'MRR_Pri']].groupby(by='Month').sum()

f.to_csv("./etl-output/mrr.csv")

#### Calculate the churn ####
Now we will calculate the churn month over month (the percentage of accounts that cancel or choose not to renew their subscriptions). We will start by getting MRR on a customer level so we can analyze each one.

In [25]:
#churn calc

# Put `col` first and keep the remaining columns in their original order
move_col_to_front = lambda df, col: df[[col] + [c for c in df.columns if c != col]]

customer_mrr = mrr.groupby(by=['CustomerId', 'Month'])[['MRR_Pri', 'MRR_Curr']].sum()
customer_mrr['CustomerId'] = customer_mrr.index.get_level_values('CustomerId')
customer_mrr = move_col_to_front(customer_mrr, "CustomerId")   
customer_mrr = customer_mrr.set_index(customer_mrr.index.get_level_values(1))
customer_mrr

Month,CustomerId,MRR_Pri,MRR_Curr
2020-05-31,1,0.0,10.767123
2020-06-30,1,10.767123,0.0
2020-03-31,2,0.0,10.356164
2020-04-30,2,10.356164,11.506849
2020-05-31,2,11.506849,15.863014
2020-06-30,2,15.863014,0.0
2020-05-31,5,0.0,5.958904
2020-06-30,5,5.958904,0.0
2020-03-31,8,0.0,3.69863
2020-04-30,8,3.69863,10.520548


#### Compute New, Loss, Expansion, Contraction ####
Based on the customer MRR, we can easily compute the change in our accounts. "New" is where `MRR_Pri` is 0 and `MRR_Curr` is not, "Loss" is where `MRR_Curr` is 0, "Expansion" is where `MRR_Curr` > `MRR_Pri` for a customer who already had MRR, and "Contraction" is where `MRR_Curr` < `MRR_Pri` but `MRR_Curr` is still non-zero.

In [26]:
customer_mrr['New'] = customer_mrr.apply(axis=1, func=lambda x:  x['MRR_Curr'] if x['MRR_Curr']!=0 and x['MRR_Pri']==0 else 0)
customer_mrr['Loss'] = customer_mrr.apply(axis=1, func=lambda x:  -1.0 * x['MRR_Pri'] if x['MRR_Curr']==0 else 0)
customer_mrr['Expansion'] = customer_mrr.apply(axis=1, func=lambda x:  x['MRR_Curr'] -x['MRR_Pri'] if x['MRR_Curr'] > x['MRR_Pri'] and x['MRR_Pri'] != 0 else 0)
customer_mrr['Contraction']= customer_mrr.apply(axis=1, func=lambda x:  x['MRR_Curr'] -x['MRR_Pri'] if x['MRR_Curr'] < x['MRR_Pri'] and x['MRR_Curr'] != 0 else 0)
customer_mrr = customer_mrr.loc[:'2020-05-31', :]
customer_mrr[(customer_mrr['Expansion'] > 0)].loc['2020-05-31'].sort_values(by='Expansion', ascending=False)
#a[a['CustomerId']==461]
#invoices[invoices['CustomerId']==1066]['Customer']
invoices[invoices['Customer'].str.contains('^Pelo', na=False)][['TotalAmt', 'TxnDate', 'Id']]

DocNumber,TotalAmt,TxnDate,Id
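
The four categories can also be written without row-wise `apply`. The sketch below applies the same conditions with `numpy.where` to four invented customers, one per category. It is meant as an illustration of the rules rather than a drop-in replacement.

```python
import numpy as np
import pandas as pd

moves = pd.DataFrame({
    "MRR_Pri": [0.0, 10.0, 10.0, 10.0],
    "MRR_Curr": [5.0, 0.0, 15.0, 8.0],
})
cur, pri = moves["MRR_Curr"], moves["MRR_Pri"]

# Same conditions as the lambdas above, evaluated column-wise.
moves["New"] = np.where((cur != 0) & (pri == 0), cur, 0.0)
moves["Loss"] = np.where(cur == 0, -pri, 0.0)
moves["Expansion"] = np.where((cur > pri) & (pri != 0), cur - pri, 0.0)
moves["Contraction"] = np.where((cur < pri) & (cur != 0), cur - pri, 0.0)

print(moves)
```

Note that Loss and Contraction are stored as negative numbers, so the four columns sum to the net change in MRR for each row.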


#### Save the churn metrics ####
Now that we have generated the churn metrics, we write them to a CSV.

In [28]:
gd = customer_mrr.groupby(by=['Month'])[['New', 'Expansion', 'Loss', 'Contraction']].sum()

gd.to_csv("./etl-output/churn.csv")