# How to integrate Quickbooks with Python Sample #
*Written by Hassan Syyid @ [hotglue](https://hotglue.xyz)*

Check out the corresponding [Medium Article]()

## Introduction ##
In this article, I'll show you how to use Singer's tap-quickbooks to extract data from Quickbooks. From there, I'll walk you through converting the tap's JSON output to CSV with target-csv and standardizing the result with a simple Python script.
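
To make the "JSON output" concrete: a Singer tap writes one JSON message per line, and `RECORD` messages carry the actual rows. The sketch below is not target-csv's code. It is a minimal illustration, using only the standard library, of grouping `RECORD` messages by stream. The sample messages and the helper name `group_records` are assumptions made for this example; the two invoice values are borrowed from the sample output shown later in this article.

```python
import json
from collections import defaultdict

# Hypothetical sample of a tap's stdout: one JSON message per line.
SAMPLE_OUTPUT = """\
{"type": "SCHEMA", "stream": "Invoice", "schema": {"properties": {}}, "key_properties": ["Id"]}
{"type": "RECORD", "stream": "Invoice", "record": {"Id": "130", "TotalAmt": 362.07}}
{"type": "RECORD", "stream": "Invoice", "record": {"Id": "129", "TotalAmt": 477.5}}
{"type": "STATE", "value": {}}
"""


def group_records(lines):
    """Collect RECORD messages into {stream name: [record, ...]}."""
    records = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        message = json.loads(line)
        # SCHEMA and STATE messages are ignored in this sketch
        if message.get("type") == "RECORD":
            records[message["stream"]].append(message["record"])
    return dict(records)


records_by_stream = group_records(SAMPLE_OUTPUT.splitlines())
print(len(records_by_stream["Invoice"]))
```

In practice target-csv does this step for you and writes CSV files per stream, which is what the rest of this article reads.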

In [1]:
import gluestick as gs
import pandas as pd

### Step 1: Read the data ###
Let's start by reading the data. 

We will use the [gluestick](https://pypi.org/project/gluestick/) package to read the raw data in the input folder into a dictionary of pandas dataframes using the `read_csv_folder` function.

By specifying `index_cols={'Invoice': 'Id'}`, the `Invoice` dataframe will use the `Id` column as an index.

In [2]:
# standard directory for hotglue
ROOT_DIR = "./sync-output"

# Read input data
input_data = gs.read_csv_folder(ROOT_DIR, index_cols={'Invoice': 'Id'})
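
If you want to see the idea without gluestick, the following is a rough pandas-only sketch of reading a folder of CSV files into a dictionary of dataframes. It is not gluestick's implementation: the helper name `read_folder_sketch` is hypothetical, and it assumes each file is named after its stream (for example `Invoice.csv`), which may not match how your files are named.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_folder_sketch(path, index_cols=None):
    """Read each CSV in `path` into a dict of DataFrames keyed by file stem."""
    index_cols = index_cols or {}
    frames = {}
    for csv_path in sorted(Path(path).glob("*.csv")):
        name = csv_path.stem  # assumption: "Invoice.csv" holds the Invoice stream
        # index_col=None (no entry in index_cols) keeps the default RangeIndex
        frames[name] = pd.read_csv(csv_path, index_col=index_cols.get(name))
    return frames


# Small demo with a throwaway folder and a two-row file
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "Invoice.csv").write_text("Id,TotalAmt\n130,362.07\n129,477.5\n")
    data = read_folder_sketch(tmp, index_cols={"Invoice": "Id"})

print(data["Invoice"].index.name)
```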

##### Take a peek #####
Let's take a look at what data we're working with.

In [3]:
input_df = input_data['Invoice']
input_df.head()

| Id | AllowIPNPayment | AllowOnlinePayment | AllowOnlineCreditCardPayment | AllowOnlineACHPayment | MetaData__CreateTime | MetaData__LastUpdatedTime | CustomField | DocNumber | TxnDate | CurrencyRef__value | ... | CustomerRef__value | CustomerRef__name | Line | FreeFormAddress | ShipFromAddr | DueDate | TotalAmt | ApplyTaxAfterDiscount | PrintStatus | Balance |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 130 | False | False | False | False | 2020-06-20T20:16:17.000000Z | 2020-06-20T20:16:17.000000Z | [{'DefinitionId': '1', 'Name': 'Crew #', 'Type... | 1037 | 2020-06-20T00:00:00.000000Z | USD | ... | 24 | Sonnenschein Family Store | [{'Id': '1', 'LineNum': '1', 'Amount': 275.0, ... | True |  | 2020-07-20T00:00:00.000000Z | 362.07 | False | NeedToPrint | 362.07 |
| 129 | False | False | False | False | 2020-06-20T20:15:36.000000Z | 2020-06-20T20:15:36.000000Z | [{'DefinitionId': '1', 'Name': 'Crew #', 'Type... | 1036 | 2020-06-20T00:00:00.000000Z | USD | ... | 8 | 0969 Ocean View Road | [{'Id': '1', 'LineNum': '1', 'Amount': 50.0, '... | True |  | 2020-07-20T00:00:00.000000Z | 477.5 | False | NeedToPrint | 477.5 |
| 96 | False | False | False | False | 2020-06-19T20:30:49.000000Z | 2020-06-20T20:13:33.000000Z | [{'DefinitionId': '1', 'Name': 'Crew #', 'Type... | 1031 | 2020-04-05T00:00:00.000000Z | USD | ... | 8 | 0969 Ocean View Road | [{'Id': '1', 'LineNum': '1', 'Amount': 90.0, '... | True |  | 2020-05-05T00:00:00.000000Z | 387.0 | False | NeedToPrint | 0.0 |
| 12 | False | False | False | False | 2020-06-17T22:04:04.000000Z | 2020-06-20T19:59:21.000000Z | [{'DefinitionId': '1', 'Name': 'Crew #', 'Type... | 1004 | 2020-06-08T00:00:00.000000Z | USD | ... | 3 | Cool Cars | [{'Id': '1', 'LineNum': '1', 'Amount': 20.0, '... | False |  | 2020-07-08T00:00:00.000000Z | 2369.52 | False | NotSet | 0.0 |
| 119 | False | False | False | False | 2020-06-20T19:57:24.000000Z | 2020-06-20T19:57:24.000000Z | [{'DefinitionId': '1', 'Name': 'Crew #', 'Type... | 1035 | 2020-06-20T00:00:00.000000Z | USD | ... | 17 | Mark Cho | [{'Id': '1', 'LineNum': '1', 'Amount': 275.0, ... | True |  | 2020-07-20T00:00:00.000000Z | 314.28 | False | NeedToPrint | 314.28 |
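
Note that `CustomField` and `Line` hold nested data that has been flattened to text. If you need to look inside one of them, a sketch like the following may help. It assumes the cells are Python-literal strings, as the truncated preview suggests; the sample value here is made up for illustration (only the key names and the amount come from the preview), and real data with missing values would need extra handling.

```python
import ast

import pandas as pd

# Made-up frame mimicking the stringified `Line` column
sample = pd.DataFrame(
    {"Line": ["[{'Id': '1', 'LineNum': '1', 'Amount': 275.0}]"]},
    index=pd.Index([130], name="Id"),
)

# ast.literal_eval parses Python literals only, so it will not execute code
parsed = sample["Line"].apply(ast.literal_eval)

first_line = parsed.loc[130][0]
print(first_line["Amount"])
```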


### Step 2: Filter the columns ###
Let's clean up the data by only selecting the columns we want.

In [4]:
# Let's only select the columns we want 
invoices = input_df[["CustomerRef__name", "TotalAmt", "Balance", "DueDate"]]
invoices.head()

| Id | CustomerRef__name | TotalAmt | Balance | DueDate |
| --- | --- | --- | --- | --- |
| 130 | Sonnenschein Family Store | 362.07 | 362.07 | 2020-07-20T00:00:00.000000Z |
| 129 | 0969 Ocean View Road | 477.5 | 477.5 | 2020-07-20T00:00:00.000000Z |
| 96 | 0969 Ocean View Road | 387.0 | 0.0 | 2020-05-05T00:00:00.000000Z |
| 12 | Cool Cars | 2369.52 | 0.0 | 2020-07-08T00:00:00.000000Z |
| 119 | Mark Cho | 314.28 | 314.28 | 2020-07-20T00:00:00.000000Z |
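
As one possible next step in standardizing, you could rename the columns and parse `DueDate` into a real datetime. The sketch below rebuilds the five rows above by hand so it runs on its own. The target column names (`customer_name`, `total`, `balance`, `due_date`) are hypothetical choices for this example, not a hotglue convention.

```python
import pandas as pd

# Rows copied from the filtered output above
invoices = pd.DataFrame(
    {
        "CustomerRef__name": [
            "Sonnenschein Family Store",
            "0969 Ocean View Road",
            "0969 Ocean View Road",
            "Cool Cars",
            "Mark Cho",
        ],
        "TotalAmt": [362.07, 477.5, 387.0, 2369.52, 314.28],
        "Balance": [362.07, 477.5, 0.0, 0.0, 314.28],
        "DueDate": [
            "2020-07-20T00:00:00.000000Z",
            "2020-07-20T00:00:00.000000Z",
            "2020-05-05T00:00:00.000000Z",
            "2020-07-08T00:00:00.000000Z",
            "2020-07-20T00:00:00.000000Z",
        ],
    },
    index=pd.Index([130, 129, 96, 12, 119], name="Id"),
)

# Hypothetical target names; pick whatever your downstream schema expects
standardized = invoices.rename(
    columns={
        "CustomerRef__name": "customer_name",
        "TotalAmt": "total",
        "Balance": "balance",
        "DueDate": "due_date",
    }
)

# Parse the ISO 8601 strings into timezone-aware (UTC) timestamps
standardized["due_date"] = pd.to_datetime(standardized["due_date"], utc=True)

# Invoices that still have an open balance
outstanding = standardized[standardized["balance"] > 0]
print(outstanding.index.tolist())
```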


## Conclusion ##
Our final data looks like the output below. In this sample we didn't do any extensive ETL operations; this is just a starting point for manipulating data from tap-quickbooks. Feel free to check out the open source [hotglue recipes](https://github.com/hotgluexyz/recipes) for more samples in the future.

In [5]:
invoices.head()

| Id | CustomerRef__name | TotalAmt | Balance | DueDate |
| --- | --- | --- | --- | --- |
| 130 | Sonnenschein Family Store | 362.07 | 362.07 | 2020-07-20T00:00:00.000000Z |
| 129 | 0969 Ocean View Road | 477.5 | 477.5 | 2020-07-20T00:00:00.000000Z |
| 96 | 0969 Ocean View Road | 387.0 | 0.0 | 2020-05-05T00:00:00.000000Z |
| 12 | Cool Cars | 2369.52 | 0.0 | 2020-07-08T00:00:00.000000Z |
| 119 | Mark Cho | 314.28 | 314.28 | 2020-07-20T00:00:00.000000Z |
