## Reconcile API Usage with CSV Usage

Load the usage as reported by the API and reconcile with the usage as given on the billing CSV.

This notebook takes the results from two other notebooks and reconciles the differences in reported usage.
One of the source notebooks, [Load Azure Usage CSV.ipynb](Load%20Azure%20Usage%20CSV.ipynb), uses the billing CSV export feature of the Azure admin portal.
The other, [Load Azure Daily Usage for Month to Match Invoice.ipynb](Load%20Azure%20Daily%20Usage%20for%20Month%20to%20Match%20Invoice.ipynb), uses an Azure API. Currently, this reconciliation reports some differences between the two
sources that are likely due to discounts or included usage credits for the affected resources.

In [None]:
import pickle
import pandas as pd
import numpy as np

In [None]:
# load data processed from related notebooks

# data from CSV downloaded from Azure portal
with open("df_daily_usage.p", "rb") as f:
    df_daily_usage_csv = pickle.load(f)

# data from the Azure billing API
with open("df_invoice_daily_usage.p", "rb") as f:
    df_invoice_daily_usage = pickle.load(f)

In [None]:
df_daily_usage_csv.columns

In [None]:
df_invoice_daily_usage.columns

In [None]:
# compare the two datasets joining on resource and usage date
result = pd.merge(df_daily_usage_csv,
    df_invoice_daily_usage,
    left_on=['Usage Date','Meter Id'], right_on = ['usageStartTime','meterId'],
    how='outer', 
    indicator=True)

In [None]:
len(result)

In [None]:
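# Workaround: the merge returns more rows than expected. A merge emits one row
# per matching pair, so a key that repeats in either source multiplies rows.
# The cause in this data is unconfirmed, so duplicates are dropped here.
# Note that drop_duplicates keeps only the first row for each key combination.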
result = result.drop_duplicates(subset=['Meter Id', 'Usage Date', 'usageStartTime', 'meterId'])
len(result)
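
One possible explanation for the extra rows is that the `(date, meter)` key is not unique in one or both sources. The sketch below uses made-up toy data (the values and the `m1` meter are hypothetical, not taken from the real exports) to show how repeated keys multiply rows in a merge, how `duplicated` can locate them, and how summing per key first yields a one-to-one join. Whether summing is the right aggregation for the real data is an assumption that would need checking.

```python
import pandas as pd

# Hypothetical toy data: both sources list the same meter/date twice.
csv_df = pd.DataFrame({
    "Usage Date": ["2017-01-01", "2017-01-01", "2017-01-02"],
    "Meter Id": ["m1", "m1", "m1"],
    "Consumed Quantity": [1.0, 2.0, 3.0],
})
api_df = pd.DataFrame({
    "usageStartTime": ["2017-01-01", "2017-01-01", "2017-01-02"],
    "meterId": ["m1", "m1", "m1"],
    "quantity": [1.5, 1.5, 3.0],
})

left_keys = ["Usage Date", "Meter Id"]
right_keys = ["usageStartTime", "meterId"]

# Rows whose key occurs more than once within a single source
csv_dupes = csv_df[csv_df.duplicated(left_keys, keep=False)]
api_dupes = api_df[api_df.duplicated(right_keys, keep=False)]

# Many-to-many merge: 2 x 2 rows for 2017-01-01 plus 1 row for 2017-01-02
merged = pd.merge(csv_df, api_df, left_on=left_keys, right_on=right_keys,
                  how="outer", indicator=True)
print(len(merged))

# Aggregating per key first makes each key unique on both sides;
# validate="one_to_one" raises MergeError if that assumption is violated.
csv_agg = csv_df.groupby(left_keys, as_index=False)["Consumed Quantity"].sum()
api_agg = api_df.groupby(right_keys, as_index=False)["quantity"].sum()
merged_agg = pd.merge(csv_agg, api_agg, left_on=left_keys, right_on=right_keys,
                      how="outer", indicator=True, validate="one_to_one")
print(len(merged_agg))
```

Running the same `duplicated` check against `df_daily_usage_csv` and `df_invoice_daily_usage` would show whether this explanation applies here.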

In [None]:
result.dtypes

In [None]:
# add a relative difference column (fraction of the API quantity) to compare the usage
result['pct_diff'] = (result['Consumed Quantity'] - result['quantity']) / result['quantity']
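
If any API `quantity` is zero, the ratio becomes `inf` or `NaN`, which can silently drop out of later comparisons. A minimal sketch of one way to make that explicit, using hypothetical toy values rather than the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy quantities; the last two rows have a zero denominator.
toy = pd.DataFrame({
    "Consumed Quantity": [10.0, 5.0, 0.0, 2.0],
    "quantity": [10.0, 4.0, 0.0, 0.0],
})

diff = toy["Consumed Quantity"] - toy["quantity"]

# Replace zero denominators with NaN so the result is NaN rather than inf
toy["pct_diff"] = diff / toy["quantity"].replace(0, np.nan)

# Rows where no relative difference could be computed
undefined = toy[toy["pct_diff"].isna()]
print(toy)
```

Rows that end up in `undefined` would need to be compared on the absolute difference instead.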

In [None]:
# The resources and dates match if '_merge' is 'both' for every row. '_merge' is
# 'left_only' if a given resource-date pair appears only in the CSV, and
# 'right_only' if it appears only in the API data.
result[['Usage Date', 'Meter Id', 'usageStartTime', 'meterId', 'Consumed Quantity', 'quantity', 'pct_diff', '_merge']]
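
Scanning the full table for non-matching rows is tedious, so it can help to summarize the `_merge` indicator instead. The sketch below uses a hypothetical two-column toy merge (the `key`, `csv_qty` and `api_qty` names are illustrative only) to show the pattern.

```python
import pandas as pd

# Hypothetical toy frames: "a" exists only on the left, "d" only on the right.
left = pd.DataFrame({"key": ["a", "b", "c"], "csv_qty": [1.0, 2.0, 3.0]})
right = pd.DataFrame({"key": ["b", "c", "d"], "api_qty": [2.0, 3.0, 4.0]})

toy_merge = pd.merge(left, right, on="key", how="outer", indicator=True)

# Count rows per indicator value: 'both', 'left_only', 'right_only'
counts = toy_merge["_merge"].value_counts()
print(counts)

# Keep only the rows that are missing from one of the two sources
unmatched = toy_merge[toy_merge["_merge"] != "both"]
print(unmatched)
```

Applied to this notebook, `result['_merge'].value_counts()` would give the same summary for the CSV and API data.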

In [None]:
# show rows where the CSV and API quantities differ by more than a small tolerance, in either direction
result.loc[result['pct_diff'].abs() > 0.000001][['Usage Date', 'Meter Id', 'usageStartTime', 'meterId', 'Consumed Quantity', 'quantity', 'pct_diff', '_merge']]

In [None]:
# inspect a single meter across all usage dates
one_res = result.loc[result['meterId'] == '65d4ded2-41ae-43a8-bb68-3c200e1ba864'][['Usage Date', 'Meter Id', 'usageStartTime', 'meterId', 'Consumed Quantity', 'quantity', 'pct_diff']]
one_res