## Reconcile API Usage with CSV Usage

Load the usage reported by the Azure API and reconcile it with the usage given in the billing CSV.

This notebook takes the results of two other notebooks and reconciles the differences in the usage they report.
One of the source notebooks, [Load Azure Usage CSV.ipynb](Load%20Azure%20Usage%20CSV.ipynb), uses the billing CSV export feature of the Azure admin portal.
The other, [Load Azure Daily Usage for Month to Match Invoice.ipynb](Load%20Azure%20Daily%20Usage%20for%20Month%20to%20Match%20Invoice.ipynb), uses an Azure API. Currently, this reconciliation reports some differences between the two
sources that are likely due to discounts or included usage credits for the affected resources.

In [1]:
import pickle
import pandas as pd
import numpy as np

In [2]:
# load the data processed by the related notebooks

# data from the CSV downloaded from the Azure portal
with open("df_daily_usage.p", "rb") as f:
    df_daily_usage_csv = pickle.load(f)

# data from the Azure billing API
with open("df_invoice_daily_usage.p", "rb") as f:
    df_invoice_daily_usage = pickle.load(f)

In [3]:
df_daily_usage_csv.columns

Index(['Meter Id', 'Usage Date', 'Meter Category', 'Meter Region',
       'Meter Name', 'Meter Sub-category', 'Instance Id', 'Unit',
       'Resource Location', 'Resource Group', 'Consumed Service',
       'Consumed Quantity'],
      dtype='object')

In [4]:
df_invoice_daily_usage.columns

Index(['meterId', 'usageStartTime', 'usageEndTime', 'meterCategory',
       'meterRegion', 'meterName', 'meterSubCategory', 'subscriptionId',
       'unit', 'quantity'],
      dtype='object')

In [5]:
# compare the two datasets, joining on usage date and meter ID;
# the CSV data is the left side and the API data is the right side
result = pd.merge(df_daily_usage_csv,
    df_invoice_daily_usage,
    left_on=['Usage Date', 'Meter Id'], right_on=['usageStartTime', 'meterId'],
    how='outer',
    indicator=True)

In [6]:
len(result)

474
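
To make the role of `indicator=True` concrete, here is a minimal sketch of the same outer merge on made-up data. The meter ids, dates, and quantities below are hypothetical and are not taken from the Azure exports; only the column names follow the two DataFrames above.

```python
import pandas as pd

# Hypothetical CSV-side data (left side of the merge)
csv_df = pd.DataFrame({
    "Usage Date": pd.to_datetime(["2018-06-20", "2018-06-21", "2018-06-22"]),
    "Meter Id": ["meter-a", "meter-a", "meter-b"],
    "Consumed Quantity": [1.0, 2.0, 3.0],
})

# Hypothetical API-side data (right side of the merge)
api_df = pd.DataFrame({
    "usageStartTime": pd.to_datetime(["2018-06-20", "2018-06-21", "2018-06-23"]),
    "meterId": ["meter-a", "meter-a", "meter-b"],
    "quantity": [1.0, 2.0, 4.0],
})

# Outer join keeps rows from both sides; '_merge' records where each row came from
toy = pd.merge(csv_df, api_df,
               left_on=["Usage Date", "Meter Id"],
               right_on=["usageStartTime", "meterId"],
               how="outer", indicator=True)

# Count rows per origin: 'both', 'left_only' (CSV only), 'right_only' (API only)
merge_counts = toy["_merge"].value_counts()
print(merge_counts)
```

If each source is expected to hold exactly one row per meter and day, passing `validate="one_to_one"` to `pd.merge` would raise an error on duplicate keys instead of silently multiplying rows. Whether that assumption holds for these exports is not confirmed here.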

In [7]:
result.dtypes

Meter Id                      object
Usage Date            datetime64[ns]
Meter Category              category
Meter Region                category
Meter Name                  category
Meter Sub-category          category
Instance Id                 category
Unit                        category
Resource Location           category
Resource Group              category
Consumed Service            category
Consumed Quantity            float64
meterId                       object
usageStartTime        datetime64[ns]
usageEndTime          datetime64[ns]
meterCategory               category
meterRegion                 category
meterName                   category
meterSubCategory            category
subscriptionId              category
unit                        category
quantity                     float64
_merge                      category
dtype: object

In [8]:
# add a relative difference column to compare the usage
# (a fraction of the API quantity, not multiplied by 100)
result['pct_diff'] = (result['Consumed Quantity'] - result['quantity']) / result['quantity']
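
The relative difference divides by `quantity`, so a zero quantity on the API side yields `NaN` or `inf`, and tiny floating-point noise (such as `-1.74e-16`) shows up as a nonzero value. One possible complement, sketched here on hypothetical numbers, is a tolerance-based match flag using `numpy.isclose`. The `matches` column and the tolerances are assumptions for illustration, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical quantities for illustration only
demo = pd.DataFrame({
    "Consumed Quantity": [0.000311, 96.0, 0.0, 5.0],
    "quantity":          [0.000311, 96.0, 0.0, 4.0],
})

# Vectorized relative difference; 0/0 evaluates to NaN in pandas
demo["pct_diff"] = (demo["Consumed Quantity"] - demo["quantity"]) / demo["quantity"]

# Tolerance-based comparison: True when |a - b| <= atol + rtol * |b|
demo["matches"] = np.isclose(demo["Consumed Quantity"], demo["quantity"],
                             rtol=1e-6, atol=1e-12)
print(demo)
```

The flag treats two zero quantities as a match, whereas the relative difference for that row is undefined.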

In [9]:
# The resources and dates match if '_merge' is 'both' for all rows. _merge would
# be 'left_only' or 'right_only' if a given resource-date tuple is only in the CSV or
# only in the API, respectively
result[['Usage Date', 'Meter Id', 'usageStartTime', 'meterId', 'Consumed Quantity', 'quantity', 'pct_diff', '_merge']]

Unnamed: 0,Usage Date,Meter Id,usageStartTime,meterId,Consumed Quantity,quantity,pct_diff,_merge
0,2018-06-20,0c4d13cb-7134-4be8-a6fa-a52fdcb87e4c,2018-06-20,0c4d13cb-7134-4be8-a6fa-a52fdcb87e4c,0.000311,0.000311,-1.743090e-16,both
1,2018-06-21,0c4d13cb-7134-4be8-a6fa-a52fdcb87e4c,2018-06-21,0c4d13cb-7134-4be8-a6fa-a52fdcb87e4c,0.001250,0.001250,0.000000e+00,both
2,2018-06-22,0c4d13cb-7134-4be8-a6fa-a52fdcb87e4c,2018-06-22,0c4d13cb-7134-4be8-a6fa-a52fdcb87e4c,0.001251,0.001251,0.000000e+00,both
3,2018-06-23,0c4d13cb-7134-4be8-a6fa-a52fdcb87e4c,2018-06-23,0c4d13cb-7134-4be8-a6fa-a52fdcb87e4c,0.001251,0.001251,0.000000e+00,both
4,2018-06-24,0c4d13cb-7134-4be8-a6fa-a52fdcb87e4c,2018-06-24,0c4d13cb-7134-4be8-a6fa-a52fdcb87e4c,0.001251,0.001251,0.000000e+00,both
5,2018-06-25,0c4d13cb-7134-4be8-a6fa-a52fdcb87e4c,2018-06-25,0c4d13cb-7134-4be8-a6fa-a52fdcb87e4c,0.001251,0.001251,0.000000e+00,both
6,2018-06-26,0c4d13cb-7134-4be8-a6fa-a52fdcb87e4c,2018-06-26,0c4d13cb-7134-4be8-a6fa-a52fdcb87e4c,0.001251,0.001251,0.000000e+00,both
7,2018-06-27,0c4d13cb-7134-4be8-a6fa-a52fdcb87e4c,2018-06-27,0c4d13cb-7134-4be8-a6fa-a52fdcb87e4c,0.001251,0.001251,0.000000e+00,both
8,2018-06-28,0c4d13cb-7134-4be8-a6fa-a52fdcb87e4c,2018-06-28,0c4d13cb-7134-4be8-a6fa-a52fdcb87e4c,0.001251,0.001251,0.000000e+00,both
9,2018-06-29,0c4d13cb-7134-4be8-a6fa-a52fdcb87e4c,2018-06-29,0c4d13cb-7134-4be8-a6fa-a52fdcb87e4c,0.001251,0.001251,0.000000e+00,both


In [10]:
# show rows where the CSV quantity exceeds the API quantity by a relative difference > 1e-6
# (rows where the CSV quantity is lower are not selected by this filter)
result.loc[result['pct_diff'] > 0.000001][['Usage Date', 'Meter Id', 'usageStartTime', 'meterId', 'Consumed Quantity', 'quantity', 'pct_diff', '_merge']]

Unnamed: 0,Usage Date,Meter Id,usageStartTime,meterId,Consumed Quantity,quantity,pct_diff,_merge
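
The filter above selects only positive differences, and rows present in just one source have a `NaN` difference that never passes a `>` comparison. A minimal sketch of a two-sided check, on hypothetical data (the meter ids, quantities, and the `tolerance` name are assumptions), might look like this:

```python
import numpy as np
import pandas as pd

# Hypothetical merged rows; meter-d has no API quantity (CSV-only row)
demo = pd.DataFrame({
    "Meter Id": ["meter-a", "meter-b", "meter-c", "meter-d"],
    "Consumed Quantity": [10.0, 9.0, 11.0, 7.0],
    "quantity": [10.0, 10.0, 10.0, np.nan],
})
demo["pct_diff"] = (demo["Consumed Quantity"] - demo["quantity"]) / demo["quantity"]

tolerance = 1e-6

# One-sided filter: only rows where the CSV quantity is higher
one_sided = demo[demo["pct_diff"] > tolerance]

# Two-sided filter: rows that differ in either direction
two_sided = demo[demo["pct_diff"].abs() > tolerance]

# Rows with no comparable quantity on one side
unmatched = demo[demo["pct_diff"].isna()]

print(two_sided)
print(unmatched)
```

An empty result from the one-sided filter therefore does not by itself show that the two sources agree.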


In [11]:
one_res = result.loc[result['meterId'] == '65d4ded2-41ae-43a8-bb68-3c200e1ba864'][['Usage Date', 'Meter Id', 'usageStartTime', 'meterId', 'Consumed Quantity', 'quantity', 'pct_diff']]
one_res

Unnamed: 0,Usage Date,Meter Id,usageStartTime,meterId,Consumed Quantity,quantity,pct_diff
182,2018-06-16,65d4ded2-41ae-43a8-bb68-3c200e1ba864,2018-06-16,65d4ded2-41ae-43a8-bb68-3c200e1ba864,76.0,76.0,0.0
183,2018-06-17,65d4ded2-41ae-43a8-bb68-3c200e1ba864,2018-06-17,65d4ded2-41ae-43a8-bb68-3c200e1ba864,432.0,432.0,0.0
184,2018-06-18,65d4ded2-41ae-43a8-bb68-3c200e1ba864,2018-06-18,65d4ded2-41ae-43a8-bb68-3c200e1ba864,432.0,432.0,0.0
185,2018-06-19,65d4ded2-41ae-43a8-bb68-3c200e1ba864,2018-06-19,65d4ded2-41ae-43a8-bb68-3c200e1ba864,334.0,334.0,0.0
186,2018-06-20,65d4ded2-41ae-43a8-bb68-3c200e1ba864,2018-06-20,65d4ded2-41ae-43a8-bb68-3c200e1ba864,96.0,96.0,0.0
187,2018-06-21,65d4ded2-41ae-43a8-bb68-3c200e1ba864,2018-06-21,65d4ded2-41ae-43a8-bb68-3c200e1ba864,96.0,96.0,0.0
188,2018-06-22,65d4ded2-41ae-43a8-bb68-3c200e1ba864,2018-06-22,65d4ded2-41ae-43a8-bb68-3c200e1ba864,96.0,96.0,0.0
189,2018-06-23,65d4ded2-41ae-43a8-bb68-3c200e1ba864,2018-06-23,65d4ded2-41ae-43a8-bb68-3c200e1ba864,96.0,96.0,0.0
190,2018-06-24,65d4ded2-41ae-43a8-bb68-3c200e1ba864,2018-06-24,65d4ded2-41ae-43a8-bb68-3c200e1ba864,96.0,96.0,0.0
191,2018-06-25,65d4ded2-41ae-43a8-bb68-3c200e1ba864,2018-06-25,65d4ded2-41ae-43a8-bb68-3c200e1ba864,92.0,92.0,0.0
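
Inspecting one meter at a time works for spot checks. As a possible complement, the sketch below totals both quantities per meter so that all meters can be compared at once. The data and the `diff` column are hypothetical.

```python
import pandas as pd

# Hypothetical merged rows for two meters
demo = pd.DataFrame({
    "Meter Id": ["meter-a", "meter-a", "meter-b", "meter-b"],
    "Consumed Quantity": [76.0, 432.0, 96.0, 92.0],
    "quantity": [76.0, 432.0, 96.0, 90.0],
})

# Sum the CSV and API quantities per meter
totals = demo.groupby("Meter Id")[["Consumed Quantity", "quantity"]].sum()

# Difference between the CSV total and the API total for each meter
totals["diff"] = totals["Consumed Quantity"] - totals["quantity"]
print(totals)
```

Note that `groupby` drops rows whose key is `NaN` by default, so on the real merged data rows that exist only on the API side (where `Meter Id` is missing) would need separate handling.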
