# Compare - CM Vs Stripe

#### Specifications

**CM Vs Stripe**

Input files
1. Cleaned Stripe file
2. Segregated CM file

Insights on Input data
1. The raw Stripe data has no duplicate ChargeID entries: each charge appears once, with its total in the 'gross' column.
2. The CM data can contain multiple transactions with the same ChargeID.
3. The same ChargeID sometimes appears in both the Monthly and the Yearly CM files, so combining them produces two records for that ChargeID with different Plan Types.
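
The three observations above can be checked with a few lines of pandas. This is a minimal sketch on hypothetical toy data, not the real input files; it assumes the column names `Charge ID`, `Plan Type`, `source_id` and `gross` that the script below uses.

```python
import pandas as pd

# Hypothetical toy data; column names follow the CM and Stripe files used in the script.
cm = pd.DataFrame({
    "Charge ID": ["ch_1", "ch_1", "ch_2", "ch_3", "ch_3"],
    "Plan Type": ["Monthly", "Monthly", "Yearly", "Monthly", "Yearly"],
    "Line Item Value Account Currency": [10.0, 5.0, 100.0, 20.0, 30.0],
})
stripe = pd.DataFrame({
    "source_id": ["ch_1", "ch_2", "ch_3"],
    "gross": [15.0, 100.0, 50.0],
})

# Insight 1: every Stripe source_id should occur exactly once.
stripe_has_duplicates = bool(stripe["source_id"].duplicated().any())

# Insight 2: Charge IDs that occur on more than one CM row.
rows_per_charge = cm["Charge ID"].value_counts()
repeated_ids = sorted(rows_per_charge[rows_per_charge > 1].index)

# Insight 3: Charge IDs that carry more than one Plan Type.
plan_types_per_charge = cm.groupby("Charge ID")["Plan Type"].nunique()
mixed_plan_ids = sorted(plan_types_per_charge[plan_types_per_charge > 1].index)

print(stripe_has_duplicates, repeated_ids, mixed_plan_ids)
```

Charge IDs listed in `mixed_plan_ids` are the ones for which taking the first Plan Type after grouping picks one of the two plan types arbitrarily.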

Comparison

Group the CM data by ChargeID, summing "Line Item Value Account Currency", then compare each total with Stripe's 'gross'.
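
As an illustration of this comparison, here is a minimal sketch on hypothetical toy data. The helper name `compare_cm_to_stripe` is made up for this example, and rounding the difference (rather than each amount first) is a simplifying assumption; the column names are the ones used in the script below.

```python
import pandas as pd

VALUE_COL = "Line Item Value Account Currency"

def compare_cm_to_stripe(cm: pd.DataFrame, stripe: pd.DataFrame) -> pd.DataFrame:
    """Sum CM line items per Charge ID and subtract Stripe's gross (hypothetical helper)."""
    # One row per Charge ID with the summed CM value.
    cm_totals = cm.groupby("Charge ID", as_index=False)[VALUE_COL].sum()
    # Left merge keeps every CM charge, even if Stripe has no matching row.
    merged = cm_totals.merge(
        stripe[["source_id", "gross"]],
        left_on="Charge ID", right_on="source_id", how="left",
    ).drop(columns="source_id")
    merged["Difference"] = (merged[VALUE_COL] - merged["gross"]).round(2)
    return merged

# Hypothetical toy data: ch_1 is split over two CM rows, ch_2 is off by 1.00.
cm = pd.DataFrame({
    "Charge ID": ["ch_1", "ch_1", "ch_2"],
    VALUE_COL: [10.0, 5.0, 99.0],
})
stripe = pd.DataFrame({"source_id": ["ch_1", "ch_2"], "gross": [15.0, 100.0]})

result = compare_cm_to_stripe(cm, stripe)
print(result)
```

A non-zero `Difference` marks a charge whose CM total and Stripe gross disagree.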


### Script
---

#### Imports, preparations and functions

In [1]:
import pandas as pd
import os
!pip install XlsxWriter
import xlsxwriter

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting XlsxWriter
  Downloading XlsxWriter-3.0.8-py3-none-any.whl (152 kB)
Installing collected packages: XlsxWriter
Successfully installed XlsxWriter-3.0.8


###### Preparations
---


In [2]:
#Mount GDrive
from google.colab import drive
drive.mount('/content/gdrive')

#Set file paths
path = "/content/gdrive/My Drive/Data Science 2022/CM Audit/"

inpath = path + "Monthly Output/2023.02/"
filestripe = inpath + "Stripe_2023.02.xlsx"
filecm  = inpath + "CM_2023.02.xlsx"

outpath = path + "Monthly Output/2023.02/Comparison Results/"
outfile = outpath + "CMVsStripe_202302.xlsx"
#Create Results folder (and any missing parent folders) if missing
os.makedirs(outpath, exist_ok=True)



Mounted at /content/gdrive


#### Compare - CM Vs Stripe


In [4]:
#Load CM file
dfcm0 = pd.read_excel(filecm,sheet_name="Stripe" , na_filter=False, index_col=False)

#Group CM data by 'Charge ID': sum up 'Line Item Value Account Currency', keep the first value of every other column
dfcm = dfcm0.groupby(['Charge ID'], as_index = False).agg({'Line Item Value Account Currency':'sum','Customer Name':'first','Charge Timestamp':'first','Invoice Timestamp':'first',
                                                           'Invoice ID':'first','Payment / Refund':'first','Line Item Name':'first','Line Item Type':'first',
                                                           'Customer External ID':'first','Plan Type':'first'})


#Load 'Cleaned Data' sheet of stripe file
df = pd.read_excel(filestripe,sheet_name="Cleaned Data", na_filter=False, index_col=False)


dfmerge = pd.merge(dfcm, df, left_on="Charge ID", right_on="source_id", how='left',suffixes = (None,"_S"))

#Keep only CM data along with 'gross' of stripe
cmcols = dfcm.columns.tolist()
cmcols.append('gross')  #Keep only 'gross' column of stripe

mcols = dfmerge.columns.tolist()
newcols = [x for x  in mcols if x in cmcols  ]
#Delete duplicate rows; .copy() makes dfnews an independent frame, which avoids SettingWithCopyWarning
dfnews = dfmerge[newcols].drop_duplicates().copy()


#Rearrange columns so that 'gross' and 'Line Item Value Account Currency' appear last.
new_cols1 = [col for col in dfnews.columns if col != 'Line Item Value Account Currency'] + ['Line Item Value Account Currency']
dfnews = dfnews[new_cols1]
new_cols = [col for col in dfnews.columns if col != 'gross'] + ['gross']
dfnews = dfnews[new_cols]

#Round both amounts to 2 decimals, then add difference column
dfnews['gross'] = dfnews['gross'].round(decimals = 2)
dfnews['Line Item Value Account Currency'] = dfnews['Line Item Value Account Currency'].round(decimals = 2)

diff = dfnews['Line Item Value Account Currency'] - dfnews['gross']
dfnews['Difference'] = diff

dfnews.rename(columns={'gross':'From Stripe data'}, inplace=True)

#Sort by plan type
dfnews = dfnews.sort_values('Plan Type',ascending=False)

#Write the result; the context manager saves and closes the file exactly once
with pd.ExcelWriter(outfile, engine='xlsxwriter') as writer:
  dfnews.to_excel(writer,sheet_name="CM Vs Stripe", index=False)
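
Because the merge is a left merge, a CM charge with no matching Stripe row ends up with an empty 'gross' and therefore an empty difference. The sketch below shows one possible way to list such charges with the `indicator` option of `pd.merge`. The toy frames are hypothetical and only reuse the names `dfcm`, `df`, `Charge ID`, `source_id` and `gross` from the cell above; this check is not part of the original script.

```python
import pandas as pd

# Hypothetical toy frames standing in for the grouped CM data and the Stripe sheet.
dfcm = pd.DataFrame({
    "Charge ID": ["ch_1", "ch_2", "ch_3"],
    "Line Item Value Account Currency": [15.0, 100.0, 42.0],
})
df = pd.DataFrame({"source_id": ["ch_1", "ch_2"], "gross": [15.0, 100.0]})

# indicator=True adds a '_merge' column; for a left merge it is 'both' or 'left_only'.
merged = pd.merge(dfcm, df, left_on="Charge ID", right_on="source_id",
                  how="left", indicator=True)

# CM charges without a Stripe counterpart; their 'gross' is NaN after the merge.
missing_in_stripe = merged.loc[merged["_merge"] == "left_only", "Charge ID"].tolist()
print(missing_in_stripe)
```

Listing these charges separately makes it easier to tell a true amount mismatch from a charge that is simply absent from the Stripe file.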
