# Compare - CM Vs PayPal

#### Specifications

**CM Vs PayPal**

Input files
1. Segregated CM file
2. PayPal raw file


Comparison
1. Compare 'Line Item Value' of CM with 'Gross' of PayPal
2. 'refund' records of CM are not matched against PayPal, so they carry no PayPal data in the result
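
The two comparison rules above can be sketched on toy data. This is a minimal illustration, not the production script: the column names are taken from the script below, while every value in the two frames is invented.

```python
import pandas as pd

# Toy stand-ins for the two inputs; all values are invented for illustration.
cm = pd.DataFrame({
    "Customer External ID": ["T1", "T2", "T3"],
    "Payment / Refund": ["payment", "payment", "refund"],
    "Line Item Value": [100.00, 49.99, -20.00],
})
paypal = pd.DataFrame({
    "Reference Txn ID": ["T1", "T2"],
    "Gross": [100.00, 45.00],
})

# Rule 2: refunds are set aside and never matched against PayPal.
is_refund = cm["Payment / Refund"] == "refund"
payments, refunds = cm[~is_refund], cm[is_refund]

# Rule 1: left-join payments to PayPal and compare the two amounts.
merged = payments.merge(
    paypal, left_on="Customer External ID", right_on="Reference Txn ID", how="left"
).drop(columns="Reference Txn ID")

# Refund rows are appended afterwards, so their 'Gross' and 'Difference' stay empty (NaN).
result = pd.concat([merged, refunds], ignore_index=True)
result["Difference"] = (result["Line Item Value"] - result["Gross"]).round(2)
print(result)
```

A non-zero 'Difference' flags a payment whose CM amount disagrees with PayPal; an empty 'Difference' means the row was a refund or had no PayPal match.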



### Script
---

#### Imports, preparations and functions

In [1]:
import pandas as pd
import os
!pip install XlsxWriter
import xlsxwriter


Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting XlsxWriter
  Downloading XlsxWriter-3.0.8-py3-none-any.whl (152 kB)
Installing collected packages: XlsxWriter
Successfully installed XlsxWriter-3.0.8


###### Preparations
---


In [4]:
#Mount GDrive
from google.colab import drive
drive.mount('/content/gdrive')

#Set file paths
#Segregated CM file
path = "/content/gdrive/My Drive/Data Science 2022/CM Audit/"

inpath = path + "Monthly Output/2023.02/"
filecm  = inpath + "CM_2023.02.xlsx"
#Paypal raw file
filepp = path  + "Paypal Data/2023/Paypal_Raw_Data_2023_02.xlsx"

outpath = path + "Monthly Output/2023.02/Comparison Results/"
outfile = outpath + "CM Vs PayPal_202302.xlsx"
#Create Results folder (and any missing parent folders) if missing
os.makedirs(outpath, exist_ok=True)



Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).
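
The month "2023.02" is hard-coded in several of the paths above, so each new run means editing every one of them. One possible way to derive them from a single year and month is sketched below. `build_paths` is a hypothetical helper, and the file-naming pattern is an assumption inferred from the single month shown in this notebook.

```python
from pathlib import Path

def build_paths(base: Path, year: int, month: int) -> dict:
    """Return the input/output paths for one reporting month (hypothetical helper)."""
    period = f"{year}.{month:02d}"  # e.g. "2023.02"
    month_dir = base / "Monthly Output" / period
    out_dir = month_dir / "Comparison Results"
    return {
        # Segregated CM file
        "filecm": month_dir / f"CM_{period}.xlsx",
        # PayPal raw file (naming pattern assumed from the February 2023 example)
        "filepp": base / "Paypal Data" / str(year) / f"Paypal_Raw_Data_{year}_{month:02d}.xlsx",
        "outpath": out_dir,
        "outfile": out_dir / f"CM Vs PayPal_{year}{month:02d}.xlsx",
    }

paths = build_paths(Path("/content/gdrive/My Drive/Data Science 2022/CM Audit"), 2023, 2)
for name, p in paths.items():
    print(name, "->", p)
```

`pd.read_excel` and `pd.ExcelWriter` accept `Path` objects, so the returned values could be passed to them directly.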


#### Compare - CM Vs PayPal


In [5]:
#Load CM file
dfcm0 = pd.read_excel(filecm,sheet_name="Paypal" , na_filter=False, index_col=False)

#Load PayPal raw file (first sheet)
df = pd.read_excel(filepp, na_filter=False, index_col=False)

#Change the data type of gross to numeric
df = df.astype({"Gross": str})
df['Gross'] = df['Gross'].str.replace(',','')
df = df.astype({"Gross": float})

#Remove 'refund' records from CM and append upon merging
dfcm = dfcm0[~(dfcm0["Payment / Refund"] == 'refund')]
dfrefund=dfcm0[dfcm0["Payment / Refund"] == 'refund']

dfmerge = pd.merge(dfcm, df, left_on="Customer External ID", right_on="Reference Txn ID", how='left',suffixes = (None,"_P"))

#Keep only CM data along with 'Gross' of Paypal
cmcols = dfcm.columns.tolist()
cmcols.append('Gross')  #Keep only the 'Gross' column of PayPal

mcols = dfmerge.columns.tolist()
newcols = [x for x  in mcols if x in cmcols  ]
#Select the columns and delete duplicate rows (chained, so no SettingWithCopyWarning is raised)
dfnewpp = dfmerge[newcols].drop_duplicates()

#Append refund records that were removed previously
dfnewpp = pd.concat([dfnewpp,dfrefund])

#Rearrange columns so that 'Line Item Value' and 'Gross' appear last.
new_cols1 = [col for col in dfnewpp.columns if col != 'Line Item Value'] + ['Line Item Value']
dfnewpp = dfnewpp[new_cols1]
new_cols = [col for col in dfnewpp.columns if col != 'Gross'] + ['Gross']
dfnewpp = dfnewpp[new_cols]

#Add difference column
dfnewpp['Gross'] = dfnewpp['Gross'].round(decimals = 2)
dfnewpp['Line Item Value'] = dfnewpp['Line Item Value'].round(decimals = 2)

diff = dfnewpp['Line Item Value'] - dfnewpp['Gross'] 
dfnewpp['Difference'] = diff

dfnewpp.rename(columns={'Gross':'From PayPal data'}, inplace=True)

#Sort by plan type
dfnewpp = dfnewpp.sort_values('Plan Type',ascending=False)
#Write the result; the context manager saves and closes the file on exit
with pd.ExcelWriter(outfile, engine='xlsxwriter') as writer:
  dfnewpp.to_excel(writer,sheet_name="CM Vs PayPal", index=False)

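
The 'Gross' cleanup at the top of the comparison cell can be isolated into a small function. This is a minimal sketch: `to_amount` is a hypothetical helper, and unlike the `astype(float)` call in the cell it turns unparseable values into NaN instead of raising an error, which may or may not be the desired behaviour for an audit.

```python
import pandas as pd

def to_amount(series: pd.Series) -> pd.Series:
    """Convert amounts such as '1,234.50' to floats (hypothetical helper)."""
    # Remove thousands separators and surrounding whitespace
    cleaned = series.astype(str).str.replace(",", "", regex=False).str.strip()
    # errors="coerce" turns unparseable values (e.g. empty strings) into NaN
    return pd.to_numeric(cleaned, errors="coerce")

# Invented sample values: formatted string, plain string, number, empty cell
raw = pd.Series(["1,234.50", "99", -15.25, ""])
amounts = to_amount(raw)
print(amounts.tolist())
```

With this helper the conversion would read `df['Gross'] = to_amount(df['Gross'])`. Counting the NaN values afterwards shows how many PayPal rows had no usable amount.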
