## Rental Registration Spreadsheet Cleanup  
Several types of error make the raw spreadsheet unusable for joins with other data sets.  All of them come down to the lack of a suitable string for the join key 'acctid', which is a normalized form of what the state calls the 'district' and 'account identifier' fields.  

acctid = '100' + \<district\> + \<account identifier\>  
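
As a minimal sketch of that normalization, the helper below mirrors what the cleanup cell further down does: it strips spaces and hyphens from a 'Dist/Account No' value and prefixes '10'. The function name `make_acctid` is hypothetical, and the sketch assumes the district already arrives zero-padded to two digits (as in '07- 113935').

```python
def make_acctid(dist_account_no: str) -> str:
    """Build an acctid string from a raw 'Dist/Account No' value."""
    # Remove the spaces and hyphens that appear in the raw sheet, e.g. '07- 113935'.
    compact = dist_account_no.replace(' ', '').replace('-', '')
    # Prefix '10' so that a district of '07' yields the '1007...' form used by SDAT.
    return "10" + compact

print(make_acctid("07- 113935"))  # prints 1007113935
```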
 
The problems are typically that the necessary fields are missing or mis-entered, and sometimes duplicate rows cause confusion.  This notebook provides tools for cleaning up the rental registrations so there is a usable 'acctid' field.  

The methodology here is to accumulate corrections in a file that can be applied to correct the data.  That way we can accumulate fixes in this notebook, and have a file we can use to record fixes that we have to resolve manually (looking in SDAT, checking a map, etc).
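
One way such a corrections file could be laid out and applied is sketched below. The three-column layout (`License Id`, `column`, `value`), the helper name `apply_fixes`, and the sample rows are assumptions made for illustration; they are not the format this notebook actually uses.

```python
import io
import pandas as pd

# Hypothetical corrections file: one row per manual fix, keyed by license.
corrections_csv = io.StringIO(
    "License Id,column,value\n"
    "L-2,Dist/Account No,07-000002\n"
    "L-3,Property Location,3 MAIN ST\n"
)
fixes = pd.read_csv(corrections_csv, dtype=str)

# Toy registrations with one missing account number and one mis-entered address.
registrations = pd.DataFrame({
    "License Id": ["L-1", "L-2", "L-3"],
    "Property Location": ["1 MAIN ST", "2 MAIN ST", "3 MIAN ST"],
    "Dist/Account No": ["07-000001", None, "07-000003"],
})

def apply_fixes(df, fixes):
    """Return a copy of df with every recorded correction applied."""
    out = df.copy()
    for license_id, column, value in fixes.itertuples(index=False, name=None):
        out.loc[out["License Id"] == license_id, column] = value
    return out

fixed = apply_fixes(registrations, fixes)
print(fixed)
```

Keeping the fixes in a file rather than in code means a manual lookup only has to be done once, and the same file can be replayed against the next export.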

There is also code at the bottom of the sheet for clustering properties by owner information pulled from SDAT.

In [1]:
!pip install simpledbf

Collecting simpledbf
  Downloading simpledbf-0.2.6.tar.gz (17 kB)
Building wheels for collected packages: simpledbf
  Building wheel for simpledbf (setup.py) ... done
  Created wheel for simpledbf: filename=simpledbf-0.2.6-py3-none-any.whl size=13801 sha256=c3e8e704280aeb57bef053539abc1b98803b00ce662555a9aa7eacff4bcea787
  Stored in directory: /root/.cache/pip/wheels/24/43/f4/39ad84349e5358346be977fe626160f5625fdd3ea8e017518c
Successfully built simpledbf
Installing collected packages: simpledbf
Successfully installed simpledbf-0.2.6


In [2]:
import re
import pandas as pd
from simpledbf import Dbf5
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


#### Load the rental billing spreadsheet to clean up

In [9]:
#df = pd.read_csv('/content/drive/My Drive/pita 2021/rental billing 2019.csv')
df = pd.read_csv('/content/drive/My Drive/pita 2021/rental billing 20210610.csv')

corrections = [
  ['AVENUE','AVE'],
  ['TERRACE','TER'],
  ['AVE DOWN','AVE'],
  ['RD','ROAD'],
  ['#',':'],
  ['BAYLY ROAD','BAYLY AVE'],
  ['HAYWAROAD','HAYWARD'],
  ['HUBBAROAD','HUBBARD'],
  ['LEONAROAD','LEONARD'],
  ['LEONARD LANE','LEONARDS LANE'],
  ['702-A PINE','702 PINE'],
  ['5242 GALLIUM CT','5240 GALLIUM'],
  ['545 POPLAR ST APTS','543 POPLAR ST'],
  ['203 ROBBINS ST APT A','203 ROBBINS ST'],
  ['210 VIRGINA AVE','210 VIRGINIA AVE'],
  ['404 RUDDY DUCK DR','404 RUDDY DUCK CT']
]

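def substitute(err_string,correction,target):
  # err_string is treated as a regular expression, not a literal string
  return re.sub(err_string,correction,target)

def cleanup_address(a):
  # trim the ends and drop periods (e.g. 'ST.' -> 'ST')
  b = a.strip().replace('.','')
  # b = b.replace('-','')
  # b = b.replace(r' +',' ')

  # apply the street-word and one-off address corrections in order
  for err_string, correction in corrections:
    b = substitute(err_string,correction,b)

  # drop a trailing token that starts with a digit,
  # e.g. '416 BOUNDARY AVE 128' -> '416 BOUNDARY AVE'
  pieces = b.split()
  if pieces and pieces[-1][0].isnumeric():
    return " ".join(pieces[:-1])
  return b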

# strip off rows with comments, etc
print('Input column headers')
print(df.columns)
print('raw rows:',len(df))

# clean up the column headers
df.rename(columns={'Dist/Account No    ':'Dist/Account No','RENTAL  ':'RENTAL'},inplace=True)
df = df[df['RENTAL'].notna()]
unnnamed = [x for x in df.columns if 'Unnamed' in x]
df.drop(columns=unnnamed, inplace=True)

print('useful rows:',len(df))
print('Updated column headers')
print(df.columns)

# fix addresses and add acctid info that had to be fixed manually to match SDAT
df['Property Location']=df['Property Location'].apply(lambda x: cleanup_address(x))

df.loc[df['Property Location'].fillna("").str.contains('606 WATER ST - UNIT 3'), 'Property Location'] = '606 WATER ST UNIT: 3'
df.loc[df['Property Location'].fillna("").str.contains('700 CATTAIL COVE UNIT:310'), 'Dist/Account No'] = '07-214006'
df.loc[df['Property Location'].fillna("").str.contains('801 TRUMAN ST'), 'Dist/Account No'] = '07-172109'
df.loc[df['Property Location'].fillna("").str.contains('312 WEST END AVE'), 'Dist/Account No'] = '07-141122'
df.loc[df['Property Location'].fillna("").str.contains('1101 GLOVER AVE'), 'Dist/Account No'] = '07-145314'
df.loc[df['Property Location'].fillna("").str.contains('705-707 RIGBY AVE'), 'Dist/Account No'] = '07-111673'
df.loc[df['Property Location'].fillna("").str.contains('1110 LOCUST ST'), 'Dist/Account No'] = '07-143745'
df.loc[df['Property Location'].fillna("").str.contains('GALLIUM'), 'Dist/Account No'] = '07-286609'

df.loc[df['Property Location'].fillna("").str.contains('1014 MILES AVE'), 'Dist/Account No'] = '07-199740'
df.loc[df['Property Location'].fillna("").str.contains('504 RACE ST'), 'Dist/Account No'] = '07-145632'
df.loc[df['Property Location'].fillna("").str.contains('933 PINE ST'), 'Dist/Account No'] = '07-111967'
df.loc[df['Property Location'].fillna("").str.contains('1013 WASHINGTON ST'), 'Dist/Account No'] = '07-144431'
df.loc[df['Business Name'].fillna("").str.contains('EAST COAST CAPITAL INVEST LLC'), 'Dist/Account No'] = '07-213743'

# licenses assigned to wrong acctid?
df.loc[df['License Id'].fillna("").str.contains('20-00414'), 'Dist/Account No'] = '07-106432'
df.loc[df['License Id'].fillna("").str.contains('20-01495'), 'Dist/Account No'] = '07-193475'
df.loc[df['License Id'].fillna("").str.contains('20-01471'), 'Dist/Account No'] = '07-215223'

# duplicate licenses?
df.loc[df['License Id'].fillna("").str.contains('20-01275'), 'Property Location'] = '711 DOUGLAS ST'
df = df[df['License Id'] != "20-01399"]
df = df[df['License Id'] != "20-01247"]

# fix up account numbers to make apn-format tax acctid column
df['Dist/Account No'] = df['Dist/Account No'].fillna('-1')
df['Dist/Account No'] = df.apply(lambda x: x['Dist/Account No'].replace(' ',''),axis=1)
df['acctid'] = df.apply(lambda x: "10{}".format(x['Dist/Account No'].replace('-','')), axis = 1)


Input column headers
Index(['License Id', 'License Type Id', 'Business Name', 'Customer Id',
       'Issue Date', 'Effective Date', 'Expiration Date', 'State Id', 'Status',
       'Property Location', 'Dist/Account No    ', 'Control Num', 'Phone',
       'Phone Ext', 'Contact', 'Vin Id', 'Inspected By', 'Inspection Date',
       'Seasonal', 'Insurance Co', 'Policy No', 'Insurance Exp Date',
       'RENTAL  ', 'RR      ', 'Unnamed: 24', 'Unnamed: 25'],
      dtype='object')
raw rows: 1463
useful rows: 1463
Updated column headers
Index(['License Id', 'License Type Id', 'Business Name', 'Customer Id',
       'Issue Date', 'Effective Date', 'Expiration Date', 'State Id', 'Status',
       'Property Location', 'Dist/Account No', 'Control Num', 'Phone',
       'Phone Ext', 'Contact', 'Vin Id', 'Inspected By', 'Inspection Date',
       'Seasonal', 'Insurance Co', 'Policy No', 'Insurance Exp Date', 'RENTAL',
       'RR      '],
      dtype='object')
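
The `corrections` list in the cell above is applied with `re.sub`, so a short pattern such as 'RD' also matches inside longer words, which is why entries like 'HAYWAROAD' -> 'HAYWARD' are needed afterwards. A possible alternative, sketched here with a hypothetical `word_corrections` list and helper name, is to anchor each pattern on word boundaries. This is not what the notebook does; it only illustrates the difference.

```python
import re

# Same idea as the `corrections` list, but each pattern only matches whole words.
word_corrections = [
    (r"\bAVENUE\b", "AVE"),
    (r"\bTERRACE\b", "TER"),
    (r"\bRD\b", "ROAD"),
]

def normalize_street_words(address: str) -> str:
    """Apply each whole-word correction in order."""
    for pattern, replacement in word_corrections:
        address = re.sub(pattern, replacement, address)
    return address

anchored = normalize_street_words("12 HAYWARD RD")
unanchored = re.sub("RD", "ROAD", "12 HAYWARD RD")
print(anchored)    # prints 12 HAYWARD ROAD
print(unanchored)  # prints 12 HAYWAROAD ROAD
```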


#### Load the latest SDAT information, and join it

In [10]:
#sdat = pd.read_csv('drive/My Drive/pita 2021/SDAT-CAN-ref-202105.csv')
sdat = pd.read_csv('drive/My Drive/pita 2021/SDAT-CAN-ref-202106.csv')
sdat.acctid = sdat.acctid.apply(lambda x: str(x).strip())
sdat = sdat.set_index('acctid')



In [11]:
sdat.query('ownname1.str.contains("ARCADE")',engine='python')

acctid,jurscode,digxcord,digycord,ct2010,bg2010,geogcode,ooi,resityp,address,strtnum,strtdir,strtnam,strttyp,strtsfx,strtunt,addrtyp,city,zipcode,ownname1,ownname2,namekey,ownadd1,ownadd2,owncity,ownstate,ownerzip,ownzip2,premsnum,premsdir,premsnam,premstyp,premcity,premzip,premzip2,legal1,legal2,legal3,dr1clerk,dr1liber,dr1folio,...,crtarcod,fcmacode,agfndarea,agfndluom,entzndat,entznassm,plndevdat,nprctstdat,nprcarea,nprcluom,homqlcod,homqldat,bldg_story,bldg_units,resident,resi2010,resi2000,resi1990,resiuths,aprtment,trailer,special,other,ptype,sdatwebadr,existing,mdpvdate,sdat,google_maps,struct_sqft,assessed_value,address_number,address_unit_id,street_direction,street_name,street_type,premise_address_type_mdp_field_premstyp_sdat_field_24,premise_address_city_mdp_field_premcity_sdat_field_25,premise_address_zip_code_mdp_field_premzip_sdat_field_26,mdp_street_address_mdp_field_address
1007145632,DORC,480423.3,100458.6,24019970000.0,240199700000.0,81,N,AP,504 RACE ST,504.0,,RACE,ST,,,P,CAMBRIDGE,21613.0,ARCADE LLC,,ARCADE LLC,PO BOX 1118,,STEVENSVILLE,MD,21666.0,1118.0,504.0,,RACE,ST,CAMBRIDGE,21613.0,,"IMPSLOT 38,375 SQ.FT.",CORNER RACE AND MUIR ST.,CAMBRIDGE,MLB,581.0,446.0,...,,,0.0,,,0.0,,,0.0,,,,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,http://sdat.dat.maryland.gov/RealProperty/Page...,MDPV2017_18,2020JUN,http://sdat.dat.maryland.gov/RealProperty/Page...,https://maps.google.com/maps?t=h&q=38.56800502...,35087.0,811600.0,1910.0,504.0,,,RACE,ST,CAMBRIDGE,21613.0,504 RACE ST


In [12]:
sdat_merge_df = df.merge(sdat,left_on='acctid',right_on='acctid',how='outer',indicator=True)
cleaned_registrations_df = sdat_merge_df.query('_merge == "both"')[list(df.columns)+['address']]
print('found:',len(cleaned_registrations_df))
print('problem records, in the rental sheet but not in sdat?:',len(sdat_merge_df.query('_merge == "left_only"')))
cleaned_registrations_df

found: 1396
problem records, in the rental sheet but not in sdat?: 65


Unnamed: 0,License Id,License Type Id,Business Name,Customer Id,Issue Date,Effective Date,Expiration Date,State Id,Status,Property Location,Dist/Account No,Control Num,Phone,Phone Ext,Contact,Vin Id,Inspected By,Inspection Date,Seasonal,Insurance Co,Policy No,Insurance Exp Date,RENTAL,RR,acctid,address
0,20-00001,RENTAL,OTTER LLC,RR-07388,,7/1/2020,6/30/2021,,Approved,416 BOUNDARY AVE,07-113935,,(410)841-6835,,,,,,N,,,,1.0,0.0,1007113935,416 BOUNDARY AVE
1,20-00002,RENTAL,DAGOSTINO COREY,RR-07981,7/31/2020,7/1/2020,6/30/2021,,Approved,704 CHURCH ST,07-148038,,(202)258-9377,,,,,,N,,,,2.0,0.0,1007148038,704 CHURCH ST
2,20-00003,RENTAL,JAMES INVESTMENTS LLC,RR-04889,,7/1/2020,6/30/2021,,Approved,809 PHILLIPS ST,07-130538,,(410)228-0810,,,,,,N,,,,1.0,0.0,1007130538,809 PHILLIPS ST
3,20-00004,RENTAL,OTTER LLC,RR-00375,,7/1/2020,6/30/2021,,Approved,715 PEACHBLOSSOM AVE,07-126905,,(443)521-5298,,,,,,N,,,,1.0,0.0,1007126905,715 PEACHBLOSSOM AVE
4,20-00005,RENTAL,RUDDY DUCK LLC,RR-00380,,7/1/2020,6/30/2021,,Approved,727 PEACHBLOSSOM AVE,07-104294,,,,,,,,N,,,,1.0,0.0,1007104294,727 PEACHBLOSSOM AVE
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1456,20-01579,RENTAL,MORRIS FAMILY INVESTMENTS LLC,RR-04159,,7/1/2020,6/30/2021,,Approved,907 PHILLIPS ST,07-145179,,,,,,,,N,,,,1.0,0.0,1007145179,907 PHILLIPS ST
1457,20-01583,RENTAL,DORCHESTER ELKS LODGE 223,RR-04428,,7/1/2020,6/30/2021,,Approved,622 PINE ST,07-107927,,,,,,,,N,,,,1.0,0.0,1007107927,622 PINE ST
1458,20-01584,RENTAL,HERNANDEZ GUZMAN DIEGO,RR-00593,,7/1/2020,6/30/2021,,Approved,411 MARYLAND AVE,07-146914,,,,,,,,N,,,,1.0,0.0,1007146914,411 MARYLAND AVE
1459,20-01585,RENTAL,WELCH PATRICK,RR-08637,5/7/2021,7/1/2020,6/30/2021,,Approved,606 WATER ST UNIT: 3,07-191790,,(772)221-7928,,,,,,N,,,,1.0,0.0,1007191790,606 WATER ST


### Fix up any problem records. 
The next few cells are ways to check for errors.  
The first method reuses prior work: it tries to match the unmatched rows against records picked up the last time this process was run.
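
The checks in this section all lean on the same pattern: merge with `indicator=True`, then split the result on the `_merge` column. A minimal sketch with toy data follows; the frame names and acctid values are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for the rental sheet and the SDAT reference table.
rentals = pd.DataFrame({
    "License Id": ["L-1", "L-2", "L-3"],
    "acctid": ["1007000001", "1007000002", "101"],  # '101' is what a missing account number becomes
})
sdat_ref = pd.DataFrame({
    "acctid": ["1007000001", "1007000002", "1007000003"],
    "address": ["1 MAIN ST", "2 MAIN ST", "3 MAIN ST"],
})

# indicator=True adds a `_merge` column with 'both', 'left_only' or 'right_only'.
merged = rentals.merge(sdat_ref, on="acctid", how="outer", indicator=True)

matched = merged[merged["_merge"] == "both"]         # usable registrations
unmatched = merged[merged["_merge"] == "left_only"]  # in the rental sheet but not in SDAT
print(len(matched), "matched;", len(unmatched), "still to fix")
```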

In [13]:
# start with the left join of the prior result, that's the records that didn't match
problems_df = sdat_merge_df.query('_merge == "left_only"')

# grab the acctid from a prior year where you can, and try to merge with sdat using that key for some of the bad rows
history_df = pd.read_csv('drive/My Drive/pita 2021/cambridge-combined-old-new-rental-lists-17-18.csv').rename(columns={'ACCTID':'acctid'})
history_df = history_df[history_df['acctid'].notna()]
history_df.acctid = history_df.acctid.apply(lambda x: str(x).strip())
fixups_df = problems_df.drop(columns=['_merge']).merge(history_df,on='acctid',how='outer',indicator=True).drop_duplicates()

print("these can be fixed leveraging prior results:", len(fixups_df[(fixups_df['_merge'] == "both")]))
cleaned_registrations_df = pd.concat([cleaned_registrations_df, fixups_df[(fixups_df['_merge'] == "both")][list(df.columns)+['address']]])
print(len(cleaned_registrations_df),"of",len(df),"rows cleaned")
print("these still need more work:",len(fixups_df[(fixups_df['_merge'] == "left_only")] ))

these can be fixed leveraging prior results: 6
1402 of 1461 rows cleaned
these still need more work: 59


#### One thing to do is to try a join on address...  
but first you have to clean them up a bit
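
An address join can return more than one SDAT row for a single registration when the same street address is attached to several accounts, as the two '710 WASHINGTON ST' rows further down suggest. The sketch below uses toy data and hypothetical frame names to show one way to separate the unambiguous matches from those that need a manual look.

```python
import pandas as pd

# Toy SDAT extract in which one address belongs to two accounts.
sdat_ref = pd.DataFrame({
    "acctid": ["1007000001", "1007000002", "1007000003"],
    "address": ["710 WASHINGTON ST", "710 WASHINGTON ST", "543 POPLAR ST"],
})
unmatched = pd.DataFrame({
    "License Id": ["L-1", "L-2"],
    "Property Location": ["710 WASHINGTON ST", "543 POPLAR ST"],
})

# Addresses that map to more than one acctid cannot be resolved by address alone.
accts_per_address = sdat_ref.groupby("address")["acctid"].nunique()
ambiguous = set(accts_per_address[accts_per_address > 1].index)

joined = unmatched.merge(sdat_ref, left_on="Property Location",
                         right_on="address", how="left")
safe = joined[~joined["address"].isin(ambiguous)]
needs_review = joined[joined["address"].isin(ambiguous)]
print(len(safe), "safe;", len(needs_review), "rows to review")
```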

In [14]:
corrected_address_df = fixups_df[(fixups_df['_merge'] == "left_only")].drop(columns='_merge')
corrected_address_df['Property Location'] = corrected_address_df.apply(lambda x: cleanup_address( x['Property Location'] ), axis=1)

In [15]:
# these are found
corrected_address_join = corrected_address_df.drop(columns=['acctid','address']).merge(sdat.reset_index()[['acctid','address']],
                              left_on='Property Location',right_on='address',
                              how='outer',indicator=True)
found_by_join_on_address = corrected_address_join.query('_merge == "both"')[df.columns.to_list()+['address']]

cleaned_registrations_df = pd.concat([cleaned_registrations_df, found_by_join_on_address[list(df.columns)+['address']]])
print(len(cleaned_registrations_df),"of",len(df),"rows cleaned")
print("these still need more work:",len(corrected_address_join.query('_merge == "left_only"')[df.columns.to_list()+['address']]))

1462 of 1461 rows cleaned
these still need more work: 0


In [16]:
corrected_address_join.query('_merge == "left_only"')[df.columns.to_list()+['address']]

Unnamed: 0,License Id,License Type Id,Business Name,Customer Id,Issue Date,Effective Date,Expiration Date,State Id,Status,Property Location,Dist/Account No,Control Num,Phone,Phone Ext,Contact,Vin Id,Inspected By,Inspection Date,Seasonal,Insurance Co,Policy No,Insurance Exp Date,RENTAL,RR,acctid,address


### Now check where more than one license is assigned to an acctid
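
A sketch of this check on toy data, assuming only the 'License Id' and 'acctid' columns: counting distinct licenses per acctid finds accounts carrying several licenses, and the reverse count finds a license that ended up attached to several accounts. The frame and variable names here are hypothetical.

```python
import pandas as pd

cleaned = pd.DataFrame({
    "License Id": ["L-1", "L-2", "L-3", "L-3"],
    "acctid": ["1007000001", "1007000001", "1007000002", "1007000003"],
})

# Accounts that carry more than one distinct license.
licenses_per_acct = cleaned.groupby("acctid")["License Id"].nunique()
shared_accts = licenses_per_acct[licenses_per_acct > 1].index.tolist()

# Licenses that were matched to more than one account.
accts_per_license = cleaned.groupby("License Id")["acctid"].nunique()
split_licenses = accts_per_license[accts_per_license > 1].index.tolist()

print("accounts with several licenses:", shared_accts)
print("licenses with several accounts:", split_licenses)
```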

In [17]:
cleaned_registrations_df#[cleaned_registrations_df.duplicated(keep=False)]
# len(cleaned_registrations_df.drop_duplicates())
# cleaned_registrations_df[cleaned_registrations_df.duplicated(subset=['acctid','Customer Id','address'],keep=False)]
# cleaned_registrations_df[cleaned_registrations_df.duplicated(subset=['acctid','address'],keep=False)]


Unnamed: 0,License Id,License Type Id,Business Name,Customer Id,Issue Date,Effective Date,Expiration Date,State Id,Status,Property Location,Dist/Account No,Control Num,Phone,Phone Ext,Contact,Vin Id,Inspected By,Inspection Date,Seasonal,Insurance Co,Policy No,Insurance Exp Date,RENTAL,RR,acctid,address
0,20-00001,RENTAL,OTTER LLC,RR-07388,,7/1/2020,6/30/2021,,Approved,416 BOUNDARY AVE,07-113935,,(410)841-6835,,,,,,N,,,,1.0,0.0,1007113935,416 BOUNDARY AVE
1,20-00002,RENTAL,DAGOSTINO COREY,RR-07981,7/31/2020,7/1/2020,6/30/2021,,Approved,704 CHURCH ST,07-148038,,(202)258-9377,,,,,,N,,,,2.0,0.0,1007148038,704 CHURCH ST
2,20-00003,RENTAL,JAMES INVESTMENTS LLC,RR-04889,,7/1/2020,6/30/2021,,Approved,809 PHILLIPS ST,07-130538,,(410)228-0810,,,,,,N,,,,1.0,0.0,1007130538,809 PHILLIPS ST
3,20-00004,RENTAL,OTTER LLC,RR-00375,,7/1/2020,6/30/2021,,Approved,715 PEACHBLOSSOM AVE,07-126905,,(443)521-5298,,,,,,N,,,,1.0,0.0,1007126905,715 PEACHBLOSSOM AVE
4,20-00005,RENTAL,RUDDY DUCK LLC,RR-00380,,7/1/2020,6/30/2021,,Approved,727 PEACHBLOSSOM AVE,07-104294,,,,,,,,N,,,,1.0,0.0,1007104294,727 PEACHBLOSSOM AVE
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
55,20-01580,RENTAL,EDWARD GRIFFITH,RR-01822,,7/1/2020,6/30/2021,,Approved,415 TALBOT AVE,,,(410)221-0938,,,,,,N,,,,1.0,0.0,1007151055,415 TALBOT AVE
56,20-01581,RENTAL,QUALITY HOUSING OF CAMBRIDGE,RR-04130,,7/1/2020,6/30/2021,,Approved,710 WASHINGTON ST,,,,,,,,,N,,,,1.0,0.0,1007148771,710 WASHINGTON ST
57,20-01581,RENTAL,QUALITY HOUSING OF CAMBRIDGE,RR-04130,,7/1/2020,6/30/2021,,Approved,710 WASHINGTON ST,,,,,,,,,N,,,,1.0,0.0,1007151985,710 WASHINGTON ST
58,20-01582,RENTAL,HOCKADAY ROBERT,RR-09596,,7/1/2020,6/30/2021,,Approved,543 POPLAR ST,,,,,,,,,N,,,,4.0,0.0,1007112114,543 POPLAR ST


## WRITE OUT THE CLEANED RENTAL BILLING  
Any records still unmatched above will need to be added by hand.

In [18]:
cleaned_registrations_df.reset_index(drop=True).drop_duplicates().to_csv('/content/drive/My Drive/pita 2021/cleaned_rental_billing-2021.csv')

In [None]:
# reload the raw sheet so the original values can be compared with the cleaned ones
diffs = pd.read_csv('/content/drive/My Drive/pita 2021/rental billing 20210610.csv')
print("raw",len(diffs))

# clean up the column headers and drop rows without a RENTAL value
diffs.rename(columns={'Dist/Account No    ':'Dist/Account No','RENTAL  ':'RENTAL'},inplace=True)
diffs = diffs[diffs['RENTAL'].notna()]
unnamed = [x for x in diffs.columns if 'Unnamed' in x]
diffs = diffs.drop(columns=unnamed)
print('useful rows:',len(diffs))

# record the acctid and address as originally given, then attach the cleaned values by license
diffs['acctid_given'] = diffs['Dist/Account No'].apply(lambda x: "10"+re.sub(r' +','',x).replace('-',''))
diffs['address_given'] = diffs['Property Location'].apply(cleanup_address)
diffs_merged = diffs.merge(cleaned_registrations_df[['License Id','acctid','address']],on='License Id',how='outer',indicator=True)
print("cleaned",len(cleaned_registrations_df))

raw 1463
useful rows: 1463
cleaned 1462


## ADD Fixups  
This section adds back things accumulated in a separate csv file as fixups.

As more items in the list are resolved, the fixup list will shrink at each update.  The method here, of merging historical data with the cleaned spreadsheet, will resolve most issues in the future.
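
The report cell below keeps a cleaned value only where it differs from the value originally given, and blanks it otherwise. A vectorized sketch of that comparison, on toy data with the same column names, could look like this; `Series.where` keeps entries where the condition holds and substitutes the second argument elsewhere.

```python
import pandas as pd

report = pd.DataFrame({
    "License Id": ["L-1", "L-2"],
    "acctid_given": ["1007000001", "101"],
    "acctid": ["1007000001", "1007000002"],
    "address_given": ["1 MAIN ST", "2 MAIN ST APT A"],
    "address": ["1 MAIN ST", "2 MAIN ST"],
})

changelog = report.copy()
# Keep the cleaned value only where it changed; otherwise leave the cell blank.
changelog["acctid"] = report["acctid"].where(report["acctid"] != report["acctid_given"], "")
changelog["address"] = report["address"].where(report["address"] != report["address_given"], "")
print(changelog[["License Id", "acctid", "address"]])
```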

In [None]:
# for x in zip(sorted(list(diffs_merged.query('_merge == "right_only"')['address'].astype(str))),sorted(list(diffs_merged.query('_merge == "left_only"')['Property Location']))):
#   print(x[1],"-->",x[1])
# for x in zip(sorted(list(diffs_merged.query('_merge == "right_only"')['acctid'].astype(str))),sorted(list(diffs_merged.query('_merge == "left_only"')['Property Location']))):
#   print(x[1],"-->",x[0])

report_df = diffs_merged.query('_merge == "both"').copy()
report_df['acctid'] = report_df.apply(lambda x: x.acctid if x.acctid != x.acctid_given else "",axis=1)
report_df['address'] = report_df.apply(lambda x: x.address if x.address != x.address_given else "",axis=1)
report_df


Unnamed: 0,License Id,License Type Id,Business Name,Customer Id,Issue Date,Effective Date,Expiration Date,State Id,Status,Property Location,Dist/Account No,Control Num,Phone,Phone Ext,Contact,Vin Id,Inspected By,Inspection Date,Seasonal,Insurance Co,Policy No,Insurance Exp Date,RENTAL,RR,acctid_given,address_given,acctid,address,_merge
0,20-00001,RENTAL,OTTER LLC,RR-07388,,7/1/2020,6/30/2021,,Approved,416 BOUNDARY AVE 128,07- 113935,,(410)841-6835,,,,,,N,,,,1.0,0.0,1007113935,416 BOUNDARY AVE,,,both
1,20-00002,RENTAL,DAGOSTINO COREY,RR-07981,7/31/2020,7/1/2020,6/30/2021,,Approved,704 CHURCH ST 374,07- 148038,,(202)258-9377,,,,,,N,,,,2.0,0.0,1007148038,704 CHURCH ST,,,both
2,20-00003,RENTAL,JAMES INVESTMENTS LLC,RR-04889,,7/1/2020,6/30/2021,,Approved,809 PHILLIPS ST 1425,07- 130538,,(410)228-0810,,,,,,N,,,,1.0,0.0,1007130538,809 PHILLIPS ST,,,both
3,20-00004,RENTAL,OTTER LLC,RR-00375,,7/1/2020,6/30/2021,,Approved,715 PEACHBLOSSOM AVE 1370,07- 126905,,(443)521-5298,,,,,,N,,,,1.0,0.0,1007126905,715 PEACHBLOSSOM AVE,,,both
4,20-00005,RENTAL,RUDDY DUCK LLC,RR-00380,,7/1/2020,6/30/2021,,Approved,727 PEACHBLOSSOM AVE 1375,07- 104294,,,,,,,,N,,,,1.0,0.0,1007104294,727 PEACHBLOSSOM AVE,,,both
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1459,20-01583,RENTAL,DORCHESTER ELKS LODGE 223,RR-04428,,7/1/2020,6/30/2021,,Approved,622 PINE ST 2632,07- 107927,,,,,,,,N,,,,1.0,0.0,1007107927,622 PINE ST,,,both
1460,20-01584,RENTAL,HERNANDEZ GUZMAN DIEGO,RR-00593,,7/1/2020,6/30/2021,,Approved,411 MARYLAND AVE 1121,07- 146914,,,,,,,,N,,,,1.0,0.0,1007146914,411 MARYLAND AVE,,,both
1461,20-01585,RENTAL,WELCH PATRICK,RR-08637,5/7/2021,7/1/2020,6/30/2021,,Approved,606 WATER ST - UNIT 3 3210,07- 191790,,(772)221-7928,,,,,,N,,,,1.0,0.0,1007191790,606 WATER ST - UNIT 3,,606 WATER ST,both
1462,20-01586,RENTAL,AL MOORE,RR-02780,,7/1/2020,6/30/2021,,Approved,105 MILL ST 1163,07- 167407,,,,,,,,N,,,,1.0,0.0,1007167407,105 MILL ST,,,both


In [None]:
report_df.to_csv('/content/drive/My Drive/pita 2021/rental_billing-2020-changelog.csv')