## Rental Registration Spreadsheet Cleanup  
There are several types of error that make the raw spreadsheet unusable for joins with other data sets.  All of them come down to problems producing a suitable string for the join key 'acctid', which is a normalized form of what the state calls the 'district' and 'account identifier' fields.  

acctid = '10' + \<district\> + \<account identifier\>, where the district is two digits (e.g. district '07' and account '143230' give '1007143230')  
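A minimal sketch of that key construction, following the cleanup code in this notebook (strip spaces and dashes from 'Dist/Account No', then prepend '10'). The ten-digit validity check is an assumption based on the SDAT keys shown in this notebook, and `make_acctid` / `looks_valid` are hypothetical helper names.

```python
import re

def make_acctid(dist_account):
    """Build a join key from a 'Dist/Account No' value such as '07-143230'.

    Mirrors the notebook's cleanup: drop spaces and dashes, then prepend '10'.
    """
    return '10' + dist_account.replace(' ', '').replace('-', '')

# Assumption: a well-formed key is '10' + 2-digit district + 6-digit account,
# i.e. ten digits, like the SDAT keys in this notebook (e.g. '1001000012').
ACCTID_PATTERN = re.compile(r'10\d{8}')

def looks_valid(acctid):
    """Flag keys that cannot possibly match SDAT, before attempting a join."""
    return ACCTID_PATTERN.fullmatch(acctid) is not None

print(make_acctid('07-143230'))               # 1007143230
print(looks_valid(make_acctid('07-143230')))  # True
print(looks_valid(make_acctid('-1')))         # False: a missing account becomes '101'
```

Checking the key's shape up front separates "missing or mangled in the spreadsheet" from "well-formed but not in SDAT", which need different fixes.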
 
The problems are typically that the necessary fields are missing or mis-entered, and sometimes there are duplicate rows that cause confusion.  This notebook provides tools for cleaning up the rental registrations so there is a usable 'acctid' field.  

The methodology here is to accumulate corrections in a file that can be applied to correct the data.  That way the automated fixes live in this notebook, and the file records the fixes that have to be resolved manually (looking in SDAT, checking a map, etc.).

There is also code at the bottom of the sheet for clustering properties by owner information pulled from SDAT.

In [49]:
!pip install simpledbf



In [4]:
import pandas as pd
from simpledbf import Dbf5
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth

In [5]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


#### Load the rental billing spreadsheet to clean up

In [6]:
df = pd.read_csv('/content/drive/My Drive/pita 2021/rental billing 2019.csv')

print('Input column headers')
print(df.columns)
print('raw rows:',len(df))

# clean up the column headers (the raw export pads some names with trailing spaces)
df.rename(columns={'Dist/Account No    ':'Dist/Account No','RENTAL  ':'RENTAL'},inplace=True)
# strip off rows with comments, etc. (they have no RENTAL value)
df = df[df['RENTAL'].notna()].copy()

# fix up account numbers to make an apn-format tax acctid column, e.g. '07-143230' -> '1007143230'
df['Dist/Account No'] = df['Dist/Account No'].fillna('-1').str.replace(' ', '', regex=False)
df['acctid'] = '10' + df['Dist/Account No'].str.replace('-', '', regex=False)

print('useful rows:',len(df))
print('Updated column headers')
print(df.columns)


Input column headers
Index(['License Id', 'License Type Id', 'Business Name', 'Customer Id',
       'Effective Date', 'Expiration Date', 'State Id', 'Status',
       'Property Location', 'Dist/Account No    ', 'RENTAL  ', 'Unnamed: 11',
       'Unnamed: 12'],
      dtype='object')
raw rows: 1556
useful rows: 1550
Updated column headers
Index(['License Id', 'License Type Id', 'Business Name', 'Customer Id',
       'Effective Date', 'Expiration Date', 'State Id', 'Status',
       'Property Location', 'Dist/Account No', 'RENTAL', 'Unnamed: 11',
       'Unnamed: 12', 'acctid'],
      dtype='object')


#### Load the latest SDAT information, and join it

In [21]:
sdat = pd.read_csv('drive/My Drive/pita 2021/SDAT-CAN-ref-202105.csv')
sdat.acctid = sdat.acctid.apply(lambda x: str(x).strip())
sdat = sdat.set_index('acctid')
sdat



acctid,jurscode,digxcord,digycord,ct2010,bg2010,geogcode,ooi,resityp,address,strtnum,strtdir,strtnam,strttyp,strtsfx,strtunt,addrtyp,city,zipcode,ownname1,ownname2,namekey,ownadd1,ownadd2,owncity,ownstate,ownerzip,ownzip2,premsnum,premsdir,premsnam,premstyp,premcity,premzip,premzip2,legal1,legal2,legal3,dr1clerk,dr1liber,dr1folio,...,crtarcod,fcmacode,agfndarea,agfndluom,entzndat,entznassm,plndevdat,nprctstdat,nprcarea,nprcluom,homqlcod,homqldat,bldg_story,bldg_units,resident,resi2010,resi2000,resi1990,resiuths,aprtment,trailer,special,other,ptype,sdatwebadr,existing,mdpvdate,sdat,google_maps,struct_sqft,assessed_value,address_number,address_unit_id,street_direction,street_name,street_type,premise_address_type_mdp_field_premstyp_sdat_field_24,premise_address_city_mdp_field_premcity_sdat_field_25,premise_address_zip_code_mdp_field_premzip_sdat_field_26,mdp_street_address_mdp_field_address
1001000012,DORC,508948.5,110654.0,2.401997e+10,2.401997e+11,80,N,SF,5727 ADAMS ROAD,5727.0,,ADAMS,RD,,,P,FEDERALSBURG,21632.0,NAGEL RICHARD LEE & CONNIE JANE,,NAGEL RICHARD LEE & CONNI,5714 ADAMS RD,,FEDERALSBURG,MD,21632.0,1700.0,5727.0,,ADAMS,RD,FEDERALSBURG,21632.0,,52.94 ACRES,S/S ADAMS RD.,NE OF FINCHVILLE,MLB,363.0,779.0,...,,,0.0,,,0.0,,,0.0,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,http://sdat.dat.maryland.gov/RealProperty/Page...,MDPV2017_18,2020JUN,http://sdat.dat.maryland.gov/RealProperty/Page...,https://maps.google.com/maps?t=h&q=38.65678537...,0,22100,0,5727,,,ADAMS,RD,FEDERALSBURG,21632.0,5727 ADAMS ROAD
1001000039,DORC,511216.1,106713.9,2.401997e+10,2.401997e+11,80,H,SF,6009 COKESBURY ROAD,6009.0,,COKESBURY,RD,,,P,RHODESDALE,19973.0,GARDINER KEVIN E,GARDINER LORI A,GARDINER KEVIN E,6009 COKESBURY RD,,SEAFORD,DE,19973.0,,6009.0,,COKESBURY,RD,SEAFORD,19973.0,,IMPS4.80 ACRES,E/S COKESBURY RD,SW/RELIANCE,,1493.0,455.0,...,,,0.0,,,0.0,,,0.0,,,,,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,2.0,http://sdat.dat.maryland.gov/RealProperty/Page...,MDPV2017_18,2020JUN,http://sdat.dat.maryland.gov/RealProperty/Page...,https://maps.google.com/maps?t=h&q=38.62101071...,3492,287100,2004,6009,,,COKESBURY,RD,SEAFORD,19973.0,6009 COKESBURY ROAD
1001000047,DORC,508807.3,110360.1,2.401997e+10,2.401997e+11,80,N,SF,5731 DAVIS MILL POND ROAD,5731.0,,DAVIS MILL POND,RD,,,P,FEDERALSBURG,21632.0,HARIM MILLSBORO LLC,,HARIM MILLSBORO LLC,PO BOX 1380,MAILSTOP 100484,MILLSBORO,DE,19966.0,,5731.0,,DAVIS MILL POND,RD,,,,IMPS20 ACRES,W/S DAVIS MILLPOND RD,NE/FINCHVILLE,,1471.0,11.0,...,,,0.0,,,0.0,,,0.0,,,,,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,2.0,http://sdat.dat.maryland.gov/RealProperty/Page...,MDPV2017_18,2020JUN,http://sdat.dat.maryland.gov/RealProperty/Page...,https://maps.google.com/maps?t=h&q=38.65415540...,1438,90400,1920,5731,,,DAVIS MILL POND,RD,,0.0,5731 DAVIS MILL POND ROAD
1001000055,DORC,507295.0,112993.8,2.401997e+10,2.401997e+11,80,N,TR,6940 RELIANCE ROAD,6940.0,,RELIANCE,RD,,,P,FEDERALSBURG,21632.0,HARIM MILLSBORO LLC,,HARIM MILLSBORO LLC,PO BOX 1380,MAILSTOP 100484,MILLSBORO,DE,19966.0,,6940.0,,RELIANCE,RD,,,,IMPS232 AC,S/W ALLENS COR-FDG RD,W/ALLENS COR,,1471.0,11.0,...,,,0.0,,,0.0,,,0.0,,,,,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,2.0,http://sdat.dat.maryland.gov/RealProperty/Page...,MDPV2017_18,2020JUN,http://sdat.dat.maryland.gov/RealProperty/Page...,https://maps.google.com/maps?t=h&q=38.67806461...,1056,181000,1989,6940,,,RELIANCE,RD,,0.0,6940 RELIANCE ROAD
1001000063,DORC,511793.2,105251.7,2.401997e+10,2.401997e+11,80,D,SF,6002 ALLEN ROAD,6002.0,,ALLEN,RD,,,P,RHODESDALE,19973.0,DONOVAN MICHAEL,DONOVAN VICKI,DONOVAN MICHAEL,6002 ALLEN RD,RT 3 BOX 270,SEAFORD,DE,19973.0,6057.0,6002.0,,ALLEN,RD,SEAFORD,19973.0,,IMPS72 ACRES,E/S ALLEN RD.,S OF RELIANCE,PLC,243.0,368.0,...,,,0.0,,,0.0,,,0.0,,A,,,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,2.0,http://sdat.dat.maryland.gov/RealProperty/Page...,MDPV2017_18,2020JUN,http://sdat.dat.maryland.gov/RealProperty/Page...,https://maps.google.com/maps?t=h&q=38.60776656...,2491,365400,2004,6002,,,ALLEN,RD,SEAFORD,19973.0,6002 ALLEN ROAD
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1018001128,DORC,486110.6,71591.0,2.401997e+10,2.401997e+11,82,N,,2211 ELLIOTT ISLAND ROAD,,,,,,,,VIENNA,21869.0,MARTINEK CHESTER DANIEL,MARTINEK MELISSA G,MARTINEK CHESTER DANIEL,2232 ELLIOTT ISLAND RD,,VIENNA,MD,21869.0,9633.0,2211.0,,ELLIOTT ISLAND,RD,,,,IMPV2.06 ACRES,E/S FISHING BAY,,,1597.0,100.0,...,,,,,,,,,,,,,,,,,,,,,,,,2.0,https://sdat.dat.maryland.gov/RealProperty/Pag...,MDPV2017_18,,http://sdat.dat.maryland.gov/RealProperty/Page...,https://maps.google.com/maps?t=h&q=38.30742140...,938,157400,1970,2211,,,ELLIOTT ISLAND,RD,,0.0,2211 ELLIOTT ISLAND ROAD
1018001721,DORC,486204.8,71552.5,2.401997e+10,2.401997e+11,82,N,,,,,,,,,,VIENNA,,MARTINEK CHESTER DANIEL,MARTINEK MELISSA G,MARTINEK CHESTER DANIEL,2232 ELLIOTT ISLAND RD,,VIENNA,MD,21869.0,9633.0,,,ELLIOTT ISLAND,RD,,,,1 ACRE,S OF ELLIOTT ISLAND RD.,,,1597.0,100.0,...,,,,,,,,,,,,,,,,,,,,,,,,2.0,https://sdat.dat.maryland.gov/RealProperty/Pag...,MDPV2017_18,,http://sdat.dat.maryland.gov/RealProperty/Page...,https://maps.google.com/maps?t=h&q=38.30706542...,0,1000,0,0,,,ELLIOTT ISLAND,RD,,0.0,
1018001764,DORC,487882.7,71842.0,2.401997e+10,2.401997e+11,82,N,,2357 ELLIOTT ISLAND ROAD,,,,,,,,VIENNA,21869.0,ZIMMERMAN EARL,ZIMMERMAN BRENDA,ZIMMERMAN EARL,3 BUCH MILL RD,,LITITZ,PA,17543.0,,2357.0,,ELLIOTT ISLAND,RD,VIENNA,21869.0,9603.0,"IMPV30,492 SQ. FT.",S OF ELLIOTT ISLAND RD.,ELLIOTT ISLAND,,1626.0,290.0,...,,,,,,,,,,,,,,,,,,,,,,,,2.0,https://sdat.dat.maryland.gov/RealProperty/Pag...,MDPV2017_18,,http://sdat.dat.maryland.gov/RealProperty/Page...,https://maps.google.com/maps?t=h&q=38.30950854...,1512,133233,1984,2357,,,ELLIOTT ISLAND,RD,VIENNA,21869.0,2357 ELLIOTT ISLAND ROAD
1018001772,DORC,486501.3,71460.5,2.401997e+10,2.401997e+11,82,H,,2233 ELLIOTT ISLAND ROAD,,,,,,,,VIENNA,21869.0,MARTINEK HOWARD FRANKLIN,MARTINEK ROCHELLE M,MARTINEK HOWARD FRANKLIN,2233 ELLIOTT ISLAND RD,,VIENNA,MD,21869.0,9633.0,2233.0,,ELLIOTT ISLAND,RD,VIENNA,21869.0,,IMPV2 ACRES,S OF ELLIOTT ISLAND RD,E/S FISHING BAY,,1597.0,105.0,...,,,,,,,,,,,,,,,,,,,,,,,,2.0,https://sdat.dat.maryland.gov/RealProperty/Pag...,MDPV2017_18,,http://sdat.dat.maryland.gov/RealProperty/Page...,https://maps.google.com/maps?t=h&q=38.30620777...,2016,233500,1955,2233,,,ELLIOTT ISLAND,RD,VIENNA,21869.0,2233 ELLIOTT ISLAND ROAD
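The `str(x).strip()` conversion above is only safe if pandas parsed `acctid` as integers. If any row of a CSV has a blank `acctid`, pandas reads the whole column as floating point, and the string form gains a trailing '.0' that silently breaks the join. A sketch with a hypothetical two-row extract shows the difference and the usual remedy, `dtype={'acctid': str}`:

```python
import io
import pandas as pd

# Hypothetical two-row extract: the second parcel has no acctid.
csv_text = "acctid,address\n1001000012,5727 ADAMS ROAD\n,6940 RELIANCE ROAD\n"

# Default parsing: the blank cell makes the column float64, so str() appends '.0'.
naive = pd.read_csv(io.StringIO(csv_text))
naive_keys = naive['acctid'].dropna().apply(lambda x: str(x).strip()).tolist()

# Reading the key as text keeps it exactly as written in the file.
safe = pd.read_csv(io.StringIO(csv_text), dtype={'acctid': str})
safe_keys = safe['acctid'].dropna().str.strip().tolist()

print(naive_keys)  # ['1001000012.0']
print(safe_keys)   # ['1001000012']
```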


In [22]:
sdat_merge_df = df.merge(sdat,left_on='acctid',right_on='acctid',how='outer',indicator=True)
cleaned_registrations_df = sdat_merge_df.query('_merge == "both"')[list(df.columns)+['address']]
print('found:',len(cleaned_registrations_df))
print('problem records, in the rental sheet but not in sdat?:',len(sdat_merge_df.query('_merge == "left_only"')))
cleaned_registrations_df

found: 1491
problem records, in the rental sheet but not in sdat?: 59


,License Id,License Type Id,Business Name,Customer Id,Effective Date,Expiration Date,State Id,Status,Property Location,Dist/Account No,RENTAL,Unnamed: 11,Unnamed: 12,acctid,address
0,15-00183,RENTAL,DEREK MILLER,RR-01223,7/1/2019,6/30/2020,,Approved,406 KENT ST 4144,07-143230,1.0,True,True,1007143230,406 KENT ST
1,18-01619,RENTAL,BRYAN JEFFREY & JULIE,RR-96791,7/1/2019,6/30/2020,,Approved,700 CATTAIL COVE UNIT# 301,07-213905,1.0,True,True,1007213905,700 CATTAIL COVE
2,19-00001,RENTAL,OTTER LLC,RR-07388,7/1/2019,6/30/2020,,Approved,416 BOUNDARY AVE 128,07-113935,1.0,True,True,1007113935,416 BOUNDARY AVE
3,19-00002,RENTAL,DAGOSTINO COREY,RR-07981,7/1/2019,6/30/2020,,Approved,704 CHURCH ST 374,07-148038,2.0,True,True,1007148038,704 CHURCH ST
4,19-00003,RENTAL,MILLER DURELL,RR-03640,7/1/2019,6/30/2020,,Approved,623 DOUGLAS ST 471,07-164289,1.0,True,True,1007164289,623 DOUGLAS ST
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1545,19-01523,RENTAL,KLEYMAN ALEKANDAR,RR-09679,7/1/2019,6/30/2020,,Approved,900 MARSHY COVE UNIT# 302,07-214987,1.0,True,True,1007214987,900 MARSHY COVE
1546,19-01524,RENTAL,BRADLEY THOMAS,RR-06434,7/1/2019,6/30/2020,,Approved,1000 RACE ST,07-165730,1.0,True,True,1007165730,1000 RACE ST
1547,19-01543,RENTAL,ADAMS JOEL,RR-04294,7/1/2019,6/30/2020,,Approved,811 PINE ST 1528,07-152078,1.0,True,True,1007152078,811 PINE ST
1548,19-01546,RENTAL,EHISAM FREDERICK,RR-09678,7/1/2019,6/30/2020,,Approved,900 MARSHY COVE UNIT# 312,07-215096,1.0,True,True,1007215096,900 MARSHY COVE
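`indicator=True` adds a `_merge` column whose value is 'both', 'left_only' or 'right_only', which is what the queries above rely on. Because the join uses `how='outer'`, the result also carries every SDAT parcel with no rental registration ('right_only'); a left join yields the same 'both' and 'left_only' rows without them. A toy illustration with hypothetical rows (note that `on` may name a column on one side and an index level on the other, as with the acctid-indexed `sdat`):

```python
import pandas as pd

# Hypothetical miniature versions of the two tables.
rentals = pd.DataFrame({'License Id': ['15-00183', '19-99999'],
                        'acctid': ['1007143230', '101']})
sdat = (pd.DataFrame({'acctid': ['1007143230', '1001000012'],
                      'address': ['406 KENT ST', '5727 ADAMS ROAD']})
        .set_index('acctid'))

# 'acctid' is a column in rentals but the index level of sdat.
outer = rentals.merge(sdat, on='acctid', how='outer', indicator=True)
left = rentals.merge(sdat, on='acctid', how='left', indicator=True)

print(outer['_merge'].value_counts().to_dict())
print(left['_merge'].tolist())  # ['both', 'left_only']
```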


### Fix up any problem records. 
The next few cells try different ways to resolve the rows that did not match SDAT.  
The first method reuses prior work: it checks the unmatched rows against the combined 2017-18 rental list produced by this process last time.

In [23]:
# start with the left join of the prior result, that's the records that didn't match
problems_df = sdat_merge_df.query('_merge == "left_only"')

# grab the acctid from a prior year where you can, and try to merge with sdat using that key for some of the bad rows
history_df = pd.read_csv('drive/My Drive/pita 2021/cambridge-combined-old-new-rental-lists-17-18.csv').rename(columns={'ACCTID':'acctid'})
history_df = history_df[history_df['acctid'].notna()]
history_df.acctid = history_df.acctid.apply(lambda x: str(x).strip())
fixups_df = problems_df.drop(columns=['_merge']).merge(history_df,on='acctid',how='outer',indicator=True).drop_duplicates()

print("these can be fixed leverging prior results:", len(fixups_df[(fixups_df['_merge'] == "both")]))
cleaned_registrations_df = cleaned_registrations_df.append(fixups_df[(fixups_df['_merge'] == "both")][list(df.columns)+['address']])
print(len(cleaned_registrations_df),"of",len(df),"rows cleaned")
print("these still need more work:",len(fixups_df[(fixups_df['_merge'] == "left_only")] ))

these can be fixed leverging prior results: 9
1500 of 1550 rows cleaned
these still need more work: 51


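#### Try a join on address  
The addresses need some cleanup first: remove periods, shorten AVENUE to AVE, and drop a trailing token that starts with a digit (the rental sheet's 'Property Location' often ends with an extra number, e.g. '406 KENT ST 4144' versus SDAT's '406 KENT ST').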

In [24]:
def cleanup_address(a):
  """Normalize a 'Property Location' so it can match SDAT's 'address' column."""
  if not isinstance(a, str):
    return a  # leave missing values (NaN) alone
  b = a.strip().replace('.','').replace('AVENUE','AVE')
  pieces = b.split()
  # drop a trailing token that starts with a digit, e.g. '406 KENT ST 4144' -> '406 KENT ST'
  if pieces and pieces[-1][0].isnumeric():
    return " ".join(pieces[:-1])
  return b

corrected_address_df = fixups_df[fixups_df['_merge'] == "left_only"].drop(columns='_merge')
corrected_address_df['Property Location'] = corrected_address_df['Property Location'].apply(cleanup_address)

In [41]:
# join the cleaned addresses against SDAT's address column; rows found in both now carry SDAT's acctid
corrected_address_join = corrected_address_df.drop(columns=['acctid','address']).merge(sdat.reset_index()[['acctid','address']],
                              left_on='Property Location',right_on='address',
                              how='outer',indicator=True)
found_by_join_on_address = corrected_address_join.query('_merge == "both"')[list(df.columns)+['address']]

cleaned_registrations_df = pd.concat([cleaned_registrations_df, found_by_join_on_address])
print(len(cleaned_registrations_df),"of",len(df),"rows cleaned")
print("these still need more work:",len(corrected_address_join.query('_merge == "left_only"')))

1568 of 1550 rows cleaned
these still need more work: 17
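The count above exceeds the number of input rows (1568 of 1550), so at least some registrations now appear more than once, for example because an address matched several SDAT parcels or because a cell that appends to `cleaned_registrations_df` was run twice. A minimal check, assuming 'License Id' should be unique per registration, lists the offending rows; `duplicated_licenses` is a hypothetical helper and the rows below are made up.

```python
import pandas as pd

def duplicated_licenses(cleaned):
    """Return every row whose 'License Id' appears more than once."""
    mask = cleaned['License Id'].duplicated(keep=False)
    return cleaned[mask].sort_values('License Id')

# Hypothetical example: one license matched two parcels that share an address.
cleaned = pd.DataFrame({
    'License Id': ['19-00001', '19-00009', '19-00009'],
    'acctid': ['1007113935', '1007000001', '1007000002'],
})
dups = duplicated_licenses(cleaned)
print(dups)
```

Exact repeats (same license, same acctid) point to a re-run cell and can be dropped with `drop_duplicates()`; repeats with different acctids need a manual decision about which parcel the license covers.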


In [None]:
# the easiest thing to do may be to fix these in the original file
# (sdat is indexed by acctid, so reset_index() is needed to select it as a column)
corrected_address_df[['Property Location']].merge(sdat.reset_index()[['address','acctid']],left_on='Property Location',right_on='address',how='outer',indicator=True).query('_merge == "left_only"')

,Property Location,address,acctid,_merge
4,701 MOORES AVE,,,left_only
13,700 CATTAIL COVE UNIT 110,,,left_only
17,712 LINCOLN TERRACE,,,left_only
20,303 BAYLY RD,,,left_only
22,1403 STONE BOUNDARY RD,,,left_only
25,5242 GALLIUM CT,,,left_only
32,203 ROBBINS ST APT A,,,left_only
33,105 RAMBLER RD,,,left_only
34,707 PEACHBLOSSOM AVE DOWN,,,left_only
36,116 BLACK DUCK DR,,,left_only
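Some of these unmatched addresses may be spelling variants of an SDAT address (for example an unabbreviated street type). As a swapped-in aid for the manual lookup, not the notebook's own method, Python's standard `difflib` module can rank candidate SDAT addresses by similarity; a Levenshtein-distance library would serve the same purpose. The SDAT spellings below are hypothetical, and `suggest_matches` is an illustrative name.

```python
import difflib

def suggest_matches(address, candidates, n=3, cutoff=0.6):
    """Return up to n candidate addresses most similar to an unmatched address.

    Uses difflib's similarity ratio; cutoff drops weak candidates.
    """
    return difflib.get_close_matches(address, candidates, n=n, cutoff=cutoff)

# Hypothetical SDAT spellings; in the notebook these would come from sdat['address'].
sdat_addresses = ['712 LINCOLN TER', '714 LINCOLN TER', '811 PINE ST']
print(suggest_matches('712 LINCOLN TERRACE', sdat_addresses))
```

Suggestions still need a human check: a neighbouring house number scores almost as high as the right parcel.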


## WRITE OUT THE CLEANED RENTAL BILLING  
Any records still unmatched above will probably need to be added by hand.

In [None]:
cleaned_registrations_df.reset_index(drop=True).to_csv('/content/drive/My Drive/pita 2021/cleaned_rental_billing-2020-test.csv')

## ADD Fixups  
This section adds back corrections that have been accumulated as fixups in a separate CSV file.

As more items in the list are resolved, the fixup list should shrink with each update.  Merging historical data with the cleaned spreadsheet, as done above, should resolve most issues in future years.
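A minimal sketch of applying such a fixups file, assuming (hypothetically) that it has two columns, 'License Id' and the corrected 'acctid'. The file layout and the `apply_fixups` name are illustrations, not the notebook's existing format.

```python
import io
import pandas as pd

def apply_fixups(registrations, fixups):
    """Overwrite 'acctid' for every 'License Id' listed in the fixups table."""
    # one correction per license; if a license is listed twice, the later row wins
    corrections = (fixups.drop_duplicates('License Id', keep='last')
                         .set_index('License Id')['acctid'])
    out = registrations.copy()
    # licenses without a fixup map to NaN and keep their existing acctid
    out['acctid'] = out['License Id'].map(corrections).fillna(out['acctid'])
    return out

# Hypothetical fixups file with one manually researched correction.
fixups_csv = "License Id,acctid\n19-00002,1007148038\n"
fixups = pd.read_csv(io.StringIO(fixups_csv), dtype=str)

registrations = pd.DataFrame({'License Id': ['19-00001', '19-00002'],
                              'acctid': ['1007113935', '101']})
fixed = apply_fixups(registrations, fixups)
print(fixed['acctid'].tolist())  # ['1007113935', '1007148038']
```

Because the function only overwrites, running it again on its own output gives the same result. The key column is an assumption; if license numbers change from year to year, a more stable column would be a better choice.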

In [51]:
# TODO

# # rentals = df.merge(license_df,on='acctid',how='outer',indicator=True)
# # rentals.query('_merge == "right_only"')

# ## NOTE, THIS DROPS THE WOODS ROAD BUILDINGS !!!!
# from collections import defaultdict 

# licensed_accts = set(license_df.acctid.array)
# rentals = df.merge(license_df,on='acctid',how='inner')#,indicator=True)
# rental_accts = set(rentals.acctid.array)
# split_accts = licensed_accts.difference(rental_accts)
# split_accts.add('1007127049-1')
# split_accts = {(acct.split("-")[0],acct.split("-")[1]) for acct in split_accts }
# acct_splits = defaultdict(list)
# for k,v in split_accts:
#   acct_splits[k].append(v)

# # make a df with the expanded records in it
# def add_extras(acct_splits,split=False):
#   df_extras = pd.DataFrame(columns=df.columns)
#   for acct in acct_splits.keys():
#     row = df.query('acctid == @acct').copy()
#     if split:
#       for variant in acct_splits[acct]:
#         row.acctid = acct+"-"+str(variant)
#     df_extras = df_extras.append(row)
#   return df_extras

# rentals = rentals.append(add_extras(acct_splits).merge(license_df,on='acctid',how='inner'))