# FindYourLandlord Data Cleaning
### This project aims to:
* provide a visual representation of who or what entity owns the most property, which other properties that owner holds, and where they are
* provide tenants in North Jersey with pertinent information about who or what owns their building or apartment
### Beginning with Jersey City, I'll eventually expand to other cities throughout North Jersey: Paterson, Newark, Hackensack…
### This notebook is the data portion of the project, where I test methods for cleaning and extracting the relevant data. The data will eventually be picked up by React and Mapbox GL.
### The CSVs were retrieved from Monmouth County's Tax Assessor website:
### https://tax1.co.monmouth.nj.us/cgi-bin/prc6.cgi?menu=index&ms_user=monm&passwd=data&district=1301&mode=11

In [1]:
import pandas as pd
import os
import numpy as np
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

In [2]:
# for returning lat long coordinates from addresses
geolocator = Nominatim(user_agent="myGeolocator", timeout=2)

In [3]:
hackensack = pd.read_csv('0223monm205221.csv', dtype={14: str, 20: str, 24: str})

jersey_city = pd.read_csv('0906monm220028.csv', dtype={20: str,
                                                       22: str,
                                                       23: str,
                                                       25: str,
                                                       42: str,
                                                       43: str,
                                                       47: str,
                                                       87: str,
                                                       89: str})

jersey_city.rename(columns={'Property Location': 'property_location', 
                            "Owner's Name": 'owners_name', 
                            "Owner's Mailing Address": 'owners_mailing_address', 
                            "City/State/Zip": "city_state_zip"}, inplace=True)

hackensack.rename(columns={'Property Location': 'property_location', 
                            "Owner's Name": 'owners_name', 
                            "Owner's Mailing Address": 'owners_mailing_address',
                            "City/State/Zip": "city_state_zip"}, inplace=True)

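# title() case is necessary in order to return coordinates with Nominatim
# we also can't assume the same city has the same zip code throughout,
# so the zip code is taken from the city/state/zip column
# caveat: city/state/zip belongs to the owner's mailing address, so for absentee owners
# the combined address can point outside Jersey City (e.g. "547 Montgomery St. Camden, Nj 08101")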
jersey_city['property_full_address'] = jersey_city.property_location.str.title() + " " + jersey_city.city_state_zip.str.title()
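
The comments in this cell say the zip code has to come from the city/state/zip column. A minimal sketch of one way to pull it out with a regular expression, assuming the zip is a five-digit group (optionally ZIP+4) at the end of the string. `extract_zip` and `ZIP_PATTERN` are hypothetical names, not part of the notebook:

```python
import re

# Assumption: the zip code is the trailing 5-digit group, optionally followed by "-1234".
ZIP_PATTERN = re.compile(r"(\d{5})(?:-\d{4})?\s*$")

def extract_zip(city_state_zip):
    """Return the 5-digit zip code from a City/State/Zip value, or None if absent."""
    if not isinstance(city_state_zip, str):
        return None  # missing values show up as NaN (a float) in pandas
    match = ZIP_PATTERN.search(city_state_zip)
    return match.group(1) if match else None

samples = ["JERSEY CITY, NJ 07307", "JERSEY CITY N J 07302",
           "NEW YORK, NY 10019", "JERSEY CITY"]
zips = [extract_zip(s) for s in samples]
print(zips)
```

On the dataframe this could be applied with something like `jersey_city.city_state_zip.map(extract_zip)`.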

In [4]:
jersey_city.columns

Index(['Municipality', 'Block', 'Lot', 'Qual', 'property_location',
       'Property Class', 'owners_name', 'owners_mailing_address',
       'city_state_zip', 'Sq. Ft.', 'Yr. Built', 'Building Class',
       'Prior Block', 'Prior Lot', 'Prior Qual', 'Updated', 'Zone', 'Account',
       'Mortgage Account', 'Bank Code', 'Sp Tax Cd', 'Sp Tax Cd.1',
       'Sp Tax Cd.2', 'Sp Tax Cd.3', 'Map Page', 'Additional Lots',
       'Land Desc', 'Building Desc', 'Class 4 Code', 'Acreage', 'EPL Own',
       'EPL Use', 'EPL Desc', 'EPL Statute', 'EPL Init', 'EPL Further',
       'EPL Facility Name', 'Taxes 1', 'Taxes 2', 'Taxes 3', 'Taxes 4',
       'Sale Date', 'Deed Book', 'Deed Page', 'Sale Price', 'NU Code', 'Ratio',
       'Type/Use', 'Year', 'Owner', 'Street', 'City/State/Zip.1',
       'Land Assmnt', 'Building Assmnt', 'Exempt', 'Total Assmnt', 'Assessed',
       'Year.1', 'Owner.1', 'Street.1', 'City/State/Zip.2', 'Land Assmnt.1',
       'Building Assmnt.1', 'Exempt.1', 'Total Assmnt.1', 'Asse

#### The following five cells look up columns that can be deleted from the dataframe...
#### ...I needed to figure out their respective indices first

In [5]:
jersey_city.columns[5]

'Property Class'

In [6]:
jersey_city.columns[9]

'Sq. Ft.'

In [7]:
jersey_city.columns[11:83]

Index(['Building Class', 'Prior Block', 'Prior Lot', 'Prior Qual', 'Updated',
       'Zone', 'Account', 'Mortgage Account', 'Bank Code', 'Sp Tax Cd',
       'Sp Tax Cd.1', 'Sp Tax Cd.2', 'Sp Tax Cd.3', 'Map Page',
       'Additional Lots', 'Land Desc', 'Building Desc', 'Class 4 Code',
       'Acreage', 'EPL Own', 'EPL Use', 'EPL Desc', 'EPL Statute', 'EPL Init',
       'EPL Further', 'EPL Facility Name', 'Taxes 1', 'Taxes 2', 'Taxes 3',
       'Taxes 4', 'Sale Date', 'Deed Book', 'Deed Page', 'Sale Price',
       'NU Code', 'Ratio', 'Type/Use', 'Year', 'Owner', 'Street',
       'City/State/Zip.1', 'Land Assmnt', 'Building Assmnt', 'Exempt',
       'Total Assmnt', 'Assessed', 'Year.1', 'Owner.1', 'Street.1',
       'City/State/Zip.2', 'Land Assmnt.1', 'Building Assmnt.1', 'Exempt.1',
       'Total Assmnt.1', 'Assessed.1', 'Year.2', 'Owner.2', 'Street.2',
       'City/State/Zip.3', 'Land Assmnt.2', 'Building Assmnt.2', 'Exempt.2',
       'Total Assmnt.2', 'Assessed.2', 'Year.3', 'Owner.3

In [8]:
jersey_city.columns[83]

'Assessed.3'

In [9]:
jersey_city.columns[86:91]

Index(['Neigh', 'VCS', 'StyDesc', 'Style', 'Unnamed: 90'], dtype='object')
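
Once the indices are known, the column names for several index ranges can be gathered in one step instead of being typed out. A small sketch using a plain list in place of `jersey_city.columns`; `columns_in_ranges` is a hypothetical helper and the toy column list is made up for illustration:

```python
def columns_in_ranges(columns, ranges):
    """Collect column names for a list of half-open (start, stop) index ranges."""
    columns = list(columns)  # also accepts a pandas Index
    selected = []
    for start, stop in ranges:
        selected.extend(columns[start:stop])
    return selected

# Toy column list standing in for jersey_city.columns
toy_columns = ["Municipality", "Block", "property_location", "Property Class",
               "owners_name", "Sq. Ft.", "Yr. Built", "Zone"]
to_drop = columns_in_ranges(toy_columns, [(0, 2), (3, 4), (5, 6), (7, 8)])
print(to_drop)
```

If the indices found in cells 5 to 9 hold, the ranges for this dataset would be roughly `[(5, 6), (9, 10), (11, 84), (86, 91)]`, and the resulting list could be passed to `DataFrame.drop(columns=...)`.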

## Does the owner live in the building? 
## Does the 'property location' match the 'owner's mailing address'?

In [10]:
# the comparison itself is vectorized, but a boolean Series over 60k+ rows is hard to read at a glance
jersey_city.property_location == jersey_city.owners_mailing_address

0        False
1        False
2        False
3        False
4        False
         ...  
63829    False
63830    False
63831    False
63832    False
63833    False
Length: 63834, dtype: bool

In [11]:
# np.where returns the positions of the True values,
# which gives a quick look at which rows match and whether our intuition is correct
np.where(jersey_city.property_location == jersey_city.owners_mailing_address)

(array([   27,    29,    34, ..., 63776, 63796, 63797]),)

### When the property location matches the owner's mailing address, can we safely assume the owner resides at that address?
### As many as 16544 owners may reside in the building they own.
### However, they might not all be residential properties.

In [12]:
jersey_city.loc[jersey_city.property_location == jersey_city.owners_mailing_address]

Unnamed: 0,Municipality,Block,Lot,Qual,property_location,Property Class,owners_name,owners_mailing_address,city_state_zip,Sq. Ft.,...,Total Assmnt.3,Assessed.3,Latitude,Longitude,Neigh,VCS,StyDesc,Style,Unnamed: 90,property_full_address
27,906,201.0,7.0,,817 TONNELE AVE.,1,"NELMIR REALTY, L.L.C.",817 TONNELE AVE.,"JERSEY CITY, NJ 07307",0,...,111900,111900.0,0,0,,,,,,"817 Tonnele Ave. Jersey City, Nj 07307"
29,906,201.0,9.0,,823 TONNELE AVE.,4A,GERRY GAS SUPPLY INC.C/O LILH,823 TONNELE AVE.,"JERSEY CITY, N.J. 07307",0,...,1699500,1699500.0,0,0,,,,,,"823 Tonnele Ave. Jersey City, N.J. 07307"
34,906,301.0,1.0,,677 LIBERTY AVE.,2,"PEDDI, PRADEEP",677 LIBERTY AVE.,"JERSEY CITY, N.J. 07307",1600,...,416500,416500.0,0,0,,,,,,"677 Liberty Ave. Jersey City, N.J. 07307"
35,906,301.0,2.0,,675 LIBERTY AVE.,2,"PAREJA, HENRY A. & MIRYAM C.",675 LIBERTY AVE.,"JERSEY CITY, N.J. 07307",1616,...,363300,363300.0,0,0,,,,,,"675 Liberty Ave. Jersey City, N.J. 07307"
36,906,301.0,3.0,,673 LIBERTY AVE.,2,"HIRPARA, PRAVIN",673 LIBERTY AVE.,"JERSEY CITY, NJ 07307",1600,...,364900,364900.0,0,0,,,,,,"673 Liberty Ave. Jersey City, Nj 07307"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
63740,906,30304.0,14.0,,100 GARFIELD AVE.,2,"SERVIDA, AUREA",100 GARFIELD AVE.,"JERSEY CITY, NJ 07305",1882,...,271000,271000.0,0,0,,,,,,"100 Garfield Ave. Jersey City, Nj 07305"
63742,906,30304.0,16.0,,104 GARFIELD AVE.,2,"RODRIGUEZ, BENJAMIN",104 GARFIELD AVE.,"JERSEY CITY, N.J. 07305",2412,...,268200,268200.0,0,0,,,,,,"104 Garfield Ave. Jersey City, N.J. 07305"
63776,906,30306.0,8.0,,100 SUMMIT PLACE,4B,"SUMMIT/GREENWICH URBAN RENEWAL, LLC",100 SUMMIT PLACE,"JERSEY CITY, NJ 07305",0,...,13139700,13139700.0,0,0,,,,,,"100 Summit Place Jersey City, Nj 07305"
63796,906,30307.0,10.0,,109 PORT JERSEY BLVD.,4B,DOMISA LLC,109 PORT JERSEY BLVD.,"JERSEY CITY, N.J. 07305",0,...,2863800,2863800.0,0,0,,,,,,"109 Port Jersey Blvd. Jersey City, N.J. 07305"
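
An exact string comparison misses pairs that differ only in punctuation or suffix spelling, such as `514 NEWARK AVE.` versus `514 NEWARK AVENUE`, which appears later in this notebook. A rough sketch of a more forgiving comparison, assuming a small hand-picked suffix map; `normalize_address`, `same_address`, and `SUFFIXES` are hypothetical names:

```python
import re

# Assumption: a tiny suffix map for illustration; real data would need many more entries.
SUFFIXES = {"AVENUE": "AVE", "STREET": "ST", "ROAD": "RD",
            "PLACE": "PL", "BOULEVARD": "BLVD"}

def normalize_address(address):
    """Uppercase, replace punctuation with spaces, and abbreviate known suffixes."""
    if not isinstance(address, str):
        return ""
    cleaned = re.sub(r"[^\w\s]", " ", address.upper())
    words = [SUFFIXES.get(word, word) for word in cleaned.split()]
    return " ".join(words)

def same_address(a, b):
    """True when both addresses are non-empty and equal after normalization."""
    left, right = normalize_address(a), normalize_address(b)
    return bool(left) and left == right

print(same_address("514 NEWARK AVE.", "514 NEWARK AVENUE"))
print(same_address("280 GROVE ST", "280 GROVE STREET"))
print(same_address("88 ERIE ST.", "400 U.S. HIGHWAY #1"))
```

Applied to both columns before comparing, this would likely raise the number of matches, so any owner-occupancy count based on it should still be treated as an estimate.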


## Does the property owner own more than one property? Which property owner owns the most properties?
### These numbers can be deceiving; many properties or units can exist in a single address.
### The same address can be included twice. 

In [13]:
# As we'll see below, many of the addresses and owners are spelled incorrectly 
len(jersey_city['owners_name'].unique())

49247

In [14]:
len(jersey_city['property_location'].unique())

37842

### multiple units can appear in a single property, adding to the value counts below

In [15]:
jersey_city.owners_name.value_counts().head(50)

COA 99 HUDSON,LLC                      626
CITY OF JERSEY CITY                    325
75 PARK LANE, LLC                      215
95 VAN DAM URBAN RENEWAL, L.L.C.       129
CONSOLIDATED RAIL                      124
160 FIRST STREET URBAN RENEWAL, LLC    104
JERSEY CITY REDEVELOPMENT AGENCY        92
NEW JERSEY TRANSIT                      86
HOUSING AUTHORITY OF JERSEY CITY        74
LIBERTY HARBOR NORTH URBAN R., LLC      71
STATE OF N J DEPT OF ENV PROTECTION     57
BOARD OF EDUCATION OF J C               54
NJ DEPARTMENT OF TRANSPORTATION         51
300 COMMUNIPAW, INC.                    48
VILLAGE CONDOS III, LLC.                46
BERGEN AVE.ASSOC.C/O OSTROW             46
PUBLIC SERVICE ELECTRIC & GAS CO.       44
NEWPORT CENTRE                          40
N.J. DEPT. OF TRANSPORTATION            39
BLOCK 284 NORTH URBAN RENEWAL, LLC      33
AUDUBON AV.RLTY. %JASCO MANAGEMENT      32
CCA NEWPORT, INC.                       32
ERIE 10TH URBAN RENEWAL, L.L.C.         28
LIBERTY HAR
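
The counts above split one owner across several spellings, for example `NJ DEPARTMENT OF TRANSPORTATION` and `N.J. DEPT. OF TRANSPORTATION`. A minimal sketch of a normalization pass that could be run before counting, assuming a very small abbreviation map taken from names seen in this notebook; `normalize_owner` and `ABBREVIATIONS` are hypothetical names:

```python
import re

# Assumption: a tiny abbreviation map for illustration only.
ABBREVIATIONS = {"DEPARTMENT": "DEPT", "L L C": "LLC", "N J": "NJ"}

def normalize_owner(name):
    """Uppercase, strip punctuation, collapse whitespace, and unify a few abbreviations."""
    if not isinstance(name, str):
        return ""
    cleaned = re.sub(r"[^\w\s]", " ", name.upper())  # punctuation becomes spaces
    cleaned = " ".join(cleaned.split())               # collapse repeated whitespace
    for long_form, short_form in ABBREVIATIONS.items():
        cleaned = re.sub(rf"\b{long_form}\b", short_form, cleaned)
    return cleaned

a = normalize_owner("NJ DEPARTMENT OF TRANSPORTATION")
b = normalize_owner("N.J. DEPT. OF TRANSPORTATION")
c = normalize_owner("ERIE 10TH URBAN RENEWAL, L.L.C.")
print(a, "|", b, "|", c)
```

Counting on a normalized column, for instance one built with `jersey_city.owners_name.map(normalize_owner)`, would merge such variants. Misspellings like `INSPRED` versus `INSPIRED` would still need fuzzy matching or manual review.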

## How many LLCs have "UR" or "Urban Renewal" in the owner's name?

In [16]:
# More examples of how unclean the data is, even on a state database
jersey_city[jersey_city.owners_name.str.contains('URBAN RENEWAL')]['owners_name'].unique()

array(['HEIGHTS URBAN RENEWAL SENIOR H.',
       'RATAN JERSEY CITY URBAN RENEWAL,LLC',
       'BRASS WORKS URBAN RENEWAL CO., LLC',
       'HUDSON PALISADES URBAN RENEWAL,LLC',
       '364 NINTH STREET URBAN RENEWAL, LLC',
       'NINTH STREET TWO URBAN RENEWAL,LLC',
       'TOWER EAST URBAN RENEWAL COMPANY',
       'JAMES MONROE URBAN RENEWAL CO.', '25 RIVER DR.SO.URBAN RENEWAL',
       'SENATE PLACE URBAN RENEWAL, LLC',
       'H.P. LINCOLN URBAN RENEWAL COMPANY',
       'ERIE 10TH URBAN RENEWAL, L.L.C.', '9TH STREET URBAN RENEWAL,LLC',
       'VAN WAGENEN II URBAN RENEWAL, LLC.',
       '17-19 DIVISION ST URBAN RENEWAL,LLC',
       '380 NEWARK REALTY URBAN RENEWAL LLC',
       'BLOCK 284 NORTH URBAN RENEWAL, LLC',
       '500 MANILA AVE.URBAN RENEWAL, LLC',
       '500 MANILA AVE.URBAN RENEWAL LLC',
       'KRE HAMILTON URBAN RENEWAL LLC',
       'VAN WAGENEN I URBAN RENEWAL, LLC.',
       '160 FIRST STREET URBAN RENEWAL, LLC',
       'PS FIRST HUDSON URBAN RENEWAL LLC',
       '14

In [17]:
jersey_city[jersey_city.owners_name.str.contains(' UR ')]['owners_name'].unique()

array(['SUMMIT PLAZA ASSOCIATES, UR LTD PTN',
       'PADUA COURT UR C/O INSPIRED VISION',
       'PADUA COURT UR C/O INSPRED VISION',
       'CAL-HARBOR SO. PIER UR ASC %M CALI'], dtype=object)
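
The two searches above can be folded into one pattern that also catches the truncated form `URBAN R.` seen in the value counts. A sketch under the assumption that a standalone `UR` and `URBAN R.` both stand for Urban Renewal; `is_urban_renewal` and `URBAN_RENEWAL` are hypothetical names:

```python
import re

# Assumption: "UR" as a standalone word and the truncated "URBAN R." both mean Urban Renewal.
URBAN_RENEWAL = re.compile(r"\bURBAN\s+R(?:ENEWAL\b|\.)|\bUR\b")

def is_urban_renewal(name):
    """True when the owner name looks like an Urban Renewal entity."""
    return isinstance(name, str) and bool(URBAN_RENEWAL.search(name.upper()))

names = ["SUMMIT PLAZA ASSOCIATES, UR LTD PTN",
         "LIBERTY HARBOR NORTH URBAN R., LLC",
         "25 RIVER DR.SO.URBAN RENEWAL",
         "CONSOLIDATED RAIL",
         "HOUSING AUTHORITY OF JERSEY CITY"]
flags = [is_urban_renewal(n) for n in names]
print(flags)
```

The word boundaries keep names such as `COURT` or `AUTHORITY` from matching on their own. On the dataframe, `jersey_city.owners_name.map(is_urban_renewal)` would give a boolean mask for filtering.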

In [18]:
jersey_city[jersey_city.owners_name == 'COA 99 HUDSON,LLC']['property_location'].unique()

array(['99 HUDSON ST.'], dtype=object)

In [19]:
def return_jc_properties(owner: str):
    return jersey_city[jersey_city.owners_name == owner.rstrip()]['property_location'].unique()

In [21]:
len(return_jc_properties('HOUSING AUTHORITY OF JERSEY CITY'))

55

In [22]:
%%time
# a smaller dataset to view inconsistencies in the data
# are there duplicate addresses in this subset of data?
# are the addresses accurate? 

# a copy() is used to avoid writing on the original dataframe
js_ha = jersey_city[jersey_city.owners_name == 'HOUSING AUTHORITY OF JERSEY CITY'].copy()
js_ha.reset_index(inplace=True, drop=True)
js_ha['gcode'] = js_ha.property_full_address.apply(geolocator.geocode)

CPU times: user 152 ms, sys: 23 ms, total: 175 ms
Wall time: 37.7 s


In [23]:
%%time
city_of_js = jersey_city[jersey_city.owners_name == 'CITY OF JERSEY CITY'].copy()
city_of_js.reset_index(inplace=True, drop=True)
city_of_js['gcode'] = city_of_js.property_full_address.apply(geolocator.geocode)

CPU times: user 572 ms, sys: 86 ms, total: 658 ms
Wall time: 2min 42s


In [24]:
%%time
urban_renewal = jersey_city[jersey_city.owners_name.str.contains('URBAN RENEWAL')].copy()
urban_renewal.reset_index(inplace=True, drop=True)
urban_renewal['gcode'] = urban_renewal.property_full_address.apply(geolocator.geocode)

CPU times: user 886 ms, sys: 133 ms, total: 1.02 s
Wall time: 3min 54s
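
Most of the wall time in the three cells above is spent waiting on the geocoding service, and several addresses repeat. One way to cut the number of requests is to look up each distinct address once and reuse the result. The sketch below uses a stand-in geocoder so it runs offline; `geocode_unique` and `fake_geocode` are hypothetical names and the coordinates returned by the stand-in are placeholders:

```python
import time

def geocode_unique(addresses, geocode, delay_seconds=0.0):
    """Geocode each distinct address once and return results aligned with the input."""
    cache = {}
    for address in addresses:
        if address not in cache:
            cache[address] = geocode(address)
            time.sleep(delay_seconds)  # optional pause between real requests
    return [cache[address] for address in addresses]

# Stand-in geocoder: records each lookup and returns placeholder coordinates.
calls = []
def fake_geocode(address):
    calls.append(address)
    return (40.0, -74.0) if "Brunswick" in address else None

addresses = ["255 Brunswick St. Jersey City, N J 07302",
             "255 Brunswick St. Jersey City, N J 07302",
             "Central R.R. Jersey City, Nj 07305"]
results = geocode_unique(addresses, fake_geocode)
print(len(calls), results)
```

In the notebook, `geocode` could be `RateLimiter(geolocator.geocode, min_delay_seconds=1)`, which wraps the call with geopy's own rate limiting (the class is already imported in cell 1), and the returned list could be assigned to the `gcode` column.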


In [26]:
city_of_js

Unnamed: 0,Municipality,Block,Lot,Qual,property_location,Property Class,owners_name,owners_mailing_address,city_state_zip,Sq. Ft.,...,Assessed.3,Latitude,Longitude,Neigh,VCS,StyDesc,Style,Unnamed: 90,property_full_address,gcode
0,906,101.0,7.0,HM,SECAUCUS RD.,15C,CITY OF JERSEY CITY,280 GROVE ST,JERSEY CITY N J 07302,0,...,239400.0,0,0,,,,,,Secaucus Rd. Jersey City N J 07302,"(Secaucus Road, Jersey City, Hudson County, Ne..."
1,906,201.0,13.0,,929 SECAUCUS RD,15C,CITY OF JERSEY CITY,280 GROVE ST,JERSEY CITY N J 07302,0,...,403400.0,0,0,,,,,,929 Secaucus Rd Jersey City N J 07302,"(Secaucus Road, Jersey City, Hudson County, Ne..."
2,906,301.0,23.0,,386 TERRACE AVE.,15C,CITY OF JERSEY CITY,280 GROVE ST,JERSEY CITY N J 07302,0,...,567200.0,0,0,,,,,,386 Terrace Ave. Jersey City N J 07302,"(386, Terrace Avenue, Jersey City, Hudson Coun..."
3,906,303.0,1.0,,191 NELSON AVE.,15C,CITY OF JERSEY CITY,280 GROVE ST.,"JERSEY CITY, NJ 07302",0,...,323500.0,0,0,,,,,,"191 Nelson Ave. Jersey City, Nj 07302","(191, Nelson Avenue, Jersey City, Hudson Count..."
4,906,1201.0,47.0,,194 BLEECKER ST.,15C,CITY OF JERSEY CITY,280 GROVE ST,"JERSEY CITY, N.J. 07307",0,...,289400.0,0,0,,,,,,"194 Bleecker St. Jersey City, N.J. 07307","(194, Bleecker Street, Jersey City, Hudson Cou..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
320,906,30305.0,25.0,,35 LINDEN AVE. EAST,15C,CITY OF JERSEY CITY,280 GROVE ST.,"JERSEY CITY, NJ 07302",0,...,1851000.0,0,0,,,,,,"35 Linden Ave. East Jersey City, Nj 07302","(35, Linden Avenue East, Jersey City, Hudson C..."
321,906,30305.0,26.0,,9 EAST LINDEN AVE.,15C,CITY OF JERSEY CITY,280 GROVE STREET,"JERSEY CITY, NJ 07302",0,...,638300.0,0,0,,,,,,"9 East Linden Ave. Jersey City, Nj 07302","(9, Linden Avenue, Greenville, Jersey City, Hu..."
322,906,30305.0,27.0,,CENTRAL R.R.,15C,CITY OF JERSEY CITY,280 GROVE ST.,"JERSEY CITY, NJ 07305",0,...,45500.0,0,0,,,,,,"Central R.R. Jersey City, Nj 07305",
323,906,30305.0,28.0,,BROWN PL TO GATES,15C,CITY OF JERSEY CITY,280 GROVE ST,"JERSEY CITY, N J 07305",0,...,143000.0,0,0,,,,,,"Brown Pl To Gates Jersey City, N J 07305",


In [27]:
js_ha

Unnamed: 0,Municipality,Block,Lot,Qual,property_location,Property Class,owners_name,owners_mailing_address,city_state_zip,Sq. Ft.,...,Assessed.3,Latitude,Longitude,Neigh,VCS,StyDesc,Style,Unnamed: 90,property_full_address,gcode
0,906,7101.0,1.00,,235 SIXTEENTH ST.,15C,HOUSING AUTHORITY OF JERSEY CITY,514 NEWARK AVENUE,"JERSEY CITY, NJ 07306",0,...,33056100.0,0,0,,,,,,"235 Sixteenth St. Jersey City, Nj 07306",
1,906,8301.0,1.00,,514 NEWARK AVE.,15C,HOUSING AUTHORITY OF JERSEY CITY,514 NEWARK AVENUE,"JERSEY CITY, NJ 07306",0,...,20199600.0,0,0,,,,,,"514 Newark Ave. Jersey City, Nj 07306","(514, Newark Avenue, Jersey City, Hudson Count..."
2,906,11304.0,15.00,,88 ERIE ST.,15C,HOUSING AUTHORITY OF JERSEY CITY,400 U.S. HIGHWAY #1,"JERSEY CITY, NJ 07306",0,...,8320900.0,0,0,,,,,,"88 Erie St. Jersey City, Nj 07306","(88, Erie Street, Newport, Jersey City, Hudson..."
3,906,11703.0,2.00,,57 DALES AVE.,15A,HOUSING AUTHORITY OF JERSEY CITY,400 ROUTE 1,"JERSEY CITY, NJ 07306",0,...,45160600.0,0,0,,,,,,"57 Dales Ave. Jersey City, Nj 07306","(57, Dales Avenue, Marion, Jersey City, Hudson..."
4,906,13602.0,1.02,,547 MONTGOMERY ST.,1,HOUSING AUTHORITY OF JERSEY CITY,P.O. BOX 90708,"CAMDEN, NJ 08101",0,...,4235000.0,0,0,,,,,,"547 Montgomery St. Camden, Nj 08101",
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
69,906,28801.0,3.00,,71 CATOR AVE.,15C,HOUSING AUTHORITY OF JERSEY CITY,400 U.S. HIGHWAY #1,"JERSEY CITY, NJ 07306",0,...,475600.0,0,0,,,,,,"71 Cator Ave. Jersey City, Nj 07306","(71, Cator Avenue, Greenville, Jersey City, Hu..."
70,906,28801.0,21.00,,72 DANFORTH AVE.,15C,HOUSING AUTHORITY OF JERSEY CITY,400 U.S. HIGHWAY #1,"JERSEY CITY, NJ 07306",0,...,2499700.0,0,0,,,,,,"72 Danforth Ave. Jersey City, Nj 07306","(72, Danforth Avenue, Greenville, Jersey City,..."
71,906,28801.0,22.00,,82 DANFORTH AVE.,15C,HOUSING AUTHORITY OF JERSEY CITY,400 U.S. HIGHWAY #1,"JERSEY CITY, NJ 07306",0,...,4214200.0,0,0,,,,,,"82 Danforth Ave. Jersey City, Nj 07306","(82, Danforth Avenue, Greenville, Jersey City,..."
72,906,29902.0,45.00,,15 OLD BERGEN RD.,15C,HOUSING AUTHORITY OF JERSEY CITY,400 U.S. HIGHWAY #1,"JERSEY CITY, NJ 07306",0,...,255800.0,0,0,,,,,,"15 Old Bergen Rd. Jersey City, Nj 07306","(15, Old Bergen Road, Jersey City, Hudson Coun..."


In [28]:
# for now, we'll drop the None values in the gcode column to make our lives easier 
# this does not however solve the issue of duplicate values in that column
js_ha.dropna(axis=0, subset=['gcode'], inplace=True)
city_of_js.dropna(axis=0, subset=['gcode'], inplace=True)
urban_renewal.dropna(axis=0, subset=['gcode'], inplace=True)

In [29]:
# since the dataframe already contains Latitude and Longitude columns, we can write to them directly
for df in (js_ha, city_of_js, urban_renewal):
    df['Latitude'] = [g.latitude for g in df.gcode]
    df['Longitude'] = [g.longitude for g in df.gcode]

In [30]:
# a less tedious way to execute this might be drop ranges of columns instead of doing it all manually
js_ha.drop(columns=['Municipality', 
                    'Block', 
                    'Lot', 
                    'Qual', 
                    'Property Class', 
                    'Sq. Ft.', 
                    'Building Class', 
                    'Prior Block', 
                    'Prior Lot', 
                    'Prior Qual', 
                    'Updated', 
                    'Zone', 
                    'Account', 
                    'Mortgage Account', 
                    'Bank Code', 
                    'Sp Tax Cd', 
                    'Sp Tax Cd.1', 
                    'Sp Tax Cd.2', 
                    'Sp Tax Cd.3', 
                    'Map Page',
                    'Additional Lots', 
                    'Land Desc', 
                    'Building Desc', 
                    'Class 4 Code', 
                    'Acreage', 
                    'EPL Own', 
                    'EPL Use', 
                    'EPL Desc', 
                    'EPL Statute', 
                    'EPL Init', 
                    'EPL Further', 
                    'EPL Facility Name', 
                    'Taxes 1', 
                    'Taxes 2', 
                    'Taxes 3', 
                    'Taxes 4', 
                    'Sale Date', 
                    'Deed Book', 
                    'Deed Page', 
                    'Sale Price', 
                    'NU Code', 
                    'Ratio', 
                    'Type/Use', 
                    'Year', 
                    'Owner', 
                    'Street', 
                    'City/State/Zip.1',
                    'Land Assmnt', 
                    'Building Assmnt', 
                    'Exempt', 
                    'Total Assmnt', 
                    'Assessed',
                    'Year.1', 
                    'Owner.1', 
                    'Street.1', 
                    'City/State/Zip.2', 
                    'Land Assmnt.1',
                    'Building Assmnt.1', 
                    'Exempt.1', 
                    'Total Assmnt.1', 
                    'Assessed.1',
                    'Year.2', 
                    'Owner.2', 
                    'Street.2', 
                    'City/State/Zip.3', 
                    'Land Assmnt.2',
                    'Building Assmnt.2', 
                    'Exempt.2', 
                    'Total Assmnt.2', 
                    'Assessed.2',
                    'Year.3', 
                    'Owner.3', 
                    'Street.3', 
                    'City/State/Zip.4', 
                    'Land Assmnt.3',
                    'Building Assmnt.3', 
                    'Exempt.3', 
                    'Total Assmnt.3', 
                    'Assessed.3', 
                    'Neigh', 
                    'VCS', 
                    'StyDesc', 
                    'Style',
                    'Unnamed: 90'], inplace=True) 

In [31]:
# a less tedious way to execute this might be drop ranges of columns instead of doing it all manually
city_of_js.drop(columns=['Municipality', 
                    'Block', 
                    'Lot', 
                    'Qual', 
                    'Property Class', 
                    'Sq. Ft.', 
                    'Building Class', 
                    'Prior Block', 
                    'Prior Lot', 
                    'Prior Qual', 
                    'Updated', 
                    'Zone', 
                    'Account', 
                    'Mortgage Account', 
                    'Bank Code', 
                    'Sp Tax Cd', 
                    'Sp Tax Cd.1', 
                    'Sp Tax Cd.2', 
                    'Sp Tax Cd.3', 
                    'Map Page',
                    'Additional Lots', 
                    'Land Desc', 
                    'Building Desc', 
                    'Class 4 Code', 
                    'Acreage', 
                    'EPL Own', 
                    'EPL Use', 
                    'EPL Desc', 
                    'EPL Statute', 
                    'EPL Init', 
                    'EPL Further', 
                    'EPL Facility Name', 
                    'Taxes 1', 
                    'Taxes 2', 
                    'Taxes 3', 
                    'Taxes 4', 
                    'Sale Date', 
                    'Deed Book', 
                    'Deed Page', 
                    'Sale Price', 
                    'NU Code', 
                    'Ratio', 
                    'Type/Use', 
                    'Year', 
                    'Owner', 
                    'Street', 
                    'City/State/Zip.1',
                    'Land Assmnt', 
                    'Building Assmnt', 
                    'Exempt', 
                    'Total Assmnt', 
                    'Assessed',
                    'Year.1', 
                    'Owner.1', 
                    'Street.1', 
                    'City/State/Zip.2', 
                    'Land Assmnt.1',
                    'Building Assmnt.1', 
                    'Exempt.1', 
                    'Total Assmnt.1', 
                    'Assessed.1',
                    'Year.2', 
                    'Owner.2', 
                    'Street.2', 
                    'City/State/Zip.3', 
                    'Land Assmnt.2',
                    'Building Assmnt.2', 
                    'Exempt.2', 
                    'Total Assmnt.2', 
                    'Assessed.2',
                    'Year.3', 
                    'Owner.3', 
                    'Street.3', 
                    'City/State/Zip.4', 
                    'Land Assmnt.3',
                    'Building Assmnt.3', 
                    'Exempt.3', 
                    'Total Assmnt.3', 
                    'Assessed.3', 
                    'Neigh', 
                    'VCS', 
                    'StyDesc', 
                    'Style',
                    'Unnamed: 90'], inplace=True) 

In [32]:
urban_renewal.drop(columns=['Municipality', 
                    'Block', 
                    'Lot', 
                    'Qual', 
                    'Property Class', 
                    'Sq. Ft.', 
                    'Building Class', 
                    'Prior Block', 
                    'Prior Lot', 
                    'Prior Qual', 
                    'Updated', 
                    'Zone', 
                    'Account', 
                    'Mortgage Account', 
                    'Bank Code', 
                    'Sp Tax Cd', 
                    'Sp Tax Cd.1', 
                    'Sp Tax Cd.2', 
                    'Sp Tax Cd.3', 
                    'Map Page',
                    'Additional Lots', 
                    'Land Desc', 
                    'Building Desc', 
                    'Class 4 Code', 
                    'Acreage', 
                    'EPL Own', 
                    'EPL Use', 
                    'EPL Desc', 
                    'EPL Statute', 
                    'EPL Init', 
                    'EPL Further', 
                    'EPL Facility Name', 
                    'Taxes 1', 
                    'Taxes 2', 
                    'Taxes 3', 
                    'Taxes 4', 
                    'Sale Date', 
                    'Deed Book', 
                    'Deed Page', 
                    'Sale Price', 
                    'NU Code', 
                    'Ratio', 
                    'Type/Use', 
                    'Year', 
                    'Owner', 
                    'Street', 
                    'City/State/Zip.1',
                    'Land Assmnt', 
                    'Building Assmnt', 
                    'Exempt', 
                    'Total Assmnt', 
                    'Assessed',
                    'Year.1', 
                    'Owner.1', 
                    'Street.1', 
                    'City/State/Zip.2', 
                    'Land Assmnt.1',
                    'Building Assmnt.1', 
                    'Exempt.1', 
                    'Total Assmnt.1', 
                    'Assessed.1',
                    'Year.2', 
                    'Owner.2', 
                    'Street.2', 
                    'City/State/Zip.3', 
                    'Land Assmnt.2',
                    'Building Assmnt.2', 
                    'Exempt.2', 
                    'Total Assmnt.2', 
                    'Assessed.2',
                    'Year.3', 
                    'Owner.3', 
                    'Street.3', 
                    'City/State/Zip.4', 
                    'Land Assmnt.3',
                    'Building Assmnt.3', 
                    'Exempt.3', 
                    'Total Assmnt.3', 
                    'Assessed.3', 
                    'Neigh', 
                    'VCS', 
                    'StyDesc', 
                    'Style',
                    'Unnamed: 90'], inplace=True) 
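
An alternative to listing every column to drop is to list the few columns to keep. A sketch on a toy dataframe with made-up values; `KEEP` is a hypothetical name and would hold the handful of columns that survive the drops above:

```python
import pandas as pd

# Columns to keep (an illustrative subset of the columns that remain after the drops)
KEEP = ["property_location", "owners_name", "Yr. Built"]

# Toy dataframe with placeholder values, standing in for js_ha / city_of_js / urban_renewal
toy = pd.DataFrame({"Municipality": [906],
                    "property_location": ["1 EXAMPLE ST."],
                    "owners_name": ["EXAMPLE OWNER"],
                    "Zone": ["X"],
                    "Yr. Built": [2000.0]})

# Select only the wanted columns, preserving their original order
trimmed = toy[[col for col in toy.columns if col in KEEP]]
print(list(trimmed.columns))
```

The same selection could be applied to each of the three subsets, which keeps the list short and makes it obvious which columns the map will actually use.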

In [33]:
city_of_js

Unnamed: 0,property_location,owners_name,owners_mailing_address,city_state_zip,Yr. Built,Latitude,Longitude,property_full_address,gcode
0,SECAUCUS RD.,CITY OF JERSEY CITY,280 GROVE ST,JERSEY CITY N J 07302,,40.759108,-74.049506,Secaucus Rd. Jersey City N J 07302,"(Secaucus Road, Jersey City, Hudson County, Ne..."
1,929 SECAUCUS RD,CITY OF JERSEY CITY,280 GROVE ST,JERSEY CITY N J 07302,,40.759108,-74.049506,929 Secaucus Rd Jersey City N J 07302,"(Secaucus Road, Jersey City, Hudson County, Ne..."
2,386 TERRACE AVE.,CITY OF JERSEY CITY,280 GROVE ST,JERSEY CITY N J 07302,,40.758846,-74.051694,386 Terrace Ave. Jersey City N J 07302,"(386, Terrace Avenue, Jersey City, Hudson Coun..."
3,191 NELSON AVE.,CITY OF JERSEY CITY,280 GROVE ST.,"JERSEY CITY, NJ 07302",,40.758234,-74.049186,"191 Nelson Ave. Jersey City, Nj 07302","(191, Nelson Avenue, Jersey City, Hudson Count..."
4,194 BLEECKER ST.,CITY OF JERSEY CITY,280 GROVE ST,"JERSEY CITY, N.J. 07307",,40.753281,-74.055391,"194 Bleecker St. Jersey City, N.J. 07307","(194, Bleecker Street, Jersey City, Hudson Cou..."
...,...,...,...,...,...,...,...,...,...
317,PRINCETON AVE,CITY OF JERSEY CITY,280 GROVE ST,JERSEY CITY N J 07302,,40.692029,-74.088734,Princeton Ave Jersey City N J 07302,"(Princeton Avenue, Jersey City, Hudson County,..."
319,15 E. LINDEN AVE.,CITY OF JERSEY CITY,280 GROVE STREET,"JERSEY CITY, NJ 07302",1959.0,40.692343,-74.089259,"15 E. Linden Ave. Jersey City, Nj 07302","(15, Linden Avenue, Greenville, Jersey City, H..."
320,35 LINDEN AVE. EAST,CITY OF JERSEY CITY,280 GROVE ST.,"JERSEY CITY, NJ 07302",,40.691042,-74.087948,"35 Linden Ave. East Jersey City, Nj 07302","(35, Linden Avenue East, Jersey City, Hudson C..."
321,9 EAST LINDEN AVE.,CITY OF JERSEY CITY,280 GROVE STREET,"JERSEY CITY, NJ 07302",1.0,40.692201,-74.089141,"9 East Linden Ave. Jersey City, Nj 07302","(9, Linden Avenue, Greenville, Jersey City, Hu..."


In [34]:
urban_renewal

Unnamed: 0,property_location,owners_name,owners_mailing_address,city_state_zip,Yr. Built,Latitude,Longitude,property_full_address,gcode
2,100 PATERSON PLANK RD.,"BRASS WORKS URBAN RENEWAL CO., LLC",300 COLES ST SUITE #2,"JERSEY CITY , NEW JERSEY 07310",2020.0,40.741164,-74.043599,"100 Paterson Plank Rd. Jersey City , New Jerse...","(The Cliffs, 100, Paterson Plank Road, Jersey ..."
4,255 BRUNSWICK ST.,"364 NINTH STREET URBAN RENEWAL, LLC",155 SECOND STREET,"JERSEY CITY, N J 07302",,40.729291,-74.050437,"255 Brunswick St. Jersey City, N J 07302","(True Dental Care for Kids & Teens, 255, Bruns..."
5,255 BRUNSWICK ST.,"364 NINTH STREET URBAN RENEWAL, LLC",155 SECOND STREET,"JERSEY CITY, N J 07302",,40.729291,-74.050437,"255 Brunswick St. Jersey City, N J 07302","(True Dental Care for Kids & Teens, 255, Bruns..."
6,372 NINTH ST.,"NINTH STREET TWO URBAN RENEWAL,LLC",155 SECOND STREET,"JERSEY CITY, N J 07302",2019.0,39.545981,-74.885141,"372 Ninth St. Jersey City, N J 07302","(Ninth Street, Buena Vista Township, Atlantic ..."
7,31 RIVER COURT,TOWER EAST URBAN RENEWAL COMPANY,40 WEST 57TH ST 23RD FL,"NEW YORK, NY 10019",1997.0,42.035914,-73.778315,"31 River Court New York, Ny 10019","(31, River Court, Town of Gallatin, Columbia C..."
...,...,...,...,...,...,...,...,...,...
439,455 OCEAN AVE.,GENESIS OCEAN URBAN RENEWAL ASSOC.,455 OCEAN AVE.,"JERSEY CITY , NEW JERSEY 07305",2018.0,40.702127,-74.082018,"455 Ocean Ave. Jersey City , New Jersey 07305","(455, Ocean Avenue, Greenville, Jersey City, H..."
452,143 CHAPEL AVE,"HUDSON MAIN URBAN RENEWAL,LLC",17 INDEPENDENCE WAY,"JERSEY CITY, NJ 07302",,40.694191,-74.084210,"143 Chapel Ave Jersey City, Nj 07302","(143, Chapel Avenue, Jersey City, Hudson Count..."
453,143 CHAPEL AVE,"HUDSON MAIN URBAN RENEWAL,LLC",17 INDEPENDENCE WAY,"JERSEY CITY, NJ 07302",2017.0,40.694191,-74.084210,"143 Chapel Ave Jersey City, Nj 07302","(143, Chapel Avenue, Jersey City, Hudson Count..."
454,100 SUMMIT PLACE,"SUMMIT/GREENWICH URBAN RENEWAL, LLC",100 SUMMIT PLACE,"JERSEY CITY, NJ 07305",2000.0,40.683216,-74.093701,"100 Summit Place Jersey City, Nj 07305","(100, Summit Place, Jersey City, Hudson County..."
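
Some results above clearly fall outside Jersey City (for example latitude 39.545981 and latitude 42.035914). A quick sanity check is to flag coordinates outside a bounding box. The bounds below are a rough assumption chosen by eye and should be tuned before being relied on; `in_jersey_city` and `JC_BOUNDS` are hypothetical names:

```python
# Assumption: a rough bounding box around Jersey City; not an official boundary.
JC_BOUNDS = {"lat_min": 40.66, "lat_max": 40.77, "lon_min": -74.12, "lon_max": -74.02}

def in_jersey_city(latitude, longitude, bounds=JC_BOUNDS):
    """True when the point lies inside the rough bounding box."""
    return (bounds["lat_min"] <= latitude <= bounds["lat_max"]
            and bounds["lon_min"] <= longitude <= bounds["lon_max"])

# Coordinates taken from the urban_renewal output above
points = [(40.741164, -74.043599),   # 100 Paterson Plank Rd.
          (39.545981, -74.885141),   # 372 Ninth St., geocoded to Atlantic County
          (42.035914, -73.778315),   # 31 River Court, geocoded to New York State
          (40.683216, -74.093701)]   # 100 Summit Place
checks = [in_jersey_city(lat, lon) for lat, lon in points]
print(checks)
```

Rows that fail the check could be set aside for a second geocoding attempt with a corrected address rather than plotted as they are.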


In [35]:
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
test_data = pd.concat([js_ha, city_of_js, urban_renewal], ignore_index=True)

In [42]:
test_data

Unnamed: 0,property_location,owners_name,owners_mailing_address,city_state_zip,Yr. Built,Latitude,Longitude,property_full_address,gcode
0,514 NEWARK AVE.,HOUSING AUTHORITY OF JERSEY CITY,514 NEWARK AVENUE,"JERSEY CITY, NJ 07306",,40.730082,-74.055283,"514 Newark Ave. Jersey City, Nj 07306","(514, Newark Avenue, Jersey City, Hudson Count..."
1,88 ERIE ST.,HOUSING AUTHORITY OF JERSEY CITY,400 U.S. HIGHWAY #1,"JERSEY CITY, NJ 07306",1870.0,40.724742,-74.043713,"88 Erie St. Jersey City, Nj 07306","(88, Erie Street, Newport, Jersey City, Hudson..."
2,57 DALES AVE.,HOUSING AUTHORITY OF JERSEY CITY,400 ROUTE 1,"JERSEY CITY, NJ 07306",1941.0,40.735895,-74.076336,"57 Dales Ave. Jersey City, Nj 07306","(57, Dales Avenue, Marion, Jersey City, Hudson..."
3,547 MONTGOMERY ST.,HOUSING AUTHORITY OF JERSEY CITY,400 U.S. HIGHWAY #1,"JERSEY CITY, NJ 07306",,40.722429,-74.060221,"547 Montgomery St. Jersey City, Nj 07306","(547, Montgomery Street, Bergen, Jersey City, ..."
4,20 FLORENCE ST.,HOUSING AUTHORITY OF JERSEY CITY,400 U.S. HIGHWAY #1,"JERSEY CITY, NJ 07306",,40.723183,-74.060375,"20 Florence St. Jersey City, Nj 07306","(Florence Street, Bergen, Jersey City, Hudson ..."
...,...,...,...,...,...,...,...,...,...
631,455 OCEAN AVE.,GENESIS OCEAN URBAN RENEWAL ASSOC.,455 OCEAN AVE.,"JERSEY CITY , NEW JERSEY 07305",2018.0,40.702127,-74.082018,"455 Ocean Ave. Jersey City , New Jersey 07305","(455, Ocean Avenue, Greenville, Jersey City, H..."
632,143 CHAPEL AVE,"HUDSON MAIN URBAN RENEWAL,LLC",17 INDEPENDENCE WAY,"JERSEY CITY, NJ 07302",,40.694191,-74.084210,"143 Chapel Ave Jersey City, Nj 07302","(143, Chapel Avenue, Jersey City, Hudson Count..."
633,143 CHAPEL AVE,"HUDSON MAIN URBAN RENEWAL,LLC",17 INDEPENDENCE WAY,"JERSEY CITY, NJ 07302",2017.0,40.694191,-74.084210,"143 Chapel Ave Jersey City, Nj 07302","(143, Chapel Avenue, Jersey City, Hudson Count..."
634,100 SUMMIT PLACE,"SUMMIT/GREENWICH URBAN RENEWAL, LLC",100 SUMMIT PLACE,"JERSEY CITY, NJ 07305",2000.0,40.683216,-74.093701,"100 Summit Place Jersey City, Nj 07305","(100, Summit Place, Jersey City, Hudson County..."


In [57]:
# this groupby counts the properties held by each owner
# the counts feed the associatedProperties feature, i.e. the number of
# properties associated with a selected owner, and will become the
# associatedProperties column

# cell 15 did something similar, but this version is more accurate
test_data.groupby('owners_name')['property_location'].count()

owners_name
140 BAY STREET URBAN RENEWAL, LLC        1
160 FIRST STREET URBAN RENEWAL, LLC    104
170 LAFAYETTE URBAN RENEWAL,LLC          2
272 GROVE STREET URBAN RENEWAL, LLC      1
280 FAIRMOUNT URBAN RENEWAL, LLC.        2
364 NINTH STREET URBAN RENEWAL, LLC      2
380 NEWARK REALTY URBAN RENEWAL LLC      5
400 CLAREMENT URBAN RENEWAL LLC          1
400 CLAREMONT URBAN RENEWAL LLC          1
456 GRAND REALTY URBAN RENEWAL, LLC      1
70 COLUMBUS URBAN RENEWAL, LLC           2
95 VAN DAM URBAN RENEWAL, L.L.C.       129
ADELE FASHION URBAN RENEWAL A.,LLC       1
ASH URBAN RENEWAL DEVELOPMENT,LLC        2
AUDOBON URBAN RENEWAL PROPERTIES,LP      1
AUDOBON URBAN RENEWAL% SIG MGNT,LLC      1
AUDOBON URBAN RENEWAL% SIG MNGT,LLC      1
AUDOBON URBAN RENEWAL%AMISTAD MGT.       1
AUDOBON URBAN RENEWAL%SIG MNGT, LLC      2
BERGEN MANOR URBAN RENEWAL LLC           3
BERGENVIEW URBAN RENEWAL, LLC.           1
BRASS WORKS URBAN RENEWAL CO., LLC       1
CITY OF JERSEY CITY                    252
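
One way to turn these per-owner counts into the associatedProperties column is `groupby(...).transform('count')`, which broadcasts each owner's count back onto every row. This is a minimal sketch on a toy frame: the column names come from the notebook, but the `sample` frame and its rows are made up for illustration.

```python
import pandas as pd

# Toy stand-in for test_data; the rows are illustrative, not real records.
sample = pd.DataFrame({
    "owners_name": [
        "CITY OF JERSEY CITY",
        "CITY OF JERSEY CITY",
        "HOUSING AUTHORITY OF JERSEY CITY",
    ],
    "property_location": ["SECAUCUS RD.", "386 TERRACE AVE.", "88 ERIE ST."],
})

# transform('count') returns a Series aligned with the original rows,
# so every row carries the number of properties its owner holds.
sample["associatedProperties"] = (
    sample.groupby("owners_name")["property_location"].transform("count")
)
```

Because the grouping key is the raw owner string, spelling variants such as the two "400 CLAREMENT" / "400 CLAREMONT" entries above are still counted as separate owners.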

In [58]:
jersey_city.groupby('owners_name')['property_location'].count()

owners_name
0LALEKAN, RODA                      1
1 CAVEN POINT ROAD ASSOC. L.L.C.    1
1 EDWARD HART ROAD LLC              1
1 EXCHANGE JC, LLC % RYAN, LLC      2
1-12 CATHERINE COURT JC LLC         1
                                   ..
ZYSKOWSKA, ANNA                     1
ZYZYCK, STACEY                      1
\OA 99 HUDSON,LLC                   1
`ENG, MENG RAN                      1
`WEKE, IKEENA & PRECIOUS            1
Name: property_location, Length: 49247, dtype: int64

In [61]:
test_data.to_csv('test_data.csv', index=False)

In [24]:
# export this dataframe to a csv, which will then be converted to GeoJSON
js_ha.to_csv('jersey_city_HA.csv', index=False)
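
The CSV-to-GeoJSON step could also be done directly from the dataframe. The sketch below is one possible approach, not the conversion the project actually uses: `frame_to_geojson` and `_plain` are hypothetical helpers, and it assumes the `Latitude` and `Longitude` columns shown in the outputs above, with the remaining columns holding plain strings or numbers.

```python
import json
import pandas as pd

def _plain(value):
    """Convert pandas/NumPy scalars to JSON-friendly Python values."""
    if pd.isna(value):
        return None
    return value.item() if hasattr(value, "item") else value

def frame_to_geojson(df, lat_col="Latitude", lon_col="Longitude"):
    """Build a GeoJSON FeatureCollection dict from a DataFrame of points."""
    features = []
    # Skip rows that were never geocoded (missing coordinates).
    for _, row in df.dropna(subset=[lat_col, lon_col]).iterrows():
        properties = {
            key: _plain(value)
            for key, value in row.drop([lat_col, lon_col]).items()
        }
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {
                "type": "Point",
                "coordinates": [float(row[lon_col]), float(row[lat_col])],
            },
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}
```

The returned dict can be written out with `json.dump`. Rows without coordinates are dropped rather than exported with a null geometry, which is a design choice that may or may not suit the map layer.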

In [27]:
[geolocator.geocode(address) for address in js_ha.property_full_address.unique()]

[None,
 Location(514, Newark Avenue, Jersey City, Hudson County, New Jersey, 07306, United States, (40.730081875, -74.05528274999999, 0.0)),
 Location(88, Erie Street, Newport, Jersey City, Hudson County, New Jersey, 07302, United States, (40.724742, -74.043713, 0.0)),
 Location(57, Dales Avenue, Marion, Jersey City, Hudson County, New Jersey, 07306, United States, (40.73589526923077, -74.07633576923077, 0.0)),
 Location(547, Montgomery Street, Bergen, Jersey City, Hudson County, New Jersey, 07302, United States, (40.7224293, -74.0602214, 0.0)),
 Location(Florence Street, Bergen, Jersey City, Hudson County, New Jersey, 07302:07306, United States, (40.7231827, -74.0603751, 0.0)),
 Location(194, Cornelison Avenue, Jersey City, Hudson County, New Jersey, 07304, United States, (40.72162223593433, -74.06287254958836, 0.0)),
 Location(198, Cornelison Avenue, Jersey City, Hudson County, New Jersey, 07304, United States, (40.721711, -74.062783, 0.0)),
 Location(200, Cornelison Avenue, Bergen, 

### A list comprehension returns the raw geocoder result for each address
### Wrapping it in a second list comprehension with enumerate returns the indices of the unclean addresses, i.e. those the geocoder could not resolve
#### The example below nests one list comprehension inside another

In [22]:
[i for i, e in enumerate([geolocator.geocode(address) for address in js_ha.property_full_address]) if e is None]

[0, 4, 14, 15, 23, 24, 25, 26, 29, 30, 35, 36, 37, 39, 45, 54]
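
The same lookup can be split into a small helper that keeps both the results and the failed indices, so each address is geocoded only once. This is a sketch under assumptions: `find_failed_indices` is a hypothetical name, and the geocoder is passed in as a callable so the logic can be tried without network access.

```python
def find_failed_indices(addresses, geocode):
    """Geocode each address once.

    Returns a tuple (results, failed), where failed lists the positions
    whose lookup returned None.
    """
    results = [geocode(address) for address in addresses]
    failed = [i for i, result in enumerate(results) if result is None]
    return results, failed
```

In the notebook this could be called as `find_failed_indices(js_ha.property_full_address, RateLimiter(geolocator.geocode, min_delay_seconds=1))`. `RateLimiter` is already imported at the top; the one-second delay is an assumption based on Nominatim's public usage policy.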

### We can tell an address is unclean when rewriting part of it makes the geocoder succeed
### For example, writing "16th" in place of "Sixteenth" returns a match:

In [162]:
geolocator.geocode('235 16th St. Jersey City, NJ 07307')

Location(235, 16th Street, Jersey City, Hudson County, New Jersey, 07310, United States, (40.73314597560976, -74.04304924390243, 0.0))
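
A substitution like this could be applied to every address before geocoding. The sketch below assumes the failing addresses spell out the ordinal; `ORDINALS` and `normalize_ordinals` are hypothetical names, and the lookup table only covers the ordinals listed here.

```python
import re

# Hypothetical lookup; extend it with whichever spelled-out street
# names actually appear in the data.
ORDINALS = {
    "first": "1st", "second": "2nd", "third": "3rd", "fourth": "4th",
    "fifth": "5th", "sixth": "6th", "seventh": "7th", "eighth": "8th",
    "ninth": "9th", "tenth": "10th", "eleventh": "11th", "twelfth": "12th",
    "thirteenth": "13th", "fourteenth": "14th", "fifteenth": "15th",
    "sixteenth": "16th",
}

# Match whole words only, ignoring case, so "Sixteenth" and "SIXTEENTH" both hit.
_ORDINAL_PATTERN = re.compile(
    r"\b(" + "|".join(ORDINALS) + r")\b", flags=re.IGNORECASE
)

def normalize_ordinals(address):
    """Replace spelled-out ordinals in an address with their numeric form."""
    return _ORDINAL_PATTERN.sub(lambda m: ORDINALS[m.group(0).lower()], address)
```

Addresses without a spelled-out ordinal pass through unchanged, so the function can be mapped over the whole `property_full_address` column.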

In [168]:
js_ha.property_full_address[23]

'9-31 Wilmot Ave. Jersey City, NJ 07307'

In [123]:
geolocator.geocode('61 Merritt St. Jersey City, NJ 07307').point

Point(40.6888175, -74.0973369, 0.0)

In [38]:
jersey_city[jersey_city.owners_name == 'CITY OF JERSEY CITY']['property_location']

5               SECAUCUS RD.
33           929 SECAUCUS RD
56          386 TERRACE AVE.
98           191 NELSON AVE.
1438        194 BLEECKER ST.
                ...         
63760    35 LINDEN AVE. EAST
63761     9 EAST LINDEN AVE.
63762           CENTRAL R.R.
63763      BROWN PL TO GATES
63765     13 LINDEN AVE.EAST
Name: property_location, Length: 325, dtype: object

In [14]:
hackensack.owners_name.value_counts().head(50)

CITY OF HACKENSACK                     69
WORLD PLAZA AC, LLC                    29
ESSEX COURT REALTY CORP.               28
HACKENSACK DEVELOPERS, LLC             25
TERRACE SQUARE CONDOS % J LOMBARDO     21
N Y S & W R R C/O N. STECKLER          19
HACK & N.Y. R.R. CO C/O LAND & TAX     19
SKYVIEW AT HACKENSACK, LLC             19
COUNTY OF BERGEN                       16
WORLD PLAZA PROPERTIES, LLC            13
ATTESSA PROPERTIES LLC                 13
HEKEMIAN,SAMUEL %THE S HEKEMIAN GRP    12
BD OF ED CITY OF HACK                  12
ELYASH, ADELAIDA                       10
FAIRLEIGH DICKINSON UNIVERSITY          9
400 E MAIN STREET, LLC                  9
NEW HOPE BAPTIST CHURCH                 8
RUDDOCK, WAYNE & ROSA                   7
DI CAROLIS REALTY CO.                   7
JACKSON PINK LLC                        7
NESS REALTY 1 LLC                       7
MAIN PORTFOLIO LLC                      6
US BANK TRUST NA TRSTE                  6
CYPRUS HOLDING COMPANY LLC        