The following is our initial draft. We plan to add more maps and analysis and to improve our ML models, and we welcome any comments or suggestions.

# Part 1: Obtaining and Organizing Data

Using Socrata, we imported two datasets through the New York City Open Data Portal APIs: a dataset of property valuations in New York City, and a dataset listing the reassessment actions. An app token was used so that we could adjust the request limits for the valuation results and test our code on smaller or larger datasets.

In [1]:
import pandas as pd
import geopandas as gpd
from sodapy import Socrata
import matplotlib.pyplot as plt
import contextily as ctx

In [2]:
client = Socrata("data.cityofnewyork.us", '9llM0ejMVTKfRxS1XlvL7gXjU')

#first one is the property valuation and assessment dataset
vresults = client.get("yjxr-fw8i", content_type='geojson', year = '2017/18', limit=10000000)

#second one is the assessment actions dataset
aresults = client.get("4nft-bihw", content_type='json', limit=100000) 

In [3]:
value_gdf = gpd.GeoDataFrame.from_features(vresults, crs='EPSG:4326')
actions_df = pd.DataFrame.from_records(aresults)

We dropped properties in the reassessment dataset that had no granted reduction (no actual reassessment), and dropped properties from the valuation dataset that had no assessed value.

In [4]:
len(actions_df)

12321

In [5]:
actions_df['granted_reduction_amount']=pd.to_numeric(actions_df['granted_reduction_amount'])
actions_df=actions_df[actions_df['granted_reduction_amount']>0]
len(actions_df)

11773

In [6]:
len(value_gdf)

1110058

In [7]:
value_gdf['avtot']=pd.to_numeric(value_gdf['avtot'])
value_gdf=value_gdf[value_gdf['avtot']>0]
len(value_gdf)

1094727
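As a minimal sketch of the filtering step above, on made-up values rather than the real datasets: `pd.to_numeric` raises on unparseable strings by default, so passing `errors='coerce'` is one way to turn them into NaN, which the `> 0` filter then drops together with zeros. The `toy` frame and its contents are assumptions for illustration only.

```python
import pandas as pd

# Toy stand-in for the 'avtot' column as it arrives from the API (strings).
toy = pd.DataFrame({'avtot': ['1500', '0', 'n/a', '320000']})

# errors='coerce' converts unparseable strings to NaN instead of raising.
toy['avtot'] = pd.to_numeric(toy['avtot'], errors='coerce')

# NaN > 0 evaluates to False, so missing and zero values are dropped together.
toy_clean = toy[toy['avtot'] > 0]
print(len(toy), '->', len(toy_clean))
```

Printing the length before and after, as the cells above do, makes it easy to see how many rows each filter removes.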

To combine the valuation and reassessment-action datasets, we created a new column, `BBB`, that concatenates the borough, block, and lot numbers (kept as a string).

In [8]:
value_gdf['BBB'] = value_gdf['boro'] + '-' + value_gdf['block'] + '-' + value_gdf['lot']
value_gdf['BBB']

actions_df['BBB'] = actions_df['borough_code'] + '-' + actions_df['block_number'] + '-' + actions_df['lot_number']
actions_df['BBB']

actions_df.set_index('BBB', inplace=True)
value_gdf.set_index('BBB', inplace=True)
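The `bble` column in the valuation data appears to follow a fixed-width layout: one borough digit, a five-digit zero-padded block, and a four-digit zero-padded lot (for example `1017220063` for borough 1, block 1722, lot 63). The sketch below builds a key in that layout as an alternative to the hyphenated string. The helper name `make_bbl` and the toy frame are hypothetical, not part of our notebook.

```python
import pandas as pd

def make_bbl(boro, block, lot):
    """Build a 10-character key: 1 borough digit + 5 block digits + 4 lot digits."""
    return (boro.astype(str)
            + block.astype(str).str.zfill(5)
            + lot.astype(str).str.zfill(4))

# Toy rows using two borough/block/lot triples that appear in the output tables.
toy = pd.DataFrame({'boro': ['1', '2'],
                    'block': ['1722', '3298'],
                    'lot': ['63', '16']})
toy['bbl'] = make_bbl(toy['boro'], toy['block'], toy['lot'])
print(toy['bbl'].tolist())
```

A fixed-width key sorts consistently and can be compared directly against `bble` once any trailing easement letter is set aside.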

We then conducted an inner join, so that each property reassessment is linked with its valuation.

In [9]:
inner_joined_gdf = value_gdf.join(actions_df, how='inner', rsuffix=('_actions'))
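Because a key can occur more than once in either table (the actions data covers more than one tax year), it can help to check how keys match before relying on the joined row count. The following is a sketch on made-up keys and amounts, using `DataFrame.merge` with `indicator=True`; the frames `values` and `actions` here are toy stand-ins, not our real data.

```python
import pandas as pd

values = pd.DataFrame({'BBB': ['0-0-1', '0-0-2', '0-0-3'],
                       'avtot': [1000, 2000, 3000]})
actions = pd.DataFrame({'BBB': ['0-0-1', '0-0-1', '9-9-9'],
                        'tax_year': ['2017', '2018', '2018'],
                        'granted_reduction_amount': [100.0, 200.0, 50.0]})

# An outer merge with indicator=True labels each row by where its key was found.
check = values.merge(actions, on='BBB', how='outer', indicator=True)
print(check['_merge'].value_counts())

# Keys that appear more than once on the actions side multiply rows in the join.
dupes = int(actions['BBB'].duplicated(keep=False).sum())
print('action rows sharing a key:', dupes)
```

Rows labeled `right_only` are actions with no matching valuation, which is what an inner join silently drops.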

We then created a new column, `reduction_scaled`, that expresses the granted reduction as a fraction of the property's total assessed value (`avtot`).

In [10]:
#making them numeric first
inner_joined_gdf['granted_reduction_amount']=pd.to_numeric(inner_joined_gdf['granted_reduction_amount'])
inner_joined_gdf['avtot']=pd.to_numeric(inner_joined_gdf['avtot'])

#making the new column
inner_joined_gdf['reduction_scaled']=inner_joined_gdf['granted_reduction_amount']/inner_joined_gdf['avtot']

print(len(inner_joined_gdf))
inner_joined_gdf.head()

11722


BBB,geometry,nta,avland,latitude,zip,stories,avtot,easement,valtype,exland,...,ltdepth,borough_code,block_number,lot_number,tax_year,owner_name,property_address,granted_reduction_amount,tax_class_code,reduction_scaled
1-10-14,POINT (-74.01304 40.70331),Battery Park City-Lower Manhattan,7875000,40.703312,10004,30,38897100,,AC-TR,0,...,161,1,10,14,2018,BROAD FINANCIAL CENTE,33 WHITEHALL STREET,1755500.0,4,0.045132
1-10-15,POINT (-74.01309 40.70352),Battery Park City-Lower Manhattan,406800,40.703517,10004,7,1720350,,AC-TR,0,...,58,1,10,15,2018,MSA TWINS LTD,27 WHITEHALL STREET,92350.0,4,0.053681
1-10-33,POINT (-74.01264 40.70403),Battery Park City-Lower Manhattan,3690450,40.704025,10004,43,35656200,,AC-TR,0,...,125,1,10,33,2018,AL STONE GROUND TENAN,8 STONE STREET,2158050.0,4,0.060524
1-100-1001,POINT (-74.00603 40.71154),Battery Park City-Lower Manhattan,65790,40.711541,10038,23,1183950,,AC-TR,0,...,0,1,100,1001,2018,THE BRAUSER GROUP #1,150 NASSAU STREET,247400.0,4,0.208962
1-100-1201,POINT (-74.00540 40.71124),Battery Park City-Lower Manhattan,5841450,40.711245,10038,76,140023350,,AC-TR,2759170,...,0,1,100,1201,2018,FC 8 SPRUCE STREET RE,8 SPRUCE STREET,3949350.0,2,0.028205


In [11]:
print(len(inner_joined_gdf[inner_joined_gdf['reduction_scaled']==float('inf')]))

0


In [12]:
print(len(inner_joined_gdf[inner_joined_gdf['reduction_scaled']==0]))

0


In [13]:
print(len(inner_joined_gdf[inner_joined_gdf['reduction_scaled']>1]))

58


In [14]:
import numpy as np
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
inner_joined_gdf[inner_joined_gdf['reduction_scaled']>1]

BBB,geometry,nta,avland,latitude,zip,stories,avtot,easement,valtype,exland,blddepth,year,taxclass,longitude,bldfront,bldgcl,block,avtot2,excd1,bble,staddr,exmptcl,avland2,census_tract,lot,boro,ltfront,fullval,ext,bin,excd2,owner,extot,extot2,exland2,community_board,borough,period,council_district,ltdepth,borough_code,block_number,lot_number,tax_year,owner_name,property_address,granted_reduction_amount,tax_class_code,reduction_scaled
1-1722-63,POINT (-73.94446 40.80733),Central Harlem South,104400,40.80733,10027.0,1.0,743850,,AC-TR,0,100,2017/18,4,-73.944458,51,K1,1722,563400.0,,1017220063,64 WEST 125 STREET,,104400.0,200.0,63,1,51,1653000,,1053482.0,,64 WEST LLC,0,,,110.0,MANHATTAN,FINAL,9.0,100,1,1722,63,2018,"64 WEST, LLC",64 WEST 125 STREET,757750.0,4,1.018687
1-203-23,POINT (-73.99569 40.71687),SoHo-TriBeCa-Civic Center-Little Italy,203400,40.716871,10013.0,8.0,788075,,AC-TR,0,100,2017/18,4,-73.995686,25,O6,203,719303.0,,1002030023,78 BOWERY,,203400.0,41.0,23,1,25,1751277,,1002610.0,,BOWERY TOWER LLC,0,,,102.0,MANHATTAN,FINAL,1.0,100,1,203,23,2018,BOWERY TOWER LLC,78 BOWERY,919950.0,4,1.167338
1-276-1103,POINT (-73.99591 40.71144),Chinatown,6103,40.711442,10002.0,6.0,73806,,AC-TR,0,65,2017/18,2,-73.995913,25,R4,276,64220.0,,1002761103,11 MONROE STREET,,6103.0,8.0,1103,1,25,164014,,1087344.0,,THE EXCEL CONDOMINIUM,0,,,103.0,MANHATTAN,FINAL,1.0,101,1,276,1103,2018,"11 MONROE REALTY, INC",11 MONROE STREET,264750.0,2,3.587107
1-44-1,,,4158000,,,,4158000,E,AC-TR,4158000,0,2017/18,4,,0,Z7,44,3944970.0,2262.0,1000440001E,NASSAU STREET,X1,3944970.0,,1,1,220,9240000,,,,NYC DEPT OF HIGHWAYS,4158000,3944970.0,3944970.0,,,FINAL,,409,1,44,1,2017,SUMMIT GLORY PROPERTY,28 LIBERTY STREET,8623850.0,4,2.074038
1-44-1,,,4158000,,,,4158000,E,AC-TR,4158000,0,2017/18,4,,0,Z7,44,3944970.0,2262.0,1000440001E,NASSAU STREET,X1,3944970.0,,1,1,220,9240000,,,,NYC DEPT OF HIGHWAYS,4158000,3944970.0,3944970.0,,,FINAL,,409,1,44,1,2018,SUMMIT GLORY PROPERTY,28 LIBERTY STREET,15005800.0,4,3.608899
1-595-1201,POINT (-74.01018 40.72437),SoHo-TriBeCa-Civic Center-Little Italy,135000,40.724367,10013.0,7.0,636750,,AC-TR,0,0,2017/18,4,-74.010178,0,RK,595,597672.0,,1005951201,459 WASHINGTON STREET,,135000.0,39.0,1201,1,0,1415000,,1010328.0,,"HDK HOLDING, LLC",0,,,101.0,MANHATTAN,FINAL,1.0,0,1,595,1201,2018,HDK HOLDING LLC,459 WASHINGTON STREET,908550.0,4,1.426855
1-788-71,POINT (-73.98966 40.75468),Midtown-Midtown South,284400,40.75468,10018.0,14.0,1099350,,AC-TR,0,99,2017/18,4,-73.989659,38,L1,788,1046880.0,,1007880071,244 WEST 39 STREET,,284400.0,113.0,71,1,37,2443000,,1014488.0,,"244 W. 39 ST REALTY,I",0,,,105.0,MANHATTAN,FINAL,3.0,98,1,788,71,2018,244 WEST 39TH STREET,244 WEST 39 STREET,1131100.0,4,1.028881
2-2597-1,,,4500,,,,4500,F,AC-TR,4500,0,2017/18,4,,0,Z7,2597,,2172.0,2025970001F,ROSE FEISS BOULEVARD,,,,1,2,5,10000,,,,DEPT OF WATER RESOURC,4500,,,,,FINAL,,225,2,2597,1,2018,SPRAGUE OPERATING RES,939 EAST 138 STREET,2188900.0,4,486.422222
2-2597-1,,,3150,,,,3600,E,AC-TR,3150,0,2017/18,4,,0,Z7,2597,,3400.0,2025970001E,EAST 138 STREET,X1,,,1,2,50,8000,,,,CITY OF NEW YORK,3600,,,,,FINAL,,128,2,2597,1,2018,SPRAGUE OPERATING RES,939 EAST 138 STREET,2188900.0,4,608.027778
2-3298-16,POINT (-73.88556 40.87124),Bedford Park-Fordham North,173250,40.871236,10458.0,6.0,829800,,AC-TR,0,83,2017/18,2,-73.885558,100,D1,3298,726300.0,,2032980016,2966 BRIGGS AVENUE,,76050.0,415.0,16,2,100,1844000,,2016953.0,,NYSANDY3 NBP7 LLC,0,,,207.0,BRONX,FINAL,11.0,110,2,3298,16,2018,NYSANDY3 NBP7 LLC,2966 BRIGGS AVENUE,1106050.0,2,1.332912
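In the rows above, the largest ratios belong to keys that occur more than once in the joined table (for example `2-2597-1`, whose two `bble` values carry different easement suffixes). The sketch below flags both conditions so such rows can be reviewed separately. The first three rows copy `avtot` and `granted_reduction_amount` from the output above; the last row is made up.

```python
import pandas as pd

toy = pd.DataFrame(
    {'avtot': [4500, 3600, 829800, 1000000],
     'granted_reduction_amount': [2188900.0, 2188900.0, 1106050.0, 50000.0]},
    index=pd.Index(['2-2597-1', '2-2597-1', '2-3298-16', '0-0-1'], name='BBB'))

toy['reduction_scaled'] = toy['granted_reduction_amount'] / toy['avtot']

# Flag ratios above 1 and keys that occur more than once in the joined table.
toy['ratio_over_1'] = toy['reduction_scaled'] > 1
toy['duplicate_key'] = toy.index.duplicated(keep=False)
print(toy[['reduction_scaled', 'ratio_over_1', 'duplicate_key']])
```

Rows with a ratio above 1 but a unique key (such as `2-3298-16`) would need a different explanation than the duplicated easement lots.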


In [15]:
inner_joined_gdf[inner_joined_gdf['exmptcl']=='X1']

BBB,geometry,nta,avland,latitude,zip,stories,avtot,easement,valtype,exland,blddepth,year,taxclass,longitude,bldfront,bldgcl,block,avtot2,excd1,bble,staddr,exmptcl,avland2,census_tract,lot,boro,ltfront,fullval,ext,bin,excd2,owner,extot,extot2,exland2,community_board,borough,period,council_district,ltdepth,borough_code,block_number,lot_number,tax_year,owner_name,property_address,granted_reduction_amount,tax_class_code,reduction_scaled
1-1084-1,POINT (-73.99203 40.76943),Clinton,3614850,40.76943,10019.0,6.0,20610450,,AC-TR,0,200,2017/18,4,-73.992032,201,G8,1084,18393226.0,,1010840001,802 11 AVENUE,X1,3614850.0,135.0,1,1,200,45801000,,1080969.0,,"VW CREDIT, INC.",0,,,104.0,MANHATTAN,FINAL,6.0,200,1,1084,1,2018,"VW CREDIT, INC.",802 11 AVENUE,1228750.0,4,0.059618
1-1118-52,POINT (-73.98028 40.77305),Lincoln Square,2250450,40.773055,10023.0,3.5,5940900,,AC-TR,0,88,2017/18,4,-73.98028,174,O2,1118,5308760.0,,1011180052,56 WEST 66 STREET,X1,2250180.0,149.0,52,1,174,13202000,,1028172.0,,AMERICAN BROADCTG COI,0,,,107.0,MANHATTAN,FINAL,6.0,100,1,1118,52,2018,AMERICAN BROADCASTING,56 WEST 66 STREET,324750.0,4,0.054663
1-1214-10,POINT (-73.97516 40.78520),Upper West Side,720000,40.785202,10024.0,5.0,5600250,,AC-TR,0,92,2017/18,4,-73.975156,59,G1,1214,4155660.0,,1012140010,157 WEST 83 STREET,X1,720000.0,169.0,10,1,59,12445000,,1032112.0,,BILLIG REALTY CO,0,,,107.0,MANHATTAN,FINAL,6.0,102,1,1214,10,2017,KINNEY WEST 83RD ST.,157 WEST 83 STREET,1700250.0,4,0.303603
1-1214-10,POINT (-73.97516 40.78520),Upper West Side,720000,40.785202,10024.0,5.0,5600250,,AC-TR,0,92,2017/18,4,-73.975156,59,G1,1214,4155660.0,,1012140010,157 WEST 83 STREET,X1,720000.0,169.0,10,1,59,12445000,,1032112.0,,BILLIG REALTY CO,0,,,107.0,MANHATTAN,FINAL,6.0,102,1,1214,10,2018,KINNEY WEST 83RD ST.,157 WEST 83 STREET,1718700.0,4,0.306897
1-1220-1,POINT (-73.97353 40.78971),Upper West Side,5850000,40.789706,10024.0,18.0,32893200,,AC-TR,0,134,2017/18,2,-73.973529,200,D6,1220,28539130.0,5116.0,1012200001,601 AMSTERDAM AVENUE,X1,5850000.0,173.0,1,1,201,73096000,,1085485.0,,"LPF SAGAMORE, INC.",12407956,10666328.0,,107.0,MANHATTAN,FINAL,6.0,200,1,1220,1,2018,"LPF SAGAMORE, INC.",601 AMSTERDAM AVENUE,4682950.0,2,0.142368
1-1226-29,POINT (-73.96795 40.79229),Upper West Side,7740000,40.79229,10025.0,16.0,29176650,,AC-TR,0,150,2017/18,2,-73.967952,194,D6,1226,22512510.0,,1012260029,720 COLUMBUS AVENUE,X1,7740000.0,181.0,29,1,201,64837000,,1082743.0,,ASN WESTMONT LLC,0,,,107.0,MANHATTAN,FINAL,6.0,150,1,1226,29,2018,ASN WESTMONT LLC,720 COLUMBUS AVENUE,1071500.0,2,0.036725
1-1746-21,POINT (-73.94336 40.80201),East Harlem North,675000,40.802013,10035.0,9.0,8726400,,AC-TR,649884,0,2017/18,2,-73.943361,0,D4,1746,7085760.0,5114.0,1017460021,1831 MADISON AVENUE,X1,675000.0,198.0,21,1,100,19392000,,1086498.0,,NYC HOUSING PARTNERSH,8701284,7060644.0,649884.0,111.0,MANHATTAN,FINAL,9.0,175,1,1746,21,2018,MADISON PARK APARTMEN,1831 MADISON AVENUE,348450.0,2,0.039931
1-1775-28,POINT (-73.93638 40.80483),East Harlem North,270000,40.804826,10035.0,5.0,1177200,,AC-TR,0,89,2017/18,4,-73.936383,46,E7,1775,1084450.0,,1017750028,155 EAST 126 STREET,X1,255330.0,242.0,28,1,125,2616000,E,1081564.0,,RESNICK 126TH STREET,0,,,111.0,MANHATTAN,FINAL,9.0,99,1,1775,28,2018,RESNICK 126TH STREET,155 EAST 126 STREET,222050.0,4,0.188626
1-181-20,POINT (-74.00915 40.71933),SoHo-TriBeCa-Civic Center-Little Italy,161100,40.719333,10013.0,4.0,742050,,AC-TR,0,55,2017/18,4,-74.009149,24,K4,181,604351.0,,1001810020,173 FRANKLIN STREET,X1,161100.0,39.0,20,1,24,1649000,E,1002078.0,,JACK WEISBERG,0,,,101.0,MANHATTAN,FINAL,1.0,88,1,181,20,2018,JACK WEISBERG,173 FRANKLIN STREET,94600.0,4,0.127485
1-1846-3,POINT (-73.95945 40.80142),Central Harlem South,180000,40.801416,10026.0,6.0,1390950,,AC-TR,0,94,2017/18,2,-73.959449,40,C4,1846,1134270.0,,1018460003,246 MANHATTAN AVENUE,X1,180000.0,19702.0,3,1,40,3091000,,1055751.0,,"246 EQUITIES, LLC",0,,,110.0,MANHATTAN,FINAL,9.0,110,1,1846,3,2017,"246 EQUITIES, L.L.C.",246 MANHATTAN AVENUE,65950.0,2,0.047414


In [16]:
inner_joined_gdf[inner_joined_gdf['granted_reduction_amount']>1000000]

BBB,geometry,nta,avland,latitude,zip,stories,avtot,easement,valtype,exland,blddepth,year,taxclass,longitude,bldfront,bldgcl,block,avtot2,excd1,bble,staddr,exmptcl,avland2,census_tract,lot,boro,ltfront,fullval,ext,bin,excd2,owner,extot,extot2,exland2,community_board,borough,period,council_district,ltdepth,borough_code,block_number,lot_number,tax_year,owner_name,property_address,granted_reduction_amount,tax_class_code,reduction_scaled
1-10-14,POINT (-74.01304 40.70331),Battery Park City-Lower Manhattan,7875000,40.703312,10004.0,30.0,38897100,,AC-TR,0,160,2017/18,4,-74.013038,58,O4,10,37711690.0,,1000100014,33 WHITEHALL STREET,,7875000.0,9.0,14,1,82,86438000,,1000023.0,,BROAD FINANCIAL CENTE,0,,,101.0,MANHATTAN,FINAL,1.0,161,1,10,14,2018,BROAD FINANCIAL CENTE,33 WHITEHALL STREET,1755500.0,4,0.045132
1-10-33,POINT (-74.01264 40.70403),Battery Park City-Lower Manhattan,3690450,40.704025,10004.0,43.0,35656200,,AC-TR,0,125,2017/18,4,-74.012638,71,H2,10,31040240.0,,1000100033,8 STONE STREET,,3690450.0,9.0,33,1,71,79236000,,1087618.0,,B.H. 8 STONE STREET C,0,,,101.0,MANHATTAN,FINAL,1.0,125,1,10,33,2018,AL STONE GROUND TENAN,8 STONE STREET,2158050.0,4,0.060524
1-100-1201,POINT (-74.00540 40.71124),Battery Park City-Lower Manhattan,5841450,40.711245,10038.0,76.0,140023350,,AC-TR,2759170,0,2017/18,2,-74.005396,0,RR,100,111857319.0,5116.0,1001001201,8 SPRUCE STREET,,5841450.0,1501.0,1201,1,0,311163000,,1087485.0,,FC 8 SPRUCE STREET RE,136941070,108775039.0,2759170.0,101.0,MANHATTAN,FINAL,1.0,0,1,100,1201,2018,FC 8 SPRUCE STREET RE,8 SPRUCE STREET,3949350.0,2,0.028205
1-1000-29,POINT (-73.98154 40.75839),Midtown-Midtown South,134548200,40.758393,10036.0,45.0,398219850,,AC-TR,0,421,2017/18,4,-73.981537,188,O4,1000,356064118.0,1985.0,1010000029,1211 AVENUE OF THE AMER,,134549280.0,125.0,29,1,200,884933000,G,1022678.0,,1211 6TH AVENUE SYNDI,25672320,25672320.0,,105.0,MANHATTAN,FINAL,4.0,440,1,1000,29,2018,1211 6TH AVENUE PROPE,1211 AVENUE OF THE AM,11932000.0,4,0.029963
1-1000-62,POINT (-73.98426 40.75969),Midtown-Midtown South,1575000,40.759694,10036.0,4.0,8193150,,AC-TR,0,69,2017/18,4,-73.984265,25,K4,1000,6393990.0,,1010000062,717 7 AVENUE,,1575000.0,125.0,62,1,25,18207000,E,1022686.0,,"MOORE, DOLORITA F/B/O",0,,,105.0,MANHATTAN,FINAL,4.0,79,1,1000,62,2018,STRATFORD WALLACE & W,717 7 AVENUE,3246700.0,4,0.39627
1-1000-7,POINT (-73.98334 40.75868),Midtown-Midtown South,4009500,40.758678,10036.0,10.0,16756650,,AC-TR,0,100,2017/18,4,-73.983342,80,HB,1000,16293620.0,,1010000007,157 WEST 47 STREET,,4009500.0,125.0,7,1,80,37237000,,1022676.0,,157 WEST 47TH STREET,0,,,105.0,MANHATTAN,FINAL,4.0,100,1,1000,7,2018,157 WEST 47TH STREET,157 WEST 47 STREET,2014900.0,4,0.120245
1-1001-1002,POINT (-73.98397 40.76010),Midtown-Midtown South,3870000,40.760103,10019.0,,19540800,,AC-TR,0,0,2017/18,4,-73.983973,0,RB,1001,17658990.0,,1010011002,729 7 AVENUE,,3870000.0,125.0,1002,1,100,43424000,,1022699.0,,729 ACQUISITION LLC C,0,,,105.0,MANHATTAN,FINAL,4.0,100,1,1001,1002,2018,729 ACQUISITION LLC,729 7 AVENUE,1122350.0,4,0.057436
1-1001-29,POINT (-73.98214 40.75910),Midtown-Midtown South,197042400,40.759101,10036.0,51.0,475000000,,AC-TR,0,494,2017/18,4,-73.982136,192,O4,1001,444619419.0,,1010010029,1221 AVENUE OF THE AMER,,197616960.0,125.0,29,1,200,1055555555,,1022693.0,,1221 AVENUE HOLDINGS,0,,,105.0,MANHATTAN,FINAL,4.0,525,1,1001,29,2018,1221 AVENUE HOLDINGS,1221 AVENUE OF THE AM,6731250.0,4,0.014171
1-1005-1,POINT (-73.98210 40.76267),Midtown-Midtown South,77400000,40.762674,10019.0,49.0,198138600,,AC-TR,0,296,2017/18,4,-73.982095,198,H1,1005,188391963.0,,1010050001,811 7 AVENUE,,77400000.0,131.0,1,1,200,440308000,,1023159.0,,"HOST MARRIOTT, L.P.",0,,,105.0,MANHATTAN,FINAL,4.0,305,1,1005,1,2018,SNYT LLC,811 7 AVENUE,11345100.0,4,0.057258
1-1006-13,POINT (-73.98105 40.76262),Midtown-Midtown South,57600000,40.762625,10019.0,35.0,134538300,,AC-TR,0,201,2017/18,4,-73.981052,162,O4,1006,133416860.0,,1010060013,141 WEST 53 STREET,,57600000.0,131.0,13,1,162,298974000,,1076175.0,,1325 AVENUE OF THE AM,0,,,105.0,MANHATTAN,FINAL,4.0,200,1,1006,13,2017,1325 AVENUE OF THE AM,141 WEST 53 STREET,4538300.0,4,0.033732


In [17]:
actions_df_2017=actions_df[actions_df['tax_year']=='2017']
actions_df_2017.info()

<class 'pandas.core.frame.DataFrame'>
Index: 2253 entries, 1-7-29 to 5-72830-5
Data columns (total 8 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   borough_code              2253 non-null   object 
 1   block_number              2253 non-null   object 
 2   lot_number                2253 non-null   object 
 3   tax_year                  2253 non-null   object 
 4   owner_name                2253 non-null   object 
 5   property_address          2227 non-null   object 
 6   granted_reduction_amount  2253 non-null   float64
 7   tax_class_code            2253 non-null   object 
dtypes: float64(1), object(7)
memory usage: 158.4+ KB


In [18]:
actions_df_2018=actions_df[actions_df['tax_year']=='2018']
actions_df_2018.info()

<class 'pandas.core.frame.DataFrame'>
Index: 9520 entries, 1-7-29 to 5-72830-5
Data columns (total 8 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   borough_code              9520 non-null   object 
 1   block_number              9520 non-null   object 
 2   lot_number                9520 non-null   object 
 3   tax_year                  9520 non-null   object 
 4   owner_name                9520 non-null   object 
 5   property_address          9473 non-null   object 
 6   granted_reduction_amount  9520 non-null   float64
 7   tax_class_code            9520 non-null   object 
dtypes: float64(1), object(7)
memory usage: 669.4+ KB


# Part 1.5: Obtaining and Organizing Census Data

We wanted to analyze our datasets in connection with Census tract information. We imported demographic information using Cenpy, obtaining it county by county rather than for the whole city at once because of mapping issues.

In [22]:
from cenpy import products

def borough_census(borough):
    df=products.ACS(2017).from_county(borough+', NY', level='tract',
                                        variables=['B19019_001E', 'B01003_001E', '^B02001', 'B03003_003E', 'B25003_001E', 'B25003_002E', 'B25003_003E', 'B09001_001E', 'B01002_001E'])
    df.rename(columns={'B19019_001E':'median_HH_income', 'B01003_001E':'total_population', 'B02001_001E':'total_population_race','B02001_002E':'total_white'}, inplace=True)
    df.rename(columns={'B02001_003E':'total_black', 'B02001_004E':'total_americanindian', 'B02001_005E':'total_asian','B02001_006E':'total_hawaiian'}, inplace=True)
    df.rename(columns={'B02001_007E':'total_otherrace', 'B02001_008E':'total_twoplusraces', 'B03003_003E':'total_hisp_latino','B09001_001E':'pop_under18','B01002_001E':'median_age'}, inplace=True)
    df.drop(columns=['total_population_race', 'B02001_009E', 'B02001_010E'], inplace=True)
    df['pct_renter'] = df['B25003_003E']/df['B25003_001E']*100
    return df

In [None]:
#getting census data for each borough
manhattandf=borough_census('New York County')
brooklyndf=borough_census('Kings County')
bronxdf=borough_census('Bronx County')
queensdf=borough_census('Queens County')
statendf=borough_census('Richmond County')

boroughdfs=[manhattandf, brooklyndf, bronxdf, queensdf, statendf]
censusGdf=pd.concat(boroughdfs)
censusGdf.drop(columns=['B25003_001E', 'B25003_002E',
       'B25003_003E', 'NAME', 'state'], inplace=True)

In [28]:
sjoindf = censusGdf.to_crs("EPSG:3857").sjoin(inner_joined_gdf.to_crs("EPSG:3857"),how='left')
sjoindf.head()

Unnamed: 0,GEOID,median_age,total_population,total_white,total_black,total_americanindian,total_asian,total_hawaiian,total_otherrace,total_twoplusraces,total_hisp_latino,pop_under18,median_HH_income,county,tract,pct_renter,geometry,index_right,nta,avland,latitude,zip,stories,avtot,easement,valtype,exland,blddepth,year,taxclass,longitude,bldfront,bldgcl,block,avtot2,excd1,bble,staddr,exmptcl,avland2,census_tract,lot,boro,ltfront,fullval,ext,bin,excd2,owner,extot,extot2,exland2,community_board,borough,period,council_district,ltdepth,borough_code,block_number,lot_number,tax_year,owner_name,property_address,granted_reduction_amount,tax_class_code,reduction_scaled
0,36061031900,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,61,31900,,"POLYGON ((inf inf, inf inf, inf inf, inf inf, ...",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,36061006900,35.2,2568.0,2037.0,53.0,0.0,293.0,0.0,57.0,128.0,161.0,341.0,198636.0,61,6900,60.673235,"POLYGON ((inf inf, inf inf, inf inf, inf inf, ...",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,36061010300,33.6,1674.0,1130.0,90.0,4.0,324.0,5.0,10.0,111.0,143.0,75.0,98901.0,61,10300,87.114846,"POLYGON ((inf inf, inf inf, inf inf, inf inf, ...",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,36061008700,35.8,6815.0,5493.0,85.0,0.0,579.0,0.0,478.0,180.0,1165.0,610.0,153350.0,61,8700,65.910868,"POLYGON ((inf inf, inf inf, inf inf, inf inf, ...",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,36061011100,33.4,5012.0,3111.0,367.0,0.0,1241.0,0.0,110.0,183.0,925.0,440.0,105887.0,61,11100,89.008942,"POLYGON ((inf inf, inf inf, inf inf, inf inf, ...",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [None]:
#making a new df with the total reduction amount granted per tract and joining it to census df
#grouping on GEOID rather than 'tract', since tract codes repeat across counties
reductionpertract = sjoindf.groupby('GEOID')[['granted_reduction_amount']].sum()
reductionpertract.rename(columns={'granted_reduction_amount':'total_tract_reduction'}, inplace=True)
censusjoined = censusGdf.join(reductionpertract, on='GEOID')

In [None]:
#making a new df with the total number of reductions per tract
#count() skips NaN, so tracts with no matched reduction get 0 rather than 1
numberreductions = sjoindf.groupby('GEOID')[['granted_reduction_amount']].count()
numberreductions.rename(columns={'granted_reduction_amount':'number_reductions'}, inplace=True)
censusjoined = censusjoined.join(numberreductions, on='GEOID')
censusjoined.head()
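Both per-tract summaries can also be produced in a single `groupby` with named aggregation. The sketch below does this on a toy frame; the GEOID strings and amounts are made up for illustration. It groups on `GEOID` because tract codes are only unique within a county, and it counts the reduction column so that a tract with no matched property gets zero.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the spatial-join result: one row per (tract, matched property);
# a tract with no matches keeps a single row with NaN on the property side.
toy = pd.DataFrame({
    'GEOID': ['36061000100', '36061000100', '36047000100', '36005000200'],
    'granted_reduction_amount': [1000.0, 250.0, 400.0, np.nan],
})

per_tract = toy.groupby('GEOID').agg(
    total_tract_reduction=('granted_reduction_amount', 'sum'),
    number_reductions=('granted_reduction_amount', 'count'),  # count skips NaN
)
print(per_tract)
```

The resulting frame is indexed by `GEOID`, so it could be joined to a tract table with `join(..., on='GEOID')`.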

# Part 2: Mapping

This map shows the reductions across NYC, with each property's marker sized by its reduction as a share of assessed value.

In [None]:
fig, ax = plt.subplots(figsize=(20,20))
inner_joined_gdf.to_crs('EPSG:3857').plot(ax=ax, markersize=inner_joined_gdf['reduction_scaled']*5)
ctx.add_basemap(ax)

The two following maps use Census data to map the total value of reductions granted per Census tract, and the number of reductions given per Census tract.

In [None]:
fig, ax = plt.subplots(2, figsize=(15,15))
ax1, ax2 = ax

censusjoined.plot(column='total_tract_reduction', ax=ax1, missing_kwds= dict(color = "grey"), alpha=.7, legend=True)
ctx.add_basemap(ax1)
ax1.set_title("Total Value of Reductions Granted Per Census Tract")

censusjoined.plot(column='number_reductions', ax=ax2, missing_kwds= dict(color = "grey"), alpha=.7, legend=True)
ctx.add_basemap(ax2)
ax2.set_title("Number of Reductions Granted Per Census Tract")

# Part 3: Machine Learning

### 1. Assessing the Total Value of Reductions in a Census Tract

We wanted to use Census information to see which variables best predict the total value of reductions in a given Census tract.

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

xvars = ['median_age', 'total_population', 'total_white',
       'total_black', 'total_americanindian', 'total_asian', 'total_hawaiian',
       'total_otherrace', 'total_twoplusraces', 'total_hisp_latino',
       'pop_under18', 'median_HH_income', 'county', 'pct_renter']

yvar = 'total_tract_reduction'

df_to_fit = censusjoined[xvars+[yvar]].dropna()

X_train, X_test, y_train, y_test = train_test_split(df_to_fit[xvars], df_to_fit[yvar], test_size = 0.25, random_state = 1)

# check we have a reasonable split
print(len(X_train), len(y_train) )
print(len(X_test), len(y_test) )

In [None]:
rf = RandomForestRegressor(n_estimators = 50, random_state = 1)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
print(len(X_test), len(y_pred))

In [None]:
import numpy as np
from sklearn import metrics

print('Mean Absolute Error (MAE):', metrics.mean_absolute_error(y_test, y_pred))
print('Mean Squared Error (MSE):', metrics.mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error (RMSE):', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
print('Mean Absolute Percentage Error (MAPE):', metrics.mean_absolute_percentage_error(y_test, y_pred))
print('Explained Variance Score:', metrics.explained_variance_score(y_test, y_pred))
print('Max Error:', metrics.max_error(y_test, y_pred))
print('Mean Squared Log Error:', metrics.mean_squared_log_error(y_test, y_pred))
print('Median Absolute Error:', metrics.median_absolute_error(y_test, y_pred))
print('R^2:', metrics.r2_score(y_test, y_pred))
print('Mean Poisson Deviance:', metrics.mean_poisson_deviance(y_test, y_pred))
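Error metrics on dollar totals are hard to interpret on their own, so one option is to compare them against a constant predictor that always returns the training mean. The sketch below uses made-up numbers in place of `y_train`, `y_test` and `y_pred`; the helper `rmse` and the `_toy` arrays are hypothetical names for illustration.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up numbers standing in for y_train, y_test and the model predictions.
y_train_toy = np.array([10.0, 20.0, 30.0, 40.0])
y_test_toy = np.array([12.0, 28.0, 35.0])
y_pred_toy = np.array([15.0, 25.0, 33.0])

# Baseline: predict the training mean for every test row.
baseline = np.full_like(y_test_toy, y_train_toy.mean())

print('model RMSE:   ', rmse(y_test_toy, y_pred_toy))
print('baseline RMSE:', rmse(y_test_toy, baseline))
```

If the model's error is not clearly below the baseline's, the Census variables are adding little predictive value for that target.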

In [None]:
import seaborn as sns

importances = rf.feature_importances_

# convert to a series, and give the index labels from our X_train dataframe
forest_importances = pd.Series(importances, index=X_train.columns)

# get the standard deviations to be able to plot the error bars,
# labeled by feature so they stay aligned after sorting
std = pd.Series(np.std([tree.feature_importances_ for tree in rf.estimators_], axis=0), index=X_train.columns)

# sort the importances in descending order and keep the top 10
top = forest_importances.sort_values(ascending=False)[:10]

# plot (horizontal bars, so the error bars go on the x axis)
fig, ax = plt.subplots(figsize=(4,15))
sns.barplot(x=top.values, y=top.index, xerr=std[top.index].values, ax=ax)
ax.set_title("Feature importances using MDI")
ax.set_xlabel("Mean decrease in impurity")

### 2. Assessing the Total Number of Reductions in a Census Tract

We wanted to use Census information to see which variables best predict the number of reductions in a given Census tract.

In [None]:
xvars = ['median_age', 'total_population', 'total_white',
       'total_black', 'total_americanindian', 'total_asian', 'total_hawaiian',
       'total_otherrace', 'total_twoplusraces', 'total_hisp_latino',
       'pop_under18', 'median_HH_income', 'county', 'pct_renter']

yvar = 'number_reductions'

df_to_fit = censusjoined[xvars+[yvar]].dropna()

X_train, X_test, y_train, y_test = train_test_split(df_to_fit[xvars], df_to_fit[yvar], test_size = 0.25, random_state = 1)

# check we have a reasonable split
print(len(X_train), len(y_train) )
print(len(X_test), len(y_test) )

In [None]:
rf = RandomForestRegressor(n_estimators = 50, random_state = 1)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)
print(len(X_test), len(y_pred))

In [None]:
print('Mean Absolute Error (MAE):', metrics.mean_absolute_error(y_test, y_pred))
print('Mean Squared Error (MSE):', metrics.mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error (RMSE):', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
print('Mean Absolute Percentage Error (MAPE):', metrics.mean_absolute_percentage_error(y_test, y_pred))
print('Explained Variance Score:', metrics.explained_variance_score(y_test, y_pred))
print('Max Error:', metrics.max_error(y_test, y_pred))
print('Mean Squared Log Error:', metrics.mean_squared_log_error(y_test, y_pred))
print('Median Absolute Error:', metrics.median_absolute_error(y_test, y_pred))
print('R^2:', metrics.r2_score(y_test, y_pred))
print('Mean Poisson Deviance:', metrics.mean_poisson_deviance(y_test, y_pred))

In [None]:
importances = rf.feature_importances_

# convert to a series, and give the index labels from our X_train dataframe
forest_importances = pd.Series(importances, index=X_train.columns)

# get the standard deviations to be able to plot the error bars,
# labeled by feature so they stay aligned after sorting
std = pd.Series(np.std([tree.feature_importances_ for tree in rf.estimators_], axis=0), index=X_train.columns)

# sort the importances in descending order and keep the top 10
top = forest_importances.sort_values(ascending=False)[:10]

# plot (horizontal bars, so the error bars go on the x axis)
fig, ax = plt.subplots(figsize=(4,15))
sns.barplot(x=top.values, y=top.index, xerr=std[top.index].values, ax=ax)
ax.set_title("Feature importances using MDI")
ax.set_xlabel("Mean decrease in impurity")
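MDI importances are computed from the training data, so a common cross-check is permutation importance measured on held-out data. This is a different technique from the one used in the cells above, shown here only as a sketch on synthetic data: the columns `signal` and `noise` and every variable name in the block are made up.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic data: 'signal' drives the target, 'noise' does not.
X = pd.DataFrame({'signal': rng.normal(size=300), 'noise': rng.normal(size=300)})
y = 3 * X['signal'] + rng.normal(scale=0.1, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)
model = RandomForestRegressor(n_estimators=50, random_state=1).fit(X_tr, y_tr)

# Shuffle one column at a time on held-out data and record the drop in score.
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=1)
perm = pd.Series(result.importances_mean, index=X_te.columns).sort_values(ascending=False)
print(perm)
```

On our data, the same call would take the fitted `rf` with `X_test` and `y_test`, and features that rank high under both measures are the more trustworthy ones.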

### 3. Assessing what contributes to a property getting a reassessment or not

To begin, I wanted to create a column that tells us whether a property in the valuation dataset got a reduction or not. So, after removing duplicate keys from both datasets, I left-joined the actions onto the valuations.

In [None]:
#only the last expression in a cell is displayed, so print both
print(actions_df.index.is_unique)
print(value_gdf.index.is_unique)

In [None]:
#creating two new datasets just without duplicates

print('Before dropping duplicates: {}'.format(len(actions_df)))
actions_nodupl = actions_df.groupby('BBB').first()
print('After dropping duplicates: {}'.format(len(actions_nodupl)))
actions_nodupl.index.is_unique

#same thing with the other
print('Before dropping duplicates: {}'.format(len(value_gdf)))
value_nodupl = value_gdf.groupby('BBB').first()
print('After dropping duplicates: {}'.format(len(value_nodupl)))
value_nodupl.index.is_unique
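Note that `groupby('BBB').first()` keeps one row per key, so for a property with actions in both tax years only one year's amount survives. The sketch below contrasts that with totaling the amounts per key. The keys and amounts are made up, and the column names `n_actions` and `total_reduction` are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame(
    {'tax_year': ['2017', '2018', '2018'],
     'granted_reduction_amount': [100.0, 250.0, 40.0]},
    index=pd.Index(['0-0-1', '0-0-1', '0-0-2'], name='BBB'))

# first(): one row per key, keeping the first value seen for each column.
first_only = toy.groupby('BBB').first()

# Alternative: keep one row per key but total the amounts across years.
summed = toy.groupby('BBB').agg(
    n_actions=('granted_reduction_amount', 'size'),
    total_reduction=('granted_reduction_amount', 'sum'))

print(first_only)
print(summed)
```

For a yes/no target such as whether a property got any reduction, either form gives the same label; the difference matters only if the amount itself is used later.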

In [None]:
# the join

NewDf_for_ML = value_nodupl.join(actions_nodupl, how='left')
print('Number of valuations: {}'.format(len(NewDf_for_ML)))
print('Number of reductions: {}'.format(NewDf_for_ML['granted_reduction_amount'].count()))

#Almost all of the reductions got added so i think this is fine

In [None]:
NewDf_for_ML['granted_reduction_amount'] = NewDf_for_ML['granted_reduction_amount'].fillna(0)
NewDf_for_ML['granted_reduction_amount'] = NewDf_for_ML['granted_reduction_amount'].astype(float)
NewDf_for_ML['got_reduction'] = NewDf_for_ML['granted_reduction_amount'] > 0
NewDf_for_ML.head()

In [None]:
NewDf_for_ML.columns

In [None]:
#converting the variables to numbers
columns=['avland', 'latitude', 'stories', 'avtot', 'exland', 'longitude', 'extot', 'fullval']
for column in columns:
    NewDf_for_ML[column]=pd.to_numeric(NewDf_for_ML[column])
NewDf_for_ML.info()

In [None]:
from sklearn.model_selection import train_test_split

#we can add more interesting variables as we sort them out; this was the smaller set used at first
#xvars = ['avtot','avland', 'stories', 'exland', 'extot', 'fullval']

#variables added by Jacob, I think we should discuss adding one for tax class
xvars = ['avland', 'latitude', 'stories', 'avtot', 'exland', 'longitude', 'extot', 'fullval', 'boro']

yvar = 'got_reduction'

df_to_fit = NewDf_for_ML[xvars+[yvar]].dropna()

X_train, X_test, y_train, y_test = train_test_split(
    df_to_fit[xvars], df_to_fit[yvar], test_size = 0.25, random_state = 1)

print(len(X_train), len(y_train) )
print(len(X_test), len(y_test) )
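Judging by the row counts in Part 1, only on the order of 1% of valuations have a reduction, so a purely random split can leave the test set with a slightly different share of positives than the training set. One option is the `stratify` argument of `train_test_split`, sketched here on a made-up imbalanced sample (20 positives in 1,000 rows; `X_toy` and `y_toy` are hypothetical).

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy imbalanced data: 20 positives out of 1000 rows (made-up proportions).
X_toy = pd.DataFrame({'x': np.arange(1000)})
y_toy = pd.Series([True] * 20 + [False] * 980)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=1, stratify=y_toy)

# With stratify, both splits keep the same share of positives.
print(y_tr.mean(), y_te.mean())
```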

In [None]:
from sklearn.ensemble import RandomForestClassifier # note there is also a RandomForestRegressor

rf = RandomForestClassifier(n_estimators = 50, random_state = 1)
rf.fit(X_train, y_train)

y_pred = rf.predict(X_test)
y_pred

In [None]:
print(len(X_test), len(y_pred))

from sklearn.metrics import classification_report, confusion_matrix, accuracy_score, ConfusionMatrixDisplay

print(confusion_matrix(y_test, y_pred))

In [None]:
ConfusionMatrixDisplay.from_predictions(y_test, y_pred)
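When one class is rare, overall accuracy can look high even if few reductions are actually found, so precision and recall for the positive class are worth reading off the confusion matrix. The sketch below does the arithmetic on a made-up matrix in scikit-learn's layout (rows are true labels, columns are predictions, class order `[False, True]`); the counts are not our results.

```python
import numpy as np

# Made-up confusion matrix: rows = true class, columns = predicted class,
# class order [False, True].
cm = np.array([[9800, 50],
               [120, 30]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision = tp / (tp + fp)   # of predicted reductions, share that were real
recall = tp / (tp + fn)      # of real reductions, share that were found
print(f'accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}')
```

In this toy case accuracy is above 98% while recall is only 20%, which is the pattern to watch for in the real matrix.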

In [None]:
import numpy as np
importances = rf.feature_importances_

forest_importances = pd.Series(importances, index=X_train.columns)

std = np.std([tree.feature_importances_ for tree in rf.estimators_], axis=0)

fig, ax = plt.subplots()
forest_importances.plot.bar(yerr=std, ax=ax)
ax.set_title("Feature importances using MDI")
ax.set_ylabel("Mean decrease in impurity")

In [None]:
importances = rf.feature_importances_

# convert to a series, and give the index labels from our X_train dataframe
forest_importances = pd.Series(importances, index=X_train.columns)

# get the standard deviations to be able to plot the error bars,
# labeled by feature so they stay aligned after sorting
std = pd.Series(np.std([tree.feature_importances_ for tree in rf.estimators_], axis=0), index=X_train.columns)

# sort the importances in descending order and keep the top 10
top = forest_importances.sort_values(ascending=False)[:10]

# plot (horizontal bars, so the error bars go on the x axis)
fig, ax = plt.subplots(figsize=(4,15))
sns.barplot(x=top.values, y=top.index, xerr=std[top.index].values, ax=ax)
ax.set_title("Feature importances using MDI")
ax.set_xlabel("Mean decrease in impurity")