# Analyzing Assessed Value of the Downtown Mall

## Goals

* Provide descriptive statistics of assessed value over time for the City of Charlottesville's Downtown Mall
* Map assessed values over time

## Step 1: Acquire Data

### Assessment Values
* Charlottesville's Open Data Portal: http://opendata.charlottesville.org/
* Real Estate (All Assessments) dataset: http://opendata.charlottesville.org/datasets/real-estate-all-assessments
    * On the Real Estate dataset page, click the 'APIs' drop-down in the upper right corner of the window, under the map.
    * Copy the GeoJSON link.
    * Use the GeoJSON link to pull data directly from the Open Data Portal with the code in Step 2.
* Parcel Area data: http://opendata.charlottesville.org/datasets/parcel-boundary-area

### List of properties to use in analysis

Charlottesville GIS Viewer: https://gisweb.charlottesville.org/GisViewer/

* Under the Map options, turn on 'Parcels & Buildings' > 'Parcels' and turn everything else off.
* Zoom to the area of interest on the map.
* Under 'Tools', select 'Identify'.
* In the 'Identify' toolbar, select 'Custom Shape', and under 'Layer' select 'Parcels'.
* Click on the map to draw a boundary around the area of interest.
* A list of parcels will appear in the left panel of the web page.
* In that panel, click 'Tools' > 'Export All to Excel'.
* A window named 'Export Results' will open when your download is ready.
* Click 'View Export' and save the file to your project directory.

<img src="https://github.com/strmwtr/downtown_assessments/blob/master/img/getting_pin_list.png?raw=true">

## Step 2: Prepare Data

### Looking at the .xls file retrieved from the GIS Viewer

In [1]:
# Import the pandas module
import pandas as pd

# Path to the .xls file retrieved from the GIS Viewer
f = r'/home/bob/projects/dt/data/pin_exp.xls'

# Read the .xls file into a DataFrame (pandas needs the xlrd package to read .xls files)
df = pd.read_excel(f)

### Let's look at the first 5 rows of our .xls

In [2]:
df.head()

| | FullAddress | OBJECTID | PIN | GPIN | ParcelNumber | OwnerName | CurrentAssessedValue | CurrentTaxYear | CurrentAssessedValueWithLabel | PicturePath | ... | MULTIPIN | OwnerAddress | OwnerCityState | OwnerZipCode | SHAPE.STArea() | SHAPE.STLength() | cvGIS.CITY.parcel_area.CreatedBy | cvGIS.CITY.parcel_area.CreatedDate | cvGIS.CITY.parcel_area.ModifiedBy | cvGIS.CITY.parcel_area.ModifiedDate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 3RD ST SE | 24846953 | 280036300 | 7309 | 280036300 | LITTLE MOOSE, LLC | 204100 | 2019 Value: | 2019 Value: 204,100 | http://realestate.charlottesville.org/IMAGES\P... | ... | | P O BOX 4226 | CHARLOTTESVILLE VA | 22905 | 1110.75 | 178.976929 | | | | |
| 1 | 0 E MARKET ST | 24848655 | 330245100 | 14744 | 330245100 | FIRST AND MAIN CHARLOTTESVILLE LLC | 1122200 | 2019 Value: | 2019 Value: 1,122,200 | | ... | | 224 14TH STREET NW | CHARLOTTESVILLE VA | 22903 | 11316.25 | 425.844859 | | | | |
| 2 | 0 W MARKET ST & 2ND ST NW | 24841086 | 330262000 | 6656 | 330262000 | SPENCER, HAWES, ETAL, TR PROTICO PROP LD TR | 418400 | 2019 Value: | 2019 Value: 418,400 | http://realestate.charlottesville.org/IMAGES\P... | ... | | 700 E HIGH ST | CHARLOTTESVILLE VA | 22902 | 3802.25 | 254.696684 | | | | |
| 3 | 100 5TH ST SE | 24845005 | 530065300 | 7426 | 530065300 | MAIN, RALPH TR OF BLACK DUCK LD TR | 664000 | 2019 Value: | 2019 Value: 664,000 | http://realestate.charlottesville.org/IMAGES\P... | ... | | P O BOX 2378 | CHARLOTTESVILLE VA | 22902 | 787.375 | 118.075826 | | | | |
| 4 | 100 E MAIN ST | 24839773 | 280020000 | 7021 | 280020000 | ONE HUNDRED EAST MAIN LTD PART | 1904200 | 2019 Value: | 2019 Value: 1,904,200 | http://realestate.charlottesville.org/IMAGES\P... | ... | | MSC BOX 5186 | CHARLOTTESVILLE VA | 22905 | 5102.75 | 418.0865 | | | | |


This gives a quick view of what the .xls sheet provides and how its data are formatted. The sheet has 23 columns (the notebook's `.head()` display reports `5 rows × 23 columns`), so pandas hides the middle ones behind `...`.

Let's look at the column headers:

In [3]:
df.columns

Index(['FullAddress', 'OBJECTID', 'PIN', 'GPIN', 'ParcelNumber', 'OwnerName',
       'CurrentAssessedValue', 'CurrentTaxYear',
       'CurrentAssessedValueWithLabel', 'PicturePath',
       'CurrentAssessedValueYearLabel', 'CurrentAssessedValueText',
       'MultiParcel', 'MULTIPIN', 'OwnerAddress', 'OwnerCityState',
       'OwnerZipCode', 'SHAPE.STArea()', 'SHAPE.STLength()',
       'cvGIS.CITY.parcel_area.CreatedBy',
       'cvGIS.CITY.parcel_area.CreatedDate',
       'cvGIS.CITY.parcel_area.ModifiedBy',
       'cvGIS.CITY.parcel_area.ModifiedDate'],
      dtype='object')

Let's take a closer look at the `PIN` column.

In [4]:
df['PIN'].head()

0    280036300
1    330245100
2    330262000
3    530065300
4    280020000
Name: PIN, dtype: object

In [5]:
df['PIN'].shape

(207,)

In [6]:
# The column seems to contain every PIN in the area, but some PINs are duplicated.
# Use .unique() on the PIN column to isolate the unique values (it returns a NumPy array).
# PINs that contain letters, such as '280051A00', are read as strings; the rest are integers.
print(df['PIN'].unique())
print(df['PIN'].unique().shape)

[280036300 330245100 330262000 530065300 280020000 280019000 280020100
 330255000 330248000 530065400 280021000 330244000 '280051A00' 530065500
 280022000 330232000 330256000 530065600 '280051B00' 330241000 280016100
 530058000 530065700 280026100 330258000 280013000 530057000 330242000
 280023000 330278000 330225000 280018000 330254000 530056000 330224000
 330259000 330222000 280024000 330265000 280028000 330260000 280025000
 330245000 530160000 280026000 330261000 330219000 280027000 330243000
 280010000 330263000 330240100 280031000 280012000 330266000 330240000
 330238000 330270000 280034000 330271000 330268000 280035000 '330155L00'
 330269000 280036000 330237000 280036200 330276000 330272000 330277000
 330235000 '2800371C0' 330234000 330273000 330233000 330274000 280001000
 330155300 330155100 280040000 330231000 330223000 330230000 330220000
 280041000 330229000 280042000 330228000 330227000 280043000 330226000
 280044000 280045000 530059000 280058000 280046000 280047000 28004800
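The output mixes integers (`280036300`) with strings (`'280051A00'`), because PINs that contain letters cannot be stored as numbers. A minimal sketch of normalizing them to a single type, assuming strings are what we want downstream (the API's `ParcelNumber` field comes back as a string such as `'280010000'`). The `sample` frame is hypothetical and only mimics the mix seen above.

```python
import pandas as pd

# Hypothetical sample mimicking df['PIN']: numeric-looking PINs arrive as ints,
# PINs that contain letters arrive as strings, and some PINs repeat.
sample = pd.DataFrame({'PIN': [280036300, 330245100, '280051A00', 280036300]})

# Cast every PIN to a string (and trim stray whitespace) so the column has one type
sample['PIN'] = sample['PIN'].astype(str).str.strip()

# unique() keeps the order of first appearance
unique_pins = list(sample['PIN'].unique())
print(unique_pins)  # ['280036300', '330245100', '280051A00']
```

Applied to the real data this would be `df['PIN'] = df['PIN'].astype(str).str.strip()`. Casting after loading avoids relying on how the .xls cells are typed.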

Now we have a list of unique PINs from which we can build the assessments-over-time DataFrame. Let's store it in its own DataFrame so we can easily reuse it later.

In [7]:
# Create the DataFrame unique_pins_df from the unique PINs in df, with a single column named PIN
unique_pins_df = pd.DataFrame({'PIN': df['PIN'].unique()})
unique_pins_df.head()

| | PIN |
|---|---|
| 0 | 280036300 |
| 1 | 330245100 |
| 2 | 330262000 |
| 3 | 530065300 |
| 4 | 280020000 |


### Obtaining annual assessment data

Charlottesville's Open Data Portal: http://opendata.charlottesville.org/

Real Estate (All Assessments) dataset: http://opendata.charlottesville.org/datasets/real-estate-all-assessments

* On the Real Estate dataset page, click the 'APIs' drop-down in the upper right corner of the window, under the map.
* Copy the GeoJSON link.
* Use it to pull data directly from the Open Data Portal. The code below sends the query to the city's ArcGIS REST service (`OpenData_2/MapServer/2`), filtering on `ParcelNumber` and requesting Esri JSON (`f=json`).

In [8]:
# Import the requests library to call the ArcGIS REST API
import requests

# Quote each PIN for the SQL-style IN clause; %27 is a URL-encoded single quote
formatted_pins = [f'%27{x}%27' for x in df['PIN'].unique()]

# Split the PINs into two batches so each request URL stays short enough (see the note below)
formatted_pins_1 = ','.join(formatted_pins[:75])
formatted_pins_2 = ','.join(formatted_pins[75:])

# Query endpoint of the assessments layer and the fields to return
base_url = "https://gisweb.charlottesville.org/arcgis/rest/services/OpenData_2/MapServer/2/query"
fields = "outFields=ParcelNumber,LandValue,ImprovementValue,TotalValue,TaxYear&outSR=4326&f=json"

url1 = f"{base_url}?where=UPPER(ParcelNumber)%20in%20({formatted_pins_1})%20&{fields}"
url2 = f"{base_url}?where=UPPER(ParcelNumber)%20in%20({formatted_pins_2})%20&{fields}"

r1 = requests.get(url1)
r2 = requests.get(url2)

# Stop early if either request failed
r1.raise_for_status()
r2.raise_for_status()

d1 = r1.json()
d2 = r2.json()
print(r1, r2)

<Response [200]> <Response [200]>


Testing `requests.get(url)` showed that I can request up to 120 parcels at a time before receiving a 404 error, presumably because the request URL becomes too long. Since I have 126 parcels of interest, I break the request into two parts, `[:75]` and `[75:]`.
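Instead of hard-coding two slices, the split can be made for any number of PINs. This is a minimal sketch: `build_query_params` and `fetch_features` are hypothetical helper names, and `batch_size=75` simply mirrors the split above. Passing a `params` dict to `requests.get` lets requests do the URL encoding that the `%27`/`%20` strings do by hand. The `error` and `exceededTransferLimit` checks follow how ArcGIS REST query responses usually report problems; I am assuming this server follows that convention.

```python
from typing import Dict, Iterable, List

# Endpoint and fields copied from the request in cell [8]
QUERY_URL = ("https://gisweb.charlottesville.org/arcgis/rest/services/"
             "OpenData_2/MapServer/2/query")
OUT_FIELDS = "ParcelNumber,LandValue,ImprovementValue,TotalValue,TaxYear"


def build_query_params(pins: Iterable, batch_size: int = 75) -> List[Dict]:
    """Split PINs into batches and build one query-parameter dict per batch."""
    pins = [str(p) for p in pins]
    params = []
    for start in range(0, len(pins), batch_size):
        batch = pins[start:start + batch_size]
        # Quote each PIN for the IN clause; requests URL-encodes quotes and spaces
        in_list = ",".join(f"'{p}'" for p in batch)
        params.append({
            "where": f"UPPER(ParcelNumber) IN ({in_list})",
            "outFields": OUT_FIELDS,
            "outSR": 4326,
            "f": "json",
        })
    return params


def fetch_features(pins: Iterable, batch_size: int = 75) -> List[Dict]:
    """Run each batch against the query endpoint and combine the 'features' lists."""
    import requests  # imported here so build_query_params can be used offline

    features = []
    for params in build_query_params(pins, batch_size):
        r = requests.get(QUERY_URL, params=params)
        r.raise_for_status()
        data = r.json()
        # Assumed convention: query errors come back inside a 200 response as "error"
        if "error" in data:
            raise RuntimeError(data["error"])
        # Assumed convention: set when the server truncated the result at its record limit
        if data.get("exceededTransferLimit"):
            raise RuntimeError("Result truncated by the server; lower batch_size")
        features.extend(data["features"])
    return features
```

With these helpers, `features = fetch_features(df['PIN'].unique())` would replace `d1['features']` and `d2['features']`, and `pd.DataFrame([feat['attributes'] for feat in features])` would give an equivalent assessments table.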

Check the data type of the parsed response:

In [9]:
type(d1)

dict

In [10]:
d1.keys()

dict_keys(['displayFieldName', 'fieldAliases', 'fields', 'features'])

In [11]:
d1['features']

[{'attributes': {'ParcelNumber': '280010000',
   'LandValue': 2191100,
   'ImprovementValue': 3812400,
   'TotalValue': 6003500,
   'TaxYear': '2019'}},
 {'attributes': {'ParcelNumber': '280010000',
   'LandValue': 2140600,
   'ImprovementValue': 4038400,
   'TotalValue': 6179000,
   'TaxYear': '2018'}},
 {'attributes': {'ParcelNumber': '280010000',
   'LandValue': 2140600,
   'ImprovementValue': 4051208,
   'TotalValue': 6191808,
   'TaxYear': '2017'}},
 {'attributes': {'ParcelNumber': '280010000',
   'LandValue': 1719500,
   'ImprovementValue': 4271200,
   'TotalValue': 5990700,
   'TaxYear': '2016'}},
 {'attributes': {'ParcelNumber': '280010000',
   'LandValue': 1563200,
   'ImprovementValue': 1120000,
   'TotalValue': 2683200,
   'TaxYear': '2015'}},
 {'attributes': {'ParcelNumber': '280010000',
   'LandValue': 1563200,
   'ImprovementValue': 1120000,
   'TotalValue': 2683200,
   'TaxYear': '2014'}},
 {'attributes': {'ParcelNumber': '280010000',
   'LandValue': 1488800,
   'Improve

In [12]:
type(d1['features'])

list

In [13]:
df1 = pd.DataFrame(d1['features'])
df1.shape

(1683, 1)

In [14]:
df2 = pd.DataFrame(d2['features'])
df2.shape

(1173, 1)

In [15]:
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the two frames instead
df1 = pd.concat([df1, df2], ignore_index=True)

In [16]:
df1.shape

(2856, 1)

In [17]:
df1.head()

| | attributes |
|---|---|
| 0 | {'ParcelNumber': '280010000', 'LandValue': 219... |
| 1 | {'ParcelNumber': '280010000', 'LandValue': 214... |
| 2 | {'ParcelNumber': '280010000', 'LandValue': 214... |
| 3 | {'ParcelNumber': '280010000', 'LandValue': 171... |
| 4 | {'ParcelNumber': '280010000', 'LandValue': 156... |


In [18]:
df1.keys()

Index(['attributes'], dtype='object')

In [19]:
type(df1['attributes'])

pandas.core.series.Series

In [20]:
# Expand each feature's attributes dict into its own columns
assessments = pd.DataFrame(df1['attributes'].tolist())
assessments.head()

| | ImprovementValue | LandValue | ParcelNumber | TaxYear | TotalValue |
|---|---|---|---|---|---|
| 0 | 3812400 | 2191100 | 280010000 | 2019 | 6003500 |
| 1 | 4038400 | 2140600 | 280010000 | 2018 | 6179000 |
| 2 | 4051208 | 2140600 | 280010000 | 2017 | 6191808 |
| 3 | 4271200 | 1719500 | 280010000 | 2016 | 5990700 |
| 4 | 1120000 | 1563200 | 280010000 | 2015 | 2683200 |
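The steps from `d1['features']` to `assessments` can also be collapsed into one pass over the raw responses. A minimal sketch; `features_to_frame` is a hypothetical name, and depending on the pandas version the columns may come out in a different order than the alphabetical order shown above.

```python
from typing import Dict, List

import pandas as pd


def features_to_frame(*responses: Dict) -> pd.DataFrame:
    """Flatten the 'attributes' of every feature in one or more ArcGIS JSON responses."""
    rows: List[Dict] = [feat['attributes']
                        for resp in responses
                        for feat in resp['features']]
    return pd.DataFrame(rows)
```

Usage: `assessments = features_to_frame(d1, d2)` replaces the separate `df1`/`df2` frames and the concat step.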


In [21]:
assessments.keys()

Index(['ImprovementValue', 'LandValue', 'ParcelNumber', 'TaxYear',
       'TotalValue'],
      dtype='object')

In [22]:
assessments.shape

(2856, 5)

In [23]:
assessments['ParcelNumber'].describe()

count          2856
unique          126
top       280040000
freq             23
Name: ParcelNumber, dtype: object

In [24]:
assessments.isnull().any()

ImprovementValue    False
LandValue           False
ParcelNumber        False
TaxYear             False
TotalValue          False
dtype: bool
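No values are missing, but two other checks are cheap: `TaxYear` arrives as a string, and each row's `TotalValue` should equal `LandValue + ImprovementValue` (it does in the five rows shown above). A minimal sketch using those five rows as sample data; whether the identity holds for all 2,856 rows is not shown here, so run the same checks against the full `assessments` frame.

```python
import pandas as pd

# The first five rows of `assessments` shown above, as sample data
sample = pd.DataFrame({
    'ImprovementValue': [3812400, 4038400, 4051208, 4271200, 1120000],
    'LandValue': [2191100, 2140600, 2140600, 1719500, 1563200],
    'ParcelNumber': ['280010000'] * 5,
    'TaxYear': ['2019', '2018', '2017', '2016', '2015'],
    'TotalValue': [6003500, 6179000, 6191808, 5990700, 2683200],
})

# TaxYear comes back from the API as a string; integer years sort and compare numerically
sample['TaxYear'] = sample['TaxYear'].astype(int)

# Flag rows where the total is not land + improvement
mismatch = sample['TotalValue'] != sample['LandValue'] + sample['ImprovementValue']
print(mismatch.sum())  # 0 for these five rows

# Flag repeated (parcel, year) pairs, which would double-count a parcel
dupes = sample.duplicated(subset=['ParcelNumber', 'TaxYear'])
print(dupes.sum())  # 0 for these five rows
```

If `TaxYear` is converted in the full frame, later comparisons should use integer years (`2019`) rather than strings (`'2019'`).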

In [25]:
assessments['TaxYear'].describe()

count     2856
unique      23
top       2015
freq       126
Name: TaxYear, dtype: object

In [26]:
assessments['TaxYear'].min(), assessments['TaxYear'].max()

('1997', '2019')
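The year range and the row count together show that the panel is not complete: 126 parcels over the 23 tax years from 1997 to 2019 would give 126 × 23 = 2,898 rows, but `assessments` has 2,856, so 42 parcel-years are absent. A minimal sketch for finding which ones; `parcels_per_year` and `missing_parcel_years` are hypothetical names.

```python
import pandas as pd


def parcels_per_year(assessments: pd.DataFrame) -> pd.Series:
    """Number of distinct parcels assessed in each tax year."""
    return assessments.groupby('TaxYear')['ParcelNumber'].nunique()


def missing_parcel_years(assessments: pd.DataFrame) -> pd.DataFrame:
    """Parcel/year combinations that never appear in the data."""
    # Every parcel paired with every tax year present anywhere in the data
    full = pd.MultiIndex.from_product(
        [assessments['ParcelNumber'].unique(), assessments['TaxYear'].unique()],
        names=['ParcelNumber', 'TaxYear'])
    # The pairs that actually occur
    present = pd.MultiIndex.from_frame(assessments[['ParcelNumber', 'TaxYear']])
    return full.difference(present).to_frame(index=False)
```

Usage: `parcels_per_year(assessments)` shows how coverage changes by year, and `len(missing_parcel_years(assessments))` should match the 42-row gap.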

In [27]:
assessments['ImprovementValue'].describe()

count    2.856000e+03
mean     1.238526e+06
std      2.481600e+06
min      0.000000e+00
25%      2.615500e+05
50%      4.872000e+05
75%      9.760250e+05
max      3.545040e+07
Name: ImprovementValue, dtype: float64

In [28]:
assessments['LandValue'].describe()

count    2.856000e+03
mean     3.697564e+05
std      6.358546e+05
min      0.000000e+00
25%      8.380000e+04
50%      1.701000e+05
75%      3.785000e+05
max      6.139900e+06
Name: LandValue, dtype: float64

In [29]:
assessments['TotalValue'].describe()

count    2.856000e+03
mean     1.608282e+06
std      3.024799e+06
min      0.000000e+00
25%      3.735000e+05
50%      6.736500e+05
75%      1.326125e+06
max      4.159030e+07
Name: TotalValue, dtype: float64

In [30]:
taxyearmin = assessments['TaxYear'] == assessments['TaxYear'].min()
assessments[taxyearmin].describe()

| | ImprovementValue | LandValue | TotalValue |
|---|---|---|---|
| count | 122.0 | 122.0 | 122.0 |
| mean | 561465.6 | 128888.5 | 690354.1 |
| std | 1138852.0 | 187502.7 | 1310656.0 |
| min | 0.0 | 0.0 | 0.0 |
| 25% | 136500.0 | 43500.0 | 190700.0 |
| 50% | 238100.0 | 71100.0 | 294050.0 |
| 75% | 375550.0 | 129725.0 | 514775.0 |
| max | 8221600.0 | 1528400.0 | 9750000.0 |


In [31]:
taxyearmax = assessments['TaxYear'] == assessments['TaxYear'].max()
assessments[taxyearmax].describe()

| | ImprovementValue | LandValue | TotalValue |
|---|---|---|---|
| count | 126.0 | 126.0 | 126.0 |
| mean | 1955444.0 | 755712.7 | 2711156.0 |
| std | 4152663.0 | 1029963.0 | 5047571.0 |
| min | 0.0 | 0.0 | 0.0 |
| 25% | 422475.0 | 248075.0 | 737025.0 |
| 50% | 746250.0 | 431850.0 | 1162900.0 |
| 75% | 1547975.0 | 757400.0 | 2232350.0 |
| max | 35450400.0 | 6139900.0 | 41590300.0 |
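The first goal asks for descriptive statistics over time, not only for the first and last years. Grouping by `TaxYear` puts every year in one table. A minimal sketch; `yearly_summary` is a hypothetical name. Yearly totals also move when the set of assessed parcels changes (122 parcels in 1997 versus 126 in 2019), so the growth column mixes revaluation with changes in coverage.

```python
import pandas as pd


def yearly_summary(assessments: pd.DataFrame, value: str = 'TotalValue') -> pd.DataFrame:
    """Count, total, mean, and median of `value` for each tax year."""
    summary = assessments.groupby('TaxYear')[value].agg(['count', 'sum', 'mean', 'median'])
    # Year-over-year growth of the district-wide total (NaN for the first year)
    summary['sum_pct_change'] = summary['sum'] / summary['sum'].shift() - 1
    return summary
```

Usage: `yearly_summary(assessments)` for total value, or `yearly_summary(assessments, 'LandValue')` for land alone. Four-digit year strings sort in the same order as integers, so this works whether or not `TaxYear` has been converted.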


In [32]:
assessments[taxyearmax].describe()-assessments[taxyearmin].describe()

| | ImprovementValue | LandValue | TotalValue |
|---|---|---|---|
| count | 4.0 | 4.0 | 4.0 |
| mean | 1393978.0 | 626824.2 | 2020802.0 |
| std | 3013812.0 | 842460.8 | 3736914.0 |
| min | 0.0 | 0.0 | 0.0 |
| 25% | 285975.0 | 204575.0 | 546325.0 |
| 50% | 508150.0 | 360750.0 | 868850.0 |
| 75% | 1172425.0 | 627675.0 | 1717575.0 |
| max | 27228800.0 | 4611500.0 | 31840300.0 |
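Subtracting one `describe()` table from another gives differences between summary statistics. For example, the 50% row is the change in the median parcel value, not the median change per parcel, and the `std` row has no natural reading as a difference. The two years also cover different sets of parcels (122 versus 126, hence the count of 4.0). A minimal sketch that compares parcels like-for-like, keeping only parcels assessed in both years; `parcel_change` is a hypothetical name.

```python
import pandas as pd


def parcel_change(assessments: pd.DataFrame, start, end,
                  value: str = 'TotalValue') -> pd.DataFrame:
    """Per-parcel change in `value` between two tax years."""
    # One row per parcel, one column per tax year
    wide = assessments.pivot_table(index='ParcelNumber', columns='TaxYear',
                                   values=value, aggfunc='first')
    # Keep only parcels assessed in both years so the comparison is like-for-like
    both = wide[[start, end]].dropna()
    out = pd.DataFrame({'start': both[start], 'end': both[end]})
    out['change'] = out['end'] - out['start']
    # A start value of 0 would divide by zero; leave those percentages as NaN
    out['pct_change'] = out['change'] / out['start'].where(out['start'] != 0)
    return out
```

Usage: `parcel_change(assessments, '1997', '2019')['change'].describe()` (or `1997, 2019` if `TaxYear` has been converted to integers) gives the distribution of per-parcel changes, which can be compared with the difference table above.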


### Creating a JSON file of parcel areas

In [33]:
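import json

# GPINs are numeric, so they go into the IN clause without quotes
formatted_gpins = ','.join(str(x) for x in df['GPIN'].unique())

# folium.GeoJson expects GeoJSON, so request f=geojson instead of Esri JSON (f=json).
# This assumes the server supports GeoJSON output (ArcGIS Server 10.4 and later).
parcel_area_url = f"https://gisweb.charlottesville.org/arcgis/rest/services/OpenData_1/MapServer/43/query?where=GPIN%20in%20({formatted_gpins})&outFields=*&outSR=4326&f=geojson"

r_parcel_area = requests.get(parcel_area_url)
print(r_parcel_area)
parcel_area_json = r_parcel_area.json()

# Write valid JSON (print() would write a Python dict repr) and point
# parcel_area at the same file the map reads
parcel_area = './parcels.geojson'
with open(parcel_area, 'w') as out_file:
    json.dump(parcel_area_json, out_file)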

<Response [200]>


In [39]:
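import folium

# Center the map near the Downtown Mall
m = folium.Map(location=[38.0304, -78.4804], tiles='OpenStreetMap', zoom_start=16)

# Draw the parcel boundaries stored at parcel_area and add a layer toggle
folium.GeoJson(parcel_area, name="geojson").add_to(m)
folium.LayerControl().add_to(m)
m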
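The second goal is to map assessed values over time. folium can color each parcel if every GeoJSON feature carries a value in its `properties`, for example through `folium.GeoJson(..., style_function=...)`. A minimal sketch of attaching values, assuming each feature's `properties` include `GPIN` (assumed from the `GPIN` filter in the query above); `attach_values` is a hypothetical name. One way to build `values` for a single year is to filter `assessments` to that `TaxYear`, merge it with `df[['PIN', 'GPIN']].drop_duplicates()` on `ParcelNumber` = `PIN` (the first rows of the .xls show the two fields are identical; cast `PIN` to `str` first), and take `set_index('GPIN')['TotalValue'].to_dict()`. The keys must have the same type as the `GPIN` values in the GeoJSON.

```python
import copy
from typing import Dict, Hashable


def attach_values(geojson: Dict, values: Dict[Hashable, float],
                  key: str = 'GPIN', field: str = 'TotalValue') -> Dict:
    """Return a copy of a FeatureCollection with `field` added to each feature's properties.

    `values` maps a parcel identifier (matched against properties[key]) to a value;
    features with no match get None.
    """
    out = copy.deepcopy(geojson)  # leave the caller's data untouched
    for feature in out['features']:
        # GeoJSON allows "properties": null, so fall back to an empty dict
        props = feature.get('properties') or {}
        feature['properties'] = props
        props[field] = values.get(props.get(key))
    return out
```

Building one such copy per tax year gives a series of maps that can be compared side by side.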