# Analyzing Assessed Value of the Downtown Mall

## Goals

* Provide descriptive statistics of assessed value over time for the City of Charlottesville's Downtown Mall

* Map and chart assessed values over time

## Step 1: Acquire Data

### Assessment Values
Charlottesville's Open Data Portal: http://opendata.charlottesville.org/

Real Estate (All Assessments) dataset: http://opendata.charlottesville.org/datasets/real-estate-all-assessments

* On the Real Estate dataset page, click the 'APIs' drop-down in the upper right corner of the window, under the map
* Copy the GeoJSON link
* Use the GeoJSON link to pull data directly from the Open Data Portal using the code below

Parcel Area data: http://opendata.charlottesville.org/datasets/parcel-boundary-area

### List of properties to use in analysis

Charlottesville GIS Viewer: https://gisweb.charlottesville.org/GisViewer/

* Under the 'Map' options, turn on 'Parcels & Buildings' > 'Parcels' and turn everything else off

* Zoom to the area of interest on the map

* Under 'Tools', select 'Identify'

* In the 'Identify' toolbar, select 'Custom Shape', and under 'Layer' select 'Parcels'

* Using the mouse, click on the map to draw a boundary around the area of interest

* A list of parcels will appear in the left panel of the web page

* In that panel, click 'Tools' > 'Export All to Excel'

* A window named 'Export Results' will open when your download is ready

* Click 'View Export' and save the file to your project directory

<img src="https://github.com/strmwtr/downtown_assessments/blob/master/img/getting_pin_list.png?raw=true">

## Step 2: Prepare Data

## Looking at the .xls retrieved from the GIS Viewer

```python
# Import the pandas module
import pandas as pd

# Path to the .xls retrieved from the GIS Viewer
f = r'./data/pin_exp.xls'

# Create a dataframe from the .xls file
df = pd.read_excel(f)
```

## Remove all MULTIPIN parcels from df

```python
# Identify all rows in df where the MULTIPIN column is not equal to 1
not_multipin = df['MULTIPIN'] != 1
# Keep only the rows identified in not_multipin
df = df[not_multipin]
df
```

| | FullAddress | OBJECTID | PIN | GPIN | ParcelNumber | OwnerName | CurrentAssessedValue | CurrentTaxYear | CurrentAssessedValueWithLabel | PicturePath | ... | MULTIPIN | OwnerAddress | OwnerCityState | OwnerZipCode | SHAPE.STArea() | SHAPE.STLength() | cvGIS.CITY.parcel_area.CreatedBy | cvGIS.CITY.parcel_area.CreatedDate | cvGIS.CITY.parcel_area.ModifiedBy | cvGIS.CITY.parcel_area.ModifiedDate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 3RD ST SE | 24846953 | 280036300 | 7309 | 280036300 | LITTLE MOOSE, LLC | 204100 | 2019 Value: | 2019 Value: 204,100 | http://realestate.charlottesville.org/IMAGES\P... | ... | | P O BOX 4226 | CHARLOTTESVILLE VA | 22905 | 1110.750000 | 178.976929 | | | | |
| 1 | 0 E MARKET ST | 24848655 | 330245100 | 14744 | 330245100 | FIRST AND MAIN CHARLOTTESVILLE LLC | 1122200 | 2019 Value: | 2019 Value: 1,122,200 | | ... | | 224 14TH STREET NW | CHARLOTTESVILLE VA | 22903 | 11316.250000 | 425.844859 | | | | |
| 2 | 0 W MARKET ST & 2ND ST NW | 24841086 | 330262000 | 6656 | 330262000 | SPENCER, HAWES, ETAL, TR PROTICO PROP LD TR | 418400 | 2019 Value: | 2019 Value: 418,400 | http://realestate.charlottesville.org/IMAGES\P... | ... | | 700 E HIGH ST | CHARLOTTESVILLE VA | 22902 | 3802.250000 | 254.696684 | | | | |
| 3 | 100 5TH ST SE | 24845005 | 530065300 | 7426 | 530065300 | MAIN, RALPH TR OF BLACK DUCK LD TR | 664000 | 2019 Value: | 2019 Value: 664,000 | http://realestate.charlottesville.org/IMAGES\P... | ... | | P O BOX 2378 | CHARLOTTESVILLE VA | 22902 | 787.375000 | 118.075826 | | | | |
| 4 | 100 E MAIN ST | 24839773 | 280020000 | 7021 | 280020000 | ONE HUNDRED EAST MAIN LTD PART | 1904200 | 2019 Value: | 2019 Value: 1,904,200 | http://realestate.charlottesville.org/IMAGES\P... | ... | | MSC BOX 5186 | CHARLOTTESVILLE VA | 22905 | 5102.750000 | 418.086500 | | | | |
| 5 | 100-106 W MAIN ST | 24839771 | 280019000 | 6984 | 280019000 | KUTTNER, LUDWIG G, SUC TR TERRACES LD TR | 11289500 | 2019 Value: | 2019 Value: 11,289,500 | http://realestate.charlottesville.org/IMAGES\P... | ... | | P O BOX 359 | KEENE VA | 22946 | 13906.125000 | 585.282100 | | | | |
| 6 | 101 E WATER ST | 24839808 | 280020100 | 7197 | 280020100 | ONE HUNDRED EAST MAIN LTD PART | 2234000 | 2019 Value: | 2019 Value: 2,234,000 | http://realestate.charlottesville.org/IMAGES\P... | ... | | MSC BOX 5186 | CHARLOTTESVILLE VA | 22905 | 3556.500000 | 245.065807 | | | | |
| 7 | 101 W MAIN ST | 24849585 | 330255000 | 6819 | 330255000 | WILLIAMS, J & D PETTIT, TR H&M BLDG LD TR | 1391200 | 2019 Value: | 2019 Value: 1,391,200 | http://realestate.charlottesville.org/IMAGES\P... | ... | | 2088 UNION ST STE 1 | SAN FRANCISCO CA | 94123 | 3187.000000 | 272.089204 | | | | |
| 8 | 101-111 E MAIN ST | 24850226 | 330248000 | 6876 | 330248000 | FIRST AND MAIN CHARLOTTESVILLE LLC | 3683300 | 2019 Value: | 2019 Value: 3,683,300 | http://realestate.charlottesville.org/IMAGES\P... | ... | | 224 14TH STREET NW | CHARLOTTESVILLE VA | 22903 | 8734.250000 | 374.252065 | | | | |
| 9 | 102 5TH ST SE | 24845009 | 530065400 | 7447 | 530065400 | MAIN, RALPH TR OF BLACK DUCK LD TR | 518400 | 2019 Value: | 2019 Value: 518,400 | http://realestate.charlottesville.org/IMAGES\P... | ... | | P O BOX 2378 | CHARLOTTESVILLE VA | 22902 | 930.375000 | 125.686015 | | | | |


This gives a quick look at what the .xls sheet provides and the general format of its data. The "5 rows x 23 columns" line that Jupyter prints at the end of `.head()` shows that there are 23 columns.
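The `!= 1` comparison works as a filter because, in the output above, MULTIPIN is blank (NaN) for ordinary parcels, and pandas evaluates `NaN != 1` as True, so those rows are kept. A small sketch on made-up data (the PINs are hypothetical) shows the behavior and the equivalent `.ne(1)` method:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: MULTIPIN is 1 for multi-PIN parcels and NaN otherwise
sample = pd.DataFrame({
    'PIN': ['100', '200', '300'],
    'MULTIPIN': [np.nan, 1, np.nan],
})

# NaN != 1 evaluates to True, so rows with a blank MULTIPIN are kept
mask = sample['MULTIPIN'] != 1
kept = sample[mask]

# .ne(1) is the method form of the same comparison
same_mask = sample['MULTIPIN'].ne(1)

print(kept['PIN'].tolist())  # ['100', '300']
```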

Looking at the column headers

```python
print("df.head()\n", df.head())
print('-'*80)
print("\ndf.columns:\n", df.columns)
print('-'*80)
print("\ndf['PIN'].head():\n", df['PIN'].head())
print('-'*80)
print("\ndf['PIN'].shape:\n", df['PIN'].shape)
print('-'*80)
print("\ndf['PIN'].unique():\n", df['PIN'].unique())
print('-'*80)
print("\ndf['PIN'].unique().shape:", df['PIN'].unique().shape)
```

```
df.head()
                   FullAddress  OBJECTID        PIN   GPIN ParcelNumber  \
0                0 3RD ST SE   24846953  280036300   7309    280036300   
1              0 E MARKET ST   24848655  330245100  14744    330245100   
2  0 W MARKET ST & 2ND ST NW   24841086  330262000   6656    330262000   
3              100 5TH ST SE   24845005  530065300   7426    530065300   
4              100 E MAIN ST   24839773  280020000   7021    280020000   

                                     OwnerName  CurrentAssessedValue  \
0                            LITTLE MOOSE, LLC                204100   
1           FIRST AND MAIN CHARLOTTESVILLE LLC               1122200   
2  SPENCER, HAWES, ETAL, TR PROTICO PROP LD TR                418400   
3           MAIN, RALPH TR OF BLACK DUCK LD TR                664000   
4               ONE HUNDRED EAST MAIN LTD PART               1904200   

  CurrentTaxYear CurrentAssessedValueWithLabel  \
0    2019 Value:          2019 Value:  204,100   
1    2019 V
```

## Access the JSON file for parcel areas

```python
# Build a comma-separated list of the unique GPINs in df
formatted_gpins = ','.join(str(x) for x in df['GPIN'].unique())

# Query the Parcel Area layer for every parcel whose GPIN is in that list
parcel_area_url = f"https://gisweb.charlottesville.org/arcgis/rest/services/OpenData_1/MapServer/43/query?where=GPIN%20in%20({formatted_gpins})&outFields=*&outSR=4326&f=json"

print(parcel_area_url)
```

```
https://gisweb.charlottesville.org/arcgis/rest/services/OpenData_1/MapServer/43/query?where=GPIN%20in%20(7309,14744,6656,7426,7021,6984,7197,6819,6876,7447,7029,6724,7511,7484,7042,7005,6807,7515,7543,6923,17202,7128,7541,7199,6805,7120,7111,6877,7047,6569,7053,6958,6674,7082,7037,6668,6930,7063,6666,7213,6740,7078,6897,7279,7091,6726,6966,7104,6806,6869,6633,6881,7272,7068,6703,6947,6832,6625,7141,6604,6696,7162,6393,6694,7171,6848,7346,6537,6662,6493,7017,7330,7025,6655,7033,6644,6861,6689,6267,7232,6979,6918,7059,6938,7249,7070,7259,7087,7096,7266,7103,7275,7308,7163,7510,7319,7329,7339,7185,7348,7072,7198,7358,7207,7374,7390,7441,7456,7471,7664,7482,7717,7507,7524,17098,17097,7184,7476)&outFields=*&outSR=4326&f=json
```
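The query string above is written by hand, with `%20` standing in for spaces. As an alternative sketch (not the author's code), the same kind of ArcGIS REST query URL can be assembled with the standard library's `urllib.parse.urlencode`, which handles the escaping. The helper name `build_query_url` is hypothetical; the parameters are the ones already used in the URL above (`where`, `outFields`, `outSR`, `f`). The result percent-encodes parentheses and commas as well, which should be equivalent once the server decodes it:

```python
from urllib.parse import quote, urlencode

# Layer endpoint taken from the parcel area URL above
PARCEL_AREA_LAYER = ("https://gisweb.charlottesville.org/arcgis/rest/services/"
                     "OpenData_1/MapServer/43/query")


def build_query_url(layer_url, where, out_fields='*', out_sr=4326, fmt='json'):
    """Hypothetical helper: build an ArcGIS REST query URL with escaped parameters."""
    params = {'where': where, 'outFields': out_fields, 'outSR': out_sr, 'f': fmt}
    # quote_via=quote encodes spaces as %20 (the default, quote_plus, would use '+')
    return f"{layer_url}?{urlencode(params, quote_via=quote)}"


gpins = [7309, 14744, 6656]  # first three GPINs from the list above
where = f"GPIN in ({','.join(str(g) for g in gpins)})"
url = build_query_url(PARCEL_AREA_LAYER, where)
print(url)
```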


## Preparing annual assessment data

Charlottesville's Open Data Portal : http://opendata.charlottesville.org/

Real Estate (All Assessments) Dataset : http://opendata.charlottesville.org/datasets/real-estate-all-assessments

* On the Real Estate dataset page, click the 'API Explorer' tab in the upper right corner of the window, under the map
* Copy the Query URL and modify it to match your query
* Use the modified URL to pull data directly from the Open Data Portal using the code below

```python
# Import the requests library
import requests

# Wrap each PIN in URL-encoded single quotes (%27) so the query treats it as a string
formatted_pins = [f'%27{x}%27' for x in df['PIN'].unique()]
# Split the PINs into two batches to keep each request under the server's limit
formatted_pins_1 = ','.join(formatted_pins[:75])
formatted_pins_2 = ','.join(formatted_pins[75:])

url1 = f"https://gisweb.charlottesville.org/arcgis/rest/services/OpenData_2/MapServer/2/query?where=UPPER(ParcelNumber)%20in%20({formatted_pins_1})%20&outFields=ParcelNumber,LandValue,ImprovementValue,TotalValue,TaxYear&outSR=4326&f=json"
url2 = f"https://gisweb.charlottesville.org/arcgis/rest/services/OpenData_2/MapServer/2/query?where=UPPER(ParcelNumber)%20in%20({formatted_pins_2})%20&outFields=ParcelNumber,LandValue,ImprovementValue,TotalValue,TaxYear&outSR=4326&f=json"

r1 = requests.get(url1)
r2 = requests.get(url2)

d1 = r1.json()
d2 = r2.json()
print(r1, r2)
```

```
<Response [200]> <Response [200]>
```


After testing `requests.get(url)`, I found that I can request up to 120 parcels at a time before receiving a 404 error. I have 126 parcels of interest, so I break the request into two parts, `[:75]` and `[75:]`.

In the future I plan to write a function that sizes each batch based on the number of results so that every response comes back with a status code below 400. This would eliminate the need to write out formatted_pins_1, formatted_pins_2, formatted_pins_3, and so on.
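One way to start on that function is sketched below. It is a hypothetical helper, not the author's code: it splits the unique PINs into batches of at most `batch_size` (75 here, below the roughly 120-parcel limit observed above) and builds one `where` clause per batch in the same `UPPER(ParcelNumber) in (...)` form used in `url1` and `url2`. It uses plain single quotes rather than `%27`, on the assumption that the clause is later URL-encoded (for example by passing it in `requests.get(..., params=...)`). The network requests are left out so the sketch runs offline:

```python
def chunked(items, batch_size):
    """Split a list into consecutive batches of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def pin_where_clauses(pins, batch_size=75):
    """Hypothetical helper: one ParcelNumber 'in' clause per batch, each PIN quoted as a string."""
    # Convert to str, drop duplicates, and keep the original order
    unique_pins = list(dict.fromkeys(str(p) for p in pins))
    clauses = []
    for batch in chunked(unique_pins, batch_size):
        quoted = ','.join(f"'{p}'" for p in batch)
        clauses.append(f"UPPER(ParcelNumber) in ({quoted})")
    return clauses


# Hypothetical usage with 126 made-up PINs: produces batches of 75 and 51
demo_pins = [f"{280000000 + i}" for i in range(126)]
clauses = pin_where_clauses(demo_pins)
print(len(clauses))  # 2
```

Each clause could then be sent as the `where` parameter of its own request and the resulting `features` lists concatenated; if a response status is 400 or above, the batch size could be lowered and that batch retried.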

Check the data type and structure of the response

```python
# Check the data type
print(type(d1))

print()
# Check the keys
print(d1.keys())

print()
# Check the first five entries under the 'features' key
print(d1['features'][:5])

print()
# Check the type of d1['features']
type(d1['features'])
```

```
<class 'dict'>

dict_keys(['displayFieldName', 'fieldAliases', 'fields', 'features'])

[{'attributes': {'ParcelNumber': '530056000', 'LandValue': 159400, 'ImprovementValue': 463500, 'TotalValue': 622900, 'TaxYear': '2019'}}, {'attributes': {'ParcelNumber': '530056000', 'LandValue': 155700, 'ImprovementValue': 437000, 'TotalValue': 592700, 'TaxYear': '2018'}}, {'attributes': {'ParcelNumber': '530056000', 'LandValue': 155700, 'ImprovementValue': 436549, 'TotalValue': 592249, 'TaxYear': '2017'}}, {'attributes': {'ParcelNumber': '530056000', 'LandValue': 94900, 'ImprovementValue': 379300, 'TotalValue': 474200, 'TaxYear': '2016'}}, {'attributes': {'ParcelNumber': '530056000', 'LandValue': 86300, 'ImprovementValue': 379300, 'TotalValue': 465600, 'TaxYear': '2015'}}]

list
```

## Create data series from the features and combine the data frames into a single df

```python
# Build one data frame per response from its list of features
df1 = pd.DataFrame(d1['features'])
df2 = pd.DataFrame(d2['features'])
print('.shape of df1, df2: ', df1.shape, df2.shape)
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows the same way
df1 = pd.concat([df1, df2], ignore_index=True)
print('.shape of df1 after appending df2: ', df1.shape)

print('\ndf1.head(): \n', df1.head())

print('\ndf1.keys(): ', df1.keys())

print('\ntype(df1["attributes"]): ', type(df1['attributes']))
```

```
.shape of df1, df2:  (1683, 1) (989, 1)
.shape of df1 after appending df2:  (2672, 1)

df1.head(): 
                                           attributes
0  {'ParcelNumber': '530056000', 'LandValue': 159...
1  {'ParcelNumber': '530056000', 'LandValue': 155...
2  {'ParcelNumber': '530056000', 'LandValue': 155...
3  {'ParcelNumber': '530056000', 'LandValue': 949...
4  {'ParcelNumber': '530056000', 'LandValue': 863...

df1.keys():  Index(['attributes'], dtype='object')

type(df1["attributes"]):  <class 'pandas.core.series.Series'>
```


## Create a single data frame based on combined series data

```python
# Expand each 'attributes' dict into its own row of columns
assessments = pd.DataFrame([x for x in df1['attributes']])
assessments.head()
```

| | ImprovementValue | LandValue | ParcelNumber | TaxYear | TotalValue |
|---|---|---|---|---|---|
| 0 | 463500 | 159400 | 530056000 | 2019 | 622900 |
| 1 | 437000 | 155700 | 530056000 | 2018 | 592700 |
| 2 | 436549 | 155700 | 530056000 | 2017 | 592249 |
| 3 | 379300 | 94900 | 530056000 | 2016 | 474200 |
| 4 | 379300 | 86300 | 530056000 | 2015 | 465600 |
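Because every feature is a dict with a single `attributes` key (as the features check above shows), the two previous steps can likely be collapsed into one by pulling `attributes` out of each feature before building the frame. A minimal sketch, using two responses shaped like `d1` and `d2` whose values are copied from the features printed earlier:

```python
import pandas as pd

# Two small responses shaped like d1 and d2 above (values from the printed features)
d1 = {'features': [{'attributes': {'ParcelNumber': '530056000', 'LandValue': 159400,
                                   'ImprovementValue': 463500, 'TotalValue': 622900,
                                   'TaxYear': '2019'}}]}
d2 = {'features': [{'attributes': {'ParcelNumber': '530056000', 'LandValue': 155700,
                                   'ImprovementValue': 437000, 'TotalValue': 592700,
                                   'TaxYear': '2018'}}]}

# Flatten the 'attributes' dict of every feature from both responses into one frame
assessments = pd.DataFrame(
    [feature['attributes'] for d in (d1, d2) for feature in d['features']]
)
print(assessments.shape)  # (2, 5)
```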


```python
print('assessments.keys():\n', assessments.keys())

print('\nassessments.shape: ', assessments.shape)

print('\nassessments.isnull().any():\n', assessments.isnull().any())
```

```
assessments.keys():
 Index(['ImprovementValue', 'LandValue', 'ParcelNumber', 'TaxYear',
       'TotalValue'],
      dtype='object')

assessments.shape:  (2672, 5)

assessments.isnull().any():
 ImprovementValue    False
LandValue           False
ParcelNumber        False
TaxYear             False
TotalValue          False
dtype: bool
```
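The features printed earlier show `ParcelNumber` and `TaxYear` arriving as strings (for example `'TaxYear': '2019'`), while the value fields are integers. Comparing tax years as strings happens to work for four-digit years, but converting `TaxYear` to an integer is safer before doing arithmetic with it. A minimal sketch, assuming the column names shown above; `ParcelNumber` stays a string because some PINs, such as `2800371C0`, contain letters. The sample rows are the 2019 and 2018 values for parcel `530056000` from the output above:

```python
import pandas as pd

# Sample rows shaped like the assessments frame above
assessments = pd.DataFrame({
    'ImprovementValue': [463500, 437000],
    'LandValue': [159400, 155700],
    'ParcelNumber': ['530056000', '530056000'],
    'TaxYear': ['2019', '2018'],
    'TotalValue': [622900, 592700],
})

# Convert tax years to integers; keep ParcelNumber as text since PINs can contain letters
assessments['TaxYear'] = assessments['TaxYear'].astype(int)
assessments['ParcelNumber'] = assessments['ParcelNumber'].astype(str)

# Each parcel should have at most one assessment per tax year
has_duplicates = assessments.duplicated(subset=['ParcelNumber', 'TaxYear']).any()
print(assessments.dtypes)
print(has_duplicates)  # False
```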


Create a data frame from df that holds only PIN and GPIN. It will be merged with assessments to attach a GPIN to each assessment row, so that parcel area (queried by GPIN above) can then be joined to the assessments.

```python
# Key table linking each PIN to its GPIN
df_key = df[['PIN','GPIN']]
df_key.head()
```

| | PIN | GPIN |
|---|---|---|
| 0 | 280036300 | 7309 |
| 1 | 330245100 | 14744 |
| 2 | 330262000 | 6656 |
| 3 | 530065300 | 7426 |
| 4 | 280020000 | 7021 |


```python
# Attach GPIN to each assessment row by matching ParcelNumber to PIN
t = pd.merge(assessments, df_key, how='inner', left_on=['ParcelNumber'], right_on=['PIN'])
t.head()
```

| | ImprovementValue | LandValue | ParcelNumber | TaxYear | TotalValue | PIN | GPIN |
|---|---|---|---|---|---|---|---|
| 0 | 693300 | 260900 | 2800371C0 | 2019 | 954200 | 2800371C0 | 7330 |
| 1 | 654000 | 254900 | 2800371C0 | 2018 | 908900 | 2800371C0 | 7330 |
| 2 | 647565 | 254900 | 2800371C0 | 2017 | 902465 | 2800371C0 | 7330 |
| 3 | 333600 | 123200 | 2800371C0 | 2016 | 456800 | 2800371C0 | 7330 |
| 4 | 333600 | 112000 | 2800371C0 | 2015 | 445600 | 2800371C0 | 7330 |
| 5 | 333600 | 89600 | 2800371C0 | 2014 | 423200 | 2800371C0 | 7330 |
| 6 | 362700 | 60500 | 2800371C0 | 2013 | 423200 | 2800371C0 | 7330 |
| 7 | 362700 | 60500 | 2800371C0 | 2012 | 423200 | 2800371C0 | 7330 |
| 8 | 362700 | 60500 | 2800371C0 | 2011 | 423200 | 2800371C0 | 7330 |
| 9 | 362700 | 60500 | 2800371C0 | 2010 | 423200 | 2800371C0 | 7330 |
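The merged rows start with `2800371C0`, even though the first row of `assessments` is `530056000` and an inner merge keeps the order of the left keys. One likely cause (an assumption the output does not confirm) is a type mismatch: `pd.read_excel` may read purely numeric PINs such as `280036300` as integers, while the API returns every `ParcelNumber` as a string, so only alphanumeric PINs like `2800371C0` match. A minimal sketch of normalizing the key to strings before merging, on small hypothetical frames (the GPIN `1111` is a placeholder):

```python
import pandas as pd

# Left side: ParcelNumber values come back from the API as strings
assessments = pd.DataFrame({
    'ParcelNumber': ['530056000', '2800371C0'],
    'TotalValue': [622900, 954200],
})
# Hypothetical key table as read from Excel: numeric PINs are ints, others stay str
# (GPIN 1111 is a placeholder; 7330 is the GPIN shown for 2800371C0 above)
df_key = pd.DataFrame({'PIN': [530056000, '2800371C0'], 'GPIN': [1111, 7330]})

# Without normalization, the int 530056000 never equals the string '530056000'
before = pd.merge(assessments, df_key, how='inner', left_on='ParcelNumber', right_on='PIN')

# Cast PIN to str so both sides compare as text (assumes numeric PINs are ints, not floats)
df_key = df_key.assign(PIN=df_key['PIN'].astype(str))
after = pd.merge(assessments, df_key, how='inner', left_on='ParcelNumber', right_on='PIN')

print(len(before), len(after))  # 1 2
```

If the PINs were read as floats instead, `astype(str)` would produce values like `'280036300.0'`, so it is worth checking `df['PIN'].map(type).value_counts()` before choosing a conversion.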


```python
# Summary statistics for each assessment field
print("\nassessments['ParcelNumber'].describe()\n", assessments['ParcelNumber'].describe())

print("\nassessments['TaxYear'].describe()\n", assessments['TaxYear'].describe())

print("\nassessments['TaxYear'].min(), assessments['TaxYear'].max()\n", assessments['TaxYear'].min(), assessments['TaxYear'].max())

print("\nassessments['ImprovementValue'].describe()\n", assessments['ImprovementValue'].describe())

print("\nassessments['LandValue'].describe()\n", assessments['LandValue'].describe())

print("\nassessments['TotalValue'].describe()\n", assessments['TotalValue'].describe())
```

```python
# Summary statistics for the earliest tax year
taxyearmin = assessments['TaxYear'] == assessments['TaxYear'].min()
assessments[taxyearmin].describe()
```

```python
# Summary statistics for the latest tax year
taxyearmax = assessments['TaxYear'] == assessments['TaxYear'].max()
assessments[taxyearmax].describe()
```

```python
# assessments[taxyearmax].describe() - assessments[taxyearmin].describe()
```
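Subtracting the two `describe()` tables compares the summary statistics of the latest and earliest tax years. A per-parcel comparison may be more direct for the goal of describing assessed value over time. A minimal sketch, assuming `assessments` has the columns shown earlier and `TaxYear` values that sort correctly (four-digit strings or integers); the sample rows are the 2015 to 2019 values for parcel `530056000` printed above:

```python
import pandas as pd

# Sample rows: parcel 530056000, tax years 2015-2019, from the output above
assessments = pd.DataFrame({
    'ParcelNumber': ['530056000'] * 5,
    'TaxYear': ['2019', '2018', '2017', '2016', '2015'],
    'TotalValue': [622900, 592700, 592249, 474200, 465600],
})

# Sort so that first/last within each parcel mean earliest/latest tax year
ordered = assessments.sort_values(['ParcelNumber', 'TaxYear'])
grouped = ordered.groupby('ParcelNumber')

change = pd.DataFrame({
    'first_year': grouped['TaxYear'].first(),
    'last_year': grouped['TaxYear'].last(),
    'first_value': grouped['TotalValue'].first(),
    'last_value': grouped['TotalValue'].last(),
})
change['abs_change'] = change['last_value'] - change['first_value']
change['pct_change'] = 100 * change['abs_change'] / change['first_value']
print(change)
```

Sorting first keeps the result correct even though the API returns each parcel's years newest-first.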

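```python
import folium

print(parcel_area_url)

# style_function was not defined anywhere in this notebook; a minimal placeholder style
def style_function(feature):
    return {'color': '#3366cc', 'weight': 1, 'fillOpacity': 0.3}

# The query above asks for Esri JSON (f=json), but folium.GeoJson expects GeoJSON,
# so request f=geojson instead (assumes the ArcGIS server supports that format)
parcel_geojson_url = parcel_area_url.replace('f=json', 'f=geojson')

# 'Stamen Terrain' tiles may no longer load without a Stadia Maps API key; use OpenStreetMap
m = folium.Map(location=[38.0309, -78.4804], tiles='OpenStreetMap', zoom_start=17)
folium.GeoJson(parcel_geojson_url, name='Parcels', style_function=style_function).add_to(m)
folium.LayerControl().add_to(m)
m
```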

# Join df with assessments

# Join Parcel Area with assessments

# Plot each parcel's total, land, and improvement values across all years on 3 line graphs, one for each assessment type
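As a starting point for those line graphs, the sketch below (one possible approach, not the author's code) reshapes `assessments` so that each parcel becomes one column indexed by tax year, with one table per assessment type. With matplotlib installed, calling `.plot()` on each table would draw one line per parcel. The sample values are the 2018 and 2019 assessments for parcels `530056000` and `2800371C0` from the outputs above:

```python
import pandas as pd

# Sample rows from the outputs above (two parcels, tax years 2018-2019)
assessments = pd.DataFrame({
    'ParcelNumber': ['530056000', '530056000', '2800371C0', '2800371C0'],
    'TaxYear': ['2019', '2018', '2019', '2018'],
    'LandValue': [159400, 155700, 260900, 254900],
    'ImprovementValue': [463500, 437000, 693300, 654000],
    'TotalValue': [622900, 592700, 954200, 908900],
})

# One wide table per assessment type: rows are tax years, columns are parcels
wide = {
    value: assessments.pivot(index='TaxYear', columns='ParcelNumber', values=value).sort_index()
    for value in ['TotalValue', 'LandValue', 'ImprovementValue']
}

print(wide['TotalValue'])
# With matplotlib installed, each table can be drawn as its own line graph, e.g.:
# wide['TotalValue'].plot(title='TotalValue by tax year')
```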

# Map the same data as above via folium
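For mapping the values, folium's `style_function` receives each GeoJSON feature and returns a style dict. A hypothetical sketch: `make_value_style` builds a style function from a GPIN-to-value lookup and shades each parcel by the value bin it falls into. It assumes each parcel feature carries a `GPIN` property (as the parcel area query's `where` clause implies) whose type matches the lookup keys; the bin edges and colors are made up. The function is plain Python, so it can be checked without drawing a map:

```python
from bisect import bisect_right


def make_value_style(value_by_gpin, bin_edges, colors, missing_color='#cccccc'):
    """Hypothetical helper: return a folium-style function that colors parcels by value bin."""
    if len(colors) != len(bin_edges) + 1:
        raise ValueError("need exactly one more color than bin edges")

    def style_function(feature):
        gpin = feature['properties'].get('GPIN')
        value = value_by_gpin.get(gpin)
        # bisect_right counts the edges <= value, which is the index of the value's bin
        fill = missing_color if value is None else colors[bisect_right(bin_edges, value)]
        return {'fillColor': fill, 'color': '#333333', 'weight': 1, 'fillOpacity': 0.6}

    return style_function


# Example lookup: GPIN 7330 (2019 total from the merge output), GPIN 7309 (2019 value from the .xls)
style = make_value_style(
    {7330: 954200, 7309: 204100},
    bin_edges=[500_000, 1_000_000],
    colors=['#fee8c8', '#fdbb84', '#e34a33'],
)
print(style({'properties': {'GPIN': 7330}})['fillColor'])  # prints #fdbb84
```

The returned function could then be passed as the `style_function` argument of `folium.GeoJson`.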

# Create time lapse of maps