# Mapping the assessed value of properties on the downtown mall over time


## Obtaining Data
 
### Determining which properties to include in the analysis

* Charlottesville GIS Viewer: https://gisweb.charlottesville.org/GisViewer/

* Under the 'Map' option, turn on 'Parcels & Buildings' > 'Parcels' and turn everything else off.

* Zoom to the area of interest on the map.

* Under 'Tools', select 'Identify'.

* In the 'Identify' toolbar, select 'Custom Shape', and under 'Layer' select 'Parcels'.

* Using the mouse, click on the map to draw a boundary around the area of interest.

* A list of results will appear in the left panel of the web page.

* In the panel, click 'Tools' > 'Export All to Excel'.

* A window named 'Export Results' will open when your download is ready.

* Click 'View Export' and save the file to your project directory.


<img src="getting_pin_list.png">

## Looking at the .xls retrieved from the GIS Viewer

```python
# Import the pandas module
import pandas as pd

# Path to the .xls file retrieved from the GIS Viewer
file = r'/home/bob/projects/assessments/pin_exp.xls'

# Read the .xls file into a DataFrame
df = pd.read_excel(file)
```

### Let's look at the first 5 rows of our .xls

```python
# Print the first 5 rows
print(df.head())
```

```
                  FullAddress  OBJECTID        PIN   GPIN ParcelNumber  \
0                0 3RD ST SE   24846953  280036300   7309    280036300   
1              0 E MARKET ST   24848655  330245100  14744    330245100   
2  0 W MARKET ST & 2ND ST NW   24841086  330262000   6656    330262000   
3              100 5TH ST SE   24845005  530065300   7426    530065300   
4              100 E MAIN ST   24839773  280020000   7021    280020000   

                                     OwnerName  CurrentAssessedValue  \
0                            LITTLE MOOSE, LLC                204100   
1           FIRST AND MAIN CHARLOTTESVILLE LLC               1122200   
2  SPENCER, HAWES, ETAL, TR PROTICO PROP LD TR                418400   
3           MAIN, RALPH TR OF BLACK DUCK LD TR                664000   
4               ONE HUNDRED EAST MAIN LTD PART               1904200   

  CurrentTaxYear CurrentAssessedValueWithLabel  \
0    2019 Value:          2019 Value:  204,100   
1    2019 Value:      
```

We can quickly see what this .xls sheet provides and the general format of its data. The `[5 rows x 23 columns]` line that pandas prints at the end of the `.head()` output tells us the sheet has 23 columns.
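Rather than reading the dimensions off the printed footer, `df.shape` and `df.columns` report them directly. The sketch below uses a small stand-in DataFrame built from a few of the columns and rows shown above, so it runs without the .xls file; with the real export you would use the same attributes on `df`.

```python
import pandas as pd

# Stand-in for the GIS Viewer export: a few columns and rows copied from the
# df.head() output above (the real file has 23 columns)
sample = pd.DataFrame({
    'FullAddress': ['0 3RD ST SE', '0 E MARKET ST', '100 5TH ST SE'],
    'PIN': [280036300, 330245100, 530065300],
    'OwnerName': ['LITTLE MOOSE, LLC', 'FIRST AND MAIN CHARLOTTESVILLE LLC',
                  'MAIN, RALPH TR OF BLACK DUCK LD TR'],
    'CurrentAssessedValue': [204100, 1122200, 664000],
})

# shape is a (rows, columns) tuple
n_rows, n_cols = sample.shape
print(f"{n_rows} rows x {n_cols} columns")

# columns holds the column names in order
print(list(sample.columns))
```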

Let's take a closer look at the `PIN` column.

```python
# Print just the 'PIN' column of our DataFrame
print(df['PIN'])
# Print the shape of the 'PIN' column: (number of rows,)
print(df['PIN'].shape)
```

```
0      280036300
1      330245100
2      330262000
3      530065300
4      280020000
5      280019000
6      280020100
7      330255000
8      330248000
9      530065400
10     280021000
11     330244000
12     280051A00
13     530065500
14     280022000
15     330232000
16     330256000
17     530065600
18     280051B00
19     330241000
20     280016100
21     530058000
22     530065700
23     280026100
24     330258000
25     280013000
26     530057000
27     330242000
28     280023000
29     330278000
         ...    
177    530072000
178    530072000
179    530072000
180    530072000
181    530072000
182    530072000
183    530072000
184    530072000
185    530072000
186    530072000
187    530072000
188    530072000
189    530072000
190    530072000
191    530072000
192    530072000
193    530072000
194    530072000
195    530072000
196    530072000
197    280016000
198    280016000
199    280016000
200    280016000
201    280016000
202    280016000
203    280016000
204    2800160
```

```python
# The DataFrame appears to contain every PIN for the area, but some PINs are duplicated.
# Calling .unique() on the same column isolates the unique values.
print(df['PIN'].unique())
print(df['PIN'].unique().shape)
```

```
[280036300 330245100 330262000 530065300 280020000 280019000 280020100
 330255000 330248000 530065400 280021000 330244000 '280051A00' 530065500
 280022000 330232000 330256000 530065600 '280051B00' 330241000 280016100
 530058000 530065700 280026100 330258000 280013000 530057000 330242000
 280023000 330278000 330225000 280018000 330254000 530056000 330224000
 330259000 330222000 280024000 330265000 280028000 330260000 280025000
 330245000 530160000 280026000 330261000 330219000 280027000 330243000
 280010000 330263000 330240100 280031000 280012000 330266000 330240000
 330238000 330270000 280034000 330271000 330268000 280035000 '330155L00'
 330269000 280036000 330237000 280036200 330276000 330272000 330277000
 330235000 '2800371C0' 330234000 330273000 330233000 330274000 280001000
 330155300 330155100 280040000 330231000 330223000 330230000 330220000
 280041000 330229000 280042000 330228000 330227000 280043000 330226000
 280044000 280045000 530059000 280058000 280046000 280047000 28004800
```
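To quantify the duplicates instead of eyeballing them, `nunique()` gives the number of distinct PINs, `value_counts()` shows how many rows share each PIN, and `duplicated()` flags every repeat after the first. The sketch uses a short hypothetical Series modeled on the output above (repeated `530072000` and `280016000` rows, plus one alphanumeric PIN); on the real data you would call the same methods on `df['PIN']`.

```python
import pandas as pd

# Hypothetical sample modeled on df['PIN'] above: some PINs repeat
pins = pd.Series([280036300, 330245100, 530072000, 530072000, 530072000,
                  280016000, 280016000, '280051A00'])

# Number of distinct PINs
print(pins.nunique())

# How many rows share each PIN; keep only the PINs that appear more than once
counts = pins.value_counts()
print(counts[counts > 1])

# Count of rows whose PIN already appeared earlier in the Series
print(pins.duplicated().sum())
```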

Now we have a list of unique PINs from which to build our desired assessments-over-time DataFrame. Let's store it in its own DataFrame so we can easily reuse it later.

```python
unique_pins_df = pd.DataFrame(df['PIN'].unique())
print(unique_pins_df)
```

```
             0
0    280036300
1    330245100
2    330262000
3    530065300
4    280020000
5    280019000
6    280020100
7    330255000
8    330248000
9    530065400
10   280021000
11   330244000
12   280051A00
13   530065500
14   280022000
15   330232000
16   330256000
17   530065600
18   280051B00
19   330241000
20   280016100
21   530058000
22   530065700
23   280026100
24   330258000
25   280013000
26   530057000
27   330242000
28   280023000
29   330278000
..         ...
96   280047000
97   280048000
98   530060000
99   280049000
100  530054000
101  530061000
102  280050000
103  530062000
104  280051000
105  280052000
106  530065200
107  530065000
108  530067000
109  530074000
110  530068000
111  530075000
112  530070000
113  530077A00
114  530077B00
115  530077C00
116  530080000
117  530164000
118  330257100
119  530055000
120  530054101
121  530091000
122  280037A00
123  530064000
124  530072000
125  280016000

[126 rows x 1 columns]
```
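The `unique()` output above mixes integers (e.g. `280036300`) with strings (e.g. `'280051A00'`), likely because purely numeric PINs were read from the spreadsheet as numbers. The assessment data retrieved in the next section stores `ParcelNumber` as strings such as `'010001000'`, so converting every PIN to a string first avoids silent mismatches when joining. This sketch assumes every PIN is 9 characters long (all of the ones shown are) and left-pads numeric PINs with zeros; the width is an assumption to verify against your data, and the `10001000` entry is a hypothetical example of a PIN that lost its leading zero.

```python
import pandas as pd

# Hypothetical sample: numeric-looking PINs were read as int, the rest as str
pins = pd.Series([280036300, 330245100, '280051A00', '2800371C0', 10001000])

# ASSUMPTION: parcel numbers are 9-character strings, so numeric PINs that
# lost leading zeros are left-padded back to 9 characters
PIN_WIDTH = 9

normalized = pins.astype(str).str.strip().str.zfill(PIN_WIDTH)
print(normalized.tolist())
```

Applied to the real data, the same chain would be `unique_pins_df[0].astype(str).str.strip().str.zfill(9)`.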


## Obtaining annual assessment data

Charlottesville's Open Data Portal: http://opendata.charlottesville.org/

Real Estate (All Assessments) dataset: http://opendata.charlottesville.org/datasets/real-estate-all-assessments

* On the Real Estate dataset page, click the 'APIs' drop-down in the upper-right corner, under the map.
* Copy the GeoJSON link.
* Use the GeoJSON link to pull the data directly from the Open Data Portal with the code below.

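```python
# Import the requests library
import requests

# API endpoint, copied from the 'APIs' drop-down under GeoJSON
url = r'https://opendata.arcgis.com/datasets/b993cd4e2e1b4ba097fb58c90725f5da_2.geojson'

# Alternative ArcGIS REST query endpoint for what appears to be the same layer.
# With f=json it returns Esri JSON ('attributes' instead of 'properties'),
# so the GeoJSON parsing below would need adjusting if you switch to it.
# url = r'https://gisweb.charlottesville.org/arcgis/rest/services/OpenData_2/MapServer/2/query?where=1%3D1&outFields=*&outSR=4326&f=json'

# Send a GET request and stop early if the server returns an error status
r = requests.get(url)
r.raise_for_status()

# Parse the JSON body into a Python dict
data = r.json()
```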

Check the data type:

```python
print(type(data))
```

```
<class 'dict'>
```

Check the length of the `data` dictionary:

```python
print(len(data))
```

```
2
```

Print the keys in `data`:

```python
print(data.keys())
```

```
dict_keys(['type', 'features'])
```

`data` has two keys: `'type'` and `'features'`.

Print the value of the first key, `'type'`:

```python
print(data["type"])
```

```
FeatureCollection
```


Next, we want to look at the second key, `'features'`, but printing it outputs every feature returned by the API. Because of its size, this routinely crashed my computer.

Instead, we can find out how many records there are by printing the length of `data['features']`:

```python
print(len(data['features']))
```

```
331443
```
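Rather than printing all 331,443 features, we can summarize them. For example, `collections.Counter` can tally how many records exist for each `TaxYear` by reading only that property from each feature. The sketch runs on a small hand-made list shaped like `data['features']`; the `010001000` records follow the output printed below, while the `280036300` record is hypothetical. On the real response you would pass `data['features']` instead.

```python
from collections import Counter

# Hand-made stand-in for data['features'], using the GeoJSON feature layout
features = [
    {'type': 'Feature', 'properties': {'ParcelNumber': '010001000', 'TaxYear': '2019'}, 'geometry': None},
    {'type': 'Feature', 'properties': {'ParcelNumber': '010001000', 'TaxYear': '2018'}, 'geometry': None},
    {'type': 'Feature', 'properties': {'ParcelNumber': '010001000', 'TaxYear': '2017'}, 'geometry': None},
    # Hypothetical second parcel, for illustration only
    {'type': 'Feature', 'properties': {'ParcelNumber': '280036300', 'TaxYear': '2019'}, 'geometry': None},
]

def count_by_year(features):
    """Tally records per TaxYear without printing the whole feature list."""
    return Counter(f['properties'].get('TaxYear') for f in features)

year_counts = count_by_year(features)
for year, n in sorted(year_counts.items()):
    print(year, n)
```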


Similar to using `.head()` on our DataFrame earlier, let's look at the first 5 entries of `data['features']`:

```python
print(data["features"][:5])
```

```
[{'type': 'Feature', 'properties': {'RecordID_Int': 1, 'ParcelNumber': '010001000', 'LandValue': 29172900, 'ImprovementValue': 149711300, 'TotalValue': 178884200, 'TaxYear': '2019', 'StreetName': 'EMMET ST N', 'StreetNumber': '1117', 'Unit': ''}, 'geometry': None}, {'type': 'Feature', 'properties': {'RecordID_Int': 2, 'ParcelNumber': '010001000', 'LandValue': 25005400, 'ImprovementValue': 146802400, 'TotalValue': 171807800, 'TaxYear': '2018', 'StreetName': 'EMMET ST N', 'StreetNumber': '1117', 'Unit': ''}, 'geometry': None}, {'type': 'Feature', 'properties': {'RecordID_Int': 3, 'ParcelNumber': '010001000', 'LandValue': 24449500, 'ImprovementValue': 142363700, 'TotalValue': 166813200, 'TaxYear': '2017', 'StreetName': 'EMMET ST N', 'StreetNumber': '1117', 'Unit': ''}, 'geometry': None}, {'type': 'Feature', 'properties': {'RecordID_Int': 4, 'ParcelNumber': '010001000', 'LandValue': 22848500, 'ImprovementValue': 100504900, 'TotalValue': 123353400, 'TaxYear': '2016', 'StreetName': 'EMMET ST
```
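The raw list is hard to scan. Pretty-printing a single feature with `json.dumps(..., indent=2)` puts one key per line, which makes the structure (`type`, `properties`, `geometry`) easy to read. The feature below is the first record copied from the output above.

```python
import json

# First record from data['features'], copied from the output above
feature = {'type': 'Feature',
           'properties': {'RecordID_Int': 1, 'ParcelNumber': '010001000',
                          'LandValue': 29172900, 'ImprovementValue': 149711300,
                          'TotalValue': 178884200, 'TaxYear': '2019',
                          'StreetName': 'EMMET ST N', 'StreetNumber': '1117', 'Unit': ''},
           'geometry': None}

# indent=2 prints one key per line; Python None is rendered as JSON null
pretty = json.dumps(feature, indent=2)
print(pretty)
```

On the live response, `json.dumps(data['features'][0], indent=2)` does the same for the first feature.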

Looking at this output, a few of the keys in each feature's `properties` are of particular interest.

```python
# Key values we want to look at: ParcelNumber, TaxYear, LandValue, and ImprovementValue
keys_to_pull = ['ParcelNumber', 'TaxYear', 'LandValue', 'ImprovementValue']

# Loop over the first five features in data['features']
for feature in data['features'][:5]:
    # Print each key of interest alongside its value
    for key_to_pull in keys_to_pull:
        print(f"{key_to_pull}: {feature['properties'].get(key_to_pull)}")
    # Blank line between records
    print()
```

```
ParcelNumber: 010001000
TaxYear: 2019
LandValue: 29172900
ImprovementValue: 149711300

ParcelNumber: 010001000
TaxYear: 2018
LandValue: 25005400
ImprovementValue: 146802400

ParcelNumber: 010001000
TaxYear: 2017
LandValue: 24449500
ImprovementValue: 142363700

ParcelNumber: 010001000
TaxYear: 2016
LandValue: 22848500
ImprovementValue: 100504900

ParcelNumber: 010001000
TaxYear: 2015
LandValue: 22848500
ImprovementValue: 98390000
```
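Printing key by key works for five records; for the full dataset it is more convenient to collect the same keys into a pandas DataFrame, one row per feature. The sketch below does this for the first three records shown above and also keeps `TotalValue`, which lets us check that, for these records at least, `TotalValue` equals `LandValue + ImprovementValue`. Converting `TaxYear` from string to integer is a choice made here so that years sort numerically.

```python
import pandas as pd

# The first three records printed above, in the same GeoJSON feature layout
features = [
    {'type': 'Feature', 'geometry': None, 'properties': {
        'ParcelNumber': '010001000', 'TaxYear': '2019',
        'LandValue': 29172900, 'ImprovementValue': 149711300, 'TotalValue': 178884200}},
    {'type': 'Feature', 'geometry': None, 'properties': {
        'ParcelNumber': '010001000', 'TaxYear': '2018',
        'LandValue': 25005400, 'ImprovementValue': 146802400, 'TotalValue': 171807800}},
    {'type': 'Feature', 'geometry': None, 'properties': {
        'ParcelNumber': '010001000', 'TaxYear': '2017',
        'LandValue': 24449500, 'ImprovementValue': 142363700, 'TotalValue': 166813200}},
]

keys_to_pull = ['ParcelNumber', 'TaxYear', 'LandValue', 'ImprovementValue', 'TotalValue']

# One row per feature, keeping only the keys of interest
assessments = pd.DataFrame(
    [{key: f['properties'].get(key) for key in keys_to_pull} for f in features]
)

# TaxYear arrives as a string; make it an integer so years sort numerically
assessments['TaxYear'] = assessments['TaxYear'].astype(int)

# Sanity check on these records: TotalValue == LandValue + ImprovementValue
consistent = (assessments['LandValue'] + assessments['ImprovementValue']
              == assessments['TotalValue']).all()

print(assessments)
print(consistent)
```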



Looking at the historic assessment records in the GeoJSON, we can see a lot of similarities with the .xls file we pulled previously. A <i>key</i> match-up is that both the .xls data and the GeoJSON data have a field in common: the .xls has a `PIN` field and the GeoJSON has a `ParcelNumber` field. We should be able to use this common key to join the two tables, provided both sides hold the same type: the `unique()` output above mixes integers and strings, while `ParcelNumber` values are strings such as `'010001000'`.
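A join on differently named key columns can be sketched with `DataFrame.merge`, using `left_on` and `right_on`. The example assumes the PINs have already been converted to 9-character strings; mixing integer and string keys would either raise an error or fail to match. The values are stand-ins: the two downtown totals are the 2019 `CurrentAssessedValue` figures from the .xls, and the `010001000` row is the 2019 `TotalValue` from the GeoJSON output.

```python
import pandas as pd

# Unique downtown PINs, already normalized to strings (subset for illustration)
pins = pd.DataFrame({'PIN': ['280036300', '330245100']})

# Assessment records keyed by ParcelNumber (stand-in values, see lead-in)
assessments = pd.DataFrame({
    'ParcelNumber': ['280036300', '330245100', '010001000'],
    'TaxYear': [2019, 2019, 2019],
    'TotalValue': [204100, 1122200, 178884200],
})

# Inner join on PIN == ParcelNumber keeps only records for parcels in our PIN list
merged = pins.merge(assessments, left_on='PIN', right_on='ParcelNumber', how='inner')
print(merged)
```

An inner join drops every assessment record whose parcel is not in the downtown PIN list, which is exactly the filtering we want; `how='left'` would instead keep every PIN and show missing assessments as `NaN`.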

331,443 features is a lot of data to work with. Next, we will figure out how to pull only the parcels listed in our .xls from the API.
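One option that needs no extra request is to filter the features already downloaded, keeping only those whose `ParcelNumber` appears in our PIN list. Storing the PINs in a `set` keeps each membership check constant-time on average, which matters with 331,443 features. The PIN subset and the two downtown records below are hypothetical; `filter_features` is an illustrative helper, not part of any library.

```python
# Hypothetical subset of the normalized PIN strings from the .xls
downtown_pins = {'280036300', '330245100'}

# Stand-in for data['features'] (only the fields needed here)
features = [
    {'properties': {'ParcelNumber': '010001000', 'TaxYear': '2019'}},
    {'properties': {'ParcelNumber': '280036300', 'TaxYear': '2019'}},  # hypothetical
    {'properties': {'ParcelNumber': '330245100', 'TaxYear': '2018'}},  # hypothetical
]

def filter_features(features, pins):
    """Keep only features whose ParcelNumber is in the given set of PINs."""
    return [f for f in features if f['properties'].get('ParcelNumber') in pins]

downtown = filter_features(features, downtown_pins)
print(len(downtown))
```

This still downloads the full dataset once; the trade-off is simplicity versus bandwidth.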

```python
import requests

# ArcGIS REST query endpoint for the same data (returns Esri JSON because f=json)
url = r'https://gisweb.charlottesville.org/arcgis/rest/services/OpenData_2/MapServer/2/query?where=1%3D1&outFields=*&outSR=4326&f=json'

# Send a GET request and parse the JSON response
r = requests.get(url)
r.raise_for_status()
data = r.json()

# Experiment: requests encodes a list of tuples as repeated query-string keys,
# e.g. ?key1=value1&key1=value2 (httpbin.org/get echoes the request back)
payload_tuples = [('key1', 'value1'), ('key1', 'value2')]
r1 = requests.get('https://httpbin.org/get', params=payload_tuples)
print(r1.url)
```
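A server-side alternative is to replace `where=1=1` in the REST query with a SQL `IN` clause listing our PINs, and let `requests` URL-encode it via `params=`. This is a sketch under assumptions: that the layer's field is named `ParcelNumber` (as in the GeoJSON properties), and that the response uses Esri JSON, where each feature's fields sit under `'attributes'`. `build_where`, `build_params`, and `fetch_assessments` are hypothetical helper names.

```python
import requests

# Query endpoint from the URL above, without its query string
QUERY_URL = ('https://gisweb.charlottesville.org/arcgis/rest/services/'
             'OpenData_2/MapServer/2/query')

def build_where(pins):
    """SQL IN clause; ASSUMES the layer's parcel field is named ParcelNumber."""
    # Double any single quotes so each PIN is a valid SQL string literal
    quoted = ", ".join("'" + p.replace("'", "''") + "'" for p in sorted(pins))
    return f"ParcelNumber IN ({quoted})"

def build_params(pins):
    """Query parameters mirroring the original URL, with a narrower where clause."""
    return {'where': build_where(pins), 'outFields': '*', 'outSR': 4326, 'f': 'json'}

def fetch_assessments(pins, timeout=60):
    """Request only the given parcels; with f=json, fields live under 'attributes'."""
    r = requests.get(QUERY_URL, params=build_params(pins), timeout=timeout)
    r.raise_for_status()
    return r.json().get('features', [])

params = build_params({'330245100', '280036300'})
print(params['where'])
```

ArcGIS services typically cap the number of records per response, so with 126 PINs and several tax years each, the PIN list may need to be split into batches and the results concatenated.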