# Convert a pandas dataframe to GeoJSON for web mapping

In [1]:
import json
from urllib.request import urlopen

import pandas as pd

First, download the data from the City of Berkeley's API. Socrata's $limit parameter sets how many rows to fetch; without it, the API returns its default of 1,000 rows: https://dev.socrata.com/docs/paging.html

In [2]:
# API endpoint for the City of Berkeley's 311 calls (up to 5,000 rows)
endpoint_url = 'https://data.cityofberkeley.info/resource/k489-uv4i.json?$limit=5000'

In [3]:
# open a connection to the URL and download the raw response as bytes
with urlopen(endpoint_url) as connection:
    results = connection.read()

# decode the bytes and parse the JSON text into a list of Python dicts
data = json.loads(results.decode('utf-8'))
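The $limit=5000 in the URL above fetches everything in a single request. For larger pulls, Socrata's paging documentation describes combining $limit with $offset, plus an $order clause so that consecutive pages don't overlap or skip rows. The sketch below only builds the page URLs and makes no network calls; paged_urls and BASE_URL are hypothetical names, and ordering by Socrata's :id system field is an assumption you should confirm against the paging docs for this dataset.

```python
from urllib.parse import urlencode

# Resource URL from the endpoint above, without its query string.
BASE_URL = 'https://data.cityofberkeley.info/resource/k489-uv4i.json'


def paged_urls(base_url, total_rows, page_size=1000):
    """Hypothetical helper: one URL per page using Socrata's $limit and $offset.

    Assumes ordering by the :id system field keeps pages stable between requests.
    """
    urls = []
    for offset in range(0, total_rows, page_size):
        params = {'$limit': page_size, '$offset': offset, '$order': ':id'}
        # leave '$' and ':' unescaped so the URL stays readable
        urls.append(base_url + '?' + urlencode(params, safe='$:'))
    return urls


for url in paged_urls(BASE_URL, total_rows=2500):
    print(url)
```

Each URL could then be fetched the same way as endpoint_url and the resulting lists concatenated before building the dataframe.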

Next, turn the JSON data into a dataframe and clean it up: drop unnecessary columns and any rows that lack lat-long data. Keep the output file as small as possible (ideally under 5 MB): everyone who views your map has to download it over the Internet, and a large file makes the map slow to load.
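To see how much trimming helps, you can measure the serialized size directly. This is a minimal sketch: json_size_mb is a hypothetical helper, and the record below is illustrative, borrowing values from the tail() output further down plus a made-up timestamp and extra columns. It compares keeping every column against keeping only the subset, and indent=2 output against compact separators.

```python
import json


def json_size_mb(obj, **dump_kwargs):
    """Hypothetical helper: size in MB (10**6 bytes) of obj serialized as UTF-8 JSON."""
    return len(json.dumps(obj, **dump_kwargs).encode('utf-8')) / 1e6


# Illustrative record shaped like one 311 ticket (timestamp value is made up).
record = {
    'issue_description': 'Residential Bulky Pickup',
    'issue_type': 'Refuse and Recycling',
    'latitude': 37.857347,
    'longitude': -122.278136,
    'street_address': '1537 Stuart St',
    'ticket_status': 'Closed',
    'city': 'BERKELEY',
    'state': 'CA',
    'ticket_created_date_time': '2016-01-01T00:00:00',
}
keep = ['issue_description', 'issue_type', 'latitude', 'longitude',
        'street_address', 'ticket_status']

all_columns = [record] * 5000
subset = [{k: r[k] for k in keep} for r in all_columns]

print('all columns, indent=2: {:.2f} MB'.format(json_size_mb(all_columns, indent=2)))
print('subset, indent=2:      {:.2f} MB'.format(json_size_mb(subset, indent=2)))
print('subset, compact:       {:.2f} MB'.format(json_size_mb(subset, separators=(',', ':'))))
```

Dropping columns and dropping the whitespace that indent=2 adds both shrink the file; the exact savings depend on the real data.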

In [4]:
# turn the json data into a dataframe and see how many rows and what columns we have
df = pd.DataFrame(data)

print('We have {} rows'.format(len(df)))
str(df.columns.tolist())

We have 5000 rows


"[u'apn', u'city', u'indbdate', u'issue_description', u'issue_type', u'latitude', u'location', u'longitude', u'neighborhood_district', u'object_type', u'secondary_issue_type', u'state', u'street_address', u'ticket_closed_date_time', u'ticket_created_date_time', u'ticket_id', u'ticket_status']"

In [5]:
# convert lat-long to floats and change address from ALL CAPS to regular capitalization
df['latitude'] = df['latitude'].astype(float)
df['longitude'] = df['longitude'].astype(float)
df['street_address'] = df['street_address'].str.title()
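The API delivers coordinates as strings, and astype(float) works here because every value is either numeric text or missing. If a future pull contained a blank or malformed coordinate, astype(float) would raise a ValueError. A more defensive sketch uses pd.to_numeric with errors='coerce', which turns unparseable values into NaN so the dropna step below removes them. The sample values here are hypothetical.

```python
import pandas as pd

# Hypothetical raw coordinates as strings: one good row, one malformed, one missing.
raw = pd.DataFrame({'latitude': ['37.857347', '', None],
                    'longitude': ['-122.278136', 'n/a', None]})

# errors='coerce' converts anything that can't be parsed into NaN instead of raising
for col in ['latitude', 'longitude']:
    raw[col] = pd.to_numeric(raw[col], errors='coerce')

print(raw.dtypes)
print(raw.dropna(subset=['latitude', 'longitude']))
```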

In [6]:
# we don't need all those columns - only keep useful ones
cols = ['issue_description', 'issue_type', 'latitude', 'longitude', 'street_address', 'ticket_status']
df_subset = df[cols]

In [7]:
# drop any rows that lack lat/long data
df_geo = df_subset.dropna(subset=['latitude', 'longitude'], axis=0, inplace=False)

print('We have {} geotagged rows'.format(len(df_geo)))
df_geo.tail()

We have 2341 geotagged rows


|      | issue_description | issue_type | latitude | longitude | street_address | ticket_status |
|------|-------------------|------------|----------|-----------|----------------|---------------|
| 4991 | Residential Bulky Pickup | Refuse and Recycling | 37.857347 | -122.278136 | 1537 Stuart St | Closed |
| 4993 | Residential Missed Pickup Integration | Refuse and Recycling | 37.879342 | -122.260038 | 2540 Cedar St | Closed |
| 4994 | Miscellaneous Service Request | General Questions/information | 37.855705 | -122.287036 | 2734 Wallace St | Closed |
| 4996 | Illegal Dumping - City Property | Streets, Utilities, and Transportation | 37.86216 | -122.272241 | 2554 M L King Jr Way | Closed |
| 4997 | Commercial Service Day Change | Refuse and Recycling | 37.867434 | -122.252197 | 2400 Piedmont Ave | Closed |


In [8]:
# what is the distribution of issue types?
df_geo['issue_type'].value_counts()

Refuse and Recycling                            1719
Streets, Utilities, and Transportation           246
General Questions/information                    212
Parks, Trees and Vegetation                       54
Environmental Services and Programs               33
Facilities, Electrical & Property Management      23
Business License                                  22
Traffic and Transportation                        15
Graffiti and Vandalism                            10
Other Account Services and Billing                 4
Equipment Maintenance                              3
dtype: int64

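Finally, convert each row of the dataframe into a GeoJSON feature and save the result to a file. The format is simple; see the specification at http://geojson.org/. Note that GeoJSON lists a point's coordinates in [longitude, latitude] order, the reverse of the familiar lat-long order.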
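Swapping latitude and longitude is the most common GeoJSON mistake. A minimal sketch of a single Point feature, using a hypothetical point_feature helper with a range check, shows the [longitude, latitude] order; for Berkeley a swapped pair is caught because -122 is not a valid latitude. The values come from the tail() output above.

```python
import json


def point_feature(lat, lon, properties):
    """Hypothetical helper: build a GeoJSON Point Feature (coordinates are [lon, lat])."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError('coordinates out of range: lat={}, lon={}'.format(lat, lon))
    return {
        'type': 'Feature',
        'geometry': {'type': 'Point', 'coordinates': [lon, lat]},
        'properties': properties,
    }


collection = {
    'type': 'FeatureCollection',
    'features': [
        point_feature(37.857347, -122.278136,
                      {'street_address': '1537 Stuart St', 'ticket_status': 'Closed'}),
    ],
}
print(json.dumps(collection))
```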

In [9]:
# create a new python dict to contain our geojson data, using geojson format
geojson = {'type':'FeatureCollection', 'features':[]}

In [10]:
# loop through each row in the dataframe and convert each row to geojson format
for _, row in df_geo.iterrows():
    feature = {'type':'Feature',
               'geometry':{
                   'type':'Point',
                   'coordinates':[row['longitude'],row['latitude']]},
               'properties': {
                   'street_address':row['street_address'],
                   'issue_description':row['issue_description'], 
                   'issue_type':row['issue_type'],
                   'ticket_status':row['ticket_status']}}
    
    # add this feature (aka, converted dataframe row) to the list of features inside our dict
    geojson['features'].append(feature)
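The iterrows() loop is easy to read, but it builds a pandas Series for every row, which is usually slow on large frames. One alternative, sketched here with a hypothetical two-row stand-in for df_geo (values from the tail() output above), zips the coordinate columns with to_dict('records') for the property columns and builds the same feature structure.

```python
import pandas as pd

# Hypothetical two-row stand-in for df_geo.
sample = pd.DataFrame({
    'issue_description': ['Residential Bulky Pickup', 'Miscellaneous Service Request'],
    'issue_type': ['Refuse and Recycling', 'General Questions/information'],
    'latitude': [37.857347, 37.855705],
    'longitude': [-122.278136, -122.287036],
    'street_address': ['1537 Stuart St', '2734 Wallace St'],
    'ticket_status': ['Closed', 'Closed'],
})
property_cols = ['street_address', 'issue_description', 'issue_type', 'ticket_status']

# to_dict('records') returns one plain dict per row for the property columns
features = [
    {'type': 'Feature',
     'geometry': {'type': 'Point', 'coordinates': [lon, lat]},
     'properties': props}
    for lon, lat, props in zip(sample['longitude'],
                               sample['latitude'],
                               sample[property_cols].to_dict('records'))
]
geojson_fast = {'type': 'FeatureCollection', 'features': features}
print(len(geojson_fast['features']))
```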

In [11]:
# save the geojson result as a JavaScript file that assigns it to a variable named dataset,
# so a web page can load it with a plain <script> tag
output_filename = 'dataset.js'
with open(output_filename, 'w') as output_file:
    output_file.write('var dataset = ')
    json.dump(geojson, output_file, indent=2)

# how many features did we save to the geojson file?
print('{} geotagged features saved to file'.format(len(geojson['features'])))

2341 geotagged features saved to file
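The dataset.js file is JavaScript rather than pure GeoJSON, because it wraps the data in a variable assignment. If you also want a plain .geojson file (for example, to open in a GIS tool or fetch from JavaScript), a minimal sketch could write both. save_geojson is a hypothetical helper, and it uses compact separators instead of indent=2 to keep the files small.

```python
import json
import os
import tempfile


def save_geojson(geojson, directory, basename='dataset', var_name='dataset'):
    """Hypothetical helper: write plain .geojson and a .js file assigning it to var_name."""
    geojson_path = os.path.join(directory, basename + '.geojson')
    js_path = os.path.join(directory, basename + '.js')
    # compact separators omit the whitespace that indent=2 would add
    with open(geojson_path, 'w', encoding='utf-8') as f:
        json.dump(geojson, f, separators=(',', ':'))
    with open(js_path, 'w', encoding='utf-8') as f:
        f.write('var {} = '.format(var_name))
        json.dump(geojson, f, separators=(',', ':'))
        f.write(';\n')
    return geojson_path, js_path


example = {'type': 'FeatureCollection', 'features': []}
with tempfile.TemporaryDirectory() as tmp:
    gj_path, js_path = save_geojson(example, tmp)
    with open(gj_path, encoding='utf-8') as f:
        gj_text = f.read()
    with open(js_path, encoding='utf-8') as f:
        js_text = f.read()
print(js_text)
```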


Now load dataset.js with Leaflet to map it. See berkeley-311-map.html for an example of creating the map, and sample-blog-post.html for an example of embedding the map in another web page.