# Web to GIS

***
**This program will scrape data from a webpage, geocode the data into latitude and longitude coordinates, and then create a map for viewing. The website chosen is [Travel + Leisure's article on the Top 15 Cities in the United States](http://www.travelandleisure.com/worlds-best/cities-in-us).**

[MAP - Top 15 Cities in the U.S.](http://students.washington.edu/sheenaw/top15.html)

In [1]:
# import packages
import geopandas as gpd
import pandas, folium, lxml, cssselect, urllib, urllib2, json, shapely, shapely.geometry, os
from lxml import html
from bs4 import BeautifulSoup
%matplotlib inline

### 1) Scrape Website

In [2]:
# define URL and create html element
url = 'http://www.travelandleisure.com/worlds-best/cities-in-us'
request = urllib2.Request(url)       # make request to URL
connect = urllib2.urlopen(request)   # connect to URL
doc_text = connect.read()            # read URL as string
doc = lxml.html.fromstring(doc_text) # convert string to html element
doc.make_links_absolute(url)         # make all links in the document absolute

In [3]:
# # code from Data Journalism Handbook scraper example code 

# # check error
# try:
#     f = urllib2.urlopen(url)
# except urllib2.HTTPError, e:
#     print e.fp.read()

In [4]:
# # code from Data Journalism Handbook scraper example code 

# req = urllib2.Request(url, headers={'User-Agent' : "Magic Browser"}) 
# con = urllib2.urlopen( req )
# doc_text = con.read()
# doc = lxml.html.fromstring(doc_text)
# doc.make_links_absolute(url)

In [5]:
# store the html element in BeautifulSoup format for easier reading
soup = BeautifulSoup(doc_text, 'html.parser')
# print soup.prettify()

In [6]:
# use CSS Selector 'h1' to get title of document
title = doc.cssselect('h1')
title = title[0].text
print title

# create list for cities and dictionary for mapping city name with its rank
cities = []
top15 = {}

# loop through html element and use CSS Selector 'h2' to get city names and add them to list
for row in doc.cssselect('h2'):
    city = row.text.strip(' \n\t\r') # convert row to text, remove whitespace, line breaks, tabs, and returns from either side of string
    cities.append(city)
# print cities

# loop through list to get city names and rankings and add them to dictionary
for city in cities:
    items = city.split('. ', 1) # split on the first '. ' only to get rank and city name
    rank = items[0]
    name = items[1]
    top15[name] = rank
print top15

The Top 15 Cities in the United States
{'Santa Fe, New Mexico': '2', 'Asheville, North Carolina': '9', 'Williamsburg, Virginia': '13', 'Boston, Massachusetts': '14', 'New York City': '7', 'Portland, Oregon': '15', 'San Antonio, Texas': '10', 'Nashville, Tennessee': '5', 'Savannah, Georgia': '3', 'Charleston, South Carolina': '1', 'San Francisco, California': '11', 'Honolulu, Hawaii': '6', 'New Orleans, Louisiana': '4', 'Austin, Texas': '8', 'Chicago, Illinois': '12'}
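
The rank/name split above can be made slightly more defensive. The following is a minimal sketch rather than the notebook's own code: `parse_heading` is a hypothetical helper that splits only on the first `'. '` and stores the rank as an integer so that it sorts numerically. The sample headings reuse city names and ranks from the output above, and the sketch is written to run on Python 3 as well as Python 2.

```python
def parse_heading(heading):
    """Split a heading such as '1. Charleston, South Carolina' into (rank, name)."""
    text = heading.strip()            # drop surrounding whitespace and line breaks
    rank, name = text.split('. ', 1)  # split on the first '. ' only
    return int(rank), name

# sample headings (hypothetical raw strings; names and ranks come from the output above)
headings = ['1. Charleston, South Carolina',
            ' 7. New York City\n',
            '13. Williamsburg, Virginia']

top = {}
for heading in headings:
    rank, name = parse_heading(heading)
    top[name] = rank
```

Storing the rank as an integer is a design choice for this sketch only; the notebook keeps ranks as strings, which its popup code later relies on.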


### 2) Geocode Locations

In [7]:
# uses the Nominatim API to get spatial information for a city
# inputs:
#   place (required): the city to search for
#   params: optional dictionary of extra parameters for the API call
#   baseurl: the base url for the API call
# returns: an open connection to the API response, which is in JSON format
def nominatim(place, 
              params = None,
              baseurl = 'https://nominatim.openstreetmap.org/search'):
    params = dict(params or {}) # copy so a shared default dictionary is never mutated
    params['q'] = place
    params['format'] = 'json'
    url = baseurl + '?' + urllib.urlencode(params)
    print params['q'] + ': ' + url
    return urllib2.urlopen(url)
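
It can be useful to build the query URL separately from the network call so that it can be inspected offline. The sketch below assumes a hypothetical helper named `build_search_url`; it is not part of the notebook. The `q` and `format` parameters are the ones used above, and `limit` is shown only as an example of an extra parameter.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode        # Python 2

def build_search_url(place, extra_params=None,
                     baseurl='https://nominatim.openstreetmap.org/search'):
    """Return the search URL for a place without sending a request."""
    # a list of pairs keeps the parameter order fixed: q, format, then extras
    pairs = [('q', place), ('format', 'json')]
    if extra_params:
        pairs.extend(sorted(extra_params.items()))
    return baseurl + '?' + urlencode(pairs)

url = build_search_url('Santa Fe, New Mexico, USA')
url_one = build_search_url('Boston, Massachusetts, USA', {'limit': 1})
```

The resulting string could then be handed to `urllib2.urlopen` as in the cell above.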

In [8]:
# create list for jsons of each city
jsons = []

# loop through dictionary and call the 'nominatim' function to get the json for each city and add it to the list
for city in top15:
    place = city + ', USA'
    text = nominatim(place).read() # read the API response as a string
    jsonDocs = json.loads(text)    # convert string to json
    jsonDoc = jsonDocs[0]          # use the first json returned (the top result from Nominatim API call)
    jsonDoc['name'] = city
    jsons.append(jsonDoc)

Santa Fe, New Mexico, USA: https://nominatim.openstreetmap.org/search?q=Santa+Fe%2C+New+Mexico%2C+USA&format=json
Asheville, North Carolina, USA: https://nominatim.openstreetmap.org/search?q=Asheville%2C+North+Carolina%2C+USA&format=json
Williamsburg, Virginia, USA: https://nominatim.openstreetmap.org/search?q=Williamsburg%2C+Virginia%2C+USA&format=json
Boston, Massachusetts, USA: https://nominatim.openstreetmap.org/search?q=Boston%2C+Massachusetts%2C+USA&format=json
New York City, USA: https://nominatim.openstreetmap.org/search?q=New+York+City%2C+USA&format=json
Portland, Oregon, USA: https://nominatim.openstreetmap.org/search?q=Portland%2C+Oregon%2C+USA&format=json
San Antonio, Texas, USA: https://nominatim.openstreetmap.org/search?q=San+Antonio%2C+Texas%2C+USA&format=json
Nashville, Tennessee, USA: https://nominatim.openstreetmap.org/search?q=Nashville%2C+Tennessee%2C+USA&format=json
Savannah, Georgia, USA: https://nominatim.openstreetmap.org/search?q=Savannah%2C+Georgia%2C+USA&form
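
The loop above takes `jsonDocs[0]`, which raises an `IndexError` if the API returns an empty list for a place. A minimal sketch of a guard follows; `first_match` is a hypothetical helper, and the sample string is a trimmed, made-up response that reuses only the Santa Fe coordinates shown later in this notebook.

```python
import json

def first_match(response_text, name):
    """Return the top search result as a dict, or None when nothing matched."""
    results = json.loads(response_text)  # the search response is a JSON list
    if not results:
        return None
    match = results[0]                   # keep only the top result
    match['name'] = name
    return match

# hypothetical, trimmed responses used only for illustration
sample = '[{"lat": "35.6869996", "lon": "-105.9377997"}]'
hit = first_match(sample, 'Santa Fe, New Mexico')
miss = first_match('[]', 'Nowhere')
```

Cities for which the helper returns `None` could be skipped or logged instead of stopping the loop. The public Nominatim service also has a usage policy that limits request rates, so a short pause between calls (for example `time.sleep(1)`) may be advisable.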

In [9]:
# create list for coordinates of each city
data = []

# loop through list of jsons to get coordinates of each city and add them to list of coordinates
for city in jsons:
    name = city['name']
    lat = float(city['lat'])
    lon = float(city['lon'])
    place = [name, lat, lon]
    data.append(place)

# create DataFrame for coordinates of each city
df = pandas.DataFrame(data, columns = ['name', 'lat', 'lon'])
# print df

# create list of (longitude, latitude) pairs for each city
coords = zip(df['lon'], df['lat'])
# print coords

# create Point objects from each city's (longitude, latitude) pair
geom = [shapely.geometry.Point(city) for city in coords]
# print geom

# create GeoSeries for Point objects
gs = gpd.GeoSeries(geom)
# print gs

# create GeoDataFrame using DataFrame and GeoSeries
gdf = gpd.GeoDataFrame(df, geometry = gs)
print "GeoDataFrame:"
gdf.head()

GeoDataFrame:


|  | name | lat | lon | geometry |
|---|---|---|---|---|
| 0 | Santa Fe, New Mexico | 35.687 | -105.9378 | POINT (-105.9377997 35.6869996) |
| 1 | Asheville, North Carolina | 35.60095 | -82.554016 | POINT (-82.5540161 35.6009498) |
| 2 | Williamsburg, Virginia | 37.270879 | -76.707404 | POINT (-76.7074042 37.2708788) |
| 3 | Boston, Massachusetts | 42.360482 | -71.059568 | POINT (-71.0595678 42.3604823) |
| 4 | New York City | 40.730646 | -73.986614 | POINT (-73.9866136 40.7306458) |
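
Note the coordinate order in the `geometry` column: points are stored as (x, y), which means (longitude, latitude), whereas `folium.Marker` later expects `[lat, lon]`. The sketch below illustrates the same ordering with plain dictionaries shaped like GeoJSON features. `to_feature` is a hypothetical helper, and the two rows reuse values from the output above.

```python
# rows in the same [name, lat, lon] layout as the notebook's `data` list
rows = [
    ['Santa Fe, New Mexico', 35.6869996, -105.9377997],
    ['Boston, Massachusetts', 42.3604823, -71.0595678],
]

def to_feature(name, lat, lon):
    """Build a GeoJSON-style point feature; coordinates are [lon, lat]."""
    return {
        'type': 'Feature',
        'properties': {'name': name},
        'geometry': {'type': 'Point', 'coordinates': [lon, lat]},
    }

features = [to_feature(*row) for row in rows]

# reading a position back out for a marker means swapping to [lat, lon]
lon, lat = features[0]['geometry']['coordinates']
marker_location = [lat, lon]
```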


### 3) Create Shapefile and Map

In [10]:
# convert GeoDataFrame to shapefile and save file
gdf.to_file('C:\\Users\\sheen\\Documents\\GitHub\\Portfolio\\WebToGIS\\shapefile\\geodataframe.shp')
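
The cell above writes to an absolute Windows path. A sketch of a more portable alternative follows, assuming the output should live in a `shapefile` folder under some base directory. `shapefile_path` is a hypothetical helper, and the temporary directory only stands in for the project folder.

```python
import os
import tempfile

def shapefile_path(base_dir, folder='shapefile', filename='geodataframe.shp'):
    """Return base_dir/folder/filename, creating the folder if it is missing."""
    out_dir = os.path.join(base_dir, folder)  # uses the platform's path separator
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    return os.path.join(out_dir, filename)

base = tempfile.mkdtemp()    # stand-in for the project directory (assumption)
path = shapefile_path(base)
```

The returned path could then be passed to `gdf.to_file(path)`.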

In [11]:
# look up each city's rank by name and add it to the GeoDataFrame
# (matching on name does not depend on the dictionary's iteration order)
gdf['rank'] = gdf['name'].map(top15)

In [12]:
# makes jsons readable
# inputs:
#   obj (required): the json to format
# returns: a json in readable format
def pretty(obj):
    return json.dumps(obj, sort_keys=True, indent=2)
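
As a brief usage sketch, the helper can be applied to any JSON-serializable object. The function is repeated here so the example is self-contained, and the sample dictionary reuses the New York City latitude from the GeoDataFrame output above.

```python
import json

def pretty(obj):
    # same helper as in the cell above
    return json.dumps(obj, sort_keys=True, indent=2)

record = {'name': 'New York City', 'lat': 40.7306458}
text = pretty(record)
print(text)
```

Because `sort_keys=True`, the `lat` key is printed before `name`, and each key appears on its own line indented by two spaces.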

In [13]:
# create map
mapTop15 = folium.Map(location = [34.153475, -117.176504], 
                      zoom_start = 3, 
                      tiles = 'Stamen Toner')

# # create list for styles
# style = []
# i = 1
# while i <= len(gdf):
#     style.append({'color': '#ffffff', 'marker-size': 'medium', 'marker-symbol': 'city'})
#     i += 1

# # add list to GeoDataFrame
# gdf['style'] = style

# convert GeoDataFrame to GeoJSON
s = gdf.to_json()
gj = json.loads(s) 
# print pretty(gj)

# loop through GeoJSON features and map each city with popups
for feature in gj['features']:
    lon, lat = feature['geometry']['coordinates'] # GeoJSON stores coordinates as [lon, lat]
    name = feature['properties']['name']
    rank = feature['properties']['rank']
    folium.Marker(
        [lat, lon], 
        popup = '#<b>' + rank + '</b><br>' + name,
        icon = folium.Icon(color = 'green', icon = 'star')
    ).add_to(mapTop15)

# save map as html document
mapTop15.save('top15.html')

# view map
mapTop15
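
Because the ranks are stored as strings, sorting them directly would place `'10'` before `'2'`. The sketch below shows one way to order features by numeric rank and build the same popup text without needing folium. `popup_html` is a hypothetical helper, and the three features are trimmed stand-ins that reuse names and ranks from the scraped output.

```python
# trimmed stand-ins for entries of gj['features'] (properties only)
features = [
    {'properties': {'name': 'San Antonio, Texas', 'rank': '10'}},
    {'properties': {'name': 'Santa Fe, New Mexico', 'rank': '2'}},
    {'properties': {'name': 'Charleston, South Carolina', 'rank': '1'}},
]

def popup_html(props):
    """Build the popup markup used for each marker."""
    return '#<b>' + props['rank'] + '</b><br>' + props['name']

# convert the rank to int for sorting so that '10' comes after '2'
ordered = sorted(features, key=lambda f: int(f['properties']['rank']))
popups = [popup_html(f['properties']) for f in ordered]
```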

References:
- [Splitting on first occurrence (stackoverflow)](https://stackoverflow.com/questions/6903557/splitting-on-first-occurrence)
- [Errors and Exceptions (python docs)](https://docs.python.org/2/tutorial/errors.html)
- [Data Journalism Handbook scraper example code.ipynb (canvas)](https://canvas.uw.edu/courses/1127520/files/45627733/download?verifier=hObkCclQ1c4JyDJn7L7k8yOSphV3Mac3XvVPcIYt&wrap=1)
- [Geopanda write GeoDataFrame into shapefile or spatialite (stack exchange)](https://gis.stackexchange.com/questions/237162/geopanda-write-geodataframe-into-shapefile-or-spatialite)
- [Markers (python-visualization.github.io/folium)](https://python-visualization.github.io/folium/quickstart.html#Markers)
- [Geopandas.ipynb (github)](https://github.com/python-visualization/folium/blob/master/examples/Geopandas.ipynb)