### Geospatial data manipulation using geocoding and table joins
- How to convert names of places to geographic coordinates
- How to join information from multiple GeoDataFrames

### Geocoding
Geocoding is the process of converting the name of a place or an address to a location on a map.
For example, Google Maps uses a geocoder to look up a geographic location from a place name or an address.

In [39]:
# Import libraries
# The geocode function comes from geopandas.tools
from geopandas.tools import geocode
import pandas as pd
import numpy as np
import geopandas as gpd

import folium
from folium import Choropleth, Circle, Marker
from folium.plugins import HeatMap, MarkerCluster

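- To use the geocoder, we need to provide:
        - the name or address as a Python string
        - the name of the provider
            - OpenStreetMap's Nominatim geocoder (`provider="nominatim"`) does not require an API key
            - the cells below pass `provider=None`, which falls back to the default provider of the installed geopandas version
- It returns a GeoDataFrame with two columns: geometry and address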
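
As a minimal sketch of choosing the provider explicitly, the helper below asks `geocode` for the Nominatim service by name. The function name `geocode_with_nominatim` and the `user_agent` value are placeholders introduced for illustration. Nominatim expects an application-specific user agent, and the call needs network access, so the helper is only defined here, not run.

```python
def geocode_with_nominatim(place, user_agent="my-geocoding-notebook"):
    """Geocode one place name with OpenStreetMap's Nominatim service.

    `user_agent` is a placeholder: replace it with a string that
    identifies your own application.
    """
    # Imported inside the function so that defining the helper does not
    # itself require geopandas and geopy to be installed.
    from geopandas.tools import geocode

    # Extra keyword arguments are passed on to the geopy geocoder class.
    return geocode(place, provider="nominatim", user_agent=user_agent)
```

Calling `geocode_with_nominatim("The Great Pyramid of Giza")` should then return a one-row GeoDataFrame with the geometry and address columns described above.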

In [11]:
# !pip install geopy
result = geocode("The Great Pyramid of Giza", provider=None)
result

Unnamed: 0,geometry,address
0,POINT (31.08151 29.98449),"The Ring Road, Giza, Egypt"


- The geometry column holds a Point object
- We can read the latitude and longitude from its y and x attributes, respectively

In [15]:
#result.geometry.x
point = result.geometry.iloc[0]

print('Latitude:', point.y)
print('Longitude:', point.x)

Latitude: 29.984491259822
Longitude: 31.0815092373299
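
Because a `Point` stores its coordinates as (x, y), that is (longitude, latitude), it is easy to swap them by accident when a library expects latitude first, as folium does for marker locations. A small sketch, using a hand-built shapely `Point` with the rounded coordinates from the output above instead of a live geocoding call:

```python
from shapely.geometry import Point

# Point(x, y) means Point(longitude, latitude).
giza = Point(31.08151, 29.98449)

latitude, longitude = giza.y, giza.x

# folium-style location list: latitude first, then longitude.
location = [latitude, longitude]
print(location)
```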


### Obtaining location of top universities in Europe from csv file by geocoding

In [19]:
# Read the CSV file with the names of the universities that we will geocode
universities = pd.read_csv(r'C:\Users\Rabbil\Documents\GeoPython\GeoSpatial_Analysis\top_universities.csv')
universities.head()

Unnamed: 0,Name
0,University of Oxford
1,University of Cambridge
2,Imperial College London
3,ETH Zurich
4,UCL


- Now we apply the geocoder to every row of the DataFrame using a lambda function
- We use a try/except statement to handle the case where geocoding fails

In [21]:
# Geocode a single name; return None if the lookup fails
def my_geocoder(row):
    try:
        point = geocode(row, provider=None).geometry.iloc[0]
        return pd.Series({'Latitude': point.y, 'Longitude': point.x, 'geometry': point})
    except Exception:
        # Any failure (e.g. a network error) is treated as "not geocoded"
        return None

# apply the function to the dataframe
universities[['Latitude', 'Longitude', 'geometry']] = universities.apply(lambda x:my_geocoder(x['Name']), axis=1)
universities.head()

Unnamed: 0,Name,Latitude,Longitude,geometry
0,University of Oxford,51.756802,-1.254726,POINT (-1.25472605228205 51.7568016052847)
1,University of Cambridge,52.205303,0.116613,POINT (0.11661300063119 52.2053031921855)
2,Imperial College London,51.498997,-0.175495,POINT (-0.17549499869328 51.4989967346843)
3,ETH Zurich,47.376415,8.548102,POINT (8.548102378841399 47.3764152526776)
4,UCL,51.523815,-0.13306,POINT (-0.13305999338621 51.5238151550844)


In [24]:
# Print the percentage of addresses that were geocoded
print("{}% of addresses were geocoded!".format(
    (1 - sum(np.isnan(universities["Latitude"])) / len(universities)) * 100))

99.0% of addresses were geocoded!
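
The apply-then-filter pattern above can be tried without a network connection. The sketch below swaps the real geocoder for a hypothetical lookup table (`FAKE_COORDINATES`) and returns missing values rather than `None` on failure, so every row yields the same columns. All names in it are illustrative, and the coordinates are rounded from the output above.

```python
import pandas as pd

# Hypothetical stand-in for a geocoding service: name -> (latitude, longitude).
FAKE_COORDINATES = {
    "University of Oxford": (51.756802, -1.254726),
    "ETH Zurich": (47.376415, 8.548102),
}

def lookup(name):
    """Return a latitude/longitude pair, or missing values if the lookup fails."""
    try:
        latitude, longitude = FAKE_COORDINATES[name]
    except KeyError:
        latitude, longitude = float("nan"), float("nan")
    return pd.Series({"Latitude": latitude, "Longitude": longitude})

places = pd.DataFrame(
    {"Name": ["University of Oxford", "ETH Zurich", "Nowhere Institute"]}
)

# Series.apply with a function that returns a Series yields a DataFrame.
coordinates = places["Name"].apply(lookup)
places = pd.concat([places, coordinates], axis=1)

# Share of rows with a usable latitude, then keep only those rows.
geocoded_share = places["Latitude"].notna().mean() * 100
located = places.dropna(subset=["Latitude"])
print(f"{geocoded_share:.1f}% geocoded, {len(located)} rows kept")
```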


In [35]:
# Drop universities that were not successfully geocoded
universities = universities.loc[~np.isnan(universities['Latitude'])]

type(universities)

pandas.core.frame.DataFrame

In [37]:
# Convert the pandas DataFrame into a GeoDataFrame
universities = gpd.GeoDataFrame(universities, geometry=universities.geometry)
universities.crs = "EPSG:4326"
print(type(universities))
universities.head()

<class 'geopandas.geodataframe.GeoDataFrame'>



Unnamed: 0,Name,Latitude,Longitude,geometry
0,University of Oxford,51.756802,-1.254726,POINT (-1.25473 51.75680)
1,University of Cambridge,52.205303,0.116613,POINT (0.11661 52.20530)
2,Imperial College London,51.498997,-0.175495,POINT (-0.17549 51.49900)
3,ETH Zurich,47.376415,8.548102,POINT (8.54810 47.37642)
4,UCL,51.523815,-0.13306,POINT (-0.13306 51.52382)
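
When latitude and longitude are already stored as columns, the geometry and the coordinate reference system can also be set in one step. The sketch below uses `gpd.points_from_xy` and the `crs` argument of the `GeoDataFrame` constructor on a two-row table copied from the output above. The variable names are illustrative.

```python
import geopandas as gpd
import pandas as pd

table = pd.DataFrame(
    {
        "Name": ["University of Oxford", "ETH Zurich"],
        "Latitude": [51.756802, 47.376415],
        "Longitude": [-1.254726, 8.548102],
    }
)

# points_from_xy takes x (longitude) first, then y (latitude).
points = gpd.GeoDataFrame(
    table,
    geometry=gpd.points_from_xy(table["Longitude"], table["Latitude"]),
    crs="EPSG:4326",  # WGS 84 latitude/longitude
)
print(points.crs)
```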


### Visualization of all the locations

In [41]:
# Create a map (named m to avoid shadowing Python's built-in map)
m = folium.Map(location=[54, 15], tiles='openstreetmap', zoom_start=2)

# Add the points to the map
for idx, row in universities.iterrows():
    folium.Marker([row['Latitude'], row['Longitude']], popup=row['Name']).add_to(m)

# Display the map
m

- Some of the locations are clearly inaccurate, since they fall outside Europe
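
One quick way to flag such outliers before plotting is a rough bounding-box check on the coordinates. The sketch below is only a heuristic: the box limits are an assumption chosen for illustration, not an official definition of Europe, and the sample rows (including the deliberately misplaced one) are made up. The spatial join later in this section does the filtering properly against country polygons.

```python
import pandas as pd

# Assumed, approximate bounding box for Europe (degrees); adjust as needed.
LAT_MIN, LAT_MAX = 34.0, 72.0
LON_MIN, LON_MAX = -25.0, 45.0

sample = pd.DataFrame(
    {
        "Name": ["University of Oxford", "ETH Zurich", "Misplaced University"],
        "Latitude": [51.756802, 47.376415, 40.7],
        "Longitude": [-1.254726, 8.548102, -74.0],
    }
)

inside_box = sample["Latitude"].between(LAT_MIN, LAT_MAX) & sample[
    "Longitude"
].between(LON_MIN, LON_MAX)

# Rows outside the box are candidates for a manual check or a new lookup.
suspicious = sample.loc[~inside_box, "Name"].tolist()
print(suspicious)
```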

### Table joins

- We will combine data from different sources using
    - Attribute join
    - Spatial join

#### Attribute join
- An attribute join matches rows by the values of a shared column or index, much like pd.DataFrame.join()
- With a GeoDataFrame, we use gpd.GeoDataFrame.merge()

In [43]:
# Read the world boundaries data; the European countries are extracted from it below
# Note: gpd.datasets was deprecated in geopandas 0.14 and removed in 1.0, so this call needs an older version
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
world.head()

Unnamed: 0,pop_est,continent,name,iso_a3,gdp_md_est,geometry
0,920938,Oceania,Fiji,FJI,8374.0,"MULTIPOLYGON (((180.00000 -16.06713, 180.00000..."
1,53950935,Africa,Tanzania,TZA,150600.0,"POLYGON ((33.90371 -0.95000, 34.07262 -1.05982..."
2,603253,Africa,W. Sahara,ESH,906.5,"POLYGON ((-8.66559 27.65643, -8.66512 27.58948..."
3,35623680,North America,Canada,CAN,1674000.0,"MULTIPOLYGON (((-122.84000 49.00000, -122.9742..."
4,326625791,North America,United States of America,USA,18560000.0,"MULTIPOLYGON (((-122.84000 49.00000, -120.0000..."


In [44]:
# Keep only the rows for European countries
europe = world.loc[world.continent=='Europe'].reset_index(drop=True)
europe.head()

Unnamed: 0,pop_est,continent,name,iso_a3,gdp_md_est,geometry
0,142257519,Europe,Russia,RUS,3745000.0,"MULTIPOLYGON (((178.725 71.099, 180.000 71.516..."
1,5320045,Europe,Norway,-99,364700.0,"MULTIPOLYGON (((15.143 79.674, 15.523 80.016, ..."
2,67106161,Europe,France,-99,2699000.0,"MULTIPOLYGON (((-51.658 4.156, -52.249 3.241, ..."
3,9960487,Europe,Sweden,SWE,498100.0,"POLYGON ((11.027 58.856, 11.468 59.432, 12.300..."
4,9549747,Europe,Belarus,BLR,165400.0,"POLYGON ((28.177 56.169, 29.230 55.918, 29.372..."


In [45]:
# Split the europe GeoDataFrame into a statistics table and a boundaries table
europe_stats = europe[["name", "pop_est", "gdp_md_est"]]
europe_boundaries = europe[['name', 'geometry']]

In [46]:
europe_boundaries.head()

Unnamed: 0,name,geometry
0,Russia,"MULTIPOLYGON (((178.725 71.099, 180.000 71.516..."
1,Norway,"MULTIPOLYGON (((15.143 79.674, 15.523 80.016, ..."
2,France,"MULTIPOLYGON (((-51.658 4.156, -52.249 3.241, ..."
3,Sweden,"POLYGON ((11.027 58.856, 11.468 59.432, 12.300..."
4,Belarus,"POLYGON ((28.177 56.169, 29.230 55.918, 29.372..."


In [47]:
europe_stats.head()

Unnamed: 0,name,pop_est,gdp_md_est
0,Russia,142257519,3745000.0
1,Norway,5320045,364700.0
2,France,67106161,2699000.0
3,Sweden,9960487,498100.0
4,Belarus,9549747,165400.0


- Now we join these two tables with an attribute join
- The on argument is set to the column name that is used to match rows in europe_boundaries to rows in europe_stats.

In [49]:
# Attribute join to merge data 
europe = europe_boundaries.merge(europe_stats, on='name')
europe.head()

Unnamed: 0,name,geometry,pop_est,gdp_md_est
0,Russia,"MULTIPOLYGON (((178.725 71.099, 180.000 71.516...",142257519,3745000.0
1,Norway,"MULTIPOLYGON (((15.143 79.674, 15.523 80.016, ...",5320045,364700.0
2,France,"MULTIPOLYGON (((-51.658 4.156, -52.249 3.241, ...",67106161,2699000.0
3,Sweden,"POLYGON ((11.027 58.856, 11.468 59.432, 12.300...",9960487,498100.0
4,Belarus,"POLYGON ((28.177 56.169, 29.230 55.918, 29.372...",9549747,165400.0
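
By default `merge` performs an inner join, so rows whose key appears in only one table are dropped silently. The sketch below uses two tiny pandas tables with made-up values to show how `how="left"` and `indicator=True` expose unmatched rows. With real data, calling `merge` on the GeoDataFrame, as in the cell above, keeps the result a GeoDataFrame.

```python
import pandas as pd

# Made-up values, for illustration only.
boundaries = pd.DataFrame(
    {
        "name": ["Norway", "Sweden", "Atlantis"],
        "outline": ["poly_a", "poly_b", "poly_c"],
    }
)
stats = pd.DataFrame({"name": ["Norway", "Sweden"], "pop_est": [5, 10]})

# Default how="inner": only names present in both tables survive.
inner = boundaries.merge(stats, on="name")

# how="left" keeps every boundary row; indicator=True adds a "_merge" column.
left = boundaries.merge(stats, on="name", how="left", indicator=True)

# Rows marked "left_only" have no match in `stats`.
unmatched = left.loc[left["_merge"] == "left_only", "name"].tolist()
print(len(inner), unmatched)
```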


#### Spatial join
- A spatial join combines GeoDataFrames based on the spatial relationship between the objects in their 'geometry' columns
- For example, the GeoDataFrame 'universities' contains the geocoded locations of European universities
- We can use a spatial join to match each university to its country in the europe GeoDataFrame
- We will use the gpd.sjoin() function

In [54]:
# Spatial join to match universities to countries in Europe
european_universities = gpd.sjoin(universities, europe)
european_universities.head()


Unnamed: 0,Name,Latitude,Longitude,geometry,index_right,name,pop_est,gdp_md_est
0,University of Oxford,51.756802,-1.254726,POINT (-1.25473 51.75680),28,United Kingdom,64769452,2788000.0
1,University of Cambridge,52.205303,0.116613,POINT (0.11661 52.20530),28,United Kingdom,64769452,2788000.0
2,Imperial College London,51.498997,-0.175495,POINT (-0.17549 51.49900),28,United Kingdom,64769452,2788000.0
4,UCL,51.523815,-0.13306,POINT (-0.13306 51.52382),28,United Kingdom,64769452,2788000.0
5,London School of Economics and Political Science,51.513889,-0.11694,POINT (-0.11694 51.51389),28,United Kingdom,64769452,2788000.0


In [55]:
print("We located {} universities.".format(len(universities)))
print("Only {} of the universities were located in Europe (in {} different countries).".format(
    len(european_universities), len(european_universities.name.unique())))

We located 99 universities.
Only 97 of the universities were located in Europe (in 15 different countries).


- The spatial join above looks at the "geometry" columns in both GeoDataFrames.
- If a Point object from the universities GeoDataFrame intersects a Polygon object from the europe GeoDataFrame, the corresponding rows are combined and added as a single row of the european_universities GeoDataFrame.
- Countries without a matching university, and universities without a matching country, are omitted from the results.
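
The matching rule described above can be checked on toy data. The sketch below builds two made-up square "countries" and three points, one of which lies outside both squares, and compares the default join with `how="left"`, which keeps unmatched points. All names and geometries are invented for illustration.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two made-up square "countries" and three points; point "c" is outside both.
countries = gpd.GeoDataFrame(
    {"country": ["Left Square", "Right Square"]},
    geometry=[box(0, 0, 1, 1), box(2, 0, 3, 1)],
    crs="EPSG:4326",
)
places = gpd.GeoDataFrame(
    {"place": ["a", "b", "c"]},
    geometry=[Point(0.5, 0.5), Point(2.5, 0.5), Point(5, 5)],
    crs="EPSG:4326",
)

# Default join: only points that intersect a polygon are kept.
inner = gpd.sjoin(places, countries)

# how="left" keeps every point; unmatched ones get missing country values.
left = gpd.sjoin(places, countries, how="left")

print(inner[["place", "country"]])
print(left["country"].isna().sum(), "point(s) without a country")
```

Comparing the row counts of the two results is a simple way to see how many points fell outside every polygon.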