# Daedalus Placename Analysis

## Identify geographical locations using SWE-NER
See [https://docs.google.com/document/d/1a7nJV-KX2H1Wr4K-5MmjX8-HIoGg-R9Ke3jADamZraQ/edit]

In [3]:
%autosave 60
import pandas as pd
import geopandas                                    # HOWTO install: http://geoffboeing.com/2014/09/using-geopandas-windows/
from geopandas.tools import geocode                 # uses geopy
import geocoder                                     # alternative to geopy: pip install geocoder
from geopy.geocoders import GeoNames, Nominatim, GoogleV3     # if explicit use of geopy
import numpy as np

Autosaving every 60 seconds


## Geocode locations

1. Load the NER data from file into a dataframe. (This data is the output of the SWE-NER software.)
2. Select the locations from the data (entities tagged as "LOC")
3. Extract the unique locations
4. Geocode the unique locations
5. Apply the geocoded coordinates back to the location data
6. Create statistics!
7. Plot!

The geocoding is done using the *geopy* library [https://github.com/geopy/geopy].

In [4]:
df = pd.read_fwf('Daedalus1931-79.tags.tsv', header=None, encoding='utf-8', names=['year', 'position', 'offset', 'category', 'subcategory', 'entity'])

In [5]:
dfloc = df.loc[df.category=='LOC',['year', 'entity']]
dfunique = dfloc['entity'].drop_duplicates().to_frame()
dfunique['processed'] = None
dfunique['latitude'] = np.nan
dfunique['longitude'] = np.nan
dfunique['reversename'] = np.nan
#dfunique = dfunique.drop('geocode',axis=1)

#df_year_count = df.groupby(['year', 'entity']).size().reset_index(name='counts')


In [12]:
dfunique.info()
dfunique.head()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2793 entries, 6 to 54504
Data columns (total 5 columns):
entity         2793 non-null object
processed      0 non-null object
latitude       0 non-null float64
longitude      0 non-null float64
reversename    0 non-null float64
dtypes: float64(3), object(2)
memory usage: 210.9+ KB


| | entity | processed | latitude | longitude | reversename |
| --- | --- | --- | --- | --- | --- |
| 6 | Wien | | | | |
| 11 | Norge | | | | |
| 19 | Sverige | | | | |
| 20 | Falu gruva | | | | |
| 22 | Falun | | | | |


In [20]:
dfunique = dfunique.set_index('entity')

In [97]:
i = 0
# Replace the placeholder with your own Google Geocoding API key
geolocator = GoogleV3(api_key='YOUR_GOOGLE_API_KEY', timeout=5)
#geolocator = GeoNames(country_bias='Sweden', username='humlab')
#geolocator = Nominatim() # OpenStreetMap

#dfunique['processed'] = None

for index, row in dfunique.iterrows():

    # Skip entities already handled in an earlier run
    if row['processed'] is not None:
        continue

    dfunique.loc[index, 'processed'] = True
    location = geolocator.geocode(index)  # the index holds the entity name

    if location is not None:
        dfunique.loc[index, 'latitude'] = location.latitude
        dfunique.loc[index, 'longitude'] = location.longitude

        point = [location.latitude, location.longitude]

        # Reverse geocode the coordinates as a sanity check on the match
        reverseName = geolocator.reverse(point, exactly_one=True)

        if reverseName is not None:
            print("{0} ==> {1}".format(index, reverseName[0]))
            dfunique.loc[index, 'reversename'] = reverseName[0]

    # Limit each run to a small batch of lookups
    if i > 50:
        break
    i += 1

# Raw string keeps the backslashes in the Windows path literal
dfunique.to_excel(r'C:\TEMP\daedalus_ner_geocoded_NEW.xlsx', sheet_name='Sheet1')
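
The loop above handles a limited batch per run and relies on the `processed` flag to resume where it left off. A minimal sketch of that resume pattern follows, assuming a hypothetical `fake_geocode` function that stands in for `geolocator.geocode` so the example runs offline. The names `geocode_pending` and `batch_size` are illustrative and not part of geopy; the coordinates are copied from the output shown in this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for geolocator.geocode: returns (lat, lon) or None.
KNOWN = {"Wien": (48.208174, 16.373819), "Falun": (60.606460, 15.635500)}

def fake_geocode(name):
    return KNOWN.get(name)

def geocode_pending(frame, geocode, batch_size=50):
    """Geocode up to batch_size rows whose 'processed' flag is still unset."""
    pending = frame.index[frame["processed"].isna()][:batch_size]
    for name in pending:
        # Mark the row first so a failed lookup is not retried on every run.
        frame.loc[name, "processed"] = True
        hit = geocode(name)
        if hit is not None:
            frame.loc[name, "latitude"] = hit[0]
            frame.loc[name, "longitude"] = hit[1]
    return len(pending)

places = pd.DataFrame(
    {"processed": None, "latitude": np.nan, "longitude": np.nan},
    index=pd.Index(["Wien", "Falun", "Halvvägs"], name="entity"),
)

# The first run handles two rows; a later call picks up the remaining one.
done = geocode_pending(places, fake_geocode, batch_size=2)
```

Calling `geocode_pending` again continues with the rows that are still unprocessed, which makes it possible to stop and restart without repeating lookups.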


In [63]:
# Raw string keeps the backslashes in the Windows path literal
dfunique.to_excel(r'C:\TEMP\daedalus_ner_geocoded_NEW13.xlsx', sheet_name='Sheet1')

In [79]:
# read_excel accepts the path directly; the keyword is sheet_name in current pandas
dfu = pd.read_excel(r'C:\TEMP\daedalus_ner_geocoded_NEW13.xlsx', sheet_name='Sheet1')
dfu = dfu.set_index('entity')

# Boolean mask of processed rows (map() would only return a lazy iterator in Python 3)
dfu['processed'] == 1.0


In [86]:
dfu.loc[(dfu['processed']!=1.0)]

| entity | processed | latitude | longitude | reversename |
| --- | --- | --- | --- | --- |
| St Pauls | | | | |
| City of London | | | | |
| Halvvägs | | | | |
| Peterskyrkan | | | | |
| Östafrika | | | | |
| Medelhavsområdet | | | | |
| Rom, | | | | |
| Wien. Vid Wallensteins | | | | |
| Västerviks | | | | |
| Söderhamns | | | | |
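
Some of the unprocessed entities carry tagging residue, for example `Rom,` and `Wien. Vid Wallensteins`. One option is to normalise such strings before geocoding. The sketch below assumes that trailing punctuation and anything after a period followed by whitespace is noise. `clean_entity` is a hypothetical helper, and the rule is deliberately crude: it would also cut an abbreviation written as `St. Pauls`, and it does not handle Swedish genitive forms such as `Västerviks`.

```python
import re

def clean_entity(name):
    """Drop text after a sentence-ending period and strip stray punctuation."""
    # Keep only the part before a period that is followed by whitespace.
    head = re.split(r"\.\s+", name, maxsplit=1)[0]
    # Remove surrounding whitespace and punctuation such as "," or ";".
    return head.strip(" \t,;:.")

cleaned = [clean_entity(s) for s in ["Rom,", "Wien. Vid Wallensteins", "Falu gruva"]]
```

Cleaning before the `drop_duplicates()` step would also merge variants such as `Rom,` and `Rom` into one lookup.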


In [88]:
dfu

| entity | processed | latitude | longitude | reversename |
| --- | --- | --- | --- | --- |
| Wien | `<map object at 0x000001DC357A0A20>` | 48.208174 | 16.373819 | |
| Norge | `<map object at 0x000001DC357A0A20>` | 60.472024 | 8.468946 | Unnamed Road, 3580 Geilo, Norway |
| Sverige | `<map object at 0x000001DC357A0A20>` | 60.128161 | 18.643501 | Gruvvägen 2, 760 49 Herräng, Sweden |
| Falu gruva | `<map object at 0x000001DC357A0A20>` | 60.600216 | 15.616582 | GRUVPLATSEN 5, 791 61 Falun, Sweden |
| Falun | `<map object at 0x000001DC357A0A20>` | 60.606460 | 15.635500 | Bergmästaregatan 11, 791 30 Falun, Sweden |
| Lesjöfors | `<map object at 0x000001DC357A0A20>` | 59.977054 | 14.184545 | Parkgatan 9, 680 96 Lesjöfors, Sweden |
| Garphyttan | `<map object at 0x000001DC357A0A20>` | 59.303709 | 14.945119 | Kilsvägen 2A, 719 40 Garphyttan, Sweden |
| Gunnebo | `<map object at 0x000001DC357A0A20>` | 57.720821 | 16.526140 | Västrumsvägen 15, 590 93 Gunnebo, Sweden |
| Sveriges | `<map object at 0x000001DC357A0A20>` | 60.128161 | 18.643501 | Gruvvägen 2, 760 49 Herräng, Sweden |
| Tabergs | `<map object at 0x000001DC357A0A20>` | 57.677732 | 14.087904 | Bergslagsvägen 24, 562 41 Taberg, Sweden |
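
Steps 5 and 6 of the outline (applying the coordinates back to the location rows and creating statistics) are not shown in the cells above. A minimal sketch of one way to do it follows, assuming small hypothetical stand-ins for `dfloc` and the geocoded table. The years and mention rows are invented for illustration; the coordinates are taken from the output above.

```python
import pandas as pd

# Hypothetical stand-in for dfloc: one row per tagged location mention.
mentions = pd.DataFrame({
    "year": [1931, 1931, 1932, 1932],
    "entity": ["Wien", "Falun", "Falun", "Halvvägs"],
})

# Stand-in for the geocoded table, indexed by entity.
coords = pd.DataFrame(
    {"latitude": [48.208174, 60.606460], "longitude": [16.373819, 15.635500]},
    index=pd.Index(["Wien", "Falun"], name="entity"),
)

# Step 5: attach coordinates to every mention; unmatched entities get NaN.
located = mentions.join(coords, on="entity")

# Step 6: count mentions per year and entity, keeping only geocoded rows.
counts = (
    located.dropna(subset=["latitude", "longitude"])
    .groupby(["year", "entity"])
    .size()
    .reset_index(name="counts")
)
```

Joining on the entity name keeps one row per mention, so the per-year counts reflect how often a place is named rather than how many distinct places exist.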
