# PERDIDO Geoparser


[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](http://colab.research.google.com/github/ludovicmoncla/perdido/blob/main/notebooks/demo_Geoparser.ipynb)

## Installation

In [None]:
!pip install --upgrade perdido

## Import

In [None]:
from perdido.geoparser import Geoparser

## Quick start

### Run geoparser

In [None]:
geoparser = Geoparser(lang='fr')
doc = geoparser('Je visite la ville de Lyon, Annecy et Chamonix.')

### Get tokens

In [None]:
for token in doc:
    print(f'token: {token.text}\tlemma: {token.lemma}\tpos: {token.pos}')

### Get the XML-TEI output

In [None]:
doc.tei
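
`doc.tei` holds the annotated text as XML-TEI. Here is a minimal sketch for pulling place names out of it with the standard library. It assumes the output is an XML string in which places are marked with TEI `<placeName>` elements. Check the actual output before relying on this, because the sample fragment below is hypothetical and not real PERDIDO output:

```python
import xml.etree.ElementTree as ET


def place_names(tei_xml: str) -> list[str]:
    """Return the text of every <placeName> element, namespaced or not."""
    root = ET.fromstring(tei_xml)
    names = []
    for elem in root.iter():
        # Compare the local name so that both "{http://www.tei-c.org/ns/1.0}placeName"
        # and a bare "placeName" match
        if elem.tag.rsplit("}", 1)[-1] == "placeName":
            names.append("".join(elem.itertext()).strip())
    return names


# Hypothetical TEI fragment (assumption, not actual PERDIDO output)
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>
Je visite la ville de <placeName>Lyon</placeName> et <placeName>Annecy</placeName>.
</p></body></text></TEI>"""

print(place_names(sample))  # ['Lyon', 'Annecy']
```

In the notebook, this would be called as `place_names(doc.tei)`, provided `doc.tei` is a string. Nested place annotations would each be reported separately.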

### Get the GeoJSON output

In [None]:
doc.geojson
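
`doc.geojson` contains the geocoded places. GeoJSON (RFC 7946) stores point coordinates as `[longitude, latitude]`, which is the reverse of the usual "lat, lng" reading order. The sketch below lists point features. It assumes the output is a `FeatureCollection` given either as a dict or as a JSON string. The `name` property key is also an assumption, so inspect `doc.geojson` for the actual keys:

```python
import json


def list_points(geojson, name_key="name"):
    """Yield (name, lat, lng) for every Point feature in a FeatureCollection."""
    # Accept either an already parsed dict or a raw JSON string
    data = json.loads(geojson) if isinstance(geojson, str) else geojson
    for feature in data.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") != "Point":
            continue
        lng, lat = geometry["coordinates"][:2]  # GeoJSON order is [lng, lat]
        properties = feature.get("properties") or {}
        yield properties.get(name_key), lat, lng


# Hypothetical FeatureCollection (assumption, not actual PERDIDO output)
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [4.8357, 45.764]},
         "properties": {"name": "Lyon"}},
    ],
}

for name, lat, lng in list_points(sample):
    print(f"{name}: lat={lat}, lng={lng}")  # Lyon: lat=45.764, lng=4.8357
```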

### Get the list of named entities

In [None]:
for entity in doc.ne:
    print(f'entity: {entity.text}\ttag: {entity.tag}')
    if entity.tag == 'place':
        for t in entity.toponyms:
            print(f' latitude: {t.lat}\tlongitude: {t.lng}\tsource: {t.source}')
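
The nested loop above can be flattened into one record per candidate toponym. This is convenient for building a table, for instance with `pandas.DataFrame(rows)`. The sketch below relies only on the attributes already used above (`text`, `tag`, `toponyms`, `lat`, `lng`, `source`). The `SimpleNamespace` objects are hypothetical stand-ins for PERDIDO entities:

```python
from types import SimpleNamespace


def entities_to_rows(entities):
    """Return one dict per (place entity, candidate toponym) pair."""
    rows = []
    for entity in entities:
        if entity.tag != "place":
            continue  # only place entities carry geocoded toponyms
        for t in entity.toponyms:
            rows.append({"entity": entity.text, "lat": t.lat,
                         "lng": t.lng, "source": t.source})
    return rows


# Hypothetical stand-ins exposing the same attributes as doc.ne items
lyon = SimpleNamespace(text="Lyon", tag="place", toponyms=[
    SimpleNamespace(lat=45.764, lng=4.8357, source="nominatim")])
other = SimpleNamespace(text="Louis XI", tag="person", toponyms=[])

print(entities_to_rows([lyon, other]))
```

With real output, the call would be `rows = entities_to_rows(doc.ne)`.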

### Get the list of nested named entities

In [None]:
for nested_entity in doc.nne:
    print(f'entity: {nested_entity.text}\ttag: {nested_entity.tag}')
    if nested_entity.tag == 'place':
        for t in nested_entity.toponyms:
            print(f' latitude: {t.lat}\tlongitude: {t.lng}\tsource: {t.source}')

### Display tagged entities

In [None]:
from spacy import displacy

In [None]:
displacy.render(doc.to_spacy_doc(), style="ent", jupyter=True)

In [None]:
displacy.render(doc.to_spacy_doc(), style="span", jupyter=True)

### Display a map (using the folium library)

[https://python-visualization.github.io/folium/](https://python-visualization.github.io/folium/)

In [None]:
m = doc.get_folium_map()
m

## Going deeper

### Geocoding settings

#### Choosing gazetteers

In [None]:
sources = ['nominatim', 'geonames'] # possible values: 'nominatim' (default), 'geonames', 'ign', 'wiki_gaz'

geoparser = Geoparser(lang='fr', sources=sources)
doc = geoparser('Je visite la ville de Lyon, Annecy et Chamonix.')

In [None]:
for entity in doc.ne:
    print(f'entity: {entity.text}\ttag: {entity.tag}')
    if entity.tag == 'place':
        for t in entity.toponyms:
            print(f' latitude: {t.lat}\tlongitude: {t.lng}\tsource: {t.source}')
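
When several gazetteers are queried, it can help to see how many candidates each one contributed. Here is a minimal sketch that uses only the `tag`, `toponyms` and `source` attributes shown above. The stand-in objects are hypothetical:

```python
from collections import Counter
from types import SimpleNamespace


def count_by_source(entities):
    """Count candidate toponyms per gazetteer across all place entities."""
    return Counter(t.source
                   for entity in entities if entity.tag == "place"
                   for t in entity.toponyms)


# Hypothetical stand-ins for doc.ne
lyon = SimpleNamespace(text="Lyon", tag="place", toponyms=[
    SimpleNamespace(source="nominatim"), SimpleNamespace(source="geonames")])
annecy = SimpleNamespace(text="Annecy", tag="place", toponyms=[
    SimpleNamespace(source="geonames")])

print(count_by_source([lyon, annecy]))  # Counter({'geonames': 2, 'nominatim': 1})
```

In the notebook, `count_by_source(doc.ne)` would give the per-gazetteer totals for the current run.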

In [None]:
doc.get_folium_map()

#### Choosing the maximum number of gazetteer matches

In [None]:
geoparser = Geoparser(max_rows=3, sources=sources)
doc = geoparser('Je visite la ville de Lyon, Annecy et Chamonix.')

In [None]:
for entity in doc.ne:
    print(f'entity: {entity.text}\ttag: {entity.tag}')
    if entity.tag == 'place':
        for t in entity.toponyms:
            print(f' latitude: {t.lat}\tlongitude: {t.lng}\tsource: {t.source}')
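
To see the effect of `max_rows`, you can count how many candidates each place entity received and compare runs with different settings. This page does not state whether the limit applies per gazetteer or in total, so the counts are best read alongside the `sources` list. The stand-ins below are hypothetical:

```python
from types import SimpleNamespace


def candidates_per_entity(entities):
    """Return (entity text, number of candidate toponyms) for each place entity."""
    return [(e.text, len(e.toponyms)) for e in entities if e.tag == "place"]


# Hypothetical stand-ins for doc.ne
lyon = SimpleNamespace(text="Lyon", tag="place",
                       toponyms=[SimpleNamespace(), SimpleNamespace()])
chamonix = SimpleNamespace(text="Chamonix", tag="place",
                           toponyms=[SimpleNamespace()])

print(candidates_per_entity([lyon, chamonix]))  # [('Lyon', 2), ('Chamonix', 1)]
```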

In [None]:
doc.get_folium_map()

#### Setting a country code to limit the search results to a specific country

In [None]:
geoparser = Geoparser(max_rows=3, sources=sources, country_code='fr')
doc = geoparser('Je visite la ville de Lyon, Annecy et Chamonix.')

In [None]:
for entity in doc.ne:
    print(f'entity: {entity.text}\ttag: {entity.tag}')
    if entity.tag == 'place':
        for t in entity.toponyms:
            print(f' latitude: {t.lat}\tlongitude: {t.lng}\tsource: {t.source}')

In [None]:
doc.get_folium_map()

#### Defining a bounding box to limit the search results to a specific area

In [None]:
bbox = [-5.225, 41.333, 9.55, 51.2]  # France | format: [west, south, east, north]
geoparser = Geoparser(max_rows=3, sources=sources, bbox=bbox)
doc = geoparser('Je visite la ville de Lyon, Annecy et Chamonix.')

In [None]:
for entity in doc.ne:
    print(f'entity: {entity.text}\ttag: {entity.tag}')
    if entity.tag == 'place':
        for t in entity.toponyms:
            print(f' latitude: {t.lat}\tlongitude: {t.lng}\tsource: {t.source}')
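
The bounding box is given as `[west, south, east, north]`: the first and third values are longitudes, and the second and fourth are latitudes. A candidate lies inside the box when `west <= lng <= east` and `south <= lat <= north`. Here is a minimal sketch for checking the returned candidates client-side. It assumes `lat` and `lng` can be converted with `float()` and does not handle boxes that cross the antimeridian. The stand-in objects are hypothetical:

```python
from types import SimpleNamespace


def in_bbox(lat, lng, bbox):
    """True if (lat, lng) lies inside bbox = [west, south, east, north]."""
    west, south, east, north = bbox
    return west <= float(lng) <= east and south <= float(lat) <= north


def outside_bbox(entities, bbox):
    """List (entity text, lat, lng) for candidate toponyms falling outside bbox."""
    return [(e.text, t.lat, t.lng)
            for e in entities if e.tag == "place"
            for t in e.toponyms
            if not in_bbox(t.lat, t.lng, bbox)]


france_bbox = [-5.225, 41.333, 9.55, 51.2]  # same box as above
# Hypothetical candidates: one in France, one far outside it
lyon = SimpleNamespace(text="Lyon", tag="place", toponyms=[
    SimpleNamespace(lat=45.764, lng=4.8357),
    SimpleNamespace(lat=40.0, lng=-75.0)])

print(outside_bbox([lyon], france_bbox))  # [('Lyon', 40.0, -75.0)]
```

With the `bbox` filter active, `outside_bbox(doc.ne, bbox)` should return an empty list.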

In [None]:
doc.get_folium_map()

### Geotagging settings

#### Geoparsing encyclopedia articles

[https://geode-project.github.io](https://geode-project.github.io)

In [None]:
content = "Grenoble, Gratianopolis, ville de France, capitale du Dauphiné, avec un évêché suffragant de Vienne, et un parlement érigé en 1493 par Louis XI. qui n'étoit encore que dauphin ; mais son pere ratifia cette érection deux ans après."
# content = "Grenoble, ancienne ville de France"  # possible bug with this shorter input?
# Leading space so the two sentences are not glued together ("après.Grenoble")
content += " Grenoble est sur l'Isere, à onze lieues S O. de Chambéri, quarante-deux N. O. de Turin, seize S. E. de Vienne, cent vingt-quatre S. O. de Paris. Long. suivant Harris, 23d. 31'. 15\". suivant Cassini, 23d. 14'. 15\". latit 45d. 11'."

geoparser = Geoparser(version='Encyclopedie')
doc = geoparser(content)
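
The article gives coordinates in degrees, minutes and seconds (e.g. `latit 45d. 11'.`). The sketch below converts such strings to decimal degrees with a regular expression. The pattern only covers the notation seen in this excerpt. Note that the longitudes are not relative to Greenwich, so they cannot be compared directly with gazetteer coordinates. Eighteenth-century French sources commonly counted longitude from the Ferro meridian.

```python
import re

# Matches e.g. 23d. 31'. 15"  or  45d. 11'.
DMS_PATTERN = re.compile(r"(\d+)d\.?\s*(?:(\d+)'\.?\s*)?(?:(\d+)\")?")


def dms_to_decimal(text):
    """Convert the first degrees/minutes/seconds expression in text to decimal degrees."""
    match = DMS_PATTERN.search(text)
    if match is None:
        return None
    # Missing minutes or seconds count as 0
    degrees, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return degrees + minutes / 60 + seconds / 3600


print(round(dms_to_decimal("latit 45d. 11'."), 4))   # 45.1833
print(round(dms_to_decimal("23d. 31'. 15\"."), 4))   # 23.5208
```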
