# PERDIDO Geoparser


[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](http://colab.research.google.com/github/ludovicmoncla/perdido/blob/main/notebooks/demo_Geoparser.ipynb)





## Installation

In [None]:
!pip install --upgrade perdido
!pip install display-xml

## Import

In [None]:
from perdido.geoparser import Geoparser

## Quick start

### Run geoparser

In [None]:
text = "J'ai rendez-vous proche de la place Bellecour, de la place des Célestins, au sud de la fontaine des Jacobins et près du pont Bonaparte."

In [None]:
geoparser = Geoparser(version='Standard')
doc = geoparser(text)

* The `version` parameter accepts two values: *Standard* (default) and *Encyclopedie*.

### Get tokens

* Access token attributes:

In [None]:
for token in doc:
    print(f'token: {token.text}\tlemma: {token.lemma}\tpos: {token.pos}')

* Get the IOB format:

In [None]:
for token in doc.tokens:
    print(token.iob_format())
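For readers unfamiliar with IOB tagging, the scheme can be illustrated independently of Perdido. The sketch below uses a hypothetical `to_iob` helper and hand-written token spans. It is not how `token.iob_format()` is implemented; it only shows the labelling convention (`B-` begins an entity, `I-` continues it, `O` is outside any entity).

```python
def to_iob(tokens, spans):
    """Label tokens with IOB tags.

    tokens: list of token strings.
    spans: list of (start, end, label) tuples indexing into tokens, end exclusive.
    """
    tags = ['O'] * len(tokens)
    for start, end, label in spans:
        tags[start] = f'B-{label}'          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f'I-{label}'          # remaining tokens of the entity
    return list(zip(tokens, tags))


# Hand-written example: "pont Bonaparte" is marked as a place.
tokens = ['près', 'du', 'pont', 'Bonaparte']
iob = to_iob(tokens, [(2, 4, 'place')])
for token, tag in iob:
    print(f'{token}\t{tag}')
```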

* Get the TSV-IOB format:

In [None]:
for token in doc.tokens:
    print(token)    # or print(token.tsv_format())

### Get the XML-TEI output

In [None]:
doc.tei

* Use the [display_xml](https://github.com/mpacer/display_xml) library to render the XML with syntax highlighting:

In [None]:
from display_xml import XML

XML(doc.tei, style='lovelace')

### Get the GeoJSON output

In [None]:
doc.geojson
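GeoJSON output can be post-processed with the standard `json` module. The sketch below assumes that `doc.geojson` behaves like a GeoJSON `FeatureCollection` held in a Python dict, and that features carry a `name` property; both are assumptions, so a hand-written stand-in with approximate coordinates is used here instead of real Perdido output.

```python
import json

# Stand-in for doc.geojson (assumption: a FeatureCollection stored as a dict).
geojson = {
    'type': 'FeatureCollection',
    'features': [
        {
            'type': 'Feature',
            # GeoJSON stores positions as [longitude, latitude].
            'geometry': {'type': 'Point', 'coordinates': [4.8320, 45.7578]},
            'properties': {'name': 'place Bellecour'},  # hypothetical property key
        },
    ],
}


def point_features(collection):
    """Return (name, lat, lng) for every Point feature in the collection."""
    points = []
    for feature in collection.get('features', []):
        geometry = feature.get('geometry') or {}
        if geometry.get('type') == 'Point':
            lng, lat = geometry['coordinates'][:2]
            name = feature.get('properties', {}).get('name')
            points.append((name, lat, lng))
    return points


points = point_features(geojson)
serialized = json.dumps(geojson, ensure_ascii=False, indent=2)  # ready to write to a .geojson file
print(points)
```

Note the coordinate order: GeoJSON uses longitude first, whereas the candidate attributes in this notebook are printed latitude first.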

### Get the list of named entities

In [None]:
for entity in doc.named_entities:
    print(f'entity: {entity.text}\ttag: {entity.tag}')
    if entity.tag == 'place':
        for t in entity.toponym_candidates:
            print(f' latitude: {t.lat}\tlongitude: {t.lng}\tsource: {t.source}')
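The same loop can collect candidates into plain dictionaries, which are easier to filter or export than printed lines. The helper name `candidates_to_rows` is hypothetical; it relies only on the attributes used above (`text`, `tag`, `toponym_candidates`, `lat`, `lng`, `source`). To keep the sketch runnable without Perdido, it is exercised on stand-in objects with made-up values; with a real document, `doc.named_entities` would be passed instead.

```python
from types import SimpleNamespace


def candidates_to_rows(entities):
    """Flatten place entities into one dict per toponym candidate."""
    rows = []
    for entity in entities:
        if entity.tag != 'place':
            continue  # only places carry toponym candidates in the loop above
        for candidate in entity.toponym_candidates:
            rows.append({
                'entity': entity.text,
                'lat': candidate.lat,
                'lng': candidate.lng,
                'source': candidate.source,
            })
    return rows


# Stand-ins mimicking the attribute names used in this notebook (values are illustrative).
entities = [
    SimpleNamespace(
        text='place Bellecour',
        tag='place',
        toponym_candidates=[SimpleNamespace(lat=45.7578, lng=4.8320, source='nominatim')],
    ),
    SimpleNamespace(text='Louis XI', tag='person', toponym_candidates=[]),  # hypothetical tag value
]

rows = candidates_to_rows(entities)
print(rows)
```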

### Get the list of nested named entities

In [None]:
for nested_entity in doc.nested_named_entities:
    print(f'entity: {nested_entity.text}\ttag: {nested_entity.tag}')
    if nested_entity.tag == 'place':
        for t in nested_entity.toponym_candidates:
            print(f' latitude: {t.lat}\tlongitude: {t.lng}\tsource: {t.source}')

### Display tagged entities

In [None]:
from spacy import displacy

In [None]:
displacy.render(doc.to_spacy_doc(), style="ent", jupyter=True) 

In [None]:
displacy.render(doc.to_spacy_doc(), style="span", jupyter=True)

### Display a map (using the folium library)


[https://python-visualization.github.io/folium/](https://python-visualization.github.io/folium/)

In [None]:
doc.get_folium_map()

## Going deeper

### Geocoding settings

#### Choosing gazetteers

In [None]:
# Possible values: 'nominatim' (default), 'geonames', 'ign', 'whg', 'pleiades'
sources = ['nominatim', 'geonames']

geoparser = Geoparser(lang='fr', sources=sources)
doc = geoparser(text)

In [None]:
for entity in doc.named_entities:
    print(f'entity: {entity.text}\ttag: {entity.tag}')
    if entity.tag == 'place':
        for t in entity.toponym_candidates:
            print(f' latitude: {t.lat}\tlongitude: {t.lng}\tsource: {t.source}')

In [None]:
doc.get_folium_map()

#### Choosing the maximum number of gazetteer matches

In [None]:

geoparser = Geoparser(max_rows=3, sources=sources)
doc = geoparser(text)


In [None]:
for entity in doc.named_entities:
    print(f'entity: {entity.text}\ttag: {entity.tag}')
    if entity.tag == 'place':
        for t in entity.toponym_candidates:
            print(f' latitude: {t.lat}\tlongitude: {t.lng}\tsource: {t.source}')

In [None]:
doc.get_folium_map()

#### Setting a country code to limit search results to a specific country

In [None]:
geoparser = Geoparser(max_rows=3, sources=sources, country_code='fr')
doc = geoparser(text)

In [None]:
for entity in doc.named_entities:
    print(f'entity: {entity.text}\ttag: {entity.tag}')
    if entity.tag == 'place':
        for t in entity.toponym_candidates:
            print(f' latitude: {t.lat}\tlongitude: {t.lng}\tsource: {t.source}')

In [None]:
doc.get_folium_map()

#### Defining a bounding box to limit the search results to a specific area

In [None]:
# Bounding box of France, in the order [west, south, east, north]
bbox = [-5.225, 41.333, 9.55, 51.2]
geoparser = Geoparser(max_rows=3, sources=sources, bbox=bbox)
doc = geoparser(text)
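To see what a `[west, south, east, north]` box does and does not exclude, here is a small standalone check. The `in_bbox` helper is hypothetical and the city coordinates are approximate; how Perdido or the gazetteers apply the box internally is not shown in this notebook.

```python
def in_bbox(lat, lng, bbox):
    """Return True if (lat, lng) lies inside bbox = [west, south, east, north].

    Simple sketch: does not handle boxes that cross the antimeridian.
    """
    west, south, east, north = bbox
    return south <= lat <= north and west <= lng <= east


france_bbox = [-5.225, 41.333, 9.55, 51.2]

# Approximate (lat, lng) pairs, for illustration only.
points = {
    'Lyon': (45.76, 4.84),
    'Turin': (45.07, 7.69),
    'London': (51.51, -0.13),
}

inside = {name: in_bbox(lat, lng, france_bbox) for name, (lat, lng) in points.items()}
print(inside)
```

Because the box is a rectangle, it also covers parts of neighbouring countries (Turin falls inside it here), so a bounding box is a coarser filter than a country code.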


In [None]:
for entity in doc.named_entities:
    print(f'entity: {entity.text}\ttag: {entity.tag}')
    if entity.tag == 'place':
        for t in entity.toponym_candidates:
            print(f' latitude: {t.lat}\tlongitude: {t.lng}\tsource: {t.source}')

In [None]:
doc.get_folium_map()

#### Disambiguation using the minimal distances heuristic

In [None]:
geocoder = Geoparser(sources=['wiki_gaz'], max_rows=50)
doc = geocoder(['Lyon', 'Annecy', 'Chamonix'])

In [None]:
doc.get_folium_map()

In [None]:
doc.minimal_distances_disambiguation()

In [None]:
doc.get_folium_map()
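The idea behind a minimal-distances heuristic can be sketched without the library: when several toponyms appear together, prefer the combination of candidates that lie closest to one another. The code below is one plausible reading of that idea, not Perdido's implementation of `minimal_distances_disambiguation()`. The function names are hypothetical, the coordinates are approximate, and the far-away "homonym" candidates are made up for the demonstration.

```python
from itertools import combinations, product
from math import asin, cos, radians, sin, sqrt


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lng) pairs in degrees."""
    lat1, lng1 = map(radians, a)
    lat2, lng2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def pick_closest_combination(candidates):
    """Choose one candidate per toponym so that the sum of pairwise distances is minimal.

    Brute force over all combinations: fine for a handful of toponyms only.
    """
    names = list(candidates)
    best = min(
        product(*(candidates[name] for name in names)),
        key=lambda combo: sum(haversine_km(p, q) for p, q in combinations(combo, 2)),
    )
    return dict(zip(names, best))


candidates = {
    'Lyon': [(40.0, -75.0), (45.76, 4.84)],      # first entry: made-up distant homonym
    'Annecy': [(45.90, 6.13)],
    'Chamonix': [(-30.0, 150.0), (45.92, 6.87)],  # first entry: made-up distant homonym
}

chosen = pick_closest_combination(candidates)
print(chosen)
```

The exhaustive search grows exponentially with the number of ambiguous toponyms, so a real implementation would need a cheaper strategy for long texts.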

### Geotagging settings

#### Geoparsing encyclopedia articles (historical documents)

We are developing a custom version of the Perdido library for geoparsing encyclopedia articles ([https://geode-project.github.io](https://geode-project.github.io)).
To use it, pass the version name when creating the geoparser object:

In [None]:
content = "Grenoble, Gratianopolis, ville de France, capitale du Dauphiné, avec un évêché suffragant de Vienne, et un parlement érigé en 1493 par Louis XI. qui n'étoit encore que dauphin ; mais son pere ratifia cette érection deux ans après."
# Leading space keeps the two sentences separated after concatenation.
content += " Grenoble est sur l'Isere, à onze lieues S O. de Chambéri, quarante-deux N. O. de Turin, seize S. E. de Vienne, cent vingt-quatre S. O. de Paris. Long. suivant Harris, 23d. 31'. 15\". suivant Cassini, 23d. 14'. 15\". latit 45d. 11'."

geoparser = Geoparser(version='Encyclopedie')
doc = geoparser(content)

In [None]:
for entity in doc.named_entities:
    print(f'entity: {entity.text}\ttag: {entity.tag}')
    if entity.tag == 'place':
        for t in entity.toponym_candidates:
            print(f' latitude: {t.lat}\tlongitude: {t.lng}\tsource: {t.source}')

In [None]:
doc.get_folium_map()
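The article above also states coordinates in degrees, minutes and seconds (for example `23d. 31'. 15"`). As a side note, such strings can be converted to decimal degrees with a few lines of Python. This sketch is independent of Perdido: the `dms_to_decimal` helper and its regular expression are hypothetical and only cover the notation seen in this excerpt.

```python
import re

# Matches e.g. 23d. 31'. 15"  or  45d. 11'.  (minutes and seconds are optional).
DMS = re.compile(r'''(\d+)d\.?\s*(?:(\d+)'\.?\s*)?(?:(\d+)")?''')


def dms_to_decimal(text):
    """Convert the first degrees/minutes/seconds value found in text to decimal degrees."""
    match = DMS.search(text)
    if match is None:
        raise ValueError(f'no DMS value found in {text!r}')
    degrees, minutes, seconds = (int(group) if group else 0 for group in match.groups())
    return degrees + minutes / 60 + seconds / 3600


harris_long = dms_to_decimal("23d. 31'. 15\"")
latitude = dms_to_decimal("45d. 11'.")
print(round(harris_long, 4), round(latitude, 4))
```

Historical sources often measure longitude from a prime meridian other than Greenwich, so the decimal longitude obtained this way is not directly comparable to modern coordinates without a further offset.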