# PERDIDO Geoparser


[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](http://colab.research.google.com/github/ludovicmoncla/nlp-tools/blob/main/perdido.ipynb)

Source code: https://github.com/ludovicmoncla/perdido/

## Import packages

In [10]:
from perdido.geoparser import Geoparser
from display_xml import XML
from spacy import displacy
import os

## Parse text

In [2]:
text = "ABYDE ou ABYDOS, sub. Ville maritime de Phrygie vis-à-vis de Sestos."
text += "Xercès joignit ces deux endroits éloignés l'un de l'autre de sept stades, par le pont qu'il jetta sur l'Hellespont."

* Note that the two strings are concatenated without a separating space, so the input contains `Sestos.Xercès` with no whitespace after the period. This is why `Sestos.Xercès` appears as a single token in the output below.


In [3]:
geoparser = Geoparser(version='Encyclopedie', pos_tagger='stanza')
doc = geoparser(text)

* The `version` parameter accepts two values: `Standard` (default) and `Encyclopedie`.

* Print annotations per token:

In [4]:
for token in doc:
    print(f'token: {token.text}\tlemma: {token.lemma}\tpos: {token.pos}\tner: {token.tags}')

token: ABYDE	lemma: ABYDE	pos: PROPN	ner: ['B-LOC']
token: ou	lemma: ou	pos: CCONJ	ner: ['O']
token: ABYDOS	lemma: ABYDOS	pos: PROPN	ner: ['B-LOC']
token: ,	lemma: 	pos: PUNCT	ner: ['O']
token: sub	lemma: sub	pos: X	ner: ['O']
token: .	lemma: 	pos: PUNCT	ner: ['O']
token: Ville	lemma: ville	pos: NOUN	ner: ['B-LOC']
token: maritime	lemma: maritime	pos: ADJ	ner: ['I-LOC']
token: de	lemma: de	pos: ADP	ner: ['I-LOC']
token: Phrygie	lemma: Phrygie	pos: PROPN	ner: ['I-LOC']
token: vis-à-vis	lemma: vis-à-vis	pos: ADV	ner: ['O']
token: de	lemma: de	pos: ADP	ner: ['O']
token: Sestos.Xercès	lemma: SestosXercès	pos: PROPN	ner: ['B-OTHER']
token: joignit	lemma: joindre	pos: VERB	ner: ['O']
token: ces	lemma: ce	pos: DET	ner: ['O']
token: deux	lemma: deux	pos: NUM	ner: ['O']
token: endroits	lemma: endroit	pos: NOUN	ner: ['O']
token: éloignés	lemma: éloigner	pos: VERB	ner: ['O']
token: l'	lemma: le	pos: DET	ner: ['O']
token: un	lemma: un	pos: PRON	ner: ['O']
token: de	lemma: de	pos: ADP	ner: ['O']
[output truncated]
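
The per-token IOB tags printed above can be grouped back into entity spans. The following is a minimal sketch that works on plain `(token, tag)` pairs copied from that output, not on Perdido objects; the helper name `iob_to_spans` is hypothetical, and it assumes a single tag per token (in the output above, `token.tags` is a list, so its first element would be used).

```python
def iob_to_spans(tagged_tokens):
    """Group (token, tag) pairs into (entity_text, label) spans."""
    spans = []
    current_tokens, current_label = [], None
    for token, tag in tagged_tokens:
        if tag.startswith("B-"):
            # A B- tag always opens a new span; close any open one first.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_label:
            # An I- tag with the same label continues the open span.
            current_tokens.append(token)
        else:
            # "O" (or a stray I- tag) closes any open span and is dropped.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_label))
    return spans


# Pairs copied from the first lines of the output above.
sample = [
    ("ABYDE", "B-LOC"), ("ou", "O"), ("ABYDOS", "B-LOC"), (",", "O"),
    ("Ville", "B-LOC"), ("maritime", "I-LOC"), ("de", "I-LOC"),
    ("Phrygie", "I-LOC"), ("vis-à-vis", "O"),
]
spans = iob_to_spans(sample)
for text_span, label in spans:
    print(f"{label}: {text_span}")
```

Joining tokens with a single space is a simplification; it does not restore the original spacing around punctuation or apostrophes.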

* Get the XML-TEI output:

In [5]:
XML(doc.tei, style='lovelace')

* Get the list of named entities:

In [6]:
for entity in doc.named_entities:
    print(f'entity: {entity.text}\ttag: {entity.tag}')
    if entity.tag == 'place':
        for t in entity.toponym_candidates:
            print(f' latitude: {t.lat}\tlongitude: {t.lng}\tsource: {t.source}')

entity: ABYDE	tag: place
entity: ABYDOS	tag: place
 latitude: 26.411269	longitude: 40.194296	source: nominatim
entity: Ville maritime de Phrygie	tag: place
entity: Sestos.Xercès	tag: unknown
entity: Hellespont	tag: place
 latitude: -90.452881	longitude: 38.132867	source: nominatim
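
The value printed as latitude for Hellespont (-90.452881) falls outside the valid range of [-90, 90], which suggests that latitude and longitude may be swapped in this output. Below is a minimal sketch of a range check that flags such cases. It works on plain floats rather than Perdido objects, and the helper name `coordinates_look_swapped` is hypothetical.

```python
def coordinates_look_swapped(lat, lng):
    """Return True when `lat` is out of range but the pair is valid once swapped."""
    lat_ok = -90.0 <= lat <= 90.0
    # After swapping, `lng` acts as the latitude and `lat` as the longitude.
    swapped_ok = -90.0 <= lng <= 90.0 and -180.0 <= lat <= 180.0
    return (not lat_ok) and swapped_ok


# Values copied from the output above.
hellespont_swapped = coordinates_look_swapped(-90.452881, 38.132867)
abydos_swapped = coordinates_look_swapped(26.411269, 40.194296)
print(hellespont_swapped)  # True
print(abydos_swapped)      # False
```

A range check alone cannot decide the ABYDOS candidate, because both of its values are valid either as a latitude or as a longitude. Comparing candidates against a known reference location would be needed for that case.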


* Render the named entities with displaCy:

In [7]:
displacy.render(doc.to_spacy_doc(), style="ent", jupyter=True)

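* Render the annotations as spans with displaCy (the `span` style can show nested or overlapping annotations):

In [8]:
displacy.render(doc.to_spacy_doc(), style="span", jupyter=True)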

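* Write the document to files, in IOB (CoNLL-style) and XML formats:

In [11]:
# Make sure the output directory exists before writing.
os.makedirs('output', exist_ok=True)
doc.to_iob(os.path.join('output', 'sample_perdido.tsv'))
doc.to_xml(os.path.join('output', 'sample_perdido.xml'))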
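
An IOB file like the one written above can be read back with a few lines of Python. The sketch below makes assumptions about the layout, which is not shown in this notebook: one token per line, tab-separated columns, the token in the first column and the tag in the last, and blank lines between sentences. The helper name `read_iob_tsv` is hypothetical, and the sample content is illustrative rather than actual Perdido output.

```python
import io


def read_iob_tsv(handle):
    """Read (token, tag) pairs from a tab-separated IOB file handle."""
    rows = []
    for line in handle:
        line = line.rstrip("\n")
        if not line.strip():
            # Assumed: blank lines only separate sentences.
            continue
        columns = line.split("\t")
        # Assumed layout: token in the first column, tag in the last.
        rows.append((columns[0], columns[-1]))
    return rows


# Illustrative content; replace with open(path, encoding="utf-8") for a real file.
sample_file = io.StringIO("ABYDE\tB-LOC\nou\tO\n\nABYDOS\tB-LOC\n")
rows = read_iob_tsv(sample_file)
print(rows)
```

If the real file has a different column order, only the two column indices need to change.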