# PERDIDO Geoparser


[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](http://colab.research.google.com/github/ludovicmoncla/perdido/blob/main/notebooks/demo_Geoparser.ipynb)





## Installation

In [2]:
!pip install --upgrade perdido

Collecting perdido
  Downloading perdido-0.0.8-py3-none-any.whl (8.3 kB)
Installing collected packages: perdido
Successfully installed perdido-0.0.8


## Import

In [3]:
from perdido.geoparser import Geoparser

## Quick start

### Run geoparser

In [4]:
geoparser = Geoparser(lang='fr')
doc = geoparser('Je visite la ville de Lyon, Annecy et le Mont-Blanc.')

### Get tokens

In [5]:
for token in doc:
    print(f'token: {token.text}\tlemma: {token.lemma}\tpos: {token.pos}')

token: Je	lemma: je	pos: PRO
token: visite	lemma: visiter	pos: V
token: la	lemma: le	pos: DET
token: ville	lemma: ville	pos: N
token: de	lemma: de	pos: PREP
token: Lyon	lemma: lyon	pos: NPr
token: ,	lemma: 	pos: PUN
token: Annecy	lemma: annecy	pos: NPr
token: et	lemma: et	pos: CONJC
token: le	lemma: le	pos: DET
token: Mont-Blanc	lemma: mont-blanc	pos: NPr
token: .	lemma: 	pos: SEN


### Get the XML-TEI output

In [6]:
doc.tei

'<TEI><teiheader></teiheader><text><body><s><w lemma="je" type="PRO" subtype="PpvIL" id="w0">Je</w><phr type="motion"><motionmedian><w lemma="visiter" type="V" id="w1">visite</w></motionmedian><rs type="ene" id="en.0"><rs type="place" subtype="ene" id="en.1" start="10" end="12" startT="2" endT="6"><term type="place" start="10" end="12" startT="2" endT="4"><w lemma="le" type="DET" subtype="ART" id="w2">la</w><w lemma="ville" type="N" id="w3">ville</w></term><w lemma="de" type="PREP" id="w4">de</w><rs type="unknown" subtype="no" id="en.2" start="22" end="26" startT="5" endT="6"><name type="unknown" id="en.3"><w lemma="lyon" type="NPr" id="w5">Lyon</w></name></rs><location><geo source="osm" rend="Mus&#233;es Gadagne, Rue de la Fronde, Vieux Lyon, Lyon 5e Arrondissement, Lyon, M&#233;tropole de Lyon, Circonscription d&#233;partementale du Rh&#244;ne, Auvergne-Rh&#244;ne-Alpes, France m&#233;tropolitaine, 69005, France">4.827286 45.76405</geo></location></rs></rs><w type="PUN" lemma="" id="
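
Since `doc.tei` is a string of XML, the coordinates can be pulled out with the standard library. The following is a minimal sketch, not part of the perdido API: `sample_tei` is a hand-written, well-formed fragment modelled on the output above, and `extract_geo` is a hypothetical helper. It assumes each `<geo>` element holds two space-separated numbers and sits under a `<location>` that is a direct child of an `<rs>` element, as in the excerpt shown.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment modelled on the output above (assumption: not real geoparser output)
sample_tei = (
    '<TEI><text><body><s>'
    '<rs type="place" subtype="ene" id="en.1">'
    '<rs type="unknown" subtype="no" id="en.2">'
    '<w lemma="lyon" type="NPr" id="w5">Lyon</w>'
    '</rs>'
    '<location><geo source="osm" rend="Lyon, France">4.827286 45.76405</geo></location>'
    '</rs>'
    '</s></body></text></TEI>'
)

def extract_geo(tei_string):
    """Return (entity id, source, first value, second value) for each <geo> under an <rs>."""
    root = ET.fromstring(tei_string)
    results = []
    for rs in root.iter('rs'):
        # Only consider <geo> elements attached directly to this <rs> via <location>
        for geo in rs.findall('./location/geo'):
            first, second = (float(value) for value in geo.text.split())
            results.append((rs.get('id'), geo.get('source'), first, second))
    return results

points = extract_geo(sample_tei)
for entity_id, source, first, second in points:
    print(f'{entity_id}\t{source}\t{first}\t{second}')
```

Comparing the values with the GeoJSON output for the same entity suggests the first number is the longitude and the second the latitude.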

### Get the GeoJSON output

In [7]:
doc.geojson

{'type': 'FeatureCollection',
 'features': [{'type': 'Feature',
   'geometry': {'type': 'Point', 'coordinates': [4.827286, 45.76405]},
   'properties': {'id': 'en.1',
    'name': 'ville de Lyon',
    'sourceName': 'Musées Gadagne, Rue de la Fronde, Vieux Lyon, Lyon 5e Arrondissement, Lyon, Métropole de Lyon, Circonscription départementale du Rhône, Auvergne-Rhône-Alpes, France métropolitaine, 69005, France',
    'type': '',
    'country': 'France',
    'source': 'osm'}},
  {'type': 'Feature',
   'geometry': {'type': 'Point', 'coordinates': [6.128885, 45.899235]},
   'properties': {'id': 'en.5',
    'name': 'Annecy',
    'sourceName': 'Annecy, Haute-Savoie, Auvergne-Rhône-Alpes, France métropolitaine, France',
    'type': '',
    'country': 'France',
    'source': 'osm'}},
  {'type': 'Feature',
   'geometry': {'type': 'Point', 'coordinates': [6.865171, 45.832706]},
   'properties': {'id': 'en.7',
    'name': 'Mont-Blanc',
    'sourceName': 'Mont Blanc - Monte Bianco, Chamonix-Mont-Blanc
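
The GeoJSON output is a plain Python dictionary, so it can be traversed without any extra library. The sketch below is illustrative only: `sample_geojson` copies two features from the output above, trimmed to a few properties, and `list_places` is a hypothetical helper. Note that GeoJSON stores positions as `[longitude, latitude]`.

```python
# Two features copied from the output above, trimmed to a few properties
sample_geojson = {
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature',
         'geometry': {'type': 'Point', 'coordinates': [4.827286, 45.76405]},
         'properties': {'id': 'en.1', 'name': 'ville de Lyon',
                        'country': 'France', 'source': 'osm'}},
        {'type': 'Feature',
         'geometry': {'type': 'Point', 'coordinates': [6.128885, 45.899235]},
         'properties': {'id': 'en.5', 'name': 'Annecy',
                        'country': 'France', 'source': 'osm'}},
    ],
}

def list_places(feature_collection):
    """Return (name, latitude, longitude) tuples for every Point feature."""
    places = []
    for feature in feature_collection['features']:
        geometry = feature['geometry']
        if geometry['type'] != 'Point':
            continue  # this sketch ignores lines and polygons
        lng, lat = geometry['coordinates'][:2]  # GeoJSON order: longitude first
        places.append((feature['properties']['name'], lat, lng))
    return places

places = list_places(sample_geojson)
for name, lat, lng in places:
    print(f'{name}: lat={lat}, lng={lng}')
```

With a real document, `list_places(doc.geojson)` should work the same way, assuming the output keeps the structure shown above.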

### Get the list of named entities

In [8]:
for entity in doc.ne:
    print(f'entity: {entity.text}\ttag: {entity.tag}')
    if entity.tag == 'place':
        for t in entity.toponyms:
            print(f' latitude: {t.lat}\tlongitude: {t.lng}\tsource {t.source}')

entity: Lyon	tag: unknown
entity: Annecy	tag: place
 latitude: 6.128885	longitude: 45.899235	source osm
entity: Mont-Blanc	tag: place
 latitude: 6.865171	longitude: 45.832706	source osm


### Get the list of nested named entities

In [9]:
for nestedEntity in doc.nne:
    print(f'entity: {nestedEntity.text}\ttag: {nestedEntity.tag}')
    if nestedEntity.tag == 'place':
        for t in nestedEntity.toponyms:
            print(f' latitude: {t.lat}\tlongitude: {t.lng}\tsource {t.source}')

entity: Mont-Blanc	tag: place
 latitude: 4.827286	longitude: 45.76405	source osm


### Display a map (using the folium library)


[https://python-visualization.github.io/folium/](https://python-visualization.github.io/folium/)

In [10]:
m = doc.get_folium_map()
m

## Going deeper

### Geocoding settings

#### Choosing the gazetteers

In [None]:
# Enable (True) or disable (False) each gazetteer
sources = {'nominatim': True, 'geonames': True, 'ign': False, 'wikiG': False}

geoparser = Geoparser(lang='fr', sources=sources)
doc = geoparser('Je visite la ville de Lyon, Annecy et Chamonix.')

#### Choosing the maximum number of matches from the gazetteer

In [None]:
geoparser = Geoparser(max_records = 3, sources = sources)
doc = geoparser('Je visite la ville de Lyon, Annecy et Chamonix.')


#### Defining a bounding box

In [None]:
geoparser = Geoparser(lang = 'fr', version = 'Encyclopedie', max_records = 3, sources = sources)
doc = geoparser('Je visite la ville de Lyon, Annecy et Chamonix.')


### Geotagging settings

#### Geoparsing encyclopedia articles

[https://geode-project.github.io](https://geode-project.github.io)

In [None]:
geoparser = Geoparser(version = 'Encyclopedie')
doc = geoparser('Je visite la ville de Lyon, Annecy et Chamonix.')
