# <span style="color:purple"> Information extraction: Addresses </span>

## AddressTagger

**AddressTagger** tags addresses in a **Text** object and extracts the street name (TÄNAV), house number (MAJA), town or settlement (ASULA), county (MAAKOND), and zip code (INDEKS) of each address.

## Usage

The easiest way to use **AddressTagger** is via the default resolver, which runs all the necessary preprocessing:

In [1]:
from estnltk import Text

text = Text('Muuseumi postiaadress on Muuseumi tee 2, Tartu 60532.')

text.tag_layer('addresses')

text.addresses

layer name,attributes,parent,enveloping,ambiguous,span count
addresses,"grammar_symbol, TÄNAV, MAJA, ASULA, MAAKOND, INDEKS",,address_parts,True,1

text,grammar_symbol,TÄNAV,MAJA,ASULA,MAAKOND,INDEKS
"['Muuseumi tee', '2', 'Tartu', '60532']",ADDRESS,Muuseumi tee,2,Tartu,,60532

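The attribute names in the output above are Estonian. As a small convenience, a record like this can be converted to English keys. The following is a minimal sketch that assumes the row has been copied by hand into a plain dict; `ATTRIBUTE_TRANSLATIONS` and `translate_address` are hypothetical names for illustration and are not part of EstNLTK.

```python
# Hypothetical helper, not part of EstNLTK: maps the Estonian attribute
# names of the addresses layer to English keys.
ATTRIBUTE_TRANSLATIONS = {
    'TÄNAV': 'street',
    'MAJA': 'house',
    'ASULA': 'settlement',
    'MAAKOND': 'county',
    'INDEKS': 'zip_code',
}


def translate_address(record):
    """Return a copy of record with English keys, dropping empty values."""
    return {
        ATTRIBUTE_TRANSLATIONS[key]: value
        for key, value in record.items()
        if key in ATTRIBUTE_TRANSLATIONS and value
    }


# Values copied by hand from the output table above (MAAKOND is empty there).
record = {'TÄNAV': 'Muuseumi tee', 'MAJA': '2', 'ASULA': 'Tartu',
          'MAAKOND': None, 'INDEKS': '60532'}
print(translate_address(record))
```

Empty attributes such as MAAKOND are dropped here; keeping them with a `None` value would be an equally reasonable choice.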

## Usage as a tagger

In [2]:
from estnltk.taggers import AddressPartTagger, AddressGrammarTagger
from estnltk import Text

Tagging addresses takes two steps: first, **AddressPartTagger** tags the individual address parts; then **AddressGrammarTagger** joins the parts into addresses.

In [3]:
part_tagger = AddressPartTagger()

In [4]:
grammar_tagger = AddressGrammarTagger()

Before addresses can be tagged, the text must be segmented into words:

In [5]:
t = Text('Rävala 5, Tallinn').tag_layer('words')

In [6]:
part_tagger.tag(t)

text
"Rävala 5, Tallinn"

layer name,attributes,parent,enveloping,ambiguous,span count
tokens,,,,False,4
compound_tokens,"type, normalized",,tokens,False,0
words,normalized_form,,,True,4
address_parts,"grammar_symbol, type",,,True,3


In [7]:
grammar_tagger.tag(t)

text
"Rävala 5, Tallinn"

layer name,attributes,parent,enveloping,ambiguous,span count
tokens,,,,False,4
compound_tokens,"type, normalized",,tokens,False,0
words,normalized_form,,,True,4
address_parts,"grammar_symbol, type",,,True,3
addresses,"grammar_symbol, TÄNAV, MAJA, ASULA, MAAKOND, INDEKS",,address_parts,True,1


In [8]:
t.addresses

layer name,attributes,parent,enveloping,ambiguous,span count
addresses,"grammar_symbol, TÄNAV, MAJA, ASULA, MAAKOND, INDEKS",,address_parts,True,1

text,grammar_symbol,TÄNAV,MAJA,ASULA,MAAKOND,INDEKS
"['Rävala', '5', 'Tallinn']",ADDRESS,Rävala,5,Tallinn,,


In [9]:
t.TÄNAV

,TÄNAV
0,Rävala


In [10]:
t.MAJA

,MAJA
0,5


In [11]:
t.ASULA

,ASULA
0,Tallinn
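
To make the two-step idea (tag the parts, then join them with a grammar) more concrete, here is a toy pure-Python sketch. It is not how **AddressPartTagger** or **AddressGrammarTagger** are implemented: the tiny gazetteers, the single accepted pattern, and the function names `tag_parts` and `join_parts` are all assumptions made for illustration.

```python
import re

# Toy gazetteers (assumed for illustration); a real tagger needs far
# larger vocabularies and handles multi-word street names.
STREETS = {'Rävala'}
TOWNS = {'Tallinn', 'Tartu'}


def tag_parts(text):
    """Step 1: label individual tokens as (symbol, value) address parts."""
    parts = []
    for token in re.findall(r'\w+', text):
        if token in STREETS:
            parts.append(('TÄNAV', token))
        elif token in TOWNS:
            parts.append(('ASULA', token))
        elif re.fullmatch(r'\d{5}', token):
            parts.append(('INDEKS', token))
        elif re.fullmatch(r'\d+', token):
            parts.append(('MAJA', token))
    return parts


def join_parts(parts):
    """Step 2: accept only the sequence TÄNAV MAJA ASULA [INDEKS]."""
    symbols = [symbol for symbol, _ in parts]
    if symbols[:3] != ['TÄNAV', 'MAJA', 'ASULA']:
        return None
    if symbols[3:] not in ([], ['INDEKS']):
        return None
    return dict(parts)


parts = tag_parts('Rävala 5, Tallinn')
print(parts)              # three parts: street, house number, town
print(join_parts(parts))  # the parts joined into one address record
```

Separating the steps keeps the part recognition (vocabularies, number patterns) independent of the rules that decide which sequences of parts form a valid address.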
