# Extracting and Storing Addresses

This tutorial demonstrates how to extract addresses from text and store the results in Postgres using the `PostgresStorage` class.

In [1]:
from estnltk import Text
from estnltk.taggers import AddressPartTagger, AddressGrammarTagger
from estnltk.core import rel_path
from estnltk.storage.postgres import PostgresStorage, JsonbTextQuery, JsonbLayerQuery

In this tutorial we are going to use the following small toy dataset:

In [2]:
text_corpus = [
    'Kontor asub aadressil Rävala 5, Tallinn.',
    'Salong asub uuel aadressil, üle tee asuvas Rävala pst 7 hoones',
    'Korterite müük: Gonsiori tn 36, Tallinn'
]

First, let's save our dataset to the database:

In [3]:
storage = PostgresStorage(pgpass_file=rel_path('storage/postgres/.pgpass'),
                          schema="grammarextractor")
collection = storage.get_collection("texts_with_addresses")
collection.create()

for key, text in enumerate(text_corpus):
    collection.insert(Text(text).tag_layer(['words']), key=key)
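
The `pgpass_file` argument above points to a file holding the connection credentials. As a rough illustration, and assuming that file follows the standard PostgreSQL password file layout (`hostname:port:database:username:password`, one entry per line), a line could be read as sketched below. The helper `parse_pgpass_line` and the sample credentials are hypothetical and not part of EstNLTK.

```python
from typing import NamedTuple, Optional


class PgpassEntry(NamedTuple):
    host: str
    port: str
    dbname: str
    user: str
    password: str


def parse_pgpass_line(line: str) -> Optional[PgpassEntry]:
    """Parse one line of a PostgreSQL-style password file.

    Returns None for blank lines and comments. Simplification: assumes
    no field contains an escaped colon, which the full format allows.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {len(fields)}")
    return PgpassEntry(*fields)


# Hypothetical credentials, for illustration only.
entry = parse_pgpass_line("localhost:5432:estnltk_db:tutorial_user:secret")
print(entry.host, entry.dbname)
```

Keeping credentials in such a file, rather than in the notebook itself, means the tutorial code can be shared without exposing a password.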

Next, we extract addresses and save them in a separate layer:

In [4]:
address_part_tagger = AddressPartTagger(output_layer='address_parts')
address_tagger = AddressGrammarTagger(output_layer='address_layer')

collection.create_layer("address_parts",
                        callable=lambda t: address_part_tagger.tag(t)["address_parts"])
collection.create_layer("address_layer", layers=["address_parts"],
                        callable=lambda t: address_tagger.tag(t)["address_layer"])
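
The second `create_layer` call names `address_parts` in its `layers` argument, which suggests that the address layer is built on top of the address parts and so has to be created after them. The toy model below sketches that ordering with plain dictionaries. It is not how the collection is implemented; `create_layer`, `tag_parts` and `tag_addresses` here are hypothetical stand-ins written for this illustration.

```python
from typing import Callable, Dict, List, Sequence

# A toy document: a mapping from layer name to layer content.
Document = Dict[str, object]


def create_layer(docs: List[Document], name: str,
                 func: Callable[[Document], object],
                 requires: Sequence[str] = ()) -> None:
    """Apply func to every document and store the result under name."""
    for doc in docs:
        missing = [layer for layer in requires if layer not in doc]
        if missing:
            raise KeyError(f"layer {name!r} needs missing layers: {missing}")
        doc[name] = func(doc)


# Hypothetical stand-ins for the two taggers.
def tag_parts(doc: Document) -> List[str]:
    # Keep every word except the comma.
    return [w for w in doc["words"] if w != ","]


def tag_addresses(doc: Document) -> List[str]:
    # Join the parts into a single address string.
    return [" ".join(doc["address_parts"])]


docs: List[Document] = [{"words": ["Rävala", "5", ",", "Tallinn"]}]
create_layer(docs, "address_parts", tag_parts, requires=["words"])
create_layer(docs, "address_layer", tag_addresses, requires=["address_parts"])
print(docs[0]["address_layer"])
```

In this model, swapping the two `create_layer` calls raises a `KeyError`, because the layer that `tag_addresses` reads does not exist yet.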

Let's now load one text object and see what's inside:

In [5]:
key, text = next(collection.select(layers=["address_parts", "address_layer"]))
text

text
"Kontor asub aadressil Rävala 5, Tallinn."

| layer name | attributes | parent | enveloping | ambiguous | span count |
|---|---|---|---|---|---|
| tokens | | | | False | 8 |
| words | normalized_form | | | False | 8 |
| address_layer | grammar_symbol, TÄNAV, MAJA, ASULA, MAAKOND, INDEKS | | address_parts | False | 1 |
| address_parts | grammar_symbol, type | | | True | 4 |


As we can see, `address_layer` has the attributes TÄNAV (street), MAJA (house), ASULA (settlement), MAAKOND (county) and INDEKS (postal code), which can be used in searches. For example, we can search for records containing the street name 'Rävala' and the house number '5':

In [6]:
q = JsonbLayerQuery(layer_table=collection.layer_name_to_table_name("address_layer"),
                    TÄNAV='Rävala', MAJA='5', ambiguous=False)
for key, text in collection.select(layer_query={'address_layer': q}):
    print(text)

Text(text="Kontor asub aadressil Rävala 5, Tallinn.")
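
To make the matching rule concrete, the sketch below mimics the query above in memory: a record is selected when at least one of its address spans carries every requested attribute value exactly. This is only an illustration of the semantics, not the SQL that `JsonbLayerQuery` generates. The function `select_by_attributes` is hypothetical, and the attribute values are hand-written stand-ins rather than real tagger output.

```python
from typing import Dict, List

# Hand-written stand-ins for address spans; NOT real tagger output.
records: Dict[int, List[Dict[str, str]]] = {
    0: [{"TÄNAV": "Rävala", "MAJA": "5", "ASULA": "Tallinn"}],
    1: [{"TÄNAV": "Rävala pst", "MAJA": "7"}],
    2: [{"TÄNAV": "Gonsiori tn", "MAJA": "36", "ASULA": "Tallinn"}],
}


def select_by_attributes(records: Dict[int, List[Dict[str, str]]],
                         **wanted: str) -> List[int]:
    """Return keys of records with a span matching every given attribute."""
    return [
        key
        for key, spans in records.items()
        if any(all(span.get(attr) == value for attr, value in wanted.items())
               for span in spans)
    ]


matches = select_by_attributes(records, TÄNAV="Rävala", MAJA="5")
print(matches)
```

Because the comparison is exact, the stand-in value 'Rävala pst' does not match 'Rävala', so only the first record is selected.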


Alternatively, we can use the `find_fingerprint` method, here searching for the street name 'Gonsiori tn':

In [7]:
q = {"field": "TÄNAV", "query": ["Gonsiori tn"], "ambiguous": False}
for key, text in collection.find_fingerprint(layer_query={"address_layer": q}):
    print(text)

Text(text="Korterite müük: Gonsiori tn 36, Tallinn")


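The variable `text` still refers to the last object returned by the loop above. Inspecting it shows that it was loaded without the address layers:

In [8]: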
text

text
"Korterite müük: Gonsiori tn 36, Tallinn"

| layer name | attributes | parent | enveloping | ambiguous | span count |
|---|---|---|---|---|---|
| tokens | | | | False | 8 |
| compound_tokens | type, normalized | | tokens | False | 1 |
| words | normalized_form | | | False | 8 |
