# Storing Text objects in Postgres

This tutorial demonstrates how to store and query EstNLTK `Text` objects in a PostgreSQL database.

In [2]:
from estnltk import Text
from estnltk.storage.postgres import PostgresStorage, JsonbQuery as Q
from estnltk.taggers import VabamorfTagger

Connect to an existing PostgreSQL database:

In [3]:
storage = PostgresStorage(pgpass_file=r"C:\Users\distorti\projects\ut\estnltk\estnltk\storage\postgres\.pgpass")
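
A `.pgpass` file uses the standard PostgreSQL password-file format, with one connection per line: `hostname:port:database:username:password`. How `PostgresStorage` reads the file internally is not shown in this tutorial, so the sketch below only illustrates the file format. The helper `parse_pgpass_line` and all values are hypothetical placeholders.

```python
from typing import NamedTuple


class PgpassEntry(NamedTuple):
    host: str
    port: str
    database: str
    user: str
    password: str


def parse_pgpass_line(line: str) -> PgpassEntry:
    """Split one .pgpass line into its five colon-separated fields.

    Simplification: backslash-escaped colons inside a field are not handled.
    """
    fields = line.strip().split(":")
    if len(fields) != 5:
        raise ValueError("expected host:port:database:user:password")
    return PgpassEntry(*fields)


# Placeholder values for illustration only.
entry = parse_pgpass_line("localhost:5432:estnltk_db:alice:secret\n")
print(entry.host, entry.port, entry.database, entry.user)
```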

In [4]:
storage.drop_table_if_exists('my_collection')
storage.drop_table_if_exists('collection_with_layers')

## Collections

A collection stores `Text` objects and provides a read/write API.

Create a new collection:

In [5]:
collection = storage.get_collection("my_collection")
collection.create()

Add some data:

In [6]:
text1 = Text('ööbik laulab.').tag_layer(['morph_analysis'])
key1 = collection.insert(text1)
print(key1, text1)

text2 = Text('öökull ei laula.').tag_layer(['morph_analysis'])
key2 = collection.insert(text2, key=7)
print(key2, text2)

1 Text(text="ööbik laulab.")
7 Text(text="öökull ei laula.")


Iterate over collection:

In [7]:
for key, text in collection.select():
    print(key, text)

1 Text(text="ööbik laulab.")
7 Text(text="öökull ei laula.")


Search for a particular entry by key:

In [8]:
txt = collection.select_by_key(7)
print(txt)

Text(text="öökull ei laula.")


Search using layer attributes:

In [9]:
q = Q('morph_analysis', lemma='laulma')
for key, txt in collection.select(query=q):
    print(key, txt)

1 Text(text="ööbik laulab.")
7 Text(text="öökull ei laula.")


Search using multiple layer attributes:

In [10]:
q = Q('morph_analysis', lemma='laulma', form='b')
for key, txt in collection.select(query=q):
    print(key, txt)

1 Text(text="ööbik laulab.")


Search using an "OR" query:

In [11]:
q = Q('morph_analysis', lemma='ööbik') | Q('morph_analysis', lemma='öökull')
for key, txt in collection.select(query=q):
    print(key, txt)

1 Text(text="ööbik laulab.")
7 Text(text="öökull ei laula.")


Search using an "AND" query (no stored text contains both lemmas, so nothing is printed):

In [12]:
q = Q('morph_analysis', lemma='ööbik') & Q('morph_analysis', lemma='öökull')
for key, txt in collection.select(query=q):
    print(key, txt)
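
The difference between the "OR" and "AND" queries can be reproduced without a database. The sketch below is only a model of the matching logic, not the way `JsonbQuery` is implemented: the lemma sets are written by hand as an assumption about what the `morph_analysis` layer contains, and `keys_with_any` and `keys_with_all` are hypothetical helpers.

```python
# Hand-written lemma sets standing in for the two stored texts (an assumption
# for illustration; real lemmas come from the morph_analysis layer).
lemmas = {
    1: {"ööbik", "laulma"},
    7: {"öökull", "ei", "laulma"},
}


def keys_with_any(wanted):
    """OR: keys of texts containing at least one wanted lemma."""
    return sorted(k for k, found in lemmas.items() if found & wanted)


def keys_with_all(wanted):
    """AND: keys of texts containing every wanted lemma."""
    return sorted(k for k, found in lemmas.items() if wanted <= found)


print(keys_with_any({"ööbik", "öökull"}))  # [1, 7]
print(keys_with_all({"ööbik", "öökull"}))  # []
```

Neither text contains both `ööbik` and `öökull`, which is why the "AND" query returns no rows.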

Search using a composite query:

In [13]:
q = (Q('morph_analysis', lemma='ööbik') | Q('morph_analysis', lemma='öökull')) & Q('morph_analysis', lemma='laulma')
for key, txt in collection.select(query=q):
    print(key, txt)

1 Text(text="ööbik laulab.")
7 Text(text="öökull ei laula.")


Alternatively, use the convenience method `find_fingerprint`:

In [14]:
for key, txt in collection.find_fingerprint(layer="morph_analysis", 
                                            field="lemma", 
                                            query_list=[{'ööbik', 'laulma'}, {'öökull', 'laulma'}],
                                            ambiguous=True, 
                                            order_by_key=False):
    print(key, txt)

1 Text(text="ööbik laulab.")
7 Text(text="öökull ei laula.")
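
Judging from the identical results of the composite query and the `find_fingerprint` call, `query_list` appears to be read as an "OR" over sets, where every lemma inside one set must be present ("AND"). The sketch below models that reading in plain Python. It is an assumption about the semantics, not the library's code, and the lemma sets (including the extra text with key 9) are hypothetical.

```python
def matches_fingerprint(lemmas, query_list):
    """True if all lemmas of at least one set in query_list are present."""
    return any(required <= lemmas for required in query_list)


# Hand-written lemma sets standing in for stored texts (illustrative only).
texts = {
    1: {"ööbik", "laulma"},
    7: {"öökull", "ei", "laulma"},
    9: {"ööbik", "magama"},  # hypothetical extra text that should not match
}
query_list = [{"ööbik", "laulma"}, {"öökull", "laulma"}]

matching_keys = [key for key, lemmas in texts.items()
                 if matches_fingerprint(lemmas, query_list)]
print(matching_keys)  # [1, 7]
```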


Delete the collection:

In [15]:
collection.delete()

## Working with layers

Let's say you want to create a collection which stores only layers up to "sentences":

In [21]:
collection = storage.get_collection("collection_with_layers")
collection.create()

collection.insert(Text('see on esimene lause').tag_layer(["sentences"]))
collection.insert(Text('see on teine lause').tag_layer(["sentences"]));

Check what layers are present:

In [22]:
for key, text in collection.select():
    print(key, text, text.layers.keys())

1 Text(text="see on esimene lause") dict_keys(['sentences', 'words', 'compound_tokens', 'tokens'])
2 Text(text="see on teine lause") dict_keys(['sentences', 'words', 'compound_tokens', 'tokens'])


Now suppose you want to add a new layer, "my_layer", that stores morphological analysis, but keep it in a separate table. For this purpose the collection object has a `create_layer` method:

In [23]:
layer = "my_layer"
tagger = VabamorfTagger(disambiguate=False, layer_name=layer)
collection.create_layer(layer, callable=lambda t: tagger.tag(t, return_layer=True))
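
The idea of a layer kept in its own table can be pictured with a toy in-memory model. This is only a sketch under loud assumptions: the dictionaries, the `create_layer` and `select` functions, and the whitespace "tagger" are hypothetical stand-ins and do not reflect how EstNLTK or PostgreSQL store the data.

```python
# Toy in-memory model (an assumption, not the estnltk implementation):
# one dict plays the role of the collection table, another the layer table.
collection_table = {1: "see on esimene lause", 2: "see on teine lause"}
layer_tables = {}


def create_layer(name, tagger):
    """Run `tagger` on every stored text and keep results in a separate table."""
    layer_tables[name] = {key: tagger(text) for key, text in collection_table.items()}


def select(layers=()):
    """Yield (key, text, attached_layers), joining layer tables by key."""
    for key, text in collection_table.items():
        yield key, text, {name: layer_tables[name][key] for name in layers}


# Hypothetical stand-in for a tagger: splits the text on whitespace.
create_layer("my_layer", lambda text: text.split())

for key, text, attached in select(layers=["my_layer"]):
    print(key, text, attached)
```

In this model the stored texts are never rewritten: the new layer lives next to them and is joined in by key only when it is requested.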

Make sure the new layer has been created:

In [24]:
collection.get_layer_names()

['my_layer']

Retrieve the new layer using the `select` method:

In [25]:
for key, text in collection.select(layers=['my_layer']):
    print(key, text, text.layers.keys())

1 Text(text="see on esimene lause") dict_keys(['sentences', 'words', 'compound_tokens', 'my_layer', 'tokens'])
2 Text(text="see on teine lause") dict_keys(['sentences', 'words', 'compound_tokens', 'my_layer', 'tokens'])


Delete the layer:

In [26]:
collection.delete_layer("my_layer")

Delete the collection:

In [27]:
collection.delete()

Close the database connection:

In [28]:
storage.close()