# Web-taggers

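A web tagger sends the text to a remote service for analysis, so it needs access to a running [EstNLTK web-tagger](https://github.com/estnltk/webtagger-service) service. Services for some tools are hosted on the TartuNLP server (`api.tartunlp.ai`), which the examples in the first part of this tutorial use.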

In [1]:
from estnltk import Text

## Currently available web-taggers

## BertEmbeddingsWebTagger

Tags [BERT embeddings](https://huggingface.co/tartuNLP/EstBERT) using EstNLTK's [BertTagger](https://github.com/estnltk/estnltk/blob/main/tutorials/nlp_pipeline/E_embeddings/bert_embeddings_tagger.ipynb).

In [2]:
from estnltk.taggers import BertEmbeddingsWebTagger
bert_embeddings_web_tagger = \
    BertEmbeddingsWebTagger(url='https://api.tartunlp.ai/estnltk/tagger/bert')
bert_embeddings_web_tagger

name,output layer,output attributes,input layers
BertEmbeddingsWebTagger,bert_embeddings,"('token', 'bert_embedding')","('words', 'sentences')"

0,1
url,https://api.tartunlp.ai/estnltk/tagger/bert
batch_layer,words
batch_max_size,125
batch_enveloping_layer,sentences
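
The configuration above lists `batch_layer`, `batch_max_size` and `batch_enveloping_layer`. One plausible reading, offered here as an assumption and not as a description of EstNLTK's actual implementation, is that a long text is sent to the service in chunks of at most 125 words and that chunks are cut at sentence boundaries. The pure-Python sketch below illustrates that idea; `split_into_batches` is a hypothetical helper and the toy sentences are made up.

```python
from typing import List


def split_into_batches(sentences: List[List[str]],
                       batch_max_size: int = 125) -> List[List[List[str]]]:
    """Group whole sentences into batches of at most batch_max_size words.

    Hypothetical helper: it only illustrates sentence-aligned batching and
    is not taken from EstNLTK. A sentence longer than the limit is placed
    in a batch of its own instead of being split.
    """
    batches, current, current_size = [], [], 0
    for sentence in sentences:
        # Close the current batch if adding this sentence would exceed the limit
        if current and current_size + len(sentence) > batch_max_size:
            batches.append(current)
            current, current_size = [], 0
        current.append(sentence)
        current_size += len(sentence)
    if current:
        batches.append(current)
    return batches


# Three toy sentences of 3, 2 and 4 words, batched with a small limit of 5 words
toy_sentences = [['See', 'on', 'lause'],
                 ['Teine', 'lause'],
                 ['Ja', 'veel', 'üks', 'lause']]
batches = split_into_batches(toy_sentences, batch_max_size=5)
print([sum(len(s) for s in batch) for batch in batches])
```

With this reading, no sentence is ever divided between two requests, which matters for taggers that take `sentences` as an input layer.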


In [3]:
bert_embeddings_web_tagger.about

'Tags BERT embeddings using EstNLTK 1.6.7beta webservice.'

In [4]:
bert_embeddings_web_tagger.status

'OK'

In [5]:
bert_embeddings_web_tagger.is_alive

True

In [6]:
# Create a text and add required layers
text = Text('See on lause.')
text.tag_layer('sentences')

text
See on lause.

layer name,attributes,parent,enveloping,ambiguous,span count
sentences,,,words,False,1
tokens,,,,False,4
compound_tokens,"type, normalized",,tokens,False,0
words,normalized_form,,,True,4


In [7]:
# NBVAL_IGNORE_OUTPUT
# Add bert embeddings
bert_embeddings_web_tagger.tag(text)
text.bert_embeddings

layer name,attributes,parent,enveloping,ambiguous,span count
bert_embeddings,"token, bert_embedding",,,True,4

text,token,bert_embedding
See,see,"[0.2557530701160431, -0.05085883289575577, -0.16551020741462708, 0.2314713299274 ..., type: <class 'list'>, length: 3072"
on,on,"[0.2487327605485916, 0.10666818171739578, 0.05178200453519821, 0.398448169231414 ..., type: <class 'list'>, length: 3072"
lause,lause,"[-1.942150592803955, 0.15984198451042175, -0.4069922864437103, -0.43852004408836 ..., type: <class 'list'>, length: 3072"
.,.,"[-0.051411714404821396, -0.10687203705310822, -0.009526951238512993, -0.07030776 ..., type: <class 'list'>, length: 3072"


Note: `BertTagger` uses a tokenization that differs from EstNLTK's default tokenization. For details, see the [BertTagger tutorial](https://github.com/estnltk/estnltk/blob/main/tutorials/nlp_pipeline/E_embeddings/bert_embeddings_tagger.ipynb).
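
The output above shows each `bert_embedding` as a plain Python list of floats (length 3072 in this run), so two vectors can be compared without extra libraries. The following is a minimal sketch: `cosine_similarity` is a hypothetical helper, and the short vectors are made-up stand-ins for real embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError('vectors must have the same length')
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError('cosine similarity is undefined for a zero vector')
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional vectors standing in for 3072-dimensional embeddings
vec_a = [1.0, 0.0, 2.0]
vec_b = [2.0, 0.0, 4.0]   # same direction as vec_a
vec_c = [0.0, 3.0, 0.0]   # orthogonal to vec_a

print(cosine_similarity(vec_a, vec_b))
print(cosine_similarity(vec_a, vec_c))
```

With a tagged text, the vectors would come from the spans of `text.bert_embeddings`. Because that layer is ambiguous, how a single vector is read from a span is deliberately left out of this sketch.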

## StanzaSyntaxWebTagger

Adds dependency syntactic analysis using the web service of EstNLTK's `StanzaSyntaxTagger`.

In [8]:
from estnltk.taggers import StanzaSyntaxWebTagger
stanza_syntax_web_tagger = \
    StanzaSyntaxWebTagger(url='https://api.tartunlp.ai/estnltk/tagger/stanza_syntax')
stanza_syntax_web_tagger

name,output layer,output attributes,input layers
StanzaSyntaxWebTagger,stanza_syntax,"('id', 'lemma', 'upostag', 'xpostag', 'feats', 'head', 'deprel', 'deps', 'misc')","('words', 'sentences', 'morph_extended')"

0,1
url,https://api.tartunlp.ai/estnltk/tagger/stanza_syntax
batch_layer,words
batch_max_size,125
batch_enveloping_layer,sentences


In [9]:
# Create Text and add required input layers
from estnltk import Text
text = Text('Ilus suur karvane kass nurrus rohelisel diivanil.').tag_layer('morph_extended')
# Tag syntax with web_tagger
stanza_syntax_web_tagger.tag( text )
text.stanza_syntax

layer name,attributes,parent,enveloping,ambiguous,span count
stanza_syntax,"id, lemma, upostag, xpostag, feats, head, deprel, deps, misc",morph_extended,,False,8

text,id,lemma,upostag,xpostag,feats,head,deprel,deps,misc
Ilus,1,ilus,A,A,"{'nom': 'nom', 'pos': 'pos', 'sg': 'sg'}",4,amod,_,_
suur,2,suur,A,A,"{'nom': 'nom', 'pos': 'pos', 'sg': 'sg'}",4,amod,_,_
karvane,3,karvane,A,A,"{'nom': 'nom', 'pos': 'pos', 'sg': 'sg'}",4,amod,_,_
kass,4,kass,S,S,"{'com': 'com', 'nom': 'nom', 'sg': 'sg'}",5,nsubj,_,_
nurrus,5,nurruma,V,V,"{'af': 'af', 'aux': 'aux', 'impf': 'impf', 'indic': 'indic', 'ps': 'ps', 'ps3': 'ps3', 'sg': 'sg'}",0,root,_,_
rohelisel,6,roheline,A,A,"{'ad': 'ad', 'pos': 'pos', 'sg': 'sg'}",7,amod,_,_
diivanil,7,diivan,S,S,"{'ad': 'ad', 'com': 'com', 'sg': 'sg'}",5,obl,_,_
.,8,.,Z,Z,{},5,punct,_,_
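
In the output above, `head` holds the `id` of each word's syntactic parent, and the root word (`nurrus`, with `deprel` equal to `root`) has `head` 0. The sketch below rebuilds the tree from those columns in plain Python. The rows are copied by hand from the table; reading them directly from the layer is not shown.

```python
from collections import defaultdict

# (id, word, head, deprel) copied from the output table above
rows = [
    (1, 'Ilus', 4, 'amod'),
    (2, 'suur', 4, 'amod'),
    (3, 'karvane', 4, 'amod'),
    (4, 'kass', 5, 'nsubj'),
    (5, 'nurrus', 0, 'root'),
    (6, 'rohelisel', 7, 'amod'),
    (7, 'diivanil', 5, 'obl'),
    (8, '.', 5, 'punct'),
]

# Look up a word by its id
words = {word_id: word for word_id, word, _, _ in rows}

# Map each head id to its dependents; head 0 marks the root of the sentence
children = defaultdict(list)
for word_id, word, head, deprel in rows:
    children[head].append((word, deprel))

root_word = children[0][0][0]
print('root:', root_word)
for head_id in sorted(h for h in children if h != 0):
    print(words[head_id], '->', children[head_id])
```

Grouping by `head` makes it easy to see, for example, that all three adjectives at the start of the sentence attach to `kass`.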


## StanzaSyntaxEnsembleWebTagger

Adds dependency syntactic analysis using the web service of EstNLTK's `StanzaSyntaxEnsembleTagger`.

In [10]:
from estnltk.taggers import StanzaSyntaxEnsembleWebTagger
stanza_syntax_ensemble_web_tagger = \
    StanzaSyntaxEnsembleWebTagger(url='https://api.tartunlp.ai/estnltk/tagger/stanza_syntax_ensemble')
stanza_syntax_ensemble_web_tagger

name,output layer,output attributes,input layers
StanzaSyntaxEnsembleWebTagger,stanza_ensemble_syntax,"('id', 'lemma', 'upostag', 'xpostag', 'feats', 'head', 'deprel', 'deps', 'misc')","('words', 'sentences', 'morph_extended')"

0,1
url,https://api.tartunlp.ai/estnltk/tagger/stanza_syntax_ensemble
batch_layer,words
batch_max_size,125
batch_enveloping_layer,sentences


In [11]:
# Create Text and add required input layers
from estnltk import Text
text = Text('Ilus suur karvane kass nurrus rohelisel diivanil.').tag_layer('morph_extended')
# Tag syntax with web_tagger
stanza_syntax_ensemble_web_tagger.tag( text )
text.stanza_ensemble_syntax

layer name,attributes,parent,enveloping,ambiguous,span count
stanza_ensemble_syntax,"id, lemma, upostag, xpostag, feats, head, deprel, deps, misc",morph_extended,,False,8

text,id,lemma,upostag,xpostag,feats,head,deprel,deps,misc
Ilus,1,ilus,A,A,"{'nom': 'nom', 'pos': 'pos', 'sg': 'sg'}",4,amod,_,_
suur,2,suur,A,A,"{'nom': 'nom', 'pos': 'pos', 'sg': 'sg'}",4,amod,_,_
karvane,3,karvane,A,A,"{'nom': 'nom', 'pos': 'pos', 'sg': 'sg'}",4,amod,_,_
kass,4,kass,S,S,"{'com': 'com', 'nom': 'nom', 'sg': 'sg'}",5,nsubj,_,_
nurrus,5,nurruma,V,V,"{'af': 'af', 'impf': 'impf', 'indic': 'indic', 'mod': 'mod', 'ps': 'ps', 'ps3': 'ps3', 'sg': 'sg'}",0,root,_,_
rohelisel,6,roheline,A,A,"{'ad': 'ad', 'pos': 'pos', 'sg': 'sg'}",7,amod,_,_
diivanil,7,diivan,S,S,"{'ad': 'ad', 'com': 'com', 'sg': 'sg'}",5,obl,_,_
.,8,.,Z,Z,{},5,punct,_,_


## NeuralMorphDisambWebTagger

`NeuralMorphDisambWebTagger` takes Vabamorf's analyses as input and predicts morphological tags (`partofspeech` and `form`) with better accuracy than Vabamorf. See also the [documentation](../../nlp_pipeline/B_morphology/08_neural_morph_tagger_py37.ipynb).

In [12]:
from estnltk.web_taggers import NeuralMorphDisambWebTagger

neural_morph_tagger_web_tagger = NeuralMorphDisambWebTagger(
    url='https://api.tartunlp.ai/estnltk/tagger/neural_morph_disamb',
    output_layer='neural_morph_disamb')
neural_morph_tagger_web_tagger

name,output layer,output attributes,input layers
NeuralMorphDisambWebTagger,neural_morph_disamb,"('morphtag', 'pos', 'form')","('words', 'sentences', 'morph_analysis')"

0,1
url,https://api.tartunlp.ai/estnltk/tagger/neural_morph_disamb
batch_layer,words
batch_max_size,125
batch_enveloping_layer,sentences


In [13]:
from estnltk import Text

# Create Text and add required input layers
text = Text('Kiirelt võetud pangalaen on kärmelt kulunud.')
text.tag_layer('morph_analysis')

# Tag morphology with the web tagger
# Result is a disambiguated morph layer
neural_morph_tagger_web_tagger.tag(text)
text['neural_morph_disamb']

layer name,attributes,parent,enveloping,ambiguous,span count
neural_morph_disamb,"morphtag, pos, form",words,,False,7

text,morphtag,pos,form
Kiirelt,POS=S|NOUN_TYPE=com|NUMBER=sg|CASE=abl,S,sg abl
võetud,POS=A|DEGREE=pos,A,
pangalaen,POS=S|NOUN_TYPE=com|NUMBER=sg|CASE=nom,S,sg n
on,POS=V|VERB_TYPE=aux|MOOD=indic|TENSE=pres|PERSON=ps3|NUMBER=sg|VERB_PS=ps|VERB_POLARITY=af,V,b
kärmelt,POS=D,D,
kulunud,POS=V|VERB_TYPE=main|VERB_FORM=partic|TENSE=past|VERB_PS=ps,V,nud
.,POS=Z|PUNCT_TYPE=Fst,Z,
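
The `morphtag` values in the output above pack several features into one string of `KEY=value` pairs separated by `|`. A minimal sketch of unpacking such a string into a dictionary follows; `parse_morphtag` is a hypothetical helper, not part of EstNLTK, and the example string is copied from the row for `on`.

```python
def parse_morphtag(morphtag: str) -> dict:
    """Split a 'KEY=value|KEY=value' string into a dictionary."""
    features = {}
    for pair in morphtag.split('|'):
        # partition keeps working even if a value is missing after '='
        key, _, value = pair.partition('=')
        features[key] = value
    return features


# morphtag of the word 'on', copied from the output table above
tag = ('POS=V|VERB_TYPE=aux|MOOD=indic|TENSE=pres|PERSON=ps3|'
       'NUMBER=sg|VERB_PS=ps|VERB_POLARITY=af')
features = parse_morphtag(tag)
print(features['POS'], features['TENSE'], features['PERSON'])
```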


Note: `NeuralMorphDisambWebTagger` can also be used as a disambiguator of the `morph_analysis` layer, see [documentation](../../nlp_pipeline/B_morphology/08_neural_morph_tagger_py37.ipynb) for details.

## CoreferenceV1WebTagger

`CoreferenceV1WebTagger` aims to find, for each pronoun, the mention it refers to. The result is a relation layer that links each `pronoun` to its `mention`.

In [14]:
from estnltk import Text
from estnltk.web_taggers import CoreferenceV1WebTagger

coref_web_tagger = CoreferenceV1WebTagger(url='https://api.tartunlp.ai/estnltk/tagger/coreference_v1')
coref_web_tagger

name,output layer,output span names,output attributes,input layers
CoreferenceV1WebTagger,coreference_v1,"('pronoun', 'mention')",(),()

0,1
url,https://api.tartunlp.ai/estnltk/tagger/coreference_v1
batch_layer,
batch_max_size,175000
batch_enveloping_layer,


In [15]:
text = Text('Ahto ütles, et tema ei tegele rahadega. Jah, ta tegeleb hoopis suurte plaanidega. Proovib vähendada.')
coref_web_tagger.tag( text )
text['coreference_v1']

layer name,span_names,attributes,ambiguous,relation count
coreference_v1,"pronoun, mention",,False,2

pronoun,mention
tema,Ahto
ta,Ahto


## NerWebTagger

Tags named entities (PER, LOC, ORG) using TartuNLP's NER web service:

In [16]:
from estnltk import Text
from estnltk.web_taggers import NerWebTagger

ner_web_tagger = NerWebTagger()
ner_web_tagger

name,output layers,output mapping,input layers
NerWebTagger,['webner'],{'webner': ['nertag']},"('words',)"

0,1
custom_words_layer,words
url,https://api.tartunlp.ai/bert/ner/v1
batch_size,4500


In [17]:
# Create Text and add required input layers
from estnltk import Text
text = Text(
    '30. juunil 1632. aastal kinnitas Rootsi kuningas Gustav II Adolf oma allkirjaga '
    "Nürnbergi all sõjalaagris Academia Dorpatensis'e asutamisüriku, mis lubab meie suurkooli ajalugu "
    "sellest ajast alustada ja auväärseks lugeda."
).tag_layer('words')
# Tag web named entities
ner_web_tagger.tag( text )
text['webner']

layer name,attributes,parent,enveloping,ambiguous,span count
webner,nertag,,words,False,4

text,nertag
['Rootsi'],LOC
"['Gustav', 'II', 'Adolf']",PER
['Nürnbergi'],LOC
"['Academia', ""Dorpatensis'e""]",ORG


## Other web-taggers

Before using the following web taggers, you first need to [create a hosting webservice](https://github.com/estnltk/webtagger-service).

## VabamorfWebTagger

See also documentation for `VabamorfTagger`.

In [2]:
from estnltk.taggers import VabamorfWebTagger
vabamorph_web_tagger = VabamorfWebTagger(url='http://127.0.0.1:5000/1.6.7beta/tag/morph_analysis')
vabamorph_web_tagger

name,output layer,output attributes,input layers
VabamorfWebTagger,morph_analysis,"('normalized_text', 'lemma', 'root', 'root_tokens', 'ending', 'clitic', 'form', 'partofspeech')","('words', 'sentences', 'compound_tokens')"

0,1
url,http://127.0.0.1:5000/1.6.7beta/tag/morph_analysis


In [3]:
vabamorph_web_tagger.about

'Tags morphological analysis using EstNLTK 1.6.7beta webservice.'

In [4]:
vabamorph_web_tagger.status

'OK'

In [5]:
vabamorph_web_tagger.is_alive

True
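
A locally hosted service may not always be running, so it can be useful to check `is_alive` before tagging. The sketch below wraps that check in a hypothetical helper, `tag_if_alive`, which relies only on the `is_alive` property and the `tag` method shown in this tutorial. It assumes that `is_alive` returns `False` instead of raising when the service cannot be reached. `TaggerStub` is a made-up stand-in so that the example runs without a service.

```python
def tag_if_alive(web_tagger, text):
    """Tag `text` only when the web service reports that it is reachable.

    Hypothetical helper, not part of EstNLTK.
    Returns True if tagging was attempted, False if it was skipped.
    """
    if not web_tagger.is_alive:
        return False
    web_tagger.tag(text)
    return True


class TaggerStub:
    """Made-up stand-in for a web tagger; records what it was asked to tag."""

    def __init__(self, alive):
        self.is_alive = alive
        self.tagged = []

    def tag(self, text):
        self.tagged.append(text)
        return text


online, offline = TaggerStub(alive=True), TaggerStub(alive=False)

# The offline stub is never asked to tag anything
print(tag_if_alive(offline, 'See on lause.'))
```

With a real tagger, the call would look like `tag_if_alive(vabamorph_web_tagger, text)`.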

In [6]:
text = Text('See on lause.')
text.tag_layer(['sentences'])

vabamorph_web_tagger.tag(text)

text.pop_layer('tokens')

text

text
See on lause.

layer name,attributes,parent,enveloping,ambiguous,span count
sentences,,,words,False,1
words,normalized_form,,,True,4
morph_analysis,"normalized_text, lemma, root, root_tokens, ending, clitic, form, partofspeech",words,,True,4
