# EventTagger

A class that finds events in an estnltk **Text** object based on a user-provided vocabulary. Each event is annotated with position fields (**start**, **end**, **cstart**, **wstart**) and with the user-provided classifiers from the vocabulary.

## Usage

Create the file `data/event vocabulary.csv` in *csv* format:
```
term,value,type
Väga sage,sage,sagedus
Sage,sage,sagedus
peavalu,peavalu,sümptom
kõhukinnisus,kõhukinnisus,sümptom
```

The file must contain one column with the header **term**. That column holds the strings that are searched for in the text. No term may be a substring of another term. All other columns are optional, but no column may be named **start**, **end**, **cstart**, **wstart** or **wend**.
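The rules above can be checked before the vocabulary is handed to **EventTagger**. The following is a minimal sketch using only the standard library; `check_vocabulary` and `RESERVED` are hypothetical names introduced here for illustration and are not part of `episode_miner`. It assumes that the substring rule is case-sensitive, which is consistent with the example vocabulary containing both `Väga sage` and `Sage`.

```python
import csv
import io

# Column names the tagger reserves for its own position fields (from the rules above).
RESERVED = {"start", "end", "cstart", "wstart", "wend"}


def check_vocabulary(csv_file):
    """Return a list of human-readable problems found in a vocabulary CSV.

    csv_file is any iterable of lines, e.g. an open file or io.StringIO.
    An empty list means that no problems were found.
    """
    reader = csv.DictReader(csv_file)
    headers = reader.fieldnames or []
    problems = []

    if "term" not in headers:
        # Without a term column the remaining checks make no sense.
        return ["missing required column 'term'"]

    clashes = sorted(RESERVED.intersection(headers))
    if clashes:
        problems.append("reserved column names used: " + ", ".join(clashes))

    terms = [row["term"] for row in reader]
    # Quadratic comparison; fine for small vocabularies.
    for a in terms:
        for b in terms:
            if a != b and a in b:
                problems.append(f"term {a!r} is a substring of {b!r}")
    return problems


example = io.StringIO(
    "term,value,type\n"
    "valu,valu,sümptom\n"
    "peavalu,peavalu,sümptom\n"
)
for problem in check_vocabulary(example):
    print(problem)
```

For a vocabulary stored on disk, the same function can be called with `open('data/event vocabulary.csv', encoding='utf-8', newline='')`.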

Create an **EventTagger** object and a **Text** object, then tag the events in the text.

```python
from episode_miner.event_tagger import EventTagger
from estnltk import Text
from pprint import pprint

event_tagger = EventTagger('data/event vocabulary.csv')

text = Text('Väga sage kõhukinnisus. Sagedane sümptom on peavalu.')

events = event_tagger.tag_events(text, method='ahocorasick')

pprint(events)
```

```
[{'cstart': 0,
  'end': 9,
  'start': 0,
  'term': 'Väga sage',
  'type': 'sagedus',
  'value': 'sage',
  'wstart': 0},
 {'cstart': 2,
  'end': 22,
  'start': 10,
  'term': 'kõhukinnisus',
  'type': 'sümptom',
  'value': 'kõhukinnisus',
  'wstart': 1},
 {'cstart': 5,
  'end': 28,
  'start': 24,
  'term': 'Sage',
  'type': 'sagedus',
  'value': 'sage',
  'wstart': 3},
 {'end': 51,
  'start': 44,
  'term': 'peavalu',
  'type': 'sümptom',
  'value': 'peavalu'}]
```


The **method** argument is either `'ahocorasick'` or `'naive'`. The `'naive'` method is generally slower but does not depend on the **pyahocorasick** package. The word start **wstart** and the character start **cstart** are calculated as if every event consisted of a single character.
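To make the **cstart** definition concrete, here is a rough sketch of a naive term search followed by a **cstart** computation. It is not the implementation used by **EventTagger**; `find_terms_naive` and `add_cstart` are hypothetical helpers written for this illustration, and the sketch assumes that events do not overlap and that matching is case-sensitive.

```python
def find_terms_naive(text, terms):
    """Return one event dict per occurrence of each term, sorted by start offset."""
    events = []
    for term in terms:
        pos = text.find(term)
        while pos != -1:
            events.append({'term': term, 'start': pos, 'end': pos + len(term)})
            pos = text.find(term, pos + 1)
    return sorted(events, key=lambda event: event['start'])


def add_cstart(events):
    """Add 'cstart': the start offset if every earlier event were one character long.

    Assumes the events are sorted by 'start' and do not overlap.
    """
    removed = 0  # characters saved so far by shrinking earlier events to one char
    for event in events:
        event['cstart'] = event['start'] - removed
        removed += (event['end'] - event['start']) - 1
    return events


text = 'Väga sage kõhukinnisus. Sagedane sümptom on peavalu.'
terms = ['Väga sage', 'Sage', 'peavalu', 'kõhukinnisus']

events = add_cstart(find_terms_naive(text, terms))
for event in events:
    print(event)
```

For the first three events this yields the **cstart** values 0, 2 and 5, the same as in the output above. Repeated `str.find` scans the text once per term, which is one reason a naive search tends to be slower than a single Aho-Corasick pass over the text.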

```python
from estnltk import PrettyPrinter
from IPython.display import HTML

text['events'] = events
pp = PrettyPrinter(background='events')
HTML(pp.render(text, True))
```