# EventText
A subclass of **Text** that contains an 'events' layer.
# EventTagger
A class that provides the method **EventText** uses to create the 'events' layer.
## Requirements
pyahocorasick
## Usage
Create the file *data/event vocabulary.csv* in the standard *csv* format read by *pandas*:

term,value,type<br />
Väga sage,sage,sagedus<br />
Sage,sage,sagedus<br />
Harv,harv,sagedus<br />
peavalu,peavalu,sümptom<br />
kõhukinnisus,kõhukinnisus,sümptom<br />
iiveldus,iiveldus,sümptom<br />
\*gap\*,\*gap\*,\*gap\*

There must be exactly one column with the header **term**. That column contains the strings that are searched for in the text. All other columns are optional. No term may be a substring of another term.

If at least one word (in the Estnltk sense) lies between two consecutive terms, the substring between those terms is a **\*gap\***. If **consider_gaps=True** is passed to the **EventTagger** constructor, the table must contain a **term** named **\*gap\***; in that case **\*gap\*** events are created.
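
The vocabulary rules above (a **term** column, no term contained in another term, and a **\*gap\*** term when gaps are considered) can be checked before the file is handed to **EventTagger**. The sketch below is not part of this package: `check_vocabulary` is a hypothetical helper built on the standard *csv* module, and it assumes that term matching is case sensitive, which this README does not state explicitly.

```python
import csv
import io

VOCABULARY = """term,value,type
Väga sage,sage,sagedus
Sage,sage,sagedus
Harv,harv,sagedus
peavalu,peavalu,sümptom
kõhukinnisus,kõhukinnisus,sümptom
iiveldus,iiveldus,sümptom
*gap*,*gap*,*gap*
"""


def check_vocabulary(csv_text, consider_gaps=True):
    """Hypothetical helper: return (shorter, longer) pairs of conflicting terms.

    Raises ValueError if the 'term' column is missing, or if gaps are
    considered but no '*gap*' term is present.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if 'term' not in (reader.fieldnames or []):
        raise ValueError("the vocabulary needs a column with header 'term'")
    terms = [row['term'] for row in reader]
    if consider_gaps and '*gap*' not in terms:
        raise ValueError("consider_gaps=True requires a term named '*gap*'")
    # Assumption: matching is case sensitive, so 'Sage' is not
    # treated as a substring of 'Väga sage'.
    return [(a, b) for a in terms for b in terms if a != b and a in b]


print(check_vocabulary(VOCABULARY))  # → []
```

An empty list means that no term is contained in another one. A vocabulary holding both `sage` and `Väga sage` would be reported as the pair `('sage', 'Väga sage')`.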

Create an **EventTagger** object, the **EventText** objects, and the *'events'* layers of the **EventText** objects.

In [1]:
from EventText.EventText import EventText, EventTagger
from Winepi.Winepi import collection_of_frequent_episodes_new

event_tagger = EventTagger('/home/paul/workspace/MyTestProject/src/data/event vocabulary.csv', consider_gaps=True)

strings = ('Väga sage kõhukinnisus. Sage peavalu. Harv iiveldus. Väga harv minestus.',
           'Harv kõrvaltoime sellel ravimil on peavalu.')

event_texts = []
for string in strings:
    event_texts.append(EventText(string, event_tagger=event_tagger))

for event_text in event_texts:
    event_text.events()

for event in event_texts[0]['events']:
    print(event['term'], event['wstart'], event['wend'])

Väga sage 0 1
kõhukinnisus 1 2
*gap* 2 3
Sage 3 4
peavalu 4 5
*gap* 5 6
Harv 6 7
iiveldus 7 8
*gap* 8 13


The word start 'wstart', word end 'wend', char start 'cstart' and char end 'cend' are calculated as if every event other than the special **\*gap\*** event consisted of a single character, so that both its char length and its word length equal 1. A **\*gap\*** event keeps its real length.
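
The word-based renumbering can be illustrated with a small sketch. It is not the package's implementation: `renumber_by_word` is a hypothetical function, and the original word positions in `ORIGINAL_EVENTS` assume that punctuation marks count as separate words and that the events cover the text without holes.

```python
GAP = '*gap*'

# (term, original word start, original word end) for the first example text.
# Assumption: every punctuation mark is counted as a word of its own.
ORIGINAL_EVENTS = [
    ('Väga sage', 0, 2),
    ('kõhukinnisus', 2, 3),
    (GAP, 3, 4),
    ('Sage', 4, 5),
    ('peavalu', 5, 6),
    (GAP, 6, 7),
    ('Harv', 7, 8),
    ('iiveldus', 8, 9),
    (GAP, 9, 14),
]


def renumber_by_word(events):
    """Hypothetical helper: give every non-gap event a word length of 1."""
    renumbered = []
    cursor = 0
    for term, word_start, word_end in events:
        # A gap keeps its real length; any other event is squeezed to length 1.
        length = word_end - word_start if term == GAP else 1
        renumbered.append((term, cursor, cursor + length))
        cursor += length
    return renumbered


for term, wstart, wend in renumber_by_word(ORIGINAL_EVENTS):
    print(term, wstart, wend)
```

Under these assumptions the two-word term `Väga sage` ends up at 0 to 1 and the final gap at 8 to 13, which agrees with the printed 'events' layer.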

The **event_sequence** method extracts an event sequence for the Winepi algorithm from the 'events' layer.

In [2]:
event_sequences = []
for event_text in event_texts:
    event_sequences.append(event_text.event_sequence(count_event_time_by='word', classificator='type'))

print(event_sequences[0].start, event_sequences[0].end)
for es in event_sequences[0].sequence_of_events:
    print(es.event_type, es.event_time)

0 13
sagedus 0
sümptom 1
*gap* 2
sagedus 3
sümptom 4
*gap* 5
sagedus 6
sümptom 7
*gap* 8
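
The mapping from the 'events' layer to an event sequence can be pictured as follows. This is only a sketch of what the output above suggests, not the package's code: `to_event_sequence` and the two namedtuples are hypothetical, the attribute names are taken from the printed example, and the `'char'` option is an assumption based on the 'cstart' and 'cend' fields.

```python
from collections import namedtuple

# Hypothetical stand-ins for the objects returned by event_sequence.
Event = namedtuple('Event', ['event_type', 'event_time'])
EventSequence = namedtuple('EventSequence', ['sequence_of_events', 'start', 'end'])


def to_event_sequence(events, classificator='type', count_event_time_by='word'):
    """Hypothetical helper: turn an 'events' layer into an event sequence."""
    # 'word' selects wstart/wend; 'char' (an assumption) selects cstart/cend.
    prefix = {'word': 'w', 'char': 'c'}[count_event_time_by]
    sequence = [Event(e[classificator], e[prefix + 'start']) for e in events]
    return EventSequence(sequence,
                         events[0][prefix + 'start'],
                         events[-1][prefix + 'end'])


# The first three events of the example text, written out by hand.
events = [
    {'term': 'Väga sage', 'type': 'sagedus', 'wstart': 0, 'wend': 1},
    {'term': 'kõhukinnisus', 'type': 'sümptom', 'wstart': 1, 'wend': 2},
    {'term': '*gap*', 'type': '*gap*', 'wstart': 2, 'wend': 3},
]

es = to_event_sequence(events)
print(es.start, es.end)
for event in es.sequence_of_events:
    print(event.event_type, event.event_time)
```

The value in the column named by `classificator` becomes the event type, and the start position becomes the event time.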


# Winepi
A partial implementation of the Winepi algorithm described by Mannila, Toivonen and Verkamo in *Discovery of Frequent Episodes in Event Sequences*, 1997.

In [3]:
window_width = 5
min_frequency = 0.01
number_of_examples = 0

frequent_episodes, examples = collection_of_frequent_episodes_new(event_sequences,
                                                                  window_width,
                                                                  min_frequency,
                                                                  only_full_windows=False,
                                                                  gaps_skipping=True,
                                                                  number_of_examples=number_of_examples)

for episode in frequent_episodes:
    print(episode.relative_frequency, episode.freq_count, episode)

0.5714285714285714 16 ('sagedus',)
0.5714285714285714 16 ('sümptom',)
0.14285714285714285 4 ('sagedus', 'sagedus')
0.35714285714285715 10 ('sagedus', 'sümptom')
0.21428571428571427 6 ('sümptom', 'sagedus')
0.14285714285714285 4 ('sümptom', 'sümptom')
0.07142857142857142 2 ('sagedus', 'sagedus', 'sümptom')
0.14285714285714285 4 ('sagedus', 'sümptom', 'sagedus')
0.07142857142857142 2 ('sagedus', 'sümptom', 'sümptom')
0.14285714285714285 4 ('sümptom', 'sagedus', 'sümptom')
0.07142857142857142 2 ('sagedus', 'sümptom', 'sagedus', 'sümptom')
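
The counts above can be checked by hand with a brute-force sketch of window counting for serial episodes. It is not the package's algorithm and makes several assumptions: the function names are hypothetical, **\*gap\*** events are simply left out, partial windows at both ends are counted (as with `only_full_windows=False`), and the second sequence, which this README does not print, is inferred by treating punctuation as a separate word.

```python
def occurs_in_window(events, episode, t, width):
    """True if the serial episode occurs, in order, inside window [t, t + width)."""
    position = 0
    for event_type, event_time in events:  # events are sorted by time
        if t <= event_time < t + width and event_type == episode[position]:
            position += 1
            if position == len(episode):
                return True
    return False


def episode_frequency(sequences, episode, width):
    """Return (number of windows containing the episode, total number of windows)."""
    count = 0
    total = 0
    for start, end, events in sequences:
        # Every window [t, t + width) that overlaps [start, end) is counted,
        # which gives end - start + width - 1 windows per sequence.
        for t in range(start - width + 1, end):
            total += 1
            if occurs_in_window(events, episode, t, width):
                count += 1
    return count, total


# (start, end, events without gaps); the second entry is an inferred assumption.
sequences = [
    (0, 13, [('sagedus', 0), ('sümptom', 1), ('sagedus', 3),
             ('sümptom', 4), ('sagedus', 6), ('sümptom', 7)]),
    (0, 7, [('sagedus', 0), ('sümptom', 5)]),
]

print(episode_frequency(sequences, ('sagedus',), 5))            # → (16, 28)
print(episode_frequency(sequences, ('sagedus', 'sümptom'), 5))  # → (10, 28)
print(episode_frequency(sequences, ('sümptom', 'sagedus'), 5))  # → (6, 28)
```

Dividing 16 by 28 gives the relative frequency 0.5714285714285714 reported for `('sagedus',)`. A window slid one time unit at a time is far too slow for real data, so this sketch is useful only for verifying small examples.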
