# Temporal expression (TIMEX) tagger

## Introduction

Natural language texts often contain temporal expressions referring to calendrical timepoints or periods, such as _24. aprillil_ (on 24th of April), _kaks aastat_ (two years) or _igal aastal_ (annually).
These kinds of temporal expressions can be automatically detected and semantically analysed by EstNLTK's `TimexTagger`. The tool identifies temporal expression phrases (_timexes_) in text and normalizes them, providing the corresponding calendrical dates, times and durations.


The recommended way of using `TimexTagger` is via `TIMEXES_RESOLVER`, which includes tokenization corrections that improve the quality of temporal expression detection. Example:

```python
from estnltk import Text
from estnltk.taggers.standard_taggers.timex_tagger_preprocessing import TIMEXES_RESOLVER

# Create new text object
text = Text('Potsataja ütles eile, et vaatavad nüüd Genaga viie aasta plaanid uuesti üle.')

# Mark creation time of the document
text.meta['document_creation_time'] = '2014-12-03'

# Annotate temporal expressions ('timexes')
text.tag_layer(['timexes'], resolver=TIMEXES_RESOLVER)

# Browse results
text.timexes[['text', 'type', 'value', 'temporal_function']]
```

|   | text | type | value | temporal_function |
|---|------|------|-------|-------------------|
| 0 | ['eile'] | DATE | 2014-12-02 | True |
| 1 | ['nüüd'] | DATE | PRESENT_REF | True |
| 2 | ['viie', 'aasta'] | DURATION | P5Y | False |


A detailed description of the Estonian temporal expression tagger, including algorithmic details and a performance evaluation, is available in the article [Orasmaa (2012)](http://dx.doi.org/10.5128/ERYa8.10); evaluation results are also summarized in the article [Orasmaa and Kaalep (2017)](http://www.ep.liu.se/ecp/131/022/ecp17131022.pdf).

Note: if you want to use a `TimexTagger` object directly, see the "Technical notes" section below for details.

## TIMEX attributes: a brief overview

The attributes of temporal expressions are based on the attributes of the TIMEX3 tag in [TimeML](https://en.wikipedia.org/wiki/TimeML). There are quite a few attributes, but at first you may want to focus on `type`, `value` and `temporal_function`, which convey the most important normalization details for a robust practical analysis:

```python
# Browse types, values and temporal functions
text.timexes[['text', 'type', 'value', 'temporal_function']]
```

|   | text | type | value | temporal_function |
|---|------|------|-------|-------------------|
| 0 | ['eile'] | DATE | 2014-12-02 | True |
| 1 | ['nüüd'] | DATE | PRESENT_REF | True |
| 2 | ['viie', 'aasta'] | DURATION | P5Y | False |


Attributes explained in more detail:

   * **`type`** -- type of the temporal expression. Can be one of the following:
       * `DATE` -- occurrence dates, such as _24. aprillil_ (on 24th of April), or _eelmisel aastal_ (last year);
       * `TIME` -- occurrence times which have a granularity smaller than day, 
         such as _neljapäeva pärastlõunal_ (on Thursday afternoon), _eile kell 3 päeval_ (yesterday at 3 p.m.);
       * `DURATION` -- duration specifications, such as _kaks aastat_ (two years), or _pool päeva_ (half a day);
       * `SET` -- recurrence specifications, such as _kolmapäeviti_ (on every Wednesday) or _igal aastal_ (annually);


   * **`value`** -- semantics of the expression (mostly calendrical). Examples:
   
       * Most date and time expressions will be normalized based on the ISO datetime format `yyyy-mm-ddThh:mm`. For instance, _'24. aprillil 2009'_ (on 24th of April, 2009) will be normalized as `value=2009-04-24`;
       * Duration expressions will be normalized based on the ISO duration format `P[n]Y[n]M[n]DT[n]H[n]M[n]S`. For instance, _'kaks aastat'_ (two years) will be normalized as `value=P2Y`;
       * For common non-calendrical time expressions, special labels are used in the value part. For instance, _'nüüd'_ (now) will be normalized as a reference to the present time (`value=PRESENT_REF`);
       
       See the section "Details of the annotation format" below for more information about possible formats of the `value`;
       
     
   * **`temporal_function`** -- boolean indicating whether the semantics of the expression are relative to the context (that is, they have been or need to be calculated by some function, hence the name `temporal_function`):

       * For `DATE` and `TIME` expressions:

           * `temporal_function=True` indicates that the expression is relative. For instance, _'eile'_ (yesterday) has `temporal_function=True` because its value is calculated relative to the context (the creation time of the document);

           * `temporal_function=False` indicates that the expression is absolute. For instance, _'2009. aastal'_ (in 2009) has `temporal_function=False` because its value is copied from the textual part of the expression (no calculations needed);

       * For `DURATION` expressions, `temporal_function` is mostly `False`, except for vague durations;
       * For `SET` expressions, `temporal_function` is always `True`;
     
Other timex attributes ( `tid`, `mod`, `anchor_time_id`, `quant`, `freq`, `begin_point`, `end_point` and `part_of_interval` ) will be explained in more detail in the section _"Details of the annotation format"_ below.

## Document creation date

In order to get the semantics of relative date and time expressions right, you need to provide the _document creation time_ (_DCT_ for short), which is then used to calculate the semantics of expressions such as _'eile'_ (yesterday) and _'järgmisel neljapäeval'_ (next Thursday).

`TimexTagger` looks for the _document creation time_ in the text's metadata, searching for the keys `'dct'`, `'creation_time'` and `'document_creation_time'` (in that order). The _document creation time_ is normally expected to be a string in the ISO datetime format (`YYYY-mm-ddTHH:MM`) or in the ISO date format (`YYYY-mm-dd`):

```python
# Create new text object
text = Text('Tulid eile meile, selle asemel et tulla täna?')

# Mark creation time of the document in the metadata
text.meta['document_creation_time'] = '2010-04-26'

# Annotate temporal expressions ('timexes')
text.tag_layer(['timexes'], resolver=TIMEXES_RESOLVER)

# Browse results
text.timexes[['text', 'type', 'value', 'temporal_function']]
```

|   | text | type | value | temporal_function |
|---|------|------|-------|-------------------|
| 0 | ['eile'] | DATE | 2010-04-25 | True |
| 1 | ['täna'] | DATE | 2010-04-26 | True |


Optionally, you can also use Python's `datetime` object to specify the creation date:

```python
import datetime

# Create new text object
text = Text('Tulid eile meile, selle asemel et tulla täna või homme?')

# Mark creation time of the document in the metadata
text.meta['document_creation_time'] = datetime.datetime(1986, 12, 21)

# Annotate temporal expressions ('timexes')
text.tag_layer(['timexes'], resolver=TIMEXES_RESOLVER)

# Browse results
text.timexes[['text', 'type', 'value', 'temporal_function']]
```

|   | text | type | value | temporal_function |
|---|------|------|-------|-------------------|
| 0 | ['eile'] | DATE | 1986-12-20 | True |
| 1 | ['täna'] | DATE | 1986-12-21 | True |
| 2 | ['homme'] | DATE | 1986-12-22 | True |


 * **Note:** if there is no _document creation time_ specified in the metadata of the `Text` object, then `TimexTagger` assumes that the execution time of the tagger is the DCT. As a result of tagging, the execution time of the tagger will also be stored in the metadata, under the key `'document_creation_time'`;
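The lookup order and the fallback described above can be mimicked with a plain dictionary standing in for `text.meta`. This is only an illustration: `find_dct` is a hypothetical helper (not part of EstNLTK), and the `yyyy-mm-ddThh:mm` format of the fallback timestamp is an assumption, not necessarily the format the tagger actually stores.

```python
from datetime import datetime

# Metadata keys checked for the DCT, in the order described above
DCT_KEYS = ('dct', 'creation_time', 'document_creation_time')

def find_dct(meta):
    """Hypothetical helper: return (key, value) of the first DCT key in `meta`.

    If no key is present, fall back to the current time and store it under
    'document_creation_time' (the timestamp format here is an assumption)."""
    for key in DCT_KEYS:
        if key in meta:
            return key, meta[key]
    now = datetime.now().strftime('%Y-%m-%dT%H:%M')
    meta['document_creation_time'] = now
    return 'document_creation_time', now

# 'dct' wins over 'document_creation_time' because it is checked first
print(find_dct({'document_creation_time': '2014-12-03', 'dct': '2010-07-15'}))
```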

### Gaps in document creation date

There can be situations where the exact document creation date cannot be specified.
For instance, only the year or the month when the document was created may be known, with no information about the exact date.
In such cases, the string-based document creation date can have gaps: unknown granularities can be replaced by `'X'` symbols. For instance, if we only know the year of writing (_2009_), we can use `'2009-XX-XX'` as the document creation date:

```python
# Create new text object
text = Text('Homme või järgmisel aastal?')

# Mark creation time of the document
text.meta['document_creation_time'] = '2009-XX-XX'

# Annotate temporal expressions ('timexes')
text.tag_layer(['timexes'], resolver=TIMEXES_RESOLVER)

# Browse results
text.timexes[['text', 'type', 'value', 'temporal_function']]
```

|   | text | type | value | temporal_function |
|---|------|------|-------|-------------------|
| 0 | ['Homme'] | DATE | XXXX-XX-XX | True |
| 1 | ['järgmisel', 'aastal'] | DATE | 2010 | True |


Note that using gaps in the DCT also affects how relative date and time expressions are normalized. If a relative expression has a granularity that is not specified (such as the expression _homme_ (tomorrow) in the previous example: its _day_ and _month_ granularities cannot be resolved using the given DCT), then its value is also filled with `'X'` symbols, indicating that there is not enough information to find the exact value.

 * What to keep in mind when using gaps in DCT:

    * You should mark `'X'` symbols starting from the right side of the DCT, and the marked gap should be continuous. Discontinuous gaps (such as `'2009-0X-X1'`) and gaps that cover a granularity only partially (such as `'2009-1X-XX'`) do not work; they may even lead to unexpected processing errors. However, the DCT formats `XXXX-XX-XX`, `yyyy-XX-XX` and `yyyy-mm-XX` should be safe to use;

    * Marking gaps in the DCT is an _experimental feature_. Do not expect it to always work automagically; rather, test it yourself and check whether the results fit your purpose;
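The rule of thumb above can be turned into a quick pre-check before tagging. The function below is a hypothetical helper, not an EstNLTK API; it covers only date-only DCT strings and accepts exactly the gap patterns listed as safe (plus a complete date without gaps), so anything it rejects deserves careful testing before use.

```python
import re

# Gap patterns described above as safe, plus a complete date without gaps
_SAFE_DCT = re.compile(
    r'(?:XXXX-XX-XX'           # nothing known
    r'|\d{4}-XX-XX'            # only the year is known
    r'|\d{4}-\d{2}-XX'         # year and month are known
    r'|\d{4}-\d{2}-\d{2})'     # full date
)

def is_safe_gapped_dct(dct):
    """Hypothetical pre-check for date-only DCT strings with 'X' gaps."""
    return _SAFE_DCT.fullmatch(dct) is not None

for dct in ('2009-XX-XX', '2009-07-XX', '2009-0X-X1', '2009-1X-XX'):
    print(dct, is_safe_gapped_dct(dct))
```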


---

## Details of the annotation format

In this section, we will give more details about the attributes of the layer `'timexes'`. 

Attributes **`tid`, `type`, `value`** and **`temporal_function`** are filled in for every timex. These attributes give the most basic details about (semantics of) the expression.

Attributes **`mod`, `anchor_time_id`, `quant`, `freq`, `begin_point`, `end_point`** and **`part_of_interval`** are filled in only in specific contexts -- they give extra details (about semantics). Otherwise, these attributes will have **`None`** values.

### The attribute `tid`

The attribute `tid` provides a unique identifier for each temporal expression. The identifier is a string consisting of the prefix `t` followed by the number of the timex. Numbering of timexes starts from 1. Example:

```python
# Create new text object
text = Text('Kaks aastat põrandaalust aktiivset tegevust, 14 aastat vanglat ning 30 aastat pideva '+
            'nuhkimise all elamist.')

# Annotate temporal expressions ('timexes')
text.tag_layer(['timexes'], resolver=TIMEXES_RESOLVER)

# See results
text.timexes[['text', 'tid', 'type', 'value']]
```

|   | text | tid | type | value |
|---|------|-----|------|-------|
| 0 | ['Kaks', 'aastat'] | t1 | DURATION | P2Y |
| 1 | ['14', 'aastat'] | t2 | DURATION | P14Y |
| 2 | ['30', 'aastat'] | t3 | DURATION | P30Y |


 * **Note:** the identifier **`t0`** refers to the _document creation time_. It is never used as the `tid` of a timex, but it can be referred to in the attributes `anchor_time_id`, `begin_point` and `end_point` whenever the calculation of semantics involves the _document creation time_;

### The attribute `type`

... indicates the type of the temporal expression. Can be one of the following:
   * `DATE` -- occurrence dates, such as _24. aprillil_ (on 24th of April), or _eelmisel aastal_ (last year);
   * `TIME` -- occurrence times which have a granularity smaller than day, such as _neljapäeva pärastlõunal_ (on Thursday afternoon), _eile kell 3 päeval_ (yesterday at 3 p.m.);
   * `DURATION` -- duration specifications, such as _kaks aastat_ (two years), or _pool päeva_ (half a day);
   * `SET` -- recurrence specifications, such as _kolmapäeviti_ (on every Wednesday) or _igal aastal_ (annually);


### The attribute `value`

... conveys the most important part of the semantics of the temporal expression: the semantics of a date, time or duration, based on the ISO datetime format. There are five possible formats:

   * I. Date-based format: `yyyy-mm-ddThh:mm`

           yyyy - year (4 digits)
           mm - month (01-12)
           dd - day (01-31)
             
   * II. Weekday-based: `yyyy-Wnn-wdThh:mm`

           nn - the week of the year (01-53)
           wd - day of the week (1-7, where 1 denotes Monday).

   * III. Time-based: `Thh:mm`

           hh - hour of day (00-23)
           mm - minute of hour (00-59)
       
   * IV. Time span: `Pn1Yn2Mn3Wn4DTn5Hn6M`

           where ni denotes a value and Y (year), M (month), 
           W (week), D (day), H (hours), M (minutes) denotes 
           respective time granularity.
       
   * V. Special labels, such as `PRESENT_REF` and `PAST_REF`
             
           in some cases, special labels are used to 
           express the date & time semantics. See the 
           annotation guidelines below for more details 
           
           
Formats I and II are used with DATE, TIME and SET types; format I is always preferred if both I and II can be used. Format III is used when no date information can be extracted. Format IV is used in DURATION expressions and in some SET expressions.
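In post-processing, a rough way to tell these formats apart is to look at the shape of the `value` string. The classifier below is a heuristic sketch, not the tagger's own logic; it assumes that special labels (format V) consist of upper-case words joined by underscores, like `PRESENT_REF` and `PAST_REF`, which is why they are checked before the `P` prefix of time spans.

```python
import re

def value_format(value):
    """Heuristically classify a timex `value` as format I, II, III, IV or V."""
    # V: special labels, assumed to be upper-case words joined by '_'
    if re.fullmatch(r'[A-Z]+(?:_[A-Z]+)+', value):
        return 'V'
    # IV: time spans start with 'P' (e.g. P2Y, P1W, PXXD)
    if value.startswith('P'):
        return 'IV'
    # III: time only (e.g. T14:30)
    if value.startswith('T'):
        return 'III'
    # II: week-based dates (e.g. 2009-W17-5); 'W' must be followed by a week number,
    # so that a season label such as 2009-WI (winter) is not mistaken for a week
    if re.match(r'(?:BC)?[\dX]{1,4}-W[\dX]', value):
        return 'II'
    # I: everything else, including shortened values such as 2009-06 or 199
    return 'I'

for v in ('2009-04-24', '2009-W17-5', 'T14:30', 'P5Y', 'PRESENT_REF'):
    print(v, value_format(v))
```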

Parts of the ISO datetime format can be replaced by labels conveying special semantics of commonly used temporal expressions:

   * `hh:mm` (_hours and minutes_) can be replaced by a label referring to a _time of the day_:
         
         MO - morning - hommik
         AF - afternoon - pärastlõuna
         EV - evening - õhtu
         NI - night - öö
         DT - daytime - päevane aeg

   * `wd` (_weekday_) can be replaced by a general label referring to a group of weekdays:
   
         WD - workday - tööpäev
         WE - weekend - nädalalõpp

   * `mm` (_month_) can be replaced by a general label referring to a season:
         
         SP - spring - kevad
         SU - summer - suvi
         FA - fall - sügis
         WI - winter - talv
         
   * `mm` (_month_) can also be replaced by a general label referring to a quarter (of year):
   
         Q1, Q2, Q3, Q4
         QX - unknown/unspecified quarter
         
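The label tables above can be used to make `value` strings more readable. The helper below is hypothetical; it assumes that the date part and the time part of a `value` are separated by `T` and that labels occupy whole `-`-separated fields, as in `2009-SU` or `2009-W17-WE`. Special labels such as `PRESENT_REF` (which also contain a `T`) are outside its scope.

```python
# Label tables copied from the lists above
TIME_OF_DAY = {'MO': 'morning', 'AF': 'afternoon', 'EV': 'evening',
               'NI': 'night', 'DT': 'daytime'}
WEEKDAY_GROUPS = {'WD': 'workday', 'WE': 'weekend'}
SEASONS = {'SP': 'spring', 'SU': 'summer', 'FA': 'fall', 'WI': 'winter'}
QUARTERS = {'Q1': 'first quarter', 'Q2': 'second quarter',
            'Q3': 'third quarter', 'Q4': 'fourth quarter',
            'QX': 'unspecified quarter'}

def explain_labels(value):
    """Hypothetical helper: return English glosses of the labels in `value`."""
    date_part, _, time_part = value.partition('T')
    glosses = []
    # Labels replacing the month or the weekday: check every field after the year
    for field in date_part.split('-')[1:]:
        for table in (WEEKDAY_GROUPS, SEASONS, QUARTERS):
            if field in table:
                glosses.append(table[field])
    # Label replacing hh:mm
    if time_part in TIME_OF_DAY:
        glosses.append(TIME_OF_DAY[time_part])
    return glosses

print(explain_labels('2009-04-24TEV'))   # ['evening']
print(explain_labels('2009-W17-WE'))     # ['weekend']
```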

_Shortened `value`._ If a DATE expression does not contain all the information that can be expressed in the format ( `yyyy-mm-dd` or `yyyy-Wnn-wd` ), then `value` will be shortened from the right side, leaving out unspecified information. For instance, if the expression contains only _year_ and _month_ level information, then the `value` will be shortened and the _day_ part will be left out:

```python
# Create new text object
text = Text('Digi-TV-le minnakse üle kõikjal maailmas, viimati eelmisel kuul kogu USAs, lisas ta.')

# Mark creation time of the document
text.meta['document_creation_time'] = '2009-07-02'

# Annotate temporal expressions ('timexes')
text.tag_layer(['timexes'], resolver=TIMEXES_RESOLVER)

# See results
text.timexes[['text', 'type', 'value']]
```

|   | text | type | value |
|---|------|------|-------|
| 0 | ['eelmisel', 'kuul'] | DATE | 2009-06 |


Similarly, if the expression contains only _year_ level information and the specific year is not known (only a decade or a century is mentioned), then the `value` is shortened further from the right, leaving out the unspecified part of the _year_:

```python
# Create new text object
text = Text('Üheksakümnendatel aastatel tegutses Saaremaal seitse panka.')

# Mark creation time of the document
text.meta['document_creation_time'] = '2008-04-01'

# Annotate temporal expressions ('timexes')
text.tag_layer(['timexes'], resolver=TIMEXES_RESOLVER)

# See results
text.timexes[['text', 'type', 'value']]
```

|   | text | type | value |
|---|------|------|-------|
| 0 | ['Üheksakümnendatel', 'aastatel'] | DATE | 199 |

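To use shortened values in calculations, it can help to expand them into the calendar range they cover. The sketch below is a hypothetical helper handling only plain-digit values of format I; it assumes that a value shorter than four digits (such as `199`) covers all years starting with those digits, and it does not handle labels, week-based values or the `BC` prefix.

```python
import calendar
from datetime import date

def value_to_range(value):
    """Hypothetical helper: (first_day, last_day) covered by a plain-digit
    format I value, e.g. '199' (decade), '2009', '2009-06' or '2009-06-15'."""
    parts = value.split('-')
    year = parts[0]
    if len(year) < 4:
        # Shortened year: assume it covers all years starting with these digits
        span = 10 ** (4 - len(year))
        first = int(year) * span
        return date(first, 1, 1), date(first + span - 1, 12, 31)
    y = int(year)
    if len(parts) == 1:
        return date(y, 1, 1), date(y, 12, 31)
    m = int(parts[1])
    if len(parts) == 2:
        # monthrange() returns (weekday of day 1, number of days in the month)
        return date(y, m, 1), date(y, m, calendar.monthrange(y, m)[1])
    d = date(y, m, int(parts[2][:2]))   # ignore any time part after the day
    return d, d

print(value_to_range('2009-06'))   # (datetime.date(2009, 6, 1), datetime.date(2009, 6, 30))
print(value_to_range('199'))       # (datetime.date(1990, 1, 1), datetime.date(1999, 12, 31))
```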

_Before Common Era._ If the expression explicitly refers to date/time before the Common Era, then its `value` will have prefix `BC`:

```python
# Create new text object
text = Text('Lülle laevkalmed, rajatud umbes 8. sajandil e.m.a., on Eestis ainulaadsed.')

# Annotate temporal expressions ('timexes')
text.tag_layer(['timexes'], resolver=TIMEXES_RESOLVER)

# See results
text.timexes[['text', 'type', 'value']]
```

|   | text | type | value |
|---|------|------|-------|
| 0 | ['umbes', '8.', 'sajandil', 'e.m.a.'] | DATE | BC07 |


A more detailed description of the `value` can be found in the **annotation guidelines** [here](https://github.com/soras/Ajavt/blob/master/doc/margendusformaat_et.pdf?raw=true) (currently only in Estonian).

### The attribute `temporal_function`

... is a boolean indicating whether the semantics of the expression are relative to the context (that is, they have been or need to be calculated by some function, hence the name `temporal_function`):

   * For `DATE` and `TIME` expressions:

       * `temporal_function=True` indicates that the expression is relative. For instance, _'eile'_ (yesterday) has `temporal_function=True` because its value is calculated relative to the context (the creation time of the document);

       * `temporal_function=False` indicates that the expression is absolute. For instance, _'2009. aastal'_ (in 2009) has `temporal_function=False` because its value is copied from the textual part of the expression (no calculations needed);

   * For `DURATION` expressions, `temporal_function` is mostly `False`, except for vague durations;
   * For `SET` expressions, `temporal_function` is always `True`;

### The attribute `mod`

... refers to a modifier of the semantics given in the `value`. It is used in special cases where the semantics cannot be expressed completely by the attribute `value` and need an elaboration. For instance, the expression _'2009. aasta alguses'_ (in the beginning of 2009) will have `value=2009` and `mod=START`.
      
Another example:

```python
# Create new text object
text = Text('Tavaliselt võtab paariks kasvamine aega umbes kaks aastat, mil toimub teineteise nurkade '+
            'mahalihvimine ning ühiste reeglite ja maailmapildi kujunemine.')

# Annotate temporal expressions ('timexes')
text.tag_layer(['timexes'], resolver=TIMEXES_RESOLVER)

# See results
text.timexes[['text', 'type', 'value', 'mod']]
```

|   | text | type | value | mod |
|---|------|------|-------|-----|
| 0 | ['umbes', 'kaks', 'aastat'] | DURATION | P2Y | APPROX |


The attribute `mod` can have the following string values: `START`, `MID`, `END`, `FIRST_HALF`, `SECOND_HALF`, `APPROX`, `LESS_THAN`, `MORE_THAN`, `EQUAL_OR_LESS` or `EQUAL_OR_MORE`.
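When listing timexes, it may be handy to show `value` and `mod` together. The mapping below is a hypothetical display convention (the symbols and phrases are our own choice, not part of the annotation format); it simply gives every `mod` value listed above a short rendering.

```python
# Hypothetical display markers for each `mod` value listed above
MOD_MARKERS = {
    'START': 'start of ', 'MID': 'middle of ', 'END': 'end of ',
    'FIRST_HALF': 'first half of ', 'SECOND_HALF': 'second half of ',
    'APPROX': '~', 'LESS_THAN': '<', 'MORE_THAN': '>',
    'EQUAL_OR_LESS': '<=', 'EQUAL_OR_MORE': '>=',
}

def render_value(value, mod=None):
    """Prefix `value` with a marker for `mod`; an unknown mod raises KeyError."""
    if mod is None:
        return value
    return MOD_MARKERS[mod] + value

print(render_value('P2Y', 'APPROX'))   # ~P2Y
print(render_value('2009', 'START'))   # start of 2009
```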

### The attribute `anchor_time_id`

... refers to the time point (identifier of the time point) which was used as the _reference time_ in calculating semantics of a relative expression. 
For relative date expressions such as _eile_ (yesterday) and _nüüd_ (now), the _reference time_ is usually the document creation date ("the speech time" or the "writing time"), so the value of the attribute will be `t0`:

```python
# Create new text object
text = Text('Ma alles grillisin eile, nüüd on Davidi või Kevini kord.')

# Mark creation time of the document
text.meta['dct'] = '2010-07-15'

# Annotate temporal expressions ('timexes')
text.tag_layer(['timexes'], resolver=TIMEXES_RESOLVER)

# See results
text.timexes[['text', 'tid', 'type', 'value', 'anchor_time_id']]
```

|   | text | tid | type | value | anchor_time_id |
|---|------|-----|------|-------|----------------|
| 0 | ['eile'] | t1 | DATE | 2010-07-14 | t0 |
| 1 | ['nüüd'] | t2 | DATE | PRESENT_REF | t0 |


The _reference time_ can also be some (previous) temporal expression in text, like in the following example:

```python
# Create new text object
text = Text('Seetõttu on 2006 aastal oodata rohkem kutsikaid, kui aasta varem.')

# Annotate temporal expressions ('timexes')
text.tag_layer(['timexes'], resolver=TIMEXES_RESOLVER)

# See results
text.timexes[['text', 'tid', 'type', 'value', 'anchor_time_id']]
```

|   | text | tid | type | value | anchor_time_id |
|---|------|-----|------|-------|----------------|
| 0 | ['2006', 'aastal'] | t1 | DATE | 2006 | |
| 1 | ['aasta', 'varem'] | t2 | DATE | 2005 | t1 |


_Notes:_
   * in case of an absolute DATE/TIME expression (or a DURATION or SET expression), `anchor_time_id` will be `None`;


   * how relative date and time expressions should be anchored depends on the domain (or subdomain) of the text. By default, `TimexTagger` uses a set of rules developed for analysing texts from the news domain. If you need to analyse texts from some other domain, you may want to adjust the rules for different anchoring strategies to get correct normalizations -- see below for instructions on how to do it;


   * if a DATE expression has `temporal_function == True` and `anchor_time_id` is not set by the rules, then `TimexTagger` sets `anchor_time_id` to `t0`;
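Because `anchor_time_id` links a timex to another timex (or to `t0`), the reference chain of a relative expression can be followed step by step. The sketch below works on plain dicts keyed by `tid`, a hypothetical representation rather than how EstNLTK stores the layer, filled with the values from the _'aasta varem'_ example above; `anchor_chain` is likewise a hypothetical helper.

```python
def anchor_chain(timexes, tid):
    """Hypothetical helper: follow `anchor_time_id` links starting from `tid`.

    `timexes` maps each tid to a dict with (at least) the key 'anchor_time_id'.
    The chain stops at 't0' (the DCT), at a timex without an anchor, or when a
    tid repeats (a guard against cycles)."""
    chain = [tid]
    seen = {tid}
    while True:
        anchor = timexes[chain[-1]].get('anchor_time_id')
        if anchor is None:
            return chain            # absolute timex: nothing more to follow
        chain.append(anchor)
        if anchor == 't0' or anchor in seen:
            return chain            # reached the DCT (or a repeated tid)
        seen.add(anchor)

# Modelled on the 'aasta varem' example above
timexes = {
    't1': {'value': '2006', 'anchor_time_id': None},
    't2': {'value': '2005', 'anchor_time_id': 't1'},
}
print(anchor_chain(timexes, 't2'))   # ['t2', 't1']
```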

### Attributes `quant` and `freq`

... elaborate semantics of SET expressions. The attribute `quant` contains the quantifier keyword extracted from the recurrence expression:

```python
# Create new text object
text = Text('Põhjanaela mass väheneb igal aastal ligikaudu Maa massi võrra .')

# Annotate temporal expressions ('timexes')
text.tag_layer(['timexes'], resolver=TIMEXES_RESOLVER)

# See results
text.timexes[['text', 'type', 'value', 'quant', 'freq']]
```

|   | text | type | value | quant | freq |
|---|------|------|-------|-------|------|
| 0 | ['igal', 'aastal'] | SET | P1Y | EVERY | |


In the example above, `value=P1Y` indicates the recurring period (yearly / annual period), and the attribute `quant` fixes the quantifier applied to the period: `EVERY` is the English equivalent of the Estonian quantifier keyword _iga_.

The attribute `freq` specifies the exact number of recurrences (within the recurring period), if this information is explicitly available in the temporal expression. Example:

```python
# Create new text object
text = Text('Füüsilist pingutust nõudva tegevusega tuleks tegeleda vähemalt kaks korda nädalas.')

# Annotate temporal expressions ('timexes')
text.tag_layer(['timexes'], resolver=TIMEXES_RESOLVER)

# See results
text.timexes[['text', 'type', 'value', 'quant', 'freq', 'mod']]
```

|   | text | type | value | quant | freq | mod |
|---|------|------|-------|-------|------|-----|
| 0 | ['vähemalt', 'kaks', 'korda', 'nädalas'] | SET | P1W | | 2X | EQUAL_OR_MORE |


In the example above, `value=P1W` indicates the recurring period (a week), and the attribute `freq` fixes recurrence frequency during that period (`2X` means "twice a week"). In addition, the phrase also contains _modifier_ `EQUAL_OR_MORE`, so the frequency `2X` can be interpreted as a lower bound of a possibly greater frequency.
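The `value`/`freq` pair of a SET expression can be split into numbers for further processing. The helper below is a hypothetical sketch; it assumes that the period is a single date-level unit (such as `P1W` or `P1Y`) and that `freq` has the form `<n>X`, as in the examples above; other shapes raise `ValueError`.

```python
import re

def parse_set(value, freq=None):
    """Hypothetical helper: split a SET `value` like 'P1W' into (amount, unit)
    and a `freq` like '2X' into an integer count (None if freq is missing)."""
    m = re.fullmatch(r'P(\d+)([YMWD])', value)
    if m is None:
        raise ValueError(f'unsupported SET value: {value!r}')
    count = None
    if freq is not None:
        f = re.fullmatch(r'(\d+)X', freq)
        if f is None:
            raise ValueError(f'unsupported freq: {freq!r}')
        count = int(f.group(1))
    return int(m.group(1)), m.group(2), count

print(parse_set('P1W', '2X'))   # (1, 'W', 2): twice per week
print(parse_set('P1Y'))         # (1, 'Y', None): 'igal aastal'
```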

### Attributes `begin_point` and `end_point`

... elaborate semantics of DURATION expressions that refer to implicit start and end points of the duration. The attribute `begin_point` refers to the starting point of the interval, and the attribute `end_point` refers to the ending point.

The exact value of `begin_point` / `end_point` can be either a timex identifier (a string) or a dictionary representing the implicit timex (a dictionary which contains keys `tid`, `type`, `value`, ...).
Currently, a timex identifier is used only if the time point refers to the document creation time (`t0`); all other implicit timexes are represented as dictionaries. Example:

```python
# Create new text object
text = Text('Erinevad kotkauurijad on Saaremaal kaljukotkaid viimase kümne aasta jooksul aeg-ajalt lendamas näinud.')

# Set document creation time
text.meta['dct'] = '2013-XX-XX'

# Annotate temporal expressions ('timexes')
text.tag_layer(['timexes'], resolver=TIMEXES_RESOLVER)

# See results
text.timexes[['text', 'type', 'value', 'begin_point', 'end_point']]
```

|   | text | type | value | begin_point | end_point |
|---|------|------|-------|-------------|-----------|
| 0 | ['viimase', 'kümne', 'aasta', 'jooksul'] | DURATION | P10Y | `OrderedDict([('tid', 't2'), ('type', 'DATE'), ('value', '2003'), ('temporal_function', True)])` | t0 |


In the example above, `value=P10Y` indicates a time interval with the length of 10 years, the attribute `begin_point` fixes the starting point of the interval (not explicit in text, but can be calculated as a year 10 years ago -- 2003), and the attribute `end_point` fixes the end point (not explicit in text, but can be associated with the document creation date (the "time of writing")).
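Since a point may be either a string or a dictionary, code that consumes `begin_point` / `end_point` has to handle both. The sketch below shows one way to do this with a hypothetical helper; giving `t0` the type `DATE` is an assumption made for illustration, and the `OrderedDict` is copied from the example output above.

```python
from collections import OrderedDict

def point_as_dict(point, dct_value):
    """Hypothetical helper: return a `begin_point`/`end_point` as a plain dict.

    A string is treated as a timex identifier; per the text above, only 't0'
    (the document creation time) is currently used this way. The 'DATE' type
    given to t0 here is an assumption made for illustration."""
    if point is None:
        return None
    if isinstance(point, str):
        if point != 't0':
            raise ValueError(f'unexpected timex identifier: {point!r}')
        return {'tid': 't0', 'type': 'DATE', 'value': dct_value}
    return dict(point)   # copies both dicts and OrderedDicts

# Values taken from the example output above
begin = OrderedDict([('tid', 't2'), ('type', 'DATE'), ('value', '2003'),
                     ('temporal_function', True)])
print(point_as_dict(begin, '2013-XX-XX')['value'])   # 2003
print(point_as_dict('t0', '2013-XX-XX')['value'])    # 2013-XX-XX
```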

_A Technical Note_:

   * In order to ensure a fixed order of keys, implicit start and end points use [`OrderedDict`](https://docs.python.org/3/library/collections.html#collections.OrderedDict)-s instead of regular Python dictionaries. However, you can switch to regular Python dictionaries in the output if you initialize `TimexTagger` with the parameter `output_ordered_dicts=False`:
   
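```python
from estnltk.taggers import TimexTagger
from estnltk.taggers.standard_taggers.timex_tagger_preprocessing import TIMEXES_RESOLVER

# Create a tagger that outputs regular dicts instead of OrderedDicts
timexTagger = TimexTagger(output_ordered_dicts=False)
TIMEXES_RESOLVER.taggers.rules['timexes'].close()  # terminate the old tagger (releases its resources)
TIMEXES_RESOLVER.update(timexTagger)               # add the new tagger to the resolver
```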

### The attribute `part_of_interval`

... elaborates semantics of DATE/TIME expressions that are part of an implicit time interval. If a DATE/TIME expression is part of an implicit interval, then `part_of_interval` is a dictionary representing the implicit timex (a dictionary which contains keys `tid`, `type`, `value`, ...). For example:

```python
# Create new text object
text = Text('04.- 05. juulini toimus XXV Üldlaulupidu ja XVIII Üldtantsupidu.')

# Set document creation time
text.meta['dct'] = '2009-09-01'

# Annotate temporal expressions ('timexes')
text.tag_layer(['timexes'], resolver=TIMEXES_RESOLVER)

# See results
text.timexes[['text', 'tid', 'type', 'value', 'part_of_interval']]
```

|   | text | tid | type | value | part_of_interval |
|---|------|-----|------|-------|------------------|
| 0 | ['04.'] | t1 | DATE | 2009-07-04 | `OrderedDict([('tid', 't3'), ('type', 'DURATION'), ('value', 'PXXD'), ('temporal_ ...`, type: `<class 'collections.OrderedDict'>`, length: 6 |
| 1 | ['05.', 'juulini'] | t2 | DATE | 2009-07-05 | `OrderedDict([('tid', 't3'), ('type', 'DURATION'), ('value', 'PXXD'), ('temporal_ ...`, type: `<class 'collections.OrderedDict'>`, length: 6 |


In the example above, starting and ending points of the interval are marked as explicit timexes (start: 4th of July, end: 5th of July). 
The interval itself is an implicit timex (DURATION), which refers to its explicit timex parts via attributes `begin_point` and `end_point`.
The exact `value` of implicit interval is left unspecified (`value=PXXD`).
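Since both explicit timexes point to the same implicit interval, the interval can be reconstructed by grouping timexes on `part_of_interval['tid']`. The sketch below uses plain dicts modelled on the example output above (with only some of the keys filled in); `group_intervals` is a hypothetical helper, not an EstNLTK function.

```python
from collections import defaultdict

def group_intervals(timexes):
    """Hypothetical helper: group explicit timexes by the implicit interval
    they belong to. Each timex is a dict with 'tid' and 'part_of_interval'
    (None, or a dict with at least 'tid'). Returns {interval_tid: [tid, ...]}."""
    groups = defaultdict(list)
    for timex in timexes:
        interval = timex.get('part_of_interval')
        if interval is not None:
            groups[interval['tid']].append(timex['tid'])
    return dict(groups)

# Data modelled on the example output above
interval = {'tid': 't3', 'type': 'DURATION', 'value': 'PXXD'}
timexes = [
    {'tid': 't1', 'value': '2009-07-04', 'part_of_interval': interval},
    {'tid': 't2', 'value': '2009-07-05', 'part_of_interval': interval},
]
print(group_intervals(timexes))   # {'t3': ['t1', 't2']}
```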

_Notes:_

  * In the current implementation, `value`-s of timexes of implicit intervals are not calculated. So, there will always be values like `PXXY` (an unspecified amount of years), `PXXD` (an unspecified amount of days) and so on;
  

  * The TIMEX3 tag in the [TimeML](https://en.wikipedia.org/wiki/TimeML) standard has only the attributes `begin_point` (`beginPoint`) and `end_point` (`endPoint`) for referring to implicit timexes; the attribute `part_of_interval` is not supported. Because EstNLTK has no means for representing "empty tags" (TimeML uses "empty tags" for implicit timexes), we have chosen to introduce an extra attribute which conveys the same information.
  
  
  * _A technical note_: in order to ensure a fixed order of keys, the implicit interval uses an [`OrderedDict`](https://docs.python.org/3/library/collections.html#collections.OrderedDict) instead of a regular Python dictionary. However, you can switch to regular Python dictionaries in the output if you initialize `TimexTagger` with the parameter `output_ordered_dicts=False`:

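```python
from estnltk.taggers import TimexTagger
from estnltk.taggers.standard_taggers.timex_tagger_preprocessing import TIMEXES_RESOLVER

# Create a tagger that outputs regular dicts instead of OrderedDicts
timexTagger = TimexTagger(output_ordered_dicts=False)
TIMEXES_RESOLVER.taggers.rules['timexes'].close()  # terminate the old tagger (releases its resources)
TIMEXES_RESOLVER.update(timexTagger)               # add the new tagger to the resolver
```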

---

#### The `use_normalized_word_form` parameter

The boolean parameter `use_normalized_word_form` specifies whether normalized word forms are used in TimexTagger's input instead of surface word forms (provided that normalized word forms are available).
By default, `use_normalized_word_form` is enabled, but you can change it upon initialization of `TimexTagger`.

Notes: 
  * in case of ambiguity of `normalized_form`, the first `normalized_form` is picked for the input;
  * if `normalized_form` is `None`, then the surface word form is picked for the input;
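The selection rules above can be restated as a small function. This is a hypothetical restatement for illustration only, not the tagger's internal code; it assumes that the `normalized_form` values of a word's (possibly ambiguous) annotations are available as a list.

```python
def pick_input_form(surface, normalized_forms, use_normalized_word_form=True):
    """Hypothetical restatement of the selection rules listed above.

    `normalized_forms` holds the `normalized_form` values of a word's
    annotations in order, or is None/empty if none are available."""
    if not use_normalized_word_form or not normalized_forms:
        return surface
    first = normalized_forms[0]      # ambiguity: take the first one
    return surface if first is None else first

print(pick_input_form('surface', ['norm1', 'norm2']))   # norm1
print(pick_input_form('surface', [None]))               # surface
```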

---

##  Technical notes about using `TimexTagger`


* **Note 1:** EstNLTK's `TimexTagger` uses a Java-based temporal expression tagger implementation. Before using the tagger, make sure that:
  * Java SE Runtime Environment (version >= 1.8) is installed on the system;
  * `java` is in the [PATH environment variable](https://docs.oracle.com/javase/tutorial/essential/environment/paths.html);

    Source code of the Java-based temporal expression tagger is available [here](https://github.com/soras/Ajavt).


* **Note 2:** Because `TimexTagger` uses Java resources, these resources need to be released after using the tagger. If you want to use `TimexTagger` without `TIMEXES_RESOLVER`, we recommend using `TimexTagger` in a **`with`** statement as a _context manager_ (as in the example below), so that the resources are automatically cleaned up afterwards. After the **`with`** block, the `TimexTagger` instance can no longer be used for tagging texts.

```python
from estnltk import Text
from estnltk.taggers import TimexTagger

# Create new text object and add prerequisite layers
text = Text( ... ).tag_layer('morph_analysis')

# Create new timex tagger and annotate temporal expressions;
# Java resources are released when the with block ends
with TimexTagger() as timexTagger:
    timexTagger.tag( text )
```


* **Note 3:** If you need to create `TimexTagger` outside **`with`** context, you should use the method `close()` after tagging to terminate the process manually:


```python
from estnltk.taggers import TimexTagger

# Create new timex tagger
timexTagger = TimexTagger()
try:
    # Tag texts
    ...
finally:
    # Release resources, even if tagging raised an exception
    timexTagger.close()
```

Because initializing `TimexTagger` takes time and the process acquires memory resources, it is advisable to create `TimexTagger` instances sparingly (typically, one should be enough).

## Creating new analysis rules for `TimexTagger`

`TimexTagger` comes with a default set of rules developed for analysing texts from the news domain. If you need to analyse texts from some other domain, you may need to adapt the rules: provide new patterns for detecting domain-specific expressions, and/or change the normalization strategies.

The rules used by the system are described in an XML file. The file (_reeglid.xml_) itself is available [here](https://github.com/estnltk/estnltk/blob/devel_1.6/estnltk/java/res/reeglid.xml), and its format is described in detail in [this document](https://github.com/soras/Ajavt/blob/master/doc/writingRules.txt). You can make a copy of the rules file, modify the rules (or create your own rules from scratch), and load `TimexTagger` with the new set of rules:


```python
from estnltk.taggers import TimexTagger

# Location of the new rules file
new_rules = 'C:\\My_stuff\\minu_reeglid.xml'

# Create new timex tagger with new rules
timexTagger = TimexTagger(rules_file=new_rules)

# Now, you can use TimexTagger to analyse texts with your own rules
...
        
# Release resources
timexTagger.close()
```

---