# DateTagger

A class that tags dates in a **Text** object.

## Usage

In [1]:
from date_tagger import DateTagger
from estnltk import *
from pprint import pprint
from IPython.display import HTML, FileLink
from estnltk.names import TEXT, START, END

In [2]:
tagger = DateTagger(return_layer=False, layer_name='date')


### Example

In [3]:
sent = Text('''07.07.2011 14:25 - KOLK, REIN - D04946 - E170 - kardioloogia. 
Tuleb 28.05., enne KTG.
pt.-l 2007a.-l diagnoositud sügatõbi:',
'11.09.13. tehtud S.Rhesonatiivi 1250TÜ i/m',
'Eelmine kord 2009 ja diagnoosika jäi reaktiivne artropaatia',
            'haava revideerimine od kl 21.20',
' 17.09.2013.a. kell 06:09 sünnib elus ajaline T 3250/50.',
'Kontrolli 20.04.2013 kell 11.00 I-korrus, 
'Lõikus 4.1.2013.',
'Kontrollile- 09.2011'
"05.09.2012 tehtud SKG
1.09 angiograafia leid 
02.09 kell 09.15 taastus siinusrütm''')

In [4]:
tagger.tag(sent)

{'date': [{'end': 16,
   'example': '21.03.2015 15:30:45',
   'extracted_values': {'datetime': datetime.datetime(2011, 7, 7, 14, 25)},
   'groups': {'DAY': '07',
    'MONTH': '07',
    'YEAR': '2011',
    'hour': '14',
    'minute': '25',
    'second': None},
   'probability': '0.9',
   'regex': '(^|[^0-9])(?P<DAY>(0?[1-9]|[12][0-9]|3[01]))\\.\\s*(?P<MONTH>(0?[1-9]|1[0-2]))\\.\\s*(?P<YEAR>((19[0-9]{2})|(20[0-9]{2})|([0-9]{2})))\\s*(?P<hour>[0-2][0-9])[:](?P<minute>[0-5][0-9])(:(?P<second>[0-5][0-9]))?',
   'start': 0,
   'type': 'date_time'},
  {'end': 98,
   'example': '1998a',
   'extracted_values': {},
   'groups': {'LONGYEAR': '2007'},
   'probability': '0.8',
   'regex': '(^|[^0-9])(?P<LONGYEAR>((19[0-9]{2})|(20[0-9]{2})))\\s*a',
   'start': 92,
   'type': 'partial_date'},
  {'end': 137,
   'example': '12.01.98',
   'extracted_values': {'date': datetime.date(2013, 9, 11)},
   'groups': {'DAY': '11', 'MONTH': '09', 'YEAR': '13'},
   'probability': '0.8',
   'regex': '(^|[^0-9])(?P<
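
Each span in the output above records the regular expression that matched and the named groups it captured. The sketch below applies the first span's pattern with Python's standard `re` module to show how the 'groups' and 'extracted_values' fields relate. How DateTagger does this internally is not shown here; the conversion step, including the default of 0 for missing seconds, is an assumption made for illustration.

```python
import datetime
import re

# Pattern copied from the 'regex' field of the first span in the output above.
DATE_TIME = re.compile(
    r'(^|[^0-9])(?P<DAY>(0?[1-9]|[12][0-9]|3[01]))\.\s*'
    r'(?P<MONTH>(0?[1-9]|1[0-2]))\.\s*'
    r'(?P<YEAR>((19[0-9]{2})|(20[0-9]{2})|([0-9]{2})))\s*'
    r'(?P<hour>[0-2][0-9])[:](?P<minute>[0-5][0-9])(:(?P<second>[0-5][0-9]))?'
)

match = DATE_TIME.search('07.07.2011 14:25 - KOLK, REIN')
groups = match.groupdict()  # named groups only; unmatched optional groups are None

# Hypothetical conversion step (an assumption): missing seconds default to 0.
value = datetime.datetime(
    int(groups['YEAR']), int(groups['MONTH']), int(groups['DAY']),
    int(groups['hour']), int(groups['minute']), int(groups['second'] or 0),
)

print(groups)
print(value)
```

The optional seconds group did not take part in the match, so it comes back as `None`, as in the 'groups' field of the first span.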

The tagger recognises four types of dates: date_time, date, time and partial_date (only a year, a year and month, or a month and day). Examples of each type are highlighted below; the colour used for each type is defined in the list 'rules'. For the first three types, the attribute 'extracted_values' contains the detected datetime object (a `datetime.datetime` for date_time and a `datetime.date` for date in the output above). For partial_date it is left empty, as the second span in the output above shows.
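As a rough illustration of working with 'extracted_values', the sketch below separates spans that carry a resolved value from those that do not. The `spans` list is a trimmed, hand-copied version of the first three entries of the output above, and the helper `resolved_values` is a hypothetical name, not part of DateTagger.

```python
import datetime

# Trimmed copies of the first three spans from the output above
# (only the fields used here are kept).
spans = [
    {'end': 16, 'type': 'date_time',
     'extracted_values': {'datetime': datetime.datetime(2011, 7, 7, 14, 25)}},
    {'end': 98, 'type': 'partial_date', 'extracted_values': {}},
    {'end': 137, 'type': 'date',
     'extracted_values': {'date': datetime.date(2013, 9, 11)}},
]


def resolved_values(spans):
    """Yield (type, value) for every span that carries an extracted value."""
    for span in spans:
        for value in span['extracted_values'].values():
            yield span['type'], value


resolved = list(resolved_values(spans))
# Spans with an empty 'extracted_values' dict could not be resolved to a full date.
unresolved = [span for span in spans if not span['extracted_values']]

print(resolved)
print([span['type'] for span in unresolved])
```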

In [5]:
# Background colour for each date type
rules = [
    ('date_time', 'pink'),
    ('date', 'lightgreen'),
    ('partial_date', 'yellow'),
    ('time', 'lightblue'),
]


def extract(text):
    # Use the span's type as the value that PrettyPrinter matches against 'rules'
    return ({TEXT: t['type'],
             START: t[START],
             END: t[END]}
            for t in text['date'])


pp = PrettyPrinter(background=extract, background_value=rules)
html = pp.render(sent, True)
HTML(html)
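
To make the role of `extract` more concrete, the sketch below runs the same generator on a plain dictionary instead of a tagged Text object and pairs each span with its colour from 'rules'. It assumes that the `estnltk.names` constants TEXT, START and END are the strings 'text', 'start' and 'end'; the `layer` dictionary and the `highlights` list are illustrative stand-ins, and nothing here reflects how PrettyPrinter renders the HTML internally.

```python
# Assumed values of the estnltk.names constants (an assumption, not verified here).
TEXT, START, END = 'text', 'start', 'end'

rules = [
    ('date_time', 'pink'),
    ('date', 'lightgreen'),
    ('partial_date', 'yellow'),
    ('time', 'lightblue'),
]

# Stand-in for the tagged text: the first two spans from the tagger output above.
layer = {'date': [
    {'start': 0, 'end': 16, 'type': 'date_time'},
    {'start': 92, 'end': 98, 'type': 'partial_date'},
]}


def extract(text):
    # One dict per span: the span type plus its character offsets.
    return ({TEXT: t['type'], START: t[START], END: t[END]}
            for t in text['date'])


colours = dict(rules)
highlights = [(h[START], h[END], colours[h[TEXT]]) for h in extract(layer)]
print(highlights)
```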