paramitamirza/IndoTimex

IndoTimex

A temporal expression extraction system for the Indonesian language, covering both temporal expression recognition and normalization, written in Python.

###Requirements

###Usage

! The input file(s) must be in the TimeML annotation format !

python TimexExtraction.py dir_name [options]        or
python TimexExtraction.py file_name [options]

options: -o output_dir_name/file_name (default: dir_path/dir_name_Timex/ for directory and file_path/file_name_timex.tml for file)
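
The default output locations described above can be sketched as a small helper. This is only an illustration of the documented naming rule, not code taken from TimexExtraction.py; the function name `default_output_path` and the `is_dir` parameter are hypothetical.

```python
import os


def default_output_path(input_path, is_dir):
    """Return the documented default output path (a sketch, not the tool's own code)."""
    # Drop any trailing separator so "news/" and "news" behave the same.
    clean = input_path.rstrip("/\\")
    parent, name = os.path.split(clean)
    if is_dir:
        # Documented rule: dir_path/dir_name_Timex/
        return os.path.join(parent, name + "_Timex", "")
    # Documented rule: file_path/file_name_timex.tml
    stem, _ext = os.path.splitext(name)
    return os.path.join(parent, stem + "_timex.tml")


print(default_output_path("data/news", is_dir=True))
print(default_output_path("data/doc1.tml", is_dir=False))
```

Passing `-o` overrides these defaults, so the helper is only useful for locating output when the option is omitted.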

The output file(s) will be a TimeML document annotated with temporal expressions (TIMEX3 tags).
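
As a rough illustration of what can be done with the annotated output, the sketch below lists the TIMEX3 tags in a TimeML string using Python's standard `xml.etree.ElementTree`. The sample document is made up for this example and is not actual IndoTimex output; `tid`, `type` and `value` are standard TIMEX3 attributes in TimeML, and `list_timexes` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, written by hand for illustration only.
SAMPLE = """<TimeML>
<TEXT>Rapat itu diadakan pada <TIMEX3 tid="t1" type="DATE" value="2015-05-19">19 Mei 2015</TIMEX3>.</TEXT>
</TimeML>"""


def list_timexes(timeml_string):
    """Return (tid, type, value, text) for every TIMEX3 element in the document."""
    root = ET.fromstring(timeml_string)
    return [
        (t.get("tid"), t.get("type"), t.get("value"), "".join(t.itertext()))
        for t in root.iter("TIMEX3")
    ]


for row in list_timexes(SAMPLE):
    print(row)
```

For a file on disk, `ET.parse(path).getroot()` could replace `ET.fromstring`, assuming the output is well-formed XML.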

#####To convert TimeML file(s) to HTML for better viewing

python ConvertToHTML.py dir_name [options]        or
python ConvertToHTML.py file_name [options]

options: -o output_dir_name/file_name (default: dir_path/dir_name_HTML/ for directory and file_path/file_name.html for file)

###Modules

IndoTimex contains two main modules:

  1. Timex recognition, a finite state transducer (FST) to recognize temporal expressions and their types (based on the TimeML standard, i.e. DATE, DURATION, TIME and SET). The complete FST can be seen in lib/fst/timex.pdf (minimized and drawn with OpenFST).
  2. Timex normalization, an extension for the Indonesian language of TimeNorm, a library that normalizes the values of temporal expressions (based on the ISO 8601 standard) using synchronous context-free grammars. To run the timex normalizer: `java -jar ./lib/timenorm-id-0.9.2-jar-with-dependencies.jar ./lib/id.grammar`
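
To make the recognize-then-normalize split concrete, here is a deliberately tiny sketch in plain Python. It is not the FST in lib/fst/ and not the TimeNorm grammar: it is a hand-written three-state machine (day, month, year) that handles only one DATE pattern, "day month year" with Indonesian month names, and emits an ISO 8601 value. All names (`MONTHS`, `find_dates`) are hypothetical, and the day number is not validated.

```python
import re

# Indonesian month names mapped to month numbers.
MONTHS = {
    "januari": 1, "februari": 2, "maret": 3, "april": 4,
    "mei": 5, "juni": 6, "juli": 7, "agustus": 8,
    "september": 9, "oktober": 10, "november": 11, "desember": 12,
}


def find_dates(text):
    """Return (expression, type, ISO 8601 value) for each 'day month year' match."""
    found = []
    state, parts = "START", []
    for tok in re.findall(r"\w+", text):
        if state == "DAY" and tok.lower() in MONTHS:
            # Day already seen, now a month name: advance.
            state, parts = "MONTH", parts + [tok]
        elif state == "MONTH" and re.fullmatch(r"\d{4}", tok):
            # Accepting transition: recognition done, normalize the value.
            day, month, year = int(parts[0]), MONTHS[parts[1].lower()], int(tok)
            value = f"{year:04d}-{month:02d}-{day:02d}"
            found.append((" ".join(parts + [tok]), "DATE", value))
            state, parts = "START", []
        elif re.fullmatch(r"\d{1,2}", tok):
            # A one- or two-digit number may start a new date.
            state, parts = "DAY", [tok]
        else:
            state, parts = "START", []
    return found


print(find_dates("Proklamasi dibacakan pada 17 Agustus 1945."))
```

The real system covers all four TimeML types (DATE, DURATION, TIME and SET) and far more patterns; the sketch only shows how a recognized span and its type can be handed to a separate normalization step.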

#####Publication

Paramita Mirza. 2015. Recognizing and Normalizing Temporal Expressions in Indonesian Texts. (to appear) In Proceedings of the Conference of the Pacific Association for Computational Linguistics (PACLING 2015), Bali, Indonesia, May. [pdf]

#####Dataset

The dataset used for the development and evaluation of the system is available in dataset/. It comprises 75 news articles taken from www.kompas.com.

! When referring to this resource, please cite the paper in the Publication section. !

###Demo

The online demo is available at http://paramitamirza.ml/indotimex/.

###Contact

For more information, please contact Paramita Mirza (paramita@fbk.eu).
