Skip to content
master
Go to file
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
lib
 
 
 
 

README.md

IndoTimex

Temporal expression extraction system, including temporal expression recognition and normalization, for Indonesian language, written in Python.

###Requirements

###Usage ! The input file(s) must be in the TimeML annotation format !

python python TimexExtraction.py dir_name [options]        or
python python TimexExtraction.py file_name [options]

options: -o output_dir_name/file_name (default: dir_path/dir_name_Timex/ for directory and file_path/file_name_timex.tml for file)

The output file(s) will be a TimeML document annotated with temporal expressions (TIMEX3 tags).

#####To convert TimeML file(s) to HTML for better viewing

python python ConvertToHTML.py dir_name [options]        or
python python ConvertToHTML.py file_name [options]

options: -o output_dir_name/file_name (default: dir_path/dir_name_HTML/ for directory and file_path/file_name.html for file)

###Modules IndoTimex contains two main modules:

  1. Timex recognition, a finite state transducer (FST) to recognize temporal expressions and their types (based on the TimeML standard, i.e. DATE, DURATION, TIME and SET). The complete FST can be seen in lib/fst/timex.pdf (minimized and drawn with OpenFST).
  2. Timex normalization, an extension of TimeNorm, a library for normalizing the values of temporal expressions (based on the ISO 8601 standard) using synchronous context free grammars, for Indonesian language. To run the timex normalizer: java -jar ./lib/timenorm-id-0.9.2-jar-with-dependencies.jar ./lib/id.grammar.

#####Publication Paramita Mirza. 2015. Recognizing and Normalizing Temporal Expressions in Indonesian Texts. (to appear) In Proceedings of the Conference of the Pacific Association for Computational Linguistics (PACLING 2015), Bali, Indonesia, May. [pdf]

#####Dataset The dataset for development and evaluation phases of the system is available in dataset/, comprising 75 news articles taken from www.kompas.com.

! Whenever making reference to this resource please cite the paper in the Publication section. !

###Demo The online demo is available at http://paramitamirza.ml/indotimex/.

###Contact For more information please contact Paramita Mirza (paramita@fbk.eu).

About

Time expression extraction module, including time expression recognition and normalization, for Indonesian language.

Resources

Releases

No releases published

Packages

No packages published

Languages

You can’t perform that action at this time.