Find file
Fetching contributors…
Cannot retrieve contributors at this time
116 lines (98 sloc) 5.15 KB
Parsing Time: Learning to Interpret Time Expressions
We present a probabilistic approach for learning to interpret temporal
phrases given only a corpus of utterances and the times they reference.
While most approaches to the task have used regular expressions and similar
linear pattern interpretation rules, the possibility of phrasal embedding
and modification in time expressions motivates our use of a compositional
grammar of time expressions. This grammar is used to construct a latent
parse which evaluates to the time the phrase would represent, as a logical
parse might evaluate to a concrete entity. In this way, we can employ a
loosely supervised EM-style bootstrapping approach to learn these latent
parses while capturing both syntactic uncertainty and pragmatic ambiguity
in a probabilistic framework. We achieve an accuracy of 72% on an adapted
TempEval-2 task -- comparable to state of the art systems.
Scala 2.9.1
JavaNLP April 9 2012
Most experiments can be run with something in the rc file
interpret: run the interpretation model (last valid run)
interpretAux: run the interpretation model (good parameters)
sutimeInterpret: run SUTime
heideltimeInterpret: run HeidelTime
gutimeInterpret: run GUTime
detect: run the detection model (last valid run)
dist/time.jar: the program
dist/interpretModel.ser.gz: serialized temporal interpretation model
dist/detectModel.ser.gz: serialized temporal detection model (includes
interpretation model)
dist/log.gz: the log from the run producing interpretModel.ser.gz
dist/options: the command line options from the run
Some key classes are explained below. This is not an exhaustive list, but
hopefully can help give a sense for the code.
Data.scala: Utilities for working with tempeval data, or the CoreMaps created
from them
class TimeDataset: holds the CoreMaps corresponding to the raw data
object DataLib: utilities for working with the data
Detect.scala: Code for the detection model
class CRFFeatures: the definition of the features used
class CRFDetector: the class which performs and aggregates the detections
class DetectionDatum: encapsulates the input information for detecting a
time during training
class CoreMapDetectionStore: the dataset used for detection; handles
iterating over the data
class DetectionTask: trains the CRF detection module
Entry.scala: General purpose code and main methods
class Indexing: handles moving between strings and ints for POS and words
object G: general utility fields
object U: general utility functions
class TimeSent: Embodies a sentence sent to the CKY parser
trait DataStore: interface for a block of data (train/dev/test)
*object Interpret: MAIN for running interpretation or detection for
either our system or another system (via a CoreNLP Annotator)
*object Entry: MAIN for training the models; this is the usual entry
point for the program, when not evaluating
Gigaword.scala: Some old code to try to get more training instances from
automatically parsed data
Interpret.scala: Most of the code for the interpretation model lives here
class Grammar: defines the grammar used in CKY parsing
object Grammar: creates the actual temporal grammar
*class TreeTime: MODEL this is our model, which can be serialized and loaded.
It is also a CoreNLP annotator, and does both detection and interpretation.
class InterpretationTask: The code for training the interpretation model All the options for training the model(s). Some of these may be dead
Tempeval2.scala: Code for processing the TempEval data into CoreMaps
Temporal.scala: The temporal representation
trait Temporal: all temporal representations subclass this
trait Range: all ranges subclass this (see paper)
trait Duration: all durations subclass this (see paper)
trait Sequence: all sequences subclass this (see paper). Note that it is
both a Range and a Duration as well
class Time: a point in time
class GroundedRange: the conventional begin-to-end range (e.g., 1998)
class UngroundedRange: a 'floating' range (e.g., today)
class GroundedDuration: the conventional x-milliseconds type of duration
class FuzzyDuration: an approximate duration
class RepeatedRange: the standard range-every-n-milliseconds type of sequence
class CompositeRange: a complex sequence that we can't evaluate analytically
class NoTime: returned when an invalid operation is performed and no such
time actually exists (e.g., Feb 30)
class UnkTime: a time that we can't represent
object Range: utility functions for handling ranges, notably intersecting
*object Time: MAIN for running an interactive shell for evaluating temporal
expressions. This is actually kind of a nifty domain specific language.
object Lex: the definitions of times used in the grammar
- The clases in etc/ are licensed under various licenses -- they're included
here to make compilation easier, but care should be taken when using this
code in any commerical setting to respect their respective licenses.