Parsing non-strings? #3

ath opened this Issue Sep 25, 2012 · 1 comment


ath commented Sep 25, 2012

Is it possible to use amotoen to parse non-strings?
For example, I may want a tokenizer that splits an input text (natural language) into a seq of tokens (record instances). I would then like to recognize patterns in those token seqs.
"Date is 27.03.2008. It will cost 15$."
This gets tokenized to

 {:value "Date" :type :word},
 {:value "is" :type :word},
 {:value "27" :type :numeric},
 {:value "." :type :char},
 {:value "03" :type :numeric},
 {:value "." :type :char},
 {:value "2008" :type :numeric},
 {:value "." :type :char},
 {:value "It" :type :word},
 {:value "will" :type :word},
 {:value "cost" :type :word},
 {:value "15" :type :numeric},
 {:value "$" :type :char},
 {:value "." :type :char}

Now I may wish to replace the "27", ".", "03", ".", "2008" token seq with one single token
{:value #<java.util.Date 2008-03-27> :type :date}.
And the "15", "$" should become a token {:value 15 :type :amount :currency :dollar}.

I would like to write one grammar that describes what dates look like, another that describes what amounts of money look like, and so on.
The input wouldn’t be Strings, but sequences of my (defrecord Token […]).

Can amotoen currently do that? Would it be difficult to extend it?
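
To make the request concrete, the rewriting described above could be sketched in plain Clojure roughly as follows. This does not use amotoen's API at all: every name here (`type-is`, `value-is`, `match-prefix`, `rewrite-tokens`, and so on) is hypothetical, tokens are plain maps rather than records, and the date is kept as an ISO-style string instead of a `java.util.Date` to keep the example short.

```clojure
;; Hypothetical sketch: none of these helpers come from amotoen.

(defn type-is
  "Predicate matching a token whose :type equals t."
  [t]
  (fn [tok] (= t (:type tok))))

(defn value-is
  "Predicate matching a token whose :value equals v."
  [v]
  (fn [tok] (= v (:value tok))))

(defn match-prefix
  "Returns the leading tokens if they satisfy preds one-to-one, else nil."
  [preds tokens]
  (let [n    (count preds)
        head (take n tokens)]
    (when (and (= n (count head))
               (every? identity (map (fn [p tok] (p tok)) preds head)))
      head)))

;; A 'grammar' here is just a vector of predicates, one per token.
(def date-pattern
  [(type-is :numeric) (value-is ".") (type-is :numeric) (value-is ".") (type-is :numeric)])

(def amount-pattern
  [(type-is :numeric) (value-is "$")])

(defn ->date
  "Builds a date token from day, '.', month, '.', year tokens (ISO string, by assumption)."
  [[d _ m _ y]]
  {:value (str (:value y) "-" (:value m) "-" (:value d)) :type :date})

(defn ->amount
  "Builds an amount token from a numeric token followed by '$'."
  [[n _]]
  {:value (Long/parseLong (:value n)) :type :amount :currency :dollar})

;; Rules are tried in order; the first matching pattern wins.
(def rules [[date-pattern ->date] [amount-pattern ->amount]])

(defn rewrite-tokens
  "Walks the token seq left to right, replacing each matched run with one token."
  [tokens]
  (loop [remaining (seq tokens)
         out       []]
    (if (nil? remaining)
      out
      (if-let [[matched build] (some (fn [[preds build]]
                                       (when-let [m (match-prefix preds remaining)]
                                         [m build]))
                                     rules)]
        (recur (seq (drop (count matched) remaining)) (conj out (build matched)))
        (recur (next remaining) (conj out (first remaining)))))))
```

Calling `(rewrite-tokens tokens)` on the token seq above would collapse the five date tokens into one `:date` token and the `"15"`, `"$"` pair into one `:amount` token, leaving every other token untouched. A real grammar would also need alternation, repetition, and nesting, which is exactly what I am hoping amotoen can provide.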


richard-lyman commented Nov 2, 2016

Completed the transition to a new implementation. The new implementation can support the idea requested here and more.

