Parsing non-strings? #3

Closed
ath opened this Issue Sep 25, 2012 · 1 comment

ath commented Sep 25, 2012

Is it possible to use amotoen to parse non-strings?
For example, I may want a tokenizer that splits an input text (natural language) into a seq of tokens (record instances). I would then like to recognize patterns in those token seqs.
Example:
"Date is 27.03.2008. It will cost 15$."
This gets tokenized to

[
 {:value "Date" :type :word},
 {:value "is" :type :word},
 {:value "27" :type :numeric},
 {:value "." :type :char},
 {:value "03" :type :numeric},
 {:value "." :type :char},
 {:value "2008" :type :numeric},
 {:value "." :type :char},
 {:value "It" :type :word},
 {:value "will" :type :word},
 {:value "cost" :type :word},
 {:value "15" :type :numeric},
 {:value "$" :type :char},
 {:value "." :type :char}
]

Now I may wish to replace the "27", ".", "03", ".", "2008" token seq with one single token
{:value #<java.util.Date 2008-03-27> :type :date}.
And the "15", "$" should become a token {:value 15 :type :amount :currency :dollar}.

I would like to write one grammar that describes how dates look, another that describes how amounts of money look, etc.
The input wouldn’t be Strings, but sequences of my (defrecord Token […]).
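To make the idea concrete, here is a rough sketch in plain Clojure of the kind of rewriting I mean. It does not use amotoen at all, and every name in it (`tok`, `match-seq`, `rewrite`, the pattern vars) is made up for illustration. It also stores the date as a plain map instead of a java.util.Date, just to keep the example short.

```clojure
;; Hypothetical sketch: none of these names come from amotoen.
(defrecord Token [value type])

(defn tok [value type] (->Token value type))

;; Predicates over single tokens; a "pattern" is just a vector of them.
(defn numeric? [t] (= :numeric (:type t)))
(defn ch? [c] (fn [t] (and (= :char (:type t)) (= c (:value t)))))

(def date-pattern   [numeric? (ch? ".") numeric? (ch? ".") numeric?])
(def amount-pattern [numeric? (ch? "$")])

(defn match-seq
  "Returns the leading tokens as a vector if each one satisfies the
  predicate at the same position in preds, otherwise nil."
  [preds tokens]
  (let [head (vec (take (count preds) tokens))]
    (when (and (= (count head) (count preds))
               (every? identity (map (fn [p t] (p t)) preds head)))
      head)))

(defn date-token
  "Builds one :date token from the five matched tokens (day . month . year).
  Assumption: the value is a plain map, not a java.util.Date."
  [[d _ m _ y]]
  (tok {:year  (Long/parseLong (:value y))
        :month (Long/parseLong (:value m))
        :day   (Long/parseLong (:value d))}
       :date))

(defn amount-token
  "Builds one :amount token from the two matched tokens (number, $)."
  [[n _]]
  (assoc (tok (Long/parseLong (:value n)) :amount) :currency :dollar))

;; Each rule pairs a pattern with the function that builds the new token.
(def rules
  [[date-pattern   date-token]
   [amount-pattern amount-token]])

(defn rewrite
  "Walks the tokens left to right. At each position the first matching rule
  replaces its matched tokens with a single token; otherwise the current
  token is kept unchanged."
  [tokens]
  (loop [remaining (seq tokens), out []]
    (if-not remaining
      out
      (if-let [[matched build] (some (fn [[pattern build]]
                                       (when-let [m (match-seq pattern remaining)]
                                         [m build]))
                                     rules)]
        (recur (seq (drop (count matched) remaining))
               (conj out (build matched)))
        (recur (next remaining)
               (conj out (first remaining)))))))

;; The tokenized example sentence from above.
(def input
  [(tok "Date" :word) (tok "is" :word)
   (tok "27" :numeric) (tok "." :char) (tok "03" :numeric) (tok "." :char)
   (tok "2008" :numeric) (tok "." :char)
   (tok "It" :word) (tok "will" :word) (tok "cost" :word)
   (tok "15" :numeric) (tok "$" :char) (tok "." :char)])

(def output (rewrite input))
```

With this, `output` holds 9 tokens instead of 14: the five date tokens collapse into one `:date` token and the "15", "$" pair into one `:amount` token. What I am really after is a way to declare `date-pattern` and `amount-pattern` as grammar rules, with choice and repetition, instead of hand-writing the matcher.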

Can amotoen currently do that? Would it be difficult to extend it?

Owner

richard-lyman commented Nov 2, 2016

The transition to a new implementation is complete. The new implementation can support the idea requested here, and more.
