Parsing non-strings? #3

Closed
ath opened this issue Sep 25, 2012 · 1 comment

Comments

ath commented Sep 25, 2012

Is it possible to use amotoen to parse non-strings?
For example, I may have a tokenizer that splits an input text (natural language) into a seq of tokens (record instances), and I would then like to recognize patterns in those token seqs.
Example:
"Date is 27.03.2008. It will cost 15$."
This gets tokenized to

[
 {:value "Date" :type :word},
 {:value "is" :type :word},
 {:value "27" :type :numeric},
 {:value "." :type :char},
 {:value "03" :type :numeric},
 {:value "." :type :char},
 {:value "2008" :type :numeric},
 {:value "." :type :char},
 {:value "It" :type :word},
 {:value "will" :type :word},
 {:value "cost" :type :word},
 {:value "15" :type :numeric},
 {:value "$" :type :char},
 {:value "." :type :char}
]
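
A tokenizer producing this shape could be sketched roughly as follows. This is only an illustration: `tokenize` is a hypothetical helper, not part of amotoen, and it assumes plain maps instead of `Token` records and a simple split into ASCII letter runs, digit runs, and single other characters.

```clojure
;; Hypothetical tokenizer sketch (not amotoen API).
;; Assumption: words are runs of ASCII letters, numerics are runs of digits,
;; and every other non-whitespace character becomes its own :char token.
(defn tokenize
  "Splits text into a seq of token maps, dropping whitespace."
  [text]
  (for [s (re-seq #"[A-Za-z]+|[0-9]+|\S" text)]
    {:value s
     :type  (cond
              (re-matches #"[A-Za-z]+" s) :word
              (re-matches #"[0-9]+" s)    :numeric
              :else                       :char)}))

;; Usage: (tokenize "Date is 27.03.2008. It will cost 15$.")
```

Real natural-language input would need a richer notion of words (Unicode letters, apostrophes, hyphens), which this sketch leaves out.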

Now I may wish to replace the "27", ".", "03", ".", "2008" token seq with a single token
{:value #<java.util.Date 2008-03-27> :type :date}.
And the "15", "$" seq should become the token {:value 15 :type :amount :currency :dollar}.

I would like to write one grammar that describes what dates look like, another that describes amounts of money, and so on.
The input would not be Strings, but sequences of instances of my (defrecord Token […]).
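
To make the idea concrete, here is a minimal hand-rolled sketch of such rules over token seqs. It does not use amotoen, and every name in it (`numeric?`, `char-token`, `rules`, `match-rule`, `merge-tokens`) is hypothetical. It assumes tokens are plain maps with `:value` and `:type`, reads dates as day.month.year, and uses `java.time.LocalDate` rather than `java.util.Date`.

```clojure
;; Hypothetical sketch, not amotoen API: each rule pairs a vector of
;; per-token predicates with a function that builds the replacement token.
(defn numeric? [t] (= :numeric (:type t)))

(defn char-token
  "Returns a predicate matching a :char token whose value is c."
  [c]
  (fn [t] (and (= :char (:type t)) (= c (:value t)))))

(def rules
  [;; day "." month "." year (assumed order) becomes one :date token
   {:pattern [numeric? (char-token ".") numeric? (char-token ".") numeric?]
    :build   (fn [[d _ m _ y]]
               {:value (java.time.LocalDate/of (Long/parseLong (:value y))
                                               (Long/parseLong (:value m))
                                               (Long/parseLong (:value d)))
                :type  :date})}
   ;; number followed by "$" becomes one :amount token
   {:pattern [numeric? (char-token "$")]
    :build   (fn [[n _]]
               {:value (Long/parseLong (:value n))
                :type :amount
                :currency :dollar})}])

(defn match-rule
  "Returns [replacement remaining-tokens] when the rule's pattern matches
  the front of tokens, otherwise nil."
  [{:keys [pattern build]} tokens]
  (let [n    (count pattern)
        head (take n tokens)]
    (when (and (= n (count head))
               (every? identity (map (fn [p t] (p t)) pattern head)))
      [(build head) (drop n tokens)])))

(defn merge-tokens
  "Walks tokens left to right, replacing the first matching rule at each
  position and passing unmatched tokens through unchanged."
  [rules tokens]
  (loop [tokens (seq tokens), out []]
    (if (empty? tokens)
      out
      (if-let [[tok more] (some #(match-rule % tokens) rules)]
        (recur more (conj out tok))
        (recur (rest tokens) (conj out (first tokens)))))))

;; Usage: (merge-tokens rules token-seq)
```

Flat predicate vectors like these cover fixed-length patterns only. Alternation, repetition, and nested rules are where a real grammar library would take over.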

Can amotoen currently do that? Would it be difficult to extend it?

@richard-lyman
Owner

Completed the transition to a new implementation, which can support the idea requested here and more.
