Custom Rule (e.g. Lemmatization) #43

Closed
dbose opened this Issue Nov 9, 2013 · 4 comments


dbose commented Nov 9, 2013

First of all, awesome work! I think Citrus is very close to pyparsing.

Any idea how I can implement a custom parsing Rule, let's say for lemmatization?

-Cheers
Deb

mjackson commented Dec 9, 2013

@dbose Thanks! I've never personally done any work with lemmatization. Do you think PEGs would be a good fit for it?

P.S. Closing this since it isn't really an issue.

mjackson closed this Dec 9, 2013


dbose commented Dec 9, 2013

Definitely, PEGs are not good for such NLP processes.

I think I phrased it incorrectly. What I meant was: is there a way to match only a lemmatized word? I handled it in the following way (it's a hack that only takes care of adjective forms):

rule pre_modifier_token
  modifier ('d' | 'ed' | 'ped')*
end
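The idea behind the hack can be sketched in plain Ruby. This is only an illustration of the suffix-stripping logic, not Citrus code; `SUFFIXES` and `lemma_of` are hypothetical names introduced here, and a real lemmatizer would need a dictionary rather than bare suffix rules:

```ruby
# Minimal sketch of the suffix-stripping idea behind the rule above.
# Checks the longest suffix first, so 'ped' wins over 'd'.
SUFFIXES = %w[ped ed d].freeze

def lemma_of(word)
  SUFFIXES.each do |suffix|
    if word.end_with?(suffix) && word.length > suffix.length
      return word[0...-suffix.length]
    end
  end
  word # no known suffix: assume the word is already a lemma
end
```

So `lemma_of("chopped")` strips "ped" and yields "chop", while a word without a listed suffix passes through unchanged. The obvious weakness (and why it's a hack) is that pure suffix stripping mangles words like "diced".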

For example in pyparsing, I can hook in custom function into the PEG (https://github.com/JoshRosen/cmps140_creative_cooking_assistant/blob/master/nlu/ingredient_line_grammar.py; LemmatizedWord is a custom function)

Another way would be to build Lemmatization and other IE capabilities on top of the PEG. But it would have been excellent to hook custom functions into the stream.


dbose commented Dec 9, 2013

By the way, I'm using Citrus to extract data from recipes, and it's looking great so far. Since the domain vocabulary of cooking is rather limited, an ML-based extractor would have been overkill. Thanks again for your work.

I would love to contribute to this (bringing it closer to pyparsing et al.) and raise a pull request with my thoughts on what I meant by custom functions.

Cheers
Deb


mjackson commented Dec 9, 2013

Ah, thanks for the explanation.

I think you'll probably want to look into subclassing Citrus::Nonterminal to achieve what you're describing. A non-terminal can run custom logic that describes the matching behavior of other rules. In your case, it sounds like you could create a non-terminal that checks for lemmatization and matches (or doesn't) based on that.

In any case, I'd definitely be interested in seeing a PR that implements this.
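One shape such a custom rule could take, sketched in plain Ruby: an object that consults a lemmatizer before deciding whether input matches. This is only the concept; `LemmaRule`, `match?`, and `KNOWN_LEMMAS` are hypothetical names invented for this sketch, and Citrus::Nonterminal's real subclassing interface will differ:

```ruby
# Plain-Ruby sketch: a rule object that lemmatizes a word and
# succeeds only if the lemma belongs to a known vocabulary.
class LemmaRule
  KNOWN_LEMMAS = %w[chop dice slice].freeze

  # Naive suffix stripping stands in for a real lemmatizer here.
  def lemmatize(word)
    %w[ped ed d].each do |s|
      return word[0...-s.length] if word.end_with?(s) && word.length > s.length
    end
    word
  end

  # Match succeeds only if the word's lemma is in the known set.
  def match?(word)
    KNOWN_LEMMAS.include?(lemmatize(word))
  end
end
```

With this shape, `LemmaRule.new.match?("chopped")` succeeds because the lemma "chop" is in the vocabulary; a real Citrus rule would additionally need to consume input and produce a match object rather than return a boolean.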
