Support single word rule composition #110
Comments
Yes, this is a somewhat annoying thing about hacking on Duckling.
I've been thinking about how we could approach this issue. My idea is to add a rule to each language that splits single words so we can reuse the rules that already exist. Assume that we have What I would like to do is add a For example, rule The

```haskell
ruleSingleWordDecomposition :: Rule
ruleSingleWordDecomposition = Rule
  { name = "single-word decomposition"
  , pattern =
    [ regex "(\\S+)" -- we match 'single' words
    ]
  , prod = \tokens -> case tokens of
      (Token RegexMatch (GroupMatch (w:_)):_) -> do
        let rgx = "(hello|world)" :: String -- we need to compute in advance a big regex of possible "subwords"
        -- Text.unpack rather than show: show would add quotes and escape non-ASCII characters
        let ws = Text.pack <$> getAllTextMatches (Text.unpack w =~ rgx)
        Just $ Token RegexMatch $ GroupMatch ws -- How do we return the "subwords" so other rules can pick them up?
      _ -> Nothing
  }
```

Note: we need to split the "single word" in Haskell ( In this case, our

The main issue that I found with this is how to use the output of this rule as input for the remaining rules (they don't pick it up). So, even if we're able to build the required regex and split the "single word" into "subwords", I can't seem to redirect them to other rules. I'm trying to avoid adding special cases or hacking into the internals of Duckling (
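To make the splitting step concrete, here is a minimal, standalone sketch of decomposing a single word into known subwords. It does not use any Duckling types; the vocabulary `subwords` and the function `decompose` are hypothetical names introduced only for illustration, and a real vocabulary would have to be derived from the existing rules. It also does not address the open question above of how to feed the resulting subwords back to other rules.

```haskell
import Data.List (isPrefixOf, sortOn)
import Data.Maybe (catMaybes, listToMaybe)
import Data.Ord (Down (..))

-- | Hypothetical subword vocabulary (an assumption for this sketch).
subwords :: [String]
subwords = ["drei", "und", "zwanzig", "vier", "zig"]

-- | Split a word into vocabulary entries, trying longer entries first and
-- backtracking when a choice leaves a remainder that cannot be split.
-- Returns Nothing if the word is not fully covered by the vocabulary.
decompose :: [String] -> String -> Maybe [String]
decompose _ "" = Just []
decompose vocab word = listToMaybe (catMaybes candidates)
  where
    candidates =
      [ (w :) <$> decompose vocab (drop (length w) word)
      | w <- sortOn (Down . length) vocab
      , not (null w)          -- guard against an empty entry looping forever
      , w `isPrefixOf` word
      ]
```

With this vocabulary, `decompose subwords "dreiundzwanzig"` yields `Just ["drei","und","zwanzig"]`, while a word that is only partly covered, such as `"dreizehn"`, yields `Nothing` and would be left untouched.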
One quick note: the examples mentioned in the first comment aren't permalinks, so I can't be sure what they refer to. I understand that this issue is related to:
@chessai Do you think that this could work?
Regarding @emlautarom1's interesting suggestion, I see two problems that should be considered before going this route.

First, decomposing German number words like "dreiundzwanzig" 'three.and.twenty' (23) can lead to unwanted ambiguities, because "drei und zwanzig" 'three and twenty' is also a grammatical phrase in German. Consider, for instance, the sentence "sie wählten zwischen dreiundzwanzig Kandidaten" 'they made a selection between twenty-three candidates'. The problem is that "sie wählten zwischen drei und zwanzig Kandidaten" 'they made a selection between three candidates and twenty candidates' is also a grammatical sentence. Hence, the suggested decomposition makes the two sentences indistinguishable.

Similarly, if So the upshot is that we lose information by applying
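The loss of information can be shown with a small standalone sketch. It assumes a naive decomposition that splits every whitespace-separated word into known subwords before any rule runs; `vocab`, `splitWord`, and `tokensAfterDecomposition` are hypothetical names for illustration and are not part of Duckling.

```haskell
import Data.List (isPrefixOf)

-- | Hypothetical subword vocabulary (an assumption for this sketch).
vocab :: [String]
vocab = ["drei", "und", "zwanzig"]

-- | Naively peel known subwords off the front of a word; anything that
-- does not start with a known subword is kept unchanged.
splitWord :: String -> [String]
splitWord "" = []
splitWord word = case [w | w <- vocab, w `isPrefixOf` word] of
  (w:_) -> w : splitWord (drop (length w) word)
  []    -> [word]

-- | Token sequence that later rules would see once every word is decomposed.
tokensAfterDecomposition :: String -> [String]
tokensAfterDecomposition = concatMap splitWord . words

-- | True when two inputs can no longer be told apart after decomposition.
indistinguishable :: String -> String -> Bool
indistinguishable a b = tokensAfterDecomposition a == tokensAfterDecomposition b
```

Under these assumptions, `indistinguishable "zwischen dreiundzwanzig Kandidaten" "zwischen drei und zwanzig Kandidaten"` is `True`: the word boundary that separated the two readings is gone. A variant that also recorded whether tokens were originally adjacent would keep that distinction, at the cost of extra bookkeeping.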
Today Duckling doesn't allow creating multiple patterns that match against a single word.
As a result, we can't compose rules/dimensions and have to duplicate the regexes.
This is the case for DE (e.g. here and here), NL (e.g. here), EL (e.g. here), and RU (e.g. here).