Custom Slot "Rest of the Sentence" #845
Comments
Hello @ea167,
For instance, let's say you want to create a simple Reminder intent:
---
type: intent
name: Reminder
utterances:
- Remind me to [action](go to the gym)
- remind me to [action](take an umbrella)
- remind me to [action](walk the dog tonight)
- could you remind me to [action](call my mom)
- please remind me to [action](go to my appointment)
---
type: entity
name: action
automatically_extensible: yes
Because the utterances are well structured around the "
This is an extreme case where the pattern of the intent is strong. In general, for a fixed dataset size, the more structured the intent is, the better the parsing accuracy will be. To reach good parsing accuracy for intents without a strong structure, you will have to provide more utterances.
I hope this helps.
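When more utterances are needed, the intent block can be generated instead of typed by hand. The following is only a minimal sketch using the standard library: the helper name `build_intent_yaml` and the prefix and action lists are assumptions made for illustration, and only the YAML keys and the `[action](...)` annotation syntax are taken from the dataset above.

```python
from itertools import product


def build_intent_yaml(intent_name, slot_name, prefixes, values):
    """Return a YAML intent block pairing every prefix with every slot value.

    Hypothetical helper: it only mirrors the layout of the dataset shown above.
    """
    lines = ["---", "type: intent", f"name: {intent_name}", "utterances:"]
    for prefix, value in product(prefixes, values):
        # Annotate the trailing chunk of the utterance as the slot value
        lines.append(f"- {prefix} [{slot_name}]({value})")
    return "\n".join(lines) + "\n"


# Illustrative inputs (assumptions, not part of the original dataset file)
prefixes = ["remind me to", "could you remind me to", "please remind me to"]
actions = ["go to the gym", "call my mom"]

intent_yaml = build_intent_yaml("Reminder", "action", prefixes, actions)
print(intent_yaml)
```

The generated text can be saved next to the entity block and used like the hand-written dataset. Real utterances with varied wording are still preferable to purely combinatorial ones.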
Thank you very much Adrien @adrienball! I'll try it right now. Would a configuration with
You don't need to change the default
Thank you Adrien! One surprising element in this particular case is the need to provide a value, otherwise we get the exception:
The "Reminder" is a great use case; another would be a search (to retrieve an element about XXX, etc.).
An interesting point that I found out during my tests: commands are robust to a lot of variations in English, except the -ing form of verbs. When you input "searching..." instead of "search...", the intent is not recognized.
Thanks again for this great NLU engine!
Thanks for the feedback :)
{
"unit_name": "featurizer",
"pvalue_threshold": 0.4,
"added_cooccurrence_feature_ratio": 0.0,
"tfidf_vectorizer_config": {
"unit_name": "tfidf_vectorizer",
"use_stemming": False,  # Enables or disables stemming for intent classification
"word_clusters_name": None
},
"cooccurrence_vectorizer_config": {
"unit_name": "cooccurrence_vectorizer",
"window_size": None,
"filter_stop_words": True,
"unknown_words_replacement_string": None,
"keep_order": True
}
}
With this configuration, words are not stemmed in the intent classification step. We made this choice for the default config because English typically has few variations compared to other languages, and we found that disabling stemming worked better overall. If you find that stemming is relevant for your use case, here's a simple way to provide a custom configuration with stemming enabled:
from copy import deepcopy
from snips_nlu import SnipsNLUEngine
from snips_nlu.default_configs.config_en import CONFIG as CONFIG_EN

# Make a deepcopy to avoid changing the default config globally
config_with_stemming = deepcopy(CONFIG_EN)
# Set the use_stemming flag rather than overwriting the whole vectorizer config
config_with_stemming["intent_parsers_configs"][1]["intent_classifier_config"][
    "featurizer_config"]["tfidf_vectorizer_config"]["use_stemming"] = True
nlu_engine = SnipsNLUEngine(config=config_with_stemming)
You can also create your own
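The role of the deepcopy can be checked without the library installed. The sketch below is a minimal illustration under stated assumptions: `DEFAULT_CONFIG` is a reduced stand-in that only mimics the nesting used in the snippet above (it is not the real `CONFIG_EN`), and `with_stemming` is a hypothetical helper name.

```python
from copy import deepcopy

# Stand-in for the library default config: hypothetical, reduced to the keys used here
DEFAULT_CONFIG = {
    "intent_parsers_configs": [
        {"unit_name": "first_parser"},
        {
            "unit_name": "second_parser",
            "intent_classifier_config": {
                "featurizer_config": {
                    "tfidf_vectorizer_config": {
                        "use_stemming": False,
                        "word_clusters_name": None,
                    }
                }
            },
        },
    ]
}


def tfidf_config(config, parser_index=1):
    """Return the nested TF-IDF vectorizer config of the given parser."""
    parser = config["intent_parsers_configs"][parser_index]
    classifier = parser["intent_classifier_config"]
    return classifier["featurizer_config"]["tfidf_vectorizer_config"]


def with_stemming(config, parser_index=1):
    """Return a copy of `config` with stemming enabled, leaving `config` untouched."""
    custom = deepcopy(config)
    # Flip only the flag; the other vectorizer settings are preserved
    tfidf_config(custom, parser_index)["use_stemming"] = True
    return custom


custom_config = with_stemming(DEFAULT_CONFIG)
print(tfidf_config(custom_config)["use_stemming"])   # flag in the copy
print(tfidf_config(DEFAULT_CONFIG)["use_stemming"])  # flag in the default
```

Because the nested dictionaries are shared by reference, a shallow copy would also modify the default. That is why the copy is deep and why only the `use_stemming` key is assigned.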
Wonderful, thank you Adrien!
Hello,
First of all, congrats for this great NLU engine!
I was wondering if there is a way to configure a custom slot so that it captures everything that is unmatched at the end of a sentence.
Thank you!