Custom Slot "Rest of the Sentence" #845

Closed
ea167 opened this issue Aug 27, 2019 · 6 comments

@ea167

ea167 commented Aug 27, 2019

Hello,

First of all, congrats on this great NLU engine!

I was wondering if there is a way to configure a custom slot so that it captures everything that is unmatched at the end of a sentence.

Thank you!

@adrienball adrienball self-assigned this Aug 27, 2019
@adrienball
Contributor

Hello @ea167 ,
There is no particular option to do that, but if your intent is structured enough, this should work automatically, provided you have marked the entity as automatically_extensible.

For instance, let's say you want to create a simple Reminder intent with a single action slot that can take any value:

---
type: intent
name: Reminder
utterances:
- Remind me to [action](go to the gym)
- remind me to [action](take an umbrella)
- remind me to [action](walk the dog tonight)
- could you remind me to [action](call my mom)
- please remind me to [action](go to my appointment)

---
type: entity
name: action
automatically_extensible: yes

Because the utterances are well structured around the "remind me to" chunk, the NLU will be able to generalize very well to any value.

This is an extreme case where the pattern of the intent is strong. In general, for a fixed dataset size, the more structured the intent is, the better the parsing accuracy will be. To reach good parsing accuracy for intents without a strong structure, you will have to provide more utterances.
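To try this out, here is a minimal sketch, assuming the two YAML documents above are saved in a file (the path is up to you) and that snips-nlu and its English resources are installed. The helper names `extract_slot` and `train_reminder_engine` are made up for illustration, and `sample_result` is a hand-written dictionary in the shape the engine is expected to return, not real engine output.

```python
def extract_slot(parse_result, slot_name):
    """Return the raw text captured for `slot_name`, or None if absent."""
    for slot in parse_result.get("slots") or []:
        if slot.get("slotName") == slot_name:
            return slot.get("rawValue")
    return None


def train_reminder_engine(yaml_path):
    """Fit an engine on the Reminder YAML (needs snips-nlu + English resources)."""
    # Imported lazily so extract_slot can be used without snips-nlu installed.
    from snips_nlu import SnipsNLUEngine
    from snips_nlu.dataset import Dataset
    from snips_nlu.default_configs.config_en import CONFIG as CONFIG_EN

    dataset = Dataset.from_yaml_files("en", [yaml_path]).json
    engine = SnipsNLUEngine(config=CONFIG_EN)
    engine.fit(dataset)
    return engine


# Hand-written example in the expected result shape; NOT real engine output.
sample_result = {
    "input": "remind me to water the plants",
    "intent": {"intentName": "Reminder", "probability": 0.9},
    "slots": [
        {
            "range": {"start": 13, "end": 29},
            "rawValue": "water the plants",
            "value": {"kind": "Custom", "value": "water the plants"},
            "entity": "action",
            "slotName": "action",
        }
    ],
}

captured = extract_slot(sample_result, "action")
```

With a trained engine, the same helper would be applied as `extract_slot(engine.parse("remind me to water the plants"), "action")`. Whether the whole tail of the sentence is captured depends on the training utterances, as described above.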

I hope this helps.

@ea167
Author

ea167 commented Aug 28, 2019

Thank you very much Adrien @adrienball! I'll try it right now.

Would a configuration with matching_strictness: 0.0 help, or won't it change much?

@adrienball
Contributor

You don't need to change the default matching_strictness if your entity is configured with automatically_extensible: yes.

@ea167
Author

ea167 commented Aug 29, 2019

Thank you Adrien!
It works pretty well, congrats!

One surprising element in this particular case is the need to provide a value, otherwise we get the exception: snips_nlu.exceptions.DatasetFormatError: At least one entity value must be provided for entity 'AnythingSlot'

The "Reminder" is a great use case, another would be a search (to retrieve an element about XXX, etc.)

An interesting point that I found out during my tests: commands are robust to a lot of variations in English, except the -ing for verbs. When you input "searching..." instead of "search...", the intent is not recognized.
Hope it helps!

Thanks again for this great NLU engine!

@adrienball
Contributor

Thanks for the feedback :)
Regarding your experience with "-ing" variations not working, it can be addressed by modifying the default configuration used by the NLU engine.
The default configuration used in English can be found here: https://github.com/snipsco/snips-nlu/blob/master/snips_nlu/default_configs/config_en.py
As you can see, it's pretty big and a bit opaque, but there is a "use_stemming" field in "tfidf_vectorizer_config" which is set to False:

{
    "unit_name": "featurizer",
    "pvalue_threshold": 0.4,
    "added_cooccurrence_feature_ratio": 0.0,
    "tfidf_vectorizer_config": {
        "unit_name": "tfidf_vectorizer",
        "use_stemming": False,  # Enables or disables stemming for intent classification
        "word_clusters_name": None
    },
    "cooccurrence_vectorizer_config": {
        "unit_name": "cooccurrence_vectorizer",
        "window_size": None,
        "filter_stop_words": True,
        "unknown_words_replacement_string": None,
        "keep_order": True
    }
}

With this configuration, words are not stemmed in the intent classification step. We made this choice for the default config because English typically has few inflectional variations compared to other languages, and we found that disabling stemming worked better overall.

If you find that stemming is relevant for your use case, here's a simple way to provide a custom configuration with stemming enabled:

from copy import deepcopy

from snips_nlu import SnipsNLUEngine
from snips_nlu.default_configs.config_en import CONFIG as CONFIG_EN

config_with_stemming = deepcopy(CONFIG_EN)  # Make a deepcopy to avoid changing the default config globally
# Set the "use_stemming" flag itself, not the whole "tfidf_vectorizer_config" dict
config_with_stemming["intent_parsers_configs"][1]["intent_classifier_config"][
    "featurizer_config"]["tfidf_vectorizer_config"]["use_stemming"] = True

nlu_engine = SnipsNLUEngine(config=config_with_stemming)
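The hard-coded index [1] depends on the order of the parsers in the default config. As a hedged alternative, here is a small sketch that looks for any parser config carrying an "intent_classifier_config" key and flips the flag there. The helper name `with_stemming` is hypothetical, and `stub_config` is a trimmed stand-in that only mirrors the nesting shown in this thread. It is not the real CONFIG_EN.

```python
from copy import deepcopy


def with_stemming(config, enabled=True):
    """Return a copy of `config` with use_stemming set on each intent classifier."""
    new_config = deepcopy(config)  # leave the caller's config untouched
    found = False
    for parser_config in new_config.get("intent_parsers_configs", []):
        classifier = parser_config.get("intent_classifier_config")
        if not classifier:
            continue  # this parser has no intent classifier section
        vectorizer = classifier["featurizer_config"]["tfidf_vectorizer_config"]
        vectorizer["use_stemming"] = enabled
        found = True
    if not found:
        raise KeyError("no intent_classifier_config found in config")
    return new_config


# Trimmed stand-in for illustration only; the real default config has many more keys.
stub_config = {
    "intent_parsers_configs": [
        {"unit_name": "lookup_intent_parser"},
        {
            "unit_name": "probabilistic_intent_parser",
            "intent_classifier_config": {
                "featurizer_config": {
                    "tfidf_vectorizer_config": {"use_stemming": False}
                }
            },
        },
    ]
}

stemmed = with_stemming(stub_config)
```

With snips-nlu installed, the same call would presumably be `SnipsNLUEngine(config=with_stemming(CONFIG_EN))`, assuming the default config keeps the nesting shown above.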

You can also create your own NLUEngineConfig from scratch by checking the documentation here.
Cheers

@ea167
Author

ea167 commented Aug 29, 2019

Wonderful, thank you Adrien!
