Custom Slot "Rest of the Sentence" #845

ea167 · 2019-08-27T11:26:40Z

Hello,

First of all, congrats on this great NLU engine!

I was wondering if there is a way to configure a custom slot so that it captures everything that is unmatched at the end of a sentence.

Thank you!

adrienball · 2019-08-27T14:03:56Z

Hello @ea167 ,
There is no particular option to do that, but if your intent is structured enough, this should work automatically, provided you have marked the entity as automatically_extensible.

For instance, let's say you want to create a simple Reminder intent with a single action slot that can take any value:

---
type: intent
name: Reminder
utterances:
- Remind me to [action](go to the gym)
- remind me to [action](take an umbrella)
- remind me to [action](walk the dog tonight)
- could you remind me to [action](call my mom)
- please remind me to [action](go to my appointment)

---
type: entity
name: action
automatically_extensible: yes

Because the utterances are well structured around the "remind me to" chunk, the NLU will be able to generalize very well to any value.
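To try this out, here is a minimal sketch, assuming the two YAML documents above are saved to a single file and that snips-nlu and its English resources are installed. The helper names `train_engine` and `slot_values` are made up for illustration, and the sample result at the bottom is hand-written to mirror the shape of a parse output rather than produced by a real run.

```python
def train_engine(yaml_path, language="en"):
    """Fit an engine on a YAML dataset file (hypothetical helper).

    Needs snips-nlu plus the language resources, which are usually
    installed with `python -m snips_nlu download en`.
    """
    # Imported lazily so the pure helper below works without snips-nlu.
    from snips_nlu import SnipsNLUEngine
    from snips_nlu.dataset import Dataset
    from snips_nlu.default_configs.config_en import CONFIG as CONFIG_EN

    dataset = Dataset.from_yaml_files(language, [yaml_path])
    engine = SnipsNLUEngine(config=CONFIG_EN)
    engine.fit(dataset.json)
    return engine


def slot_values(parse_result):
    """Map each slot name to the raw text captured for it."""
    slots = parse_result.get("slots") or []
    return {slot["slotName"]: slot["rawValue"] for slot in slots}


# Hand-written sample shaped like a parse result (not real engine output).
sample_result = {
    "input": "remind me to water the plants",
    "intent": {"intentName": "Reminder", "probability": 0.9},
    "slots": [
        {
            "rawValue": "water the plants",
            "slotName": "action",
            "entity": "action",
            "range": {"start": 13, "end": 29},
            "value": {"kind": "Custom", "value": "water the plants"},
        }
    ],
}
captured = slot_values(sample_result)
```

With a trained engine, the same helper would be applied as `slot_values(engine.parse("remind me to water the plants"))`; whether the whole tail of the sentence ends up in the slot depends on the training data.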

This is an extreme case where the pattern of the intent is strong. In general, for a fixed dataset size, the more structured the intent is, the better the parsing accuracy will be. To reach good parsing accuracy for intents without a strong structure, you will have to provide more utterances.

I hope this helps.

ea167 · 2019-08-28T15:18:12Z

Thank you very much Adrien @adrienball! I'll try it right now.

Would a configuration with matching_strictness: 0.0 help, or won't it change much?

adrienball · 2019-08-28T15:22:46Z

You don't need to change the default matching_strictness if your entity is configured with automatically_extensible: yes.

ea167 · 2019-08-29T09:51:09Z

Thank you Adrien!
It works pretty well, congrats!

One surprising element in this particular case is the need to provide a value, otherwise we get the exception: snips_nlu.exceptions.DatasetFormatError: At least one entity value must be provided for entity 'AnythingSlot'

The "Reminder" is a great use case, another would be a search (to retrieve an element about XXX, etc.)

An interesting point that I found out during my tests: commands are robust to a lot of variations in English, except the -ing for verbs. When you input "searching..." instead of "search...", the intent is not recognized.
Hope it helps!

Thanks again for this great NLU engine!

adrienball · 2019-08-29T12:52:18Z

Thanks for the feedback :)
Regarding your experience with "-ing" variations not working, it can be addressed by modifying the default configuration used by the NLU engine.
The default configuration used in English can be found here: https://github.com/snipsco/snips-nlu/blob/master/snips_nlu/default_configs/config_en.py
As you can see, it's pretty big and a bit opaque, but there is a "use_stemming" field in "tfidf_vectorizer_config" which is set to False:

{
    "unit_name": "featurizer",
    "pvalue_threshold": 0.4,
    "added_cooccurrence_feature_ratio": 0.0,
    "tfidf_vectorizer_config": {
        "unit_name": "tfidf_vectorizer",
        "use_stemming": False,  # Enables or disables stemming for intent classification
        "word_clusters_name": None
    },
    "cooccurrence_vectorizer_config": {
        "unit_name": "cooccurrence_vectorizer",
        "window_size": None,
        "filter_stop_words": True,
        "unknown_words_replacement_string": None,
        "keep_order": True
    }
}

With this configuration, words are not stemmed in the intent classification step. We made this choice for the default config because English typically has few word-form variations compared to other languages, and we found that disabling stemming worked better overall.
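As a rough intuition for why "searching" can miss when only "search" was seen in training, here is a toy sketch. It is not the stemmer or the featurizer that Snips NLU uses: `toy_stem` and `features` are hypothetical helpers, and the suffix rule is deliberately naive.

```python
def toy_stem(word):
    """Strip a trailing "ing" when at least three letters remain."""
    word = word.lower()
    if word.endswith("ing") and len(word) - 3 >= 3:
        return word[:-3]
    return word


def features(utterance, use_stemming):
    """Turn an utterance into a bag of lowercased (optionally stemmed) tokens."""
    tokens = utterance.lower().split()
    if use_stemming:
        tokens = [toy_stem(token) for token in tokens]
    return set(tokens)


# Without stemming, "search" and "searching" are unrelated features.
overlap_without = features("search the docs", False) & features(
    "searching the docs", False)

# With stemming, both collapse to "search" and the key word is shared.
overlap_with = features("search the docs", True) & features(
    "searching the docs", True)
```

In this toy setup the verb only contributes to the overlap once stemming is on, which is the effect the "use_stemming" flag is meant to provide for intent classification.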

If you find that stemming is relevant for your use case, here's a simple way to provide a custom configuration with stemming enabled:

from copy import deepcopy

from snips_nlu import SnipsNLUEngine
from snips_nlu.default_configs.config_en import CONFIG as CONFIG_EN

config_with_stemming = deepcopy(CONFIG_EN)  # Make a deepcopy to avoid changing the default config globally
config_with_stemming["intent_parsers_configs"][1]["intent_classifier_config"][
    "featurizer_config"]["tfidf_vectorizer_config"]["use_stemming"] = True

nlu_engine = SnipsNLUEngine(config=config_with_stemming)
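The snippet above addresses the probabilistic parser by its position in the list. If that ordering ever differs in your version, a slightly more defensive variant is to walk all parser configs and flip the flag wherever the nested key exists. This is only a sketch: `with_stemming` is a hypothetical helper, and `STAND_IN_CONFIG` is a heavily trimmed stand-in for CONFIG_EN so that the example runs without snips-nlu installed.

```python
from copy import deepcopy


def with_stemming(config):
    """Return a copy of `config` with use_stemming enabled wherever a parser
    config has a tfidf_vectorizer_config under its intent classifier."""
    updated = deepcopy(config)  # Leave the caller's config untouched
    for parser_config in updated.get("intent_parsers_configs", []):
        tfidf_config = (
            parser_config.get("intent_classifier_config", {})
            .get("featurizer_config", {})
            .get("tfidf_vectorizer_config")
        )
        if tfidf_config is not None:
            tfidf_config["use_stemming"] = True
    return updated


# Hypothetical, trimmed stand-in for the real default config.
STAND_IN_CONFIG = {
    "intent_parsers_configs": [
        {"unit_name": "lookup_intent_parser"},
        {
            "unit_name": "probabilistic_intent_parser",
            "intent_classifier_config": {
                "featurizer_config": {
                    "tfidf_vectorizer_config": {"use_stemming": False}
                }
            },
        },
    ]
}

stemmed_config = with_stemming(STAND_IN_CONFIG)
```

With the real library, the same helper would be called as `with_stemming(CONFIG_EN)` and the result passed to `SnipsNLUEngine(config=...)`.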

You can also create your own NLUEngineConfig from scratch by checking the documentation here.
Cheers

ea167 · 2019-08-29T13:05:47Z

Wonderful, thank you Adrien!

adrienball added the question label Aug 27, 2019

adrienball self-assigned this Aug 27, 2019

adrienball closed this as completed Sep 3, 2019

calypset mentioned this issue Aug 19, 2020

Wildcard entity #885

Open
