
Use response selector keys for confusion matrix labelling #7423

Merged
merged 6 commits into RasaHQ:1.10.x on Dec 15, 2020

Conversation

@hotzenklotz (Contributor) commented Dec 1, 2020

Proposed changes:

  • At the moment, when running an NLU evaluation for response selectors, the plotted confusion matrix uses the training data utterances as plot labels. This small PR uses the response selector keys and predicted (sub-)intents as labels instead, which makes the confusion matrix a lot more readable and useful.
  • Similarly, the response_selection_report.json now uses the same labels, making the report easier to understand.
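The idea behind the labelling change can be sketched with plain Python (a minimal illustration, not Rasa's actual evaluation code; the intent keys below are hypothetical examples): tallying target/prediction pairs of combined "retrieval_intent/sub_intent" keys yields a small, readable matrix, whereas raw training utterances would produce one axis entry per unique sentence.

```python
from collections import Counter

# Hypothetical evaluation pairs: (target, predicted) full retrieval intents.
# Labelling with "retrieval_intent/sub_intent" keys instead of raw training
# utterances keeps the matrix axes short and readable.
targets = ["faq/ask_name", "faq/ask_weather", "chitchat/greet", "faq/ask_name"]
predictions = ["faq/ask_name", "faq/ask_name", "chitchat/greet", "faq/ask_name"]

labels = sorted(set(targets) | set(predictions))
counts = Counter(zip(targets, predictions))

# Row = target label, column = predicted label.
matrix = [[counts[(t, p)] for p in labels] for t in labels]

for label, row in zip(labels, matrix):
    print(f"{label:20s} {row}")
```

With utterances as labels, the same four examples could produce four distinct rows; with selector keys, misclassifications within one retrieval intent stay visible at a glance.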

Examples
Before
response_selection_confmat_old

After

response_selection_confmat copy

Status (please check what you already did):

  • added some tests for the functionality
  • updated the documentation
  • updated the changelog (please check changelog for instructions)
  • reformat files using black (please check Readme for instructions)

@hotzenklotz hotzenklotz changed the base branch from master to 1.10.x December 1, 2020 10:06
@tmbo tmbo requested a review from dakshvar22 December 1, 2020 10:08
@sara-tagger (Collaborator)

Thanks for submitting a pull request 🚀 @tttthomasssss will take a look at it as soon as possible ✨

@dakshvar22 (Contributor) left a comment

Thanks for fixing this on the 1.10.x branch 👍 I suggested a few changes, and based on them there are a few more places where you will have to make changes. For example, lines 412-414 (return actual and predicted labels instead of actual and predicted texts), line 1488 (use actual and predicted labels instead of actual and predicted texts to compute metrics), and line 241 (do not filter response examples based on empty texts).
All of these changes are actually implemented on master already if you would like to view them for reference.

rasa/nlu/test.py Outdated
Comment on lines 1058 to 1061
    if isinstance(response_prediction_full_intent, str):
        response_prediction_full_intent = response_prediction_full_intent.split(
            "/"
        )[1]

I would suggest not splitting the predicted retrieval_intent/sub_intent, because you could have the same sub_intent under multiple retrieval intents. So let's compare, e.g., faq/ask_name to faq/ask_weather, and not ask_name to ask_weather.
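The ambiguity the reviewer describes is easy to demonstrate (the intent names here are hypothetical): two distinct combined keys collapse into the same label once the retrieval intent is split off.

```python
# The same sub_intent can live under multiple retrieval intents, so
# splitting off the retrieval-intent prefix collapses distinct targets
# into one indistinguishable label.
pred = "faq/ask_name"
other = "smalltalk/ask_name"

# After splitting, both become "ask_name" -- a collision:
assert pred.split("/")[1] == other.split("/")[1]

# Comparing the combined keys keeps them distinct:
assert pred != other
```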

rasa/nlu/test.py Outdated
response_target = example.get("response", "")
response_key = example.get(RESPONSE_KEY_ATTRIBUTE, "")

Following from the above comment, this would then change to example.get_combined_intent_response_key()

rasa/nlu/test.py Outdated
@@ -62,7 +63,7 @@

 ResponseSelectionEvaluationResult = namedtuple(
     "ResponseSelectionEvaluationResult",
-    "intent_target " "response_target " "response_prediction " "message " "confidence",
+    "intent_target response_key response_target response_prediction_full_intent response_prediction message confidence",

It looks like with this change we can also get rid of intent_target, response_target and response_prediction. Can you scrub those off as well? There will be some more places where you will have to make changes.
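Dropping the three fields the reviewer lists from the diff above would leave something like the following (a sketch only; the actual field set on master may differ, and the example values are made up):

```python
from collections import namedtuple

# Hypothetical slimmed-down result tuple: intent_target, response_target and
# response_prediction from the diff are removed, as suggested in review.
ResponseSelectionEvaluationResult = namedtuple(
    "ResponseSelectionEvaluationResult",
    "response_key response_prediction_full_intent message confidence",
)

result = ResponseSelectionEvaluationResult(
    response_key="faq/ask_name",
    response_prediction_full_intent="faq/ask_name",
    message="what is your name?",
    confidence=0.97,
)
```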

@tmbo tmbo removed the request for review from tttthomasssss December 7, 2020 09:49
@hotzenklotz (Contributor, Author)

@dakshvar22 Thanks for the feedback. I applied your suggestions. Please see commit 021bcb1

@dakshvar22 dakshvar22 self-requested a review December 9, 2020 19:38
@hotzenklotz (Contributor, Author)

@dakshvar22 Is anything else required for this PR?

@dakshvar22 (Contributor) left a comment

Thanks for fixing this and addressing all my comments. ✨

@dakshvar22 dakshvar22 merged commit a9fd74f into RasaHQ:1.10.x Dec 15, 2020
hotzenklotz added a commit to hotzenklotz/rasa that referenced this pull request Dec 16, 2020
@hotzenklotz (Contributor, Author)

@dakshvar22 To my dismay, I discovered that my refactoring reintroduced a bug in my original PR. Unfortunately, it was not caught by any of the unit tests. I came across it when running a new response selector evaluation today. Please see this commit for a fix: hotzenklotz@fd037be

In short, the default value for the response selector key was wrong. It needs to default to None for all "regular" intent examples, as otherwise those won't be filtered out further down the pipeline and will crash the sklearn reports.
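The fix can be illustrated with a small filter (a sketch under assumed names, not Rasa's actual data structures): with None as the default, plain intent examples drop out cleanly, whereas an empty-string default would let them slip through to the report generation.

```python
# Hypothetical examples: only the first and third come from a response
# selector; the second is a "regular" intent example with no selector key.
examples = [
    {"intent": "faq", "response_key": "faq/ask_name"},
    {"intent": "goodbye", "response_key": None},
    {"intent": "faq", "response_key": "faq/ask_weather"},
]

# Filtering on `is not None` removes regular intent examples before the
# sklearn-style metrics are computed; a "" default would not be caught by
# an identity check against None and could crash the report downstream.
response_examples = [e for e in examples if e["response_key"] is not None]
```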

cc @tmbo

@hotzenklotz hotzenklotz mentioned this pull request Dec 17, 2020
dakshvar22 added a commit that referenced this pull request Dec 18, 2020