Skip to content

Fix format of values in confusion matrix plot. #16159

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 107 commits into from
Mar 1, 2020
Merged

Fix format of values in confusion matrix plot. #16159

merged 107 commits into from
Mar 1, 2020

Conversation

Rick-Mackenbach
Copy link
Contributor

Reference Issues/PRs

In reference to this, #16127

What does this implement/fix? Explain your changes.

The formatting of values in the plot of the confusion matrix were set to '.2g' if no other format was specified. As of now, if no format is provided by the user, the format will be 'd' if the value is <1e7, and '.2g' otherwise.

@Rick-Mackenbach
Copy link
Contributor Author

@jnothman Hey Joel, I made the fix name a bit clearer now.

Could you help me understand why the codecov/patch and codecov/project tests are not passing?

@Rick-Mackenbach
Copy link
Contributor Author

Rick-Mackenbach commented Jan 21, 2020

Well that certainly was a journey, but it seems to be working now.

@NicolasHug What do you think?

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Rick-Mackenbach , made a few comments

Just FYI, you don't need to close and open new PR, you can simply push changes to your already-existing PR. It makes it easier for us to keep track of it ;)

ha="center", va="center",
color=color)

if (values_format is None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No parenthesis

Comment on lines 105 to 110
color = cmap_max if cm[i, j] < thresh else cmap_min

self.text_[i, j] = ax.text(
j, i, format(cm[i, j], values_format),
ha="center", va="center",
color=color)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines are duplicated.

Let's put the if value_format is None condition inside of the for loop to avoid duplication.

Comment on lines 86 to 87
if (values_format is None and cm.dtype.kind == "f"):
values_format = '.2g'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge this condition with the rest below? It's better to have all related logic at the same place

text_text = np.array([
t.get_text() for t in disp.text_.ravel()])
assert_array_equal(expected_text, text_text)
if values_format is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a big if/else statement like this kinda defeats the purpose of parametrizing the test.

Maybe keep this test unchanged and create a new one test_confusion_matrix_text_format_default which would only check the default behavior.

@jnothman
Copy link
Member

I don't think that merge worked well... The diff now includes all changes.

You should have been able to do something like:

git checkout master
git pull https://github.com/scikit-learn/scikit-learn master
git checkout my_branch
git merge master
git push

not sure that'll work now. You might want to reset back to your last good commit using git reset --hard. But it seems your commits and the commits in master are now interleaved...

It might be better to lose your history by doing something like:

git checkout confusion_fix
git reset --hard master
git checkout master sklearn/metrics/_plot/confusion_matrix.py sklearn/metrics/_plot/tests/test_plot_confusion_matrix.py  # are these the only files you intended to modify in this PR??
git status
git commit

@Rick-Mackenbach
Copy link
Contributor Author

Thanks @jnothman for the fix!

@jnothman
Copy link
Member

jnothman commented Feb 12, 2020 via email

@Rick-Mackenbach
Copy link
Contributor Author

I hope it kept all your changes!

Seems like it did from what I can see :)

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks almost done!

color=color)

fmt = values_format
if values_format is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think trying .2g and then trying d if applicable, is slightly clearer

This suggestion is to address #16159 (comment)

if values_format is None:
    text_cm = format(cm[i, j], '.2g')
    if cm.dtype.kind != 'f':
        text_d = format(cm[i, j], 'd')
        if len(text_d) < len(text_cm):
            text_cm = text_d
else:
    text_cm = format(cm[i, j], values_format)

The test would need to updated to reflect this behavior.

@Rick-Mackenbach
Copy link
Contributor Author

@thomasjpfan Not sure if this is what you meant?

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your continued work and patience on this PR.

format(cm[i, j], 'd')):
fmt = '.2g'

self.text_[i, j] = ax.text(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I was not more clear. The current PR may call format three times. I am trying to push for calling it at most twice:

if values_format is None:
    text_cm = format(cm[i, j], '.2g')
    if cm.dtype.kind != 'f':
        text_d = format(cm[i, j], 'd')
        if len(text_d) < len(text_cm):
            text_cm = text_d
else:
    text_cm = format(cm[i, j], values_format)

self.text_[i, j] = ax.text(j, i, text_cm, ...)

@Rick-Mackenbach
Copy link
Contributor Author

@thomasjpfan My pleasure, thanks for you continuous feedback on this :).

I wasn't looking correctly regarding the correction, I see now that the formatting is done in the if-statement, and you just pass this formatted text to self.text_

Should be correct now 👍

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thomasjpfan
Copy link
Member

Please add an entry to the change log at doc/whats_new/v0.23.rst with a API tag. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.

@Rick-Mackenbach
Copy link
Contributor Author

@thomasjpfan Not sure why the tests have failed here, could you help me out a bit? :)

@thomasjpfan
Copy link
Member

I pushed a change to fix up the PR. It should be ready now.

@jnothman
Copy link
Member

jnothman commented Mar 1, 2020

Thanks @Rick-Mackenbach!

@jnothman jnothman merged commit 94f877b into scikit-learn:master Mar 1, 2020
@Rick-Mackenbach Rick-Mackenbach deleted the confusion_fix branch March 2, 2020 10:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants