New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+2] Classificaion report dictionary output #11160

Merged
merged 12 commits into from May 30, 2018

Conversation

Projects
None yet
6 participants
@danielbarkhorn
Contributor

danielbarkhorn commented May 29, 2018

Reference Issues/PRs

Fixes #7845 . Closes #8240

What does this implement/fix? Explain your changes.

Added ability for classification_report function to return statistics as dictionary output

Any other comments?

This replaces outdated PR #8240 (#8240)

@qinhanmin2014

LGTM

@qinhanmin2014

This comment has been minimized.

Member

qinhanmin2014 commented May 30, 2018

Please add an entry to the change log at doc/whats_new/v*.rst. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:.

danielbarkhorn and others added some commits May 30, 2018

@qinhanmin2014

This comment has been minimized.

Member

qinhanmin2014 commented May 30, 2018

@danielbarkhorn I've resolved the conflicts for you. Will merge when green.

@qinhanmin2014 qinhanmin2014 changed the title from [MRG] Classificaion report dictionary output to [MRG+2] Classificaion report dictionary output May 30, 2018

@qinhanmin2014 qinhanmin2014 merged commit f34982e into scikit-learn:master May 30, 2018

0 of 5 checks passed

ci/circleci: python2 CircleCI is running your tests
Details
ci/circleci: python3 CircleCI is running your tests
Details
continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
lgtm analysis: Python Running analyses for revisions
Details
@danielbarkhorn

This comment has been minimized.

Contributor

danielbarkhorn commented May 30, 2018

@qinhanmin2014 Great, thanks for that!

@ymoisan

This comment has been minimized.

ymoisan commented Jul 20, 2018

I know I'm late in the game, but is there really a point in keeping statistical items as np.float64 ? The reason I'm asking is that 1) the final output is rounded off to 2 digit precision anyways and 2) there are problems serializing 64 bit data as JSON on MS Windows.

@amueller

This comment has been minimized.

Member

amueller commented Jul 20, 2018

@ymoisan well returning the data means that the output is not rounded, right?
And that seems a bug in the serialization. Is that still the case? that seems odd...

@jnothman

This comment has been minimized.

Member

jnothman commented Jul 22, 2018

@qinhanmin2014

This comment has been minimized.

Member

qinhanmin2014 commented Jul 22, 2018

I'm confused by the discussions above:

the final output is rounded off to 2 digit precision anyways

It depends on the digits parameter. Currently, when output_dict = True, we actually ignore digits parameter and return the original value (i.e., even with digits=2, you'll get something like 0.9494949494949495). I think it's acceptable when I review because we've pointed out that parameter digits is for formatting. You can open an issue to argue about it. Sorry for not mentioning it in my review.

there are problems serializing 64 bit data as JSON on MS Windows.

It seems that the post is about int, not float?

Overall, I cannot understand why should we use python floats instead of numpy floats (with the two points you provide).

@jnothman

This comment has been minimized.

Member

jnothman commented Jul 22, 2018

I hope this clarifies that the behaviour we currently have is a bug, due to zipping over iter(ndarray):

>>> import json
>>> json.dumps(dict(zip('abc', np.arange(3).tolist())))
'{"b": 1, "a": 0, "c": 2}'
>>> json.dumps(dict(zip('abc', np.arange(3))))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/joel/anaconda3/envs/scipy3k/lib/python3.5/json/__init__.py", line 230, in dumps
    return _default_encoder.encode(obj)
  File "/Users/joel/anaconda3/envs/scipy3k/lib/python3.5/json/encoder.py", line 198, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/Users/joel/anaconda3/envs/scipy3k/lib/python3.5/json/encoder.py", line 256, in iterencode
    return _iterencode(o, 0)
  File "/Users/joel/anaconda3/envs/scipy3k/lib/python3.5/json/encoder.py", line 179, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: 1 is not JSON serializable
>>> x = next(iter(np.arange(3)))
>>> type(x)
<class 'numpy.int64'>
@qinhanmin2014

This comment has been minimized.

Member

qinhanmin2014 commented Jul 22, 2018

I hope this clarifies that the behaviour we currently have is a bug, due to zipping over iter(ndarray):

@jnothman But we get np.float here, not np.int?

@jnothman

This comment has been minimized.

Member

jnothman commented Jul 22, 2018

@qinhanmin2014

This comment has been minimized.

Member

qinhanmin2014 commented Jul 22, 2018

@jnothman Oops, sorry, I suddenly realize that we still get np.int here (in 'support'). I'm fine with your solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment