Fix Probabilistic metrics #232

Merged: 40 commits merged from AS_add_brier into master on Sep 23, 2019

Conversation

@aaronspring (Collaborator) commented Aug 28, 2019

Description

  • added threshold_brier_score and brier_score from xskillscore and correctly implemented crps and crpss: https://github.com/raybellwaves/xskillscore/pull/20
  • added numba to the environment
  • compute functions can now take kwargs (to provide threshold, comparison, ...) and the metrics themselves check kwargs for needed parameters (see the sketch after this list)
  • renamed x_METRICS to DETERMINISTIC_x_METRICS
  • slow tests with crpss not assuming a Gaussian distribution are skipped for time reasons, but they work
  • compute_pm and compute_hindcast take a dim argument to assign the dimension to calculate the metric over (cannot work for ACC): https://github.com/bradyrx/climpred/issues/187
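A minimal sketch of this kwargs pass-through, assuming xarray DataArray inputs; compute_skill and exceedance_brier_score are hypothetical names for illustration, not climpred's actual API:

```python
import xarray as xr

def exceedance_brier_score(forecast, reference, dim='member', **kwargs):
    # Hypothetical probabilistic metric: it looks up the parameter it
    # needs (here `threshold`) in the forwarded kwargs.
    threshold = kwargs.get('threshold')
    if threshold is None:
        raise ValueError('this metric requires a `threshold` keyword')
    prob = (forecast > threshold).mean(dim)  # ensemble exceedance probability
    obs = reference > threshold              # binary observation
    return ((prob - obs) ** 2).mean()

def compute_skill(forecast, reference, metric=exceedance_brier_score,
                  dim='member', **metric_kwargs):
    # The compute function forwards unknown keywords to the metric,
    # so a caller can write compute_skill(fcst, ref, threshold=0.5).
    return metric(forecast, reference, dim=dim, **metric_kwargs)
```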

Small fixes:

Fixes https://github.com/bradyrx/climpred/issues/122 https://github.com/bradyrx/climpred/issues/22 https://github.com/bradyrx/climpred/issues/218 https://github.com/bradyrx/climpred/issues/187 https://github.com/bradyrx/climpred/issues/233

Type of change

Please delete options that are not relevant.

  • Breaking change (renaming)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How Has This Been Tested?

  • pytest
  • trust properscoring

Checklist (while developing)

  • I have added docstrings to all new functions.
  • I have commented my code, particularly in hard-to-understand areas.
  • Tests added for pytest, if necessary.
  • I have updated the sphinx documentation, if necessary.

Pre-Merge Checklist (final steps)

  • I have rebased onto master or develop (wherever I am merging) and dealt with any conflicts.
  • I have squashed commits to a reasonable amount, and force-pushed the squashed commits.
  • I have run make html on the documents to make sure example notebooks still compile.

References

Please add any references to manuscripts, textbooks, etc.

@bradyrx (Collaborator) commented Aug 28, 2019

Is there any reason this wasn't already in the package? Just something we missed? Anyways, thanks for adding this @aaronspring , it's an important one. It would be good to have some demos in this PR thread of it in use, since we haven't really tested the probabilistic ones too heavily in my view.

@coveralls commented Aug 28, 2019

Coverage Status

Coverage decreased (-0.1%) to 85.407% when pulling 7d9ecba on AS_add_brier into 8f6d435 on master.

@aaronspring (Collaborator Author) commented:

I left it out because of the additional threshold argument that is needed. doppyo just calculates the Brier score (and its decomposition) from True/False input; that would be another option, i.e. doing the threshold comparison outside the metric function. (Now it's inside the threshold_brier_score function from properscoring, though they also have the plain brier_score function.)

Regarding testing of the Brier score: I took it from properscoring, where it is tested.
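For illustration, a hedged sketch of both options using properscoring directly (treat the exact argument order as an assumption from properscoring's documentation, not as climpred code):

```python
import numpy as np
import properscoring as ps

obs = np.random.rand(100)      # observed values
ens = np.random.rand(100, 10)  # 100 forecasts x 10 ensemble members
threshold = 0.5

# Option A (doppyo-style): do the True/False comparison outside the metric,
# then score the exceedance probabilities with the plain Brier score.
prob = (ens > threshold).mean(axis=-1)
bs_outside = ps.brier_score(obs > threshold, prob)

# Option B (this PR): pass the threshold into the metric and let
# threshold_brier_score handle the comparison internally.
bs_inside = ps.threshold_brier_score(obs, ens, threshold)

print(bs_outside.mean(), bs_inside.mean())  # should agree up to tie handling
```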

@aaronspring (Collaborator Author) commented:

Wait until the open issues at https://github.com/raybellwaves/xskillscore/issues are done.

@aaronspring aaronspring changed the title Add Brier Fix Probabilistic metrics Sep 3, 2019
@bradyrx (Collaborator) commented Sep 4, 2019

Haven't looked at this yet, but make sure to update setup/requirements/environments to require the new xskillscore version that includes your updates from https://github.com/raybellwaves/xskillscore/pull/20.

@aaronspring aaronspring self-assigned this Sep 8, 2019
@aaronspring aaronspring added the bug label Sep 8, 2019
@aaronspring (Collaborator Author) commented:

I get reasonable results (reproducing Kadow et al. 2016) for crpss_es, but not yet for LESS.

Review thread on climpred/bootstrap.py (outdated, resolved)
@bradyrx (Collaborator) left a review comment:

added some more comments

@aaronspring (Collaborator Author) commented Sep 22, 2019

> added some more comments

I always forget to check the Changed files tab. Now all resolved. Please go over them and unresolve any section you think still needs modification.

This PR is now ready. The only metric whose results I don't fully understand is LESS. Should I add a note of caution there? I am quite confident about the rest.

Review threads on climpred/bootstrap.py, climpred/comparisons.py, climpred/metrics.py, and climpred/prediction.py (resolved)
@bradyrx (Collaborator) commented Sep 22, 2019

Okay @aaronspring, I added a few more comments. Mostly about docstrings and stuff just to polish it up. But we should be good to merge this after you finish.

> This PR is now ready. The only metric whose results I don't fully understand is LESS. Should I add a note of caution there? I am quite confident about the rest.

If you don't fully understand the results / aren't confident in them, I think you should remove LESS entirely from the package and docs. Then open an issue on implementing LESS and we can address it in its own small PR. I don't want any metrics in here that we aren't confident in, since people will be using them for their analyses.

Also, can you explain once more the reason why we need or want the dim argument on our compute functions? I still think it adds confusion. Isn't this a duplication of the comparison argument?

I'll merge and update the version number after this is all done. Thanks a lot!!

@bradyrx (Collaborator) left a review comment:

added one small comment to testing.

@aaronspring (Collaborator Author) commented Sep 23, 2019

Reasoning for the dim argument:

The dim argument defines the dimension over which a metric is computed. We can apply a metric over init, member, or [init, member] (the latter in perfect-model mode). The resulting skill is reduced along the dim specified as argument. Applying a metric over dim='member' therefore yields a skill value for each init individually, which shows the initial-condition dependence of skill. Likewise, computing skill over init yields a skill value for each member, although this is rarely useful because member i from one init has nothing in common with member i from another.
The comparison argument is different: it only specifies how forecast and reference are constructed.

The above logic applies to deterministic metrics. Probabilistic metrics must be applied over dim='member' with an m2r, m2m, or m2c comparison.
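A hedged illustration (compute_perfect_model and the tutorial dataset names follow climpred's documentation of this era; exact signatures and defaults are assumptions and may differ):

```python
from climpred.prediction import compute_perfect_model
from climpred.tutorial import load_dataset

ds = load_dataset('MPI-PM-DP-1D')         # perfect-model ensemble
control = load_dataset('MPI-control-1D')  # accompanying control run

# Deterministic metric reduced over member only: one skill value per init,
# showing the initial-condition dependence of skill.
skill_per_init = compute_perfect_model(ds, control, metric='rmse',
                                       comparison='m2m', dim='member')

# Probabilistic metric: must be applied over dim='member'
# with an m2m/m2c (or m2r for hindcasts) comparison.
crps = compute_perfect_model(ds, control, metric='crps',
                             comparison='m2m', dim='member')
```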

@aaronspring (Collaborator Author) commented:

Implemented all. Please use the Squash and merge button.

@bradyrx (Collaborator) commented Sep 23, 2019

Thanks @aaronspring so much for all the work on this!! Will merge after these tests run, then will roll out v1.1

@aaronspring aaronspring merged commit 6e33ff0 into master Sep 23, 2019
@bradyrx bradyrx deleted the AS_add_brier branch September 23, 2019 19:54
Development

Successfully merging this pull request may close these issues.

test_perfect_model_prediction takes a while to run
4 participants