Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug/median #624

Merged
merged 5 commits into from Jul 10, 2020
Merged

Bug/median #624

merged 5 commits into from Jul 10, 2020

Conversation

ClaudiaComito
Copy link
Contributor

@ClaudiaComito ClaudiaComito commented Jul 10, 2020

Description

ht.median(a, axis) (aka ht.percentile(a, q=50, axis)) was returning erroneous results, or sometimes an empty DNDarray, when a.split == axis and the number of mpi processes was larger than 2.

Problems:

  • line 1437, perc_chunk was being calculated based on the wrong rank value, which led to perc_slice being off and percentile() returning wrong (or "empty") values. Fixed.
  • indices on ranks > 0 not being corrected for the presence of halo data. This affects the result when the relevant data aren't on rank 0, which is always the case for median() on more than 2 processes. Fixed at lines 1441 and ff.

Not occurring in #623 but also a problem:

  • dtype of output was promote_type(data.dtype, q.dtype), but we might have integer data and integer q (e.g. q=50) and still need a float result. Fixed at lines 1326-1336.

Issue/s resolved: #623

Changes proposed:

  • see above, plus:
  • implemented explicit test for median in test_percentile()

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Due Diligence

  • All split configurations tested
  • Multiple dtypes tested in relevant functions
  • Documentation updated (if needed)
  • Updated changelog.md under the title "Pending Additions"

Does this change modify the behaviour of other functions? If so, which?

no

@ClaudiaComito ClaudiaComito added this to In progress in Current sprint via automation Jul 10, 2020
@ClaudiaComito ClaudiaComito added this to In progress in v0.5.0 via automation Jul 10, 2020
@codecov
Copy link

codecov bot commented Jul 10, 2020

Codecov Report

Merging #624 into master will increase coverage by 0.02%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #624      +/-   ##
==========================================
+ Coverage   97.47%   97.49%   +0.02%     
==========================================
  Files          77       77              
  Lines       15862    15869       +7     
==========================================
+ Hits        15461    15472      +11     
+ Misses        401      397       -4     
Impacted Files Coverage Δ
heat/core/statistics.py 97.32% <100.00%> (+0.64%) ⬆️
heat/core/tests/test_statistics.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 89bfd29...82c7afb. Read the comment docs.

@ClaudiaComito ClaudiaComito added the bug Something isn't working label Jul 10, 2020
Current sprint automation moved this from In progress to Reviewer approved Jul 10, 2020
v0.5.0 automation moved this from In progress to Reviewer approved Jul 10, 2020
@Markus-Goetz Markus-Goetz merged commit 67308b4 into master Jul 10, 2020
Current sprint automation moved this from Reviewer approved to Done Jul 10, 2020
v0.5.0 automation moved this from Reviewer approved to Done Jul 10, 2020
@Markus-Goetz Markus-Goetz deleted the bug/median branch July 10, 2020 07:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
No open projects
v0.5.0
  
Done
Development

Successfully merging this pull request may close these issues.

ht.median() fails when axis=split and n>2 processes
2 participants