ENH: Medcouple in O(N Log N) time #9570

@hmustafamail

Problem description

The current statsmodels implementation of medcouple runs in O(N^2) time, leading to excessive runtimes and memory issues on large inputs.
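To illustrate where the quadratic cost comes from, here is a minimal pure-Python sketch of the medcouple computed straight from its definition. It is not the statsmodels code; the function name `naive_medcouple` is hypothetical, and the handling of values tied with the median follows the sign rule from the usual definition of the statistic as I understand it.

```python
from statistics import median


def naive_medcouple(values):
    """Quadratic-time medcouple from the definition (illustrative only)."""
    values = list(values)
    m = median(values)
    # Centre on the median and split into two halves; observations equal to
    # the median appear in both halves as exact zeros.
    upper = sorted((x - m for x in values if x >= m), reverse=True)
    lower = sorted((x - m for x in values if x <= m), reverse=True)
    k = upper.count(0)  # number of observations tied with the median

    kernel = []
    tie_row = 0
    for zp in upper:
        tie_col = 0
        for zm in lower:
            if zp == 0 and zm == 0:
                # 0/0 case: the sign rule gives tied pairs equally many
                # -1 and +1 values, with zeros in between.
                s = k - 1 - tie_row - tie_col
                kernel.append((s > 0) - (s < 0))
                tie_col += 1
            else:
                kernel.append((zp + zm) / (zp - zm))
        if zp == 0:
            tie_row += 1
    # The kernel holds len(upper) * len(lower) values, hence O(N^2)
    # time and memory.
    return median(kernel)
```

For a symmetric sample such as `[1, 2, 3, 4, 5]` this returns 0. Every pair across the two halves is materialised before the median is taken, which is the step an O(N log N) algorithm avoids.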

Proposed remedy

  • I would like to see a revised implementation, derived from Guy Brys's R code, included in statsmodels
  • The implementation is available on my GitHub (link to repo)
  • Guy Brys has granted permission for this in correspondence
  • Details follow

Historical context

  • Guy Brys authored an R package for efficient medcouple, c. 2004 (link)
  • Jordi Gutiérrez Hermoso used that as a reference for a Python 2 implementation, c. 2015 (link)
  • There was a conversation about whether to include it in the Python Statsmodels project (link)
  • There were licensing concerns because the original reference implementation is licensed under the GNU GPL
  • However, as mentioned in that thread, such code may be relicensed with author permission

What I did

  • Reached out to Guy on LinkedIn (link to profile) to ask for permission
  • He granted permission
  • Revised Jordi's code for Python 3
  • Validated my revised code against the (quadratic) statsmodels implementation
    • Used data from Jordi's repo
  • RMSE was 1.03e-4
    • Much smaller than the statistic's range of [-1, 1]
    • Consistent with implementation-level differences
  • Posted the revised code on GitHub (link to repo)
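For reviewers who want to reproduce the comparison, the validation step could be sketched roughly as follows. This is not the exact script I ran; `rmse` and `compare_implementations` are hypothetical helper names, and the two callables are placeholders for the quadratic reference implementation and the revised one.

```python
import math


def rmse(reference, candidate):
    """Root-mean-square error between two equal-length sequences."""
    if len(reference) != len(candidate):
        raise ValueError("sequences must have the same length")
    if not reference:
        raise ValueError("sequences must not be empty")
    squared = [(r - c) ** 2 for r, c in zip(reference, candidate)]
    return math.sqrt(sum(squared) / len(squared))


def compare_implementations(reference_fn, candidate_fn, datasets):
    """Run both medcouple functions on each dataset and return the RMSE."""
    reference = [float(reference_fn(data)) for data in datasets]
    candidate = [float(candidate_fn(data)) for data in datasets]
    return rmse(reference, candidate)
```

Here `reference_fn` could be `statsmodels.stats.stattools.medcouple`, and `candidate_fn` the revised function from the linked repo.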

Please let me know what else may be needed.
