Skip to content

Conversation

@justin1dennison
Copy link
Contributor

Resolves #117.

Checklist

Please ensure the following tasks are completed before submitting this pull request.

  • Read, understood, and followed the contributing guidelines, including the relevant style guides.
  • Read and understand the Code of Conduct.
  • Read and understood the licensing terms.
  • Searched for existing issues and pull requests before submitting this pull request.
  • Filed an issue (or an issue already existed) prior to submitting this pull request.
  • Rebased onto latest develop.
  • Submitted against develop branch.

Description

What is the purpose of this pull request?

This pull request:

  • Implements the one parameter Box-Cox transformation.

Related Issues

Does this pull request have any related issues?

This pull request:

Questions

Any questions for reviewers of this pull request?

No.

Other

Any other information relevant to this pull request? This may include screenshots, references, and/or implementation notes.

No.


@stdlib-js/reviewers

@kgryte kgryte changed the title Rfc 117 boxcox transform Add Box-Cox transform Oct 18, 2018
@kgryte kgryte added Feature Issue or pull request for adding a new feature. Math Issue or pull request specific to math functionality. labels Oct 18, 2018
@kgryte
Copy link
Member

kgryte commented Oct 18, 2018

@justin1dennison This is awesome! Thanks for working on this!!! Should I go ahead and review?

@justin1dennison
Copy link
Contributor Author

@kgryte A review would be awesome as I would like to work on the other related Box-Cox related packages. Thanks. Are the CI systems failing because of a malformed report or am I missing something in the logs?

Copy link
Member

@kgryte kgryte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justin1dennison Overall, great work! Most of the PR review is nitpicks and growing accustomed to project convention.

Re: CI. Both TravisCI and AppVeyor will fail atm due to failing tests on older platforms which have yet to be fixed. The main CI results to pay attention to for this PR are the results from CircleCI.

Thanks again for working on this!

ex = data[i].expected;
b = boxcox( x, y );
if ( b === ex ) {
t.strictEqual( b, ex, 'returns '+b+'when provided '+x+' and '+y+'.');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
t.strictEqual( b, ex, 'returns '+b+'when provided '+x+' and '+y+'.');
t.strictEqual( b, ex, 'returns '+b+'when provided '+x+' and '+y+'.' );

@justin1dennison
Copy link
Contributor Author

I have noticed there seems to be an issues with the tests as well. I will work on the above suggestions and resolving the testing issues as well. Thank you for the suggestions and review. Do I need to follow any procedure when the changes and fixes have been completed?

t.strictEqual( b, ex, 'returns '+b+'when provided '+x+' and '+y+'.');
} else {
delta = abs( ex - b );
tol = EPS * abs( ex );
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fixing the testing issues, I have a couple of tests that are failing because of tolerance issues. If I change to @stdlib/constants/math/float32-eps instead of @stdlib/constants/math/float64-eps, then the tests pass. What is the recommended course of action regarding these floating point comparison?

Copy link
Member

@kgryte kgryte Oct 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the confusion here. We really need to document this, as it has been a recurring question. The rationale for L105 is that we want to get a handle on how much different our implementations are from reference implementations in terms of ULPs (or the number of trailing bits which differ in the significand). Our proxy to this (for now) is to scale double-precision floating-point epsilon. Currently, L105 suggests that the tolerance is approximately one ULP (or within floating-point rounding error). When tests fail, we scale EPS by some real-valued multiple (e.g., 2.0 * EPS * abs( ex )) until tests pass. This does not have to be exact, but you can get a rough idea of how large the tolerance needs to be by looking at the test message output which should include both delta and tol. Based on those values, we can derive an approximate "minimum" tolerance required for tests to pass.

Obviously, this is a bit ad hoc, and we should invest some time into automating this.

@kgryte
Copy link
Member

kgryte commented Oct 19, 2018

@justin1dennison Thanks for resolving the PR suggestions! In terms of process, once the PR is ready for another review, we'll do another review to ensure we didn't miss anything. If everything is good, we'll indicate our intent to merge; otherwise, we'll provide another round of feedback and suggestions.

One thing I will do is update deps on develop. When I do, I'll ask you to pull in the most recent develop changes to your branch, updating your PR. Currently, one reason CircleCI (our canary build system) is failing is due to out-of-date deps. I'll ping shortly once I've updated develop.


def main():
"""Generate fixture data."""
x = np.arange(0.25, 5.0, 0.25)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justin1dennison One additional PR point: as a general principle, we typically avoid test values which follow patterns (e.g., in this case, values which increment by a fixed 0.25 increment). Why? While not necessarily applicable here, often tolerance issues (i.e., large deviations from reference implementations) can only be discovered by testing many different underlying bit patterns. Here, the underlying bit patterns have relatively low "entropy", meaning we are repeatedly testing similar underlying bit patterns.

Accordingly, we prefer, e.g., dividing input values into regimes (e.g., very small numbers, very large numbers, medium numbers, etc) and generating 1000 (enough to reasonably generate many different bit patterns, but not so large as to overwhelm test infrastructure) random values per regime. Dividing the input values into regimes ensures that we can probe different code paths and better understand how an implementation fares as we scale input values.

In this case, if I were writing this implementation, I would define two regimes: "small" values (e.g., < 1.0e-19) and everything else. If you wanted to be pedantic, because this is a multi-parameter function, you could test various regime combinations.

We don't have to mirror the regimes as defined by an implementation exactly, but we may want to follow the spirit. Even if we were to later change the underlying implementation, testing such regimes is useful as it allows us to gauge how well the new implementation handles different input value regimes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see that being an issue. Is there a place that I could take a look at a testing construct like you are discussing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justin1dennison See this file. While not using random, we try to sample different bit patterns by ensuring that increments are not "nice", which means, in this case, instead of 1000 values, generating 1007 values, thus making the increment (hopefully) an irrational number, or at least a number which does not have a "regular" underlying bit pattern.

@kgryte
Copy link
Member

kgryte commented Oct 20, 2018

@justin1dennison I updated develop, so feel free to pull in the latest changes on develop into your PR. Hopefully, this will resolve the build failure on CircleCI.

@justin1dennison justin1dennison force-pushed the rfc-117-boxcox-transform branch from f23d392 to e496e08 Compare October 20, 2018 03:46
@justin1dennison
Copy link
Contributor Author

@kgryte I think that I have resolved all of the suggestions that you made. Again, thanks for taking the time. Additionally, I changed the tests to fall in line with the linked file that you provided. I accidentally closed the pull request earlier because I fat fingered. If I need to fix anything else, please let me know.

@kgryte
Copy link
Member

kgryte commented Oct 22, 2018

@justin1dennison Awesome! Will take a final look shortly!

Copy link
Member

@kgryte kgryte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justin1dennison Great work! Thanks for working on this!

@kgryte kgryte merged commit 2370165 into stdlib-js:develop Oct 22, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Feature Issue or pull request for adding a new feature. Math Issue or pull request specific to math functionality.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RFC: compute the Box-Cox transformation

2 participants