New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix cellnoise bug #912

Merged
merged 2 commits into from Sep 5, 2018

Conversation

Projects
None yet
2 participants
@lgritz
Member

lgritz commented Aug 29, 2018

WARNING: possible visible change in procedural patterns

It's expected that cellnoise(x) == cellnoise(floor(x)), but we found an
ugly bug wherein cellnoise internally used a helper function
quick_floor, which is supposed to take the floor of a float argument and
return its integer equivalent. But unfortunately it had a really odd
error: when x was an exact negative integer, it gave the value for the
next smaller integer.

quick_floor( 1.00) ==  1
quick_floor( 0.01) ==  0
quick_floor( 0.00) ==  0
quick_floor(-0.01) == -1
quick_floor(-0.99) == -1
quick_floor(-1.00) == -2    // WRONG!!! Should be -1
quick_floor(-1.01) == -2

This in turn caused cellnoise lookups of exact negative integer values
to be incorrect lookups of the next lower-numbered cell, i.e.,
cellnoise(i) having the same value as cellnoise(i-epsilon), but again,
only for i<0 and i==floor(i) exact integer values.

Now, if your shader made a visual representation of callnoise(x,y), it
would look just fine, unless a particular x,y fell exactly on integer
coordinates, in which case it would be ever so slightly off (by being
the color of the square immedately to the left instead of to the right,
as it should be, say). Nobody would ever notice that.

Also, if your shader took the floor before passing to cellnoise, like:
cellnoise(floor(x),floor(y)), it would also look correct, though the
negative values would be wrong, since cellnoise(-1,-1) would have
returned the value that should have been cellnoise(-2,-2), etc., but
you never would have noticed that.

But... there were some possible uses of cellnoise where it was behaving
very strangely, and the main place you would notice this is certain
types of "voronoi" noises where you are lookup up cellnoise values while
looping over adjacent lattice points. (Thanks, Zap, for noticing this,
and so sorry that there was a legit bug here!)

It was a very slight possible problem for perlin style noise as well,
but since those noises are "continuous" and defined by gradients, you
really, really, would not have known that the occasional pixel had noise
results that were ever so slightly off in its derivatives only at those
isolated integer lattice points. I think no visible artifacts were
discernable under real-world conditions.

The solution is to bite the bullet and replace the clever (but extremely
minor) speedup for computing floor, with a legit floor call. It's ever
so slightly more expensive if you isolate the cost of just raw cellnoise
calls, but in the context of a real render I don't think you'll be able
to measure it.

WARNING: This may change the appearance of certain "Voronoi" patterns,
or possibly other uses of cellnoise as random number sequences. It also
changes the appearance of "Gabor" noise, which internally also used the
faulty quick_floor in such a way that it affects the whole pattern.

Note that this should NOT change the appearance of any of the
perlin-style noises, and you may not notice any change for most ordinary
uses of cellnoise, either.

I changed the oslnoise unit tests to check that this is working
properly, and updated all of the noise unit tests that generate images
to "center" the patterns so that they visibly display the negative
regions of the noise domain (they had all shown only positive domain,
which is probably inviting trouble if there was ever a noise bug that
only affects what happens for negative values or right on the axes).
This caused a lot of the reference images to change, even for the tests
of noise patterns that were unaffected. Also, I took the chance to merge
a few of the signed/unsigned tests for less repetition.

I'm very sorry about this; changing the appearance of noise functions is
not something we take lightly, we do it only as a last resort. It was
clearly wrong before and had some very unexpected failure modes. But I'm
only making this fix in master, which is destined to be "OSL 2.0", so I
figure it's as good a time as any to make a change that could possibly
make visible changes in some shaders.

If you see this as some kind of major betrayal that will screw up your
productions, let me know and if pressed I can perhaps make some kind of
emulation of the old buggy behavior for people who need to preserve it.
But I'd rather not do that if it's not absolutely necessary.

@lgritz lgritz force-pushed the lgritz:lg-noisebug branch from ffd427b to 23ff5fa Aug 29, 2018

@@ -145,24 +145,13 @@ namespace {
// return the greatest integer <= x
inline OSL_HOSTDEVICE int quick_floor (float x) {

This comment has been minimized.

@fpsunflower

fpsunflower Aug 29, 2018

Collaborator

Is there any point in having this wrapper now?

@@ -145,24 +145,13 @@ namespace {
// return the greatest integer <= x
inline OSL_HOSTDEVICE int quick_floor (float x) {
return (int) x - ((x < 0) ? 1 : 0);
return (int)floorf(x);
}
#ifndef __CUDA_ARCH__
// return the greatest integer <= x, for 4 values at once
OIIO_FORCEINLINE int4 quick_floor (const float4& x) {

This comment has been minimized.

@fpsunflower

fpsunflower Aug 29, 2018

Collaborator

Same here ... why not just call ifloor directly everywhere?

This comment has been minimized.

@lgritz

lgritz Aug 30, 2018

Member

OK, updated the PR with this rename as requested. The reason I didn't include it at first was that it swamps and obscures the actual bug fix, making the review confusing. But if you like the rename, I'll merge it together.

@fpsunflower

This comment has been minimized.

Collaborator

fpsunflower commented Aug 29, 2018

Otherwise LGTM!

@lgritz lgritz force-pushed the lgritz:lg-noisebug branch from 23ff5fa to e3f6ebd Aug 29, 2018

@fpsunflower

This comment has been minimized.

Collaborator

fpsunflower commented Aug 30, 2018

LGTM!

I like this way better not just because it tidies up the redundant name, but it also makes sure you saw all the places that were actually using the buggy function.

@lgritz

This comment has been minimized.

Member

lgritz commented Aug 30, 2018

Hmmm... I guess the other potential reason to leave it as it was is if somebody chimes in later with a "holy crap, I need the old pattern", it would make it very easy to have a build-time switch that restores the old behavior. Not so easy once everything is changed to ifloor.

@lgritz lgritz force-pushed the lgritz:lg-noisebug branch from ebe1b6c to dd6d85a Sep 4, 2018

lgritz added some commits Aug 28, 2018

Fix cellnoise bug (912)
WARNING: possible visible change in procedural patterns

It's expected that cellnoise(x) == cellnoise(floor(x)), but we found an
ugly bug wherein cellnoise internally used a helper function
quick_floor, which is supposed to take the floor of a float argument and
return its integer equivalent. But unfortunately it had a really odd
error: when x was an exact negative integer, it gave the value for the
next smaller integer.

    quick_floor( 1.00) ==  1
    quick_floor( 0.01) ==  0
    quick_floor( 0.00) ==  0
    quick_floor(-0.01) == -1
    quick_floor(-0.99) == -1
    quick_floor(-1.00) == -2    // WRONG!!! Should be -1
    quick_floor(-1.01) == -2

This in turn caused cellnoise lookups of exact negative integer values
to be incorrect lookups of the next lower-numbered cell, i.e.,
cellnoise(i) having the same value as cellnoise(i-epsilon), but again,
only for i<0 and i==floor(i) exact integer values.

Now, if your shader made a visual representation of callnoise(x,y), it
would look just fine, unless a particular x,y fell *exactly* on integer
coordinates, in which case it would be ever so slightly off (by being
the color of the square immedately to the left instead of to the right,
as it should be, say).  Nobody would ever notice that.

Also, if your shader took the floor before passing to cellnoise, like:
cellnoise(floor(x),floor(y)), it would also *look* correct, though the
negative values would be wrong, since cellnoise(-1,-1) would have
returned the value that should have been cellnoise(-2,-2), etc., but
you never would have noticed that.

But... there were some possible uses of cellnoise where it was behaving
very strangely, and the main place you would notice this is certain
types of "voronoi" noises where you are lookup up cellnoise values while
looping over adjacent lattice points. (Thanks, Zap, for noticing this,
and so sorry that there was a legit bug here!)

It was a very slight possible problem for perlin style noise as well,
but since those noises are "continuous" and defined by gradients, you
really, really, would not have known that the occasional pixel had noise
results that were ever so slightly off in its derivatives only at those
isolated integer lattice points. I think no visible artifacts were
discernable under real-world conditions.

The solution is to bite the bullet and replace the clever (but extremely
minor) speedup for computing floor, with a legit floor call.  It's ever
so slightly more expensive if you isolate the cost of just raw cellnoise
calls, but in the context of a real render I don't think you'll be able
to measure it.

WARNING: This may change the appearance of certain "Voronoi" patterns,
or possibly other uses of cellnoise as random number sequences. It also
changes the appearance of "Gabor" noise, which internally also used the
faulty quick_floor in such a way that it affects the whole pattern.

Note that this should NOT change the appearance of any of the
perlin-style noises, and you may not notice any change for most ordinary
uses of cellnoise, either.

I changed the oslnoise unit tests to check that this is working
properly, and updated all of the noise unit tests that generate images
to "center" the patterns so that they visibly display the negative
regions of the noise domain (they had all shown only positive domain,
which is probably inviting trouble if there was ever a noise bug that
only affects what happens for negative values or right on the axes).
This caused a lot of the reference images to change, even for the tests
of noise patterns that were unaffected. Also, I took the chance to merge
a few of the signed/unsigned tests for less repetition.

I'm very sorry about this; changing the appearance of noise functions is
not something we take lightly, we do it only as a last resort. It was
clearly wrong before and had some very unexpected failure modes. But I'm
only making this fix in master, which is destined to be "OSL 2.0", so I
figure it's as good a time as any to make a change that could possibly
make visible changes in some shaders.

If you see this as some kind of major betrayal that will screw up your
productions, let me know and if pressed I can perhaps make some kind of
emulation of the old buggy behavior for people who need to preserve it.
But I'd rather not do that if it's not absolutely necessary.

@lgritz lgritz force-pushed the lgritz:lg-noisebug branch from dd6d85a to 5ec807c Sep 5, 2018

@lgritz lgritz merged commit 5ec807c into imageworks:master Sep 5, 2018

0 of 3 checks passed

continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
continuous-integration/travis-ci/pr The Travis CI build is in progress
Details
continuous-integration/travis-ci/push The Travis CI build is in progress
Details

@lgritz lgritz deleted the lgritz:lg-noisebug branch Sep 14, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment