Skip to content

Commit

Permalink
st/nine: Clamp RCP when 0*inf!=0
Browse files Browse the repository at this point in the history
Tests done on several devices of all 3 vendors and
of different generations showed that there are several
ways of handling infs and NaN for d3d9.

Tests showed Intel on windows does always clamp
RCP, RSQ and LOG (thus preventing inf/nan generation),
for all shader versions (some vendor behaviours vary
with shader versions).
Doing this in nine avoids 0*inf issues for drivers
that can't generate 0*inf=0 (which is controled by
TGSI's MUL_ZERO_WINS).

For now clamp for all drivers. An ulterior optimization
would be to avoid clamping for drivers with MUL_ZERO_WINS
for the specific shader versions where NV or AMD don't
clamp.

LOG and RSQ being already clamped, this patch only
clamps RCP.

Fixes: #316

Signed-off-by: Axel Davy <davyaxel0@gmail.com>
CC: <mesa-stable@lists.freedesktop.org>
  • Loading branch information
axeldavy committed Sep 9, 2018
1 parent 882ed53 commit 478545e
Showing 1 changed file with 13 additions and 1 deletion.
14 changes: 13 additions & 1 deletion src/gallium/state_trackers/nine/nine_shader.c
Original file line number Diff line number Diff line change
Expand Up @@ -2273,6 +2273,18 @@ DECL_SPECIAL(POW)
return D3D_OK;
}

DECL_SPECIAL(RCP)
{
struct ureg_program *ureg = tx->ureg;
struct ureg_dst dst = tx_dst_param(tx, &tx->insn.dst[0]);
struct ureg_src src = tx_src_param(tx, &tx->insn.src[0]);
struct ureg_dst tmp = tx_scratch(tx);
ureg_RCP(ureg, tmp, src);
ureg_MIN(ureg, tmp, ureg_imm1f(ureg, FLT_MAX), ureg_src(tmp));
ureg_MAX(ureg, dst, ureg_imm1f(ureg, -FLT_MAX), ureg_src(tmp));
return D3D_OK;
}

DECL_SPECIAL(RSQ)
{
struct ureg_program *ureg = tx->ureg;
Expand Down Expand Up @@ -2909,7 +2921,7 @@ static const struct sm1_op_info inst_table[] =
_OPI(SUB, NOP, V(0,0), V(3,0), V(0,0), V(3,0), 1, 2, SPECIAL(SUB)), /* 3 */
_OPI(MAD, MAD, V(0,0), V(3,0), V(0,0), V(3,0), 1, 3, NULL), /* 4 */
_OPI(MUL, MUL, V(0,0), V(3,0), V(0,0), V(3,0), 1, 2, NULL), /* 5 */
_OPI(RCP, RCP, V(0,0), V(3,0), V(0,0), V(3,0), 1, 1, NULL), /* 6 */
_OPI(RCP, RCP, V(0,0), V(3,0), V(0,0), V(3,0), 1, 1, SPECIAL(RCP)), /* 6 */
_OPI(RSQ, RSQ, V(0,0), V(3,0), V(0,0), V(3,0), 1, 1, SPECIAL(RSQ)), /* 7 */
_OPI(DP3, DP3, V(0,0), V(3,0), V(0,0), V(3,0), 1, 2, NULL), /* 8 */
_OPI(DP4, DP4, V(0,0), V(3,0), V(0,0), V(3,0), 1, 2, NULL), /* 9 */
Expand Down

0 comments on commit 478545e

Please sign in to comment.