[JSC] B3 can use undefined bits or not defined required bits when spilling

https://bugs.webkit.org/show_bug.cgi?id=156486

Patch by Benjamin Poulain <bpoulain@apple.com> on 2016-04-11
Reviewed by Filip Pizlo.

Spilling had issues when replacing arguments in place.

The problems are:
1) If we have a 32bit stack slot, an x86 instruction could still try to load 64 bits from it.
2) If we have a 64bit stackslot, Move32 would only set half the bits.
3) We were reducing Move to Move32 even if the top bits are read from the stack slot.

Case 1 appears with something like this:
    Move32 %tmp0, %tmp1
    Op64 %tmp1, %tmp2, %tmp3
When we spill %tmp1, the stack slot is 32bit since Move32 only sets 32 bits,
but Op64 supports addressing for %tmp1. When we substitute %tmp1 in Op64,
we create a 64bit read from a 32bit stack slot.

Case 2 is another common one. If we have:
    BB#1
        Move32 %tmp0, %tmp1
        Jump #3
    BB#2
        Op64 %tmp0, %tmp1
        Jump #3
    BB#3
        Use64 %tmp1

We have a stack slot of 64bits. When spilling %tmp1 in #1, we are
effectively doing a 32bit store on the stack slot, leaving the top bits undefined.

Case 3 is pretty much the same as case 2, except that we create the Move32 ourselves
because the source is a 32bit tmp with ZDef.

Case (1) is solved by requiring that the stack slot is at least as large as the largest
use/def of that tmp.
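
A minimal sketch of that sizing rule (simplified stand-in types; the real
code calls stackSlotMinimumWidth(m_tmpWidth.requiredWidth(tmp)) in the
diff below):

    #include <algorithm>
    #include <cstdint>

    // Simplified stand-in for Arg::Width, just for this sketch.
    enum class Width : uint8_t { Width8, Width16, Width32, Width64 };

    // requiredWidth() in AirTmpWidth.h returns the widest of all uses and defs.
    static Width requiredWidth(Width widestUse, Width widestDef)
    {
        return std::max(widestUse, widestDef);
    }

    // The spill slot is sized from that required width, so a 64bit access
    // (case 1) can never read past the end of the slot.
    static unsigned stackSlotMinimumWidth(Width width)
    {
        return width <= Width::Width32 ? 4 : 8;
    }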

Cases (2) and (3) are solved by not replacing a Tmp by an Address if the Def
is smaller than the stack slot.
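
A minimal sketch of that guard, with the coalescing machinery stripped
away (isAnyDef and byte widths stand in for Arg::isAnyDef(role) and
Arg::Width from the patch):

    // Hypothetical, simplified view of one argument of an instruction.
    struct ArgUse {
        bool isAnyDef;         // does this argument define the spilled tmp?
        unsigned widthInBytes; // bytes this instruction reads/writes through it
    };

    // Only fold the spilled Tmp into a stack Address when a def covers the
    // whole spill width; otherwise the substitution would leave the slot's
    // top bits undefined (cases 2 and 3), so the Tmp is kept and a
    // full-width store is emitted after the instruction instead.
    static bool canReplaceTmpWithAddress(const ArgUse& arg, unsigned spillWidthInBytes)
    {
        if (arg.isAnyDef && arg.widthInBytes < spillWidthInBytes)
            return false;
        return true;
    }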

* b3/air/AirIteratedRegisterCoalescing.cpp:
* b3/testb3.cpp:
(JSC::B3::testSpillDefSmallerThanUse):
(JSC::B3::testSpillUseLargerThanDef):
(JSC::B3::run):

git-svn-id: http://svn.webkit.org/repository/webkit/trunk@199337 268f45cc-cd09-0410-ab3c-d52691b4dbfc
commit-queue@webkit.org committed Apr 12, 2016
1 parent 7d9b5f7 commit 4c6df0b
Showing 4 changed files with 175 additions and 8 deletions.
49 changes: 49 additions & 0 deletions Source/JavaScriptCore/ChangeLog
@@ -1,3 +1,52 @@
2016-04-11 Benjamin Poulain <bpoulain@apple.com>

[JSC] B3 can use undefined bits or not defined required bits when spilling
https://bugs.webkit.org/show_bug.cgi?id=156486

Reviewed by Filip Pizlo.

Spilling had issues when replacing arguments in place.

The problems are:
1) If we have a 32bit stack slot, an x86 instruction could still try to load 64 bits from it.
2) If we have a 64bit stackslot, Move32 would only set half the bits.
3) We were reducing Move to Move32 even if the top bits are read from the stack slot.

Case 1 appears with something like this:
Move32 %tmp0, %tmp1
Op64 %tmp1, %tmp2, %tmp3
When we spill %tmp1, the stack slot is 32bit since Move32 only sets 32 bits,
but Op64 supports addressing for %tmp1. When we substitute %tmp1 in Op64,
we create a 64bit read from a 32bit stack slot.

Case 2 is another common one. If we have:
BB#1
Move32 %tmp0, %tmp1
Jump #3
BB#2
Op64 %tmp0, %tmp1
Jump #3
BB#3
Use64 %tmp1

We have a stack slot of 64bits. When spilling %tmp1 in #1, we are
effectively doing a 32bit store on the stack slot, leaving the top bits undefined.

Case 3 is pretty much the same as case 2, except that we create the Move32 ourselves
because the source is a 32bit tmp with ZDef.

Case (1) is solved by requiring that the stack slot is at least as large as the largest
use/def of that tmp.

Cases (2) and (3) are solved by not replacing a Tmp by an Address if the Def
is smaller than the stack slot.

* b3/air/AirIteratedRegisterCoalescing.cpp:
* b3/testb3.cpp:
(JSC::B3::testSpillDefSmallerThanUse):
(JSC::B3::testSpillUseLargerThanDef):
(JSC::B3::run):

2016-04-11 Brian Burg <bburg@apple.com>

Web Inspector: get rid of InspectorBasicValue and InspectorString subclasses
31 changes: 23 additions & 8 deletions Source/JavaScriptCore/b3/air/AirIteratedRegisterCoalescing.cpp
@@ -1419,6 +1419,11 @@ class IteratedRegisterCoalescing {
}
}

static unsigned stackSlotMinimumWidth(Arg::Width width)
{
return width <= Arg::Width32 ? 4 : 8;
}

template<Arg::Type type>
void addSpillAndFill(const ColoringAllocator<type>& allocator, HashSet<unsigned>& unspillableTmps)
{
@@ -1429,7 +1434,7 @@ class IteratedRegisterCoalescing {

// Allocate stack slot for each spilled value.
StackSlot* stackSlot = m_code.addStackSlot(
m_tmpWidth.width(tmp) <= Arg::Width32 ? 4 : 8, StackSlotKind::Spill);
stackSlotMinimumWidth(m_tmpWidth.requiredWidth(tmp)), StackSlotKind::Spill);
bool isNewTmp = stackSlots.add(tmp, stackSlot).isNewEntry;
ASSERT_UNUSED(isNewTmp, isNewTmp);
}
@@ -1447,30 +1452,39 @@ class IteratedRegisterCoalescing {
// only claim to read 32 bits from the source if only 32 bits of the destination are
// read. Note that we only apply this logic if this turns into a load or store, since
// Move is the canonical way to move data between GPRs.
bool forceMove32IfDidSpill = false;
bool canUseMove32IfDidSpill = false;
bool didSpill = false;
if (type == Arg::GP && inst.opcode == Move) {
if ((inst.args[0].isTmp() && m_tmpWidth.width(inst.args[0].tmp()) <= Arg::Width32)
|| (inst.args[1].isTmp() && m_tmpWidth.width(inst.args[1].tmp()) <= Arg::Width32))
forceMove32IfDidSpill = true;
canUseMove32IfDidSpill = true;
}

// Try to replace the register use by memory use when possible.
inst.forEachArg(
[&] (Arg& arg, Arg::Role, Arg::Type argType, Arg::Width width) {
[&] (Arg& arg, Arg::Role role, Arg::Type argType, Arg::Width width) {
if (arg.isTmp() && argType == type && !arg.isReg()) {
auto stackSlotEntry = stackSlots.find(arg.tmp());
if (stackSlotEntry != stackSlots.end()
&& inst.admitsStack(arg)) {

Arg::Width spillWidth = m_tmpWidth.requiredWidth(arg.tmp());
if (Arg::isAnyDef(role) && width < spillWidth)
return;
ASSERT(inst.opcode == Move || !(Arg::isAnyUse(role) && width > spillWidth));

if (spillWidth != Arg::Width32)
canUseMove32IfDidSpill = false;

stackSlotEntry->value->ensureSize(
forceMove32IfDidSpill ? 4 : Arg::bytes(width));
canUseMove32IfDidSpill ? 4 : Arg::bytes(width));
arg = Arg::stack(stackSlotEntry->value);
didSpill = true;
}
}
});

if (didSpill && forceMove32IfDidSpill)
if (didSpill && canUseMove32IfDidSpill)
inst.opcode = Move32;

// For every other case, add Load/Store as needed.
@@ -1488,9 +1502,9 @@ class IteratedRegisterCoalescing {
return;
}

Arg arg = Arg::stack(stackSlotEntry->value);
Arg::Width spillWidth = m_tmpWidth.requiredWidth(tmp);
Opcode move = Oops;
switch (stackSlotEntry->value->byteSize()) {
switch (stackSlotMinimumWidth(spillWidth)) {
case 4:
move = type == Arg::GP ? Move32 : MoveFloat;
break;
@@ -1505,6 +1519,7 @@ class IteratedRegisterCoalescing {
tmp = m_code.newTmp(type);
unspillableTmps.add(AbsoluteTmpMapper<type>::absoluteIndex(tmp));

Arg arg = Arg::stack(stackSlotEntry->value);
if (Arg::isAnyUse(role) && role != Arg::Scratch)
insertionSet.insert(instIndex, move, inst.origin, arg, tmp);
if (Arg::isAnyDef(role))
9 changes: 9 additions & 0 deletions Source/JavaScriptCore/b3/air/AirTmpWidth.h
@@ -60,6 +60,15 @@ class TmpWidth {
return std::min(iter->value.use, iter->value.def);
}

// Return the minimum required width for all defs/uses of this Tmp.
Arg::Width requiredWidth(Tmp tmp)
{
auto iter = m_width.find(tmp);
if (iter == m_width.end())
return Arg::minimumWidth(Arg(tmp).type());
return std::max(iter->value.use, iter->value.def);
}

// This indirectly tells you how much of the tmp's high bits are guaranteed to be zero. The number of
// high bits that are zero are:
//
94 changes: 94 additions & 0 deletions Source/JavaScriptCore/b3/testb3.cpp
@@ -11408,6 +11408,97 @@ void testPatchpointDoubleRegs()
CHECK(invoke<double>(*code, 42.5) == 42.5);
}

void testSpillDefSmallerThanUse()
{
Procedure proc;
BasicBlock* root = proc.addBlock();

// Move32.
Value* arg32 = root->appendNew<Value>(
proc, Trunc, Origin(),
root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0));
Value* arg64 = root->appendNew<Value>(proc, ZExt32, Origin(), arg32);

// Make sure arg64 is on the stack.
PatchpointValue* forceSpill = root->appendNew<PatchpointValue>(proc, Int64, Origin());
RegisterSet clobberSet = RegisterSet::allGPRs();
clobberSet.exclude(RegisterSet::stackRegisters());
clobberSet.exclude(RegisterSet::reservedHardwareRegisters());
clobberSet.clear(GPRInfo::returnValueGPR); // Force the return value for aliasing below.
forceSpill->clobberLate(clobberSet);
forceSpill->setGenerator(
[&] (CCallHelpers& jit, const StackmapGenerationParams& params) {
AllowMacroScratchRegisterUsage allowScratch(jit);
jit.xor64(params[0].gpr(), params[0].gpr());
});

// On x86, Sub admits an address for any operand. If it uses the stack, the top bits must be zero.
Value* result = root->appendNew<Value>(proc, Sub, Origin(), forceSpill, arg64);
root->appendNew<ControlValue>(proc, Return, Origin(), result);

auto code = compile(proc);
CHECK(invoke<int64_t>(*code, 0xffffffff00000000) == 0);
}

void testSpillUseLargerThanDef()
{
Procedure proc;
BasicBlock* root = proc.addBlock();
BasicBlock* thenCase = proc.addBlock();
BasicBlock* elseCase = proc.addBlock();
BasicBlock* tail = proc.addBlock();

RegisterSet clobberSet = RegisterSet::allGPRs();
clobberSet.exclude(RegisterSet::stackRegisters());
clobberSet.exclude(RegisterSet::reservedHardwareRegisters());

Value* condition = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR0);
Value* argument = root->appendNew<ArgumentRegValue>(proc, Origin(), GPRInfo::argumentGPR1);
root->appendNew<ControlValue>(
proc, Branch, Origin(),
root->appendNew<Value>(
proc, Trunc, Origin(),
condition),
FrequentedBlock(thenCase), FrequentedBlock(elseCase));

Value* truncated = thenCase->appendNew<Value>(proc, ZExt32, Origin(),
thenCase->appendNew<Value>(proc, Trunc, Origin(), argument));
UpsilonValue* thenResult = thenCase->appendNew<UpsilonValue>(proc, Origin(), truncated);
thenCase->appendNew<ControlValue>(proc, Jump, Origin(), FrequentedBlock(tail));

UpsilonValue* elseResult = elseCase->appendNew<UpsilonValue>(proc, Origin(), argument);
elseCase->appendNew<ControlValue>(proc, Jump, Origin(), FrequentedBlock(tail));

for (unsigned i = 0; i < 100; ++i) {
PatchpointValue* preventTailDuplication = tail->appendNew<PatchpointValue>(proc, Void, Origin());
preventTailDuplication->clobberLate(clobberSet);
preventTailDuplication->setGenerator([] (CCallHelpers&, const StackmapGenerationParams&) { });
}

PatchpointValue* forceSpill = tail->appendNew<PatchpointValue>(proc, Void, Origin());
forceSpill->clobberLate(clobberSet);
forceSpill->setGenerator(
[&] (CCallHelpers& jit, const StackmapGenerationParams&) {
AllowMacroScratchRegisterUsage allowScratch(jit);
clobberSet.forEach([&] (Reg reg) {
jit.move(CCallHelpers::TrustedImm64(0xffffffffffffffff), reg.gpr());
});
});

Value* phi = tail->appendNew<Value>(proc, Phi, Int64, Origin());
thenResult->setPhi(phi);
elseResult->setPhi(phi);
tail->appendNew<ControlValue>(proc, Return, Origin(), phi);

auto code = compile(proc);
CHECK(invoke<uint64_t>(*code, 1, 0xffffffff00000000) == 0);
CHECK(invoke<uint64_t>(*code, 0, 0xffffffff00000000) == 0xffffffff00000000);

// A second time since the previous run is still on the stack.
CHECK(invoke<uint64_t>(*code, 1, 0xffffffff00000000) == 0);

}

// Make sure the compiler does not try to optimize anything out.
NEVER_INLINE double zero()
{
@@ -12784,6 +12875,9 @@ void run(const char* filter)

RUN(testPatchpointDoubleRegs());

RUN(testSpillDefSmallerThanUse());
RUN(testSpillUseLargerThanDef());

if (tasks.isEmpty())
usage();

