New bmi instructions #76

Closed
wants to merge 13 commits into from
2 changes: 2 additions & 0 deletions compiler/cmm/CmmMachOp.hs
@@ -577,6 +577,8 @@ data CallishMachOp
| MO_Memmove Int

| MO_PopCnt Width
| MO_Pdep Width
| MO_Pext Width
| MO_Clz Width
| MO_Ctz Width

10 changes: 10 additions & 0 deletions compiler/cmm/CmmParse.y
@@ -1003,6 +1003,16 @@ callishMachOps = listToUFM $
( "popcnt32", (,) $ MO_PopCnt W32 ),
( "popcnt64", (,) $ MO_PopCnt W64 ),

( "pdep8", (,) $ MO_Pdep W8 ),
( "pdep16", (,) $ MO_Pdep W16 ),
( "pdep32", (,) $ MO_Pdep W32 ),
( "pdep64", (,) $ MO_Pdep W64 ),

( "pext8", (,) $ MO_Pext W8 ),
( "pext16", (,) $ MO_Pext W16 ),
( "pext32", (,) $ MO_Pext W32 ),
( "pext64", (,) $ MO_Pext W64 ),

( "cmpxchg8", (,) $ MO_Cmpxchg W8 ),
( "cmpxchg16", (,) $ MO_Cmpxchg W16 ),
( "cmpxchg32", (,) $ MO_Cmpxchg W32 ),
2 changes: 2 additions & 0 deletions compiler/cmm/PprC.hs
@@ -784,6 +784,8 @@ pprCallishMachOp_for_C mop
MO_Memmove _ -> text "memmove"
(MO_BSwap w) -> ptext (sLit $ bSwapLabel w)
(MO_PopCnt w) -> ptext (sLit $ popCntLabel w)
(MO_Pext w) -> ptext (sLit $ pextLabel w)
(MO_Pdep w) -> ptext (sLit $ pdepLabel w)
(MO_Clz w) -> ptext (sLit $ clzLabel w)
(MO_Ctz w) -> ptext (sLit $ ctzLabel w)
(MO_AtomicRMW w amop) -> ptext (sLit $ atomicRMWLabel w amop)
78 changes: 78 additions & 0 deletions compiler/codeGen/StgCmmPrim.hs
@@ -580,6 +580,20 @@ emitPrimOp _ [res] PopCnt32Op [w] = emitPopCntCall res w W32
emitPrimOp _ [res] PopCnt64Op [w] = emitPopCntCall res w W64
emitPrimOp dflags [res] PopCntOp [w] = emitPopCntCall res w (wordWidth dflags)

-- Parallel bit deposit
emitPrimOp _ [res] Pdep8Op [w] = emitPdepCall res w W8
emitPrimOp _ [res] Pdep16Op [w] = emitPdepCall res w W16
emitPrimOp _ [res] Pdep32Op [w] = emitPdepCall res w W32
emitPrimOp _ [res] Pdep64Op [w] = emitPdepCall res w W64
emitPrimOp dflags [res] PdepOp [w] = emitPdepCall res w (wordWidth dflags)

-- Parallel bit extract
emitPrimOp _ [res] Pext8Op [w] = emitPextCall res w W8
emitPrimOp _ [res] Pext16Op [w] = emitPextCall res w W16
emitPrimOp _ [res] Pext32Op [w] = emitPextCall res w W32
emitPrimOp _ [res] Pext64Op [w] = emitPextCall res w W64
emitPrimOp dflags [res] PextOp [w] = emitPextCall res w (wordWidth dflags)

-- count leading zeros
emitPrimOp _ [res] Clz8Op [w] = emitClzCall res w W8
emitPrimOp _ [res] Clz16Op [w] = emitClzCall res w W16
@@ -861,6 +875,56 @@ callishPrimOpSupported dflags op
|| llvm -> Left MO_F64_Fabs
| otherwise -> Right $ genericFabsOp W64

Pdep8Op | (ncg && (x86ish
|| ppc))
|| llvm -> Left (MO_Pdep (wordWidth dflags))
| otherwise -> error "TODO: Implement (Right genericPdep8Op)"

Pdep16Op | (ncg && (x86ish
|| ppc))
|| llvm -> Left (MO_Pdep (wordWidth dflags))
| otherwise -> error "TODO: Implement (Right genericPdep16Op)"

Pdep32Op | (ncg && (x86ish
|| ppc))
|| llvm -> Left (MO_Pdep (wordWidth dflags))
| otherwise -> error "TODO: Implement (Right genericPdep32Op)"

Pdep64Op | (ncg && (x86ish
|| ppc))
|| llvm -> Left (MO_Pdep (wordWidth dflags))
| otherwise -> error "TODO: Implement (Right genericPdep64Op)"

PdepOp | (ncg && (x86ish
|| ppc))
|| llvm -> Left (MO_Pdep (wordWidth dflags))
| otherwise -> error "TODO: Implement (Right genericPdepOp)"

Pext8Op | (ncg && (x86ish
|| ppc))
|| llvm -> Left (MO_Pext (wordWidth dflags))
| otherwise -> error "TODO: Implement (Right genericPext8Op)"

Pext16Op | (ncg && (x86ish
|| ppc))
|| llvm -> Left (MO_Pext (wordWidth dflags))
| otherwise -> error "TODO: Implement (Right genericPext16Op)"

Pext32Op | (ncg && (x86ish
|| ppc))
|| llvm -> Left (MO_Pext (wordWidth dflags))
| otherwise -> error "TODO: Implement (Right genericPext32Op)"

Pext64Op | (ncg && (x86ish
|| ppc))
|| llvm -> Left (MO_Pext (wordWidth dflags))
| otherwise -> error "TODO: Implement (Right genericPext64Op)"

PextOp | (ncg && (x86ish
|| ppc))
|| llvm -> Left (MO_Pext (wordWidth dflags))
| otherwise -> error "TODO: Implement (Right genericPextOp)"

Contributor Author

I added the above because it seemed like it would fix my compile errors. It's not clear to me that this is the correct approach, so I've put it here to ask whether it makes sense.

If it is the correct thing to do, I can start working on the error "TODO: ..." bits; otherwise I'll revert.
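
For reference when filling in the generic fallbacks, here is a minimal sketch of what the two operations compute, written as plain Haskell on Word64. The names pdepRef and pextRef are hypothetical and not part of this patch, and the two-argument (source, mask) shape follows the three-operand description of the instructions given later in this thread; a real fallback would not necessarily look like this.

```haskell
import Data.Bits (shiftL, shiftR, testBit, (.|.))
import Data.Word (Word64)

-- | Parallel bit deposit (reference sketch, hypothetical name).
-- Scatters the low-order bits of the source into the positions where
-- the mask has a 1 bit; all other result bits are 0.
pdepRef :: Word64 -> Word64 -> Word64
pdepRef src mask = go src mask 1 0
  where
    -- 'bit' marks the mask position currently being examined.
    go _ 0 _ acc = acc
    go s m bit acc
      | testBit m 0 = go (s `shiftR` 1) (m `shiftR` 1) (bit `shiftL` 1)
                         (if testBit s 0 then acc .|. bit else acc)
      | otherwise   = go s (m `shiftR` 1) (bit `shiftL` 1) acc

-- | Parallel bit extract (reference sketch, hypothetical name).
-- Gathers the source bits found at the mask's 1 positions and packs
-- them into the low-order bits of the result.
pextRef :: Word64 -> Word64 -> Word64
pextRef src mask = go src mask 1 0
  where
    -- 'bit' marks the next free low-order position in the result.
    go _ 0 _ acc = acc
    go s m bit acc
      | testBit m 0 = go (s `shiftR` 1) (m `shiftR` 1) (bit `shiftL` 1)
                         (if testBit s 0 then acc .|. bit else acc)
      | otherwise   = go (s `shiftR` 1) (m `shiftR` 1) bit acc
```

For example, with mask 26 (bits 1, 3 and 4 set), pdepRef 5 26 gives 18 and pextRef 18 26 gives 5, so extracting with the same mask undoes the deposit.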

Contributor

I would comment, but unfortunately I can't view the patch. Have you force-pushed by any chance?

Also, we generally use panic rather than error.

Contributor Author

Yes, I have force-pushed. Did that break things for you? Should I avoid doing that?

Contributor

It seems that GitHub's pull request mechanism doesn't allow you to view old commits after a force-push. Try clicking the "View changes" button above.

Contributor Author

Ah yes, I see it is empty.

Contributor Author

Looks like the change can only be seen from the "Files Changed" tab.

_ -> pprPanic "emitPrimOp: can't translate PrimOp " (ppr op)
where
ncg = case hscTarget dflags of
@@ -2227,6 +2291,20 @@ emitPopCntCall res x width = do
(MO_PopCnt width)
[ x ]

emitPdepCall :: LocalReg -> CmmExpr -> Width -> FCode ()
emitPdepCall res x width = do
emitPrimCall
[ res ]
(MO_Pdep width)
[ x ]

emitPextCall :: LocalReg -> CmmExpr -> Width -> FCode ()
emitPextCall res x width = do
emitPrimCall
[ res ]
(MO_Pext width)
[ x ]

emitClzCall :: LocalReg -> CmmExpr -> Width -> FCode ()
emitClzCall res x width = do
emitPrimCall
6 changes: 6 additions & 0 deletions compiler/llvmGen/LlvmCodeGen/CodeGen.hs
@@ -216,6 +216,10 @@ genCall t@(PrimTarget (MO_Prefetch_Data localityInt)) [] args
-- and return types
genCall t@(PrimTarget (MO_PopCnt w)) dsts args =
genCallSimpleCast w t dsts args
genCall t@(PrimTarget (MO_Pdep w)) dsts args =
genCallSimpleCast w t dsts args
genCall t@(PrimTarget (MO_Pext w)) dsts args =
genCallSimpleCast w t dsts args
genCall t@(PrimTarget (MO_Clz w)) dsts args =
genCallSimpleCast w t dsts args
genCall t@(PrimTarget (MO_Ctz w)) dsts args =
@@ -728,6 +732,8 @@ cmmPrimOpFunctions mop = do
MO_Memset _ -> fsLit $ "llvm.memset." ++ intrinTy2

(MO_PopCnt w) -> fsLit $ "llvm.ctpop." ++ showSDoc dflags (ppr $ widthToLlvmInt w)
(MO_Pdep w) -> fsLit $ "llvm.pdep." ++ showSDoc dflags (ppr $ widthToLlvmInt w)
(MO_Pext w) -> fsLit $ "llvm.pext." ++ showSDoc dflags (ppr $ widthToLlvmInt w)
(MO_BSwap w) -> fsLit $ "llvm.bswap." ++ showSDoc dflags (ppr $ widthToLlvmInt w)
(MO_Clz w) -> fsLit $ "llvm.ctlz." ++ showSDoc dflags (ppr $ widthToLlvmInt w)
(MO_Ctz w) -> fsLit $ "llvm.cttz." ++ showSDoc dflags (ppr $ widthToLlvmInt w)
26 changes: 26 additions & 0 deletions compiler/main/DynFlags.hs
@@ -148,6 +148,8 @@ module DynFlags (
isSseEnabled,
isSse2Enabled,
isSse4_2Enabled,
isBmiEnabled,
isBmi2Enabled,
isAvxEnabled,
isAvx2Enabled,
isAvx512cdEnabled,
@@ -928,6 +930,7 @@ data DynFlags = DynFlags {

-- | Machine dependent flags (-m<blah> stuff)
sseVersion :: Maybe SseVersion,
bmiVersion :: Maybe BmiVersion,
avx :: Bool,
avx2 :: Bool,
avx512cd :: Bool, -- Enable AVX-512 Conflict Detection Instructions.
@@ -3114,6 +3117,10 @@ dynamic_flags_deps = [
d { sseVersion = Just SSE4 }))
, make_ord_flag defGhcFlag "msse4.2" (noArg (\d ->
d { sseVersion = Just SSE42 }))
, make_ord_flag defGhcFlag "mbmi" (noArg (\d ->
d { bmiVersion = Just BMI1 }))
, make_ord_flag defGhcFlag "mbmi2" (noArg (\d ->
d { bmiVersion = Just BMI2 }))
, make_ord_flag defGhcFlag "mavx" (noArg (\d -> d { avx = True }))
, make_ord_flag defGhcFlag "mavx2" (noArg (\d -> d { avx2 = True }))
, make_ord_flag defGhcFlag "mavx512cd" (noArg (\d ->
@@ -5346,6 +5353,25 @@ isAvx512fEnabled dflags = avx512f dflags
isAvx512pfEnabled :: DynFlags -> Bool
isAvx512pfEnabled dflags = avx512pf dflags

-- -----------------------------------------------------------------------------
-- BMI2

data BmiVersion = BMI1
| BMI2
deriving (Eq, Ord)

isBmiEnabled :: DynFlags -> Bool
isBmiEnabled dflags = case platformArch (targetPlatform dflags) of
ArchX86_64 -> bmiVersion dflags >= Just BMI1
ArchX86 -> bmiVersion dflags >= Just BMI1
_ -> False

isBmi2Enabled :: DynFlags -> Bool
isBmi2Enabled dflags = case platformArch (targetPlatform dflags) of
ArchX86_64 -> bmiVersion dflags >= Just BMI2
ArchX86 -> bmiVersion dflags >= Just BMI2
_ -> False

-- -----------------------------------------------------------------------------
-- Linker/compiler information

20 changes: 20 additions & 0 deletions compiler/nativeGen/CPrim.hs
@@ -5,6 +5,8 @@ module CPrim
, atomicRMWLabel
, cmpxchgLabel
, popCntLabel
, pdepLabel
, pextLabel
, bSwapLabel
, clzLabel
, ctzLabel
@@ -24,6 +26,24 @@ popCntLabel w = "hs_popcnt" ++ pprWidth w
pprWidth W64 = "64"
pprWidth w = pprPanic "popCntLabel: Unsupported word width " (ppr w)

pdepLabel :: Width -> String
pdepLabel w = "hs_pdep" ++ pprWidth w
where
pprWidth W8 = "8"
pprWidth W16 = "16"
pprWidth W32 = "32"
pprWidth W64 = "64"
pprWidth w = pprPanic "pdepLabel: Unsupported word width " (ppr w)

pextLabel :: Width -> String
pextLabel w = "hs_pext" ++ pprWidth w
where
pprWidth W8 = "8"
pprWidth W16 = "16"
pprWidth W32 = "32"
pprWidth W64 = "64"
pprWidth w = pprPanic "pextLabel: Unsupported word width " (ppr w)

bSwapLabel :: Width -> String
bSwapLabel w = "hs_bswap" ++ pprWidth w
where
2 changes: 2 additions & 0 deletions compiler/nativeGen/PPC/CodeGen.hs
@@ -1907,6 +1907,8 @@ genCCall' dflags gcp target dest_regs args

MO_BSwap w -> (fsLit $ bSwapLabel w, False)
MO_PopCnt w -> (fsLit $ popCntLabel w, False)
MO_Pdep w -> (fsLit $ pdepLabel w, False)
MO_Pext w -> (fsLit $ pextLabel w, False)
MO_Clz w -> (fsLit $ clzLabel w, False)
MO_Ctz w -> (fsLit $ ctzLabel w, False)
MO_AtomicRMW w amop -> (fsLit $ atomicRMWLabel w amop, False)
2 changes: 2 additions & 0 deletions compiler/nativeGen/SPARC/CodeGen.hs
@@ -652,6 +652,8 @@ outOfLineMachOp_table mop

MO_BSwap w -> fsLit $ bSwapLabel w
MO_PopCnt w -> fsLit $ popCntLabel w
MO_Pdep w -> fsLit $ pdepLabel w
MO_Pext w -> fsLit $ pextLabel w
MO_Clz w -> fsLit $ clzLabel w
MO_Ctz w -> fsLit $ ctzLabel w
MO_AtomicRMW w amop -> fsLit $ atomicRMWLabel w amop
63 changes: 63 additions & 0 deletions compiler/nativeGen/X86/CodeGen.hs
@@ -1853,6 +1853,66 @@ genCCall dflags is32Bit (PrimTarget (MO_PopCnt width)) dest_regs@[dst]
format = intFormat width
lbl = mkCmmCodeLabel primUnitId (fsLit (popCntLabel width))

genCCall dflags is32Bit (PrimTarget (MO_Pdep width)) dest_regs@[dst]
args@[src] = do
let platform = targetPlatform dflags
if isBmi2Enabled dflags
then do code_src <- getAnyReg src
src_r <- getNewRegNat format
let dst_r = getRegisterReg platform False (CmmLocal dst)
return $ code_src src_r `appOL`
(if width == W8 then
-- The PDEP instruction doesn't take a r/m8
unitOL (MOVZxL II8 (OpReg src_r) (OpReg src_r)) `appOL`
unitOL (PDEP II16 (OpReg src_r) dst_r)
else
unitOL (PDEP format (OpReg src_r) dst_r)) `appOL`
(if width == W8 || width == W16 then
-- We used a 16-bit destination register above,
-- so zero-extend
unitOL (MOVZxL II16 (OpReg dst_r) (OpReg dst_r))
else nilOL)
else do
targetExpr <- cmmMakeDynamicReference dflags
CallReference lbl
let target = ForeignTarget targetExpr (ForeignConvention CCallConv
[NoHint] [NoHint]
CmmMayReturn)
genCCall dflags is32Bit target dest_regs args
where
format = intFormat width
lbl = mkCmmCodeLabel primUnitId (fsLit (pdepLabel width))

genCCall dflags is32Bit (PrimTarget (MO_Pext width)) dest_regs@[dst]
args@[src] = do
let platform = targetPlatform dflags
if isBmi2Enabled dflags
then do code_src <- getAnyReg src
src_r <- getNewRegNat format
let dst_r = getRegisterReg platform False (CmmLocal dst)
return $ code_src src_r `appOL`
(if width == W8 then
-- The PEXT instruction doesn't take a r/m8
unitOL (MOVZxL II8 (OpReg src_r) (OpReg src_r)) `appOL`
unitOL (PEXT II16 (OpReg src_r) dst_r)
else
unitOL (PEXT format (OpReg src_r) dst_r)) `appOL`
(if width == W8 || width == W16 then
-- We used a 16-bit destination register above,
-- so zero-extend
unitOL (MOVZxL II16 (OpReg dst_r) (OpReg dst_r))
else nilOL)
else do
targetExpr <- cmmMakeDynamicReference dflags
CallReference lbl
let target = ForeignTarget targetExpr (ForeignConvention CCallConv
[NoHint] [NoHint]
CmmMayReturn)
genCCall dflags is32Bit target dest_regs args
where
format = intFormat width
lbl = mkCmmCodeLabel primUnitId (fsLit (pextLabel width))

genCCall dflags is32Bit (PrimTarget (MO_Clz width)) dest_regs@[dst] args@[src]
| is32Bit && width == W64 = do
-- Fallback to `hs_clz64` on i386
@@ -2669,6 +2729,9 @@ outOfLineCmmOp mop res args
MO_Clz w -> fsLit $ clzLabel w
MO_Ctz _ -> unsupported

MO_Pdep _ -> fsLit "hs_pdep"
MO_Pext _ -> fsLit "hs_pext"

MO_AtomicRMW _ _ -> fsLit "atomicrmw"
MO_AtomicRead _ -> fsLit "atomicread"
MO_AtomicWrite _ -> fsLit "atomicwrite"
8 changes: 8 additions & 0 deletions compiler/nativeGen/X86/Instr.hs
@@ -343,6 +343,10 @@ data Instr
| BSF Format Operand Reg -- bit scan forward
| BSR Format Operand Reg -- bit scan reverse

-- bit manipulation instructions
| PDEP Format Operand Reg -- [BMI2] deposit bits to the specified mask
| PEXT Format Operand Reg -- [BMI2] extract bits from the specified mask

-- prefetch
| PREFETCH PrefetchVariant Format Operand -- prefetch Variant, addr size, address to prefetch
-- variant can be NTA, Lvl0, Lvl1, or Lvl2
@@ -459,6 +463,8 @@ x86_regUsageOfInstr platform instr
DELTA _ -> noUsage

POPCNT _ src dst -> mkRU (use_R src []) [dst]
PDEP _ src dst -> mkRU (use_R src []) [dst]
PEXT _ src dst -> mkRU (use_R src []) [dst]

Contributor Author

I'm pretty sure the above two lines are wrong. I wrote them that way just to get it to compile.

The pdep and pext instructions take three operands: a src (read), a mask (read), and a dst (write). Yet I've declared that they take only a src (read) and a dst (write).

I'm looking at what options I have:

-- 2 operand form; first operand Read; second Written
-- 2 operand form; first operand Read; second Modified
-- 2 operand form; first operand Modified; second Modified
-- 3 operand form; first operand Read; second Modified; third Modified
-- 1 operand form; operand Modified
-- Registers defd when an operand is written.
-- Registers used when an operand is read.
-- Registers used to compute an effective address.

None of them really work for me. The closest match is the 3 operand one:

  • 3 operand form; first operand Read; second Modified; third Modified

I believe what I need is:

first operand Read; second Read; third Write

Contributor

In that case perhaps what you want is,

PDEP a b c -> mkRU (use_R a $ use_R b) (def_W c)

Does this look better?

Contributor Author

Looks like I was just missing a []

PDEP a b c -> mkRU (use_R a $ use_R b []) (def_W c)

Thanks for the hint!
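
To illustrate why the trailing [] matters, here is a small self-contained model of the register-usage helpers. The Reg, Operand and RegUsage types and the bodies of mkRU, use_R and def_W are simplified stand-ins written for this note, assumed to mirror only the shape of the real helpers in X86/Instr.hs; pdepUsage is a hypothetical name.

```haskell
-- Hypothetical, simplified stand-ins for the native code generator's types.
newtype Reg = Reg Int
  deriving (Eq, Show)

data Operand
  = OpReg Reg   -- a register operand
  | OpImm Int   -- an immediate operand: involves no register
  deriving (Eq, Show)

-- | Registers read and registers written by a single instruction.
data RegUsage = RU { ruReads :: [Reg], ruWrites :: [Reg] }
  deriving (Eq, Show)

mkRU :: [Reg] -> [Reg] -> RegUsage
mkRU = RU

-- | Prepend the register read through an operand (if any) to the
-- registers collected so far.
use_R :: Operand -> [Reg] -> [Reg]
use_R (OpReg r) rest = r : rest
use_R (OpImm _) rest = rest

-- | The register written through an operand (if any).
def_W :: Operand -> [Reg]
def_W (OpReg r) = [r]
def_W (OpImm _) = []

-- | Usage for a three-operand instruction: source (read), mask (read),
-- destination (written).
pdepUsage :: Operand -> Operand -> Operand -> RegUsage
pdepUsage src mask dst = mkRU (use_R src $ use_R mask []) (def_W dst)
```

In this model use_R takes an operand plus the list of registers collected so far, so `use_R a $ use_R b` hands a partially applied function to use_R where a list is expected and fails to type-check. Ending the chain with [] supplies the empty starting list.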

BSF _ src dst -> mkRU (use_R src []) [dst]
BSR _ src dst -> mkRU (use_R src []) [dst]

@@ -638,6 +644,8 @@ x86_patchRegsOfInstr instr env
CLTD _ -> instr

POPCNT fmt src dst -> POPCNT fmt (patchOp src) (env dst)
PDEP fmt src dst -> PDEP fmt (patchOp src) (env dst)
PEXT fmt src dst -> PEXT fmt (patchOp src) (env dst)
BSF fmt src dst -> BSF fmt (patchOp src) (env dst)
BSR fmt src dst -> BSR fmt (patchOp src) (env dst)

3 changes: 3 additions & 0 deletions compiler/nativeGen/X86/Ppr.hs
@@ -645,6 +645,9 @@ pprInstr (POPCNT format src dst) = pprOpOp (sLit "popcnt") format src (OpReg dst
pprInstr (BSF format src dst) = pprOpOp (sLit "bsf") format src (OpReg dst)
pprInstr (BSR format src dst) = pprOpOp (sLit "bsr") format src (OpReg dst)

pprInstr (PDEP format src dst) = pprOpOp (sLit "hs_pdep") format src (OpReg dst)
pprInstr (PEXT format src dst) = pprOpOp (sLit "hs_pext") format src (OpReg dst)

pprInstr (PREFETCH NTA format src ) = pprFormatOp_ (sLit "prefetchnta") format src
pprInstr (PREFETCH Lvl0 format src) = pprFormatOp_ (sLit "prefetcht0") format src
pprInstr (PREFETCH Lvl1 format src) = pprFormatOp_ (sLit "prefetcht1") format src