Allow for folding of ElemKind conversion into IO where applicable for better bounds checking/testing (#3807)

This PR introduces a `FoldElemKindConversionIntoIO` pass which looks for single-use Placeholders that have ConvertTo/Quantize/Dequantize right after/before them and folds them into the Placeholder (and SaveNode if an output). It then adds usage of this pass to `compareAgainstInterpreter()` tests. By doing this we allow the backend to more directly test the operator in its desired precision instead of also allowing the results of the test to be impacted by the specific logic used for type conversion (see bottom for before/after Functions).
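The folding rule for inputs can be illustrated with a toy model. This is only a sketch under stated assumptions: `Kind`, `ToyPlaceholder`, `ToyConvert` and `foldConversionsIntoInputs` are hypothetical stand-ins invented for illustration, not Glow classes, and the real pass works on Glow's graph rather than on a flat list.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for an element kind, a Placeholder and a
// conversion node. These are NOT the real Glow classes.
enum class Kind { Float, Int8Q };

struct ToyPlaceholder {
  std::string name;
  Kind kind;
  int numUsers; // How many nodes read this placeholder.
};

struct ToyConvert {
  ToyPlaceholder *input; // Placeholder feeding the conversion.
  Kind resultKind;       // Kind produced by the conversion.
  bool folded;           // True once the conversion was folded away.
};

// Fold each conversion into its input placeholder, but only when that
// placeholder has exactly one user. Returns true if anything changed.
bool foldConversionsIntoInputs(std::vector<ToyConvert> &converts) {
  bool changed = false;
  for (ToyConvert &c : converts) {
    if (c.folded || c.input == nullptr || c.input->numUsers != 1) {
      continue; // Not safe (or nothing left) to fold.
    }
    c.input->kind = c.resultKind; // The placeholder takes the converted kind.
    c.folded = true;
    changed = true;
  }
  return changed;
}
```

A placeholder with two users keeps its original kind, because changing it would also change what the other user reads.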

This is opt-in via the optimization options in the compilation context, and is at least initially meant for better bounds testing on tests which use `compareAgainstInterpreter()`. This is a dangerous pass, as it requires converting the associated Tensors for these Placeholders in PlaceholderBindings, and correctly getting handles to these Tensors based on these types.

Pull Request resolved: #3807
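To make concrete what "converting the associated Tensors" involves for a quantized Placeholder, here is a minimal sketch of a common affine int8 scheme. It is an assumption for illustration only: `QuantParams`, `quantizeValue` and `dequantizeValue` are hypothetical names, and Glow's own quantization helpers may differ in rounding and clipping details.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical parameters. In a real setup they would come from the type of
// the Placeholder after the conversion has been folded into it.
struct QuantParams {
  float scale;
  std::int32_t offset;
};

// Map a float to int8: scale, round, shift by the offset, then clip to the
// representable int8 range.
std::int8_t quantizeValue(float x, QuantParams p) {
  const float q = std::round(x / p.scale) + static_cast<float>(p.offset);
  return static_cast<std::int8_t>(std::min(127.0f, std::max(-128.0f, q)));
}

// Map an int8 back to float by undoing the shift and the scaling.
float dequantizeValue(std::int8_t q, QuantParams p) {
  return p.scale * static_cast<float>(static_cast<std::int32_t>(q) - p.offset);
}
```

Values outside the representable range are clipped, so a round trip through these two functions is only exact for inputs that land on the quantization grid.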

Test Plan:
All tests still pass. I modified the one test (BackendCorrectnessTest's `basicFCNetQuantized`) which currently does Int8 to Int8 comparison to expect bitwise accuracy.
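The difference between an allowed error of `0.0001f` and `0.f` can be sketched with a small standalone comparison. `allClose` is a hypothetical helper written for this note, assuming an elementwise absolute-difference check; it is not the Tensor comparison used by the tests.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: true when both vectors have the same length and every
// pair of elements differs by at most allowedError.
bool allClose(const std::vector<float> &a, const std::vector<float> &b,
              float allowedError) {
  if (a.size() != b.size()) {
    return false;
  }
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (std::fabs(a[i] - b[i]) > allowedError) {
      return false;
    }
  }
  return true;
}
```

With an allowed error of zero, any difference at all fails the check, which is the sense in which an Int8 to Int8 comparison can be held to exact agreement.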

Related to #3710

[Image: "before" Function]

[Image: "after" Function]

Differential Revision: D18694602

Pulled By: jfix71

fbshipit-source-id: ff7da1704bc2dc23f380fbdc34eee0f01ef9e939
jfix71 authored and facebook-github-bot committed Dec 3, 2019
1 parent 54790de commit 79be56130b01b0ffb8a92dad161ea4d2e441dae3
@@ -98,7 +98,7 @@ class ExecutionEngine final {
/// \returns the internal graph. Note: After compilation the contents of the
/// module will have been altered and raw pointers to elements of the graph
/// may no longer be valid.
-Module &getModule() { return *rawModule_; }
+Module &getModule() const { return *rawModule_; }

/// Clears the ExecutionEngine and all CompiledFunctions.
void clear();
@@ -138,6 +138,10 @@ class ExecutionEngine final {
/// \returns a reference to the backend with name \p backendName owned by the
/// Provisioner inside of \ref hostManager_.
Backend &getBackend(llvm::StringRef backendName) const;

/// \returns the single Function contained in this Module.
/// \pre Must be a single Function in the Module.
Function *getSingleFunctionFromModule() const;

@@ -70,6 +70,13 @@ struct OptimizationOptions {

/// If true, perform compile-time computation of constant operations.
bool enableConstantFolding{true};

/// If true, this will merge ConvertTo and Quantize nodes into inputs and
/// outputs of the Function. This means modifying the types of Placeholders
/// and SaveNodes if they have a corresponding ElemKind conversion (ConvertTo,
/// Quantize, Dequantize nodes). Note that this must be accompanied by
/// modifying the Tensors backing Placeholders at runtime.
bool foldElemKindConversionIntoIO{false};

/// Context for compilation.
@@ -43,6 +43,7 @@ FUN_PASS(FoldLeakyRelu)
FUN_PASS(FoldElemKindConversionIntoIO)

// NOTE: This pass must be last; it's used to count the total number of passes.
@@ -67,6 +67,12 @@ void ExecutionEngine::setBackendName(llvm::StringRef backend) {

llvm::StringRef ExecutionEngine::getBackendName() const { return backendName_; }

Function *ExecutionEngine::getSingleFunctionFromModule() const {
  auto &fList = getModule().getFunctions();
  assert(fList.size() == 1 && "More than one Function in Module.");
  return *fList.begin();
}

ExecutionEngine::~ExecutionEngine() { clear(); }

void ExecutionEngine::clear() {
@@ -3142,6 +3142,82 @@ bool FoldTileAddIntoBatchedAdd::run(Function *F,
return changed;

/// Fold ElemKind conversion nodes (ConvertTo, Quantize, Dequantize) into
/// single-user Placeholders and SaveNodes. Note that this changes the semantics
/// of the IO of the Function and so must be done carefully, i.e. should always
/// be opt-in and done alongside conversion of corresponding Tensors in
/// PlaceholderBindings.
bool FoldElemKindConversionIntoIO::run(Function *F,
                                       const CompilationContext &cctx) {
  LOG_SCOPE(F->getLogContext(), getName());

  std::unordered_set<SaveNode *> deadSaves;

  bool changed = false;
  // Since we will be adding in new SaveNodes, reverse iterate to be safe.
  auto &nodes = F->getNodes();
  for (auto it = nodes.rbegin(), e = nodes.rend(); it != e; it++) {
    Node *N = &*it;
    // Handle conversion of inputs (conversion of Placeholders):
    ConvertToNode *CTN = llvm::dyn_cast<ConvertToNode>(N);
    QuantizeNode *QN = llvm::dyn_cast<QuantizeNode>(N);
    if (CTN || QN) {
      NodeValue in = CTN ? CTN->getInput() : QN->getInput();
      Placeholder *P = llvm::dyn_cast<Placeholder>(in);
      if (!P || P->getUsers().size() != 1) {
        continue;
      }

      // We have a conversion of a single-use placeholder to some other type, so
      // it is safe to do the requested conversion.
      NodeValue res = CTN ? CTN->getResult() : QN->getResult();

      // Convert the type of the Placeholder to the conversion type.
      P->setType(Storage::OutputIdx, res.getType());

      // Replace all uses of the original ConvertTo to the Placeholder.
      res.replaceAllUsesOfWith(P);

      changed = true;
      continue;
    }

    // Handle conversion of outputs (SaveNodes + Placeholders):
    if (SaveNode *SN = llvm::dyn_cast<SaveNode>(N)) {
      if (!SN) {
        continue;
      }
      if (SN->getPlaceholder()->getUsers().size() != 1) {
        continue;
      }
      ConvertToNode *CTN = llvm::dyn_cast<ConvertToNode>(SN->getInput());
      DequantizeNode *DQN = llvm::dyn_cast<DequantizeNode>(SN->getInput());
      if (!CTN && !DQN) {
        continue;
      }
      NodeValue in = CTN ? CTN->getInput() : DQN->getInput();

      // Set the type of the Placeholder to be same the conversion's input.
      SN->getPlaceholder()->setType(Storage::OutputIdx, in.getType());

      // Create a new SaveNode directly using the conversion's input.
      F->createSave(SN->getName(), in, SN->getPlaceholder());

      // Queue up deleting the original SaveNode as it won't be deleted via DCE.
      deadSaves.insert(SN);
      changed = true;
      continue;
    }
  }

  // Delete all the dead saves.
  for (SaveNode *SN : deadSaves) {
    F->eraseNode(SN);
  }

  return changed;
}

void glow::fold(Function *F, CompilationContext &cctx) {
LOG_SCOPE(F->getLogContext(), "glow::fold")

@@ -3298,6 +3374,16 @@ Error glow::optimizeFunction(Function *F, const Backend &B,
// Optimize the graph again now that we have a lowered representation.
::glow::optimize(F, cctx);

// If requested, fold ElemKind conversion Nodes into inputs and outputs
// (Placeholders and SaveNodes).
if (cctx.optimizationOpts.foldElemKindConversionIntoIO) {
  FunctionPassManager FPM("FoldElemKindConversionIntoIO",
                          {FunctionPassID::FoldElemKindConversionIntoIO});
  if (FPM.run(F, cctx)) {
    ::glow::optimize(F, cctx);
  }
}

// Allow the backend to transform the graph after lowering.
if (B.transformPostLowering(F, cctx)) {
// If the backend made changes, optimize the graph again. Perform only
@@ -466,7 +466,7 @@ TEST_P(BackendCorrectnessTest, basicFCNet) {
TEST_P(BackendCorrectnessTest, basicFCNetQuantized) {
compareAgainstInterpreter(GetParam(), createAndInitBasicFCNet,
-ElemKind::Int8QTy, ElemKind::Int8QTy, 0.0001f,
+ElemKind::Int8QTy, ElemKind::Int8QTy, 0.f,

@@ -220,6 +220,40 @@ void dispatchInference(const std::string &fname,

/// Helper that iterates over all of the Placeholders in \p PHs and converts the
/// Tensor pair found in \p bindings to the same type as the Placeholder if
/// necessary.
static void convertBindingsToCorrectType(PlaceholderBindings &bindings,
                                         PlaceholderList PHs) {
  for (Placeholder *PH : PHs) {
    Tensor *T = bindings.get(PH);
    TypeRef newTy = PH->getType();
    // Nothing to do if the Tensor already matches the Placeholder's type.
    if (T->getType().isEqual(newTy)) {
      continue;
    }
    ElemKind newK = newTy->getElementType();
    if (isQuantizedElemKind(newK)) {
      Tensor QT = quantization::quantizeTensor(
          *T, {newTy->getScale(), newTy->getOffset()}, newK);
      T->assign(&QT);
    } else {
      T->convertToType(newK);
    }
  }
}

/// Helper to get a float version of a Tensor \p T if needed. Note that \p T is
/// moved from if it is already FloatTy.
static Tensor convertToFloatIfNecessary(Tensor &T) {
  const ElemKind srcK = T.getType().getElementType();
  if (srcK == ElemKind::FloatTy) {
    return std::move(T);
  }
  if (isQuantizedElemKind(srcK)) {
    return quantization::dequantizeTensor(T, ElemKind::FloatTy);
  }
  return T.getCopyConvertedToType(ElemKind::FloatTy);
}

void compareAgainstInterpreter(llvm::StringRef backendName,
CreateAndInitFunction createAndInitFunction,
ElemKind interpElemKind,
@@ -255,21 +289,43 @@ void compareAgainstInterpreter(llvm::StringRef backendName,
CompilationContext &cctxI = configs.first;
CompilationContext &cctxB = configs.second;

// Skip conversion for rowwise quantized tests as they are a special case
// which don't fit cleanly here -- e.g. RWQ-SLS has FloatTy outputs.
if (!enableRowwiseQuantization) {
  // We want to compare the ops themselves and not see differences in
  // conversion, so fold ElemKind conversion nodes into IO.
  cctxI.optimizationOpts.foldElemKindConversionIntoIO = true;
  cctxB.optimizationOpts.foldElemKindConversionIntoIO = true;
}

// Clone the Function inside itself many times if desired.
std::unordered_set<Tensor *> resultTensors =
cloneFunInsideFun(BFT, &bBindings, cctxB, count);
assert(resultTensors.size() == count &&
"Should get the same number of Tensors back as count.");


// Again skip rowwise quantization as before.
if (!enableRowwiseQuantization) {
  // Now that we have compiled, precision transformation has occurred. Now
  // convert all mismatches for Placeholders given their original bindings.
  convertBindingsToCorrectType(
      iBindings, IEE.getSingleFunctionFromModule()->findPlaceholders());
  convertBindingsToCorrectType(
      bBindings, BEE.getSingleFunctionFromModule()->findPlaceholders());
}

-// Compare each of our result tensors to the original.
+// Compare each of our result tensors to the original. Always convert back to
+// float if necessary, as allowed error is expected to compare float.
+Tensor finalIT = convertToFloatIfNecessary(*IFT.second);
 for (Tensor *T : resultTensors) {
-  EXPECT_TRUE(IFT.second->isEqual(*T, allowedError));
+  Tensor finalBT = convertToFloatIfNecessary(*T);
+  EXPECT_TRUE(finalIT.isEqual(finalBT, allowedError, /* verbose */ true));

// Additionally check that each of the results from the parallel cloned
