
write LLVM optimization passes for train_gpt2 #18

Open
ent0n29 opened this issue Apr 9, 2024 · 2 comments

Comments

@ent0n29
Contributor

ent0n29 commented Apr 9, 2024

Here is a small example:

multiplications where one operand is a constant integer that is a power of 2 are replaced with a left-shift instruction, where the shift amount is the logBase2 of the constant.

bool optBasicStrengthReduction(Instruction &I) {
  auto OpCode = I.getOpcode();

  if (OpCode != Instruction::Mul) return false;

  Value *Op1 = I.getOperand(0);
  Value *Op2 = I.getOperand(1);
  ConstantInt *CI = nullptr;

  // Check whether an operand is a constant integer that is a power of 2
  // (excluding 1, where the mul is a no-op)
  auto isConstPowOf2 = [&CI](Value *op) {
    return (CI = dyn_cast<ConstantInt>(op))
        && CI->getValue().isPowerOf2()
        && !CI->isOne();
  };

  // Canonicalize so the constant ends up in Op2
  if (isConstPowOf2(Op1)) std::swap(Op1, Op2);
  if (!isConstPowOf2(Op2)) return false;

  errs() << "Triggered train_gpt2 optimization\n";

  // Shift amount: the index of the constant's single set bit
  unsigned ShiftAmount = CI->getValue().logBase2();

  // Create a new shift instruction: mul x, 2^k  ->  shl x, k
  Instruction *ShiftInst = BinaryOperator::Create(
    Instruction::Shl,
    Op1, ConstantInt::get(CI->getType(), ShiftAmount)
  );

  ShiftInst->insertAfter(&I);
  I.replaceAllUsesWith(ShiftInst);

  // the now-dead mul is erased by the caller (runOnBasicBlock)
  return true;
}
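To see why the rewrite is sound, here is a minimal plain-C++ sketch of the arithmetic identity the pass relies on. `logBase2` below is a hypothetical stand-in mirroring what LLVM's `APInt::logBase2()` returns for a power-of-two value (the index of its single set bit); `strengthReduced` performs the same `mul x, 2^k -> shl x, k` rewrite on ordinary integers:

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for APInt::logBase2() on a power-of-two constant:
// returns the index of the single set bit, i.e. the shift amount.
inline unsigned logBase2(uint64_t c) {
  unsigned k = 0;
  while (c > 1) { c >>= 1; ++k; }
  return k;
}

// The rewrite the pass performs on the IR, expressed on plain integers:
// x * 2^k == x << k (in the same wrap-around arithmetic LLVM's mul/shl use).
inline uint64_t strengthReduced(uint64_t x, uint64_t c) {
  return x << logBase2(c);
}
```

Because both `mul` and `shl` wrap modulo 2^n on fixed-width integers, the identity holds for every operand value, not just small ones.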

and we need to call the optimization from a runOnBasicBlock function:

bool runOnBasicBlock(BasicBlock &B) {
  bool globallyModified = false;
  std::set<Instruction*> toBeErased;

  for (auto &I : B) {
    bool locallyModified =
      // add all your opt passes here; || short-circuits, so each
      // instruction is rewritten by at most one pass
      optBasicStrengthReduction(I)
        || optExample2(I)
        || optExample3(I)
        || optExample4(I);
        // ...

    // dead code elimination
    if (locallyModified) {
      toBeErased.insert(&I);
      globallyModified = true;
    }
  }

  for (auto *I : toBeErased) {
    I->eraseFromParent();
  }

  return globallyModified;
}
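The `toBeErased` set exists because erasing an instruction while iterating over the basic block would invalidate the iterator; instructions are only marked during the traversal and erased afterwards. A minimal sketch of the same deferred-erase pattern on a plain `std::list` (hypothetical `eraseEvens` standing in for "a pass rewrote this instruction"):

```cpp
#include <list>
#include <set>

// Same shape as runOnBasicBlock: mark elements during traversal,
// erase only after the loop so the iteration is never invalidated.
inline int eraseEvens(std::list<int> &xs) {
  std::set<int*> toBeErased;
  for (int &x : xs)
    if (x % 2 == 0)            // stand-in for "an opt pass fired here"
      toBeErased.insert(&x);   // mark, don't erase yet

  int erased = 0;
  for (auto it = xs.begin(); it != xs.end(); ) {
    if (toBeErased.count(&*it)) {
      it = xs.erase(it);       // safe: erase returns the next iterator
      ++erased;
    } else {
      ++it;
    }
  }
  return erased;
}
```

Pointers into a `std::list` stay valid across insertions and other erasures, which is what makes the address-keyed set work, just as `Instruction*` stays valid until `eraseFromParent()` is called.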

to apply the passes, we first compile train_gpt2.c to LLVM bitcode with clang:

$ clang -emit-llvm -c train_gpt2.c -o train_gpt2.bc
# apply the opt pass
$ opt -load ./build/LocalOpts.so -local-opts train_gpt2.bc -o train_gpt2_opt.bc
# compile the optimized bitcode back into an executable
$ clang train_gpt2_opt.bc -o train_gpt2_opt
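Note that `-load` together with a bare `-local-opts` flag targets the legacy pass manager; on recent LLVM releases (roughly 14 and later), `opt` defaults to the new pass manager, where the plugin is loaded with `-load-pass-plugin` and the pass is named via `-passes=`. A sketch of the equivalent invocation, assuming the plugin registers a pipeline name `local-opts` with the new pass manager:

```shell
$ clang -emit-llvm -c train_gpt2.c -o train_gpt2.bc
$ opt -load-pass-plugin ./build/LocalOpts.so -passes=local-opts \
    train_gpt2.bc -o train_gpt2_opt.bc
```

The exact pipeline name depends on how the plugin's `registerPipelineParsingCallback` is set up, so treat `local-opts` here as an assumption.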
@chadbrewbaker

chadbrewbaker commented Apr 9, 2024

I was discussing this yesterday with @jonmasters. Ideally this would be a script that takes llm.c and transforms it into specialized but still legible C code for a particular architecture. It could do buffer-size tuning etc., like Mojo🔥

It would also be nice to have a memory/cache layout visualizer.

@blasty has some great human-friendly inline assembler examples: https://github.com/blasty/unwyze/blob/638e7d17e752a30a3e758f51e436f752954afbd4/exploit/src/main.c#L180

@ent0n29
Contributor Author

ent0n29 commented Apr 9, 2024

looking into it!
