
Humanify

Deobfuscate JavaScript code using LLMs ("AI")

This tool uses large language models (like ChatGPT & llama2) and other tools to deobfuscate, unminify, transpile, decompile and unpack JavaScript code. Note that LLMs don't perform any structural changes – they only provide hints to rename variables and functions. The heavy lifting is done by Babel at the AST level to ensure the code stays 1:1 equivalent.
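
To make that division of labor concrete, here is a minimal sketch of a rename-only Babel pass. The parse, traverse, generate and scope.rename calls are standard Babel APIs, but suggestName and renameBindings are made-up names and the LLM call is stubbed out – this is an illustration of the idea, not Humanify's actual internals:

import { parse } from "@babel/parser";
import traverse from "@babel/traverse";
import generate from "@babel/generator";

async function suggestName(currentName) {
  // Stub: a real implementation would send the surrounding code to ChatGPT or
  // llama2 and return its suggestion, e.g. "a" -> "splitString".
  const examples = { a: "splitString", e: "inputString", t: "chunkSize" };
  return examples[currentName] ?? currentName;
}

async function renameBindings(source) {
  const ast = parse(source);
  const bindings = [];

  // Walk every scope and collect the bindings it declares.
  traverse(ast, {
    Scopable(path) {
      for (const name of Object.keys(path.scope.bindings)) {
        bindings.push({ scope: path.scope, name });
      }
    },
  });

  // scope.rename updates the declaration and every reference, so only the
  // names change – the program's behavior stays identical.
  for (const { scope, name } of bindings) {
    const newName = await suggestName(name);
    if (newName !== name) scope.rename(name, newName);
  }

  return generate(ast).code;
}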

➡️ Check out the introduction blog post for an in-depth explanation!

Example

Given the following minified code:

function a(e,t){var n=[];var r=e.length;var i=0;for(;i<r;i+=t){if(i+t<r){n.push(e.substring(i,i+t))}else{n.push(e.substring(i,r))}}return n}

The tool will output a human-readable version:

function splitString(inputString, chunkSize) {
  var chunks = [];
  var stringLength = inputString.length;
  var startIndex = 0;
  for (; startIndex < stringLength; startIndex += chunkSize) {
    if (startIndex + chunkSize < stringLength) {
      chunks.push(inputString.substring(startIndex, startIndex + chunkSize));
    } else {
      chunks.push(inputString.substring(startIndex, stringLength));
    }
  }
  return chunks;
}

🚨 NOTE: 🚨

Large files may take some time to process and can use a lot of tokens if you use ChatGPT. For a rough estimate, the tool uses about 2 tokens per character of the file:

echo "$((2 * $(wc -c < yourscript.min.js)))"

For reference: a minified bootstrap.min.js would cost about $0.50 to un-minify using ChatGPT.
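
If you want a rough dollar figure for your own file before running the tool, the same two-tokens-per-character estimate can be applied in a few lines of Node. This is only a back-of-the-envelope sketch: estimateCost is a hypothetical helper (not part of Humanify), and the price constant is a placeholder – check OpenAI's current rates for the model you actually use:

import { statSync } from "node:fs";

// Rough estimate: ~2 tokens per character (same as the shell one-liner above).
const TOKENS_PER_CHAR = 2;

// Placeholder price; substitute the current rate for your model.
const ASSUMED_USD_PER_1K_TOKENS = 0.002;

function estimateCost(path) {
  const chars = statSync(path).size;
  const tokens = chars * TOKENS_PER_CHAR;
  return { tokens, usd: (tokens / 1000) * ASSUMED_USD_PER_1K_TOKENS };
}

console.log(estimateCost("yourscript.min.js"));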

Using the --local flag is of course free, but it may take more time, be less accurate, and may not be possible with your existing hardware.

Getting started

First install the dependencies:

npm install

Next you'll need to decide whether to use ChatGPT or llama2. In a nutshell:

  • ChatGPT
    • Runs on someone else's computer that's specifically optimized for this kind of thing
    • Costs money depending on the length of your code
    • Is more accurate
    • Is (probably) faster
  • llama2
    • Runs locally
    • Is free
    • Is less accurate
    • Needs a local GPU with ~60 GB of RAM (an M1 Mac works just fine)
    • Runs as fast as your GPU does

See instructions below for each option:

ChatGPT

You'll need an OpenAI API key. You can get one by signing up at https://openai.com/.

There are several ways to provide the API key to the tool:

echo "OPENAI_TOKEN=your-token" > .env && npm start --  -o deobfuscated.js obfuscated-file.js
export OPENAI_TOKEN="your-token" && npm start --  -o deobfuscated.js obfuscated-file.js
OPENAI_TOKEN=your-token npm start --  -o deobfuscated.js obfuscated-file.js
npm start -- --key="your-token"  -o deobfuscated.js obfuscated-file.js

Use your preferred way to provide the API key. Use npm start -- --help to see all available options.

llama2

Prerequisites:

  • You'll need to have a Python 3 environment with conda installed.
  • You need a Hugging Face account with access to the llama-2-7b-chat-hf model. Make sure to read the instructions on the model page about how to access the model.

Run the following command to install the required Python packages and activate the environment:

conda env create -f environment.yaml
conda activate base

You can now run the tool with:

npm start -- --local -o deobfuscated.js obfuscated-file.js

Note: this downloads ~13 GB of model data to your computer on the first run.

Features

The main features of the tool are listed below; a conceptual sketch of how they fit together follows the list:

  • Uses ChatGPT functions/llama2 to get smart suggestions for renaming variables and functions
  • Uses custom and off-the-shelf Babel plugins to perform AST-level unmangling
  • Uses Webcrack to unbundle Webpack bundles
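
Conceptually these pieces form a pipeline: unbundle, clean up the AST, then rename using the LLM's suggestions. The sketch below only illustrates that ordering – every helper is a hypothetical stub, not Humanify's actual module layout:

// Illustrative pipeline only; all helpers below are hypothetical stubs.
async function unbundleWithWebcrack(source) {
  // A real implementation would call Webcrack to split a Webpack bundle
  // back into its individual modules.
  return [{ code: source }];
}

async function applyBabelUnmangling(code) {
  // A real implementation would run custom and off-the-shelf Babel plugins
  // that simplify the AST without changing behavior.
  return code;
}

async function renameWithLlm(code) {
  // A real implementation would ask ChatGPT or llama2 for better names and
  // apply them with Babel's scope-aware rename.
  return code;
}

async function humanifyFile(source) {
  const modules = await unbundleWithWebcrack(source);     // 1. unbundle
  const output = [];
  for (const mod of modules) {
    const cleaned = await applyBabelUnmangling(mod.code); // 2. AST clean-up
    output.push(await renameWithLlm(cleaned));            // 3. rename only
  }
  return output;
}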

Contributing

If you'd like to contribute, please fork the repository and use a feature branch. Pull requests are warmly welcome.

Licensing

The code in this project is licensed under the MIT license.
