Building a custom web-tree-sitter

Tree-sitter parsers often use external C scanners, and those scanners sometimes call functions from the C standard library. For this to work in a WASM environment, web-tree-sitter must export every stdlib function a parser might need, and its list of exports is fixed at build time. If a tree-sitter parser uses stdlib function X, but X is not in that list of exports, the parser will throw an error whenever it hits a code path that calls the missing function.

For this reason, Pulsar builds a custom web-tree-sitter. Every time someone tries to integrate a new tree-sitter parser into a Pulsar grammar, they might find that the parser relies on some stdlib function we haven’t included yet — in which case they can let us know and we’ll be able to update our web-tree-sitter build so that it can export that function.

Pulsar will need to do this until tree-sitter#949 is addressed in some way.
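One way to spot a missing function before it fails at runtime is to compare what a parser's .wasm file imports against the export list. The sketch below is a rough illustration, not part of the Pulsar tooling: `findMissingExports` is a hypothetical helper, and it assumes that the export list uses Emscripten's underscore-prefixed names (such as `_iswupper`) while the parser imports the unprefixed name.

```javascript
// Hypothetical helper: given the import descriptors of a parser's .wasm
// module and the entries of the export list, return the names of imported
// functions that the build does not export.
function findMissingExports(wasmImports, exportsList) {
  const exported = new Set(exportsList);
  return wasmImports
    // Only function imports matter here; memory, table and global imports are skipped.
    .filter((imp) => imp.kind === "function")
    // Assumption: the export list stores each function with a leading underscore.
    .filter((imp) => !exported.has(`_${imp.name}`))
    .map((imp) => imp.name);
}

// Hand-written sample data, shaped like the output of WebAssembly.Module.imports().
const sampleImports = [
  { module: "env", name: "memory", kind: "memory" },
  { module: "env", name: "iswspace", kind: "function" },
  { module: "env", name: "iswupper", kind: "function" },
];
const sampleExports = ["_iswspace", "_malloc"];

const missing = findMissingExports(sampleImports, sampleExports);
console.log(missing); // [ 'iswupper' ]
```

For a real parser, the descriptors could come from `WebAssembly.Module.imports(new WebAssembly.Module(fs.readFileSync(pathToParserWasm)))` in Node. Treat the result as a hint rather than a verdict: the naming assumption above may not hold for every symbol, so the list can contain false positives.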

Check out the modified branch for the version we’re targeting

At time of writing, Pulsar was targeting web-tree-sitter version 0.20.7, so a branch exists on our fork called v0-20-7-modified. That branch contains a modified exports.json file and a modified script for building web-tree-sitter.

When we target a newer version of web-tree-sitter, a similar branch should be created against the corresponding upstream tag. The commits from the previous modified branch should cherry-pick onto the new branch with little trouble.

Add whatever methods are needed to exports.json

For instance, tree-sitter-ruby introduced a new dependency on the C stdlib function iswupper a while back, and web-tree-sitter doesn’t export that one by default. So we can add the line

  "_iswupper",

in an appropriate place in exports.json, then rebuild web-tree-sitter so that the WASM-built version of the tree-sitter-ruby parser has that function available to it.
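If you would rather script the edit than make it by hand, something like the following could work. It is only a sketch: `addExport` is a hypothetical helper, it assumes exports.json is a flat JSON array of strings, and it takes "an appropriate place" to mean just before the first entry that sorts after the new symbol.

```javascript
// Hypothetical helper: return a copy of the export list with `symbol` added.
// Existing entries keep their order; the new one goes in front of the first
// entry that compares greater, or at the end if there is none.
function addExport(exportsList, symbol) {
  if (exportsList.includes(symbol)) {
    return exportsList.slice(); // already present, nothing to add
  }
  const index = exportsList.findIndex((entry) => entry > symbol);
  const position = index === -1 ? exportsList.length : index;
  return [
    ...exportsList.slice(0, position),
    symbol,
    ...exportsList.slice(position),
  ];
}

// Illustrative excerpt only; the real file has many more entries.
const before = ["_iswspace", "_iswxdigit", "_memchr"];
const after = addExport(before, "_iswupper");
console.log(JSON.stringify(after, null, 2));
```

To apply it to the real file you would read exports.json with `JSON.parse`, call the helper, and write the result back with `JSON.stringify(list, null, 2)`. Review the resulting diff before committing, since re-serializing may change the file's whitespace.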

If a third-party tree-sitter grammar needs something more esoteric, our default position should be to add it to the build. If the export results in a major change in file size or — somehow — performance, then the change can be discussed.

Run script/build-wasm from the root

To build web-tree-sitter for a particular version, make sure you’re using the appropriate version of Emscripten. This document is useful for matching up tree-sitter versions with Emscripten versions.

The default build-wasm script performs minification with terser. That’s easy enough to turn off, and we do, but even without minification, Emscripten generates a JS file that doesn’t have line breaks or indentation. We fix this by running js-beautify as a final step.

Pulsar, as a desktop app, doesn’t gain a lot from minification, and ultimately it’s better to have a source file that the user can more easily debug if necessary. And it makes the next change a bit easier:

Add a warning message

When a parser tries to use a stdlib function that isn’t exported by web-tree-sitter, the error that’s thrown is not very useful. So we try to detect when that scenario is going to happen and log a warning to the console to help users who might otherwise be befuddled.

This may be automated in the future, but for now you can modify tree-sitter.js to include the checkForAsmVersion function:

var Module = typeof Module !== "undefined" ? Module : {};
var TreeSitter = function() {

  function checkForAsmVersion(prop) {
    if (!(prop in Module['asm'])) {
      console.warn(`Warning: parser wants to call function ${prop}, but it is not defined. If parsing fails, this is probably the reason why. Please report this to the Pulsar team so that this parser can be supported properly.`);
    }
  }

  var initPromise;
  var document = typeof window == "object" ? {
    currentScript: window.document.currentScript
  } : null;

You can then search for this line

if (!resolved) resolved = resolveSymbol(prop, true);

and add the following line right below it:

checkForAsmVersion(prop);

The line in question is generated by emscripten, so if it changes in the future, you should be able to look up its equivalent in the correct version of emscripten.
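The second half of this step could be automated with a small text transform. The sketch below is one possible approach, not an existing Pulsar script: `injectCheck` is a hypothetical function that looks for the generated line quoted above and inserts the `checkForAsmVersion(prop);` call beneath it at the same indentation. It does not add the `checkForAsmVersion` definition itself.

```javascript
const TARGET = "if (!resolved) resolved = resolveSymbol(prop, true);";
const INJECTED = "checkForAsmVersion(prop);";

// Hypothetical helper: insert the warning call after every occurrence of the
// generated line. Returns the patched text plus a count, so the caller can
// tell when the target line was not found (for example, after an Emscripten
// upgrade changed the generated code).
function injectCheck(source) {
  const lines = source.split("\n");
  const result = [];
  let injectedCount = 0;
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i];
    result.push(line);
    if (line.trim() !== TARGET) continue;
    // Skip occurrences that were already patched on an earlier run.
    const next = lines[i + 1];
    if (next !== undefined && next.trim() === INJECTED) continue;
    // Reuse the target line's leading whitespace for the new line.
    const indent = line.slice(0, line.length - line.trimStart().length);
    result.push(indent + INJECTED);
    injectedCount++;
  }
  return { source: result.join("\n"), injectedCount };
}

// Made-up surrounding lines; only the TARGET line comes from the real output.
const sample = [
  "get: function(stubs, prop) {",
  "  if (!resolved) resolved = resolveSymbol(prop, true);",
  "  return resolved;",
  "}",
].join("\n");

const patched = injectCheck(sample);
console.log(patched.source);
```

Because the match relies on the beautified output having that statement on a line of its own, an `injectedCount` of zero is a signal to fall back to the manual search described above.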

Copy it to vendor

Under lib/binding_web you’ll find the built files tree-sitter.js and tree-sitter.wasm. Copy both to Pulsar’s vendor/tree-sitter directory. Relaunch Pulsar and do a smoke test with a couple of existing grammars to make sure you didn’t break anything.