Future of preprocessing? #5732

kripken · 2017-10-31T22:03:49Z

We use preprocessing on our JS files. The current implementation is a simple one (with many limitations) written in JS. It would be good to be able to run it from Python too, so we could use it in more places. Right now, for example, we duplicate a few files since the SINGLE_FILE PR landed, because sharing them would need preprocessing from Python.

See some debate in #5494 , and also relevant:

Introduce SUPPORT_SHELL, SUPPORT_NODEJS, SUPPORT_SPIDERMONKEY, SUPPORT_IE11 etc. -s settings #5554
Gated out #include directives in JS libraries are still included #5458
relative #include "" directives are treated with respect to $EMSCRIPTEN/src #5457

One proposal was to use the clang preprocessor for everything. The benefit is it's a solid, standard preprocessor we already have. On the other hand it would mean we require clang - consider if a language like Rust wanted to just ship Rust + emscripten, it would just need LLVM but not clang.

Alternatively, we could write and maintain a small preprocessor for our purposes, sort of like we do now, but more full-featured and easily usable from both JS and python.

dschuff · 2017-11-03T22:21:41Z

Using the C preprocessor doesn't have to mean Clang, it could be any C compiler. I don't think that "any C preprocessor" is that much of a burden. For example if you wanted to ship a really-minimal rust-only SDK you could include tcc or pcpp instead of clang. Otherwise you're in this uncanny valley where you have what looks like CPP but isn't. And you probably have extra bugs because you decided to roll your own. Having said that of course, CPP is still probably more featureful than we actually need; we mostly just use #ifdef and we don't use function-like macros, right? So maybe we just add proper conditional expressions and call it a day? (that's the bit that I really miss). But if you let the scope creep, then you're back into the same tradeoff.
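To make the "just add proper conditional expressions" option concrete, here is a minimal sketch of a line-based preprocessor that handles only `#if <expr>`, `#else` and `#endif`, where the expression may combine setting names with `||`, `&&`, `!` and parentheses. This is not emscripten's actual implementation; the names `evalCondition` and `preprocess` and the rule that undefined settings count as 0 are assumptions made for illustration.

```javascript
// Hypothetical sketch, not emscripten's real preprocessor.
// Evaluates an #if expression against a settings object.
function evalCondition(expr, settings) {
  // Replace every identifier with its numeric setting value (undefined -> 0).
  const substituted = expr.replace(/\b[A-Za-z_]\w*\b/g, (name) =>
    String(Number(settings[name]) || 0));
  // After substitution only digits and basic operators may remain.
  if (!/^[\s\d.!&|()<>=-]*$/.test(substituted)) {
    throw new Error('unsupported #if expression: ' + expr);
  }
  return Boolean(new Function('return (' + substituted + ');')());
}

// Keeps or drops lines according to #if / #else / #endif directives.
function preprocess(source, settings) {
  const out = [];
  const stack = []; // one frame per open #if: { parent, cond, emit }
  const isActive = () => stack.length === 0 || stack[stack.length - 1].emit;
  for (const line of source.split('\n')) {
    const trimmed = line.trim();
    const ifMatch = /^#if\s+(.+)$/.exec(trimmed);
    if (ifMatch) {
      const parent = isActive();
      // Short-circuit: conditions inside a dropped region are not evaluated.
      const cond = parent && evalCondition(ifMatch[1], settings);
      stack.push({ parent, cond, emit: cond });
    } else if (trimmed === '#else') {
      if (stack.length === 0) throw new Error('#else without #if');
      const top = stack[stack.length - 1];
      top.emit = top.parent && !top.cond;
    } else if (trimmed === '#endif') {
      if (stack.length === 0) throw new Error('#endif without #if');
      stack.pop();
    } else if (isActive()) {
      out.push(line);
    }
  }
  if (stack.length !== 0) throw new Error('unterminated #if');
  return out.join('\n');
}

// Usage: only the branch selected by the settings survives.
const example = [
  'setup();',
  '#if FETCH_DEBUG || GL_DEBUG',
  "console.error('debug build');",
  '#else',
  'fastPath();',
  '#endif',
  'run();',
].join('\n');
console.log(preprocess(example, { GL_DEBUG: 1 }));
```

Because only lines starting with `#` are interpreted, the same logic would be straightforward to port to Python, which is the portability concern raised at the top of the thread.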

juj · 2017-11-06T12:57:26Z

I'd really like to go the route of full C preprocessing, since that would let me use #defines, #includes and C macro functions in .js files, which would help with more efficient dead code elimination. The uncanny valley aspect has bitten a lot of developers, so reusing LLVM/Clang for the full thing proper would help a lot, and we would not even need to test the implementation since it'll be guaranteed to be good.

kripken · 2017-11-06T20:50:38Z

@juj What would be a use case for supporting C macros? (First I hear of that, I thought the context here was just ifdefing.)

juj · 2017-11-10T14:20:58Z

One example is that I find myself doing something like

#if FETCH_DEBUG
  console.error('tracingLogPrintWithCallstack');
#endif

#if GL_DEBUG
  console.error('tracingLogPrintWithCallstack');
#endif

#if ASMFS_DEBUG
  console.error('tracingLogPrintWithCallstack');
#endif

etc.

and it would be nice to replace those with a #define FETCH_DEBUG(str) in a JavaScript file.

Another example is when I wanted to do pthreads proxying, and wanted to have a prologue in all proxied functions: 1dbac7a#diff-94dc100be52b26c684387d19d991a887R52.

More recently such a scenario occurs when I want to do GL context proxying where a GL context might either be owned by the current thread, or by some other thread, and certain functions receive a prologue where they check if (callingThreadHostsTheGLContextOrSomeOtherThread) to know where to route the call.

kripken · 2017-11-10T19:43:12Z

Maybe I missed something, wouldn't the first example be best as

#if FETCH_DEBUG || GL_DEBUG || ASMFS_DEBUG
  console.error('tracingLogPrintWithCallstack');
#endif

Or are you saying that pattern itself would be very common and you want to replace it all with

FETCH_DEBUG(tracingLogPrintWithCallstack)

?

That does make sense. But on the other hand, we do have the {{{ code }}} capability already, so we could do

// this would be written once somewhere
{{{
function fetchDebug(str) {
  if (FETCH_DEBUG || GL_DEBUG || ASMFS_DEBUG) {
    return "console.error('" + str + "');";
  } else {
    return '';
  }
}
}}}
[..]
// then this can be anywhere after it
{{{ fetchDebug('tracingLogPrintWithCallstack') }}}

juj · 2017-11-10T21:01:02Z

> Or are you saying that pattern itself would be very common and you want to replace it all with
>
> FETCH_DEBUG(tracingLogPrintWithCallstack)

Yeah, this case - would be nice to be able to do one-liners like this.

> But on the other hand, we do have the {{{ code }}} capability already, so we could do ...

My understanding is that this would be a compiler-side thing, so one would have to create ad hoc rules in the compiler .js files, whereas macros could be used in place in the library code itself. Developers writing their own libraries could also define their own macros as efficient DCE mechanisms.

kripken · 2017-11-13T20:48:52Z

A library could still do that. But it is a few more lines than a C-style macro, that's true.

C-style macros are more invasive, though: the preprocessor needs to look for them in all the code, not just on lines starting with # etc.

But, they are more familiar of course.
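To illustrate the invasiveness point, here is a deliberately naive sketch of function-like macro expansion. Unlike directive handling, it has to scan the whole source text for macro names. The `expandMacros` helper and the shape of the `macros` table are hypothetical, and the regex does not skip strings or comments or handle nested parentheses, all of which a real C preprocessor does by tokenizing.

```javascript
// Hypothetical sketch: naive function-like macro expansion over all code.
// A real preprocessor tokenizes; this regex version would also rewrite
// macro names that appear inside string literals or comments.
function expandMacros(source, macros) {
  const names = Object.keys(macros);
  if (names.length === 0) return source;
  // Matches NAME(args) anywhere in the text, with no nested parentheses.
  const pattern = new RegExp('\\b(' + names.join('|') + ')\\(([^()]*)\\)', 'g');
  return source.replace(pattern, (match, name, args) =>
    macros[name](...args.split(',').map((arg) => arg.trim())));
}

// Usage: a FETCH_DEBUG(str) macro that expands to a log call only when
// the (assumed) FETCH_DEBUG setting is enabled.
const settings = { FETCH_DEBUG: 1 };
const macros = {
  FETCH_DEBUG: (msg) => (settings.FETCH_DEBUG ? 'console.error(' + msg + ');' : ''),
};
const expanded = expandMacros("startFetch();\nFETCH_DEBUG('fetch started')\n", macros);
console.log(expanded);
```

With the setting disabled the macro call expands to an empty string, so the logging code is absent from the output entirely.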

saschanaz · 2017-11-26T05:04:56Z

Personally I hope a future preprocessor can manage JS-compatible syntax so that I won't see these red lines anymore.

curiousdannii · 2017-11-26T15:29:28Z

So to put another idea out there, emscripten could take a page from the literate programming book.

Goals:

Clean and simple source code (if possible with good code editor support)
Allow the code gen code to be modularised better (Investigate ES6 modules as our internal components #5828)
Allow code to be compartmentalised

The advantage of a literate programming style system for us is that subsequent definitions of each kind of block are concatenated/appended together.

I quite like the syntax of cdosborn/lit. For example, imagine a setup where we have separate files for wasm and asm.js. Each file can then define and extend blocks.

Define some functions to use later.
    << definitions >>=
    var wasm_filename = {{{ WASM_FILENAME }}}
    << wasm fetch >>
    << wasm setup >>

Add a couple of entries to the startup promise chain:
    << startup promise chain >>=
    .then( wasm_fetch )
    .then( wasm_startup )

Add some standard lib functions:
    << standard lib >>=
    function malloc() { ... }

    function emscripten_wget() { ... }

    << base standard lib >>

So if you're not familiar with literate programming, one of the core ideas is that source code structure doesn't have to equal program structure. You can define blocks of code and then include them later, or before. In the lit syntax << ... >>= defines a block, concatenating it if there are already definitions, and << ... >> includes a block. Using literate programming also allows you to include documentation along with the code (often all the docs are generated from the code. We wouldn't have to do that.) If we used a syntax like this, the main file could be markdown. I'm not sure which editors are smart enough to do syntax formatting for code blocks, but some would be.

The code can be broken out into files for each aspect of the code, and conditionally included only if they're needed. So separate files for wasm and asm.js, for emterpreter, for GL and AL, threads, modularise, etc. The main file doesn't need to know what code eventually gets included, it just says << definitions >> at the appropriate place and everything else gets included. A code gen hook system essentially.

Just to be clear, the main advantage I'm thinking this has is the block concatenating/appending. I looked at CPP and a whole bunch of JS templating systems, and couldn't see any that had it. But maybe there are other options. Maybe it's even possible with CPP.
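As a rough sketch of the block concatenation idea, the following collects `<< name >>=` definitions from several files, appending to a block when it is defined more than once, and then expands `<< name >>` references recursively. This is not the cdosborn/lit implementation. The helper names `collectBlocks` and `expand`, the 4-space indentation rule, and the dropping of blank lines inside blocks are simplifying assumptions.

```javascript
// Hypothetical sketch of lit-style blocks; not the cdosborn/lit tool itself.
// Gathers block definitions from several sources, appending repeated names.
function collectBlocks(sources) {
  const blocks = new Map();
  for (const source of sources) {
    let current = null; // name of the block currently being read, if any
    for (const line of source.split('\n')) {
      const def = /^\s*<<\s*(.+?)\s*>>=\s*$/.exec(line);
      if (def) {
        current = def[1];
        if (!blocks.has(current)) blocks.set(current, []);
      } else if (current !== null && /^(?: {4}|\t)/.test(line)) {
        // Indented lines belong to the open block; strip one indent level.
        blocks.get(current).push(line.replace(/^(?: {4}|\t)/, ''));
      } else if (line.trim() !== '') {
        current = null; // unindented prose ends the block
      }
      // Blank lines are skipped and do not end the block.
    }
  }
  return blocks;
}

// Expands a block into lines, replacing << name >> references recursively.
function expand(name, blocks, seen = []) {
  if (seen.includes(name)) throw new Error('cyclic block reference: ' + name);
  const lines = blocks.get(name) || [];
  return lines.flatMap((line) => {
    const ref = /^\s*<<\s*(.+?)\s*>>\s*$/.exec(line);
    return ref ? expand(ref[1], blocks, [...seen, name]) : [line];
  });
}

// Usage: the main file only names the hook; wasm and GL files extend it.
const mainFile = 'Entry point.\n    << main >>=\n    start()\n    << startup promise chain >>\n';
const wasmFile = 'Wasm startup.\n    << startup promise chain >>=\n    .then( wasm_fetch )\n';
const glFile = 'GL startup.\n    << startup promise chain >>=\n    .then( gl_init )\n';
const output = expand('main', collectBlocks([mainFile, wasmFile, glFile])).join('\n');
console.log(output);
```

The main file never mentions wasm or GL. Leaving one of those files out of the list passed to `collectBlocks` removes its contribution, which is the code gen hook behaviour described above.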

stale · 2019-09-19T06:45:12Z

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 7 days. Feel free to re-open at any time if this issue is still relevant.

dschuff mentioned this issue Nov 17, 2017

Initial support for pthreads with wasm #5710

Merged

saschanaz mentioned this issue Dec 29, 2017

Support ES6 in JS optimization (Babel? Acorn?) #6000

Closed

juj mentioned this issue Dec 11, 2018

[WIP] more robust JS preprocessor #7105

Closed

stale bot added the wontfix label Sep 19, 2019

stale bot closed this as completed Sep 26, 2019
