[WIP] more robust JS preprocessor #7105

hujiajie · 2018-09-10T07:37:21Z

The patchset is a complete rework of JS preprocessing with stricter syntax validation. It also adds support for more expressive preprocessor directives like the following:

#include "foobar.js"
#if !(VAR1 /* comment */ && VAR2 >= 2)
#elif !VAR3 || VAR4 != "four" // comment
#else
#endif

The directives are parsed according to an LL(1) grammar, using a stack for syntax-directed translation.

hujiajie · 2018-09-10T11:17:02Z

cc @gyagp

kripken

Thanks! I appreciate the effort that went into this, but we should perhaps talk about the overall plan here.

This does provide useful benefits, however, it's also a lot of code and complexity, which makes me worry about what will happen if we need to debug an issue there.

For things like #if FULL_ES2 || LEGACY_GL_EMULATION I was actually hoping something much simpler could work, just do eval() on the rest of the line, as those are all global variables. There may be issues I'm not thinking of, though.

kripken · 2018-09-11T19:43:21Z

emscripten.py

@@ -522,7 +522,9 @@ def compile_settings(compiler_engine, libraries, temp_files):
  # Save settings to a file to work around v8 issue 1579
  with temp_files.get_file('.txt') as settings_file:
    with open(settings_file, 'w') as s:
-      json.dump(shared.Settings.to_dict(), s, sort_keys=True)
+      settings = shared.Settings.to_dict()
+      settings = { key: int(value) if isinstance(value, bool) else value for key, value in settings.items() }


does json.dump not work on bool values?

if so maybe it would make sense to have a helper function as this appears in another place too.

True/False in Python is dumped as true/false. Since I'm making very strict type checking here, only string and integers are accepted, and by intention true/false is not automatically converted to 1/0. The change is not necessary otherwise.

I see, thanks.

hujiajie · 2018-09-12T10:58:15Z

I'm not fully convinced yet, but the idea of eval() sounds reasonable too. The question is if it should be legal to use arbitrary JS expressions like VAR1 + VAR2 > VAR3 ? 1 : null in preprocessor directives. I tend to be very conservative here, so the code complexity seems to be inevitable. Maybe I can open another PR for the alternative proposal later for your choice, if desired.

kripken · 2018-09-17T18:55:28Z

I'd say it's better to allow arbitrary JS there, if it greatly simplifies our implementation, by just using eval.

(And we can be careful about not using really weird JS in practice in preprocessor operations.)

src/settings.js uses 0/1 instead of false/true.

The patch is a complete rework of JS preprocessing with stricter syntax validation. It also adds support for more expressive preprocessor directives like the following: #include "foobar.js" #if !(VAR1 /* comment */ && VAR2 >= 2) #elif !VAR3 || VAR4 != "four" // comment #else #endif The directives are parsed according to an LL(1) grammar, using a stack for syntax-directed translation.

hujiajie · 2018-12-05T10:22:26Z

Picking up the PR again, and the eval() way is also on my TODO list (though I haven't changed my personal taste on this).

Currently there's a problem that a few identifiers like __EMSCRIPTEN_HAS_noderawfs_js__ in library_fs.js are not always defined in the global, so evaling them would cause ReferenceError. A naive solution is to check if each built-in library is used around the definition of these identifiers in modules.js, but I would prefer to avoid the hard-coded mapping between these identifiers and library names, so I'm also considering the following refactoring:

Before

pass library names as command line arguments to compiler.js
convert these library names to __EMSCRIPTEN_HAS_*__

After

pass either __EMSCRIPTEN_HAS_*__=0 or __EMSCRIPTEN_HAS_*__=1 as command line arguments to compiler.js, for each built-in library
convert these identifiers to library names and pull in the library if needed

kripken · 2018-12-05T19:05:59Z

@juj added those __EMSCRIPTEN_HAS_noderawfs_js__, perhaps he has more context for them.

juj · 2018-12-11T13:48:26Z

The __EMSCRIPTEN_HAS_libfile_js__ identifiers were added as an automated way for system libraries to check for the existence of others, to allow FS library to omit items if e.g. NODEFS or other libs were not added in build.

This is probably third or fourth time that better preprocessing support has been proposed (and shied away from). I implemented one in parseTools.js some years back (#1969), then added LLVM/Clang preprocessing with -E flag (#5494), and in my multithreading branch, I have again added LLVM/Clang preprocessing to run on pthread-main.js and fetch-worker.js, because it is painful to maintain JS library options otherwise.

I still think what we should do is delete our own preprocessor and migrate to running LLVM/Clang -E for all of our preprocessing needs. Then we could do much more powerful macros in library_x.js files to simplify the complex reading #ifdef chains we currently have. Just being able to do

#if ASSERTIONS
#define ASSERTIONS_LOG(x) console.log(x)
#else
#define ASSERTIONS_LOG(x)
#endif

would make a lot of JS code look much cleaner, same for GL_DEBUG, GL_ASSERTIONS, FETCH_DEBUG, PTHREADS_DEBUG, etc, which have a lot of inconvenient one-liners. Also things like

  glSamplerParameteri__sig: 'viii',
  glSamplerParameteri: function(sampler, pname, param) {
#if GL_ASSERTIONS
    GL.validateGLObjectID(GL.samplers, sampler, 'glBindSampler', 'sampler');
#endif
    GLctx['samplerParameteri'](sampler ? GL.samplers[sampler] : null, pname, param);
  },

#if USE_PTHREADS
  __tm_current: '; if (ENVIRONMENT_IS_PTHREAD) ___tm_current = PthreadWorkerInit.___tm_current; else PthreadWorkerInit.___tm_current = ___tm_current = allocate({{{ C_STRUCTS.tm.__size__ }}}, "i8", ALLOC_STATIC)',
  __tm_timezone: '; if (ENVIRONMENT_IS_PTHREAD) ___tm_timezone = PthreadWorkerInit.___tm_timezone; else PthreadWorkerInit.___tm_timezone = ___tm_timezone = allocate(intArrayFromString("GMT"), "i8", ALLOC_STATIC)',
  __tm_formatted: '; if (ENVIRONMENT_IS_PTHREAD) ___tm_formatted = PthreadWorkerInit.___tm_formatted; else PthreadWorkerInit.___tm_formatted = ___tm_formatted = allocate({{{ C_STRUCTS.tm.__size__ }}}, "i8", ALLOC_STATIC)',
#else
  __tm_current: '{{{ makeStaticAlloc(C_STRUCTS.tm.__size__) }}}',
  // Statically allocated copy of the string "GMT" for gmtime() to point to
  __tm_timezone: 'allocate(intArrayFromString("GMT"), "i8", ALLOC_STATIC)',
  // Statically allocated time strings.
  __tm_formatted: '{{{ makeStaticAlloc(C_STRUCTS.tm.__size__) }}}',
#endif

  glTexImage2D__deps: ['$emscriptenWebGLGetTexPixelData'
#if USE_WEBGL2
                       , '$emscriptenWebGLGetHeapForType', '$emscriptenWebGLGetShiftForType'
#endif
  ],

would have potential to become much cleaner looking with C preprocessor macros.

If we used LLVM/Clang, we would have 100% guarantee that it is bug-free, and would only need to do brief smoke testing without needing to build up extensive preprocessor test suites.

There have been arguments before that C preprocessor or macros "would not belong to JS" or that Closure would be able to optimize the same things away (some thoughts in #5732), but both arguments fall short in real world when you do need to both optimize and maintain the code. In my experience, C preprocessor really efficiently complements the power that Closure has, and both are needed. In branch incoming...juj:november_2018 I am implementing a new runtime that optimizes the runtime from ~100KB to down to ~5.4KB for small applications ( https://github.com/juj/webgl_render_test, http://clbri.com/webgl_render_test_size_optimized/ ), and it has not been possible without both C preprocessing and Closure. One can kind of do without C macros, but it would be much cleaner with them.

kripken · 2018-12-12T01:37:23Z

@juj I tend to agree that using clang's preprocessor may be a good solution. The main downside is we need clang, which we normally have for C/C++, but maybe some user just compiling Rust or something else might not.

However, in any case I think we we may have a similarly easy possible solution with using eval for all preprocessing - I'm looking into that myself now, and am optimistic, but still running tests.

kripken · 2018-12-12T01:57:51Z

Passes tests locally for me, so I think the approach can work, I opened #7651

hujiajie · 2018-12-12T02:00:33Z

I remembered that the preprocessor from Clang will warn about certain JS constructs like #7113 ($) or single-quoted strings (''). Either way is fine with me if this concern can be resolved.

kripken · 2018-12-13T01:06:16Z

The other PR to implement the preprocessor using eval has landed (#7651), so i think we can close this. Thanks again though for the work here. It's taken us a while to figure out how to move forward with preprocessing, sorry this didn't happen faster.

hujiajie force-pushed the preprocessor-rework branch 2 times, most recently from 7c6348c to ae57def Compare September 10, 2018 10:50

hujiajie changed the title ~~more robust JS preprocessor~~ [WIP] more robust JS preprocessor Sep 10, 2018

kripken reviewed Sep 11, 2018

View reviewed changes

hujiajie force-pushed the preprocessor-rework branch 2 times, most recently from 3846ae4 to 3fa4eb9 Compare September 17, 2018 08:06

hujiajie added 5 commits December 5, 2018 10:12

small #endif directive fix

be02241

canonicalize #include directives in JS files

80c046e

convert False/True to 0/1 when dumping settings

3876665

src/settings.js uses 0/1 instead of false/true.

retire USES_GL_EMULATION

a977211

hujiajie force-pushed the preprocessor-rework branch from 3fa4eb9 to a977211 Compare December 5, 2018 10:11

hujiajie mentioned this pull request Dec 5, 2018

Hello, WebGL 2.0 Compute #7612

Closed

kripken closed this Dec 13, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] more robust JS preprocessor #7105

[WIP] more robust JS preprocessor #7105

hujiajie commented Sep 10, 2018 •

edited

hujiajie commented Sep 10, 2018

kripken left a comment

kripken Sep 11, 2018

hujiajie Sep 12, 2018 •

edited

kripken Sep 17, 2018

hujiajie commented Sep 12, 2018

kripken commented Sep 17, 2018

hujiajie commented Dec 5, 2018

kripken commented Dec 5, 2018

juj commented Dec 11, 2018

kripken commented Dec 12, 2018

kripken commented Dec 12, 2018

hujiajie commented Dec 12, 2018

kripken commented Dec 13, 2018

[WIP] more robust JS preprocessor #7105

[WIP] more robust JS preprocessor #7105

Conversation

hujiajie commented Sep 10, 2018 • edited

hujiajie commented Sep 10, 2018

kripken left a comment

Choose a reason for hiding this comment

kripken Sep 11, 2018

Choose a reason for hiding this comment

hujiajie Sep 12, 2018 • edited

Choose a reason for hiding this comment

kripken Sep 17, 2018

Choose a reason for hiding this comment

hujiajie commented Sep 12, 2018

kripken commented Sep 17, 2018

hujiajie commented Dec 5, 2018

kripken commented Dec 5, 2018

juj commented Dec 11, 2018

kripken commented Dec 12, 2018

kripken commented Dec 12, 2018

hujiajie commented Dec 12, 2018

kripken commented Dec 13, 2018

hujiajie commented Sep 10, 2018 •

edited

hujiajie Sep 12, 2018 •

edited