Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] more robust JS preprocessor #7105

Closed

Conversation

hujiajie
Copy link
Contributor

@hujiajie hujiajie commented Sep 10, 2018

The patchset is a complete rework of JS preprocessing with stricter syntax validation. It also adds support for more expressive preprocessor directives like the following:

#include "foobar.js"
#if !(VAR1 /* comment */ && VAR2 >= 2)
#elif !VAR3 || VAR4 != "four" // comment
#else
#endif

The directives are parsed according to an LL(1) grammar, using a stack for syntax-directed translation.

@hujiajie hujiajie force-pushed the preprocessor-rework branch 2 times, most recently from 7c6348c to ae57def Compare September 10, 2018 10:50
@hujiajie hujiajie changed the title more robust JS preprocessor [WIP] more robust JS preprocessor Sep 10, 2018
@hujiajie
Copy link
Contributor Author

cc @gyagp

Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I appreciate the effort that went into this, but we should perhaps talk about the overall plan here.

This does provide useful benefits, however, it's also a lot of code and complexity, which makes me worry about what will happen if we need to debug an issue there.

For things like #if FULL_ES2 || LEGACY_GL_EMULATION I was actually hoping something much simpler could work, just do eval() on the rest of the line, as those are all global variables. There may be issues I'm not thinking of, though.

emscripten.py Outdated
@@ -522,7 +522,9 @@ def compile_settings(compiler_engine, libraries, temp_files):
# Save settings to a file to work around v8 issue 1579
with temp_files.get_file('.txt') as settings_file:
with open(settings_file, 'w') as s:
json.dump(shared.Settings.to_dict(), s, sort_keys=True)
settings = shared.Settings.to_dict()
settings = { key: int(value) if isinstance(value, bool) else value for key, value in settings.items() }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does json.dump not work on bool values?

if so maybe it would make sense to have a helper function as this appears in another place too.

Copy link
Contributor Author

@hujiajie hujiajie Sep 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True/False in Python is dumped as true/false. Since I'm making very strict type checking here, only string and integers are accepted, and by intention true/false is not automatically converted to 1/0. The change is not necessary otherwise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks.

@hujiajie
Copy link
Contributor Author

I'm not fully convinced yet, but the idea of eval() sounds reasonable too. The question is if it should be legal to use arbitrary JS expressions like VAR1 + VAR2 > VAR3 ? 1 : null in preprocessor directives. I tend to be very conservative here, so the code complexity seems to be inevitable. Maybe I can open another PR for the alternative proposal later for your choice, if desired.

@hujiajie hujiajie force-pushed the preprocessor-rework branch 2 times, most recently from 3846ae4 to 3fa4eb9 Compare September 17, 2018 08:06
@kripken
Copy link
Member

kripken commented Sep 17, 2018

I'd say it's better to allow arbitrary JS there, if it greatly simplifies our implementation, by just using eval.

(And we can be careful about not using really weird JS in practice in preprocessor operations.)

src/settings.js uses 0/1 instead of false/true.
The patch is a complete rework of JS preprocessing with stricter syntax
validation. It also adds support for more expressive preprocessor
directives like the following:

  #include "foobar.js"
  #if !(VAR1 /* comment */ && VAR2 >= 2)
  #elif !VAR3 || VAR4 != "four" // comment
  #else
  #endif

The directives are parsed according to an LL(1) grammar, using a stack
for syntax-directed translation.
@hujiajie
Copy link
Contributor Author

hujiajie commented Dec 5, 2018

Picking up the PR again, and the eval() way is also on my TODO list (though I haven't changed my personal taste on this).

Currently there's a problem that a few identifiers like __EMSCRIPTEN_HAS_noderawfs_js__ in library_fs.js are not always defined in the global, so evaling them would cause ReferenceError. A naive solution is to check if each built-in library is used around the definition of these identifiers in modules.js, but I would prefer to avoid the hard-coded mapping between these identifiers and library names, so I'm also considering the following refactoring:

Before

  • pass library names as command line arguments to compiler.js
  • convert these library names to __EMSCRIPTEN_HAS_*__

After

  • pass either __EMSCRIPTEN_HAS_*__=0 or __EMSCRIPTEN_HAS_*__=1 as command line arguments to compiler.js, for each built-in library
  • convert these identifiers to library names and pull in the library if needed

@kripken
Copy link
Member

kripken commented Dec 5, 2018

@juj added those __EMSCRIPTEN_HAS_noderawfs_js__, perhaps he has more context for them.

@juj
Copy link
Collaborator

juj commented Dec 11, 2018

The __EMSCRIPTEN_HAS_libfile_js__ identifiers were added as an automated way for system libraries to check for the existence of others, to allow FS library to omit items if e.g. NODEFS or other libs were not added in build.

This is probably third or fourth time that better preprocessing support has been proposed (and shied away from). I implemented one in parseTools.js some years back (#1969), then added LLVM/Clang preprocessing with -E flag (#5494), and in my multithreading branch, I have again added LLVM/Clang preprocessing to run on pthread-main.js and fetch-worker.js, because it is painful to maintain JS library options otherwise.

I still think what we should do is delete our own preprocessor and migrate to running LLVM/Clang -E for all of our preprocessing needs. Then we could do much more powerful macros in library_x.js files to simplify the complex reading #ifdef chains we currently have. Just being able to do

#if ASSERTIONS
#define ASSERTIONS_LOG(x) console.log(x)
#else
#define ASSERTIONS_LOG(x)
#endif

would make a lot of JS code look much cleaner, same for GL_DEBUG, GL_ASSERTIONS, FETCH_DEBUG, PTHREADS_DEBUG, etc, which have a lot of inconvenient one-liners. Also things like

  glSamplerParameteri__sig: 'viii',
  glSamplerParameteri: function(sampler, pname, param) {
#if GL_ASSERTIONS
    GL.validateGLObjectID(GL.samplers, sampler, 'glBindSampler', 'sampler');
#endif
    GLctx['samplerParameteri'](sampler ? GL.samplers[sampler] : null, pname, param);
  },
#if USE_PTHREADS
  __tm_current: '; if (ENVIRONMENT_IS_PTHREAD) ___tm_current = PthreadWorkerInit.___tm_current; else PthreadWorkerInit.___tm_current = ___tm_current = allocate({{{ C_STRUCTS.tm.__size__ }}}, "i8", ALLOC_STATIC)',
  __tm_timezone: '; if (ENVIRONMENT_IS_PTHREAD) ___tm_timezone = PthreadWorkerInit.___tm_timezone; else PthreadWorkerInit.___tm_timezone = ___tm_timezone = allocate(intArrayFromString("GMT"), "i8", ALLOC_STATIC)',
  __tm_formatted: '; if (ENVIRONMENT_IS_PTHREAD) ___tm_formatted = PthreadWorkerInit.___tm_formatted; else PthreadWorkerInit.___tm_formatted = ___tm_formatted = allocate({{{ C_STRUCTS.tm.__size__ }}}, "i8", ALLOC_STATIC)',
#else
  __tm_current: '{{{ makeStaticAlloc(C_STRUCTS.tm.__size__) }}}',
  // Statically allocated copy of the string "GMT" for gmtime() to point to
  __tm_timezone: 'allocate(intArrayFromString("GMT"), "i8", ALLOC_STATIC)',
  // Statically allocated time strings.
  __tm_formatted: '{{{ makeStaticAlloc(C_STRUCTS.tm.__size__) }}}',
#endif
  glTexImage2D__deps: ['$emscriptenWebGLGetTexPixelData'
#if USE_WEBGL2
                       , '$emscriptenWebGLGetHeapForType', '$emscriptenWebGLGetShiftForType'
#endif
  ],

would have potential to become much cleaner looking with C preprocessor macros.

If we used LLVM/Clang, we would have 100% guarantee that it is bug-free, and would only need to do brief smoke testing without needing to build up extensive preprocessor test suites.

There have been arguments before that C preprocessor or macros "would not belong to JS" or that Closure would be able to optimize the same things away (some thoughts in #5732), but both arguments fall short in real world when you do need to both optimize and maintain the code. In my experience, C preprocessor really efficiently complements the power that Closure has, and both are needed. In branch incoming...juj:november_2018 I am implementing a new runtime that optimizes the runtime from ~100KB to down to ~5.4KB for small applications ( https://github.com/juj/webgl_render_test, http://clbri.com/webgl_render_test_size_optimized/ ), and it has not been possible without both C preprocessing and Closure. One can kind of do without C macros, but it would be much cleaner with them.

@kripken
Copy link
Member

kripken commented Dec 12, 2018

@juj I tend to agree that using clang's preprocessor may be a good solution. The main downside is we need clang, which we normally have for C/C++, but maybe some user just compiling Rust or something else might not.

However, in any case I think we we may have a similarly easy possible solution with using eval for all preprocessing - I'm looking into that myself now, and am optimistic, but still running tests.

@kripken
Copy link
Member

kripken commented Dec 12, 2018

Passes tests locally for me, so I think the approach can work, I opened #7651

@hujiajie
Copy link
Contributor Author

I remembered that the preprocessor from Clang will warn about certain JS constructs like #7113 ($) or single-quoted strings (''). Either way is fine with me if this concern can be resolved.

@kripken
Copy link
Member

kripken commented Dec 13, 2018

The other PR to implement the preprocessor using eval has landed (#7651), so i think we can close this. Thanks again though for the work here. It's taken us a while to figure out how to move forward with preprocessing, sorry this didn't happen faster.

@kripken kripken closed this Dec 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants