Sync to Async Transformation #1881

Closed
coolwanglu opened this Issue Dec 1, 2013 · 13 comments

@coolwanglu
Contributor

coolwanglu commented Dec 1, 2013

Recently I managed to convert one of the most complicated demos of PDCurses, in a clever way (and also a stupid way).

I got a few thoughts on this topic during this terrible experience and posted them in this article. Disclaimer: I'm interested, but I'm far from an expert in this area. Please forgive me and let me know if some or all of my points are misguided. Thanks!

tl;dr

  • Motivation: to make sleep and other similar functions actually work in our Runtime, without a lot of manual work porting the C/C++ code
  • I actually made it work
  • sync->async is doable and I think it's not too hard - there are already implementations for JS
  • I think this is a feature that cannot be supported by merely an external library
  • Does it fit emscripten? Or is it possible to write a plugin for emscripten?

A plugin/extension should work, somewhat like the closure compiler, as a transformer. But ideally I think it's better integrated into emscripten for optimization reasons, depending on your design and goals of course.

I've got experience with C/C++ and JavaScript, but I have no idea about clang/LLVM, so I'm not sure how much I can help implement this feature. But I think it is important for emscripten users.

@kripken

Owner

kripken commented Dec 11, 2013

Definitely very important, I agree. My main concern is performance.

Have you looked at generators and task.js, btw?

@coolwanglu

Contributor

coolwanglu commented Dec 11, 2013

@kripken I've been mainly using streamline.js, which seems to be the best potential fit for emscripten, since the code generated by emscripten has a simple form (no closures, anonymous functions, etc.). Maybe I can show you a demo in a few days.

Generators seem to be slower than callbacks according to jsperf (maybe that will change in the future), and I'm not sure how generators and task.js could support recursive function calls; it seems they do not save an async stack, but I'm not sure.
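
For readers unfamiliar with streamline.js: roughly, a function that takes a _ parameter is rewritten into callback style by the transformer, so straight-line code keeps its shape. A minimal hand-written sketch (illustrative only, not actual streamline.js or emscripten output; fs.readFile is just a stand-in async API):

    // source form: looks synchronous, _ marks where the callback goes
    function readAndCount(path, _) {
      var text = require('fs').readFile(path, 'utf8', _);
      return text.split('\n').length;
    }

    // roughly what the transformer produces (hand-written approximation)
    function readAndCount(path, cb) {
      require('fs').readFile(path, 'utf8', function (err, text) {
        if (err) return cb(err);
        cb(null, text.split('\n').length);
      });
    }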

@coolwanglu

Contributor

coolwanglu commented Dec 11, 2013

@kripken Here's something that demonstrates my idea: http://coolwanglu.github.io/vim.js/web/vim.html — after all kinds of failed attempts.

It's a work in progress and only tested on Firefox; I guess it might become a good demo in the future.

The performance is OK (and it's faster on Chrome), although I don't have anything to compare it with.

The transformation is done with https://github.com/coolwanglu/vim.js/blob/master/web/transform.js, and there are a number of assumptions about the JavaScript code, which are declared at the beginning of the file. I'm not sure whether they will still hold for future versions of emscripten.

Thoughts:

  • It seems that streamline.js doesn't work well with the relooper, and I wonder how to disable it. Although the code still works, the relooped structure no longer makes sense, since async functions are split into smaller ones. It would make more sense to apply the relooper after the CPS transformation, so the CPS transformation should probably be done at the LLVM bitcode level, which I'm not familiar with (see the sketch after this list)
  • With the assumptions defined in transform.js, and maybe others, streamline.js could be specially optimized and tailored; it currently supports complicated JavaScript features that are not needed here
  • Function pointers are troublesome for the transformation; the solution I have in mind is to mark them somehow in the source code and handle them specially in the final JavaScript. Currently I did all of this manually, and luckily there are not terribly many of them
  • (off-topic) How can I export macros and use them with cDefine? I tried the Python script in tools/, which didn't work since this is a large project with complicated macro dependencies. I wonder whether the macros can be generated with emcc during compilation
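
To make the relooper concern above concrete, here is a hand-written sketch (hypothetical and simplified, not actual emscripten or streamline.js output) of why a relooped loop stops being useful once a call inside it becomes async:

    // relooped output (simplified): a plain loop
    function _poll(n) {
      while (n > 0) {
        _process_one(); // suppose this call has to become async
        n = n - 1;
      }
    }

    // after the CPS transformation the loop body is driven by callbacks,
    // so the relooped while structure no longer carries the control flow
    function _poll(cb, n) {
      (function step() {
        if (n <= 0) return cb(null);
        _process_one(function (err) {
          if (err) return cb(err);
          n = n - 1;
          step();
        });
      })();
    }
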
@ngld

Contributor

ngld commented Dec 17, 2013

What macros do you mean? If you mean a simple #define SOMETHING 123, you can write a JSON file listing them (similar to src/struct_info.json). tools/gen_struct_info.py contains some documentation on this.

@coolwanglu

Contributor

coolwanglu commented Dec 19, 2013

@ngld There are extra includes and dependencies, for example #ifdef etc. Some macros are specified via command-line parameters. Currently I extract the values and use them directly in JS, but I'm afraid that some values might change for a different configuration.

Maybe there are parameters for clang/g++ to export all the macros, so that I can feed the file to gen_struct_info?

@kripken

Owner

kripken commented Dec 31, 2013

Does it basically add a last parameter for a continuation in all async functions? Then perhaps for function pointer calls, you can assume they are all async (anything in a function table would be made async)?

Can you elaborate on the problem with the relooper? Perhaps a concrete code sample would help me understand the issue there.

@coolwanglu

Contributor

coolwanglu commented Dec 31, 2013

Sorry about the relooper issue; it turns out to be my fault, I had not enabled the relooper all the time.

The callback function is prepended to the argument list in every definition of, and call to, an async function.

Functions are not transformed unless necessary, since the overhead introduced by streamline.js is not negligible. Therefore the situation is mixed: some functions are async and some stay sync.
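
To illustrate what "prepended" means here, a hand-written sketch of the shape of the transformation (hypothetical names, not actual vim.js or transform.js output; the callback follows streamline's (err, result) convention):

    // before: code that needs to wait for input
    function _get_char() {
      var c = _wait_for_input(); // must yield to the browser here
      return c;
    }

    // after: a continuation is prepended to the argument list of the async
    // function and of every async call it makes
    function _get_char(async_cb) {
      _wait_for_input(function (err, c) {
        async_cb(err, c);
      });
    }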

@coolwanglu

Contributor

coolwanglu commented Dec 31, 2013

A little more explanation: in the vim.js example, Vim parses and executes Vim script, and there is a function table storing the mapping from function names to function pointers, for example:

struct {
   char * name;
   void * (*func)(void*);
} func_table[] = {          /* a declarator is needed here; the name "func_table" is illustrative */
   {"printf", f_printf},
   {"wait", f_wait}
};

void * call_func(char * name, void * arg) {
   return find_entry(name)->func(arg);
}

Suppose that f_printf is sync and f_wait is async. If these two functions are called by name, e.g. f_printf("hi"), it's easy to recognize and transform them (if necessary). But in call_func we don't know whether the target is sync or async, so currently I check at runtime.

@kripken

Owner

kripken commented Jan 1, 2014

Interesting, thanks. So the runtime used here is the narcissus interpreter? Meanwhile I've been thinking about how this could be made as fast as possible without an interpreter. The best idea I have is the following; let me know what you think and whether it overlaps with what you already do:

  1. No function is relooped. In non-relooped form, a function is a switch in a loop, which allows us - just like how we implement setjmp/longjmp - to get to anywhere we want.
  2. The normal function call mechanism is avoided, for all calls. Instead we run in something like a 'stackless' mode, in the sense that there is a 'runtime' which maintains the call stack. See example below.
  3. When we need to do something synchronous, we call a method that preserves the stack, and exit. The runtime can later call back into us.

Small example:

function a() {
  var local1, local2, label;
  if (restoreState) {
    local1 = state[0];
    local2 = state[1];
    label = state[2];
  } else {
    // normal initialization
    label = 1;
  }
  while (1) switch (label) {
    case 1: ..
    case 2:
      // we want to do a synchronous operation here
      // so we need to save our state, and be called later
      state[0] = local1;
      state[1] = local2;
      state[2] = label; // index must match the one read back in the restoreState branch
      runtimeExitForSync(); // tell runtime
      return;
    case 3: ..
  }
}

The main runtime loop manages calling functions; other functions just ask it to make calls and then return, so the call stack is always of depth one.

In theory we could do partial relooping where there is no synchronous stuff, which should preserve full speed for inner loops without function calls. So perhaps speed would not be too bad here.

Thoughts?
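
A hand-written sketch of the kind of runtime loop described above (hypothetical helper names, not existing emscripten code; it assumes each compiled function saves its state and returns whenever it needs to pause):

    var pendingCall = null;   // the function the runtime should (re)enter next
    var paused = false;

    function runtimeCall(f) { // compiled functions ask the runtime to make calls
      pendingCall = f;
    }

    function runtimeExitForSync(resume, ms) { // e.g. what a sleep(ms) might use
      paused = true;
      setTimeout(function () {
        paused = false;
        pendingCall = resume; // re-enter the function that saved its state
        drive();
      }, ms);
    }

    function drive() { // main loop: the compiled call stack stays one frame deep
      while (!paused && pendingCall) {
        var f = pendingCall;
        pendingCall = null;
        // f either finishes, requests another call via runtimeCall,
        // or pauses via runtimeExitForSync and then returns
        f();
      }
    }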

@coolwanglu

Contributor

coolwanglu commented Jan 1, 2014

No, I did not use an interpreter, which would be too slow. A function is detected as async if its actual number of arguments is one more than expected, e.g.

   void (*maybe_async_func) (void * );
   // instead of maybe_async_func(arg)
   async_call_safe1(maybe_async_func, arg);

and in the runtime

   function _async_call_safe1 (async_callback, fp, arg) {
        fp = FUNCTION_TABLE[fp]; // won't work with asm.js
        if (fp.length == 1) { // sync
            // call the callback manually;
            // this is how streamline.js passes the exception and return value
            async_callback(null, fp(arg));
        } else { // a new callback was prepended during the sync/async transformation
            fp(async_callback, arg);
        }
   }

The check of fp.length might not be reliable due to a bug in the Closure Compiler, which has been fixed in a recent release. I thought of creating a PR for that, but the new compiler requires Java 7. Anyway, that error seems to occur rarely.

Items 2 & 3 you mentioned are already taken care of by streamline.js: it maintains both the sync and async stacks, and it even supports exception handling by registering handlers in the frames. See the __func function in its runtime. In fact, if we don't care about exceptions or the call stack, we can remove them for better performance.

Before making an async call, a function registers a callback and exits; the callback is defined during the sync/async transformation. Callback functions are chained through closures, but we cannot access them directly, so we need a separate data structure to store them.

Now I want to implement the while-switch pattern, which may be simpler than the current implementation of streamline.js. Instead of explicitly restoring the local variables, a closure seems more intuitive:

function test(_) {
  var label = 0;
  /* variable declarations */
  return (function __(err, return_value) {
    while (1) switch (label) {
      case ...:
      case some_label_with_async_call:
        label = next_label;
        return async_func(__); // use inner helper function as callback, local variables are preserved
      case ...:
      case final_case:
        return _(null, return_value);
    }
  })(0, 0);
}
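
For completeness, a sketch of how the transformed function above would be invoked (hypothetical caller; the continuation follows the (err, result) convention used elsewhere in this thread):

    test(function (err, return_value) {
      if (err) throw err;
      console.log('test() finished with', return_value);
    });
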
@kripken

Owner

kripken commented Jan 1, 2014

Ok, nice, that sounds good. I had thought it was something worse, but I misunderstood.

Anything I can do to help here?

@coolwanglu

Contributor

coolwanglu commented Jan 1, 2014

There are a few issues I'd like to discuss with you before I actually start:

function pointers w/ asm.js

Dereferencing function pointers is not feasible with asm.js; I wonder how this can be solved.

utilizing js code pattern

Currently streamline.js (of course) cannot understand the label-while-switch structure, so it cannot be reused as-is. I hope this feature can eventually be integrated into emscripten as a module, and I'm trying to generate more compact code.
In my mind there are two ways of achieving this:

  • Design a convention for the format of the JS generated by emscripten, for example: only while(1)switch(label) is used, where label is free to modify, so that a highly tailored streamline.js can safely reuse the existing labels
  • Do the same thing on the AST instead of on the JS text

The first one might be cleaner (?) but the second one is more reliable and faster. However, I'm not familiar with the emscripten code that translates LLVM code to JS; I just saw that uglify is used in js-optimizer.js. Could you please give a few pointers?

I wonder which one you prefer.

async function recognition

Currently this is achieved by generating the complete call graph of the whole program; all functions that might eventually call a (predefined) async function are collected (and transformed).
As far as I can imagine, the ideal integration into emscripten is to provide functions like sleep_async, or other mechanisms, so users may choose to call either sleep or sleep_async.
However, I still do not have a good solution for function pointers. The safest way is to do a runtime check on every function pointer call (if fp.length == xxx is not reliable due to optimization, maybe we can set metadata such as fp.async = true), which could be too slow(?). A sketch of that idea follows.
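
A hand-written sketch of the fp.async idea (hypothetical helper names, not existing emscripten code):

    // at startup, tag the entries of the function table that were transformed
    function markAsyncFunctions(table, asyncIndices) {
      for (var i = 0; i < asyncIndices.length; i++) {
        table[asyncIndices[i]].async = true;
      }
    }

    // every indirect call goes through a helper that checks the tag at runtime
    function callIndirect(table, fp, async_cb, arg) {
      var f = table[fp];
      if (f.async) {
        f(async_cb, arg);       // transformed function: continuation prepended
      } else {
        async_cb(null, f(arg)); // plain synchronous function
      }
    }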

other issues

More optimizations can be performed without exception and call-stack support. If exceptions are not handled, ExitStatus can no longer be captured, so we will need a new event/callback in Module.

@kripken

Owner

kripken commented Jul 30, 2014

Your async stuff landed.

@kripken kripken closed this Jul 30, 2014
