Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test emterpreter sync support #3129

Closed
kripken opened this issue Jan 14, 2015 · 45 comments
Closed

Test emterpreter sync support #3129

kripken opened this issue Jan 14, 2015 · 45 comments

Comments

@kripken
Copy link
Member

kripken commented Jan 14, 2015

incoming branch now has support for async/sync code in the emterpreter. That is, when running in the emterpreter, you can write sync code like most programs do, with sleep() and such, and it can still work in a browser. The interpreter loop saves it's state, does a setTimeout, then rebuilds and resumes the state. For more details see https://github.com/kripken/emscripten/wiki/Emterpreter#synchronous-code

This might be useful to emulators, ccing some people: @dreamlayers, @caiiiycuk . So far I have some basic tests in the test suite that pass, and I tried SDL-Doom, which worked (even effects like wipe work, which are synchronous, and there was no need to use emscripten_set_main_loop etc.). But this probably needs a lot more testing.

@kripken
Copy link
Member Author

kripken commented Jan 15, 2015

Tested on BananaBread as well, seems to work ok.

@kripken
Copy link
Member Author

kripken commented Jan 15, 2015

Closer testing on boon shows a rendering glitch in the emterpreter, the gunshots hitting walls look wrong. Unrelated to async, though. Perhaps I should fuzz the emterpreter.

@dreamlayers
Copy link
Contributor

DOSBox seems to work correctly with emterpreter, but is unusably slow, even with an old game (Duke Nukem 1). With some of my modifications disabled and sync enabled, I was able to interactively execute several commands at the DOS prompt, though inputting characters was very slow. Interactive use of the DOS prompt was impossible before because the way that code is structured needed sync.

I expect the blacklist could be used to improve performance while taking advantage of sync for some specific cases that were preventing significant numbers of games from running. Making all cases work wouldn't be good. The CPU interpreter should be sped up by not using emterpreter, but it can call itself recursively. This would involve listing a huge number of functions though. Can I use a file for the blacklist or use a whitelist instead?

@kripken
Copy link
Member Author

kripken commented Jan 15, 2015

Yeah, I would expect a CPU emulator to be too slow without a blacklist. Good to hear though that the async stuff seems to work (slowly).

Recursion by itself isn't a problem, normal asm.js stack frames can mix with emterpreted ones without problems - except for when doing sleep and other sync stuff. Is it possible for the core CPU interpreter to be on the stack when doing a sleep? In that case, a blacklist won't help, I'm afraid. Otherwise, if it's just called recursively, then blacklisting the one CPU interpreter function should give you most of the speed.

You can already use a file for the blacklist, I added a test now to be sure.

@dreamlayers
Copy link
Contributor

If I use EMTERPRETIFY_BLACKLIST=@file, it seems the contents of that file are still passed on the command line somewhere, so:

Traceback (most recent call last):
  File "/home/bgjenero/gitapps/emsdk_portable/emscripten/master/emcc", line 1296, in <module>
    final = shared.Building.emscripten(final, append_ext=False, extra_args=extra_args)
  File "/home/bgjenero/gitapps/emsdk_portable/emscripten/master/tools/shared.py", line 1502, in emscripten
    jsrun.timeout_run(Popen(cmdline, stdout=PIPE), None, 'Compiling')
  File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1335, in _execute_child
    raise child_exception
OSError: [Errno 7] Argument list too long

This is with latest incoming. I didn't even know Linux had a limit. This is with 4587 functions.

@kripken
Copy link
Member Author

kripken commented Jan 15, 2015

Ah, right, we have code to expand those out early, and we do send all SETTINGS stuff to all parts of the compiler. Fixed on incoming.

@kripken
Copy link
Member Author

kripken commented Jan 16, 2015

(Regarding the boon bug I mentioned before, it wasn't emterpreter-sync or even emterpreter-related, but it only triggered in the emterpreter due to different optimizations being run. Anyhow, fixed on incoming, boon looks correct now.)

@dreamlayers
Copy link
Contributor

Thanks for your quick fixes, you're amazing!

I've managed to use a huge blacklist, which runs correctly but slowly as long as the CPU interpreter is not in the blacklist. With the CPU interpreter in the blacklist, it fails. I don't see how emscripten_sleep() could be called from the CPU interpreter with the way I'm trying to use it in my tests. I'm not sure how to examine what's going on and trial and error is getting frustrating so I'll give up on this for now.

@kripken
Copy link
Member Author

kripken commented Jan 16, 2015

Yeah, if the CPU interpreter is on the stack when a sleep is called, then it should not be in the blacklist. So this limits speed here significantly.

I wonder if you can tell before calling the CPU interpreter whether it will sleep? If so there could be 2 versions, one blacklisted and one not. But maybe this is all too complicated. Anyhow, thanks for trying this out.

@dreamlayers
Copy link
Contributor

Is a problem only supposed to occur if there is a blacklisted function on the stack when sleep occurs? (ie. if you have b() { c() } and c() { emscripten_sleep(1) }, and you call b(), which is blacklisted, then you have a problem)

Is it perfectly okay to call a blacklisted function before or after a sleep as long as the function returns before the next sleep? (ie. Suppose b() is blacklisted and via non-blacklisted functions you call c() { emscripten_sleep(1); b(); emscripten_sleep(1); b(); }. That should be okay.)

If my understanding so far is correct, I should be able to take advantage of sync to increase the amount of stuff that works in DOSBox. The paths which might call sleep via the CPU interpreter are CPU exceptions, and those should generally be handled quickly, so sleep can be skipped there.

I have a problem. In the same .js file and the same function, sleep works if the function is called via one call stack and doesn't return if the function is called via another call stack. This happens even without blacklisting. The call stack consists of a multiple calls to the emterpreter function and a few small functions which push arguments onto the stack and then call the emterpreter. If I comment out the sleep, the script is non-responsive, but the output I get when the non-responsive script dialog occurs shows that it works properly otherwise.

Edit: By moving the emscripten_sleep() back in the call chain I found a more precise location where something goes wrong. It works fine here, before the call to BatchFile::ReadLine(). However, it fails here at the start of BatchFile::ReadLine(). I now have sleep at both of these locations. First the former starts and finishes. Then the latter starts. It doesn't return, but instead the former starts again and there is an assertion error.

Edit: The problem seems to happen when calling a virtual function via a pointer:

asynctest.h:

class BatchFile {
public:
        virtual bool ReadLine(char * line);
};

extern BatchFile *bf;

asynctest1.cpp

#include <stdio.h>
#include <emscripten.h>
#include "asynctest.h"

int main(void) {
    int cnt = 0;
    while (1) {
        printf("%i\n", ++cnt);
        bf->ReadLine(0);
    }
}

asynctest2.cpp

#include <stdio.h>
#include <emscripten.h>
#include "asynctest.h"

bool BatchFile::ReadLine(char * line) {
    emscripten_log(EM_LOG_C_STACK, "Before sleep BFRL");
    printf("Sleep-->\n");
    emscripten_sleep(1000);
    printf("<--Sleep\n");
    return true;
}

BatchFile b; 

BatchFile *bf = &b;

Build with em++ -O3 -s EMTERPRETIFY=1 -s EMTERPRETIFY_ASYNC=1 -o asynctest.html asynctest1.cpp asynctest2.cpp

kripken added a commit that referenced this issue Jan 19, 2015
@kripken
Copy link
Member Author

kripken commented Jan 19, 2015

Thanks, the issue was not virtual functions specifically but having functions returning a value, on the stack, when saving it. I had only tested void functions... ;) Fixed on incoming.

@dreamlayers
Copy link
Contributor

I'm pretty sure I tried returning values, and it seemed virtual needed to be there to trigger the problem.

Anyways, with your change, DOSBox works better. I can get to the DOS prompt and watch the blinking cursor, but if I press a key I keep getting continuous fake input. Experimentation shows that if I put a sleep at the start of device_CON::Read(), the C++ this pointer is different after the sleep finishes. It's already different in the after-sleep invocation of the function which pushes the parameters onto the stack and calls the emterpreter, and it looks like another pointer. The other two parameters are not changed. I have not been able to reproduce this in a nice small test case yet.

Edit: device_CON inherits from DOS_Device, which inherits from DOS_File. In the backtrace, I see a call to DOS_Device::Read(), which then calls device_CON::Read(). Initially, both are passed different this pointers. After the sleep, both get the this pointer of DOS_Device::Read().

@dreamlayers
Copy link
Contributor

I compiled the following with em++ -Wall -O3 -profiling -s EMTERPRETIFY=1 -s EMTERPRETIFY_ASYNC=1 -o breaksleep3.html breaksleep3.cpp. It shows how this changes after sleep:

#include <emscripten.h>
#include <stdio.h>

class DOS_Device {
    int devnum;
public:
    DOS_Device() { devnum = 0; }
    virtual bool Read(unsigned char * data,unsigned short * size);
};

DOS_Device *Devices[10];

bool __attribute__((noinline)) DOS_Device::Read(unsigned char * data,unsigned short * size) {
    printf("DOS_Device::Read (this = %i)\n", (int)this);
    return Devices[devnum]->Read(data,size);
}

class device_CON : public DOS_Device {
public:
    bool Read(unsigned char * data,unsigned short * size);
};

bool device_CON::Read(unsigned char * data,unsigned short * size) {
    //emscripten_log(EM_LOG_C_STACK, "Sleep CON:Read");
    printf("device_CON::Read (this = %i) Sleep--> \n", (int)this);
    emscripten_sleep(1000);
    printf("<--Sleep (this = %i)\n", (int)this);
    return true;
}

int main(void) {
    device_CON con;
    Devices[0] = &con;
    DOS_Device dev;
    dev.Read(0,0);
}

@kripken
Copy link
Member Author

kripken commented Jan 19, 2015

I suspect virtual was needed before, because otherwise LLVM inlines and gets rid of the call.

Thanks for the new testcase, will take a look soon.

@kripken
Copy link
Member Author

kripken commented Jan 19, 2015

Fixed, the trampolines didn't notice async state. This is a problem that does only occur with virtual methods, as otherwise when restoring the stack, trampolines are not used (we just do emterpret() => emterpret()), but virtual calls go through FUNCTION_TABLE which can only lead to a trampoline.

@dreamlayers
Copy link
Contributor

Thanks! The DOSBox prompt was now usable, and I could run programs. I still need to work on the blacklist to improve performance and make games playable.

Edit: Here's a build with a blacklist and decent performance. I did not encounter any more sync problems. However, I can't link that exact same thing with emterpreter but without the blacklist.

Stack: Error
    at assertTrue (eval at globalEval (/home/bgjenero/gitapps/emsdk_portable/emscripten/master/tools/js-optimizer.js:111:8), <anonymous>:59:26)
    at getReg (/home/bgjenero/gitapps/emsdk_portable/emscripten/master/tools/js-optimizer.js:6371:13)
    at getReg (/home/bgjenero/gitapps/emsdk_portable/emscripten/master/tools/js-optimizer.js:6272:29)
    at /home/bgjenero/gitapps/emsdk_portable/emscripten/master/tools/js-optimizer.js:7107:19
    at Array.forEach (native)
    at walkStatements (/home/bgjenero/gitapps/emsdk_portable/emscripten/master/tools/js-optimizer.js:7105:13)
    at getReg (/home/bgjenero/gitapps/emsdk_portable/emscripten/master/tools/js-optimizer.js:6530:59)
    at /home/bgjenero/gitapps/emsdk_portable/emscripten/master/tools/js-optimizer.js:7107:19
    at Array.forEach (native)
    at walkStatements (/home/bgjenero/gitapps/emsdk_portable/emscripten/master/tools/js-optimizer.js:7105:13)

undefined:60
    throw msg;
          ^
Assertion failed: undefined
Traceback (most recent call last):
  File "/home/bgjenero/gitapps/emsdk_portable/emscripten/master/tools/emterpretify.py", line 688, in <module>
    shared.Building.js_optimizer(infile, ['emterpretify'], extra_info={ 'emterpretedFuncs': list(emterpreted_funcs), 'externalEmterpretedFuncs': list(external_emterpreted_funcs), 'opcodes': OPCODES, 'ropcodes': ROPCODES, 'ASYNC': ASYNC, 'PROFILING': PROFILING }, output_filename=temp, just_concat=True)
  File "/home/bgjenero/gitapps/emsdk_portable/emscripten/master/tools/shared.py", line 1637, in js_optimizer
    ret = js_optimizer.run(filename, passes, NODE_JS, jcache, debug, extra_info, just_split, just_concat)
  File "/home/bgjenero/gitapps/emsdk_portable/emscripten/master/tools/js_optimizer.py", line 553, in run
    return temp_files.run_and_clean(lambda: run_on_js(filename, passes, js_engine, jcache, source_map, extra_info, just_split, just_concat))
  File "/home/bgjenero/gitapps/emsdk_portable/emscripten/master/tools/tempfiles.py", line 39, in run_and_clean
    return func()
  File "/home/bgjenero/gitapps/emsdk_portable/emscripten/master/tools/js_optimizer.py", line 553, in <lambda>
    return temp_files.run_and_clean(lambda: run_on_js(filename, passes, js_engine, jcache, source_map, extra_info, just_split, just_concat))
  File "/home/bgjenero/gitapps/emsdk_portable/emscripten/master/tools/js_optimizer.py", line 455, in run_on_js
    filenames = pool.map(run_on_chunk, commands, chunksize=1)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
AssertionError: Error in optimizer: // return type: [__ZNSt3__115basic_stringbufIcNS_11char_traitsIcEENS_9allocatorIcEEE8overflowEi,0]
function __ZNSt3__115basic_stringbufIcNS_11char_traitsIcEENS_9allocatorIcEEE8overflowEi(i15, i14) {

That is followed by a whole lot of JavaScript. While trying to narrow down what file is causing this problem, I found an object file which triggers an error by itself with -O2, in particular em++ cpu.o -O2 -s EXPORT_ALL=1 -o foo.js. That file works as part of DOSBox though. The problem I quoted above seems to be triggered by using SDL 2, though my simple SDL 2 test program won't trigger it.

Edit: Compilation with EMTERPRETIFY=1 fails if SDL 2 is used and "_SDL_JoystickUpdate" is not in EMTERPRETIFY_BLACKLIST. That is what caused the error output quoted above. It is unrelated to EMTERPRETIFY_ASYNC, and therefore I reported it separately as #3141.

@dreamlayers
Copy link
Contributor

I'm wondering about how to automatically create a blacklist for DOSBox. I can't use a static blacklist because some name mangling changes. My current blacklist is based on printing asm.funcs in tools/emterpretify.py and removing functions from that. I don't know if I can get the same information elsewhere. Compiling to bytecode and using llvm-nm does not give the same information. It would be much easier for me to provide a list of functions which need emterpreter instead of a list of functions that shouldn't use it.

@kripken
Copy link
Member Author

kripken commented Jan 22, 2015

Ok, I added a whitelist option now, which takes precedence over the blacklist.

@dreamlayers
Copy link
Contributor

Thanks! The whitelist was easy to use and I did not find any more problems with DOSBox running with emterpreter sync. The only remaining issue I'm aware of is that the final link leaves files in tmp directories, like:

tmpKTf7zxtmpm4Rnem.jsfunc_12.js.jo.js.jo.js
tmpLvcMHPtmpnvOQx_.jsfunc_12.js.jo.js.jo.js
tmprq3OxKtmpKR3PFf.jsfunc_12.js.jo.js.jo.js.em.js

Edit: I was doing all my testing with SDL 2. When I compiled DOSBox with SDL 1, sound was broken like due to #3122. I will post a new comment if I make significant discoveries showing a problem in Emscripten.

Archive.org cannot use the emterpreter sync builds because they're unable to support the memory init file. I see how emterpretify.py is designed to use that file, but I can't think of any theoretical reason why that can't be changed and a memory init file is unavoidable when using emterpreter. I realize this is not a bug and I feel reluctant to actually ask for this change.

@DopefishJustin
Copy link
Contributor

MESS blows it up:

$ /0/home/dfjustin/jsmess/third_party/emscripten/emcc  -O3 -s DISABLE_EXCEPTION_CATCHING=2 -s NO_EXIT_RUNTIME=1 -s ASSERTIONS=0 -s COMPILER_ASSERTIONS=1 -s FORCE_ALIGNED_MEMORY=1 -s EMTERPRETIFY=1 -s EMTERPRETIFY_ASYNC=1 -s EMTERPRETIFY_WHITELIST=@emterpretify.txt -s TOTAL_MEMORY=134217728 -s EXCEPTION_CATCHING_WHITELIST='["__ZN15running_machine17start_all_devicesEv"]' -s EXPORTED_FUNCTIONS="['_main', '_malloc', '__Z14js_get_machinev', '__Z9js_get_uiv', '__Z12js_get_soundv', '__ZN10ui_manager12set_show_fpsEb', '__ZNK10ui_manager8show_fpsEv', '__ZN13sound_manager4muteEbh', 'SDL_PauseAudio']" -g2 messcoleco.bc -o messcoleco.js --pre-js pre.js --post-js post.js

...

optimizer.exe: /0/home/dfjustin/jsmess/third_party/emscripten/tools/optimizer/parser.h:306: NodeRef cashew::Parser<Ref, ValueBuilder>::parseElement(char *&, const char *) [NodeRef = Ref, Builder = ValueBuilder]: Assertion `0' failed.
optimizer.exe: /0/home/dfjustin/jsmess/third_party/emscripten/tools/optimizer/optimizer.cpp:3882: int main(int, char **): Assertion `f' failed.
Traceback (most recent call last):
  File "/0/home/dfjustin/jsmess/third_party/emscripten/emcc", line 1573, in <module>
    flush_js_optimizer_queue()
  File "/0/home/dfjustin/jsmess/third_party/emscripten/emcc", line 1442, in flush_js_optimizer_queue
    run_passes(chunks[0], title, just_split=False, just_concat=False)
  File "/0/home/dfjustin/jsmess/third_party/emscripten/emcc", line 1416, in run_passes
    final = shared.Building.js_optimizer(final, passes, jcache, debug_level >= 4, js_optimizer_extra_info, just_split=just_split, just_concat=just_concat)
  File "/0/home/dfjustin/jsmess/third_party/emscripten/tools/shared.py", line 1648, in js_optimizer
    ret = js_optimizer.run(filename, passes, NODE_JS, jcache, debug, extra_info, just_split, just_concat)
  File "/0/home/dfjustin/jsmess/third_party/emscripten/tools/js_optimizer.py", line 552, in run
    return temp_files.run_and_clean(lambda: run_on_js(filename, passes, js_engine, jcache, source_map, extra_info, just_split, just_concat))
  File "/0/home/dfjustin/jsmess/third_party/emscripten/tools/tempfiles.py", line 39, in run_and_clean
    return func()
  File "/0/home/dfjustin/jsmess/third_party/emscripten/tools/js_optimizer.py", line 552, in <lambda>
    return temp_files.run_and_clean(lambda: run_on_js(filename, passes, js_engine, jcache, source_map, extra_info, just_split, just_concat))
  File "/0/home/dfjustin/jsmess/third_party/emscripten/tools/js_optimizer.py", line 453, in run_on_js
    filenames = pool.map(run_on_chunk, commands, chunksize=1)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 227, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 528, in get
    raise self._value
AssertionError: Error in optimizer:

Files are at http://interbutt.com/temp/issue3129.zip

@kripken
Copy link
Member Author

kripken commented Feb 1, 2015

Thanks, fixed (might need emcc --clear-cache to force a rebuild of optimizer on next build command).

@DopefishJustin
Copy link
Contributor

Thanks! It worked, and does seem to lower CPU usage for less-demanding emulations that were spending time in usleep() before. However in the more-demanding case there is a substantial speed hit so I will stick to emscripten_set_main_loop for now.

@dreamlayers
Copy link
Contributor

Use of emterpreter sync makes Em-DOSBox SDL 1 audio much worse. The audio callbacks are delivered in bunches and DOSBox cannot handle the jitter, like in #3122 (which is now fixed). This is not a new problem; I just didn't investigate it before because I was using SDL 2.

It seems SDL 1 audio depends on the if (typeof SDL === "object" && SDL.audio && SDL.audio.queueNewAudioData) SDL.audio.queueNewAudioData(); line in Browser_mainLoop_runner(). Commenting out that line reproduces the problem in a build without emterpreter sync.

This is not due to DOSBox calling emscripten_sleep_with_yield() too rarely or sleeping for an overly short time. It seems the SDL_audioCaller() callbacks aren't keeping the buffers topped up despite having opportunity to do that. For now I may reduce SDL_numSimultaneouslyQueuedBuffers in Em-DOSBox.

Edit: Adding a similar line for queuing audio in EmterpreterAsync.handle when starting a sleep with yield fixes this problem. I may be a good idea to improve SDL 1 audio callback scheduling so they are sufficient to keep audio running smoothly.

   if (!yieldDuring) Browser.pauseAsyncCallbacks();
   else if (typeof SDL === "object" && SDL.audio && SDL.audio.queueNewAudioData) SDL.audio.queueNewAudioData();

@kripken
Copy link
Member Author

kripken commented Feb 20, 2015

It sounds like that problem would be fixed if sleep_with_yield called SDL.audio.queueNewAudioData? If so, I can add a general method to add things for yield to call.

edit: ah, I see this is what you said in the edit? do I understand correctly?

@dreamlayers
Copy link
Contributor

Okay, I guess an API is a better idea than just adding that line. SDL 2 or other stuff might need it later.

I wonder if the same API should be used for adding stuff to run after the main loop. In other words, if something is added, it runs after the main loop and it runs when emscripten_sleep_with_yield() is called.

@dreamlayers
Copy link
Contributor

Just to clarify: this is the SDL 1 audio fix mentioned in the edit of my previous comment.

I did not submit it as a pull request now because you talked about adding "a general method to add things for yield to call". I am okay with the solution I linked here, and will submit a pull request if you want. It fixes the sound problem, and a general method could wait until something else needs to run after yield.

@kripken
Copy link
Member Author

kripken commented Feb 20, 2015

Ok, implemented in fd797ce . Please let me know if that doesn't work (we don't have a test for it yet).

@dreamlayers
Copy link
Contributor

Thanks! Your fix works. Em-DOSBox sound is now good when using SDL 1 and emterpreter sync.

I guess an automated test could measure sound jitter. emscripten_get_now() should provide enough precision for that.

@dreamlayers
Copy link
Contributor

It seems that emscripten_sleep() and emscripten_sleep_with_yield() cannot be used together in the same program. This fails to print "Done":

#include <stdio.h>
#include <emscripten.h>

int main(void) {
    printf("No yield:\n");
    emscripten_sleep(500);
    printf("With yield:\n");
    emscripten_sleep_with_yield(500);
    printf("Done\n");
    return 0;
}

It works if both calls are emscripten_sleep() or both are emscripten_sleep_with_yield().

I first ran into a problem trying to use emscripten_wget_data() in Em-DOSBox. Then I saw putting emscripten_sleep(1) in its place caused the same problem.

@dreamlayers
Copy link
Contributor

The problem happens when there is an emscripten_sleep() before emscripten_sleep_with_yield();.
The emscripten_sleep() calls Browser.pauseAsyncCallbacks(), and Browser.resumeAsyncCallbacks() is not called before the emscripten_sleep_with_yield() call. So then the Browser.safeSetTimeout() in the anonymous function in emscripten_sleep_with_yield() results in resume() being added to Browser.queuedAsyncCallbacks instead of being called.

This could be fixed by adding Browser.resumeAsyncCallbacks() to the if (yieldDuring) { block.

@kripken
Copy link
Member Author

kripken commented Mar 9, 2015

Thanks! Fixed and added that test.

@dreamlayers
Copy link
Contributor

Disregard stuff I wrote previously about a line after emscripten_wget_data() being executed twice. That was only happening because WGET::Run() wasn't in EMTERPRETIFY_WHITELIST earlier.

emscripten_sleep_with_yield() and emscripten_wget_data() work properly together now, both in a small test and in Em-DOSBox. Thanks!

@kripken
Copy link
Member Author

kripken commented Mar 9, 2015

Great!

Can you please verify that you get an assertion if you don't have that method in the whitelist, and the line runs twice? When building with ASSERTIONS.

@dreamlayers
Copy link
Contributor

I have re-built with -O3 --profiling -s ASSERTIONS=1, and I'm getting lots of assertion failures constantly. They seem to be coming from code like asyncState ? abort(-12) | 0 : 0; in SDL 2 event handlers like _Emscripten_HandleMouseFocus() and _HandleAudioProcess().

@kripken
Copy link
Member Author

kripken commented Mar 9, 2015

That -12 should be saying that an async operation is happening with non-emterpreted code on the stack. Likely those two methods need to be added to the whitelist.

@dreamlayers
Copy link
Contributor

I cannot imagine why those methods need to be added to the whitelist, because they won't be interrupted by sleep. The assertion is happening because they are being called during emscripten_sleep_with_yield() either directly via yieldCallbacks or via browser events while the main code is sleeping. Here's a backtrace with SDL 1 sound triggering it:

__ZL14MIXER_CallBackPvPhi (dosbox.js:149612)
dynCall_viii (dosbox.js:291715)
Runtime.dynCall (dosbox.js:206)
SDL_queueNewAudioData (dosbox.js:10051)
yieldCallback (dosbox.js:10056)
(anonymous function) (dosbox.js:7274)
EmterpreterAsync.handle (dosbox.js:7273)
_emscripten_sleep_with_yield (dosbox.js:8743)
emterpret (dosbox.js:51686)
__ZL11Normal_Loopv (dosbox.js:292791)
emterpret (dosbox.js:52280)
__Z17DOSBOX_RunMachinev (dosbox.js:293587)
emterpret (dosbox.js:52685)
__Z19CALLBACK_RunRealInth (dosbox.js:291870)
emterpret (dosbox.js:50206)
__ZN10device_CON4ReadEPhPt (dosbox.js:282886)
emterpret (dosbox.js:52642)
__ZN10DOS_Device4ReadEPhPt (dosbox.js:283401)
emterpret (dosbox.js:52642)
__Z12DOS_ReadFiletPhPt (dosbox.js:283616)
emterpret (dosbox.js:49857)
__ZN9DOS_Shell12InputCommandEPc (dosbox.js:287413)
emterpret (dosbox.js:50819)
__ZN9DOS_Shell3RunEv (dosbox.js:291971)
emterpret (dosbox.js:52108)
__Z10SHELL_Initv (dosbox.js:293649)
emterpret (dosbox.js:52706)
__ZN6Config7StartUpEv (dosbox.js:291955)
emterpret (dosbox.js:51260)
resume (dosbox.js:7262)
(anonymous function) (dosbox.js:4820)

Since SDL 1 is mostly written in JS, I only needed to comment out that one asyncState ? abort(-12) | 0 : 0; to get DOSBox to work. Then I was able to verify that __ZN4WGET3RunEv [WGET::Run()] properly threw a -12 exception due to not being in the whitelist.

@dreamlayers
Copy link
Contributor

If a function in EMTERPRETIFY_BLACKLIST does not exist, that results in an error like AssertionError: requested blacklist of asdf but it does not exist, but I can't get errors for non-existent functions in EMTERPRETIFY_WHITELIST. Even compiling with -s COMPILER_ASSERTIONS=1 does not help. I think there should be an error, because otherwise the user may falsely believe they have whitelisted something, and end up wasting time figuring out what's wrong.

It seems there's no code for this in tools/emterpretify.py. I could make a pull request if you want. I'm just wondering how to test for this, if that is needed. Apparently there's no general framework for testing compiler errors, and Python code would have to be added somewhere, perhaps test_other.py. There also seems to be no test for errors due to non-existent functions in EMTERPRETIFY_BLACKLIST.

@kripken
Copy link
Member Author

kripken commented Mar 9, 2015

Regarding the earlier comment: I see. Ok, sounds like we need to be able to mark some methods as ok to run while yielding, a YIELDLIST I guess. This is a bit why I was worried about even having a yield mode - it makes the model complicated - but hopefully this is as complicated as it gets. I'll look into this now.

About the last comment: yeah, pull request to error on missing whitelist functions would be great. Testing would also be good, in test_other.py. We test various compile errors in that file, e.g. by looking at stderr output, and verifying there is no output file.

@kripken
Copy link
Member Author

kripken commented Mar 10, 2015

Ok, the YIELDLIST option has been added, with an sdl1 audio test that adds the relevant methods added to the yield list, so that assertions do not trigger, in bc1426f

@dreamlayers
Copy link
Contributor

I confirm that the SDL 1 sound abort(-12) issue mentioned above is fixed by -s EMTERPRETIFY_YIELDLIST='[ "__ZL14MIXER_CallBackPvPhi" ]'. However, SDL 2 is a bigger problem. Internal functions in SDL 2 itself need to be added to that list. Software (such as Em-DOSBox) which is using SDL 2 should not have to care about that and should only need to list its own functions. Support for SDL 2 could be implemented as a special case, but this is a general issue with libraries.

This also seems to remove all sleep protection from functions in EMTERPRETIFY_YIELDLIST. The list is supposed to mean that those functions can be run while main code is sleeping. Those functions are not supposed to be interrupted by sleep because they are not emterpreted, but code which would cause an abort(-12) in that situation is removed from them.

I was thinking that checks could use the value of asyncState. It seems values are: 0 = normal, 1 = going to sleep or sleeping, and 2 = resuming. There seems to be no way to use these now because 1 would be seen both in a wrongly interrupted function and in a function running during yield. Maybe it could be changed to 0 = normal, 1 = sleeping with yield, 2 = going to sleep or sleeping without yield, 3 = resuming. Then the asyncState > 1 ? abort(-12) | 0 : 0; could be used in the test.

Of course the test could use asyncState or asyncState > 1 depending on whether a function is in EMTERPRETIFY_YIELDLIST. I was thinking use of asyncState like that could avoid need for a yield list.

@kripken
Copy link
Member Author

kripken commented Mar 11, 2015

Good point, that would be an improvement. 1 would need to be "going to sleep or sleeping with yield", since we don't currently have a way to switch between "going to sleep" and "sleeping" (we can't tell when the stack has unwound). Otherwise that sounds great.

Not sure if I'll have time for this soon, feel free to work on it if you want.

@dreamlayers
Copy link
Contributor

It seems that after asm.emterpret(stack[0]); returns back to resume() the stack has unwound and the program is transitioning from "going to sleep" to "sleeping". When the yield callbacks are being called, that also needs to be considered as "sleeping with yield".

However, sleep can also happen in code that wasn't started via resume(). The first sleep is instead via doRun() and callMain(). Sleep could also happen in a main loop iteration or in response to an event. (I've never tried either of these.) In this situation what I described in the previous paragraph wouldn't work.

I wonder if it's possible to break things by calling sleep in code started elsewhere? Experiments led to the following somewhat unrelated issues.

Here main_loop() will run really fast, not once a second. Commenting out the sleep inside main_loop() makes it work properly. Both emscripten_sleep(1) and emscripten_sleep_with_yield(1); produce the same result.

#include <stdio.h>
#include <emscripten.h>

void main_loop(void) {
    emscripten_sleep_with_yield(1);
    printf("Main loop\n");
}

int main(void) {
    emscripten_set_main_loop(main_loop, 1, 1);
}

Here you get: Uncaught SimulateInfiniteLoop.

#include <emscripten.h>

int main(void) {
    emscripten_sleep(1);
    emscripten_exit_with_live_runtime();
}

Edit: Now I see that Module['noExitRuntime'] is set via sleep functions, so there's no need to call emscripten_exit_with_live_runtime();. That seems like unexpected behaviour, but I understand why it happens.

@kripken
Copy link
Member Author

kripken commented Mar 11, 2015

The first testcase was a bug in the mainloop code - resuming a main loop always did requestAnimationFrame, at 60fps. Fixed on incoming. I guess this wasn't noticed without the emterpreter which relies on mainloop resuming more heavily.

@kripken
Copy link
Member Author

kripken commented Mar 11, 2015

The second issue is indeed a little surprising. I'm not sure how to improve it, though.

@kripken
Copy link
Member Author

kripken commented Mar 11, 2015

I got a bug report about input events like an SDL InputHandler. We call those directly from the event handler code, so that a user input => compiled InputHandler => fullscreen or another feature which needs to directly result from a user input.

That should be fixed now on incoming. I did need to refactor a bunch of stuff, so I hope I didn't break anything. Test suite so far looks fine.

edit: InputHandlers need to be in the yield list, to not trigger asserts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants