Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

makes FS.read async, so you can read asynchronously from stdin #3714

Closed
wants to merge 6 commits into from

Conversation

albertjan
Copy link

This will make things like #23 (an issue from the way back when machine) a lot easier. It will also make the stdin a better browser citizen.

This implementation supports both sync and async input from the stdin. As you can see in the test. The buffer is filled with a promise that also returns the first byte and the rest of the buffer is returned to the read function immediately.

I have some failing tests but as far as I can see most of them are because I didn't have java installed.

testoutput

@albertjan albertjan changed the title Async read so you can asynchronously makes FS.read async, so you can read asynchronously from stdin Aug 26, 2015
@kripken
Copy link
Member

kripken commented Aug 27, 2015

What is on the stack when read() is called? (if it's anything nontrivial, that might break, as we need to reconstruct the call stack, and can only do it for compiled code)

I see this depends on Promise - I'm worried about browser support, at least IE11 I believe doesn't have it.

@albertjan
Copy link
Author

What is on the stack when read is called depends on what you compile right? We've been using it in pypy which is pretty big. I don't know enough about C and emscripten to answer this question.

I guess you're right making it use EmterpreterAsync will result in it only being available when called from code compiled with Emterpreter. If you would want to use in the javascript world you'd have to compile a wrapper around it.

You're right Promise support is not available in IE11, although easily polyfillable. But I could change it to use callbacks, That will make it a few lines longer though.

Is this something you would consider? The pull request I mean 😄

@kripken
Copy link
Member

kripken commented Sep 3, 2015

My concern is the code in between the C code and the read() call. Can you print out the call stack (stackTrace()) and paste that here?

If there is any JS code in the middle, we need to be sure it is safe. In general, we only call EmterpreterAsync.handle from a direct JS library call, i.e., the target of a C call to a library. But here there is stuff in the middle.

It might be safer to put the async stuff at the syscall level, look in src/library_syscall.js for other EmterpreterAsync.handle, we have one on fsync. Perhaps one on read would work here.

I would definitely consider this pull, assuming the safety issue is resolved. I think we should avoid Promise, much as it would be nice to support it.

@albertjan
Copy link
Author

This is the stack:

so only _read is between it.

Error
    at jsStackTrace (blob:http%3A//52.6.7.231%3A3030/57073b5a-2c96-4206-b55b-ade228782e79:1161:13)
    at stackTrace (blob:http%3A//52.6.7.231%3A3030/57073b5a-2c96-4206-b55b-ade228782e79:1178:22)
    at Object.eval (eval at evaluate (unknown source), <anonymous>:1:1)
    at Object.InjectedScript._evaluateOn (<anonymous>:904:55)
    at Object.InjectedScript._evaluateAndWrap (<anonymous>:837:34)
    at Object.InjectedScript.evaluateOnCallFrame (<anonymous>:963:21)
    at Object.FS.createDevice.FS.registerDevice.read (blob:http%3A//52.6.7.231%3A3030/57073b5a-2c96-4206-b55b-ade228782e79:4847:6)
    at Object.FS.read (blob:http%3A//52.6.7.231%3A3030/57073b5a-2c96-4206-b55b-ade228782e79:4492:43)
    at _read (blob:http%3A//52.6.7.231%3A3030/57073b5a-2c96-4206-b55b-ade228782e79:6430:19)
    at emterpret (blob:http%3A//52.6.7.231%3A3030/57073b5a-2c96-4206-b55b-ade228782e79:214113:11)"

@albertjan
Copy link
Author

I will look into the syscall

@kripken
Copy link
Member

kripken commented Sep 4, 2015

Ok. If it's just read, maybe it's not too bad. But any future refactoring might make it worse. Doing it at the syscall level should be future-proof.

@albertjan
Copy link
Author

So I figured out what doing it in syscall means 😄 I think. I will update accordingly. I'm going to make the stream in __syscall3 capable of handling callbacks and promises. that would make FS.read "synchronous" again.

@albertjan
Copy link
Author

I should think before I type. 😄 I will also have to change the way FS.read works because thats where I have access to the input method. I should make sure the EmterptretifyAsync stuff happens in __syscall3.

var stream = SYSCALLS.getStreamFromFD(), buf = SYSCALLS.get(), count = SYSCALLS.get();
return EmterpreterAsync.handle(function (resume) {
var result = FS.read(stream, {{{ makeGetSlabs('buf', 'i8', true) }}}, buf, count, null, function (bytesRead) {
resume(function() { bytesRead });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs return

@albertjan
Copy link
Author

Ah yeah hi I've been at it a bit more. And I've found I also need to implement syscall 145 and there is also pread in there somewhere 😄. There's also something else broken still because allthough the test pases when I run it seperatelly it breaks. I'm away for the weekend and will be back at it Monday. I should probably not have pushed this yet. I'll let you know when I thinks it's done.

@albertjan albertjan force-pushed the async-read branch 5 times, most recently from c79552b to 7ddc2a2 Compare September 17, 2015 20:14
@albertjan
Copy link
Author

@kripken I've rebased it and added the EmterpreterAsync calls to library_syscall. I've added a test for readv it uses gets which is considered bad practice I think but I don't know if it really matters for the tests.

@@ -808,10 +861,21 @@ var SyscallsLibrary = {
}
return nonzero;
},
#if EMTERPRETIFY_ASYNC
__syscall180: function(which, varargs) { // pread64
var stream = SYSCALLS.getStreamFromFD(), buf = SYSCALLS.get(), count = SYSCALLS.get();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happened to the zero at the end, seen in the non-async version below?

@kripken
Copy link
Member

kripken commented Sep 21, 2015

Overall this looks functional, but the code duplication is a concern, as mentioned above. But also, in general this is more invasive than I had expected. Perhaps we need to do some refactoring to make things like this easier to do? Feels like we are doing something wrong here.

@albertjan
Copy link
Author

I've added the count and offset back in. I've also removed the obvious duplications. I'll have to look into making it easier, but because we want to enter the async world in library_syscall.js I needed to add callbacks at a lot of places. The shortest path to makt this easier would be to make use of Promises 😄

@@ -810,7 +857,15 @@ var SyscallsLibrary = {
},
__syscall180: function(which, varargs) { // pread64
var stream = SYSCALLS.getStreamFromFD(), buf = SYSCALLS.get(), count = SYSCALLS.get(), zero = SYSCALLS.getZero(), offset = SYSCALLS.get64();
#if EMTERPRETIFY_ASYNC
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps one way to reduce code duplication in places like this is if we always passed the extra parameter to FS.read, but it was just ignored when not async. And, if we automatically added the code to add the handle, for a list of async-aware syscalls, by modifying the text. We already do similar things for SYSCALL_DEBUG at the bottom of this file.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I'll try it out see how that would look/work.

@juj juj added the filesystem label Sep 24, 2015
@kripken
Copy link
Member

kripken commented Oct 7, 2015

Another thought on this: Perhaps we could do sync stdin reading through a new API, say emscripten_sync_stdin()? Not doing it via stdio's intefaces and the filesystem code, just directly reading from Module.stdin, might be simple but still useful.

@albertjan
Copy link
Author

Ok that would make things a lot less intricate/duplicated. 😄 you'd want to make the async version the default?

It might be a good idea to add a few more api's to the sync one that might make it easier for people to port their existing sync code to the new api. I however don't know enough about c/c++ to make any assumptions about which those would be 😄

With permission from my employer/client I will revisit this PR soon, but it won't be this or next week.

@kripken
Copy link
Member

kripken commented Oct 8, 2015

I was thinking that there would be just a async version for this, unless there is value in a regular one too? I'm fine with other new APIs that are useful, of course.

@albertjan
Copy link
Author

Yeah sounds good to me. :)

@albertjan
Copy link
Author

I think I've just run into a little problem 😄 this also makes all the other reads async. And for my case pypyjs forces me to add a lot of functions to the EMTERPRETIFY whitelist. I think we may need to rethink how to go about this.

@kripken
Copy link
Member

kripken commented Dec 7, 2018

This PR is quite old, closing. Feel free to reopen if still relevant.

@kripken kripken closed this Dec 7, 2018
@zone117x
Copy link

@kripken Any suggestions for how to accomplish this today? (Feeding data into stdin via async js)
I'm using the Rust emscripten toolchain.

@kripken
Copy link
Member

kripken commented Jul 21, 2019

@zone117x In general it should be easy to call async APIs in JS now using Asyncify, see

https://emscripten.org/docs/porting/asyncify.html#making-async-web-apis-behave-as-if-they-were-synchronous

Also see #9041 which is adding an example of returning a value.

It is more work to make C stdin be async, since fread etc. are synchronous, which is what I think added a lot of complexity to this PR. If the C reads are replaced with calls to custom JS functions, then they can be async as in those Asyncify examples.

If it would be helpful, we can add support for Module.stdin returning a promise, and making that work with a new emscripten_read_stdin would be easy, while it would error when called using normal fread.

@curiousdannii
Copy link
Contributor

curiousdannii commented Aug 2, 2020

@kripken Just checking what you mean there: it would be fairly easy to add a new C func (emscripten_read_stdin) but lots of work to make the standard fread and the other standard read functions async?

I have code that makes a lot of calls to getc(stdin). Would it be possible to make device that would let these getc calls become async?

Or I've seen that you can provide an stdin function with FS.init, but if you have a lot of data to stream in won't that be horribly inefficient, sending just one character at a time? I'd expect a system where you could give a buffer. But I'm very new to all of this site of things, so I don't have much idea what I'm doing sorry!

Edit: I've tried using FS.init to set a stdin handler, passing in values one at a time, but if you run out and send a null then Emscripten will give getc an EOF.

I've found someone whose done something similar, but it involves hacking the Emscripten generated file quite a lot to add generators/yields: http://mansoft.nl/emscripten/terminal.html


I've found a solution that seems to work for my usecase:

  1. Use the linker's wrap option to wrap getc, redirecting to a custom function if the file is stdin, or calling the real getc otherwise.
  2. Make that custom function be a ASYNCIFY async function. Pause if there isn't any waiting data, otherwise return the next character.

It's not pretty, but it's working for now :).

@kripken
Copy link
Member

kripken commented Aug 3, 2020

@curiousdannii

Just checking what you mean there: it would be fairly easy to add a new C func (emscripten_read_stdin) but lots of work to make the standard fread and the other standard read functions async?

Yes, exactly.

What is "the linker's wrap option"?

@curiousdannii
Copy link
Contributor

curiousdannii commented Aug 3, 2020

Passing in -Wl,--wrap=getc to the compiler, as described here:
https://stackoverflow.com/a/46446749/2854284

This seems to be working quite reliably. But it would only be an option when you're porting something that's quite controlled in how it reads from stdin. And some of the other functions might be harder to wrap than getc. In my case I only needed to wrap getc and ungetc.

I'm also planning to make it more efficient by adding a custom buffer so that it doesn't have to cross the WASM/JS interface each and every call to getc.

Edit: I finished adding the buffer. In case this helps anyone else, here's the code: https://github.com/curiousdannii/emglken/blob/master/src/getc.c

@kripken
Copy link
Member

kripken commented Aug 4, 2020

Interesting, thanks @curiousdannii , I wasn't aware of that linker option!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants