Skip to content

ASYNCFS: Allow async JS filesystem implementations#9151

Closed
kripken wants to merge 15 commits intomainfrom
asyncfs
Closed

ASYNCFS: Allow async JS filesystem implementations#9151
kripken wants to merge 15 commits intomainfrom
asyncfs

Conversation

@kripken
Copy link
Member

@kripken kripken commented Aug 4, 2019

We've allowed customizing the JS for the filesystem (and we have MEMFS, IDBFS, etc.), and all of those are written synchronously, the way code normally runs. Asyncify can make compiled code pause and resume, but the system JS code for our default filesystem isn't written to be callback-based. So using Asyncify to wait on the result of a syscall (that internally does some async operations) has not been possible.

This PR provides an option for that, ASYNCFS, which replaces the normal syscall interface with one that integrates with Asyncify. That lets the JS side do any async operation you want before returning to the C code which appears to wait synchronously for the result.

To use this, you need to implement in JS hooks for the various syscalls, from scratch (this can't build on the existing filesystem code because that is synchronous). Each hook gets a callback for resuming execution, so this is fully async and compatible with Asyncify. The test provides an example.

Also:

  • Refactor the sync syscalls into one file, and another for the async, leaving shared code in the original syscalls JS library file.
  • Fix a minor bug with assertions for asyncify imports not handling wildcards, which led to false positives in this test.

@jgautier
Copy link

This looks like a super interesting PR. I have recently been thinking about how to create a vfs for sql.js which would use Asyncify, Blobs, and IndexedDB to be able to persist data reliably and store data larger then what can fit into memory without having to jump through the hoops of the existing synchronous file system implementations. Looks like this could be a good solution to the problem with a much larger impact. I see there has not been much activity on this PR in the last few weeks so if your looking for help and willing to lay out what you think needs to done to get this merge I'd love to help.

@kripken
Copy link
Member Author

kripken commented Sep 4, 2019

Thanks @jgautier!

How does the API look to you? It would be good to know if this is in the right direction before moving forward with it. Also it would be nice if there was some more serious example using it, but not sure we need to block on that.

@jgautier
Copy link

jgautier commented Sep 5, 2019

I think the API is looking fine. One change could be to use promises instead of a wakeUp callback but I think just depends if emscripten is moving towards promises rather then callbacks (I noticed async ccall is now using Promises). It seems pretty low level so I don't know if people would be able to create new file systems without a fair amount of knowledge of emscripten, but I am guessing most people would be using one of these async file systems rather than implementing one anyways.

I started work on a nodejs async file system (very much a work in progress) https://github.com/jgautier/emscripten/blob/asyncfs/src/library_nodefs_async.js . It seemed like a pretty good place to start. Is this something that you would imagine in emscripten core? Once I got this one working I was planning on creating one for the web using indexeddb.

Also I noticed these following syscalls were not implemented, was this for a reason or you had just not gotten around to it yet?

warning: undefined symbol: __syscall15
warning: undefined symbol: __syscall196
warning: undefined symbol: __syscall20
warning: undefined symbol: __syscall201
warning: undefined symbol: __syscall207
warning: undefined symbol: __syscall212
warning: undefined symbol: __syscall39
warning: undefined symbol: __syscall40
warning: undefined symbol: __syscall85
warning: undefined symbol: __syscall94

@kripken
Copy link
Member Author

kripken commented Sep 10, 2019

Promises are an interesting idea. Would we lose any backwards compatibility if we change the API that way? I guess we can always use a Promise polyfill for older VMs. If that sounds good I can rework this PR towards that.

A node async filesystem sounds great! Both useful and could be a good reference example. Definitely makes sense to have in upstream.

Those syscalls are just things I didn't get around to - but not sure if something in them may be tricky or not. In general I was hoping we need very few syscalls for this - open/close/read/write/seek basically. Do you expect many more to be used? Are those syscalls you expect to use?

@jgautier
Copy link

I don't think using promises would be a problem compatibility wise. caniuse has promises at 94% and the oldest version of supported node v8.0 supports them. And we could always include a polyfill. For the current node async file system I am building I don't think promises help in any way because all of the file system APIs are still using callbacks, so I think its fine to stick with callbacks for now. It can always be changed later or we could support both callbacks and promises.

I ran into a bit of a blocker in implementing the llseek function that I was hoping you could help me sort out. llseek repositions the read/write offset of a file and currently there is no node.js file system api that does this. Maybe we could call the read function and then throw the data away?
https://nodejs.org/dist/latest-v6.x/docs/api/fs.html#fs_fs_read_fd_buffer_offset_length_position_callback

The other file systems when doing llseek get passed a stream object that has a position that can be set as seen here: https://github.com/emscripten-core/emscripten/blob/incoming/src/library_fs.js#L1114 and https://github.com/emscripten-core/emscripten/blob/incoming/src/library_nodefs.js#L265 but for the AsyncFSImpl it does not have a stream parameter https://github.com/emscripten-core/emscripten/pull/9151/files#diff-6fbef50c0bc777ac7ab2342da03ba23fR98. I don't think I am familiar enough with emscripten internals to clearly see how the AsyncFSImpl can or should integrate with the rest of the emscripten file systems.

In regards to the missing syscalls, the only reason why I mentioned them was because when I was compiling things on this branch emcc warned that those were not defined. I haven't looked into it yet but seems like something that should be addressed before this is merged.

@kripken
Copy link
Member Author

kripken commented Sep 24, 2019

For seek in node.js, it looks like read has a position parameter. So we'd need to remember the seeked location and pass that. That implies there's another layer beneath these simple APIs and the underlying implementation.

I'm not sure if we can write a general such layer that would help everything. Leaving it to each implementation may be the most general thing.

@sbc100 sbc100 closed this Jan 30, 2020
@kripken kripken reopened this Jan 31, 2020
@kripken kripken changed the base branch from incoming to master January 31, 2020 00:37
@RReverser
Copy link
Collaborator

Promises are an interesting idea. Would we lose any backwards compatibility if we change the API that way? I guess we can always use a Promise polyfill for older VMs. If that sounds good I can rework this PR towards that.

I think this PR could be nicely updated on top of #11429, which added Asyncify.handleAsync specifically for Promises and async functions.

@thomasballinger
Copy link
Contributor

I'm interested in this in order to implement a filesystem that uses browser File System Access API. I haven't tried it yet, I'll give it a look! I think a file system access API FS could be mounted at specific paths for text/image editing, databases or game saves.

@curiousdannii
Copy link
Contributor

I'd like to use something like this in the future, however I'd be concerned that it may be too low level. I'd hope that a layer could also be provider that would bring the API up to the adaptability level of the current FS design, if not even higher.

@kripken
Copy link
Member Author

kripken commented Oct 19, 2020

Thanks for the feedback @thomasballinger @curiousdannii !

Yes, whether this is too low-level is a good question. May be worth experimenting with it more before we settle on an API.

Base automatically changed from master to main March 8, 2021 23:49
@stale
Copy link

stale bot commented Apr 16, 2022

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant.

@stale stale bot added the wontfix label Apr 16, 2022
@chkuendig
Copy link
Contributor

chkuendig commented Apr 16, 2022

Is this something which is still being worked on or something which will fall into the work done on WasmFS?

Being able to use Asyncify to implement an async FS would be super cool. My goal is to render a loading status / progress bar with CSS/JS on top of a app which was build for synchronous reading from harddisk but now is freezing the main loop/all rendering when fetching data from the network via a synchronous XHR call without any way to tell the user what’s happening.

@stale stale bot removed the wontfix label Apr 16, 2022
@kripken
Copy link
Member Author

kripken commented Apr 18, 2022

@chkuendig This should fall under WasmFS, when that is stable. Here is the most relevant work so far:

#16229

Closing as this PR is no longer needed.

@kripken kripken closed this Apr 18, 2022
@kripken kripken deleted the asyncfs branch April 18, 2022 13:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants