-
Notifications
You must be signed in to change notification settings - Fork 3.5k
[WasmFS] Initial OPFS/AccessHandles backend #16813
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
||
| using ProxyWorker = emscripten::ProxyWorker; | ||
|
|
||
| extern "C" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems odd to put the extern "C" block inside a C++ namespace... maybe move the namespace usage below this block?
| #include "wasmfs.h" | ||
| #include <stdlib.h> | ||
|
|
||
| namespace wasmfs { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the C++ stuff in this file designed to be used outside of this TU? Perhaps use anonymous namespace instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I switched this to an anonymous namespace and added a using namespace wasmfs; to the top of the file to keep name resolution working.
src/library_wasmfs_opfs.js
Outdated
| let id = ids.allocated.length; | ||
| if (ids.free.length > 0) { | ||
| id = ids.free.pop(); | ||
| } | ||
| assert(ids.allocated[id] === undefined); | ||
| ids.allocated[id] = handle; | ||
| return id; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| let id = ids.allocated.length; | |
| if (ids.free.length > 0) { | |
| id = ids.free.pop(); | |
| } | |
| assert(ids.allocated[id] === undefined); | |
| ids.allocated[id] = handle; | |
| return id; | |
| var id; | |
| if (ids.free.length > 0) { | |
| id = ids.free.pop(); | |
| ids[id] = handle; | |
| } else { | |
| id = ids.allocated.length; | |
| ids.push(handle); | |
| } | |
| return id; |
Maybe slightly more idiomatic as it avoids writing to one-past-the-end as a way to extend, and instead uses push(). That might be slightly faster but I didn't measure.
src/library_wasmfs_opfs.js
Outdated
| '$wasmfsOPFSDirectories', | ||
| '$wasmfsOPFSFiles'], | ||
| $wasmfsOPFSGetOrCreateFile: async function(parent, name, create) { | ||
| let parent_handle = wasmfsOPFSDirectories.get(parent); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| let parent_handle = wasmfsOPFSDirectories.get(parent); | |
| let parentHandle = wasmfsOPFSDirectories.get(parent); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to camelCase for all parameters and locals.
src/library_wasmfs_opfs.js
Outdated
| }, | ||
|
|
||
| $wasmfsOPFSAllocate: function(ids, handle) { | ||
| let id = ids.allocated.length; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've been using var. Perhaps we should switch to let at this point, but I'm not sure if we've decided that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been using let in all of my recent JS, and there is a bunch of other usage as well. When I discussed this recently with @sbc100, we decided that let was ok but for..of was not. I'm not sure if these conventions/decisions are documented anywhere, though.
src/library_wasmfs_opfs.js
Outdated
| if (err.name === "TypeMismatchError") { | ||
| return -2; | ||
| } | ||
| abort("Unknown exception " + err.name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| abort("Unknown exception " + err.name); | |
| throw err; |
src/library_wasmfs_opfs.js
Outdated
| let file_handle; | ||
| try { | ||
| file_handle = await parent_handle.getFileHandle(name, {create: create}); | ||
| } catch (err) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| } catch (err) { | |
| } catch (e) { |
We appear to pretty consistently use "e" for this purpose in src/*.js.
| get: function(i) { | ||
| return this.allocated[i]; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a more idiomatic and less repetitive way to add accessors like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't think of a better way atm.
src/library_wasmfs_opfs.js
Outdated
| for await (const [name, child] of dir_handle.entries()) { | ||
| withStackSave(() => { | ||
| let name_p = allocateUTF8OnStack(name); | ||
| // TODO: Figure out how to use `cDefine` here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guidance on how to get cDefine to work would be very welcome.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe you add the stuff you need to src/struct_info*.json and then it "just works".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got this working, although it required some work in gen_struct_info.py.
src/library_wasmfs_opfs.js
Outdated
| }, | ||
|
|
||
| $wasmfsOPFSAllocate: function(ids, handle) { | ||
| let id = ids.allocated.length; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been using let in all of my recent JS, and there is a bunch of other usage as well. When I discussed this recently with @sbc100, we decided that let was ok but for..of was not. I'm not sure if these conventions/decisions are documented anywhere, though.
src/library_wasmfs_opfs.js
Outdated
| '$wasmfsOPFSDirectories', | ||
| '$wasmfsOPFSFiles'], | ||
| $wasmfsOPFSGetOrCreateFile: async function(parent, name, create) { | ||
| let parent_handle = wasmfsOPFSDirectories.get(parent); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to camelCase for all parameters and locals.
| #include "wasmfs.h" | ||
| #include <stdlib.h> | ||
|
|
||
| namespace wasmfs { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I switched this to an anonymous namespace and added a using namespace wasmfs; to the top of the file to keep name resolution working.
kripken
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
It would also be good to add a persistence test here, like we have for (the old) IDBFS, where we run more than once and see that contents stayed around. The general format of such tests, I think, is to run them the first time (having some ifdef for that), in which we clear any old state if it exists. Then set the state. Then we run it a second time (with an ifdef for that) and we just load data there and verify it.
| get: function(i) { | ||
| return this.allocated[i]; | ||
| } | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These three could all be generated from a single function that creates such instances. A TODO to refactor though might be enough for now. It's possible we have other such stuff that could be refactored with them too.
src/library_wasmfs_opfs.js
Outdated
| '$wasmfsOPFSFiles'], | ||
| $wasmfsOPFSGetOrCreateFile: async function(parent, name, create) { | ||
| let parentHandle = wasmfsOPFSDirectories.get(parent); | ||
| assert(parentHandle !== undefined); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the assertion could be in the get function?
src/library_wasmfs_opfs.js
Outdated
| _wasmfs_opfs_get_entries__deps: [], | ||
| _wasmfs_opfs_get_entries: async function(ctx, dirID, entries) { | ||
| let dirHandle = wasmfsOPFSDirectories.get(dirID); | ||
| for await (const [name, child] of dirHandle.entries()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we stay away from const since it is more verbose than var (and let). Looks like a few snuck into library_webgl and library_webgpu but that's it, so probably best not to add more atm.
| @@ -0,0 +1,119 @@ | |||
| #include <assert.h> | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have an existing backend test template? It seems like we could use this file, or one like it if we already have one, and in each backend just include the file, maybe using some ifdefs to control things (like disable symlinks in OPFS since it lacks them). Otherwise the core testing code for each backend seems like it would be very similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we have a template yet and I agree this file would make a good template. I don't think anything should change for this PR, though. We can add those ifdefs once we're ready to test the node backend like this perhaps.
| _wasmfs_opfs_init_root_directory: async function(ctx) { | ||
| if (wasmfsOPFSDirectories.allocated.length == 0) { | ||
| // Directory 0 is reserved as the root | ||
| let root = await navigator.storage.getDirectory(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, this is the only line that is specific to the Origin Private part of the File System Access API. This interface could be made more general by allowing the user to provide a directory handle (instead of calling the OPFS version of it). They could choose to give the origin private directory, or use window.showDirectoryPicker to prompt the user for a real folder.
It could be good to keep using navigator.storage.getDirectory() as a default, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, this is wrong, sorry. I hadn't realized that the sync handle is limited to the OPFS. That's unfortunate.
Any plans on providing a backend for the normal FS API too, so files can be backed by real folders on disk?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have concrete plans to work on that myself, but it would certainly be nice to have. My hope is that we can make it easy to create new WasmFS backends as userspace JS libraries, but it would also be reasonable to generalize this backend once it lands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, interesting. I'll keep an eye on this space and would be happy to contribute such a library when possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ability to create and mount custom backends in WasmFS is awesome.
I've been experimenting with window.showDirectoryPicker. The OPFS backend does most of the work already, but I found several challenges when replacing the root handle with a native FileSystemDirectoryHandle.
- The lack of sync handles requires using createWritable instead. The OPFS backend already uses createWritable when pthreads are not available, so we just need to always include the FileSystemAsyncAccessHandle class even when pthreads are available and avoid creating sync handles in the native access case. This requires the most changes (mostly minor).
SharedArrayBuffercannot be used, so the data must be copied to a temporaryArrayBuffer.- Passing the native
FileSystemDirectoryHandleinto the backend is difficult. I had to use IndexedDB to store it, then retrieve it when creating the backend. Is there a better way? - If the handle is persisted across sessions, some way of calling
requestPermissionon the handle is required from the backend (whengetEntriesis called). This is asynchronous and cannot be done from the proxy thread.
Working around these challenges in a custom backend based on OPSF backend code, I was able to view, create, delete files and directories on the native OS. I think the OPFS backend could be more generalized with a native option so that a whole new backend does not need to be created for native file access. The question is how to specify the FileSystemDirectoryHandle when creating the backend. Using IndexedDB adds some overhead and complexity. In my app, this works fine because I need to persist it anyway.
Some other issues I noticed include:
in wasmfs.h:
wasmfs_unmounttakes anintptr_tinstead of aconst char *wasmfs_get_backend_by_pathtakes achar *instead of aconst char *
in opfs_backend.cpp:
~OPFSDirectorycompares thedirIDto zero to avoid freeing the root ID, however isn't the root ID 1 (zero is undefined)? I think there may have been a couple of other places where ID 0 is assumed to be the root.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great feedback, thanks @goldwaving.
- Adding a
nativeoption to the OPFS backend code to reduce deduplication sounds like a good idea to me. - To pass the
FileSystemDirectoryHandleto the backend, I hope it would be possible to just store it in a global variable or a fancier global registry so the backend can retrieve it. This might also be an interesting use case for clang'sexternrefsupport: we could pass the handle directly through C to the backend constructor. - The assumption that the root has ID 0 sounds like a bug. I think that used to be true, but we must not have updated all the code when we changed that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A global variable would work for single threaded apps. When using pthreads, however, the FileSystemDirectoryHandle may have to cross two or more thread barriers (from main, to app's proxy, then to OPFS's proxy). That's where it gets tricky. Is there an emscripten way to move the handle across threads or is IndexedDB the only option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, that does sound tricky. @sbc100, do you know if we provide a way for users to postMessage JS objects from one thread to another?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really.. thats kind of below the abstraction level that our APIs provide.
Many folks have asked for this over the years but we have normally managed to stear them away from needing it at all.
In theory is possible to this this today by looking up wither worker object for a given thread and just using postMessage directly. You would also need to incercept/override the message handler for the thread, but that should also be doable today.
We could add a dedicated API for this, but I'd be tempted to just expose and official pthread->worker lookup function and then show an example of how to use postMessage based on that.
Add a backend that stores its underlying files in the Origin Private File System (OPFS) and reads and writes the files synchronously using `FileSystemSyncAccessHandle`. This initial implementation works correctly as long as there are no errors; better error handling and more robust edge case testing will come in a future PR.
a99d2fd to
44bd93d
Compare
kripken
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sbc100 what do you think of the gen_struct_info.py changes here?
| cxxflags = [ | ||
| '-I' + utils.path_from_root('system/lib/libcxxabi/src'), | ||
| '-D__USING_EMSCRIPTEN_EXCEPTIONS__', | ||
| '-I' + utils.path_from_root('system/lib/wasmfs/'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe group the -I flags together?
sbc100
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gen_struct_info.py changes lgtm
|
Aha, testing locally I think the latest commit adding |
|
Alright now I don't know why the test is failing. What is the best way to investigate? Is it possible the version Chrome we're downloading is too old? |
Does this API require a certain new chrome version? We use chrome stable (see download chrome section of circleci config file). Does the test pass for you locally with chrome stable? |
|
Yes, the test passes locally with chrome stable. I saw in the download section of the config that we're downloading from our own bucket, but I don't know how that bucket gets updated. I'm not sure what is the first Chrome version that works. |
Ah! You are right it looks like we pinned for some reason: emscripten/.circleci/config.yml Lines 32 to 36 in c3fe57a
We should try to switch back to upstream |
|
I added the new latest chrome to the storage bucket under a new URL. If this works, we can commit this new URL, then copy the new chrome over the old chrome in the bucket, then have a separate PR to restore the old URL, then delete the new chrome out of the bucket. |
|
Oh wow, it worked! |
Add a backend that stores its underlying files in the Origin Private File
System (OPFS) and reads and writes the files synchronously using
FileSystemSyncAccessHandle. This initial implementation works correctly aslong as there are no errors; better error handling and more robust edge case
testing will come in a future PR.