@tlively tlively commented Apr 26, 2022

Add a backend that stores its underlying files in the Origin Private File
System (OPFS) and reads and writes the files synchronously using
FileSystemSyncAccessHandle. This initial implementation works correctly as
long as there are no errors; better error handling and more robust edge case
testing will come in a future PR.

@tlively tlively requested review from kripken and sbc100 April 26, 2022 00:06

using ProxyWorker = emscripten::ProxyWorker;

extern "C" {
Collaborator

Seems odd to put the extern "C" block inside a C++ namespace... maybe move the namespace usage below this block?

#include "wasmfs.h"
#include <stdlib.h>

namespace wasmfs {
Collaborator

Is the C++ code in this file designed to be used outside of this TU? Perhaps use an anonymous namespace instead?

Member Author

I switched this to an anonymous namespace and added a using namespace wasmfs; to the top of the file to keep name resolution working.

Comment on lines 33 to 45
let id = ids.allocated.length;
if (ids.free.length > 0) {
id = ids.free.pop();
}
assert(ids.allocated[id] === undefined);
ids.allocated[id] = handle;
return id;
Member

Suggested change
- let id = ids.allocated.length;
- if (ids.free.length > 0) {
- id = ids.free.pop();
- }
- assert(ids.allocated[id] === undefined);
- ids.allocated[id] = handle;
- return id;
+ var id;
+ if (ids.free.length > 0) {
+ id = ids.free.pop();
+ ids.allocated[id] = handle;
+ } else {
+ id = ids.allocated.length;
+ ids.allocated.push(handle);
+ }
+ return id;

Maybe slightly more idiomatic as it avoids writing to one-past-the-end as a way to extend, and instead uses push(). That might be slightly faster but I didn't measure.
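
The allocation scheme under discussion can be tried in isolation. The following is a minimal sketch, assuming a table with the `allocated` and `free` arrays seen in the diff; `makeTable`, `allocate` and `release` are hypothetical names chosen for illustration and are not the PR's actual functions.

```javascript
// Hypothetical standalone handle table. The `allocated` and `free` field
// names follow the snippet under review; everything else is illustrative.
function makeTable() {
  return { allocated: [], free: [] };
}

// Reuse a freed slot when one exists, otherwise extend the array with push().
function allocate(ids, handle) {
  let id;
  if (ids.free.length > 0) {
    id = ids.free.pop();
    ids.allocated[id] = handle;
  } else {
    id = ids.allocated.length;
    ids.allocated.push(handle);
  }
  return id;
}

// Clear the slot and remember its ID for reuse.
function release(ids, id) {
  ids.allocated[id] = undefined;
  ids.free.push(id);
}

let table = makeTable();
let a = allocate(table, "handle A"); // first slot
let b = allocate(table, "handle B"); // second slot
release(table, a);
let c = allocate(table, "handle C"); // takes over the slot freed above
```

Freed IDs are recycled before the array grows, so the table stays dense as long as allocations and releases roughly balance.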

'$wasmfsOPFSDirectories',
'$wasmfsOPFSFiles'],
$wasmfsOPFSGetOrCreateFile: async function(parent, name, create) {
let parent_handle = wasmfsOPFSDirectories.get(parent);
Member

Suggested change
- let parent_handle = wasmfsOPFSDirectories.get(parent);
+ let parentHandle = wasmfsOPFSDirectories.get(parent);

Member Author

Switched to camelCase for all parameters and locals.

},

$wasmfsOPFSAllocate: function(ids, handle) {
let id = ids.allocated.length;
Member

We've been using var. Perhaps we should switch to let at this point, but I'm not sure if we've decided that?

Member Author

I've been using let in all of my recent JS, and there is a bunch of other usage as well. When I discussed this recently with @sbc100, we decided that let was ok but for..of was not. I'm not sure if these conventions/decisions are documented anywhere, though.

if (err.name === "TypeMismatchError") {
return -2;
}
abort("Unknown exception " + err.name);
Member

Suggested change
- abort("Unknown exception " + err.name);
+ throw err;

let file_handle;
try {
file_handle = await parent_handle.getFileHandle(name, {create: create});
} catch (err) {
Member

Suggested change
- } catch (err) {
+ } catch (e) {

We appear to pretty consistently use "e" for this purpose in src/*.js.
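
Taken together, the error-handling suggestions in this thread (name the exception `e`, map known error names to return codes, rethrow anything unexpected) might look like the sketch below. It is an illustration under assumptions: only the `-2` for `"TypeMismatchError"` appears in the diff, the `-1` for `"NotFoundError"` is assumed, and `errorToCode`, `failWith` and `tryOpen` are hypothetical helpers. `failWith` stands in for a rejecting `getFileHandle` call so the snippet runs outside a browser.

```javascript
// Hypothetical helper: translate known exception names into small negative
// codes and rethrow anything unexpected instead of aborting.
function errorToCode(e) {
  if (e.name === "NotFoundError") {
    return -1; // assumed code, not taken from the diff
  }
  if (e.name === "TypeMismatchError") {
    return -2; // matches the code returned in the diff
  }
  throw e;
}

// Stand-in for a failing file-handle lookup, so the sketch runs anywhere.
function failWith(name) {
  let e = new Error("simulated failure");
  e.name = name;
  throw e;
}

// Shape of the try/catch discussed above, using `e` for the exception.
function tryOpen(simulatedErrorName) {
  try {
    failWith(simulatedErrorName);
  } catch (e) {
    return errorToCode(e);
  }
}
```

Rethrowing keeps the original exception and its stack intact, which is usually easier to debug than an abort message containing only the error name.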

Comment on lines 11 to 15
get: function(i) {
return this.allocated[i];
}
Member Author

Is there a more idiomatic and less repetitive way to add accessors like this?

Member

I can't think of a better way atm.

for await (const [name, child] of dir_handle.entries()) {
withStackSave(() => {
let name_p = allocateUTF8OnStack(name);
// TODO: Figure out how to use `cDefine` here
Member Author

Guidance on how to get cDefine to work would be very welcome.

Member

I believe you add the stuff you need to src/struct_info*.json and then it "just works".

Member Author

I got this working, although it required some work in gen_struct_info.py.

@kripken kripken left a comment

Nice!

It would also be good to add a persistence test here, like we have for (the old) IDBFS, where we run more than once and see that contents stayed around. The general format of such tests, I think, is to run them the first time (having some ifdef for that), in which we clear any old state if it exists. Then set the state. Then we run it a second time (with an ifdef for that) and we just load data there and verify it.
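
The two-run structure described above can be illustrated in miniature. This is only a sketch: an in-memory `Map` stands in for storage that survives between runs, and the `phase` argument plays the role of the ifdef. `persistentStore` and `runPersistenceTest` are hypothetical names, and a real test would run the program twice against the actual backend.

```javascript
// Stand-in for storage that outlives a single run. A Map is used only so the
// sketch runs anywhere; it is not actually persistent.
let persistentStore = new Map();

// `phase` selects which half of the test runs (hypothetical parameter).
function runPersistenceTest(store, phase) {
  if (phase === 1) {
    store.clear();                   // clear any state left by older runs
    store.set("/file.txt", "hello"); // set the state to verify later
    return true;
  }
  // Second run: write nothing, only load the data and verify it.
  return store.get("/file.txt") === "hello";
}

let firstRunOk = runPersistenceTest(persistentStore, 1);
let secondRunOk = runPersistenceTest(persistentStore, 2);
```

Clearing state at the start of the first run keeps the test repeatable even when an earlier run left data behind.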

get: function(i) {
return this.allocated[i];
}
},
Member

These three could all be generated from a single function that creates such instances. A TODO to refactor though might be enough for now. It's possible we have other such stuff that could be refactored with them too.
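
One possible shape for such a refactoring is a small factory that returns a fresh table each time it is called. This is a sketch rather than the PR's code: `makeHandleTable` and the variable names are hypothetical, while `allocated`, `free` and `get` mirror the fields shown in the diff.

```javascript
// Hypothetical factory producing independent handle tables.
function makeHandleTable() {
  return {
    allocated: [],
    free: [],
    get: function(i) {
      return this.allocated[i];
    }
  };
}

// Three tables built from the same definition (names are illustrative).
let directories = makeHandleTable();
let files = makeHandleTable();
let accesses = makeHandleTable();

directories.allocated.push("root");
```

Each call creates new arrays, so the tables do not share state even though they share one definition.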

'$wasmfsOPFSFiles'],
$wasmfsOPFSGetOrCreateFile: async function(parent, name, create) {
let parentHandle = wasmfsOPFSDirectories.get(parent);
assert(parentHandle !== undefined);
Member

Maybe the assertion could be in the get function?
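
Moving the check into the accessor might look like the following sketch. It throws a plain `Error` instead of calling Emscripten's `assert` so that it runs standalone; `checkedTable` is a hypothetical name.

```javascript
// Illustrative table whose `get` validates the ID itself, so callers do not
// each need their own check.
let checkedTable = {
  allocated: ["root"],
  get: function(i) {
    let handle = this.allocated[i];
    if (handle === undefined) {
      throw new Error("no handle allocated for ID " + i);
    }
    return handle;
  }
};
```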

_wasmfs_opfs_get_entries__deps: [],
_wasmfs_opfs_get_entries: async function(ctx, dirID, entries) {
let dirHandle = wasmfsOPFSDirectories.get(dirID);
for await (const [name, child] of dirHandle.entries()) {
Member

I think we stay away from const since it is more verbose than var (and let). Looks like a few snuck into library_webgl and library_webgpu but that's it, so probably best not to add more atm.

@@ -0,0 +1,119 @@
#include <assert.h>
Member

Do we have an existing backend test template? It seems like we could use this file, or one like it if we already have one, and in each backend just include the file, maybe using some ifdefs to control things (like disable symlinks in OPFS since it lacks them). Otherwise the core testing code for each backend seems like it would be very similar.

Member Author

I don't think we have a template yet and I agree this file would make a good template. I don't think anything should change for this PR, though. We can add those ifdefs once we're ready to test the node backend like this perhaps.

_wasmfs_opfs_init_root_directory: async function(ctx) {
if (wasmfsOPFSDirectories.allocated.length == 0) {
// Directory 0 is reserved as the root
let root = await navigator.storage.getDirectory();
Contributor

FYI, this is the only line that is specific to the Origin Private part of the File System Access API. This interface could be made more general by allowing the user to provide a directory handle (instead of calling the OPFS version of it). They could choose to give the origin private directory, or use window.showDirectoryPicker to prompt the user for a real folder.

It could be good to keep using navigator.storage.getDirectory() as a default, though.

Contributor

Oh, this is wrong, sorry. I hadn't realized that the sync handle is limited to the OPFS. That's unfortunate.

Any plans on providing a backend for the normal FS API too, so files can be backed by real folders on disk?

Member Author

I don't have concrete plans to work on that myself, but it would certainly be nice to have. My hope is that we can make it easy to create new WasmFS backends as userspace JS libraries, but it would also be reasonable to generalize this backend once it lands.

Contributor

Ah, interesting. I'll keep an eye on this space and would be happy to contribute such a library when possible.

Contributor

The ability to create and mount custom backends in WasmFS is awesome.

I've been experimenting with window.showDirectoryPicker. The OPFS backend does most of the work already, but I found several challenges when replacing the root handle with a native FileSystemDirectoryHandle.

  • The lack of sync handles requires using createWritable instead. The OPFS backend already uses createWritable when pthreads are not available, so we just need to always include the FileSystemAsyncAccessHandle class even when pthreads are available and avoid creating sync handles in the native access case. This requires the most changes (mostly minor).
  • SharedArrayBuffer cannot be used, so the data must be copied to a temporary ArrayBuffer.
  • Passing the native FileSystemDirectoryHandle into the backend is difficult. I had to use IndexedDB to store it, then retrieve it when creating the backend. Is there a better way?
  • If the handle is persisted across sessions, some way of calling requestPermission on the handle is required from the backend (when getEntries is called). This is asynchronous and cannot be done from the proxy thread.
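
The second point, copying out of shared memory, can be sketched as follows. This is an illustration under assumptions: `copyOutOfSharedMemory` is a hypothetical helper, and `heap` stands in for a view of the shared WebAssembly memory.

```javascript
// Copy `length` bytes starting at `offset` out of a SharedArrayBuffer-backed
// view into a freshly allocated, non-shared buffer.
function copyOutOfSharedMemory(sharedView, offset, length) {
  let copy = new Uint8Array(length); // backed by a plain ArrayBuffer
  copy.set(sharedView.subarray(offset, offset + length));
  return copy;
}

// Stand-in for the shared wasm heap.
let heap = new Uint8Array(new SharedArrayBuffer(16));
heap.set([1, 2, 3, 4], 4);

let chunk = copyOutOfSharedMemory(heap, 4, 4);
```

The copy costs one extra pass over the data for each write, which is the overhead the comment above refers to.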

Working around these challenges in a custom backend based on the OPFS backend code, I was able to view, create, and delete files and directories on the native OS. I think the OPFS backend could be generalized with a native option so that a whole new backend does not need to be created for native file access. The question is how to specify the FileSystemDirectoryHandle when creating the backend. Using IndexedDB adds some overhead and complexity. In my app, this works fine because I need to persist it anyway.

Some other issues I noticed include:
in wasmfs.h:

  • wasmfs_unmount takes an intptr_t instead of a const char *
  • wasmfs_get_backend_by_path takes a char * instead of a const char *

in opfs_backend.cpp:

  • ~OPFSDirectory compares the dirID to zero to avoid freeing the root ID, however isn't the root ID 1 (zero is undefined)? I think there may have been a couple of other places where ID 0 is assumed to be the root.

Member Author

Great feedback, thanks @goldwaving.

  • Adding a native option to the OPFS backend code to reduce duplication sounds like a good idea to me.
  • To pass the FileSystemDirectoryHandle to the backend, I hope it would be possible to just store it in a global variable or a fancier global registry so the backend can retrieve it. This might also be an interesting use case for clang's externref support: we could pass the handle directly through C to the backend constructor.
  • The assumption that the root has ID 0 sounds like a bug. I think that used to be true, but we must not have updated all the code when we changed that.
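
A "global registry" of the kind mentioned in the second point could be as small as the sketch below. All names here are hypothetical, and the registry is only visible to code running in the same thread as the code that filled it.

```javascript
// Hypothetical registry: page code registers a handle under a key, and the
// backend takes it out again when it is created.
let directoryHandleRegistry = new Map();

function registerDirectoryHandle(key, handle) {
  directoryHandleRegistry.set(key, handle);
}

// Return the handle for `key` and drop it from the registry, or return
// undefined when nothing was registered under that key.
function takeDirectoryHandle(key) {
  let handle = directoryHandleRegistry.get(key);
  directoryHandleRegistry.delete(key);
  return handle;
}

// A plain object stands in for a real FileSystemDirectoryHandle here.
let fakeHandle = { kind: "directory", name: "project" };
registerDirectoryHandle("native-root", fakeHandle);
```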

Contributor

A global variable would work for single threaded apps. When using pthreads, however, the FileSystemDirectoryHandle may have to cross two or more thread barriers (from main, to app's proxy, then to OPFS's proxy). That's where it gets tricky. Is there an emscripten way to move the handle across threads or is IndexedDB the only option?

Member Author

Oh I see, that does sound tricky. @sbc100, do you know if we provide a way for users to postMessage JS objects from one thread to another?

Collaborator

Not really. That's kind of below the abstraction level that our APIs provide.

Many folks have asked for this over the years, but we have normally managed to steer them away from needing it at all.

In theory it is possible to do this today by looking up the worker object for a given thread and using postMessage directly. You would also need to intercept/override the message handler for the thread, but that should also be doable today.

We could add a dedicated API for this, but I'd be tempted to just expose an official pthread->worker lookup function and then show an example of how to use postMessage based on that.

tlively added 8 commits April 29, 2022 11:12
@tlively tlively force-pushed the wasmfs-opfs-backend branch from a99d2fd to 44bd93d Compare April 29, 2022 21:35
@tlively tlively requested review from kripken and sbc100 April 29, 2022 21:40
@kripken kripken left a comment

@sbc100 what do you think of the gen_struct_info.py changes here?

cxxflags = [
'-I' + utils.path_from_root('system/lib/libcxxabi/src'),
'-D__USING_EMSCRIPTEN_EXCEPTIONS__',
'-I' + utils.path_from_root('system/lib/wasmfs/'),
Collaborator

Maybe group the -I flags together?

@sbc100 sbc100 left a comment

gen_struct_info.py changes lgtm

@tlively tlively enabled auto-merge (squash) May 2, 2022 16:32
tlively commented May 2, 2022

@sbc100 I bisected the other.test_minimal_runtime_code_size_math failure to c5c7623

sbc100 commented May 2, 2022

> @sbc100 I bisected the other.test_minimal_runtime_code_size_math failure to c5c7623

That test passes on main right now, at least for me, and according to the recent CI runs.

tlively commented May 2, 2022

@kripken, @sbc100, any idea why the browser test would have [no http server activity] for the new test? Is that what happens when a test throws an unexpected exception?

tlively commented May 2, 2022

Aha, testing locally I think the latest commit adding --enable-experimental-web-platform-features should fix it.

tlively commented May 3, 2022

Alright, now I don't know why the test is failing. What is the best way to investigate? Is it possible the version of Chrome we're downloading is too old?

sbc100 commented May 3, 2022

> Alright, now I don't know why the test is failing. What is the best way to investigate? Is it possible the version of Chrome we're downloading is too old?

Does this API require a certain new chrome version? We use chrome stable (see download chrome section of circleci config file). Does the test pass for you locally with chrome stable?

tlively commented May 3, 2022

Yes, the test passes locally with chrome stable. I saw in the download section of the config that we're downloading from our own bucket, but I don't know how that bucket gets updated. I'm not sure what is the first Chrome version that works.

sbc100 commented May 3, 2022

> Yes, the test passes locally with chrome stable. I saw in the download section of the config that we're downloading from our own bucket, but I don't know how that bucket gets updated. I'm not sure what is the first Chrome version that works.

Ah! You are right, it looks like we pinned it for some reason:

# Using stable rather than beta until we can fix
# Currently downloading form our own buckets due to:
# https://github.com/emscripten-core/emscripten/issues/14987
#wget -O ~/chrome.deb https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
wget -O ~/chrome.deb https://storage.googleapis.com/webassembly/chrome/google-chrome-stable_current_amd64.deb

We should try to switch back to upstream

tlively commented May 3, 2022

I added the new latest chrome to the storage bucket under a new URL. If this works, we can commit this new URL, then copy the new chrome over the old chrome in the bucket, then have a separate PR to restore the old URL, then delete the new chrome out of the bucket.

@tlively tlively merged commit a824211 into main May 3, 2022
@tlively tlively deleted the wasmfs-opfs-backend branch May 3, 2022 22:55
tlively commented May 3, 2022

Oh wow, it worked!
