Split the async fs interface, indexedDB, and sqlite-specific code #2

Open
lovasoa opened this issue Aug 12, 2021 · 8 comments

@lovasoa

lovasoa commented Aug 12, 2021

Hello !
This is very useful work you have been doing here. If I understand correctly, this repository contains both the Atomics and SharedArrayBuffer logic required to provide an async interface on top of emscripten's synchronous FS, and the code that implements this interface with an IndexedDB backend.

  • Is there any part of this code that is specific to sql.js, or could this be reused with any emscripten-compiled code that interacts with files? If there is code that is specific to sql.js, would it be feasible to split it from the sync-async bridge and the IndexedDB logic?

  • Would it be possible to split the IndexedDB backend, in order to let the user provide a different implementation? I'm thinking about an HTTP backend in particular.

@lovasoa
Author

lovasoa commented Aug 12, 2021

@kripken: would you be willing to merge this IndexedDB backend into emscripten itself?

@jlongster
Owner

Hello!

I'm hesitant to merge it into Emscripten; it's really nice to be able to release quickly outside of Emscripten, and there is enough tricky stuff here that it should probably be a separate project. However, there might be things Emscripten can do to make it easier to plug in.

Is there any part of this code that is specific to sql.js, or could this be reused with any emscripten-compiled code that interacts with files? If there is code that is specific to sql.js, would it be feasible to split it from the sync-async bridge and the IndexedDB logic?

I had thought about making this a more generic filesystem that would support writing down any file and storing it in a similar way, but I've specialized it to SQLite. It is not specific to sql.js, but it does assume the data is a SQLite database.

You really don't want to manage any state like "page size" yourself, because it would be very easy to get out of sync with what SQLite uses internally. Instead, we read the page size directly from the bytes of the file.

You can see here in sqlite-file.js that we read the page size: https://github.com/jlongster/absurd-sql/blob/master/src/sqlite-file.js#L221-L232

This happens when you take a whole db file and write it down in one go: the page size is read from the bytes, and that's how it knows how to break the file up into pages.
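A minimal sketch of reading the page size from the bytes, as described above. It relies only on the documented SQLite file format: the 100-byte header starts with the string "SQLite format 3\0", and bytes 16-17 hold the page size as a big-endian integer, where the value 1 means 65536. The helper names readPageSize and splitIntoPages are hypothetical; this is not the code in sqlite-file.js.

```typescript
// SQLite's 100-byte database header begins with this 16-byte magic string.
const SQLITE_MAGIC = "SQLite format 3\u0000";

// Reads the page size from a SQLite database header (sketch).
// Bytes 16-17 store it as a big-endian integer; the value 1 means 65536.
function readPageSize(header: Uint8Array): number {
  if (header.length < 100) {
    throw new Error("buffer is shorter than the 100-byte SQLite header");
  }
  let magic = "";
  for (let i = 0; i < 16; i++) {
    magic += String.fromCharCode(header[i]);
  }
  if (magic !== SQLITE_MAGIC) {
    throw new Error("not a SQLite database file");
  }
  const raw = (header[16] << 8) | header[17];
  const pageSize = raw === 1 ? 65536 : raw;
  // Valid page sizes are powers of two from 512 to 65536.
  if (pageSize < 512 || pageSize > 65536 || (pageSize & (pageSize - 1)) !== 0) {
    throw new Error(`invalid page size field: ${raw}`);
  }
  return pageSize;
}

// Breaks a whole database file into page-sized chunks (hypothetical helper),
// e.g. so that each page can be stored under its own key.
function splitIntoPages(file: Uint8Array): Uint8Array[] {
  const pageSize = readPageSize(file);
  const pages: Uint8Array[] = [];
  for (let offset = 0; offset < file.length; offset += pageSize) {
    pages.push(file.subarray(offset, offset + pageSize));
  }
  return pages;
}
```

Deriving the size from the file itself, rather than tracking it separately, is what keeps it from drifting out of sync with SQLite.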

It's possible that this project could be split up in some way; let's let the dust settle and see what's possible.

Would it be possible to split the IndexedDB backend, in order to let the user provide a different implementation? I'm thinking about an HTTP backend in particular.

Personally, I think it should be as easy as possible for users to get this working with sql.js. Splitting it up further means one more library to install, but I'm open to it. I guess it wouldn't be too hard for the backends to be separate packages.

Regarding HTTP, I'm really not sure how it would work, since there are no transactional semantics to work with. I just published my post, and its second half goes into why that's so important: https://jlongster.com/future-sql-web
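To illustrate what a separately packaged backend might look like, here is a hypothetical TypeScript contract with an in-memory implementation. StorageBackend, MemoryBackend, readBlock, and writeBlocks are illustrative names, not absurd-sql's API. Grouping writes into a single writeBlocks call is an assumption meant to reflect the point about transactions: a real backend would need to apply each batch atomically, which an IndexedDB transaction provides and plain HTTP does not.

```typescript
// Hypothetical contract a pluggable backend package could implement.
// A database file is addressed as numbered, page-sized blocks.
interface StorageBackend {
  readBlock(file: string, index: number): Promise<Uint8Array | undefined>;
  // Applies a whole batch of block writes; a real backend must make this atomic.
  writeBlocks(file: string, blocks: Map<number, Uint8Array>): Promise<void>;
}

// In-memory implementation, e.g. for tests. An IndexedDB backend could
// implement the same two methods with one IDB transaction per call.
class MemoryBackend implements StorageBackend {
  private files = new Map<string, Map<number, Uint8Array>>();

  async readBlock(file: string, index: number): Promise<Uint8Array | undefined> {
    return this.files.get(file)?.get(index);
  }

  async writeBlocks(file: string, blocks: Map<number, Uint8Array>): Promise<void> {
    const stored = this.files.get(file) ?? new Map<number, Uint8Array>();
    // Copy each block so later changes to the caller's buffers are not visible.
    blocks.forEach((data, index) => {
      stored.set(index, data.slice());
    });
    this.files.set(file, stored);
  }
}
```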

So anyway, thanks! After the dust settles we can talk about the best path forward!

@kripken

kripken commented Aug 12, 2021

Interesting stuff here!

In general, I am hoping that we do a large rewrite of the Emscripten FS code in the next few months (ASMFS, or something related), so any large change might not make sense to add there at the moment. A small one might, though: if it can be done as a small additional JS FS, it would eventually be ported to the new rewrite (that is the general plan).

I'd be interested to hear whether there are things such a rewrite should do to make this easier. One goal is to write as much of it as possible in native code to allow for easier sync/async interaction, which, from the description here, sounds relevant. (cc @rstz, who I've been talking with about this recently)

@jlongster
Owner

Hey @kripken!

I don't think there need to be any large changes. All I need from Emscripten is the ability to hook into the filesystem APIs, and right now it nicely provides this with the mount method.

It feels a little weird for my app to be using the filesystem that comes from sql.js. Maybe I'm using multiple WASM projects and I want them all to share the same filesystem. I know you can compile without the filesystem, but I don't think there's any way to "hook" a filesystem in at runtime? If there were, I'd compile all of them without a filesystem and then hook in mine.

But overall, I don't think any of this should live within Emscripten. There are too many tradeoffs: some people may just want to store files as blobs in IDB and don't need partial writes. The filesystem implementations should be third-party, so that's the only thing I'd keep in mind.

To be honest, I'm a little concerned to hear that it would move to native code. Will I still be able to provide my own JavaScript filesystem implementation?

Oh, there is one thing we need from Emscripten: lock and unlock methods on a file, with the fcntl syscall translated into calls to those methods. Right now it just ignores requests to lock a file (F_RDLCK, etc.), but it should forward those requests to the file. Because it doesn't, I have to write some C code to manually hook up lock/unlock events. I can open an issue/PR on Emscripten.
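A rough sketch of what the JS side of this request could look like: a per-file lock table that a forwarded fcntl F_SETLK request might be translated into. FileLocks, setLock, and the owner strings are hypothetical, not an Emscripten API, and lock types are modeled as strings rather than the platform's numeric constants. The sketch also ignores byte ranges and blocking F_SETLKW requests, both of which real fcntl locking supports.

```typescript
// fcntl() lock request types, modeled as strings (hypothetical encoding).
type LockType = "F_RDLCK" | "F_WRLCK" | "F_UNLCK";

// Hypothetical per-file lock table that forwarded lock requests could target.
class FileLocks {
  // The lock each owner (e.g. a connection or worker) currently holds.
  private held = new Map<string, "F_RDLCK" | "F_WRLCK">();

  // Handles a non-blocking F_SETLK-style request. Returns true if granted,
  // false if it conflicts with a lock held by a different owner.
  setLock(owner: string, type: LockType): boolean {
    if (type === "F_UNLCK") {
      this.held.delete(owner);
      return true;
    }
    let conflict = false;
    this.held.forEach((otherType, otherOwner) => {
      // An owner can upgrade or downgrade its own lock freely.
      if (otherOwner === owner) return;
      // Write locks are exclusive; read locks only conflict with writers.
      if (type === "F_WRLCK" || otherType === "F_WRLCK") conflict = true;
    });
    if (conflict) return false;
    this.held.set(owner, type);
    return true;
  }
}
```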

@jlongster
Copy link
Owner

Would the native code use WASI?

@jlongster
Owner

@kripken Thoughts on above?

@kripken

kripken commented Aug 18, 2021

Oops, sorry @jlongster, I missed this in my inbox somehow...

I like the idea of making the filesystem separate somehow. The closest thing I am aware of is BrowserFS, which has a central "OS" that handles filesystem syscalls and that multiple things can connect to. That's definitely worth considering in our rewrite.

To be honest, I'm a little concerned to hear that it would move to native code. Will I still be able to provide my own JavaScript filesystem implementation?

I think so. We should be able to provide multiple different backends for that code, written in JS. They would be similar to the current filesystem backends, but contain just the core logic, for example "read to/from IndexedDB" for IDB. Right now there is a lot of boilerplate in them; that would go into the native code.

Would the native code use WASI?

As much as possible, like Emscripten does. But I think it is useful to optionally support some extra POSIX capabilities, as Emscripten does today; we don't want to regress on that.

Not sure what you mean by lock/unlock - are those libc methods? Or syscalls? Or do you mean logically?

@jlongster
Owner

@kripken Sorry for the delay; I had a lot to catch up on in the last month. I'll be watching this from now on, though.

I filed an issue describing the lock/unlock needs on emscripten: emscripten-core/emscripten#15070

Sounds good about the rest of the filesystem changes, as long as there is a way to hook some JS into those APIs.
