Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proof of concept of sqlite serialization #436

Merged
merged 4 commits into from
Oct 5, 2022
Merged

Proof of concept of sqlite serialization #436

merged 4 commits into from
Oct 5, 2022

Conversation

oltdaniel
Copy link
Contributor

@oltdaniel oltdaniel commented Jun 20, 2022

Inspired by the StoreAsset function in order to store any file within the APE itself, I wondered if this is also possible with the SQLite file. Currently it is still required to store the file on the host itself and then adding it to the APE file afterwards.

In order to avoid too much work, I ignored the possibility of implementing a custom SQLite VFS that would be a communication layer between the APE file and the SQLite library. This should be in theory possible as well, but requires a lot of tuning to avoid moving the ZIP file content around too much. Thereby, the simple workaround with sqlite3_serialize and sqlite3_deserialize was enough. This required compiling the SQLite library with the SQLITE_ENABLE_DESERIALIZE and exposing the mentioned functions via lsqlite3.c to the Lua machine.

Testing

The demo has been extended with two files. Here a explanation what they do:

  • sql-backupstore.lua stores the SQLite database from .init.lua within the APE file as an backup and returns the stored size.
  • sql-backup.sql queries the stored database by loading it and executing the same query used in sql.lua

ToDo

  • fix style issues (if requested, clang-format config has been followed where possible)
  • learn more about the zSchema parameter to make correct use of sqlite3_serialize/sqlite3_deserialize
  • any documentation required for these new exposed functions?

This is a minimal proof of concept in order to show that it is easily possible to store the sqlite database within the zip file itself not requiring creating an external file first. Changes include compiling the sqlite library with the serialization flag, adding serialize/deserialize to the lua sqlite library and demonstrating the work via the redbean demo.
@pkulchenko
Copy link
Collaborator

Love that! @jart, this should be sufficiently simple to include. I've also been looking for a way to keep in-memory and on-disk images in sync (without worrying too much about a possible loss; mostly for performance reasons) and this should enable that support as well.

@thetanil
Copy link
Contributor

Nice contribution. I love the idea. Implementation looks clean. Thank you!

Theoretical feature request:

If the app crashes, you lose all changes since the redbean started. That's fine, the archived copy is safe. What are the things to consider if one wanted to perform a save on the db while the redbean is running? App specific behavior for sure.

I am just imagining:

  1. serialize
  2. deserialize
  3. user provided callback or something like that to initialize app logic which relies on sqlite

is there anything in redbean itself which needs to know? could the programmer put themselves in a situation where they do an incremental save and redbean cannot handle working with the newly deserialized :memory: db? I'm thinking of stale refs to old db laying around maybe?

I don't know if this is feasible, just trying to think it through out loud.

@oltdaniel
Copy link
Contributor Author

Thanks for the Feedback.

If the app crashes, you lose all changes since the redbean started. That's fine, the archived copy is safe. What are the things to consider if one wanted to perform a save on the db while the redbean is running? App specific behavior for sure.

That is a good point. I haven't done this kind of testing yet as I wanted to check first, if I'm not the only one that would find this feature nice. Yes this would be App specific or Framework specific depending on the project. We would just expose the bare functions (as this is the point of Redbean as @jart mentioned in the presentation about it). I will test saving the database "during" writes by just calling both endpoints with wrk and see if anything has been corrupted.

I am just imagining:

  1. serialize
  2. deserialize
  3. user provided callback or something like that to initialize app logic which relies on sqlite

is there anything in redbean itself which needs to know? could the programmer put themselves in a situation where they do an incremental save and redbean cannot handle working with the newly deserialized :memory: db? I'm thinking of stale refs to old db laying around maybe?

I would think that this feature is more or less used for write only during runtime and only is reading on the first initialization of redbean. But this would be an app specific functionality again. Redbean would just expose the possibility to actually serialize/deserialize a database. How you handle these functions within your application should be restricted to an example only as done with the SQLite database usage across different routes.

--

I think I'll test the usage of the database outside of the .init.lua and see how it behaves and if we run into any issues regarding data corruption with these SQLite functions themselves. Any more logic would probably live in an lua module. We could write that to have a cleaner .init.lua as it is quite bloated right now. But as this is the first work ever I've done with SQLite and Lua so I would need some more time reading the documentation.

@thetanil
Copy link
Contributor

I definitely think this would be useful for not only you and I but many others. I didn't realize that you're calling the sqlite api directly from init.lua. LoadSqlite and StoreSqlite I assumed were the public api for something hidden away more. Obviously now, it's all under the app devs control so they can do whatever they like. If they do get packed away into an API I think the callback for application (re)init still makes sense then. Anyway, keep it up, this is really nice!

@oltdaniel
Copy link
Contributor Author

Update on this feature. Sadly SQLite does not implement the use of a in-memory database across different processes. Thereby, this feature is only partially useful, as it would only allow changing the stored database within the APE file upon request, but not actually restoring a database to use it as an in memory one across the processes.

Where to go from here? As mentioned in the initial comment of this PR, there is the option to implement and VFS for SQLite that would work as an layer between the APE File and SQLite itself. If this can be implemented, the full feature-set of SQLite is supported.

The following resources can be read to determine this in-memory database behavior:

  • "allows two or more database connections in the same process to have access to the same in-memory database." (cited https://www.sqlite.org/sharedcache.html section 6)
  • https://redbean.dev/#lsqlite3 The fork architecture of readbean means, that only the database content created in the .init.lua are valid across different calls. No additional writes would make it back to this initialized database.

@pkulchenko
Copy link
Collaborator

Sadly SQLite does not implement the use of a in-memory database across different processes.

@oltdaniel, I think there are still some interesting scenarios that can be supported even with the current implementation.

  1. If you have the main process that does writing to the in-memory DB, then all forked processes can read that DB using already opened connections. This seems to work in my testing and I don't see why it wouldn't be working in general. For obvious reasons, as you noted in your last sentence in the previous comment, the writes from any forked processes won't be seen by other processes.
  2. If you can rearrange the app logic in the main redbean process, then it should be possible to handle all write queries in the main process and fork separate processes for all read queries, thus still maintaining in-memory data.
  3. I have some cases where it may be useful to have in-memory DB and save/restore it (for example, using this data for rate-limiting), but it's not super critical if the data is lost.

@oltdaniel
Copy link
Contributor Author

I think there are still some interesting scenarios that can be supported even with the current implementation.

Most definitely. Exposing these functions allows for a wider use of working with the sqlite files themselves. I changed the demo of redbean to simply include an backup example of the in-memory database and query from that backup.

The original idea is I think only feasible with SQLite VFS or simply using another in-memory database that allows cross process usage.

As explained in #436 (comment) the original use case is not possible with sqlite serialization, as an in-memory database cannot be shared across multiple processes. Thereby, this use case simply creates a backup of the in-memory database created in '.init.lua' and loads it to do a query.
@pkulchenko
Copy link
Collaborator

pkulchenko commented Sep 24, 2022

@oltdaniel, one question on the patch: why is szBuf value set to 5 here: sqlite3_deserialize(db->db, "main", buffer, size, 5, 0);? According to the docs it should be the total size of the DB and should probably be larger than (or at least equal to) size (and ideally configurable).

I'm thinking about allowing szBuf to be configurable and if it's not set, then set it to size and add SQLITE_DESERIALIZE_RESIZEABLE to flags.

@pkulchenko
Copy link
Collaborator

pkulchenko commented Sep 24, 2022

@oltdaniel, also, can you send @jart the copyright assignment per the instructions here if you haven't done that already?

Copy link
Owner

@jart jart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for letting this get placed on the backburner. Also welcome as a new contributor. We need to go through one quick hurdle before we can proceed. Could you please email Justine Tunney jtunney@gmail.com and say that you intend to assign her the copyright to the changes you contribute to Cosmopolitan? Please send the email under your legal name rather than anonymously. Please use a real email account associated with your identity. See CONTRIBUTING.md for further details. We make the ask because it helps us ship the tiniest most legally permissive prim and proper binaries.

The call to the sqlite3 library for the deserilization wasn't fully correct. This should fix the size parameters.
@oltdaniel
Copy link
Contributor Author

@oltdaniel, one question on the patch: why is szBuf value set to 5 here: sqlite3_deserialize(db->db, "main", buffer, size, 5, 0);? According to the docs it should be the total size of the DB and should probably be larger than (or at least equal to) size (and ideally configurable).

Thanks for the hint @pkulchenko . Fixed it now, wasn't familiar with the SQLite API before.

Sorry for letting this get placed on the backburner. Also welcome as a new contributor. We need to go through one quick hurdle before we can proceed. Could you please email Justine Tunney jtunney@gmail.com and say that you intend to assign her the copyright to the changes you contribute to Cosmopolitan? Please send the email under your legal name rather than anonymously. Please use a real email account associated with your identity. See CONTRIBUTING.md for further details. We make the ask because it helps us ship the tiniest most legally permissive prim and proper binaries.

No problem @jart . As I've used my noreply E-Mail from my GitHub Account for the commits, that isn't possible as such. Is there an alternative? Granting the copyright here should be enough.

I hereby assign the copyright of the code written by myself within this Pull Request to @jart .

@jart
Copy link
Owner

jart commented Oct 2, 2022

@oltdaniel The commit email address isn't an issue. You aren't required to disclose your email address to the public. You just have to disclose it to me. Your GitHub profile says you're Daniel Oltmanns. Send me an email from yourself, link to this PR in the email. Then after you've sent the email, leave a comment on this PR saying "I sent it!". That's all I need.

@oltdaniel
Copy link
Contributor Author

@jart I sent it!

@jart
Copy link
Owner

jart commented Oct 5, 2022

Thanks for sending it! I'll go ahead and merge now. Thank you again for contributing this to the project!

@jart jart merged commit afc07b9 into jart:master Oct 5, 2022
pkulchenko added a commit to pkulchenko/cosmopolitan that referenced this pull request Nov 28, 2022
Using either option (FREEONCLOSE or RESIZEABLE) requires using
sqlite3_malloc. Ref. jart#436
@pkulchenko
Copy link
Collaborator

@oltdaniel, FYI: I made a couple of changes in #701 to allow the DB to grow beyond the initial buffer size.

jart pushed a commit that referenced this pull request Nov 28, 2022
Using either option (FREEONCLOSE or RESIZEABLE) requires using
sqlite3_malloc. Ref. #436
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants