In-memory database #1816

LowLevelMahn · 2023-07-14T06:39:27Z

Like Memgraph (or, as a different example, SQLite), it would be useful to have some sort of in-memory-only storage for cases where persistence is not needed at all.

ray6080 · 2023-07-14T08:09:00Z

Hi @LowLevelMahn, thanks for raising the issue. This is a feature we're also looking into. We will make the design public once we get started on it.

stevesimmons · 2023-08-17T21:26:31Z

I am also interested in an in-memory storage model.

My actual use case is, rather than one global graph db, to have one small graph db per "case" in S3 blob storage, then quickly load the small graph db into either an in-memory db or a temporary on-disk directory, and write it back to S3 if there were any writes.

I can do this with SQLite via the serialize/deserialize commands. It would be great with Kuzu to be able to serialise the catalog/metadata/data/index/wal files (per the physical storage described in #1474) to a single binary blob, and then deserialise it to reload.
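Until a built-in serialise/deserialise exists, one possible workaround is to pack the database directory into a single blob by hand. The following is only a sketch using the Python standard library: `pack_directory` and `unpack_directory` are hypothetical helper names, not Kuzu APIs, and it assumes the database has been closed first so that the files on disk are consistent.

```python
import io
import tarfile


def pack_directory(db_dir: str) -> bytes:
    """Pack every file under db_dir into one gzip-compressed tar blob."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        # arcname="." stores paths relative to db_dir instead of absolute paths.
        archive.add(db_dir, arcname=".")
    return buffer.getvalue()


def unpack_directory(blob: bytes, target_dir: str) -> None:
    """Recreate the packed files under target_dir.

    Only use this on blobs you produced yourself: extracting an
    untrusted tar archive is unsafe.
    """
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as archive:
        archive.extractall(target_dir)
```

The resulting bytes could then be uploaded to S3 as one object per "case", and unpacked into a temporary directory before opening the database again.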

ray6080 · 2023-08-18T00:40:29Z

Hi @stevesimmons, thanks! That is quite an interesting use case. It makes sense to serialize/deserialize the whole database into/from a single blob, though we have to think a bit more about how this can be done together with our on-disk storage changes. I'm adding that to our design for in-memory storage mode.

bigluck · 2024-03-14T19:46:15Z

I'd love this feature too; my use case is very similar to the one described by @stevesimmons.

I know that you're working on an IMPORT/EXPORT command, which should facilitate importing/dumping the database into S3.

But when importing an existing database, I would like to keep it only in memory, as it is a temporary database instance used only for reading data from the graph.

ATM I'm forced to initialize the DB using the following code:

from tempfile import TemporaryDirectory

with TemporaryDirectory() as tmp_path:
    ...

But disk is way slower than memory.

DuckDB, for example, supports in-memory databases by default: https://duckdb.org/docs/api/python/overview#using-an-in-memory-database

hpvd · 2024-03-24T10:45:02Z

Just +1 on this

dremekie · 2024-05-12T19:56:36Z

+1

prrao87 · 2024-05-12T19:58:35Z

@dremekie and @bigluck Could you please upvote the topmost post so that we can use the stats in our sorting? Thanks!

sapalli2989 · 2024-05-18T16:00:26Z

In this regard: how could derived data be handled in a Kuzu database?

For example, given some persistent data A stored in the DB, and other data B derived from A, it makes more sense to generate B ad hoc on each app startup and store it only temporarily until DB shutdown. The reason I'd like to store this derived data temporarily for the database session is to leverage Cypher queries on it.

What if we could have a separate "area" for in-memory, volatile data while still being able to access and write persistent data? This would be a case-by-case decision rather than all-or-nothing for the in-memory vs. persistent model. The derived data could look the same as normal data and, from the user's perspective (Cypher), use the same tables. The only distinction would be passing a certain flag at creation time.

semihsalihoglu-uw · 2024-05-20T09:34:44Z

Hi @sapalli2989: As I understand it, you are thinking of a partially in-memory and partially on-disk database. This may be a good idea, but I think it's a complicated one and I doubt we would deliver it, considering the complexities it would add to our use case. In any case, we should first deliver a proper in-memory version at some point.

Two comments. First, if you want to avoid I/O and get an in-memory version of Kuzu, you should be able to do so by opening your database in a /tmp directory, which is backed by the tmpfs in-memory file system on many operating systems, so you wouldn't actually do any disk I/O. By default, when you start Kuzu, we set Kuzu's buffer manager to 80% of your available RAM. If you point your database directory to /tmp, you should decrease the buffer manager size to, say, 40%-50% of your available RAM when starting Kuzu, because each page of your database will be stored in the tmpfs file system's memory, and any scanned pages will be stored again in Kuzu's buffer. We have not really tested this on our side, but it should just work. The implication of this solution is that you can have in-memory databases that are at most ~40% of your available RAM. We want to eventually implement a proper in-memory version that does not have this problem of database pages being stored in RAM twice. We can also do a few more optimizations to make this feature work better, but we're not there yet.
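As a rough illustration of the sizing advice above, the buffer manager size could be computed from total RAM and passed in when opening the database. This is a sketch only: the helper names are hypothetical, `total_ram_bytes` relies on POSIX `os.sysconf`, and the `buffer_pool_size` keyword is assumed to be available in the installed version of the `kuzu` Python package.

```python
import os


def total_ram_bytes() -> int:
    """Physical RAM in bytes (POSIX only; relies on os.sysconf)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def buffer_pool_bytes(total_ram: int, fraction: float = 0.4) -> int:
    """Return the share of RAM to hand to the buffer manager."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    return int(total_ram * fraction)


def open_tmpfs_database(db_path: str, fraction: float = 0.4):
    """Open a database under a tmpfs-backed path with a reduced buffer pool.

    Assumption: the third-party `kuzu` package is installed and its
    Database constructor accepts `buffer_pool_size` in bytes.
    """
    import kuzu  # imported lazily so the helpers above work without it

    size = buffer_pool_bytes(total_ram_bytes(), fraction)
    return kuzu.Database(db_path, buffer_pool_size=size)
```

Whether /tmp is actually tmpfs depends on the system, so it is worth checking the mount type before relying on this.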

Second, for the case of deriving temporary data B that you don't want to store permanently, I think you can do the following for now. EXPORT your database A somewhere persistent, say /persistent/foo. Then open a new database that points to /tmp/in-mem/bar, so it's backed by the tmpfs in-memory file system. Then you can IMPORT /persistent/foo and derive the new data B. When you shut down the database at /tmp/in-mem/bar, you should rm -rf /tmp/in-mem/bar, because, as I understand it, what you wrote there will not be deleted automatically (though this behavior might differ between OS distributions) and would keep taking up RAM.
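The cleanup step is easy to forget, so it may help to tie it to a context manager. The sketch below uses only the Python standard library; `scratch_database_dir` is a hypothetical helper name, and the commented usage assumes the third-party `kuzu` package and a database previously exported to /persistent/foo.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def scratch_database_dir(base_dir: str = "/tmp"):
    """Yield a fresh, empty directory under base_dir and delete it on exit."""
    path = Path(tempfile.mkdtemp(prefix="in-mem-", dir=base_dir))
    try:
        yield path
    finally:
        # Equivalent of `rm -rf`: on tmpfs this frees the RAM the files used.
        shutil.rmtree(path, ignore_errors=True)


# Hypothetical usage (not executed here):
#
# import kuzu
# with scratch_database_dir("/tmp") as db_dir:
#     db = kuzu.Database(str(db_dir))
#     conn = kuzu.Connection(db)
#     conn.execute("IMPORT DATABASE '/persistent/foo'")
#     ...  # derive B and query it with Cypher
```

The database should be closed before the `with` block exits, so that nothing is still writing to the directory when it is removed.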

andyfengHKU · 2024-08-13T19:08:31Z

In-memory mode was added in #4012.

LowLevelMahn · 2024-08-13T19:27:13Z

great - thanks

ray6080 added the "feature" label (New features or missing components of existing features) Nov 6, 2023

prrao87 mentioned this issue Apr 29, 2024

Add IF NOT EXISTS option to node and edge table creation. #2878

Closed

ray6080 mentioned this issue Aug 5, 2024

In memory mode #4012

Merged

1 task

prrao87 changed the title ~~[REQUEST] In-Memory database~~ In-memory database Aug 8, 2024

prrao87 mentioned this issue Aug 8, 2024

Upcoming releases roadmap #4029

Open

44 tasks

andyfengHKU closed this as completed Aug 13, 2024
