Strategies for large libraries #40

Open

shermp opened this issue Jan 3, 2021 · 5 comments

Comments

@shermp
Owner

shermp commented Jan 3, 2021

The user bigwoof on MobileRead has run into issues using KU with a large book library, which has brought to light that KU as released is not very memory efficient, and that even when one tries to improve memory usage, holding the entire calibre metadata set in memory can be problematic.

I've been trying to think of strategies to deal with this, and these are the ideas I've come up with so far:

  • Don't bother with calibre metadata. Just send Calibre whatever we have available in Nickel's DB. Simple to implement, probably the most efficient. Downside is not keeping the metadata.calibre file in sync with the calibre kobo driver.
  • Store the metadata from calibre in some sort of file-based kv store. And maybe sync that store with metadata.calibre?
  • Similar to above, but use an SQLite DB with proper columns to store metadata (a rough schema sketch follows this list).
  • Find a way of indexing/accessing the JSON directly from the file.
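
For the SQLite option, a rough sketch of what the schema might look like, embedded as it could appear in Go. The column choices are guesses based on the fields discussed in this thread, not a worked design:

```go
// Hypothetical schema for the SQLite option; columns are guesses based on
// the fields discussed in this thread.
const schema = `
CREATE TABLE IF NOT EXISTS calibre_book (
    uuid          TEXT PRIMARY KEY, -- calibre's book UUID
    lpath         TEXT NOT NULL,    -- book path on the device
    last_modified TEXT,             -- as reported by calibre
    title         TEXT,
    authors       TEXT,             -- JSON-encoded array, or a join table
    series        TEXT,
    series_index  REAL,
    raw           TEXT              -- full calibre JSON record, for regenerating metadata.calibre
);`
```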

I'm really open to all ideas.

Paging @NiLuJe and @pgaskin and @pazos for ideas.

@pazos

pazos commented Jan 3, 2021

> Don't bother with calibre metadata. Just send Calibre whatever we have available in Nickel's DB. Simple to implement, probably the most efficient. Downside is not keeping the metadata.calibre file in sync with the calibre kobo driver.

I would go with that one. After all, Nickel doesn't use metadata.calibre at all.

The plugin we use on KOReader discards most of the info that calibre streams for each new book. The rationale is: keep the bare minimum info to tell calibre on the next connection, plus a few fields useful for metadata lookups (title, authors, tags, series, series index). I think most of the junk that you hold in memory is base64 thumbnails and user columns.

That way it is possible to keep track of thousands of books in memory without too much trouble. The data is dumped to a JSON file on each change, but that's just because it is needed for the "search on calibre metadata" function. If we didn't need that, I guess any binary format would be faster.
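
Purely illustrative: a trimmed-down record along those lines might look something like this in Go (field names are assumptions, not KOReader's or kup's actual schema):

```go
package metadata

// BookMeta keeps only what calibre needs on the next connection plus a few
// lookup fields; thumbnails and user columns are dropped.
type BookMeta struct {
	UUID         string   `json:"uuid"`          // identifies the book to calibre
	Lpath        string   `json:"lpath"`         // path on the device
	LastModified string   `json:"last_modified"` // for change detection on reconnect
	Title        string   `json:"title"`
	Authors      []string `json:"authors"`
	Tags         []string `json:"tags,omitempty"`
	Series       string   `json:"series,omitempty"`
	SeriesIndex  float64  `json:"series_index,omitempty"`
}
```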

@shermp
Owner Author

shermp commented Jan 3, 2021

Yeah, if I do this, probably the only extra metadata I'd keep would be the Calibre UUID and maybe the last-modified date/time, as those are what's sent with the "book count" list.

@pgaskin
Contributor

pgaskin commented Jan 3, 2021

I'm not totally familiar with how the metadata code works or when the file is manipulated, but you could try using a streaming JSON parser and keeping an index into the JSON for read operations (maybe with a caching layer if you read the same thing often), then keeping an in-memory log of pending updates and writing them all at once. Alternatively, a database mirroring the Calibre metadata file and kept in sync with it (regenerating the Calibre metadata file when needed) would be another option, but I would probably avoid this unless absolutely necessary, due to the possible race conditions and bugs.
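
A rough sketch of the "pending updates" half of that, assuming metadata.calibre is a single JSON array of book objects keyed by a "uuid" field. The names here (flushUpdates, the stub struct) are illustrative, not kup's actual code:

```go
package main

import (
	"encoding/json"
	"log"
	"os"
)

// flushUpdates streams the existing metadata file to a new file, substituting
// any record that has a buffered update. Only one record is held in memory at
// a time (plus the pending map). Newly added books are omitted for brevity;
// they could be appended before the closing bracket.
func flushUpdates(inPath, outPath string, pending map[string]json.RawMessage) error {
	in, err := os.Open(inPath)
	if err != nil {
		return err
	}
	defer in.Close()
	out, err := os.Create(outPath)
	if err != nil {
		return err
	}
	defer out.Close()

	dec := json.NewDecoder(in)
	if _, err := dec.Token(); err != nil { // consume the opening '['
		return err
	}
	if _, err := out.WriteString("["); err != nil {
		return err
	}
	first := true
	for dec.More() {
		var raw json.RawMessage
		if err := dec.Decode(&raw); err != nil {
			return err
		}
		var stub struct {
			UUID string `json:"uuid"`
		}
		if err := json.Unmarshal(raw, &stub); err != nil {
			return err
		}
		if upd, ok := pending[stub.UUID]; ok {
			raw = upd // substitute the buffered update
		}
		if !first {
			if _, err := out.WriteString(","); err != nil {
				return err
			}
		}
		first = false
		if _, err := out.Write(raw); err != nil {
			return err
		}
	}
	_, err = out.WriteString("]")
	return err
}

func main() {
	pending := map[string]json.RawMessage{} // uuid -> updated record
	if err := flushUpdates("metadata.calibre", "metadata.calibre.new", pending); err != nil {
		log.Fatal(err)
	}
}
```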

@shermp
Owner Author

shermp commented Jan 3, 2021

There are actually very few times when the full metadata is used. The JSON indexing idea is definitely something I've been thinking about. Do you know of a streaming decoder that can do this? I don't think it can be done with encoding/json.

@shermp
Owner Author

shermp commented Jan 3, 2021

> There are actually very few times when the full metadata is used. The JSON indexing idea is definitely something I've been thinking about. Do you know of a streaming decoder that can do this? I don't think it can be done with encoding/json.

Doh, helps to RTFM.

Decoder.InputOffset looks to be what I need to build an index.
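
A minimal sketch of that offset-index approach using Decoder.InputOffset (available since Go 1.14), again assuming metadata.calibre is a JSON array of objects with a "uuid" field; the names are illustrative only:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"os"
)

type span struct{ start, end int64 }

// buildIndex records the byte range of each book record without keeping the
// decoded books in memory; only a tiny stub is unmarshalled per record.
func buildIndex(path string) (map[string]span, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	dec := json.NewDecoder(f)
	if _, err := dec.Token(); err != nil { // consume the opening '['
		return nil, err
	}
	idx := make(map[string]span)
	for dec.More() {
		start := dec.InputOffset() // may include the ',' separating records
		var stub struct {
			UUID string `json:"uuid"`
		}
		if err := dec.Decode(&stub); err != nil {
			return nil, err
		}
		idx[stub.UUID] = span{start, dec.InputOffset()}
	}
	return idx, nil
}

// readBook re-reads a single record on demand by seeking to its byte range.
func readBook(path string, s span, v interface{}) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()
	buf := make([]byte, s.end-s.start)
	if _, err := f.ReadAt(buf, s.start); err != nil {
		return err
	}
	// Strip the record separator left over from the preceding element.
	return json.Unmarshal(bytes.TrimLeft(buf, ", \t\r\n"), v)
}

func main() {
	idx, err := buildIndex("metadata.calibre")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("indexed %d books\n", len(idx))
}
```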
