rhizome.db growing without shrinking through meshms flood #106

Open
gh0st42 opened this issue May 17, 2016 · 5 comments

@gh0st42 commented May 17, 2016

While testing the new version of servald we encountered a big bug regarding sqlite and meshms.

We flooded one machine from another with 32000 messages. After the first few messages we could only send one message per second, and for each message a new file entry is created in the database while the old journal payload is not removed, so each new entry is as big as the previous one plus the new message. Once a certain rhizome.db size is reached, the blobs are written to the filesystem.

A simple script to flood another host with messages can be found here: https://github.com/umr-ds/serval-tests/blob/master/meshms-flood

Even after a few hundred messages you can see the database grow by another megabyte every few seconds, even though the messages are only 53 bytes long.

Using the RESTful API or the command line makes no difference.

@lakeman (Member) commented May 17, 2016

Journal bundles use fewer bytes on the network to transfer, but we currently re-write the whole blob on the filesystem of each node, mostly so we can re-hash the payload bytes and commit the new version atomically (ish). Changing that is not simple.

We could impose a meshms ply size limit and advance the tail of the bundle. That's a feature of journals we've designed and discussed, but haven't built into the client API yet.

We've also planned for multiple rhizome bundles to be created for the same file hash, so we run a garbage collection process every 30 minutes (ish) to clean out any orphans:

rhizome_cleanup(NULL);

That cleanup might need to run on some other trigger, and it probably isn't being tested very well.
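
As a rough illustration of what "cleaning out orphans" means (a sketch only, not the actual rhizome_cleanup() implementation; the table and column names are assumptions about the rhizome.db schema): drop any stored payload that no manifest references any more.

#include <sqlite3.h>

/* Sketch only: delete payload rows whose hash is no longer referenced by
 * any manifest. The real logic lives in rhizome_cleanup(). */
static int delete_orphan_payloads(sqlite3 *db)
{
  return sqlite3_exec(db,
    "DELETE FROM FILES WHERE id NOT IN "
    "  (SELECT filehash FROM MANIFESTS WHERE filehash IS NOT NULL);",
    NULL, NULL, NULL);
}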

We've also wanted to build a new storage layer for some time, with a number of technical improvements:

  • hash files on block boundaries
  • discover and record object deltas
  • store and transfer more complex object graphs
  • move away from sqlite to a mem-mapped b-tree of some kind
  • support multiple storage devices with hotplug removal
  • shift all I/O out of the main servald process

In other words, a better git object store.

I keep wanting to start this, but we haven't had a pressing need or the
budget to do this yet.


@gh0st42 (Author) commented May 17, 2016

Thanks for your explanation so far.
The problem is that at the moment the database contains something like this:

  1. msg1
  2. msg1 + msg2
  3. msg1 + msg2 + msg3
    ...

And all versions are kept, wasting a huge amount of disk space (for 3 messages, 6 message copies are kept on disk). A few kilobytes of text can end up as tens or even hundreds of megabytes on disk. In our test a 95 KB conversation produced a 111 MB rhizome.db :D
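
A quick back-of-the-envelope check of why this blows up (just a sketch, assuming every cumulative journal version is kept in full):

#include <stdio.h>

/* If every version of the journal is retained, N messages of s payload
 * bytes each cost roughly s * N * (N + 1) / 2 bytes of blobs, i.e. the
 * storage grows quadratically with the number of messages. */
int main(void)
{
  const double s = 53.0;     /* payload bytes per message in our flood test */
  const double N = 32000.0;  /* number of messages sent */
  printf("~%.1f GB of retained payloads (before per-bundle overhead)\n",
         s * N * (N + 1.0) / 2.0 / 1e9);  /* prints ~27.1 GB */
  return 0;
}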

This keeps growing and growing, which doesn't really scale. At least for our paper it makes MeshMS hard to sell, because large local communities using it as a daily communication system will run into a serious storage problem on their nodes within a short amount of time. A few hundred messages per conversation is not much if you look at usage statistics from current messengers. Especially during an emergency, people will write even more messages in a shorter time because of panic, eyewitness reports, contacting family, etc. Taking the small flash storage of routers and the precious storage on mobile devices into account, this is a big problem! And we're not even talking about malicious individuals or hackers here...

Sqlite might cause some problems and make things slow, but here the problem is more the concept itself; storing the same data directly on disk doesn't help either. I understand what you wanted to achieve, but in my opinion the trade-off of network bytes vs. disk space isn't working here. If I have to store roughly 2 KB to transmit 7 messages of 53 bytes each (= 371 bytes, plus overhead = 483 bytes of real data), and it only gets worse over time, I might be better off transmitting the whole 371 bytes over the wire/air, at least on Bluetooth or WiFi links. Sure, modern computers have lots of disk space (our TP-Link routers don't :( ), but while testing I had 35 GB of blobs in my database just for one long conversation.

Also, I'm not sure the garbage collection will help that much; timing problems aside, getting rid of the oldest entries means getting rid of the smallest ones. Even if we only keep the 3 newest versions per conversation, once the database has reached a significant size (long-term use) those historic copies will also be quite large. But in the short term we could reduce a 100 MB database back to 1 MB or so, which would be good for the moment.

Getting back to finding a solution:
Would it be possible to have the journal as a kind of meta-file in rhizome? It would effectively be a linked list where data gets appended and individual entries can be sent. From the outside there would be just one file in the database for the conversation, but it would consist of several small encrypted portions (single messages) in the correct order. That way one could bulk-request the last 3 entries, but we wouldn't need to keep every combination. I've probably missed something here...
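
Something like the following layout is what I have in mind (purely illustrative; the names and fields are made up, not an existing rhizome format):

#include <stdint.h>

/* Hypothetical segmented journal: one logical file per conversation, made
 * up of individually addressable encrypted records in conversation order.
 * Peers could request e.g. the last 3 records, and a node could drop old
 * records without having to keep every cumulative copy of the journal. */
struct meshms_record {
  uint64_t offset;       /* position of this record within the journal */
  uint32_t length;       /* length of the sealed message that follows */
  /* uint8_t payload[length];  encrypted single message */
};

struct meshms_journal_index {
  uint64_t tail_offset;  /* oldest record still held locally */
  uint64_t head_offset;  /* append point for the next record */
  uint32_t record_count; /* number of records between tail and head */
};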

I don't like complaining to someone without offering real solutions myself, but this problem is probably not one that can be fixed with a few lines of code, and it has big implications for long-term use in larger communities and for selling it in our disaster scenarios.

@gh0st42 (Author) commented May 17, 2016

Also, wouldn't some kind of ACK from the conversation parties be enough to discard messages both parties have already received? There is no need for all nodes in the network to keep the history for eternity. If a new node comes along that has old messages, it can discard them as soon as it also sees the ACK, and intermediate nodes only need to keep a journal of what hasn't been ACKed so far. It would mean a bit more management and meta-information to distribute, but it could help in the long run to keep the network clean and in a working state...
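
Just to make the rule concrete, something along these lines (a sketch only; the offsets would come from whatever delivery/ACK information MeshMS already tracks):

#include <stdbool.h>
#include <stdint.h>

/* Sketch of the proposed pruning rule: once both conversation parties have
 * acknowledged up to some journal offset, any node could safely discard
 * content before the lower of the two acknowledged offsets. */
static bool can_discard_before(uint64_t offset,
                               uint64_t our_ack_offset,
                               uint64_t their_ack_offset)
{
  uint64_t acked = our_ack_offset < their_ack_offset ? our_ack_offset
                                                     : their_ack_offset;
  return offset <= acked;
}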

@lakeman (Member) commented May 18, 2016

I'm not disagreeing with you. Running:

$ servald rhizome clean

will hopefully tidy everything up. That is what we try to run every 30 minutes; clearly for your test case that isn't often enough. Triggering a cleanup based on some kind of used/free ratio somewhere in
https://github.com/servalproject/serval-dna/blob/development/rhizome_store.c#L197
will probably help a lot.
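
Something along these lines is what I mean (a sketch only, assuming the existing rhizome headers for rhizome_cleanup(); the two helper names are placeholders for whatever the store layer already knows about configured and used space):

#include <stdint.h>

/* Sketch: after storing a payload, check how full the store is and run the
 * same cleanup the periodic alarm runs, instead of waiting 30 minutes.
 * rhizome_used_bytes() and rhizome_space_limit() are hypothetical helpers. */
static void maybe_trigger_cleanup(void)
{
  uint64_t limit = rhizome_space_limit();
  if (limit && rhizome_used_bytes() * 10 > limit * 9)  /* more than ~90% full */
    rhizome_cleanup(NULL);                             /* same call the alarm makes */
}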

Deleting orphan payloads shortly after the manifest is replaced will probably help too. But there's another reason we delay removing old payloads: if one of our neighbours is fetching the current version and a new version arrives, we want to ensure that we can complete the delivery of the current version. If a bundle like this is changing rapidly, it's better to complete each transfer than to abort it because you have a newer version.

Solving all of these issues without creating new ones is complicated. I would much rather nuke it from orbit and start again (Plan A).

Anyway, on to Plan B. We teach the rhizome store layer to handle journal bundles differently:

  1. Just before we finish writing journal payloads:
    https://github.com/servalproject/serval-dna/blob/development/rhizome_store.c#L722
    Save the hash state "somewhere" with the payload. (We probably need to be careful about library versions, CPU endianness and struct field alignment.)

  2. When we open the journal again:
    https://github.com/servalproject/serval-dna/blob/development/rhizome_store.c#L1551
    If advance_by == 0 && copy_length > 0, try to load the previous hash state. If that works, try to hard-link the existing payload file to the new temporary filename and seek to the file offset of the previous manifest.

If we can't create a new link on this filesystem (errno==EXDEV?), there's still a way to save space, but things are a bit more complicated. For any other error, just fall back to the current code path.
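
A minimal sketch of the hard-link step in point 2 (illustrative only; the real paths and bookkeeping come from the rhizome store layer):

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Instead of copying the previous journal payload into the new temporary
 * file, link to it and position the write offset at the old end of the
 * payload, so only the newly appended bytes need to be written and hashed
 * (using the hash state saved in step 1). */
static int open_journal_for_append(const char *old_payload_path,
                                   const char *new_temp_path,
                                   off_t old_payload_size)
{
  if (link(old_payload_path, new_temp_path) == -1)
    return errno == EXDEV ? -2 : -1;  /* -2: different filesystem, -1: other error;
                                         either way, fall back to the existing copy path */
  int fd = open(new_temp_path, O_WRONLY);
  if (fd == -1)
    return -1;
  if (lseek(fd, old_payload_size, SEEK_SET) == (off_t)-1) {
    close(fd);
    return -1;
  }
  return fd;  /* caller restores the saved hash state and appends only new bytes */
}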

I think that should do it. Of course we'll need some test cases...

@lakeman (Member) commented May 18, 2016

Also note that using the meshms command line at the same time that rhizome synchronisation is occurring will cause delays and perhaps failures due to database locking.

Using curl to send messages via the RESTful API may also be adding a 1 second delay: by default curl sends an "Expect:" header and waits for a "100 Continue" response. You can avoid this delay by adding a '-H "Expect:"' argument. I've just added this to our existing test cases.
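
For example (a sketch; the port, the SIDs and the RESTful credentials depend on your local configuration):

$ curl -H "Expect:" --basic --user USER:PASSWORD \
       --form "message=hello;type=text/plain" \
       "http://localhost:4110/restful/meshms/SENDERSID/RECIPIENTSID/sendmessage"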

Rubyfi added a commit to umr-ds/serval-tests that referenced this issue May 18, 2016