
Disk usage increase after moving data #67
Open · florish opened this issue Jan 10, 2023 · 9 comments

florish commented Jan 10, 2023

Hi, not entirely sure if this is a bug, but it felt weird, so here's a report. Feel free to dismiss!

I've been using CubDB as a simple HTTP response cache for a while. In my development database, I had roughly 10K records using 2.5GB of disk space (measured with du -sh on the CubDB data directory).

Due to a data structure change, I copied all data to be stored under a different key within the same CubDB database, and then removed the old data, so the net amount of data should not change significantly.

I noticed, however, that the database is now using 22GB (!) of disk space, while the total number of records (measured using CubDB.size/1) is as expected, which means the expected data deletion did take place. Manually running CubDB.compact/1 does not change the disk space used.
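
For reference, this is roughly how I ran those checks from iex (the data directory path here is just a placeholder):

{:ok, db} = CubDB.start_link("path/to/cache_data_dir")

# Record count is as expected after the migration
CubDB.size(db)

# Compaction completes, but du -sh on the data directory still reports ~22GB
:ok = CubDB.compact(db)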

As I did not expect this disk usage increase, I'm suspecting some kind of bug, but I'm not sure.

One thing that could be helpful: I did not count the .cub files in use before the migration, but these are the current files afterwards, with their respective sizes:

3.4G  14.cub
3.4G  15.cub
3.2G  16.cub
3.3G  17.cub
3.2G  18.cub
2.7G  19.cub
2.4G  1A.cub

Please let me know if any other information is needed, I'm happy to help!


florish commented Jan 10, 2023

Just after posting the above, I found https://hexdocs.pm/cubdb/CubDB.html#current_db_file/1 in the docs.

Running that gives me "1A.cub" as the current db file (the last file in the list above). After moving all the other files out of the data directory, I still have a working CubDB instance with the expected number of records, so disk usage seems OK now (2.4GB after vs. 2.5GB before, which is about what it should be).
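
For reference, the check looked roughly like this (the data directory path is a placeholder):

{:ok, db} = CubDB.start_link("path/to/cache_data_dir")

# Returns the path of the file currently in use; in my case it ends in "1A.cub"
CubDB.current_db_file(db)

# After moving 14.cub through 19.cub out of the data directory,
# the record count is still as expected:
CubDB.size(db)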

This does leave one question open: is it a bug in CubDB that unused previous database files aren't removed from the file system? Or is this expected behaviour and is it up to me as a user to clean up the data directory once in a while?


lucaong commented Jan 16, 2023

Hi @florish , thanks for reporting this. Yes, it looks like a bug: the old .cub files should be removed automatically. Let me look into it, I might come back with more questions.

Meanwhile, if you do notice something suspicious in your logs (such as the CubDB.CleanUp process crashing), do let me know :)


lucaong commented Jan 16, 2023

I tried to reproduce it manually and with some automated tests, but the old files are always cleaned up upon compaction.

A clean-up of old files can be blocked if some read operation started before the compaction (so it still references an old file) and has not yet completed. When the read operation completes, the clean-up should be performed as soon as the first write operation occurs; if the application is restarted before that happens, the next clean-up is triggered by the next compaction.

In your case, you performed a manual compaction, which seems to have completed (the data directory only contains .cub files and no .compact files, which would indicate a still-ongoing compaction), but it did not clean up the old files. This is strange and likely a bug, but so far I have failed to reproduce it.

One possibility, although unlikely, is that a snapshot or read operation was still referencing those old files, preventing clean up. For example, this would block clean up of old files indefinitely:

# Spawn a process that creates a snapshot of the database, and waits indefinitely for it:
spawn(fn ->
  CubDB.with_snapshot(db, fn snap ->
    # Wait indefinitely for a message
    receive do
      x -> x
    end
  end)
end)

# This compaction will succeed, but will not clean up the old file, because a snapshot still references it
CubDB.compact(db)

Do you think it's possible that something like that occurred?

As a side note, if you still have this information, it would be interesting to know the creation time of those CubDB files from 14.cub to 19.cub.
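
In case it is easier than checking from the shell, the timestamps can also be read from iex; a small sketch (data_dir is a placeholder for your data directory):

data_dir = "path/to/cache_data_dir"

data_dir
|> File.ls!()
|> Enum.filter(&String.ends_with?(&1, ".cub"))
|> Enum.sort()
|> Enum.map(fn name ->
  # mtime is the closest portable stand-in for the creation time
  {name, File.stat!(Path.join(data_dir, name), time: :local).mtime}
end)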


florish commented Jan 16, 2023

Hi @lucaong, thanks for the detailed response!

If you still have this information, it would be interesting to know the creation time of those CubDB files from 14.cub to 19.cub.

Here are the creation times:

2023-01-10 20:06 14.cub
2023-01-10 20:08 15.cub
2023-01-10 20:10 16.cub
2023-01-10 20:13 17.cub
2023-01-10 20:15 18.cub
2023-01-10 20:16 19.cub
2023-01-10 20:18 1A.cub

As you can see, all created within a time frame of 12 minutes. As I was actively working on data changes on my local machine, it could very well be that I've stopped and started the application multiple times, which could have triggered the situation you describe above. I'm not able to reproduce the exact order of migration and (manual) compacting steps, unfortunately.

One possibility, although unlikely, is that you had a snapshot or read operation (possibly obtained with CubDB.snapshot using a long or infinite timeout, or CubDB.with_snapshot with the inner function blocked) still referencing those old files, preventing clean up.

I haven't used snapshot functions much, and certainly not with non-standard timeouts, so I'm pretty sure this wasn't the case in my situation.

When the read operation completes, the clean up should be performed as soon as the first write operation occurs, but if the application is restarted before the clean up, the next clean up is triggered upon the next compaction.

After migrating the data, I did not perform any writes before posting the issue (I am 100% sure, as this is on my local machine, so no other users could have been writing to CubDB without me knowing about it). If I understand correctly, then this could be the reason for the old files still being around. I'm not sure how to reproduce this, however.

Hope this information helps. Let me know if you need anything else!


lucaong commented Jan 16, 2023

Thank you @florish ,
I will keep investigating this. As the timestamps show multiple compactions one after another, I'll concentrate my efforts on that kind of situation.

Anyway, your solution of manually removing old files is perfectly valid. You might want to make sure that calling CubDB.compact now creates a new file without leaving the old ones around (you might see both the .cub file and a .compact file while the compaction is ongoing, but after it completes only a new .cub file should remain). If that's the case, all is fine.


lucaong commented Jan 16, 2023

One idea of how it could happen: suppose you copy over the keys like in this script:

{:ok, db} = CubDB.start_link(data_dir)
# Original keys:
CubDB.put_multi(db, (1..1000) |> Enum.map(fn x -> {x, x} end))

# Move each entry to a new key
CubDB.select(db)
|> Stream.each(fn {k, v} ->
  CubDB.put(db, "new-#{k}", v)
  CubDB.delete(db, k)
end)
|> Stream.run()

In this case, since the copying is executed in a lazy stream, the select keeps a reference to the original file until the whole stream completes, so automatic compactions won't be able to clean the old files until the very end. If the app or script is stopped right after, it might not have time to clean up the old files.
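
Just as an illustration (not necessarily what happened in your case, and assuming the entries fit in memory), materializing the selection first would release that reference before the writes start:

{:ok, db} = CubDB.start_link(data_dir)

# Load all entries eagerly, so the select stream no longer references the current file
entries = CubDB.select(db) |> Enum.to_list()

for {k, v} <- entries do
  CubDB.put(db, "new-#{k}", v)
  CubDB.delete(db, k)
end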

What does not convince me is that manually running CubDB.compact afterwards should still clean up the old files.

I would suggest calling CubDB.compact and checking whether the compaction results in a single new .cub file or leaves old .cub files behind. If the result is a single .cub file (after allowing the compaction to complete, for example by checking with ls that there is no .compact file around anymore), the situation you observed might just be a byproduct of the specific steps taken while copying the keys, and should not occur during normal operation. If instead you still end up with more than one .cub file, something odd is going on.
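
For example, a rough check from iex could look like this (data_dir being your data directory):

CubDB.compact(db)

# Once no .compact file is listed anymore, the compaction has completed;
# at that point only a single .cub file should remain
File.ls!(data_dir) |> Enum.sort()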

In case you observe something out of the ordinary, you can also try subscribing to debugging events about compaction in your console and see what you get. Normally, this is what you should observe (notice the :clean_up_started message, sent when the old files are cleaned up):

{:ok, db} = CubDB.start_link(data_dir)
CubDB.subscribe(db)

CubDB.compact(db)

# Wait for compaction to end, or call flush repeatedly

flush
# =>
# :compaction_started
# :compaction_completed
# :clean_up_started
# :catch_up_completed
# :ok


florish commented Jan 17, 2023

@lucaong I've followed the steps (start_link, subscribe, compact) with my local 2.5GB database, and the .compact files are cleaned up afterwards as expected. Even when I quit the application during compaction (Ctrl+C twice from an iex session), a subsequent compaction (triggered manually) removes the unfinished .compact file and creates a new one.

So the clean-up on compaction seems to work as expected. Since in my case multiple .cub files were still in place (and no .compact files), I suspect something else is going on. But as I'm not sure what, maybe we should just close this issue for now. I'll reopen it in case of new findings.


lucaong commented Jan 17, 2023

Thanks a lot @florish , this is very useful. There is definitely something odd going on, and I'd love to find it and fix it. I'll investigate more.

What version of CubDB and OS are you using? Also, when you call flush, do you see the same messages as I posted above (:compaction_started, :compaction_completed, :clean_up_started, :catch_up_completed)?


florish commented Jan 18, 2023

Thanks a lot @florish , this is very useful. There is definitely something odd going on, and I'd love to find it and fix it. I'll investigate more.

Great!

What version of CubDB and OS are you using?

CubDB 2.0.2 on macOS 11.7, with Elixir 1.14.2.

Also, when you call flush, do you see the same messages as I posted above (:compaction_started, :compaction_completed, :clean_up_started, :catch_up_completed)?

Yes!
