
Vacuum feature #411

Merged
merged 3 commits into from
Jan 13, 2018

cryptocode
Contributor

@cryptocode cryptocode commented Jan 5, 2018

This PR introduces a --vacuum feature on rai_node / rai_wallet.

Test case, 13GB wallet on a fast SSD:

  • Vacuuming took about 3 minutes
  • Size after vacuum: 5GB

Since LMDB doesn't offer in-place vacuuming, the vacuum copies only the live pages into a new, compacted database file (via mdb_env_copy2 with MDB_CP_COMPACT), which then replaces the original.

A general question about the code base: There doesn't seem to be a good place for shared constants (such as strings) in the various subprojects. Should we introduce some sort of constants.hpp? Sprinkling copies of string constants like "data.ldb" around is brittle and hard to maintain.

@@ -492,6 +492,7 @@ class node : public std::enable_shared_from_this <rai::node>
{
alarm.service.post (action_a);
}
void copy_with_compaction ();
void send_keepalive (rai::endpoint const &);
Contributor Author

I avoided fixing existing indentation problems, since there's a patch already for that.

@@ -1527,6 +1527,17 @@ rai::node::~node ()
stop();
}

void rai::node::copy_with_compaction ()
{
auto vacuum_path = application_path / "vacuumed.ldb";
Contributor

The output file path should probably be an argument.

This might not need to be a function either, since it's pretty common practice throughout the code to access node.store directly.

Contributor Author

Not sure why that would be useful. The fact that it's a separate file is an implementation detail that the user shouldn't care about. In fact, it should probably write to a temp file with a unique name.

Contributor

@lukealonso lukealonso Jan 5, 2018

How does the caller know what the output file name is, to use it?

Contributor Author

Well, that would need to be communicated if I change this to use tmp-files. But for now, using a fixed name seems reasonable enough - it's not like anyone should put conflicting ldb files into the data folder :)

Contributor

You could move this entirely into the other block, since it only has one caller, and only one potential caller with the hardcoded name - and avoid spreading the hardcoded names any further.

Contributor Author

@cryptocode cryptocode Jan 5, 2018

Please note the comment above copy_with_compaction. The inactive_node needs to go out of scope before renaming is possible; otherwise the data.ldb file is locked (mdb_env closes the files when the node goes out of scope).

Contributor

auto data_path = application_path / "data.ldb";
auto vacuum_path = application_path / "vacuumed.ldb";

bool success = false;
{
    // inactive_node must go out of scope so LMDB releases data.ldb before the rename below
    inactive_node node (data_path);
    success = !mdb_env_copy2 (node.node->store.environment.environment, vacuum_path.string ().c_str (), MDB_CP_COMPACT);
}

if (success)
{
    std::cout << "Finalizing" << std::endl;
    // Note that these throw on failure
    boost::filesystem::remove (data_path);
    boost::filesystem::rename (vacuum_path, data_path);
    std::cout << "Vacuum completed" << std::endl;
}

Contributor

Or, if it's really a useful function that needs to be on node, it should be more generic and either return the path or take a path. Either way makes sense.

Contributor Author

Yeah, I can make it take a path, makes sense.

std::cout << "Finalizing" << std::endl;

// Note that these throw on failure
boost::filesystem::remove (data_path / "data.ldb");
Contributor

Maybe rename this to backup.ldb instead of removing, and remove backup.ldb?

Contributor Author

Could you elaborate? The last step after mdb's compaction copy is to replace the original with the vacuumed copy.

Contributor

@lukealonso lukealonso Jan 5, 2018

It would be a nice bonus to retain the previous data.ldb for at least one more run, in case something bad happens (not indicated by the return code) during the vacuum.

Contributor

@lukealonso lukealonso Jan 5, 2018

something like:

generate vacuum.ldb
delete backup.ldb
rename data.ldb backup.ldb
rename vacuum.ldb data.ldb

then you're left with backup.ldb and data.ldb at the end of the process.

Contributor Author

@cryptocode cryptocode left a comment

@lukealonso convinced me to retain the pre-vacuum db as a backup. I'll prepare a new commit.

Contributor

@lukealonso lukealonso left a comment

LGTM

Contributor

@androm3da androm3da left a comment

There's been a flurry of activity on master; please rebase the changes and make sure they pass CI.

Also, you've resolved to make a backup of the prior database. That seems like a good idea; please include that as well.

@cryptocode
Contributor Author

@androm3da The backup functionality is already committed. I'll resolve the conflicts after the flurry :)

@cryptocode
Contributor Author

@androm3da rebased

@cryptocode
Contributor Author

@androm3da Good to go? Still says requested changes after the last commit.

boost::filesystem::path data_path;
if (vm.count ("data_path"))
{
data_path = boost::filesystem::path (vm["data_path"].as <std::string> ());
Contributor

Please change this to a ternary expression.

}
catch (const boost::filesystem::filesystem_error& ex)
{
std::cout << "Vacuum failed during a file operation: " << ex.what() << std::endl;
Contributor

Please write this to std::cerr instead.

}
catch (...)
{
std::cout << "Vacuum failed" << std::endl;
Contributor

Same here.

@cryptocode
Contributor Author

@androm3da updated, thanks for the feedback.

@androm3da
Contributor

glibc detects double free/heap corruption when running node.bootstrap_no_publish. Perhaps this is just incidental and not related to your change? If so, let me know and we should investigate that issue.

from https://travis-ci.org/clemahieu/raiblocks/jobs/327608083

[ RUN      ] node.bootstrap_no_publish
*** Error in `./core_test': double free or corruption (!prev): 0x0000000002002f90 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7fee969fa7e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7fee96a0337a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7fee96a0753c]
./core_test(_ZN5boost4asio6detail18object_pool_access7destroyINS1_13epoll_reactor16descriptor_stateEEEvPT_+0x26)[0xc88909]
./core_test(_ZN5boost4asio6detail11object_poolINS1_13epoll_reactor16descriptor_stateEE12destroy_listEPS4_+0x3e)[0xc8612a]
./core_test(_ZN5boost4asio6detail11object_poolINS1_13epoll_reactor16descriptor_stateEED1Ev+0x39)[0xc81d3d]
./core_test(_ZN5boost4asio6detail13epoll_reactorD1Ev+0x64)[0xc794c6]
./core_test(_ZN5boost4asio6detail13epoll_reactorD0Ev+0x18)[0xc79534]
./core_test(_ZN5boost4asio6detail16service_registry7destroyEPNS0_17execution_context7serviceE+0x2a)[0xbaead4]
./core_test(_ZN5boost4asio6detail16service_registry16destroy_servicesEv+0x39)[0xbaea99]
./core_test(_ZN5boost4asio17execution_context7destroyEv+0x1b)[0xbaeb67]
./core_test(_ZN5boost4asio17execution_contextD2Ev+0x25)[0xbaeafd]
./core_test(_ZN5boost4asio10io_contextD1Ev+0x18)[0xc7bdfc]
./core_test(_ZN3rai6systemD1Ev+0xe4)[0xf5ab58]
./core_test(_ZN30node_bootstrap_no_publish_Test8TestBodyEv+0x906)[0xcbe96a]
./core_test(_ZN7testing8internal38HandleSehExceptionsInMethodIfSupportedINS_4TestEvEET0_PT_MS4_FS3_vEPKc+0x65)[0x113a084]
./core_test(_ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS_4TestEvEET0_PT_MS4_FS3_vEPKc+0x5a)[0x113500d]
./core_test(_ZN7testing4Test3RunEv+0xd0)[0x111ad86]
./core_test(_ZN7testing8TestInfo3RunEv+0x102)[0x111b62e]
./core_test(_ZN7testing8TestCase3RunEv+0x101)[0x111bd1d]
./core_test(_ZN7testing8internal12UnitTestImpl11RunAllTestsEv+0x2b8)[0x1122dfa]
./core_test(_ZN7testing8internal38HandleSehExceptionsInMethodIfSupportedINS0_12UnitTestImplEbEET0_PT_MS4_FS3_vEPKc+0x65)[0x113b64d]
./core_test(_ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS0_12UnitTestImplEbEET0_PT_MS4_FS3_vEPKc+0x5a)[0x1135e17]
./core_test(_ZN7testing8UnitTest3RunEv+0xb6)[0x1121898]
./core_test(_Z13RUN_ALL_TESTSv+0x11)[0x1114b84]
./core_test(main+0x31)[0x1114b1e]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7fee969a3830]
./core_test(_start+0x29)[0xba36c9]
======= Memory map: ========
00400000-015a2000 r-xp 00000000 08:01 1211007                            /workspace/build/core_test
017a2000-017ec000 r--p 011a2000 08:01 1211007                            /workspace/build/core_test
017ec000-017ed000 rw-p 011ec000 08:01 1211007                            /workspace/build/core_test
017ed000-017f8000 rw-p 00000000 00:00 0 
01a38000-022d9000 rw-p 00000000 00:00 0                                  [heap]
75ee48000000-75ee48021000 rw-p 00000000 00:00 0 
75ee48021000-75ee4c000000 ---p 00000000 00:00 0 
75ee4c000000-75ee4c021000 rw-p 00000000 00:00 0 
75ee4c021000-75ee50000000 ---p 00000000 00:00 0 
75ee50000000-75ee50021000 rw-p 00000000 00:00 0 
75ee50021000-75ee54000000 ---p 00000000 00:00 0 
75ee54000000-75ee54021000 rw-p 00000000 00:00 0 
75ee54021000-75ee58000000 ---p 00000000 00:00 0 
76ee58000000-76ee58021000 rw-p 00000000 00:00 0 
76ee58021000-76ee5c000000 ---p 00000000 00:00 0 
7cee68000000-7cee68021000 rw-p 00000000 00:00 0 
7cee68021000-7cee6c000000 ---p 00000000 00:00 0 
7cee6c000000-7cee6c021000 rw-p 00000000 00:00 0 
7cee6c021000-7cee70000000 ---p 00000000 00:00 0 
7dee70000000-7dee70021000 rw-p 00000000 00:00 0 
7dee70021000-7dee74000000 ---p 00000000 00:00 0 
7dee74000000-7dee74023000 rw-p 00000000 00:00 0 
7dee74023000-7dee78000000 ---p 00000000 00:00 0 
7dee78000000-7dee78021000 rw-p 00000000 00:00 0 
7dee78021000-7dee7c000000 ---p 00000000 00:00 0 
7dee7c000000-7dee7c021000 rw-p 00000000 00:00 0 
7dee7c021000-7dee80000000 ---p 00000000 00:00 0 
7eee80000000-7eee80021000 rw-p 00000000 00:00 0 
7eee80021000-7eee84000000 ---p 00000000 00:00 0 
7eee84000000-7eee84021000 rw-p 00000000 00:00 0 
7eee84021000-7eee88000000 ---p 00000000 00:00 0 
7eee88000000-7eee88021000 rw-p 00000000 00:00 0 
7eee88021000-7eee8c000000 ---p 00000000 00:00 0 
7eee8d5e5000-7eee8d5fb000 r-xp 00000000 08:01 920428                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7eee8d5fb000-7eee8d7fa000 ---p 00016000 08:01 920428                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7eee8d7fa000-7eee8d7fb000 rw-p 00015000 08:01 920428                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7eee8d7fb000-7eee8d7fc000 ---p 00000000 00:00 0 
7eee8d7fc000-7eee8dffc000 rw-p 00000000 00:00 0 
7eee8dffc000-7eee8dffd000 ---p 00000000 00:00 0 
7eee8dffd000-7eee8e7fd000 rw-p 00000000 00:00 0 
7eee8e7fd000-7eee8e7fe000 ---p 00000000 00:00 0 
7eee8e7fe000-7eee8effe000 rw-p 00000000 00:00 0 
7eee8effe000-7eee8efff000 ---p 00000000 00:00 0 
7eee8efff000-7eee8f7ff000 rw-p 00000000 00:00 0 
7eee8f7ff000-7eee8f800000 ---p 00000000 00:00 0 
7eee8f800000-7eee90000000 rw-p 00000000 00:00 0 
7eee90000000-7fee90000000 r--s 00000000 08:01 1443592                    /root/RaiBlocksTest/de36-4ffa-af46-54fb/data.ldb
7fee90000000-7fee90021000 rw-p 00000000 00:00 0 
7fee90021000-7fee94000000 ---p 00000000 00:00 0 
7fee9413f000-7fee94140000 ---p 00000000 00:00 0 
7fee94140000-7fee94940000 rw-p 00000000 00:00 0 
7fee94940000-7fee94941000 ---p 00000000 00:00 0 
7fee94941000-7fee95141000 rw-p 00000000 00:00 0 
7fee95141000-7fee9514c000 r-xp 00000000 08:01 920456                     /lib/x86_64-linux-gnu/libnss_files-2.23.so
7fee9514c000-7fee9534b000 ---p 0000b000 08:01 920456                     /lib/x86_64-linux-gnu/libnss_files-2.23.so
7fee9534b000-7fee9534c000 r--p 0000a000 08:01 920456                     /lib/x86_64-linux-gnu/libnss_files-2.23.so
7fee9534c000-7fee9534d000 rw-p 0000b000 08:01 920456                     /lib/x86_64-linux-gnu/libnss_files-2.23.so
7fee9534d000-7fee95353000 rw-p 00000000 00:00 0 
7fee95353000-7fee9535e000 r-xp 00000000 08:01 920460                     /lib/x86_64-linux-gnu/libnss_nis-2.23.so
7fee9535e000-7fee9555d000 ---p 0000b000 08:01 920460                     /lib/x86_64-linux-gnu/libnss_nis-2.23.so
7fee9555d000-7fee9555e000 r--p 0000a000 08:01 920460                     /lib/x86_64-linux-gnu/libnss_nis-2.23.so
7fee9555e000-7fee9555f000 rw-p 0000b000 08:01 920460                     /lib/x86_64-linux-gnu/libnss_nis-2.23.so
7fee9555f000-7fee95575000 r-xp 00000000 08:01 920450                     /lib/x86_64-linux-gnu/libnsl-2.23.so
7fee95575000-7fee95774000 ---p 00016000 08:01 920450                     /lib/x86_64-linux-gnu/libnsl-2.23.so
7fee95774000-7fee95775000 r--p 00015000 08:01 920450                     /lib/x86_64-linux-gnu/libnsl-2.23.so
7fee95775000-7fee95776000 rw-p 00016000 08:01 920450                     /lib/x86_64-linux-gnu/libnsl-2.23.so
7fee95776000-7fee95778000 rw-p 00000000 00:00 0 
7fee95778000-7fee95780000 r-xp 00000000 08:01 920452                     /lib/x86_64-linux-gnu/libnss_compat-2.23.so
7fee95780000-7fee9597f000 ---p 00008000 08:01 920452                     /lib/x86_64-linux-gnu/libnss_compat-2.23.so
7fee9597f000-7fee95980000 r--p 00007000 08:01 920452                     /lib/x86_64-linux-gnu/libnss_compat-2.23.so
7fee95980000-7fee95981000 rw-p 00008000 08:01 920452                     /lib/x86_64-linux-gnu/libnss_compat-2.23.so
7fee95981000-7fee95982000 ---p 00000000 00:00 0 
7fee95982000-7fee96182000 rw-p 00000000 00:00 0 
7fee96182000-7fee96183000 ---p 00000000 00:00 0 
7fee96183000-7fee96983000 rw-p 00000000 00:00 0 
7fee96983000-7fee96b43000 r-xp 00000000 08:01 920407                     /lib/x86_64-linux-gnu/libc-2.23.so
7fee96b43000-7fee96d43000 ---p 001c0000 08:01 920407                     /lib/x86_64-linux-gnu/libc-2.23.so
7fee96d43000-7fee96d47000 r--p 001c0000 08:01 920407                     /lib/x86_64-linux-gnu/libc-2.23.so
7fee96d47000-7fee96d49000 rw-p 001c4000 08:01 920407                     /lib/x86_64-linux-gnu/libc-2.23.so
7fee96d49000-7fee96d4d000 rw-p 00000000 00:00 0 
7fee96d4d000-7fee96e55000 r-xp 00000000 08:01 920439                     /lib/x86_64-linux-gnu/libm-2.23.so
7fee96e55000-7fee97054000 ---p 00108000 08:01 920439                     /lib/x86_64-linux-gnu/libm-2.23.so
7fee97054000-7fee97055000 r--p 00107000 08:01 920439                     /lib/x86_64-linux-gnu/libm-2.23.so
7fee97055000-7fee97056000 rw-p 00108000 08:01 920439                     /lib/x86_64-linux-gnu/libm-2.23.so
7fee97056000-7fee97059000 r-xp 00000000 08:01 920420                     /lib/x86_64-linux-gnu/libdl-2.23.so
7fee97059000-7fee97258000 ---p 00003000 08:01 920420                     /lib/x86_64-linux-gnu/libdl-2.23.so
7fee97258000-7fee97259000 r--p 00002000 08:01 920420                     /lib/x86_64-linux-gnu/libdl-2.23.so
7fee97259000-7fee9725a000 rw-p 00003000 08:01 920420                     /lib/x86_64-linux-gnu/libdl-2.23.so
7fee9725a000-7fee97272000 r-xp 00000000 08:01 920475                     /lib/x86_64-linux-gnu/libpthread-2.23.so
7fee97272000-7fee97471000 ---p 00018000 08:01 920475                     /lib/x86_64-linux-gnu/libpthread-2.23.so
7fee97471000-7fee97472000 r--p 00017000 08:01 920475                     /lib/x86_64-linux-gnu/libpthread-2.23.so
7fee97472000-7fee97473000 rw-p 00018000 08:01 920475                     /lib/x86_64-linux-gnu/libpthread-2.23.so
7fee97473000-7fee97477000 rw-p 00000000 00:00 0 
7fee97477000-7fee9749d000 r-xp 00000000 08:01 920387                     /lib/x86_64-linux-gnu/ld-2.23.so
7fee9768d000-7fee97691000 rw-p 00000000 00:00 0 
7fee97697000-7fee97698000 rw-p 00000000 00:00 0 
7fee97698000-7fee9769a000 rw-s 00000000 08:01 1443591                    /root/RaiBlocksTest/de36-4ffa-af46-54fb/data.ldb-lock
7fee9769a000-7fee9769c000 rw-p 00000000 00:00 0 
7fee9769c000-7fee9769d000 r--p 00025000 08:01 920387                     /lib/x86_64-linux-gnu/ld-2.23.so
7fee9769d000-7fee9769e000 rw-p 00026000 08:01 920387                     /lib/x86_64-linux-gnu/ld-2.23.so
7fee9769e000-7fee9769f000 rw-p 00000000 00:00 0 
7ffc8f05e000-7ffc8f07f000 rw-p 00000000 00:00 0                          [stack]
7ffc8f1dd000-7ffc8f1df000 r--p 00000000 00:00 0                          [vvar]
7ffc8f1df000-7ffc8f1e1000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
timeout: the monitored command dumped core
./ci/test.sh: line 35:  1331 Aborted                 ${TIMEOUT_CMD} ${TIMEOUT_TIME_ARG} ${TIMEOUT_SEC-600} ./core_test

@cryptocode
Contributor Author

cryptocode commented Jan 12, 2018

@androm3da CI succeeded on the first commits, the failure comes on the last commit where I changed to the ternary, as you requested. I have a hard time imagining that causing a double free ;)

Many TSan errors also appear in CI failures for commits on master. Could that be the cause of the bootstrap_no_publish failure? Also, won't the failures on master trickle into rebased PRs?

@androm3da androm3da merged commit 6c3c88f into nanocurrency:master Jan 13, 2018
@cryptocode cryptocode deleted the compaction branch January 13, 2018 17:42
3 participants