
Pruning #4843

Merged
merged 1 commit into from Jan 28, 2019

Conversation

moneromooo-monero
Collaborator

[Not quite totally cleaned up yet, but close enough for a first review.]

The blockchain prunes seven eighths of prunable tx data.
This saves about two thirds of the blockchain size, while
keeping the node useful as a sync source for an eighth
of the blockchain.

No other data is currently pruned.

@ghost

ghost commented Nov 13, 2018

Is there a way to build in redundancy with error codes to be able to reconstruct the whole blockchain from multiple parties, or is there too much entropy?

@moneromooo-monero
Collaborator Author

ECC is used when the sender has the info and the transmission medium is unreliable. Here, the medium is reliable (TCP), and the sender does not have the info, so can't build the ECC in the first place.

@ghost

ghost commented Nov 14, 2018

What if the ECC is added by each node successfully storing the block, then pruning away large amounts of the data but retaining enough that if a threshold can collaborate then they can reconstruct.

e.g. if I know you'll store every even numbered transaction, and I'll store every odd numbered one, we can prune individual storage requirements to 50% + some overlap for redundancy.

@moneromooo-monero
Collaborator Author

All ready now. Please review.

@moneromooo-monero
Collaborator Author

Needs #4949.

@moneromooo-monero
Collaborator Author

And #4950

@SamsungGalaxyPlayer
Collaborator

SamsungGalaxyPlayer commented Dec 14, 2018

Do we want 1/8 to be a good measure going forward, or do we want to instead say to keep the full set of data for approximately the past 6 months? In 4 years, for example, 1/8 will be a longer time window than it is now.

@moneromooo-monero
Collaborator Author

I do not see the link between the two parts, can you expand ?

Contributor

@stoffu stoffu left a comment

Gone through the whole thing except the networking part (block_queue, cryptonote_protocol_handler, net_node)

@@ -60,7 +61,8 @@ namespace {
peer_id_str >> id_str;
epee::string_tools::xtype_to_string(peer.port, port_str);
std::string addr_str = ip_str + ":" + port_str;
tools::msg_writer() << boost::format("%-10s %-25s %-25s %s") % prefix % id_str % addr_str % elapsed;
std::string pruning_seed = epee::string_tools::to_string_hex(peer.pruning_seed);
tools::msg_writer() << boost::format("%-10s %-25s %-4s %-25s %s") % prefix % id_str % addr_str % pruning_seed % elapsed;
Contributor

Seems like the newly added formatter %-4s should come second to last in the list.
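
For reference, a hedged sketch of the reordering stoffu seems to mean, assuming the same variables as the quoted diff (prefix, id_str, addr_str, pruning_seed, elapsed):

  // %-4s (the pruning seed column) moved to second-to-last so the format slots
  // line up with the argument order
  tools::msg_writer() << boost::format("%-10s %-25s %-25s %-4s %s") % prefix % id_str % addr_str % pruning_seed % elapsed;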

ASSERT_EQ(tools::get_pruning_log_stripes(tools::make_pruning_seed(1, 7)), 7);
ASSERT_EQ(tools::get_pruning_log_stripes(tools::make_pruning_seed(7, 7)), 7);

for (uint32_t log_stripes = 1; log_stripes < tools::PRUNING_SEED_LOG_STRIPES_SHIFT; ++log_stripes)
Contributor

Shouldn't the test be log_stripes <= PRUNING_SEED_LOG_STRIPES_MASK?

uint32_t make_pruning_seed(uint32_t stripe, uint32_t log_stripes)
{
CHECK_AND_ASSERT_THROW_MES(log_stripes <= PRUNING_SEED_LOG_STRIPES_MASK, "log_stripes out of range");
CHECK_AND_ASSERT_THROW_MES(stripe > 0 && stripe <= (1ul << log_stripes), "stripe out of range");
Contributor

Shouldn't the same log_stripes ? log_stripes : CRYPTONOTE_PRUNING_LOG_STRIPES switch be used here?

Collaborator Author

No, it wouldn't make sense here, unless you see a case where it would.
Testing log_stripes is when you receive a seed and it might be 0 (ie, the peer is not striped). If you build a seed, having no stripe makes no sense.
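
For illustration, a minimal sketch of how a seed could pack those two values, using only the shift constants quoted elsewhere in this PR; whether the stripe is stored 0- or 1-based is an assumption here, not something taken from the diff:

  #include <cstdint>

  constexpr uint32_t LOG_STRIPES_SHIFT = 7; // PRUNING_SEED_LOG_STRIPES_SHIFT in the PR
  constexpr uint32_t STRIPE_SHIFT = 0;      // PRUNING_SEED_STRIPE_SHIFT in the PR

  // a seed of 0 means "this peer is not pruned"; otherwise the seed encodes both
  // which stripe the peer keeps and how many stripes exist (as a power of two)
  constexpr uint32_t pack_seed(uint32_t stripe, uint32_t log_stripes)
  {
    return (log_stripes << LOG_STRIPES_SHIFT) | ((stripe - 1) << STRIPE_SHIFT); // 0-based storage assumed
  }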

if (h + CRYPTONOTE_PRUNING_TIP_BLOCKS > blockchain_height)
return blockchain_height - CRYPTONOTE_PRUNING_TIP_BLOCKS;
if (h < block_height)
return block_height;
Contributor

I don't see how the execution can reach here. Shouldn't this be considered as a logic error?

Collaborator Author

Yes, after reading the code, it is an unnecessary test. I'll make it into an assert.

{
uint32_t pruning_seed = tools::get_pruning_seed(h, H, CRYPTONOTE_PRUNING_LOG_STRIPES);
ASSERT_EQ(pruning_seed, 0);
for (pruning_seed = 0; pruning_seed <= 1 << CRYPTONOTE_PRUNING_LOG_STRIPES; ++pruning_seed)
Contributor

Maybe wrap 1 << CRYPTONOTE_PRUNING_LOG_STRIPES with parentheses just like others (though not needed grammar-wise)?


if (n == 1)
{
paths[1] = boost::filesystem::path(data_dir) / (db->get_db_name() + "-pruned");
Contributor

It might be helpful to allow the user to customize the temporary DB file path in case the free disk space is very limited.

@@ -9506,7 +9534,7 @@ void wallet2::set_tx_key(const crypto::hash &txid, const crypto::secret_key &tx_
COMMAND_RPC_GET_TRANSACTIONS::request req = AUTO_VAL_INIT(req);
req.txs_hashes.push_back(epee::string_tools::pod_to_hex(txid));
req.decode_as_json = false;
req.prune = false;
req.prune = true;
Contributor

Shouldn't this also be accompanied by get_pruned_tx like the other instances?

THROW_WALLET_EXCEPTION_IF(!cryptonote::parse_and_validate_tx_from_blob(bd, tx, tx_hash, tx_prefix_hash), error::wallet_internal_error, "failed to parse tx from blob");
THROW_WALLET_EXCEPTION_IF(tx_hash != txid, error::wallet_internal_error, "txid mismatch");
crypto::hash tx_hash;
THROW_WALLET_EXCEPTION_IF(!get_pruned_tx(res.txs[0], tx, tx_hash), error::wallet_internal_error, "failed to get tx from daemon");
Contributor

Why is the line req.prune = false; kept unchanged?

Collaborator Author

I think I left it as is for code I did not know.

Contributor

The function check_spend_proof verifies each ring signature included in the input string. To do so, it uses the tx prefix data (vin) and queries the daemon for the corresponding output keys. I'm pretty sure pruned tx data will do here.

For that matter, I doubt there's any code in the wallet that requires unpruned data from the daemon.

if (res.txs.size() == 1)
ok = string_tools::parse_hexstr_to_binbuff(res.txs.front().as_hex, tx_data);
{
ok = get_pruned_tx(res.txs.front(), tx, tx_hash);
Contributor

Why is the line req.prune = false; kept unchanged?

@@ -293,6 +297,8 @@ namespace cryptonote
ar.end_object();
}
}
if (!typename Archive<W>::is_saving())
pruned = true;
Contributor

Is this also applicable for v1 tx?

Collaborator Author

Yes.

Contributor

@stoffu stoffu left a comment

Gone through the p2p & protocol part, where I'm a lot less confident

void node_server<t_payload_net_handler>::add_used_stripe_peer(const typename t_payload_net_handler::connection_context &context)
{
const uint32_t stripe = tools::get_pruning_stripe(context.m_pruning_seed);
if (stripe == 0 || stripe >= (1ul << CRYPTONOTE_PRUNING_LOG_STRIPES))
Contributor

Shouldn't the condition exclude the equality, i.e. stripe > (1ul << CRYPTONOTE_PRUNING_LOG_STRIPES)?

void node_server<t_payload_net_handler>::remove_used_stripe_peer(const typename t_payload_net_handler::connection_context &context)
{
const uint32_t stripe = tools::get_pruning_stripe(context.m_pruning_seed);
if (stripe == 0 || stripe >= (1ul << CRYPTONOTE_PRUNING_LOG_STRIPES))
Contributor

Likewise

{
const uint64_t want_height_from_blockchain = m_core.get_current_blockchain_height();
const uint64_t want_height_from_block_queue = m_block_queue.get_next_needed_height(want_height_from_blockchain);
const uint64_t want_height = std::max(want_height_from_blockchain, want_height_from_block_queue);
Contributor

My understanding of how block queuing works is weak and I'm probably missing something, but isn't want_height_from_block_queue always going to be greater than want_height_from_blockchain? What's the situation where this isn't the case?

Collaborator Author

Typically, yes, but if you find another chain, the block queue might have blocks from earlier heights.

return true;
});
const bool use_next = (n_next > m_max_out_peers / 2 && n_subsequent <= 1) || (n_next > 2 && n_subsequent == 0) ||
(n_next > 0 && tools::get_pruning_stripe(want_height_from_block_queue, blockchain_height, CRYPTONOTE_PRUNING_LOG_STRIPES) == subsequent_pruning_stripe);
Contributor

The case where the latter condition

tools::get_pruning_stripe(want_height_from_block_queue, blockchain_height, CRYPTONOTE_PRUNING_LOG_STRIPES) == subsequent_pruning_stripe

holds seems to imply that want_height is set to want_height_from_blockchain, which then implies want_height_from_blockchain > want_height_from_block_queue (i.e., if want_height was set to want_height_from_block_queue, the left hand side of the above expression would equal next_pruning_stripe, which always differs from subsequent_pruning_stripe by definition). What does this situation mean? Maybe leaving a comment in the code could be helpful.

Collaborator Author

You seem to be right, this part of the condition should never happen. I'll remove it.

Also, this code is ad hoc to try and find a middle ground between "all on the current stripe to get needed blocks asap" and "start early on future blocks to avoid a pause later while we reconnect", it's a lot of "change it and see how sync speed changes".

{
if (expected < i->start_block_height)
s += std::string(std::max((uint64_t)1, (i->start_block_height - expected) / (i->nblocks ? i->nblocks : 1)), '_');
s += i->blocks.empty() ? "." : i->start_block_height == blockchain_height ? "m" : "o";
Contributor

The string often ends up like [mooooooooooooo..oo.]... so much mooo 😄

Collaborator Author

You can also get small ducks like _< from time to time, rarely though. Not sure you can get the full _o< but I think you can :)

// we might be able to ask for that block directly, as we now can request out of order,
// otherwise we continue out of order, unless this block is the one we need, in which
// case we request block hashes, though it might be safer to disconnect ?
if (start_height > previous_height)
Contributor

Shouldn't this section be enabled only when the pruning mode is enabled (i.e. m_core.get_blockchain_pruning_seed() > 0)?

Collaborator Author

No, the local pruning should be irrelevant here. Can you expand on why you think it matters here ?

Contributor

I had this question because of this comment

// as we now can request out of order,

I interpreted it as meaning that the new pruning scheme now allows us to request blocks out of order, or conversely, that out-of-order requests are not allowed when pruning is disabled. My understanding of how block queueing and downloading take place is generally quite limited, so my question is probably irrelevant.

Collaborator Author

We can request (and receive) blocks out of order, regardless of whether we're pruning.

if (current_blockchain_height < target_blockchain_height)
{
uint64_t completion_percent = (current_blockchain_height * 100 / target_blockchain_height);
if (completion_percent == 100) // never show 100% if not actually up to date
Contributor

Can this actually occur when current_blockchain_height < target_blockchain_height?

Collaborator Author

Probably not...

@@ -1220,27 +1370,22 @@ skip:
std::vector<boost::uuids::uuid> kick_connections;
Contributor

kick_connections is now unused.

return false;
}
//------------------------------------------------------------------------------------------------------------------------
template<class t_core>
void t_cryptonote_protocol_handler<t_core>::skip_unneeded_hashes(cryptonote_connection_context& context, bool check_block_queue)
Contributor

This function can be made const.

const uint32_t seed = tools::get_pruning_stripe(span.first, context.m_remote_blockchain_height, CRYPTONOTE_PRUNING_LOG_STRIPES);
if (seed != tools::get_pruning_stripe(context.m_pruning_seed))
{
MDEBUG(context << " starting early on next seed (" << span.first << " with seed " << epee::string_tools::to_string_hex(seed) <<
Contributor

seed seems to actually represent a stripe. I think it needs to be fed to tools::make_pruning_seed in order to be printed like this.

Another question: shouldn't this section be activated only when context.m_pruning_seed > 0?

const uint64_t cycle_start = cycles + ((stripe > block_pruning_stripe) ? 0 : 1);
const uint64_t h = cycle_start * (CRYPTONOTE_PRUNING_STRIPE_SIZE << log_stripes) + (stripe - 1) * CRYPTONOTE_PRUNING_STRIPE_SIZE;
if (h + CRYPTONOTE_PRUNING_TIP_BLOCKS > blockchain_height)
return blockchain_height - CRYPTONOTE_PRUNING_TIP_BLOCKS;
Contributor

Maybe put blockchain_height < CRYPTONOTE_PRUNING_TIP_BLOCKS ? 0 : blockchain_height - CRYPTONOTE_PRUNING_TIP_BLOCKS just in case?

Collaborator Author

That could only trigger if h + CRYPTONOTE_PRUNING_TIP_BLOCKS overflows and blockchain_height is < 5500, so h is very large. Instead, I've added asserts at start that the parameters are in valid range.

Contributor

@stoffu stoffu Dec 25, 2018

That could only trigger if h + CRYPTONOTE_PRUNING_TIP_BLOCKS overflows

I wasn't thinking about h being too large, it can trigger simply when blockchain_height < CRYPTONOTE_PRUNING_TIP_BLOCKS even with small h, no? E.g. if h:=10, CRYPTONOTE_PRUNING_TIP_BLOCKS:=5500, blockchain_height:=5000 then 10 + 5500 > 5000 and 5000 - 5500 wraps around.

Collaborator Author

I don't think that can happen, but I've added that change anyway so we're sure.

std::cout << "Instead, parts of the file will be marked as free, so the file will not grow" << std::endl;
std::cout << "until that newly free space is used up. If you want a smaller file size now," << std::endl;
std::cout << "exit monerod and run monero-blockchain-prune (you will temporarily need more" << std::endl;
std::cout << "disk space for the database conversion though. If you are OK with the database" << std::endl;
Contributor

A closing parenthesis is missing


@moneromooo-monero
Collaborator Author

txs_prunable_{tip, hash} are now dupsort|dupfixed

@moneromooo-monero
Collaborator Author

One thing I'm still unsure about is the stripe size, that is, the number of contiguous blocks a peer will keep. It's currently set to 4096. Tests by iDunk show that sync performance is significantly better with 16384 [1], as there are fewer reconnections as sync progresses. However, the larger that value, the larger the size differences between stripes [2]. Ideally we'd want all stripes to have very similar sizes, or people will start picking stripes non-randomly, which means data duplication will be uneven, making the scarcest data even more scarce. And we don't want that.

[1] This test was done by spoofing stripes so that every single node was treated as if it was striped, which is the worst case sync wise.

[2] This table shows the size difference in MB between the smallest stripe and the largest stripe for stripe sizes between 8 and 65536. For comparison, a striped blockchain will be about 25 GB.

stripe size   difference (MB)
8             69
16            106.98
32            135.02
64            185.98
128           127.24
256           397.05
512           458.24
1024          512.87
2048          719.23
4096          1031.18
8192          1250.13
16384         1783.92
32768         3066.82
65536         7980.35
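
To make the stripe layout concrete, a hedged sketch of the height-to-stripe mapping implied by the numbers discussed here (stripe size 4096, 2^3 = 8 stripes); the function and variable names are illustrative, not the PR's:

  #include <cassert>
  #include <cstdint>

  constexpr uint64_t STRIPE_SIZE = 4096; // CRYPTONOTE_PRUNING_STRIPE_SIZE in the PR
  constexpr uint32_t LOG_STRIPES = 3;    // CRYPTONOTE_PRUNING_LOG_STRIPES in the PR

  // stripes are contiguous runs of STRIPE_SIZE blocks, cycling 1..8 as the height grows
  constexpr uint32_t stripe_of(uint64_t height)
  {
    return static_cast<uint32_t>((height / STRIPE_SIZE) % (1u << LOG_STRIPES)) + 1;
  }

  int main()
  {
    assert(stripe_of(0) == 1);     // the first 4096 blocks belong to stripe 1
    assert(stripe_of(4096) == 2);  // the next 4096 belong to stripe 2
    assert(stripe_of(32768) == 1); // after 8 * 4096 blocks the cycle restarts
  }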

@moneromooo-monero
Collaborator Author

I added a small fix separately, it'll get squashed once it has had a chance to get a review.

@@ -668,6 +674,8 @@ block Blockchain::pop_block_from_blockchain()
}
}
}
if (pruned)
MWARNING(pruned << " pruned txes could not be added back to the txpool");
Contributor

I assume this is intended for cases of deep reorg (> 5500 blocks). Do the unpruned txes returned to the pool get re-broadcast to other nodes who may have pruned those txes?

Collaborator Author

It is for those deep reorgs, yes. I think those unpruned txes that are re-added will get re-broadcast on a timer if they're still in the pool when the timer triggers. However, if we're popping, it's very likely because we've just seen a new longer chain, and that chain will most likely include pretty much all those txes anyway, so they'll get taken out of the pool.

Contributor

Thanks for clarifying.

@moneromooo-monero
Collaborator Author

moneromooo-monero commented Jan 11, 2019

GitHub is not showing the last rev in this "force pushed" link. Check the commit link itself for the small diff. It should be 04e83ff, not 6620ee3.

@moneromooo-monero
Collaborator Author

Rebased to current master.

@iamsmooth
Contributor

@NanoAkron @moneromooo-monero For another day: https://en.wikipedia.org/wiki/Erasure_code

@BigslimVdub
Contributor

I see you have been hard at work with this. Downstream merges, if they ever happen, will be quite useful. Thank you for your effort on this (to all that helped with it).

Contributor

@vtnerd vtnerd left a comment

Had some of these lined up, will do a better review early tomorrow.

@@ -233,6 +233,15 @@ POP_WARNINGS
return boost::lexical_cast<std::string>(val);
}
//----------------------------------------------------------------------------
template<typename T> inline std::string to_string_hex(const T &val)
Contributor

This is likely only going to work with integers. Floats use a different manipulator, and custom operator<< overloads usually don't look up std::hex state. I would suggest taking a uint64_t instead of a template and putting the implementation in the cpp. The few running a 32-bit process won't be slowed down terribly. If perf were a concern, an unsigned and uint64_t overload could forward to a templated version in the cpp.

The hex.h file might be a better place - including that file brings in fewer headers (quicker).
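
A minimal sketch of the non-templated overload vtnerd describes (uint64_t parameter, implementation in a cpp file); this is illustrative, not the code that ended up in the PR:

  #include <cstdint>
  #include <sstream>
  #include <string>

  // integer types honour the std::hex manipulator, so one wide overload covers
  // every unsigned integer type without per-type template instantiations
  std::string to_string_hex(uint64_t val)
  {
    std::ostringstream ss;
    ss << std::hex << val;
    return ss.str();
  }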

Collaborator Author

It is supposed to work only with integers.

Contributor

Exactly, two overloads - std::uintmax_t and std::intmax_t would work for every integer type. Only std::uintmax_t is needed right now. Add int and unsigned if there were some platform performance considerations - the cpp could contain the templated function which is invoked by the non-templated overloads.

Using templates here forces every cpp unit that uses the function to generate ASM, then have the linker filter the extras out. It also means that there will be nearly identical ASM copies, one for each type (short, int, long, and long long + unsigned variants) that was used. Cleaning up the templates in this file is already one of the many todos I have.

Collaborator Author

I've made it for just uint32_t since that's what it's used for.

static constexpr uint32_t PRUNING_SEED_STRIPE_SHIFT = 0;
static constexpr uint32_t PRUNING_SEED_STRIPE_MASK = 0xffff;

static inline uint32_t get_pruning_log_stripes(uint32_t pruning_seed) { return (pruning_seed >> PRUNING_SEED_LOG_STRIPES_SHIFT) & PRUNING_SEED_LOG_STRIPES_MASK; }
Contributor

This function can be marked constexpr. The static keyword should be removed from both of them. In this context static forces hidden visibility which prevents the linker from merging the copies across compilation units.

Collaborator Author

Can you explain why ?

Contributor

I'm not sure which point -

constexpr isn't all that important here. There aren't any strong arguments for or against in this situation. If a function is trivially constexpr, I add the keyword because it's usable in more contexts and kind of indicates to other programmers that the function is "pure" (but that's not really a guarantee, sadly).

static in namespace scope tells the compiler to hide the function from other compilation units (internal linkage). This is nice if the function is in a cpp because it prevents a naming collision, but when used in a header it potentially results in multiple copies of the function within the executable. It's best to remove static from the namespace variables in the header too.

Collaborator Author

i meant "why constexpr", and "why not static". I use "static inline" as "this should be inlined, and don't emit a public version, this won't be called from another TU". AFAIK constexpr means "this can be collapsed to a constant at compile time", which seems to be not quite the whole truth, since calls to this are made with variables, so part of the question is "what does constexpr do in this case". This is from my C background though, so my question is really "why is your way better because I'm not changing it if I don't understand why it is".

Contributor

AFAIK constexpr means "this can be collapsed to a constant at compile time", which seems to be not quite the whole truth, since calls to this are made with variables, so part of the question is "what does constexpr do in this case".

A constexpr function can be evaluated at compile or runtime. If one or more of the arguments are not a compile-time literal, then the function is evaluated at runtime. It allows for a single implementation to work in either situation. It seemed like it might be useful generating some constants with this function (at a glance), maybe not. Generally if the function is this simple I just mark it as constexpr since it can be used in several contexts.

I use "static inline" as "this should be inlined, and don't emit a public version, this won't be called from another TU.

If the compiler inlines the function, then it doesn't matter because it pretty much did what you wanted (the function ASM is jammed into another). If the compiler chooses to not inline the function (which it is allowed to do, even with this keyword), then multiple copies of the function can exist in the executable. Removing the static keyword makes the function copies from each TU visible to the linker, which is required to merge them into a single version (one-definition rule). There are advantages to hidden visibility, I'm just not seeing them in this situation. Luckily, these are so simple that there is a high probability that they get inlined anyway.
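
A hedged sketch of the header style being suggested: a constexpr function (implicitly inline, so the linker can fold the per-TU copies) with no static. The mask value below is a placeholder for illustration, not taken from the diff:

  #include <cstdint>

  constexpr uint32_t PRUNING_SEED_LOG_STRIPES_SHIFT = 7;
  constexpr uint32_t PRUNING_SEED_LOG_STRIPES_MASK = 0x7; // placeholder value

  constexpr uint32_t get_pruning_log_stripes(uint32_t pruning_seed)
  {
    return (pruning_seed >> PRUNING_SEED_LOG_STRIPES_SHIFT) & PRUNING_SEED_LOG_STRIPES_MASK;
  }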

@@ -148,6 +149,17 @@

#define BULLETPROOF_MAX_OUTPUTS 16

#if 0
Contributor

Remove?

Contributor

Old, now irrelevant comment

m_p2p->for_each_connection([&](const connection_context &ctx, nodetool::peerid_type peer_id, uint32_t support_flags) {
const uint32_t stripe = tools::get_pruning_stripe(ctx.m_pruning_seed);
char state_char = cryptonote::get_protocol_state_char(ctx.m_state);
ss << std::to_string(stripe) + state_char;
Contributor

to_string is a waste here; ss << stripe << state_char.

mdb_env_close(env);
}

static int compare_uint64(const MDB_val *a, const MDB_val *b)
Contributor

When making that future change, I recommend changing these to memcpy for portability. These current compare functions violate the aliasing rule, and will fail in some cases if the pointer is not aligned. The compiler will automatically convert a memcpy into what is "desired" here on platforms that support unaligned memory access (i.e. on amd64 the code generated is identical in my disassembling tests).
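
A hedged sketch of the memcpy-based comparator vtnerd recommends, assuming the existing version reads mv_data through a uint64_t pointer cast:

  #include <cstdint>
  #include <cstring>
  #include <lmdb.h>

  static int compare_uint64(const MDB_val *a, const MDB_val *b)
  {
    uint64_t va, vb;
    std::memcpy(&va, a->mv_data, sizeof(va)); // avoids the aliasing/alignment issue,
    std::memcpy(&vb, b->mv_data, sizeof(vb)); // compiles to the same code on amd64
    return (va < vb) ? -1 : (va > vb);
  }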

mdb_env_stat(env, &mst);

uint64_t new_mapsize = mei.me_mapsize + bytes;
new_mapsize += (new_mapsize % mst.ms_psize);
Contributor

@vtnerd vtnerd Jan 16, 2019

Doesn't LMDB do this page size logic internally? Not a big deal, but the LMDB call seems unnecessary.

The other funkiness is that the lmdb call uses size_t and some overflow funkiness could occur across platforms.

Collaborator Author

I assume you mean memcmp. If we do that, then the sorting will be reversed on little/big endian archs, no ?

About the page size logic, do you mean the quantizing ? It probably does it internally, yes. I suppose this can be removed, it just felt better I guess.

Your last sentence got clipped.

Contributor

@vtnerd vtnerd Jan 17, 2019

I assume you mean memcmp. If we do that, then the sorting will be reversed on little/big endian archs, no ?

I think this is meant for the comment above. I meant memcpy instead of the cast to an integer which violates the aliasing rule.

About the page size logic, do you mean the quantizing ? It probably does it internally, yes. I suppose this can be removed, it just felt better I guess.

Yes, I meant that alignment logic.

Your last sentence got clipped.

I was commenting on how this was doing math on size_t, assigning to uint64_t, and then implicitly converting back to size_t in the call to resize. Overflows are unlikely on 64-bit platforms, but some platforms still use 32-bit size_t (thus overflow before assignment to uint64_t).

Collaborator Author

Two of your comments show with a single reply box so yes, this is replying to the two comments.

About the types, I now changed it to be exactly like the existing code, which goes through double, and is shown to work on 32 bits.

Collaborator Author

About the memcmp, I'll change that once those functions get merged in src/blockchain_db/lmdb

Contributor

About the types, I now changed it to be exactly like the existing code, which goes through double, and is shown to work on 32 bits.

I'm not sure what this means. I was originally commenting on the:

uint64_t new_mapsize = mei.me_mapsize + bytes;
new_mapsize += (new_mapsize % mst.ms_psize);

and I wasn't immediately sure whether that could overflow before assignment.

Collaborator Author

I mean I changed it to:

  uint64_t new_mapsize = (double)mei.me_mapsize + bytes;
  new_mapsize += (new_mapsize % mst.ms_psize);

which is what the existing code does in db_lmdb.cpp, and this one is shown to work on 32 bit archs since some people ran monerod there.

Contributor

Why double instead of uint64_t?

Collaborator Author

I don't know. I don't think I wrote the original. I do agree uint64_t would seem more suited to it.

Collaborator Author

Looking at git, it's from ND ages ago, so no asking what that was about. I think I'll just replace with uint64_t in the original separately, then change to uint64_t here in the patch.
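
A minimal sketch of the resize step with the arithmetic done in uint64_t, as agreed at the end of this thread; the function name is illustrative and error checks are omitted:

  #include <cstdint>
  #include <lmdb.h>

  int grow_mapsize(MDB_env *env, uint64_t bytes)
  {
    MDB_envinfo mei;
    MDB_stat mst;
    mdb_env_info(env, &mei);
    mdb_env_stat(env, &mst);
    uint64_t new_mapsize = static_cast<uint64_t>(mei.me_mapsize) + bytes; // widen before adding
    new_mapsize += (new_mapsize % mst.ms_psize); // same page-size tweak as the existing db_lmdb.cpp code
    return mdb_env_set_mapsize(env, new_mapsize);
  }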

throw std::runtime_error("Failed to enumerate " + std::string(table) + " records: " + std::string(mdb_strerror(ret)));

bytes += k.mv_size + v.mv_size;
if (resize_point(++nrecords, env1, &txn1, bytes))
Contributor

Why is this manually inspecting the LMDB file size instead of letting LMDB return an error when it needs to be resized? This adds lots of logic for detecting when this needs to occur, and it's likely to be brittle since it has to match internal LMDB logic.

Collaborator Author

I think I might have to add such checks in more places, and I'd have to add code for retrying. This seems simpler, but then I've not done the other way to actually compare. I'm not sure what logic you say it must match.

Contributor

This already has code that checks + attempts to calculate the data usage. Except if it doesn't match what LMDB does internally fairly closely, it should get stuck since the error case isn't handled directly (unless LMDB does an automatic resize on next open).

The LMDB utils patch was somewhat clunky - it would back off and try the whole operation again (a write is done in a lambda that returns failure so the write function could resize + retry). Since there aren't multiple threads in here, some simple macro and/or function should do the trick. Like

#define TRY_PUT(env, txn, ...) \
  for (unsigned attempts = 0; attempts < 3; ++attempts) \
  { \
    const int error = __VA_ARGS__ ; \
    if (error == 0) \
      break; \
    if (error != MDB_MAP_FULL || 2 <= attempts) \
      throw ...; \
    \
    resize_map(env, &txn); \
  }

Collaborator Author

OK. Looking at the source, it seems it's just three different places, as opposed to two, so not much more complicated indeed.

Collaborator Author

Actually, it is. You end up having to spam parameters to that try_put proxy, which has to commit, reopen cursors, etc. I'll leave it that way.

tx_memory_pool m_mempool(*blockchain);
boost::filesystem::path paths[2];
bool already_pruned = false;
for (size_t n = 0; n < 2; ++n)
Contributor

n < core_storage.size().


if (v.mv_size == 0)
throw std::runtime_error("Invalid transaction pruned data");
uint64_t version;
std::string tmp((const char*)v.mv_data, v.mv_size);
Contributor

There's no need to allocate + copy the bytes here.

...
const char* const begin = static_cast<const char*>(v.mv_data);
if (tools::read_varint(begin, begin + v.mv_size, version) <= 0)
...

This is called very often, so this should have an impact on performance.

Collaborator Author

It's not the first time you say that. Still not compiling:

/home/user/src/bitmonero/src/cryptonote_core/blockchain.cpp:2060:14: error: no matching function for call to 'read_varint'
int read = tools::read_varint(begin, begin + bd.size(), version);

static size_t get_transaction_version(const cryptonote::blobdata &bd)
{ 
  size_t version;
  const char* const begin = static_cast<const char*>(bd.data());
  int read = tools::read_varint(begin, begin + bd.size(), version);
  if (read <= 0)
    throw std::runtime_error("Internal error getting transaction version");
  return version;
}

If you want that, paste a version that compiles, I'm not prodding it randomly till something passes.

Collaborator Author

And removing the second const doesn't help.

Contributor

They both have to be l-values due to the parameter list. And the second const would break the instantiation stage:

const char* begin = static_cast<const char*>(v.mv_data);
const char* end = begin + v.mv_size;
read_varint(begin, end, version);

Collaborator Author

This version doesn't compile either.

  bool is_v1_tx(const blobdata_ref& tx_blob)
  {
    uint64_t version;
    const char* begin = static_cast<const char*>(tx_blob.data());
    const char* end = begin + tx_blob.size();
    int read = tools::read_varint(begin, end, version);
    if (read <= 0)
      throw std::runtime_error("Internal error getting transaction version");
    return version <= 1;
  }
[ 31%] Building CXX object src/cryptonote_basic/CMakeFiles/obj_cryptonote_basic.dir/cryptonote_format_utils.cpp.o
In file included from /home/user/src/bitmonero/src/cryptonote_basic/cryptonote_format_utils.cpp:36:
In file included from /home/user/src/bitmonero/src/cryptonote_basic/cryptonote_format_utils.h:33:
In file included from /home/user/src/bitmonero/src/cryptonote_basic/cryptonote_basic_impl.h:33:
In file included from /home/user/src/bitmonero/src/cryptonote_basic/cryptonote_basic.h:41:
In file included from /home/user/src/bitmonero/src/serialization/binary_archive.h:41:
/home/user/src/bitmonero/src/common/varint.h:126:12: error: no matching function for call to 'read_varint'
    return read_varint<std::numeric_limits<T>::digits, InputIt, T>(std::move(first), std::move(last), i);
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/user/src/bitmonero/src/cryptonote_basic/cryptonote_format_utils.cpp:241:23: note: in instantiation of function template specialization 'tools::read_varint' requested here
    int read = tools::read_varint(begin, end, version);
                      ^
/home/user/src/bitmonero/src/common/varint.h:94:5: note: candidate function not viable: no known conversion from 'typename std::remove_reference::type' (aka 'const char *') to 'const char *&&' for 1st argument
    read_varint(InputIt &&first, InputIt &&last, T &write) {
    ^
/home/user/src/bitmonero/src/common/varint.h:125:9: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'InputIt'
    int read_varint(InputIt &&first, InputIt &&last, T &i) {
        ^
1 error generated.

Collaborator Author

blobdata_ref is an epee::span<const char>

Contributor

@vtnerd vtnerd Jan 18, 2019

It's a bug in the overload (2nd version):

return read_varint<std::numeric_limits<T>::digits>(std::forward<InputIt>(first), std::forward<InputIt>(last), i);

That one did compile, at least with clang 6.

EDIT: I changed the source of read_varint to get the compile to work.

Collaborator Author

Thanks, that worked eventually. CLANG 6 too though.
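
For reference, a hedged sketch of the shape this converged on: l-value pointers passed to tools::read_varint (after the read_varint forwarding fix) and no temporary std::string copy of the blob. It assumes the project headers for read_varint and blobdata_ref:

  bool is_v1_tx(const blobdata_ref& tx_blob) // blobdata_ref is an epee::span<const char>
  {
    uint64_t version;
    const char* begin = tx_blob.data();
    const char* end = begin + tx_blob.size();
    if (tools::read_varint(begin, end, version) <= 0)
      throw std::runtime_error("Internal error getting transaction version");
    return version <= 1; // the version varint is the first field of the tx prefix
  }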

pruning_seed = tools::make_pruning_seed(pruning_seed, CRYPTONOTE_PRUNING_LOG_STRIPES);
static char pruning_seed_key[] = "pruning_seed";
k.mv_data = pruning_seed_key;
k.mv_size = strlen("pruning_seed") + 1;
Contributor

Seems like a sizeof pruning_seed_key

Collaborator Author

Yes, but since it's done that way in the existing code, I'll keep my paint colour here for consistency.


namespace tools
{
static constexpr uint32_t PRUNING_SEED_LOG_STRIPES_SHIFT = 7;
Contributor

Aren't these dependent on the config setting for stripe size? The size can only be 1, 2, or 3. Why even have that as a build-time config option? Much of this logic is going to be broken if that value is changed, including how the peer download selection is done. Or maybe the config just needs to state that 3 is the largest possible value (although there's more "bit space" for larger values).

Collaborator Author

They are dependent. There's a build time limit because you do need to know where to place bits for all nodes to know how to parse the data. There's some scope to increase, which is supposed to handle what seemed sane at the time.

Contributor

That's my point - why have that in the config? Is it realistic that it could be changed at all?

Collaborator Author

Well, I was intending to do so when the chain's larger and we have more nodes up, but then I suppose I'd need to have done the code to know whether it's realistic in practice. The idea was that every pruned node would drop a random half the eighth they kept, and update their seed accordingly.

throw0(DB_ERROR(lmdb_error("Failed to retrieve pruning seed: ", result).c_str()));
if (v.mv_size != sizeof(uint32_t))
throw0(DB_ERROR("Failed to retrieve or create pruning seed: unexpected value size"));
const uint32_t pruning_seed = *(const uint32_t*)v.mv_data;
Contributor

memcpy

if (v.mv_size == 0)
throw0(DB_ERROR("Invalid transaction pruned data"));
uint64_t version;
std::string tmp((const char*)v.mv_data, v.mv_size);
Contributor

I made this comment down below, a temporary string does not have to be created here.

if (ret)
throw0(DB_ERROR(lmdb_error("Failed to enumerate transactions: ", ret).c_str()));

const uint64_t block_height = *(const uint64_t*)v.mv_data;
Contributor

memcpy

}
}
else
{
Contributor

Why not create a separate function for these portions? It's completely separated in here ...

Collaborator Author

Same answer as to why do it, I suppose. It could be that way too. It doesn't seem particularly better either way. One indent fewer, more code.

Contributor

@vtnerd vtnerd Jan 17, 2019

I don't think it's worth changing this here and now, but the codebase needs to shift away from the massive functions seen in wallet2 and elsewhere - the maintainability of a codebase suffers with mondo functions. It's easier to introduce unintended bugs with so many variables to keep track of in temporary (human!) memory. The "unwritten" rule/target I have is generally 7-10 variables in a loop / function. I probably miss that target too frequently (as do most programmers).

BTW, I stole this principle from an effective Java programmer who insisted he was tested to only have a working space memory for ~5 elements, less than the typical average of ~7. Yet, this programmer seemed to have a lower bug count than the average programmer.

EDIT: 7-10 variables in a function is obviously fairly restrictive, the only point being it's easier on the brain, especially when auditing code, when things are subdivided a bit.

}
}
}
else
Contributor

Looks like an else if here

Collaborator Author

else if what ? It seems ok: first part for "we want to prune this", second part for "we don't".

Contributor

The else block contains a single if and nothing else - it can be "hoisted" into an else if. Just an odd thing that makes auditing the code at a glance a bit frustrating, because I'm trying to match a lot of braces with only two spaces of indenting.

Collaborator Author

OK. It makes more sense this way I think, since the branch is small and it matches the logic in the other one (test pruned/unpruned first, then mode).

if (m_db->get_pruned_tx_blob(tx_hash, tx))
{
txs.push_back(std::make_tuple(tx_hash, std::move(tx), crypto::null_hash, cryptonote::blobdata()));
if ((std::get<1>(txs.back()).empty() || std::get<1>(txs.back())[0] > 1) && !m_db->get_prunable_tx_hash(tx_hash, std::get<2>(txs.back())))
Contributor

Is this a crude, and potentially incorrect varint-version check? Or rather, what is that second check?

Collaborator Author

It is. I'll make it do actual varint parsing.

});
const bool use_next = (n_next > m_max_out_peers / 2 && n_subsequent <= 1) || (n_next > 2 && n_subsequent == 0);
const uint32_t ret_stripe = use_next ? subsequent_pruning_stripe: next_pruning_stripe;
MIDEBUG(const std::string po = get_peers_overview(), "get_next_needed_pruning_stripe: want height " << want_height << " (" <<
Contributor

Yikes. Another strategy would be to overload operator<< for this object and then do current peers " << *this below, which would delay the string creation too. Still a massive log statement though.

span_start_height = next_unpruned_height;
}
MDEBUG("span_start_height: " <<span_start_height);
const uint64_t block_hashes_start_height = last_block_height - block_hashes.size() + 1;
Contributor

This seems to be an existing assumption - how is underflow prevented/checked? It looks like the caller has to do it.

Collaborator Author

I've added a check for underflow, and a remote height sanity check in the caller for overflow.

float block_queue::get_download_rate(const boost::uuids::uuid &connection_id) const
{
boost::unique_lock<boost::recursive_mutex> lock(mutex);
std::unordered_map<boost::uuids::uuid, float> speeds;
Contributor

Variable no longer appears necessary.

m_p2p->for_each_connection([&](cryptonote_connection_context& context, nodetool::peerid_type peer_id, uint32_t support_flags)->bool
{
if (context.m_state == cryptonote_connection_context::state_synchronizing || context.m_state == cryptonote_connection_context::state_before_handshake)
if (context.m_state == cryptonote_connection_context::state_synchronizing && context.m_last_request_time != boost::date_time::not_a_date_time)
Contributor

Why remove the other case (state before handshake)?

Collaborator Author

There is now a network level timeout that kicks inactive peers, and the action (set the node on standby) does not make sense if the peer hasn't handshaked yet.

const uint32_t index = stripe - 1;
CRITICAL_REGION_LOCAL(m_used_stripe_peers_mutex);
MINFO("adding stripe " << stripe << " peer: " << context.m_remote_address.str());
std::remove_if(m_used_stripe_peers[index].begin(), m_used_stripe_peers[index].end(),
Contributor

How large do these get? std::lower_bound will help with sorted insertion for binary searching.

And why remove then immediately insert? Am I missing something?

Collaborator Author

Probably not very large, since they're the ones that you started syncing from, and you'll start reusing them when you get back to their stripe. The removal is to avoid any dupe. It could be a set, you might say. But for small things like that, not sure it's a win. Besides, a LIFO might be a good thing to do (FIFO currently).

{
CRITICAL_REGION_LOCAL(m_used_stripe_peers_mutex);
MINFO("clearing used stripe peers");
for (size_t i = 0; i < 1ul << CRYPTONOTE_PRUNING_LOG_STRIPES; ++i)
Contributor

Seems easier to iterate over the elements as opposed to this so that it always works regardless of the size of m_used_stripe_peers. Unless only a subset need to be cleared?

Collaborator Author

No, it clears all the elements.

Contributor

I meant doing a foreach (for ( : )) loop instead. It links the real size with behavior instead of two separate places.
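
A one-liner sketch of that range-for, so the loop bound always tracks the real size of m_used_stripe_peers:

  for (auto &stripe_peers : m_used_stripe_peers)
    stripe_peers.clear();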

Collaborator Author

OK

if (target && target <= height)
return false;
size_t n_out_peers = 0;
m_p2p->for_each_connection([&](cryptonote_connection_context& ctx, nodetool::peerid_type peer_id, uint32_t support_flags)->bool{
Contributor

Is this a re-write of get_outgoing_connections_count?

Collaborator Author

Looks like it.

Collaborator Author

Ah, that was because it's not accessible from there.

{
const uint64_t now = tools::get_tick_count();
const uint64_t dt = now - m_last_add_end_time;
if (tools::ticks_to_ns(dt) >= DROP_ON_SYNC_WEDGE_THRESHOLD)
Contributor

... std::chrono is usually a bit easier for this kind of stuff ...
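
A hedged sketch of the std::chrono style being alluded to; the threshold value and names below are placeholders, not the PR's actual constants:

  #include <chrono>

  using sync_clock = std::chrono::steady_clock;
  constexpr std::chrono::seconds DROP_ON_SYNC_WEDGE_THRESHOLD{30}; // placeholder value

  bool sync_wedged(sync_clock::time_point last_add_end_time)
  {
    // duration comparison instead of manual tick/nanosecond conversions
    return sync_clock::now() - last_add_end_time >= DROP_ON_SYNC_WEDGE_THRESHOLD;
  }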

@moneromooo-monero
Collaborator Author

I added a patch with the changes from vtnerd's comments (except the retry thing). That'll get squashed with the main commit later.

The blockchain prunes seven eighths of prunable tx data.
This saves about two thirds of the blockchain size, while
keeping the node useful as a sync source for an eighth
of the blockchain.

No other data is currently pruned.

There are three ways to prune a blockchain:

- run monerod with --prune-blockchain
- run "prune_blockchain" in the monerod console
- run the monero-blockchain-prune utility

The first two will prune in place. Due to how LMDB works, this
will not reduce the blockchain size on disk. Instead, it will
mark parts of the file as free, so that future data will use
that free space, causing the file to not grow until free space
grows scarce.

The third way will create a second database, a pruned copy of
the original one. Since this is a new file, this one will be
smaller than the original one.

Once the database is pruned, it will stay pruned as it syncs.
That is, there is no need to use --prune-blockchain again, etc.
Contributor

@fluffypony fluffypony left a comment

Reviewed

@fluffypony fluffypony merged commit b750fb2 into monero-project:master Jan 28, 2019
fluffypony added a commit that referenced this pull request Jan 28, 2019
b750fb2 Pruning (moneromooo-monero)
@monerorus
Contributor

That's awesome. You make me happy, I can use my old SSD (60 GB) for a node again :)

@Mansarde

Just for clarification:
If I start a completely new sync with pruning enabled, then will the pruning happen continuously after every x verified blocks, or does the whole blockchain have to be downloaded first before pruning can begin?

@moneromooo-monero
Collaborator Author

Once you prune, it stays pruned. If you start a new sync with pruning enabled, it will update "pruningly", and you will not have to prune it again later.

@Mansarde

If you start a new sync with pruning enabled, it will update "pruningly"

Gotcha, thanks a bunch! :)

stoffu added a commit to stoffu/monero that referenced this pull request May 10, 2022
stoffu added a commit to stoffu/monero that referenced this pull request May 10, 2022
… stuck issue

will hopefully be fixed properly by monero/monero-project#4843