Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pruning #4843

Merged
merged 1 commit into from Jan 28, 2019

Conversation

Projects
None yet
10 participants
@moneromooo-monero
Copy link
Contributor

moneromooo-monero commented Nov 13, 2018

[Not quite totally cleaned up yet, but close enough for a first review.]

The blockchain prunes seven eighths of prunable tx data.
This saves about two thirds of the blockchain size, while
keeping the node useful as a sync source for an eighth
of the blockchain.

No other data is currently pruned.

@NanoAkron

This comment has been minimized.

Copy link
Contributor

NanoAkron commented Nov 13, 2018

Is there a way to build in redundancy with error codes to be able to reconstruct the whole blockchain from multiple parties, or is there too much entropy?

@moneromooo-monero

This comment has been minimized.

Copy link
Contributor Author

moneromooo-monero commented Nov 13, 2018

ECC are used when the sender has the info, and the transmission medium is unreliable. Here, the medium is reliable (TCP), and the sender does not have the info, so can't built the ECC in the first place.

@moneromooo-monero moneromooo-monero force-pushed the moneromooo-monero:pruning branch from efaccb1 to da35d5c Nov 14, 2018

@NanoAkron

This comment has been minimized.

Copy link
Contributor

NanoAkron commented Nov 14, 2018

What if the ECC is added by each node successfully storing the block, then pruning away large amounts of the data but retaining enough that if a threshold can collaborate then they can reconstruct.

e.g. if I know you'll store every even numbered transaction, and I'll store every odd numbered one, we can prune individual storage requirements to 50% + some overlap for redundancy.

@moneromooo-monero moneromooo-monero force-pushed the moneromooo-monero:pruning branch from da35d5c to db7a951 Dec 12, 2018

@moneromooo-monero

This comment has been minimized.

Copy link
Contributor Author

moneromooo-monero commented Dec 12, 2018

All ready now. Please review.

@moneromooo-monero moneromooo-monero force-pushed the moneromooo-monero:pruning branch 2 times, most recently from 3e4f649 to abf9e1e Dec 12, 2018

@moneromooo-monero

This comment has been minimized.

Copy link
Contributor Author

moneromooo-monero commented Dec 13, 2018

Needs #4949.

@moneromooo-monero moneromooo-monero force-pushed the moneromooo-monero:pruning branch from abf9e1e to 4778008 Dec 13, 2018

@moneromooo-monero

This comment has been minimized.

Copy link
Contributor Author

moneromooo-monero commented Dec 13, 2018

And #4950

@moneromooo-monero moneromooo-monero force-pushed the moneromooo-monero:pruning branch from 4778008 to 2c172dc Dec 13, 2018

@moneromooo-monero

This comment has been minimized.

Copy link
Contributor Author

moneromooo-monero commented Dec 13, 2018

@moneromooo-monero moneromooo-monero force-pushed the moneromooo-monero:pruning branch from 2c172dc to d4457c4 Dec 13, 2018

@SamsungGalaxyPlayer

This comment has been minimized.

Copy link

SamsungGalaxyPlayer commented Dec 14, 2018

Do we want 1/8 to be a good measure going forward, or do we want to instead say to keep the full set of data for approximately the past 6 months? In 4 years, for example, 1/8 will be a longer time window than it is now.

@moneromooo-monero

This comment has been minimized.

Copy link
Contributor Author

moneromooo-monero commented Dec 14, 2018

I do not see the link between the two parts, can you expand ?

@stoffu
Copy link
Contributor

stoffu left a comment

Gone through the whole except the networking part (block_queue, cryptonote_protocol_handler, net_node)

@@ -60,7 +61,8 @@ namespace {
peer_id_str >> id_str;
epee::string_tools::xtype_to_string(peer.port, port_str);
std::string addr_str = ip_str + ":" + port_str;
tools::msg_writer() << boost::format("%-10s %-25s %-25s %s") % prefix % id_str % addr_str % elapsed;
std::string pruning_seed = epee::string_tools::to_string_hex(peer.pruning_seed);
tools::msg_writer() << boost::format("%-10s %-25s %-4s %-25s %s") % prefix % id_str % addr_str % pruning_seed % elapsed;

This comment has been minimized.

Copy link
@stoffu

stoffu Dec 19, 2018

Contributor

Seems like the newly added formatter %-4s should come to the second last in the list.

ASSERT_EQ(tools::get_pruning_log_stripes(tools::make_pruning_seed(1, 7)), 7);
ASSERT_EQ(tools::get_pruning_log_stripes(tools::make_pruning_seed(7, 7)), 7);

for (uint32_t log_stripes = 1; log_stripes < tools::PRUNING_SEED_LOG_STRIPES_SHIFT; ++log_stripes)

This comment has been minimized.

Copy link
@stoffu

stoffu Dec 19, 2018

Contributor

Shouldn't the test belog_stripes <= PRUNING_SEED_LOG_STRIPES_MASK?

uint32_t make_pruning_seed(uint32_t stripe, uint32_t log_stripes)
{
CHECK_AND_ASSERT_THROW_MES(log_stripes <= PRUNING_SEED_LOG_STRIPES_MASK, "log_stripes out of range");
CHECK_AND_ASSERT_THROW_MES(stripe > 0 && stripe <= (1ul << log_stripes), "stripe out of range");

This comment has been minimized.

Copy link
@stoffu

stoffu Dec 19, 2018

Contributor

Shouldn't the same log_stripes ? log_stripes : CRYPTONOTE_PRUNING_LOG_STRIPES switch be used here?

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Dec 22, 2018

Author Contributor

No, it wouldn't make sense here, unless you see a case where it would.
Testing log_stripes is when you receive a seed and it might be 0 (ie, the peer is not striped). If you build a seed, having no stripe makes no sense.

if (h + CRYPTONOTE_PRUNING_TIP_BLOCKS > blockchain_height)
return blockchain_height - CRYPTONOTE_PRUNING_TIP_BLOCKS;
if (h < block_height)
return block_height;

This comment has been minimized.

Copy link
@stoffu

stoffu Dec 19, 2018

Contributor

I don't see how the execution can reach here. Shouldn't this be considered as a logic error?

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Dec 22, 2018

Author Contributor

Yes, after reading the code, it is an unnecessary test. I'll make it into an assert.

{
uint32_t pruning_seed = tools::get_pruning_seed(h, H, CRYPTONOTE_PRUNING_LOG_STRIPES);
ASSERT_EQ(pruning_seed, 0);
for (pruning_seed = 0; pruning_seed <= 1 << CRYPTONOTE_PRUNING_LOG_STRIPES; ++pruning_seed)

This comment has been minimized.

Copy link
@stoffu

stoffu Dec 19, 2018

Contributor

Maybe wrap 1 << CRYPTONOTE_PRUNING_LOG_STRIPES with parentheses just like others (though not needed grammar-wise)?


if (n == 1)
{
paths[1] = boost::filesystem::path(data_dir) / (db->get_db_name() + "-pruned");

This comment has been minimized.

Copy link
@stoffu

stoffu Dec 19, 2018

Contributor

It might be helpful to allow the user to customize the temporary DB file path in case the free disk space is very limited.

@@ -9506,7 +9534,7 @@ void wallet2::set_tx_key(const crypto::hash &txid, const crypto::secret_key &tx_
COMMAND_RPC_GET_TRANSACTIONS::request req = AUTO_VAL_INIT(req);
req.txs_hashes.push_back(epee::string_tools::pod_to_hex(txid));
req.decode_as_json = false;
req.prune = false;
req.prune = true;

This comment has been minimized.

Copy link
@stoffu

stoffu Dec 19, 2018

Contributor

Shouldn't this also be accompanied by get_pruned_tx like the other instances?

THROW_WALLET_EXCEPTION_IF(!cryptonote::parse_and_validate_tx_from_blob(bd, tx, tx_hash, tx_prefix_hash), error::wallet_internal_error, "failed to parse tx from blob");
THROW_WALLET_EXCEPTION_IF(tx_hash != txid, error::wallet_internal_error, "txid mismatch");
crypto::hash tx_hash;
THROW_WALLET_EXCEPTION_IF(!get_pruned_tx(res.txs[0], tx, tx_hash), error::wallet_internal_error, "failed to get tx from daemon");

This comment has been minimized.

Copy link
@stoffu

stoffu Dec 19, 2018

Contributor

Why is the line req.prune = false; kept unchanged?

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Dec 22, 2018

Author Contributor

I think I left it as is for code I did not know.

This comment has been minimized.

Copy link
@stoffu

stoffu Dec 23, 2018

Contributor

The function check_spend_proof verifies each ring signature included in the input string. To do so, it uses the tx prefix data (vin) and queries the daemon for the corresponding output keys. I'm pretty sure pruned tx data will do here.

For that matter, I doubt there's any code in the wallet that requires unpruned data from the daemon.

if (res.txs.size() == 1)
ok = string_tools::parse_hexstr_to_binbuff(res.txs.front().as_hex, tx_data);
{
ok = get_pruned_tx(res.txs.front(), tx, tx_hash);

This comment has been minimized.

Copy link
@stoffu

stoffu Dec 19, 2018

Contributor

Why is the line req.prune = false; kept unchanged?

@@ -293,6 +297,8 @@ namespace cryptonote
ar.end_object();
}
}
if (!typename Archive<W>::is_saving())
pruned = true;

This comment has been minimized.

Copy link
@stoffu

stoffu Dec 19, 2018

Contributor

Is this also applicable for v1 tx?

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Dec 22, 2018

Author Contributor

Yes.

@stoffu
Copy link
Contributor

stoffu left a comment

Gone through the p2p & protocol part, where I'm a lot less confident

void node_server<t_payload_net_handler>::add_used_stripe_peer(const typename t_payload_net_handler::connection_context &context)
{
const uint32_t stripe = tools::get_pruning_stripe(context.m_pruning_seed);
if (stripe == 0 || stripe >= (1ul << CRYPTONOTE_PRUNING_LOG_STRIPES))

This comment has been minimized.

Copy link
@stoffu

stoffu Dec 20, 2018

Contributor

Shouldn't the condition exclude the equality, i.e. stripe > (1ul << CRYPTONOTE_PRUNING_LOG_STRIPES)?

void node_server<t_payload_net_handler>::remove_used_stripe_peer(const typename t_payload_net_handler::connection_context &context)
{
const uint32_t stripe = tools::get_pruning_stripe(context.m_pruning_seed);
if (stripe == 0 || stripe >= (1ul << CRYPTONOTE_PRUNING_LOG_STRIPES))

This comment has been minimized.

Copy link
@stoffu

stoffu Dec 20, 2018

Contributor

Likewise

{
const uint64_t want_height_from_blockchain = m_core.get_current_blockchain_height();
const uint64_t want_height_from_block_queue = m_block_queue.get_next_needed_height(want_height_from_blockchain);
const uint64_t want_height = std::max(want_height_from_blockchain, want_height_from_block_queue);

This comment has been minimized.

Copy link
@stoffu

stoffu Dec 20, 2018

Contributor

My understanding on how block queuing works is weak and probably I'm missing something, but isn't want_height_from_block_queue going to be always greater than want_height_from_blockchain? What's the situation where this isn't the case?

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Dec 22, 2018

Author Contributor

Typically, yes, but if you find another chain, the block queue might have blocks from earlier heights.

return true;
});
const bool use_next = (n_next > m_max_out_peers / 2 && n_subsequent <= 1) || (n_next > 2 && n_subsequent == 0) ||
(n_next > 0 && tools::get_pruning_stripe(want_height_from_block_queue, blockchain_height, CRYPTONOTE_PRUNING_LOG_STRIPES) == subsequent_pruning_stripe);

This comment has been minimized.

Copy link
@stoffu

stoffu Dec 20, 2018

Contributor

The case where the latter condition

tools::get_pruning_stripe(want_height_from_block_queue, blockchain_height, CRYPTONOTE_PRUNING_LOG_STRIPES) == subsequent_pruning_stripe

holds seems to imply that want_height is set to want_height_from_blockchain which then implies want_height_from_blockchain > want_height_from_block_queue (i.e., if want_height was set to want_height_from_block_queue, the left hand side of the above expression would equal to next_pruning_stripe which always differs from subsequent_pruning_stripe by definition). What does this situation mean? Maybe leaving a comment in the code could be helpful.

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Dec 22, 2018

Author Contributor

You seem to be right, this part of the condition should never happen. I'll remove it.

Also, this code is ad hoc to try and find a middle ground between "all on the current stripe to get needed blocks asap" and "start early on future blocks to avoid a pause later while we reconnect", it's a lot of "change it and see how sync speed changes".

{
if (expected < i->start_block_height)
s += std::string(std::max((uint64_t)1, (i->start_block_height - expected) / (i->nblocks ? i->nblocks : 1)), '_');
s += i->blocks.empty() ? "." : i->start_block_height == blockchain_height ? "m" : "o";

This comment has been minimized.

Copy link
@stoffu

stoffu Dec 20, 2018

Contributor

The string often ends up like [mooooooooooooo..oo.]... so much mooo 😄

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Dec 22, 2018

Author Contributor

You can also get small ducks like _< from time to time, rarely though. Not sure you can get the full _o< but I think you can :)

// we might be able to ask for that block directly, as we now can request out of order,
// otherwise we continue out of order, unless this block is the one we need, in which
// case we request block hashes, though it might be safer to disconnect ?
if (start_height > previous_height)

This comment has been minimized.

Copy link
@stoffu

stoffu Dec 20, 2018

Contributor

Shouldn't this section be enabled only when the pruning mode is enabled (i.e. m_core.get_blockchain_pruning_seed() > 0)?

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Dec 22, 2018

Author Contributor

No, the local pruning should be irrelevant here. Can you expand on why you think it matters here ?

This comment has been minimized.

Copy link
@stoffu

stoffu Dec 25, 2018

Contributor

I had this question because of this comment

// as we now can request out of order,

I interpreted it as meaning that the new pruning scheme now allows us to request blocks out of order, or conversely, out-of-order request is not allowed when pruning is disabled. My understanding on how block queue and download take place is generally quite limited, so probably my question is irrelevant.

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Dec 25, 2018

Author Contributor

We can request (and receive) blocks out of order, regardless of whether we're pruning.

if (current_blockchain_height < target_blockchain_height)
{
uint64_t completion_percent = (current_blockchain_height * 100 / target_blockchain_height);
if (completion_percent == 100) // never show 100% if not actually up to date

This comment has been minimized.

Copy link
@stoffu

stoffu Dec 20, 2018

Contributor

Can this actually occur when current_blockchain_height < target_blockchain_height?

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Dec 22, 2018

Author Contributor

Probably not...

@@ -1220,27 +1370,22 @@ skip:
std::vector<boost::uuids::uuid> kick_connections;

This comment has been minimized.

Copy link
@stoffu

stoffu Dec 20, 2018

Contributor

kick_connections is now unused.

return false;
}
//------------------------------------------------------------------------------------------------------------------------
template<class t_core>
void t_cryptonote_protocol_handler<t_core>::skip_unneeded_hashes(cryptonote_connection_context& context, bool check_block_queue)

This comment has been minimized.

Copy link
@stoffu

stoffu Dec 20, 2018

Contributor

This function can be made const.

const uint32_t seed = tools::get_pruning_stripe(span.first, context.m_remote_blockchain_height, CRYPTONOTE_PRUNING_LOG_STRIPES);
if (seed != tools::get_pruning_stripe(context.m_pruning_seed))
{
MDEBUG(context << " starting early on next seed (" << span.first << " with seed " << epee::string_tools::to_string_hex(seed) <<

This comment has been minimized.

Copy link
@stoffu

stoffu Dec 20, 2018

Contributor

seed seems to actually represent a stripe. I think it needs to be fed to tools::make_pruning_seed in order to be printed like this.

Another question: shouldn't this section be activated only when context.m_pruning_seed > 0?

@moneromooo-monero moneromooo-monero force-pushed the moneromooo-monero:pruning branch from d4457c4 to e5832fb Dec 22, 2018

const uint64_t cycle_start = cycles + ((stripe > block_pruning_stripe) ? 0 : 1);
const uint64_t h = cycle_start * (CRYPTONOTE_PRUNING_STRIPE_SIZE << log_stripes) + (stripe - 1) * CRYPTONOTE_PRUNING_STRIPE_SIZE;
if (h + CRYPTONOTE_PRUNING_TIP_BLOCKS > blockchain_height)
return blockchain_height - CRYPTONOTE_PRUNING_TIP_BLOCKS;

This comment has been minimized.

Copy link
@stoffu

stoffu Dec 23, 2018

Contributor

Maybe put blockchain_height < CRYPTONOTE_PRUNING_TIP_BLOCKS ? 0 : blockchain_height - CRYPTONOTE_PRUNING_TIP_BLOCKS just in case?

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Dec 23, 2018

Author Contributor

That could only trigger if h + CRYPTONOTE_PRUNING_TIP_BLOCKS overflows and blockchain_height is < 5500, so h is very large. Instead, I've added asserts at start that the parameters are in valid range.

This comment has been minimized.

Copy link
@stoffu

stoffu Dec 25, 2018

Contributor

That could only trigger if h + CRYPTONOTE_PRUNING_TIP_BLOCKS overflows

I wasn't thinking about h being too large, it can trigger simply when blockchain_height < CRYPTONOTE_PRUNING_TIP_BLOCKS even with small h, no? E.g. if h:=10, CRYPTONOTE_PRUNING_TIP_BLOCKS:=5500, blockchain_height:=5000 then 10 + 5500 > 5000 and 5000 - 5500 wraps around.

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Dec 25, 2018

Author Contributor

I don't think that can happen, but I've added that change anyway so we're sure.

std::cout << "Instead, parts of the file will be marked as free, so the file will not grow" << std::endl;
std::cout << "until that newly free space is used up. If you want a smaller file size now," << std::endl;
std::cout << "exit monerod and run monero-blockchain-prune (you will temporarily need more" << std::endl;
std::cout << "disk space for the database conversion though. If you are OK with the database" << std::endl;

This comment has been minimized.

Copy link
@stoffu

stoffu Dec 23, 2018

Contributor

A closing parenthesis is missing

THROW_WALLET_EXCEPTION_IF(!cryptonote::parse_and_validate_tx_from_blob(bd, tx, tx_hash, tx_prefix_hash), error::wallet_internal_error, "failed to parse tx from blob");
THROW_WALLET_EXCEPTION_IF(tx_hash != txid, error::wallet_internal_error, "txid mismatch");
crypto::hash tx_hash;
THROW_WALLET_EXCEPTION_IF(!get_pruned_tx(res.txs[0], tx, tx_hash), error::wallet_internal_error, "failed to get tx from daemon");

This comment has been minimized.

Copy link
@stoffu

stoffu Dec 23, 2018

Contributor

The function check_spend_proof verifies each ring signature included in the input string. To do so, it uses the tx prefix data (vin) and queries the daemon for the corresponding output keys. I'm pretty sure pruned tx data will do here.

For that matter, I doubt there's any code in the wallet that requires unpruned data from the daemon.

@moneromooo-monero moneromooo-monero force-pushed the moneromooo-monero:pruning branch 2 times, most recently from 543d837 to 619ebdc Dec 23, 2018

@moneromooo-monero

This comment has been minimized.

Copy link
Contributor Author

moneromooo-monero commented Dec 24, 2018

txs_prunable_{tip, hash} are now dupsort|dupfixed

@moneromooo-monero moneromooo-monero force-pushed the moneromooo-monero:pruning branch 2 times, most recently from 6b0fe1c to 8552349 Dec 25, 2018

@stoffu

stoffu approved these changes Dec 25, 2018

@moneromooo-monero

This comment has been minimized.

Copy link
Contributor Author

moneromooo-monero commented Dec 25, 2018

One thing I'm still unsure about is the stripe size. That is the number of contiguous blocks a peer will keep. It's currently set to 4096. Tests by iDunk show that sync performance is significantly better with 16384 [1], as there's less reconnections as sync progresses. However, the larger that value, the more uneven the differences between stripes [2]. Ideally we'd want all stripes to have very similar size, or people will start stripes not at random, which means data duplication will be uneven, making the scarcest data even more scarce. And we don't want that.

[1] This test was done by spoofing stripes so that every single node was treated as if it was striped, which is the worst case sync wise.

[2] This table shows the size difference in MB between the smallest stripe and the largest stripe for stripe sizes between 8 and 65536. For comparison, a striped blockchain will be about 25 GB.

8 69
16 106.98
32 135.02
64 185.98
128 127.24
256 397.05
512 458.24
1024 512.87
2048 719.23
4096 1031.18
8192 1250.13
16384 1783.92
32768 3066.82
65536 7980.35

@moneromooo-monero

This comment has been minimized.

Copy link
Contributor Author

moneromooo-monero commented Jan 1, 2019

I added a small fix separately, it'll get squashed once it had a chance to get a review.

@@ -668,6 +674,8 @@ block Blockchain::pop_block_from_blockchain()
}
}
}
if (pruned)
MWARNING(pruned << " pruned txes could not be added back to the txpool");

This comment has been minimized.

Copy link
@stoffu

stoffu Jan 4, 2019

Contributor

I assume this is intended for cases of deep reorg (> 5500 blocks). Do the unpruned txes returned to the pool get re-broadcast to other nodes who may have pruned those txes?

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 4, 2019

Author Contributor

It is for those deep reorgs, yes. I think those unpruned txes that re-added will get re-broadcast on a timer if they're still in the pool when the timer triggers. However, if we're popping, it's very likely that it's because we've just seen a new longer chain, and that chain will most likely included pretty much all those txes anyway, so they'll get taken out of the pool.

This comment has been minimized.

Copy link
@stoffu

stoffu Jan 4, 2019

Contributor

Thanks for clarifying.

@stoffu

stoffu approved these changes Jan 4, 2019

@moneromooo-monero moneromooo-monero force-pushed the moneromooo-monero:pruning branch from c0f8988 to 6620ee3 Jan 11, 2019

@moneromooo-monero

This comment has been minimized.

Copy link
Contributor Author

moneromooo-monero commented Jan 11, 2019

github is not showing the last rev in this "forced pushed" link. Check the commit link itself for the small diff. It should be 04e83ff, not 6620ee3.

@moneromooo-monero moneromooo-monero force-pushed the moneromooo-monero:pruning branch from 04e83ff to bac84cb Jan 12, 2019

@moneromooo-monero

This comment has been minimized.

Copy link
Contributor Author

moneromooo-monero commented Jan 12, 2019

Rebased to current master.

@iamsmooth

This comment has been minimized.

Copy link
Contributor

iamsmooth commented Jan 16, 2019

@BigslimVdub

This comment has been minimized.

Copy link
Contributor

BigslimVdub commented Jan 16, 2019

I see you have been hard at work with this. Downstream merges if ever will be quite useful. Thank you for your effort on this (to all that helped with it).

@vtnerd
Copy link
Contributor

vtnerd left a comment

Had some of these lined up, will do a better review early tomorrow.

@@ -233,6 +233,15 @@ POP_WARNINGS
return boost::lexical_cast<std::string>(val);
}
//----------------------------------------------------------------------------
template<typename T> inline std::string to_string_hex(const T &val)

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 16, 2019

Contributor

This is likely only going to work with integers. Floats use a different manipulator, and custom operator<< overloads usually don't lookup std::hex state. I would suggest taking a uint64_t instead of a template and putting the implementation in the cpp. The few running a 32-bit process won't be slowed down terribly. If perf was a concern, an unsigned and uint64_t overload can forward to a templated version in the cpp.

The hex.h file might be a better place - including that file brings in less headers (quicker).

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 16, 2019

Author Contributor

It is supposed to work only with integers.

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 17, 2019

Contributor

Exactly, two overloads - std::uintmax_t and std::intmax_t would work for every integer type. Only std::uintmax_t is needed right now. Add int and unsigned if there was some platform performance considerations - the cpp could contain the templated function which is invoked by the non-templated overloads.

Using templates here forces every cpp unit that uses the function to generate ASM, then have the linker filter the extras out. It also means that there were be nearly identical ASM copies, one for each type (short, int, long, and long long + unsigned variants) that was used. Cleaning up the templates in this file is already one of the many todos I have.

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 17, 2019

Author Contributor

I've made it for just uint32_t since that's what it's used for.

static constexpr uint32_t PRUNING_SEED_STRIPE_SHIFT = 0;
static constexpr uint32_t PRUNING_SEED_STRIPE_MASK = 0xffff;

static inline uint32_t get_pruning_log_stripes(uint32_t pruning_seed) { return (pruning_seed >> PRUNING_SEED_LOG_STRIPES_SHIFT) & PRUNING_SEED_LOG_STRIPES_MASK; }

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 16, 2019

Contributor

This function can be marked constexpr. The static keyword should be removed from both of them. In this context static forces hidden visibility which prevents the linker from merging the copies across compilation units.

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 16, 2019

Author Contributor

Can you eplain why ?

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 16, 2019

Contributor

I'm not sure which point -

constexpr isn't all that important here. There aren't any strong argument for or against in this situation. If a function is trivially constexpr, I add the keyword because its usable in more contexts and kind of indicates to other programmers that the function is "pure" (but that's not really a guarantee, sadly).

static in namespace scope tells the compiler to hide the function from other compilation units (internal linkage). This is nice if the function is in a cpp because it prevents a naming collision, but when used in a header it potentially results in multiple copies of the functions within the executable. Its best to remove static from the namespace variables in the header too.

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 16, 2019

Author Contributor

i meant "why constexpr", and "why not static". I use "static inline" as "this should be inlined, and don't emit a public version, this won't be called from another TU". AFAIK constexpr means "this can be collapsed to a constant at compile time", which seems to be not quite the whole truth, since calls to this are made with variables, so part of the question is "what does constexpr do in this case". This is from my C background though, so my question is really "why is your way better because I'm not changing it if I don't understand why it is".

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 17, 2019

Contributor

AFAIK constexpr means "this can be collapsed to a constant at compile time", which seems to be not quite the whole truth, since calls to this are made with variables, so part of the question is "what does constexpr do in this case".

A constexpr function can be evaluated at compile or runtime. If one or more of the arguments are not a compile-time literal, then the function is evaluated at runtime. It allows for a single implementation to work in either situation. It seemed like it might be useful generating some constants with this function (at a glance), maybe not. Generally if the function is this simple I just mark it as constexpr since it can be used in several contexts.

I use "static inline" as "this should be inlined, and don't emit a public version, this won't be called from another TU.

If the compiler inlines the function, then it doesn't matter because it pretty much did what you wanted (the function ASM is jammed into another). If the compiler chooses to not inline the function (which it is allowed to do, even with this keyword), then multiple copies of the function can exist in the executable. Removing the static keyword makes the function copies from each TU visible to the linker, which is required to merge them into a single version (one-definition rule). There are advantages to hidden visibility, I'm just not seeing them in this situation. Luckily, these are so simple that there is a high probability that they get inlined anyway.

@@ -148,6 +149,17 @@

#define BULLETPROOF_MAX_OUTPUTS 16

#if 0

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 16, 2019

Contributor

Remove?

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 16, 2019

Contributor

Old, now irrelevant comment

m_p2p->for_each_connection([&](const connection_context &ctx, nodetool::peerid_type peer_id, uint32_t support_flags) {
const uint32_t stripe = tools::get_pruning_stripe(ctx.m_pruning_seed);
char state_char = cryptonote::get_protocol_state_char(ctx.m_state);
ss << std::to_string(stripe) + state_char;

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 16, 2019

Contributor

to_string is a waste here; ss << stripe << state_char.

mdb_env_close(env);
}

static int compare_uint64(const MDB_val *a, const MDB_val *b)

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 16, 2019

Contributor

When making that future change, I recommend changing these to memcpy for portability. These current compare functions violate the aliasing rule, and will fail in some cases if the pointer is not aligned. The compiler will automatically convert a memcpy into what is "desired" here on platforms that support unaligned memory access (i.e. on amd64 the code generated is identical in my disassembling tests).

mdb_env_stat(env, &mst);

uint64_t new_mapsize = mei.me_mapsize + bytes;
new_mapsize += (new_mapsize % mst.ms_psize);

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 16, 2019

Contributor

Doesn't LMDB do this page size logic internally? Not a big deal, but the LMDB call seems unnecessary.

The other funkiness is that the lmdb call uses size_t and some overflow funkiness could occur across platforms.

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 17, 2019

Author Contributor

I assume you mean memcmp. If we do that, then the sorting will be reversed on little/big endian archs, no ?

About the page size logic, do you mean the quantizing ? It probably does it internally, yes. I suppose this can be removed, it just felt better I guess.

Your last sentence got clipped.

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 17, 2019

Contributor

I assume you mean memcmp. If we do that, then the sorting will be reversed on little/big endian archs, no ?

I think this is meant for the comment above. I meant memcpy instead of the cast to an integer which violates the aliasing rule.

About the page size logic, do you mean the quantizing ? It probably does it internally, yes. I suppose this can be removed, it just felt better I guess.

Yes, I meant that alignment logic.

Your last sentence got clipped.

I was commenting on how this was doing math on size_t, assigning to uint64_t, and then implicitly converting back to size_t in the call to resize. Overflows are unlikely on 64-bit platforms, but some platforms still use 32-bit size_t (thus overflow before assignment to uint64_t).

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 17, 2019

Author Contributor

Two of your comments show with a single reply box so yes, this is replying to the two comments.

About the types, I now changed it to be exactly like the existing, wihch goes through double, and is shown to work on 32 bits.

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 17, 2019

Author Contributor

About the memcmp, I'll change that once those functions get merged in src/blockchain_db/lmdb

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 18, 2019

Contributor

About the types, I now changed it to be exactly like the existing, wihch goes through double, and is shown to work on 32 bits.

I'm not sure what this means. I was originally commented on the:

uint64_t new_mapsize = mei.me_mapsize + bytes;
new_mapsize += (new_mapsize % mst.ms_psize);

and I wasn't immediately sure whether that could overflow before assignment.

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 18, 2019

Author Contributor

I mean I changed it to:

  uint64_t new_mapsize = (double)mei.me_mapsize + bytes;
  new_mapsize += (new_mapsize % mst.ms_psize);

which is what the existing code does in db_lmdb.cpp, and this one is shown to work on 32 bit archs since some people ran monerod there.

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 18, 2019

Contributor

Why double instead of uint64_t?

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 18, 2019

Author Contributor

i don't know. I don't think I wrote the original. I do agree uint64_t would seem more suited to it.

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 18, 2019

Author Contributor

Looking at git, it's from ND ages ago, so no asking what that was about. I think I'll just replace with uint64_t in the original separately, then change to uint64_t here in the patch.

throw std::runtime_error("Failed to enumerate " + std::string(table) + " records: " + std::string(mdb_strerror(ret)));

bytes += k.mv_size + v.mv_size;
if (resize_point(++nrecords, env1, &txn1, bytes))

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 16, 2019

Contributor

Why is this manually inspecting the LMDB file size instead of letting LMDB return an error when it needs to be resized? This adds lots of logic for detecting when this needs to occur and its likely to be brittle since it has to match internal LMDB logic.

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 17, 2019

Author Contributor

I think I might have to add such checks in more places, and I'd have to add code for retrying. This seems simpler, but then I've not done the other way to actually compare. I'm not sure what logic you say it must match.

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 17, 2019

Contributor

This already has code that checks + attempts to calculate the data usage. Except if it doesn't match what LMDB does internally fairly close, it should get stuck since the error case isn't handled directly (unless LMDB does an automatic resize on next open).

The LMDB utils patch was somewhat clunky - it would backoff and try the whole operation again (a write is done in a lambda that returns failure so the write function could resize + retry). Since there aren't multiple threads in here, some simple macro and/or function should do the trick. Like

#define TRY_PUT(env, txn, ...) \
  for (unsigned attempts = 0; attempts < 3; ++attempts) \
  { \
    const int error = __VA_ARGS__ ; \
    if (error == 0) \
      break; \
    if (error != MDB_MAP_FULL || 2 <= attempts) \
      throw ...; \
    \
    resize_map(env, &txn); \
  }

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 17, 2019

Author Contributor

OK. Looking at the source, it seems it's just three different places, as opposed to two, so not much more complicated indeed.

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 18, 2019

Author Contributor

Actually, it is. You end up having to spam parameters to that try_put proxy, which has to commit, reopen cursors, etc. I'll leave it that way.

tx_memory_pool m_mempool(*blockchain);
boost::filesystem::path paths[2];
bool already_pruned = false;
for (size_t n = 0; n < 2; ++n)

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 16, 2019

Contributor

n < core_storage.size().

static constexpr uint32_t PRUNING_SEED_STRIPE_SHIFT = 0;
static constexpr uint32_t PRUNING_SEED_STRIPE_MASK = 0xffff;

static inline uint32_t get_pruning_log_stripes(uint32_t pruning_seed) { return (pruning_seed >> PRUNING_SEED_LOG_STRIPES_SHIFT) & PRUNING_SEED_LOG_STRIPES_MASK; }

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 16, 2019

Contributor

I'm not sure which point -

constexpr isn't all that important here. There aren't any strong argument for or against in this situation. If a function is trivially constexpr, I add the keyword because its usable in more contexts and kind of indicates to other programmers that the function is "pure" (but that's not really a guarantee, sadly).

static in namespace scope tells the compiler to hide the function from other compilation units (internal linkage). This is nice if the function is in a cpp because it prevents a naming collision, but when used in a header it potentially results in multiple copies of the functions within the executable. Its best to remove static from the namespace variables in the header too.

if (v.mv_size == 0)
throw std::runtime_error("Invalid transaction pruned data");
uint64_t version;
std::string tmp((const char*)v.mv_data, v.mv_size);

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 17, 2019

Contributor

There's no need to allocate + copy the bytes here.

...
const char* const begin = static_cast<const char*>(v.mv_data);
if (tools::read_variant(begin, begin + v.mv_size, version) <= 0)
...

This is called very often, so this should have an impact on performance.

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 17, 2019

Author Contributor

It's not the first time you say that. Still not compiling:

/home/user/src/bitmonero/src/cryptonote_core/blockchain.cpp:2060:14: error: no matching function for call to 'read_varint'
int read = tools::read_varint(begin, begin + bd.size(), version);

static size_t get_transaction_version(const cryptonote::blobdata &bd)
{ 
  size_t version;
  const char* const begin = static_cast(bd.data());
  int read = tools::read_varint(begin, begin + bd.size(), version);
  if (read <= 0)
    throw std::runtime_error("Internal error getting transaction version");
  return version;
}

If you want that, paste a version that compiles, I'm not prodding it randomly till something passes.

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 17, 2019

Author Contributor

And removing the second const doens't help.

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 17, 2019

Contributor

They both have to be l-values due to the parameter list. And the second const would break the instantiation stage:

const char* begin = static_cast<const char*>(v.mv_data);
const char* end = begin + v.mv_size;
read_varint(begin, end, version);

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 17, 2019

Author Contributor

This version doesn't compile either.

  bool is_v1_tx(const blobdata_ref& tx_blob)
  {
    uint64_t version;
    const char* begin = static_cast(tx_blob.data());
    const char* end = begin + tx_blob.size();
    int read = tools::read_varint(begin, end, version);
    if (read <= 0)
      throw std::runtime_error("Internal error getting transaction version");
    return version <= 1;
  }
[ 31%] Building CXX object src/cryptonote_basic/CMakeFiles/obj_cryptonote_basic.dir/cryptonote_format_utils.cpp.o
In file included from /home/user/src/bitmonero/src/cryptonote_basic/cryptonote_format_utils.cpp:36:
In file included from /home/user/src/bitmonero/src/cryptonote_basic/cryptonote_format_utils.h:33:
In file included from /home/user/src/bitmonero/src/cryptonote_basic/cryptonote_basic_impl.h:33:
In file included from /home/user/src/bitmonero/src/cryptonote_basic/cryptonote_basic.h:41:
In file included from /home/user/src/bitmonero/src/serialization/binary_archive.h:41:
/home/user/src/bitmonero/src/common/varint.h:126:12: error: no matching function for call to 'read_varint'
    return read_varint::digits, InputIt, T>(std::move(first), std::move(last), i);
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/user/src/bitmonero/src/cryptonote_basic/cryptonote_format_utils.cpp:241:23: note: in instantiation of function template specialization 'tools::read_varint' requested here
    int read = tools::read_varint(begin, end, version);
                      ^
/home/user/src/bitmonero/src/common/varint.h:94:5: note: candidate function not viable: no known conversion from 'typename std::remove_reference::type' (aka 'const char *') to 'const char *&&' for 1st argument
    read_varint(InputIt &&first, InputIt &&last, T &write) {
    ^
/home/user/src/bitmonero/src/common/varint.h:125:9: note: candidate template ignored: invalid explicitly-specified argument for template parameter 'InputIt'
    int read_varint(InputIt &&first, InputIt &&last, T &i) {
        ^
1 error generated.

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 17, 2019

Author Contributor

blobdata_ref is a epee::span<const char>

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 18, 2019

Contributor

Its a bug in the overload (2nd version):

return read_varint<std::numeric_limits<T>::digits>(std::forward<InputIt>(first), std::forward<InputIt>(last), i);

That one did compile, at least with clang 6.

EDIT: I change the source of the read_varint to get the compile to work.

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 18, 2019

Author Contributor

Thanks, that worked eventually. CLANG 6 too though.

pruning_seed = tools::make_pruning_seed(pruning_seed, CRYPTONOTE_PRUNING_LOG_STRIPES);
static char pruning_seed_key[] = "pruning_seed";
k.mv_data = pruning_seed_key;
k.mv_size = strlen("pruning_seed") + 1;

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 17, 2019

Contributor

Seems like a sizeof pruning_seed_key

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 17, 2019

Author Contributor

Yes, but since it's done that way in the existing code, I'll keep my paint colour here for consistency.


namespace tools
{
static constexpr uint32_t PRUNING_SEED_LOG_STRIPES_SHIFT = 7;

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 17, 2019

Contributor

Aren't these dependent on the config setting for stripe size? The size can only be 1,2,3. Why even have that as a build-time-config option? Much of this logic is going to be broken if that value is changed, including the how the peer downloading selection is done. Or maybe the config just needs to state that 3 is the largest possible value (although there's more "bit space" for larger values).

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 17, 2019

Author Contributor

They are dependent. There's a build time limit because you do need to know where to place bits for all nodes to know how to parse the data. There's some scope to increase, which is supposed to handle what seemed sane at the time.

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 17, 2019

Contributor

That's my point - why have that in the config? Is it realistic that it could be changed at all?

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 17, 2019

Author Contributor

Well, I was intending to do so when the chain's larger and we have more nodes up, but then I suppose I'd need to have done the code to know whether it's realistic in practice. The idea was that every pruned node would drop a random half the eighth they kept, and update their seed accordingly.

throw0(DB_ERROR(lmdb_error("Failed to retrieve pruning seed: ", result).c_str()));
if (v.mv_size != sizeof(uint32_t))
throw0(DB_ERROR("Failed to retrieve or create pruning seed: unexpected value size"));
const uint32_t pruning_seed = *(const uint32_t*)v.mv_data;

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 17, 2019

Contributor

memcpy

if (v.mv_size == 0)
throw0(DB_ERROR("Invalid transaction pruned data"));
uint64_t version;
std::string tmp((const char*)v.mv_data, v.mv_size);

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 17, 2019

Contributor

I made this comment down below, a temporary string does not have to be created here.

if (ret)
throw0(DB_ERROR(lmdb_error("Failed to enumerate transactions: ", ret).c_str()));

const uint64_t block_height = *(const uint64_t*)v.mv_data;

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 17, 2019

Contributor

memcpy

}
}
else
{

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 17, 2019

Contributor

Why not create a separate function for these portions? Its complete separated in here ...

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 17, 2019

Author Contributor

Same answer as to why do it I suppose. It could be that way too. It doens't seem particularly better either way. One indent fewer, more code.

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 17, 2019

Contributor

I don't think its worth changing this here and now, but the codebase needs to shift away from the massive functions seen in wallet2 and elsewhere - the maintainability of a codebase suffers with mondo functions. Its easier to introduce unintended bugs with so many variables to keep track of in temporary (human!) memory. The "unwritten" rule/target I have is generally 7-10 variables in a loop / function. I probably miss that target too frequently (as do most programmers).

BTW, I stole this principle from an effective Java programmer who insisted he was tested to only have a working space memory for ~5 elements, less than the typical average of ~7. Yet, this programmer seemed to have a lower bug count than the average programmer.

EDIT: 7-10 variables in a function is obviously fairly restrictive, only point being its easier on the brain, especially when auditing code, when things are subdivded a bit.

}
}
}
else

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 17, 2019

Contributor

looks like a else if here

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 17, 2019

Author Contributor

else if what ? It seems ok: first part for "we want to prune this", second part for "we don't".

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 17, 2019

Contributor

The else block contains a single if and nothing else - it can be "hoisted" into an else if. Just an odd thing that makes audting the code at a glance a bit frustrating because I'm trying to match a lot of braces with only two spaces indenting.

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 17, 2019

Author Contributor

OK. It makes more sense this way I think, since the branch is small and it matches the logic in the other one (test pruned/unpruned first, then mode).

if (m_db->get_pruned_tx_blob(tx_hash, tx))
{
txs.push_back(std::make_tuple(tx_hash, std::move(tx), crypto::null_hash, cryptonote::blobdata()));
if ((std::get<1>(txs.back()).empty() || std::get<1>(txs.back())[0] > 1) && !m_db->get_prunable_tx_hash(tx_hash, std::get<2>(txs.back())))

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 17, 2019

Contributor

Is this a crude, and potentially incorrect varint-version check? Or rather, what is that second check?

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 17, 2019

Author Contributor

It is. I'll make it do actual varint parsing.

});
const bool use_next = (n_next > m_max_out_peers / 2 && n_subsequent <= 1) || (n_next > 2 && n_subsequent == 0);
const uint32_t ret_stripe = use_next ? subsequent_pruning_stripe: next_pruning_stripe;
MIDEBUG(const std::string po = get_peers_overview(), "get_next_needed_pruning_stripe: want height " << want_height << " (" <<

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 17, 2019

Contributor

Yikes. Another strategy would be to overload operator<< for this object and then do current peers " << *this below, which would delay the string creation too. Still a massive log statement though.

span_start_height = next_unpruned_height;
}
MDEBUG("span_start_height: " <<span_start_height);
const uint64_t block_hashes_start_height = last_block_height - block_hashes.size() + 1;

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 18, 2019

Contributor

This seems to be an existing assumption - how is underflow prevented/checked? It looks like the caller has to do it.

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 18, 2019

Author Contributor

I've added a check for underflow, and a remote height sanity check in the caller for overflow.

float block_queue::get_download_rate(const boost::uuids::uuid &connection_id) const
{
boost::unique_lock<boost::recursive_mutex> lock(mutex);
std::unordered_map<boost::uuids::uuid, float> speeds;

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 18, 2019

Contributor

Variable no longer appears necessary.

m_p2p->for_each_connection([&](cryptonote_connection_context& context, nodetool::peerid_type peer_id, uint32_t support_flags)->bool
{
if (context.m_state == cryptonote_connection_context::state_synchronizing || context.m_state == cryptonote_connection_context::state_before_handshake)
if (context.m_state == cryptonote_connection_context::state_synchronizing && context.m_last_request_time != boost::date_time::not_a_date_time)

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 18, 2019

Contributor

Why remove the other case (state before handshake)?

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 18, 2019

Author Contributor

There is now a network level timeout that kicks inactive peers, and the action (set the node on standby) does not make sense if the peer hasn't handshaked yet.

const uint32_t index = stripe - 1;
CRITICAL_REGION_LOCAL(m_used_stripe_peers_mutex);
MINFO("adding stripe " << stripe << " peer: " << context.m_remote_address.str());
std::remove_if(m_used_stripe_peers[index].begin(), m_used_stripe_peers[index].end(),

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 18, 2019

Contributor

How large do these get? std::lower_bound will help with sorted insertion for binary searching.

And why remove then immediately insert? Am I missing something?

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 18, 2019

Author Contributor

Probably not very large, since they're the ones that you started syncing from, and you'll start reusing them when you get back to their stripe . The removal is to avoid any dupe. It could be a set, you might say. But for small things like that, not sure it's a win. Besides, a LIFO might be a good thing to do (FIFO currently).

{
CRITICAL_REGION_LOCAL(m_used_stripe_peers_mutex);
MINFO("clearing used stripe peers");
for (size_t i = 0; i < 1ul << CRYPTONOTE_PRUNING_LOG_STRIPES; ++i)

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 18, 2019

Contributor

Seems easier to iterate over the elements as opposed to this so that it always works regardless of the size of m_used_stripe_peers. Unless only a subset need to be cleared?

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 18, 2019

Author Contributor

No, it clears all the element.

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 18, 2019

Contributor

I meant doing a foreach (for ( : )) loop instead. It links the real size with behavior instead of two separate places.

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 18, 2019

Author Contributor

OK

if (target && target <= height)
return false;
size_t n_out_peers = 0;
m_p2p->for_each_connection([&](cryptonote_connection_context& ctx, nodetool::peerid_type peer_id, uint32_t support_flags)->bool{

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 18, 2019

Contributor

Is this a re-write of get_outgoing_connections_count?

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 18, 2019

Author Contributor

Looks like it.

This comment has been minimized.

Copy link
@moneromooo-monero

moneromooo-monero Jan 18, 2019

Author Contributor

Ah, that was because it's not accessible from there.

{
const uint64_t now = tools::get_tick_count();
const uint64_t dt = now - m_last_add_end_time;
if (tools::ticks_to_ns(dt) >= DROP_ON_SYNC_WEDGE_THRESHOLD)

This comment has been minimized.

Copy link
@vtnerd

vtnerd Jan 18, 2019

Contributor

... std::chrono is usually a bit easier for this kind of stuff ...

@moneromooo-monero moneromooo-monero force-pushed the moneromooo-monero:pruning branch from bac84cb to 07b20db Jan 18, 2019

@moneromooo-monero

This comment has been minimized.

Copy link
Contributor Author

moneromooo-monero commented Jan 18, 2019

I added a patch with the changes from vtnerd's comments (except the retry thing). That'll get squashed with the main commit later.

@moneromooo-monero moneromooo-monero force-pushed the moneromooo-monero:pruning branch 2 times, most recently from dc1da67 to 53e59fc Jan 18, 2019

Pruning
The blockchain prunes seven eighths of prunable tx data.
This saves about two thirds of the blockchain size, while
keeping the node useful as a sync source for an eighth
of the blockchain.

No other data is currently pruned.

There are three ways to prune a blockchain:

- run monerod with --prune-blockchain
- run "prune_blockchain" in the monerod console
- run the monero-blockchain-prune utility

The first two will prune in place. Due to how LMDB works, this
will not reduce the blockchain size on disk. Instead, it will
mark parts of the file as free, so that future data will use
that free space, causing the file to not grow until free space
grows scarce.

The third way will create a second database, a pruned copy of
the original one. Since this is a new file, this one will be
smaller than the original one.

Once the database is pruned, it will stay pruned as it syncs.
That is, there is no need to use --prune-blockchain again, etc.

@moneromooo-monero moneromooo-monero force-pushed the moneromooo-monero:pruning branch from a2fc8b8 to b750fb2 Jan 22, 2019

@fluffypony
Copy link
Collaborator

fluffypony left a comment

Reviewed