Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

QCop autonomously sends deregisters #105

Merged
merged 7 commits into from Aug 2, 2018
Merged

Conversation

Doy-lee
Copy link
Collaborator

@Doy-lee Doy-lee commented Jul 24, 2018

Quorum cop takes the core and service node list to query the quorum and begins voting off if it's participating in the quorum for nodes up to the (latest height - some safety buffer) if no ping has been seen since some threshold.

Nodes ping every 1 hour and also send out a ping on demand if they detect that they are included in the quorum for the latest block height.

There's an implicit dependency in Quorum Cop that it relies on Service Node Lists to have finished calculating the quorum state before Quorum Cop starts voting otherwise it receives an empty quorum.
I've chosen to enforce this by making it that the hooks for service node lists have to be registered explicitly and requires you to pass in the quorum cop.

return false;

if (!m_service_node_list.is_service_node(pubkey))
return false;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only dependency on service_node_lists. Once we get db storage of service node lists we can remove this and check if the pubkey belongs to some range of heights.

I can imagine we'll want to store uptime proof over some range of heights so we have the tools to provide statistics on the network.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted

", which is greater than the recommended REORG_SAFETY_BUFFER_IN_BLOCKS: " << REORG_SAFETY_BUFFER_IN_BLOCKS);
}

m_last_height = (height < REORG_SAFETY_BUFFER_IN_BLOCKS) ? height : height - REORG_SAFETY_BUFFER_IN_BLOCKS;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m_last_height should not be updated in the event of a blockchain detach

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the rationale? You need to start voting from the height onwards again. This probably should be m_last_height = height; because you may not have the vote data anymore depending on the height you popped back to.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m_last_height may be from well before the blockchain detach height.

Those blocks still need to be checked.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if m_last_height >= blockchain detach height then we have problems with our assumption that large reorgs will not happen

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

void quorum_cop::blockchain_detached(uint64_t height)
{
uint64_t delta_height = std::abs(static_cast<int64_t>(m_core.get_current_blockchain_height() - height));
if (delta_height > REORG_SAFETY_BUFFER_IN_BLOCKS)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m_core.get_current_blockchain_height() should always be >= height, I think...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to use m_last_height. Height will be the height that the chain split at and blockchain has already been popped back to the split height.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be

if (m_last_height >= height) {

Warn user; adjust m_last_height; }

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

m_last_height = (height < REORG_SAFETY_BUFFER_IN_BLOCKS) ? height : height - REORG_SAFETY_BUFFER_IN_BLOCKS;
}

static bool participates_in_quorum(const std::vector<crypto::public_key>& quorum, crypto::public_key const &my_key, size_t *index_in_quorum = nullptr)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The written function returns true if the pubkey does not exist in the quorum. I don't think this is intended behaviour.

Implement or replace entirely with a call to std::find(). You can get the index by subtracting vec.begin() from the returned iterator

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the bool return.

if (height < execute_justice_from_height)
return;

if (height == latest_height)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this branch. The uptime proofs are submitted every hour by the daemon when it is idle, not just when it is in the list of nodes to test.

bool vote_off_node = false;

CRITICAL_REGION_LOCAL(m_lock);
auto const it = m_uptime_proof_seen.find(node_key);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vote_off_node = (this daemon has been running for at least two hours && m_uptime_proof_seen.find(node_key) == m_uptime_proof_seen.end());

Somewhere else in the code we must remove old entries in the m_uptime_proof_seen table.

}
}

prune_uptime_proof_below(execute_justice_from_height);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would triggered this during idle in m_core, say once every 30 seconds. Works fine here too, just doesn't feel right because the uptime proof thing is not happening related to the blockchain. This also means the block height stuff in this function is unnecessary - you just have to remove them as long as their timestamp < now - 2 hours and 5 minutes

return false;

if (!m_service_node_list.is_service_node(pubkey))
return false;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted

return;
}

const uint64_t prune_from_timestamp = blocks.front().timestamp;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prune_from_timestamp = now() - 2 hours 5 minutes is fine

CRITICAL_REGION_LOCAL(m_lock);

for (auto it = m_uptime_proof_seen.begin(); it != m_uptime_proof_seen.end();)
it = (it->second < prune_from_timestamp) ? m_uptime_proof_seen.erase(it) : it;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an infinite loop.

change to:

if (it->second < prune_from_timestamp)
    it = m_uptime_proof_seen.erase(it);
else
    it++;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively just add ++it to the second branch in your ternary operator. Not sure if this is defined behaviour.

m_last_height = (height < REORG_SAFETY_BUFFER_IN_BLOCKS) ? height : height - REORG_SAFETY_BUFFER_IN_BLOCKS;
}

static bool participates_in_quorum(const std::vector<crypto::public_key>& quorum, crypto::public_key const &my_key, size_t *index_in_quorum = nullptr)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the bool return.

void quorum_cop::blockchain_detached(uint64_t height)
{
uint64_t delta_height = std::abs(static_cast<int64_t>(m_core.get_current_blockchain_height() - height));
if (delta_height > REORG_SAFETY_BUFFER_IN_BLOCKS)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to use m_last_height. Height will be the height that the chain split at and blockchain has already been popped back to the split height.

", which is greater than the recommended REORG_SAFETY_BUFFER_IN_BLOCKS: " << REORG_SAFETY_BUFFER_IN_BLOCKS);
}

m_last_height = (height < REORG_SAFETY_BUFFER_IN_BLOCKS) ? height : height - REORG_SAFETY_BUFFER_IN_BLOCKS;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the rationale? You need to start voting from the height onwards again. This probably should be m_last_height = height; because you may not have the vote data anymore depending on the height you popped back to.

const crypto::public_key &node_key = state->nodes_to_test[node_index];

CRITICAL_REGION_LOCAL(m_lock);
bool vote_off_node = (m_uptime_proof_seen.find(node_key) != m_uptime_proof_seen.end());
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to just find the key, since the entries in the list should be under UPTIME_PROOF_MAX_IN_SECONDS +- 30seconds (depending on the pruner being called every 30s on idle).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a buffer. Right now there is a chance that we will drop the ping from 1 hour ago, and then this evaluation happens before the next ping arrives.

Copy link
Collaborator Author

@Doy-lee Doy-lee Aug 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If worked this out in my head correctly, this change would just require L155 to change from

if (m_uptime_proof_seen[pubkey] > now - UPTIME_PROOF_MAX_TIME_IN_SECONDS)
  return false; // already received one uptime proof for this node recently.

to

if (m_uptime_proof_seen[pubkey] >= now - (UPTIME_PROOF_FREQUENCY_IN_SECONDS / 2))
  return false; // already received one uptime proof for this node recently.

So we accept proofs every 30mins, everything else is the same and prune proofs older than 1hr 10mins.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah sounds good

if (it->second < prune_from_timestamp)
m_uptime_proof_seen.erase(it);
else
it++;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed iterator inf loop.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be:

it = map.erase(it);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

}

m_last_height = (height < REORG_SAFETY_BUFFER_IN_BLOCKS) ? height : height - REORG_SAFETY_BUFFER_IN_BLOCKS;
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment became outdated. So just rewriting it here.

What's the rationale? You need to start voting from the height onwards again. This probably should be m_last_height = height; because you may not have the vote data anymore depending on the height you popped back to.

" blocks which is greater than the recommended REORG_SAFETY_BUFFER_IN_BLOCKS: " << REORG_SAFETY_BUFFER_IN_BLOCKS);
}

m_last_height = height;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment became outdated, so just rewriting it here.
What's the rationale? You need to start voting from the height you popped back to and onwards again. We could recover from this by making a mandatory ping here?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m_last_height may be from well before the blockchain detach height.

Those blocks still need to be checked.

if m_last_height >= blockchain detach height then we have problems with our assumption that large reorgs will not happen

@@ -1004,6 +1033,8 @@ namespace cryptonote
epee::math_helper::once_a_time_seconds<60*2, false> m_deregisters_auto_relayer; //!< interval for checking re-relaying deregister votes
epee::math_helper::once_a_time_seconds<60*60*12, true> m_check_updates_interval; //!< interval for checking for new versions
epee::math_helper::once_a_time_seconds<60*10, true> m_check_disk_space_interval; //!< interval for checking for disk space
epee::math_helper::once_a_time_seconds<UPTIME_PROOF_FREQUENCY_IN_SECONDS, true> m_submit_uptime_proof_interval; //!< interval for submitting uptime proof
epee::math_helper::once_a_time_seconds<30, true> m_uptime_proof_pruner;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prune triggered every 30s if you're a service node.


void quorum_cop::blockchain_detached(uint64_t height)
{
uint64_t delta_height = std::abs(static_cast<int64_t>(m_last_height - height));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this abs() ?

This check should simply be if (m_last_height > height)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

return;

time_t const now = time(nullptr);
bool been_alive_for_2hrs = (now - m_core.get_start_time()) >= (60 * 60 * 2);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#define please, or const time_t, whatever

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


void quorum_cop::blockchain_detached(uint64_t height)
{
if (m_last_height >= height)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is always an error, when a very large reorg occurs that it is before m_last_height.

This is sufficient:

if (m_last_height >= height)
{
    LOG_ERROR("The blockchain was detached to height: " << etc << " This should never happen, please report");
    m_last_height = height;
}

continue;
}

auto it = std::find_if(state->quorum_nodes.begin(), state->quorum_nodes.end(), [&my_pubkey](crypto::public_key const &key) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this not work? std::find(nodes->quorum_nodes.begin(), state->quorum_nodes.end(), my_pubkey);

@jackmurray90 jackmurray90 merged commit 049ce2d into oxen-io:dev Aug 2, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants