
Superblock validation - Part II #1542

Merged

Conversation

jamescowens
Member

Currently this has the new formulation for the per-project uncached superblock validation (the ValidateSuperblock function).

@jamescowens jamescowens added this to the Fern milestone Oct 1, 2019
Member

@cyrossignol cyrossignol left a comment


I think we're going to deadlock cs_mapManifest and cs_mapParts. At least one place in main.cpp locks cs_mapManifest to call AlreadyHave(), which can then lock cs_mapParts. The scraper threads lock cs_mapManifest for various functions, the CSplitBlob destructor locks cs_mapParts, and the superblock validation will lock both.

Going to try to trace the locks/threads more closely...pretty hard to reason about at this point...we may need to think of a better synchronization strategy for these two.
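The hazard described above arises only if two paths acquire the two locks in opposite orders. One way to rule it out is to take both locks together wherever both are needed. A minimal sketch, using std::mutex stand-ins for the CCriticalSection objects and hypothetical map contents:

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-ins: the real code guards mapManifest and mapParts with
// CCriticalSection objects of these names; std::mutex is used here for brevity.
std::mutex cs_mapManifest;
std::mutex cs_mapParts;
std::map<std::string, int> mapManifest = {{"manifest1", 3}};
std::map<std::string, int> mapParts = {{"part1", 0}, {"part2", 0}, {"part3", 0}};

// Deadlock-free pattern: any path that needs both locks acquires them together.
// std::scoped_lock uses a deadlock-avoidance algorithm, so no lock-order cycle
// can form between this path and paths that take cs_mapManifest first.
size_t CountTrackedParts()
{
    std::scoped_lock lock(cs_mapManifest, cs_mapParts);
    return mapParts.size();
}
```

This is only an illustration of the locking pattern, not the project's actual synchronization code.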

```cpp
if (const auto SBProjectIter = NewFormatSuperblock.m_projects.Try(iWhitelistProject.m_name))
{
    // Pull the convergence hint (reduced content hash) for the project object.
    uint32_t nReducedSBProjectObjectContentHash = SBProjectIter->m_convergence_hint;

    for (const auto& iter : mProjectObjectsBinnedbyContent)
        if (iter.first <= NewFormatSuperblock.m_timestamp && nReducedProjectObjectContentHash == nReducedSBProjectObjectContentHash)
```
Member

If a node's time is behind, this could result in a false validation failure. A scraper could publish a manifest with a time ahead of the perceived time that a node received the superblock. As in...

  1. Scraper detects new etag, pulls the stats, and publishes a new manifest
  2. 10 seconds later, a node 20 seconds behind stakes a superblock with that manifest
  3. Another node immediately receives the manifest and the superblock...
    manifest time: t - 10,
    superblock time: t - 20,
    where t is roughly the node's current time.
  4. Validation fails because the manifest was not considered when reconstructing the convergence

The superblock timestamp will be set to the timestamp of its containing block when received by a node. I don't think we need to check the timestamp here because the content hash should adequately filter out ineligible candidates.
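The suggested fix can be sketched as selecting candidate project objects by reduced content hash alone, with no timestamp comparison, so a manifest timestamped ahead of the superblock still satisfies validation. This is a hypothetical model (the bin map is represented here as timestamp → reduced hash), not the actual implementation:

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical sketch: scan the manifest project objects binned by content
// and accept any whose reduced (32-bit) content hash equals the superblock's
// convergence hint. The content hash alone filters out ineligible candidates,
// so no timestamp check is needed and clock skew cannot cause a false failure.
bool HasMatchingProjectObject(const std::multimap<int64_t, uint32_t>& binned,
                              uint32_t nConvergenceHint)
{
    for (const auto& entry : binned)
        if (entry.second == nConvergenceHint)
            return true;
    return false;
}
```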

Member Author

@jamescowens jamescowens Oct 4, 2019

Hmm... I put this in there to try to conform the validation to the existing formation of a convergence on the source node, while also dealing with the exact opposite of the case you describe above: the convergence hinted at by the incoming superblock may not be the latest by-project convergence that can be formed on the receiving node.

Member

@cyrossignol cyrossignol Oct 4, 2019

We may have to check every combination of the matching project parts (breadth-and-most-recent-first) to also protect against temporary network partitions or delays during the superblock window. Not a pretty algorithm...

There will only be one combination almost all of the time, though.

Member Author

Yep. And it is 99.9% likely the head of the list. But it is the 0.1% I am worried about. I do not want to get into a 2^n algorithm (n being the number of projects) here for a 0.1% corner case. I have an idea. I will have to draw it out in a picture.

Member Author

[Three diagrams sketching the proposed matching approach were attached here.]
Member

"rare"

To say the least 🙂

And this only matters in a fallback-to-project-level convergence.

Member Author

When the node has only been running for a short enough time that the superblock is not cached...

But we will do the final algo anyway, because I like to be thorough. I don't think we should delay the interim release over this. Let's merge this tracking PR as is and continue our work on this on integrated_scraper_2.

src/scraper/scraper.cpp (review thread resolved)
src/main.cpp (review thread outdated, resolved)
@jamescowens
Member Author

On cs_mapManifest and cs_mapParts: unless the critical sections are locked and unlocked in different orders, we will not deadlock. We could probably solve the problem by moving to a single critical section to protect both entities, but I am loath to do that because I am concerned about the loss of parallelism...

@jamescowens
Member Author

Not that it absolutely proves anything, but if we were going to see a deadlock, I think we would have seen it already on the mainnet scraper node I am running with these changes, which has had 250+ connections exercising the net code, plus the scraper work, for two consecutive days.

@cyrossignol
Member

I agree—let’s keep the parallelism if we can. I did not find any lock scenarios that will result in a deadlock either. Running this also now 👍

Member

@cyrossignol cyrossignol left a comment

No locking issues observed. @jamescowens and I talked about the by-project manifest-to-superblock timestamp comparison and agreed to refactor it in a follow-up PR. The code is not yet active.

@jamescowens
Member Author

Yep. We have some work to do on it. We know what we have to do...

@jamescowens jamescowens merged commit 9a798b2 into gridcoin-community:development Oct 5, 2019
@jamescowens jamescowens changed the title integrated_scraper_2 branch tracking PR Superblock validation - Part II Oct 22, 2019
jamescowens added a commit that referenced this pull request Oct 22, 2019
Added
 - Add testnet desktop launcher action for Linux #1516 (@caraka)
 - Shuffle vSideStakeAlloc if necessary to support sidestaking to more than 6 destinations #1532 (@jamescowens)
 - New Superblock format preparations for Fern #1526, #1542 (@jamescowens, @cyrossignol)
 - Multisigtools
   - Consolidate multisig unspent #1529 (@iFoggz)
   - Scanforunspent #1547 (@iFoggz)
   - consolidatemsunspent and scanforunspent bug fix #1561 (@iFoggz)
 - New banning misbehavior handling and Peers Tab on Debug Console #1537 (@jamescowens)
 - Reimplement getunconfirmedbalance rpc #1548 (@jamescowens)
 - Add CLI switch to display binary version #1553 (@cyrossignol)

Changed
 - Select smallest coins for contracts #1519 (@iFoggz)
 - Move some functionality from miner to SelectCoinsForStaking + Respect the coin reserve setting + Randomize UTXO order #1525 (@iFoggz)
 - For voting - if url does not contain http then add it #1531 (@iFoggz)
 - Backport newer serialization facilities from Bitcoin #1535 (@cyrossignol)
 - Refactor ThreadSocketHandler2() Inactivity checks #1538 (@iFoggz)
 - Update outdated checkpoints #1539 (@barton2526)
 - Change needed to build Gridcoin for OSX using homebrew #1540 (@Git-Jiro)
 - Optimize scraper traffic for expiring manifests #1542 (@jamescowens)
 - Move legacy neural vote warnings to debug log level #1560 (@cyrossignol)
 - Change banlist save interval to 5 minutes #1564 (@jamescowens)
 - Change default rpcconsole.ui window size to better support new Peers tab #1566 (@jamescowens)

Removed
 - Remove deprecated RSA weight and legacy kernel #1507 (@cyrossignol)

Fixed
 - Clean up compiler warnings #1521 (@cyrossignol)
 - Handle missing external CPID in client_state.xml #1530 (@cyrossignol)
 - Support boost 1.70+ #1533 (@iFoggz)
 - Fix diagnostics failed to make connection to NTP server #1545 (@Git-Jiro)
 - Install manpages in correct system location #1546 (@Git-Jiro)
 - Fix ability to show help and version without a config file #1553 (@cyrossignol)
 - Refactor QT UI variable names to be more consistent, Fix Difficulty default #1563 (@barton2526)
 - Fix two regressions in previous UI refactor #1565 (@barton2526)
 - Fix "Owed" amount in output of "magnitude" RPC method #1569 (@cyrossignol)