Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIMD-0083: Relax Entry Constraints #83

Merged
merged 5 commits into from
Apr 14, 2024

Conversation

apfitzge
Copy link
Contributor

@apfitzge apfitzge commented Nov 2, 2023

The second in a series of SIMDs aimed at the relaxation of constraints blocking us from (potentially) having asynchronous execution.
This proposal does not lock Solana into moving towards asynchronous execution.
This proposal stands on its own to simplify the protocol and give more flexibility in implementation of block-production.

@apfitzge apfitzge changed the title SIMD-XXXX: Relax Entry Constraints SIMD-0083: Relax Entry Constraints Nov 2, 2023
@apfitzge apfitzge marked this pull request as ready for review November 2, 2023 06:10
Comment on lines 50 to 56
## Security Considerations

None

## Drawbacks

None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change breaks consensus in a pretty fundamental way, which is both a security issue and has some drawbacks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

second that

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lheeger-jump can you please elaborate how will it break the consensus.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Huzaifa696 - Per the below snippet, a self-conflicting entry is currently considered invalid, and will raise an error:
https://github.com/solana-labs/solana/blob/230779d4590ac625e6710117987ef505c0cf71c6/ledger/src/blockstore_processor.rs#L606-L619

That error results in the slot getting marked dead by ReplayStage:
https://github.com/solana-labs/solana/blob/230779d4590ac625e6710117987ef505c0cf71c6/core/src/replay_stage.rs#L2782-L2786

If the "rules" are changed to allow a self-conflicting entry, then a slot with a self-conflicting entry will no longer error / result in dead slot.

Suppose that the change to allow self-conflicting entries first lands in v1.18. Nodes running v1.17 will mark slots with a self-conflicting entry dead whereas nodes running v1.18 will allow it. This is obviously problematic during an upgrade period where the stake is evenly distributed between v1.17 and v1.18. As such, the change to allow self-conflicting entries would have to sit behind a feature gate, and only get enabled once the entire cluster has adopted v1.18.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I agree with the problem and the solution you explained.
In conclusion, the required changes for this SIMD to allow self-conflicting entries have to sit behind a feature gate.

So, as far as I understand, the flow is: approved SIMD -> feature gate issue -> PR (that implements the SIMD with a feature gate). Is that right? @steviez

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, as far as I understand, the flow is: approved SIMD -> feature gate issue -> PR (that implements the SIMD with a feature gate). Is that right?

I believe so, yes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, as far as I understand, the flow is: approved SIMD -> feature gate issue -> PR (that implements the SIMD with a feature gate). Is that right? @steviez

@jacobcreech would know best.

@lheeger-jump can you please elaborate how will it break the consensus.

As @steviez mentioned, this changes the validation criteria for blocks, which affects whether they are rejected. It needs a feature gate which implies coordination with other clients.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved SIMD -> feature gate issue -> PR

That would be ideal to avoid any work that may be thrown away due to never getting consensus on a SIMD.

Comment on lines +38 to +44
## Detailed Design

Currently, if a transaction entry within a block contains transactions that
conflict, the entire block is marked invalid.
The proposal is that this constraint is removed entirely, and entries are
allowed to contain transactions that conflict with each other.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are clients supposed to handle write contention? Is it sequential in the entry? There is a lot to consider here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The execution in an entry is always sequential whether its execution or validation. So, we have to change the lock function in a way that allows conflicting txs in the same entry as they will be executed in a sequential manner. You can refer to this comment

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wouldnt hurt to add that the execution now becomes sequential in the transactions intra microblock.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This proposal is not making execution sequential within an entry. It has already been sequential, it is only that individual entries run in parallel (provided that they are not conflicting with each other).

@apfitzge can you please confirm this behavior.

Copy link
Contributor Author

@apfitzge apfitzge Nov 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not currently sequential, and actually has no implied or guaranteed execution order during replay.
I will explicitly state that an entry has a sequential ordering, so for conflicting txs they must now be executed sequentially.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added clarifying comments in f76e7d3.
The proposed behavior is to execute conflciting transactions sequentially, in the order they appear in the entry.

Comment on lines +45 to +49
## Impact

- The protocol is simplified
- Block-production is more flexible

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has a large affect on consensus

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I don't disagree with your comments that a consensus altering change could fit many of the sections on this document (such as Impact / Security / Drawbacks), it could get repetitive to mention it multiple times. I'm wondering if mentioning it once is sufficient; maybe here given the prompt:

## Backwards Compatibility *(Optional)*
Does the feature introduce any breaking changes? All incompatibilities and
consequences should be listed.

Copy link
Contributor Author

@apfitzge apfitzge Nov 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose I was under the assumption that all SIMDs are consensus breaking. Is that not the case?

I was under the asssumption that a consensus change and feature-gating was implicit all proposals here; I can be explicit about that.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose I was under the assumption that all SIMDs are consensus breaking. Is that not the case?

I was under the asssumption that a consensus change and feature-gating was implicit all proposals here; I can be explicit about that

I haven't been involved with enough SIMD's to know 😬

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All feature gates should be consensus changing, SIMDs do not need to be. A SIMD can propose application standards or networking changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f76e7d3: Added comments about consensus breakage under security considerations.

Comment on lines +60 to +62
This proposal is backwards compatible with the current protocol, since it only
relaxes constraints, and does not add any new constraints.
All previously valid blocks would still be valid.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not to beat a dead horse, but Liam is correct about this being a consensus altering change.

All previously valid blocks would still be valid.

This is true. However, a block that was previously invalid might now be valid. Or stated another way, a block might be valid under this new set of rules but invalid under the old set. As with other consensus altering changes, this could create a problem during a network upgrade where say 60% is running with old rules / 40% with new rules. As such, this change would need to be rolled out with a feature gate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See other comment here: #83 (comment)

However, a block that was previously invalid might now be valid.

Your statement is true, but this seems like a *forward compatibility issue to me.
After the activation of the feature, we don't have previously invalid blocks because they cannot, by definition, be in the history. Therefore all previous (backward) blocks are still valid with the changes

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*forward compatibility issue to me.

Yep, agreed. I guess this gets back to whether a SIMD is implicitly thought of as a consensus breaking change or not. In any case, you're seemingly well aware of the implications of the change so marking this item as resolved, and we can figure out how explicit we need to be in stating consensus implications

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a small comment that newly produced blocks may not be valid under the old protocol: f76e7d3

Comment on lines +45 to +49
## Impact

- The protocol is simplified
- Block-production is more flexible

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I don't disagree with your comments that a consensus altering change could fit many of the sections on this document (such as Impact / Security / Drawbacks), it could get repetitive to mention it multiple times. I'm wondering if mentioning it once is sufficient; maybe here given the prompt:

## Backwards Compatibility *(Optional)*
Does the feature introduce any breaking changes? All incompatibilities and
consequences should be listed.

The current constraint that transaction entries must not conflict with each
other is unnecessary.
It is a constraint, that made the current labs-client implementation easier,
but is not required for the protocol to function correctly.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should describe more what is intended behavior incase of transactions writing the same accounts.

Copy link

@Huzaifa696 Huzaifa696 Nov 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@godmodegalactus currently, what happens is that before actual execution, all accounts of an entry are loaded and then the same state is used for execution. This behavior does not allow "writing the same accounts" as is.
But if we carry the temporary changed state as the execution proceeds sequentially in an entry and then commit the final state then there is no issue in writing the same accounts in an entry.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In an entry all the transactions should be able to run in parallel. If in an entry two transactions are loading same accounts, these transactions cannot be executed in parallel right. Even they will have to maintain ordering of the executing because if one is executed before another in replay then consensus will break. So i guess just explicitly mentioning these cases is important.

Copy link

@Huzaifa696 Huzaifa696 Nov 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the latest implementation, txs "within an entry" are not executed in parallel in block production or in replay, rather multiple entries run in parallel if they don't conflict with each other.

so, we are safe from the race condition scenario you mentioned

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@godmodegalactus added comments in f76e7d3. Conflicting transactions within an entry must be executed sequentially, in the order they appear in the entry.

@Huzaifa696 current master still does rebatching of transactions though. Prior versions, which are the majority of the network, even do randomization on the order of txs within the non-conflicting entries.

conflict, the entire block is marked invalid.
The proposal is that this constraint is removed entirely, and entries are
allowed to contain transactions that conflict with each other.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We implemented and tested the required changes to enable self conflicting entries in our lab setup and this improves the performance a lot depending on how you schedule the transactions on every execution thread.
The changes include:

  • changing account locking function
  • carrying temporary account state within the execution with every transaction in an entry

As the functions changed are used by both the executor and validator, so it doesn't break any consensus and rules of the chain, and is backward compatible.
These changes can be implemented through a feature gate.

@apfitzge we can add these points to the detailed design if you think it's appropriate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the functions changed are used by both the executor and validator, so it doesn't break any consensus and rules of the chain, and is backward compatible.
These changes can be implemented through a feature gate.

This changes the structure of blocks and that directly affects consensus. In Firedancer today you would break validation criteria for a block and there would be a hard fork from Solana Labs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lheeger-jump is correct, this is a consensus breaking change. If you produce a block that contains entries with conflicting transactions, existing validators will mark it as invalid. For this reason, it needs to sit behind a feature, and only be activated when both validator clients have implemented the feature-gate and those versions have been rolled out to cluster(s).

@apfitzge we can add these points to the detailed design if you think it's appropriate.

This doesn't seem to be appropriate for a SIMD. SIMDs should be independent of any specific client's implementation. The issue in the labs repo you created would be a more appropriate place to share implementation-specific code and any results.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've shared some implementation specifics on the issue—please take a look.

@apfitzge
Copy link
Contributor Author

apfitzge commented Jan 9, 2024

@lheeger-jump finally found time to update this. Added a list of the remaining constraints on individual entries.
Wasn't too sure how to list the bincode deserialization constraint, seems Richard took a shot at documenting that in the specs document: solana-foundation/specs#17

I think I've now hit all your requested changes, lmk if this is not the case.

@0xSol
Copy link

0xSol commented Apr 8, 2024

@lheeger-jump please review requested changes.

Copy link
Contributor

@lheeger-jump lheeger-jump left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. Simple, concise and precise such that I can implement. Good work.

@0xSol
Copy link

0xSol commented Apr 9, 2024

@tao-stones could you or someone from Anza please review?

Copy link
Contributor

@tao-stones tao-stones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am happy with this change

@0xSol
Copy link

0xSol commented Apr 9, 2024

@t-nelson Could you please review then comment or approve from Anza side?

@lheeger-jump
Copy link
Contributor

lheeger-jump commented Apr 14, 2024

Since we have an approval from Anza in the form of @tao-stones (please become an official reviewer via process in SIMD-0007), and no substantive comments since January, I am going to merge this PR so it does not go stale.

Thanks @apfitzge for your patience.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants