Review Unique Index Usage #1952
@youkaicountry Most of the work has been done. First measurements show a significant reduction in memory consumption:
Not bad. Although I don't think 300 MB out of 30 GB is a "significant reduction".
Significant performance gains will appear once we (finally) decide to move from shared-memory-file-based storage to RocksDB-based storage, which has not yet been prioritized highly enough, even after the quite successful AH replacement in #2066.
This ticket should be closed. This optimization should never have been implemented. It is the policy of this project that all consensus indexes must be ordered_unique.
Yes, we do, and that's the problem: from a practical standpoint, that's an impossible level of "careful."
You can't assume that bugs will always be caught by reindexing. If the business logic iterates on non-unique indexes, it is possible to get nodes with undefined state synced to the live blockchain. I personally observed this, in production, in an early version of BitShares 2.0. Consequences?
Fortunately, I've never observed any of these cases except the "lucky" case on any live blockchain. I don't want Steem to be the first. (Shortly after the "lucky" case was observed on the BitShares 2.0 blockchain, an emergency witness update was implemented to change all non-unique indexes to unique indexes, and the policy forbidding non-unique indexes was added.)
One problem here is that "business logic" is already very complicated; it would require a lot of review to get a comprehensive list of the indexes that are iterated on. Worse, business logic is also a moving target. Every time we add any iteration code anywhere in the business logic, we would have to carefully check that we aren't adding reliance on any non-unique index. I don't trust our ability to always remember to do that check. I wouldn't even trust myself to always remember to do that check.
It isn't, and that's good. We need to resist the temptation to merge this, because it risks seriously breaking the blockchain. If the performance reward is small, then it is easier to resist the temptation.
Any index that must be unique (id, account_name, etc.) should remain unique. However, indices that do not need to be unique can be ordered_nonunique, so long as they also contain a unique key as part of the composite key; the index will then happen to be unique without needing to enforce those checks. For example, the comment object needs (author, permlink) and (id) to be unique indices. But the (payout, id) index does not need to be unique, and the non-unique index will produce an identical ordering to the unique index with less overhead. I have not taken a look at the PR, but if we ever have a non-unique index that does not contain a unique key in its composite key, then it is blatantly wrong.
Reopening because I want to have this discussion.
As this isn't in the current sprint, and there are major reservations, please stop work on it for the time being. We will discuss it when planning the next sprint.
Since we still apparently have major process questions to answer, this seems like a good place to get one answered: AFAICT this work was actually done back on Jan 5th and only now committed (probably so it didn't get lost or fall out of sync). So no work of significance was done recently (again, as far as I can tell; I haven't had a chance to talk to mtrela, but that's how I interpret the GitHub info). Should such commits be avoided? Personally, I think it's better to get them into the central repo...
I agree that pushing completed work should not be delayed due to a sprint. It looks weird to me that two commits pushed 4 days and 12 days ago have the same (~1 month ago) commit age. Why would work that was completed a month ago be pushed like that? Frankly, it does not matter. What has happened, happened. Any additional work on this issue is halted until the next sprint. As per process, we can discuss the merits of the issue before deciding to spend any more time on it.
Because the commit was made on another branch or repository (perhaps removed now) and was later cherry-picked to a branch in the main repository.
@mvandeberg @youkaicountry @theoreticalbts Most of the work was done in January. This task was resumed because other work got finished or blocked last week (which is why I put it into the proposed tasks). The other planned tasks were done or are continuing without any negative impact on the project.
Just to clarify: @mariusztrela isn't currently working on this issue. But the team would like to resume it in the next sprint. |
Why don't I want this merged?
That is not true. As I outlined above, we experienced a very serious problem on a production blockchain due to not using ordered_unique indexes.

As I explained before, "business logic" is a moving target. The Steem blockchain's design is not frozen; we're still adding new features to Steem. When new business logic is implemented, all the existing non-unique indexes would have to be re-checked for new reliance on their iteration order.

This kind of analysis is very hard to do. It's not scalable; it would require careful review every time the business logic changes.

What do I recommend?
Fine, let's discuss. My position is simple: the explicit policy of this project ought to be (and, as far as I am aware, is) that no consensus index will be anything other than ordered_unique. As a consequence of this policy, the above-mentioned patch will never be merged, and no further developer time should be spent on this kind of code.
But can't we change our minds just this once? As far as I'm concerned, the answer is no. Let me further describe my decision-making process:
There are very large risks from getting rid of the policy that the consensus indexes are ordered_unique.
Agreed. I think it is helpful for this conversation to define what this "kind of bug" is, because it is not related only to unique indices. It is non-determinism. (@theoreticalbts, I know you know this; documenting for the sake of the discussion.) Non-unique indices that collide are ordered as they arrive. Undo state and fork logic can change the order in which objects are added to a multi-index container, resulting in a non-deterministic ordering and potentially non-deterministic evaluation of operations, which can fork the network. Implementing everything as a unique index guarantees a deterministic ordering. As I outlined above, there is a programmatic way we can guarantee that a non-unique index does not suffer from non-determinism.
If we used code generation for our boost multi-index code, then we could specify unique and non-unique keys and guarantee that every non-unique key meets this requirement when the C++ is generated. We could get the performance gains from removing unique indices without the possibility of a programmer mistake leading to non-determinism. However, if that is the direction we want to go to guarantee determinism and increase performance, I would recommend we bundle it with the discussed IDL changes (#1964), which is a much larger issue that I would not recommend for the next sprint.
Mentioned in #1923
We need to be careful with this optimization to preserve iteration ordering in some blockchain functions. Changing the ordering should result in a failed reindex if the unit tests do not catch it (some of the changes are subtle enough not to show up in unit tests).