Review Unique Index Usage #1952

Closed
mvandeberg opened this issue Dec 27, 2017 · 14 comments

Comments

@mvandeberg
Contributor

Review unique index usage - only those unique indices whose uniqueness is defined by business logic should remain. Correct item ordering should still be preserved by additionally storing the object's 'id' together with the actual key. Removing unneeded uniqueness allows simplifying key definitions, which currently must match the enforced uniqueness.

Mentioned in #1923

We need to be careful with this optimization to preserve iteration ordering on some blockchain functions. Changing the ordering should result in failed reindexing if the unit tests do not catch it (Some of the changes are subtle enough to not show up in unit tests).

@mariusztrela mariusztrela self-assigned this Jan 5, 2018
mariusztrela pushed a commit that referenced this issue Jan 5, 2018
@vogel76 vogel76 removed their assignment Jan 16, 2018
mariusztrela pushed a commit that referenced this issue Feb 19, 2018
mariusztrela pushed a commit that referenced this issue Feb 27, 2018
@vogel76
Contributor

vogel76 commented Mar 2, 2018

@youkaicountry Most of the work has been done. First measurements show a significant reduction in memory consumption:

  • on a full node without AH, ~9GB less memory is used
  • on a consensus node, 300MB less

We also have ideas on how to improve other cases.

@abitmore
Contributor

abitmore commented Mar 2, 2018

Not bad. Although I don't think 300MB of 30GB is "significant reduction".

@vogel76
Contributor

vogel76 commented Mar 2, 2018

Significant performance changes will appear once the decision is (finally) made to move from shared-memory-file-based storage to RocksDB-based storage, which has not yet been prioritized highly enough, even though the AH replacement in #2066 was done quite successfully.

@theoreticalbts
Contributor

theoreticalbts commented Mar 2, 2018

This ticket should be closed. This optimization should have never been implemented. It is the policy of this project that all consensus indexes must be ordered_unique.

We need to be careful with this optimization to preserve iteration ordering on some blockchain functions.

Yes, we do, and that's the problem: From a practical standpoint, that's an impossible level of "careful."

Changing the ordering should result in failed reindexing if the unit tests do not catch it (Some of the changes are subtle enough to not show up in unit tests).

You can't assume that bugs will always be caught by reindexing. If the business logic iterates on non-unique indexes, it is possible to get nodes with undefined state synced to the live blockchain. I personally observed this, in production, in an early version of BitShares 2.0. Consequences?

  • If you're lucky, nodes which load from state files won't be able to sync to the network, alerting you that there's a problem before undefined state creeps into consensus.
  • In the worst case, block producers are affected by the undefined state, leading to forks and missed blocks.
  • In the absolute worst case, you have a completely broken blockchain which included blocks and transactions after a certain time based on the undefined state. But the undefined state is not reconstructable, so it is not reindexable past that time.

Fortunately, I've never observed any of these cases except the "lucky" case on any live blockchain. I don't want Steem to be the first. (Shortly after the "lucky" case was observed on the BitShares 2.0 blockchain, an emergency witness update was implemented to change all non-unique indexes to unique indexes, and the policy forbidding non-unique indexes was added.)

only those unique indices whose uniqueness is defined by business logic should remain

One problem here is that "business logic" is already very complicated; it would require a lot of review to get a comprehensive list of the indexes that are iterated on.

Worse, business logic is also a moving target. Every time we add any iteration code anywhere in the business logic, we would have to carefully check that we aren't adding reliance on any non-unique index. I don't trust our ability to always remember to do that check. I wouldn't even trust myself to always remember to do that check.

I don't think 300MB of 30GB is "significant reduction".

It isn't, and that's good. We need to resist the temptation to merge this, because it risks seriously breaking the blockchain. If the performance reward is small, then it is easier to resist the temptation.

@mvandeberg
Contributor Author

Any index that must be unique (id, account_name, etc.) should remain unique. However, indices that do not need to be unique can be ordered_nonunique, so long as they also contain a unique key as part of the composite key; the index will then happen to be unique without needing to enforce those checks.

For example, the comment object needs (author, permlink) and (id) to be unique indices. But the (payout, id) index does not need to be unique and the non-unique index will produce an identical ordering to the unique index with less overhead.
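A minimal Boost.MultiIndex sketch of that layout (a sketch only: the struct, field, and tag names are illustrative stand-ins, not the actual Steem definitions):

```cpp
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/composite_key.hpp>
#include <boost/multi_index/member.hpp>
#include <boost/multi_index/tag.hpp>
#include <cstdint>
#include <string>

using namespace boost::multi_index;

// Illustrative stand-in for the real comment object.
struct comment_row
{
   std::uint64_t id;        // globally unique object id
   std::string   author;
   std::string   permlink;
   std::int64_t  payout;
};

// Tag names are illustrative, not taken from the Steem codebase.
struct by_id;
struct by_permlink;
struct by_payout;

typedef multi_index_container<
   comment_row,
   indexed_by<
      // Uniqueness required by business logic.
      ordered_unique< tag< by_id >,
         member< comment_row, std::uint64_t, &comment_row::id > >,
      ordered_unique< tag< by_permlink >,
         composite_key< comment_row,
            member< comment_row, std::string, &comment_row::author >,
            member< comment_row, std::string, &comment_row::permlink > > >,
      // Uniqueness not required by business logic: the trailing id makes the
      // ordering identical to an ordered_unique (payout, id) index, while
      // skipping the uniqueness check on every insert/modify.
      ordered_non_unique< tag< by_payout >,
         composite_key< comment_row,
            member< comment_row, std::int64_t, &comment_row::payout >,
            member< comment_row, std::uint64_t, &comment_row::id > > >
   >
> comment_index;
```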

I have not taken a look at the PR, but if we ever have a non-unique index that does not contain a unique key in the composite key, then it is blatantly wrong.

@mvandeberg
Contributor Author

Reopening because I want to have this discussion.

@mvandeberg mvandeberg reopened this Mar 2, 2018
@youkaicountry
Contributor

As this isn't in the current sprint, and there are major reservations, please stop work on it for the time being. We will discuss it when planning the next sprint.

@dnotestein

Since we still apparently have major process questions to answer, it seems like this is a good place to get one answered: AFAICT this work was actually done back on Jan 5th and only now committed (probably so it didn't get lost or fall out of sync). So no work of significance was done recently (again, as far as I can tell; I haven't had a chance to talk to mtrela, but that's how I interpret the GitHub info). Should such commits be avoided? Personally, I think it's better to get them into the central repo...

@mvandeberg
Contributor Author

I agree that pushing completed work should not be delayed due to a sprint. It looks weird to me that two commits pushed 4 days and 12 days ago have the same (~1 month ago) commit age. Why would work that was completed a month ago be pushed like that?

Frankly, it does not matter. What has happened, happened. Any additional work on this issue is halted until the next sprint. As per process, we can discuss the merits of the issue before deciding to spend any more time on it.

@abitmore
Contributor

abitmore commented Mar 4, 2018

Why would work that was completed a month ago be pushed like that?

Because the commit was on another branch or repository (perhaps removed now) and was later cherry-picked to a branch in the main repository.

@vogel76
Contributor

vogel76 commented Mar 5, 2018

@mvandeberg @youkaicountry @theoreticalbts Most of the work was done in January. This task was resumed because other work got finished or blocked last week (which is why I put it into the proposed tasks). The other planned tasks were completed or are being continued without any negative impact on the project.

As part of the technical discussion: the motivation for mandatory unique indices is wrong and misleading - a unique index should be used only when business logic needs it. Otherwise it is a mess, hard to understand and maintain. Supplementing keys with the ID is also wrong; it negatively impacts performance and pollutes the code. Its motivation was to put an object back at the same position (relative to its siblings) during an undo operation. To achieve that, the undo state just needs to additionally hold the ID of the neighboring object and insert the restored object next to it when restoring data. This way the definitions of all indices (unique and non-unique) would use the ID only when actually needed (mostly never), and insertion order during undo would still be preserved.

Index uniqueness has no impact on iteration order.

According to the results, the better (absolute) numbers are on the full node version, where 9GB less memory is used after these changes.

It is worth noting that in this issue only one container (for votes) has been modified (1 or 2 indices changed). The task should be continued to make similar changes in the containers holding other object types (probably the comment container next, as another one holding most of the data). Such changes also make sense for a persistent layer built on RocksDB, because we are changing the logical definition of the indices.
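A rough sketch of that undo idea in Boost.MultiIndex terms (everything here is hypothetical and not the actual chainbase undo machinery; for illustration it records the id of the element that follows the removed one in the non-unique index and uses hinted insertion on restore):

```cpp
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/member.hpp>
#include <boost/multi_index/tag.hpp>
#include <cstdint>
#include <iterator>
#include <optional>

using namespace boost::multi_index;

struct vote_row
{
   std::uint64_t id;      // unique object id
   std::int64_t  weight;  // intentionally non-unique key, no trailing id
};

struct by_id;
struct by_weight;

typedef multi_index_container<
   vote_row,
   indexed_by<
      ordered_unique< tag< by_id >,
         member< vote_row, std::uint64_t, &vote_row::id > >,
      ordered_non_unique< tag< by_weight >,
         member< vote_row, std::int64_t, &vote_row::weight > >
   >
> vote_index;

// Undo record: the removed value plus the id of the element that followed it
// in the non-unique index, so a later restore can land in the same slot.
struct undo_record
{
   vote_row                       removed;
   std::optional<std::uint64_t>   next_id; // empty if it was the last element
};

undo_record erase_with_position( vote_index& idx, std::uint64_t id )
{
   auto& weight_idx = idx.get< by_weight >();

   auto it  = idx.get< by_id >().find( id );     // assumes 'id' exists
   auto pos = idx.project< by_weight >( it );    // same element, weight-index view
   auto nxt = std::next( pos );

   undo_record rec{ *it, std::nullopt };
   if( nxt != weight_idx.end() )
      rec.next_id = nxt->id;

   weight_idx.erase( pos );
   return rec;
}

void restore( vote_index& idx, const undo_record& rec )
{
   auto& weight_idx = idx.get< by_weight >();

   auto hint = weight_idx.end();
   if( rec.next_id )
      hint = idx.project< by_weight >( idx.get< by_id >().find( *rec.next_id ) );

   // Hinted insertion tries to place the restored element immediately before
   // 'hint', reproducing its old position among elements with equal weight.
   weight_idx.insert( hint, rec.removed );
}
```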

@dnotestein

Just to clarify: @mariusztrela isn't currently working on this issue. But the team would like to resume it in the next sprint.

@theoreticalbts
Contributor

Why don't I want this merged?

the motivation for mandatory unique indices is wrong and misleading - a unique index should be used only when business logic needs it

That is not true. As I outlined above, we experienced a very serious problem on a production blockchain due to not using ordered_unique indexes.

As I explained before, "business logic" is a moving target. The Steem blockchain's design is not frozen; we're still adding new features to Steem. When new business logic is implemented, all the existing non-ordered_unique indexes need to be carefully analyzed to determine whether the reasoning by which non-ordered_unique was deemed acceptable is still valid for the new business logic.

This kind of analysis is very hard to do. It's not scalable; it would take O(n) developer-hours to review once we have n non-ordered-unique indexes. Additionally, it's not just putting in the time; you have to be very smart and very careful to catch this kind of bug by code review. I like to think I'm pretty smart and careful, and I don't even trust myself to always catch this kind of issue in code review.

What do I recommend?

Reopening because I want to have this discussion.

Fine, let's discuss.

My position is simple: The explicit policy of this project ought to be (and, as far as I am aware, is) that no consensus index will be anything other than ordered_unique.

As a consequence of this policy, the above mentioned patch will never be merged, and no further developer time should be spent on this kind of code.

resume it in the next sprint

As far as I'm concerned, the answer is No.

But can't we change our minds just this once?

Let me further describe my decision making process:

  • (a) The potential consequences of the kind of bug that can result from non-ordered_unique indexes are extremely bad.
  • (b) This kind of bug is not unlikely to occur, since it has occurred before.
  • (c) This kind of bug would be very difficult and time-consuming to catch manually, by careful coding or code review.
  • (d) Catching this kind of bug in an automated way (for example, by unit tests) also seems like a hard problem.
  • (e) It is easy to prevent this kind of bug with a simple policy that all consensus indexes are ordered_unique.
  • (f) The main motivation behind non-ordered_unique indexes is performance. According to both theory and measurement, the performance benefits of using non-ordered_unique indexes are small.

There are very large risks from getting rid of the policy that the consensus indexes are ordered_unique. In this case, the benefits are small, so there is no reason to even consider taking that kind of risk.

@mvandeberg
Contributor Author

The potential consequences of the kind of bug that can result from non-ordered_unique indexes are extremely bad.

Agreed. I think it is helpful for this conversation to define what this "kind of bug" is, because it is not related only to unique indices. It is non-determinism. (@theoreticalbts, I know you know this. Documenting for the sake of the discussion.) In a non-unique index, entries with colliding keys are ordered by when they arrive. Undo state and fork logic can change the order in which objects are added to a multi-index container, resulting in a non-deterministic ordering and potentially non-deterministic evaluation of operations, which can fork the network. Implementing everything as a unique index guarantees a deterministic ordering.
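A minimal, self-contained illustration of that failure mode (hypothetical types; the point is only that equal keys in an ordered_non_unique index preserve insertion order):

```cpp
#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/member.hpp>
#include <cstdint>
#include <iostream>

using namespace boost::multi_index;

struct obj
{
   std::uint64_t id;
   std::int64_t  key;
};

// Non-unique index on 'key' with no unique tie-breaker in the key.
typedef multi_index_container<
   obj,
   indexed_by<
      ordered_non_unique< member< obj, std::int64_t, &obj::key > >
   >
> bad_index;

int main()
{
   bad_index a, b;

   // The same two objects with equal keys, arriving in different orders
   // (e.g. one node applied them directly, another replayed them after undo).
   a.insert( obj{ 1, 10 } );
   a.insert( obj{ 2, 10 } );

   b.insert( obj{ 2, 10 } );
   b.insert( obj{ 1, 10 } );

   // Equal keys preserve insertion order, so the "identical" state iterates
   // differently on the two nodes: prints "1 2" then "2 1".
   for( const obj& o : a ) std::cout << o.id << ' ';
   std::cout << '\n';
   for( const obj& o : b ) std::cout << o.id << ' ';
   std::cout << '\n';
   return 0;
}
```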

As I outlined above, there is a programmatic way that we can guarantee that a non-unique index does not suffer from non-determinism.

indices that do not need to be unique can be ordered_nonunique, so long as they also contain a unique key as part of the composite key; the index will then happen to be unique without needing to enforce those checks

If we used code generation for our Boost multi-index code, then we could specify unique and non-unique keys and guarantee that every non-unique key meets this requirement when the C++ is generated. We could get the performance gains from removing unique indices without the possibility of a programmer mistake leading to non-determinism.
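One way such a guarantee could be expressed, whether emitted by a generator or written by hand, is an alias template that always appends the object's id to the composite key (purely a sketch; nothing like this exists in the codebase, and it assumes the object exposes a uint64_t id member):

```cpp
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/composite_key.hpp>
#include <boost/multi_index/member.hpp>
#include <boost/multi_index/tag.hpp>
#include <cstdint>

namespace bmi = boost::multi_index;

// Hypothetical helper: a non-unique ordered index whose key always ends in
// the object's unique id, so deterministic ordering holds by construction
// rather than by reviewer discipline.
template< typename Tag, typename Object, typename... KeyParts >
using deterministic_non_unique = bmi::ordered_non_unique<
   bmi::tag< Tag >,
   bmi::composite_key<
      Object,
      KeyParts...,
      bmi::member< Object, std::uint64_t, &Object::id >
   >
>;

// Usage with the illustrative comment_row from the earlier sketch:
//
//   deterministic_non_unique< by_payout, comment_row,
//      bmi::member< comment_row, std::int64_t, &comment_row::payout > >
```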

However, if that is the direction we want to go to guarantee determinism and increase performance, I would recommend we bundle it with the discussed IDL changes (#1964), which is a much larger issue that I would not recommend for the next sprint.
