Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change DataCache.Add() to never allow overwriting existing values #1177

Closed
wants to merge 8 commits into from
Closed

Conversation

ixje
Copy link
Contributor

@ixje ixje commented Oct 22, 2019

close #1173

@ixje ixje changed the title Change Add() to never allow overwriting existing values Fix 1173 Oct 22, 2019
neo/IO/Caching/DataCache.cs Outdated Show resolved Hide resolved
@codecov-io
Copy link

codecov-io commented Oct 22, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@0f75741). Click here to learn what that means.
The diff coverage is 92.3%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master    #1177   +/-   ##
=========================================
  Coverage          ?   65.15%           
=========================================
  Files             ?      199           
  Lines             ?    13605           
  Branches          ?        0           
=========================================
  Hits              ?     8864           
  Misses            ?     4741           
  Partials          ?        0
Impacted Files Coverage Δ
neo/SmartContract/Native/PolicyContract.cs 94.23% <100%> (ø)
neo/IO/Caching/DataCache.cs 99.53% <88.88%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0f75741...105f7e9. Read the comment docs.

@@ -46,19 +46,19 @@ private bool CheckValidators(ApplicationEngine engine)
internal override bool Initialize(ApplicationEngine engine)
{
if (!base.Initialize(engine)) return false;
engine.Snapshot.Storages.Add(CreateStorageKey(Prefix_MaxBlockSize), new StorageItem
engine.Snapshot.Storages.GetAndChange(CreateStorageKey(Prefix_MaxBlockSize), () => new StorageItem
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change to GetAndChange()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the original code relied on the overwriting behaviour of the old Add() implementation. This now causes unit test failures because the value already exists and an exception is thrown. By using GetAndChange we basically reproduce the old Add() behaviour.

Alternatively I can insert some code like the following in all the affected tests to clear the backend data.

foreach (KeyValuePair<StorageKey, StorageItem> pair in snapshot.Storages.Find())
{
   snapshot.Storages.DeleteInternal(pair.Key);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to fix the unit test

@ixje
Copy link
Contributor Author

ixje commented Oct 22, 2019

I need help in fixing the following test

public void All_Tests_Cultures()

It fails on e.g. UT_GasToken, but if I run that test separately it passes fine so I don't know what this specific test is doing to break it.

bla

@shargon
Copy link
Member

shargon commented Oct 22, 2019

It's a problem of isolation, if some test alter the storages, and it doesn't clean this changes, it could affect the other test.. until you solve the other test, you can skip this class from Cultures like this:

image

Add the NotReRunnable attribute

@ixje
Copy link
Contributor Author

ixje commented Oct 23, 2019

It's a problem of isolation, if some test alter the storages, and it doesn't clean this changes, it could affect the other test.. until you solve the other test, you can skip this class from Cultures like this:

If I read something like

[TestInitialize]
public void TestSetup()
{
   TestBlockchain.InitializeMockNeoSystem();
   Store = TestBlockchain.GetStore();
}

Then I expect Store to be a fresh copy not affected by any previous test. Unfortunately this is not the case as concluded from running All_Tests_Cultures.

I clearly do not know enough about C# unit testing because I can't determine what test to fix if all tests pass when running individually (see my previous screenshot). I tried adding NotReRunnable to all classes that it initially flags as failures when I do "Run All" (tests) but no luck.

I'm thinking somebody needs to fix the global isolation problem of the test store.

@shargon
Copy link
Member

shargon commented Oct 23, 2019

I'm thinking somebody needs to fix the global isolation problem of the test store.

Yes, It is a big problem

@erikzhang erikzhang closed this Nov 25, 2019
@ixje
Copy link
Contributor Author

ixje commented Nov 26, 2019

Why is this closed? I believe the fix for Add() is still valid. There just wasn't any activity because there was a global issue with the TestBlockchain. If that's solved I can revisit the tests.

@erikzhang erikzhang reopened this Dec 6, 2019
@erikzhang
Copy link
Member

This fix is good. But it make Add() becomes slow.

@vncoelho
Copy link
Member

vncoelho commented Dec 7, 2019

@ixje, can you give a title to this FIX? aheuahuea, something to remember the idea behind it.

Another thing, could you take a look at #1318? During that tests we noticed some things strange (but I am also not an expert as @erikzhang to use all tools), however, it was noticed that:

  • After modifying storage with GetAndChange it stays modified;
  • Even after Snapshot.Commit()
  • The idea was to Dispose and Get the updated Snapshot from Blockchain.Singleton.GetSnapshot()
  • The idea worked nice, however, when the Snapshot would be modified locally and cleaned for the next state?

Thus, It mad me fell that some improvements on Cache might still be possible.
In this sense, I would like to understand better what you are fixing and also thank you for your efforts in doing that!

@shargon, thank you for the tips and examples during that tests on Consensus, helping with the manipulation of Cache.

@ixje ixje changed the title Fix 1173 Change DataCache.Add() to never allow overwriting existing values Dec 9, 2019
@ixje
Copy link
Contributor Author

ixje commented Dec 9, 2019

Another thing, could you take a look at #1318?

@vncoelho is this question still relevant as I see it's already merged?

During that tests we noticed some things strange (but I am also not an expert as @erikzhang to use all tools), however, it was noticed that:

  • After modifying storage with GetAndChange it stays modified;
  • Even after Snapshot.Commit()

I think that's expected. You create a snapshot, you get a value, you modify it. Now for as long as you keep requesting data from that snapshot it should get your modified state. If you create a second snapshot before calling commit on the first, then the value shouldn't be changed in the second snapshot, also not after committing the first snapshot. Note that this behaviour is specific to LevelDB. Committing with say a Postgresql backend would update snapshot 2 as there a snapshot is connection based.

Snapshot.Commit() writes to the real backend (which does a LevelDB.WriteBatch.write()) and makes it permanent.

  • The idea worked nice, however, when the Snapshot would be modified locally and cleaned for the next state?

Each call to GetSnapshot() creates a new snapshot with new local DataCaches. At least that is how it worked before #1249 , I haven't analyzed those changes yet.

@vncoelho
Copy link
Member

vncoelho commented Dec 9, 2019

@vncoelho is this question still relevant as I see it's already merged?

I think it is relevant, @ixje, because I did a workaround there.

Thanks for your explanation:

"(...) then the value shouldn't be changed in the second snapshot, also not after committing the first snapshot. (...)"

Thanks is strange, neither after committing the first?
In the case of that aforementioned UTs, we are:

  • Using Add to add Next_Validators (specifying a specific Multi-Sig for the Consensus) using a Snapshot obtained from Blockchain.Singleton
  • After commit even Blockchain.Singleton is being modified, thus, we need to revert, otherwise, the remaining tests will behave differently then expected, because we set random Validators for each test.
  • Thus, we update the consensus.context Snapshot by Disposing and seting it to Blockchain.Singleton.GetSnapshot() again.
  • Then, we delete the Next_Validators field and commit again.
  • We are ready for the other standard tests

In summary, I think, as you said, this is the expected, right? Thanks for your attention.

@vncoelho
Copy link
Member

vncoelho commented Dec 9, 2019

It fails on e.g. UT_GasToken, but if I run that test separately it passes fine so I don't know what this specific test is doing to break it.

Do you still suffer from this problem? I possible known how to fix this.

@vncoelho
Copy link
Member

vncoelho commented Dec 9, 2019

I'm thinking somebody needs to fix the global isolation problem of the test store.

It is possible to use like this @ixje and @shargon, we just need to ensure that every tests is moving the state back to the original standard.

The template of #1318 is good to be followed.

@vncoelho
Copy link
Member

vncoelho commented Dec 9, 2019

I rebase with master for fixing, @ixje.

If it is possible, it would be better if you open direct as a branch here.

@vncoelho
Copy link
Member

vncoelho commented Dec 9, 2019

@shargon @ixje @erikzhang

image

Perhaps we need to apply the same logic of Ut_Consensus to all these other UTs that use snapshot.commit(), reverting the Storage back at the end of the tests.

Check #1343

@vncoelho
Copy link
Member

vncoelho commented Dec 9, 2019

@ixje, the last error was here:

  X TestUpdatePoolForBlockPersisted [2ms]
  Error Message:
   Test method Neo.UnitTests.Ledger.UT_MemoryPool.TestUpdatePoolForBlockPersisted threw exception: 
System.ArgumentException: Value does not fall within the expected range.
  Stack Trace:
      at Neo.IO.Caching.DataCache`2.Add(TKey key, TValue value) in /home/runner/work/neo/neo/src/neo/IO/Caching/DataCache.cs:line 58
   at Neo.UnitTests.Ledger.UT_MemoryPool.TestUpdatePoolForBlockPersisted() in /home/runner/work/neo/neo/tests/neo.UnitTests/Ledger/UT_MemoryPool.cs:line 490
        public void TestUpdatePoolForBlockPersisted()
        {
            var snapshot = Blockchain.Singleton.GetSnapshot();
            byte[] transactionsPerBlock = { 0x18, 0x00, 0x00, 0x00 }; // 24
            byte[] feePerByte = { 0x00, 0x00, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00 }; // 1048576
            StorageItem item1 = new StorageItem
            {
                Value = transactionsPerBlock
            };
            StorageItem item2 = new StorageItem
            {
                Value = feePerByte
            };
            var key1 = CreateStorageKey(Prefix_MaxTransactionsPerBlock);
            var key2 = CreateStorageKey(Prefix_FeePerByte);
            key1.ScriptHash = NativeContract.Policy.Hash;
            key2.ScriptHash = NativeContract.Policy.Hash;
            snapshot.Storages.Add(key1, item1);
            snapshot.Storages.Add(key2, item2);

Perhaps, since the Key already exists we need to use GetAndChange now, yes?

Check 29cebaf

@vncoelho
Copy link
Member

vncoelho commented Dec 9, 2019

Another error is on CheckPolicy.

In fact, the value really already exists, thus, it needs GetAndChange instead of Add as well

        public void TestCheckPolicy()
        {
            Transaction tx = Blockchain.GenesisBlock.Transactions[0];
            var snapshot = Blockchain.Singleton.GetSnapshot();

            StorageKey storageKey = new StorageKey
            {
                ScriptHash = NativeContract.Policy.Hash,
                Key = new byte[sizeof(byte)]
            };
            storageKey.Key[0] = 15;
            var sITest = snapshot.Storages.TryGet(storageKey);
            Console.WriteLine($"sITest {sITest.Value.ToScriptHash()}");

            snapshot.Storages.GetAndChange(storageKey, () => new StorageItem
            {
                Value = new UInt160[] { tx.Sender }.ToByteArray()
            });

    /*
            snapshot.Storages.Add(storageKey, new StorageItem
            {
                Value = new UInt160[] { tx.Sender }.ToByteArray()
            });
*/
            NativeContract.Policy.CheckPolicy(tx, snapshot).Should().BeFalse();

            snapshot = Blockchain.Singleton.GetSnapshot();
            NativeContract.Policy.CheckPolicy(tx, snapshot).Should().BeTrue();
        }

However, the first assert is returning True

    Expected NativeContract.Policy.CheckPolicy(tx, snapshot) to be false, but found True.

@vncoelho
Copy link
Member

vncoelho commented Dec 9, 2019

In fact, it looks like that this test was wrong, I do not know why it would return False since it just checks BlockedAccounts?

Any idea, @erikzhang @shargon ?

@@ -231,21 +235,6 @@ public void TestCheckPolicy()
{
Transaction tx = Blockchain.GenesisBlock.Transactions[0];
var snapshot = Blockchain.Singleton.GetSnapshot();

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erikzhang @eryeer @shargon, double check this because perhaps this test was wrong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about change it like this.

        [TestMethod]
        public void TestCheckPolicy()
        {
            Transaction tx = Blockchain.GenesisBlock.Transactions[0];
            var snapshot = Blockchain.Singleton.GetSnapshot();
            StorageKey storageKey = new StorageKey
            {
                ScriptHash = NativeContract.Policy.Hash,
                Key = new byte[sizeof(byte)]
            };
            storageKey.Key[0] = 15;
        
            var item = snapshot.Storages.GetAndChange(storageKey);
            item.Value = new UInt160[] { tx.Sender }.ToByteArray();
            NativeContract.Policy.CheckPolicy(tx, snapshot).Should().BeFalse();

            snapshot = Blockchain.Singleton.GetSnapshot();
            NativeContract.Policy.CheckPolicy(tx, snapshot).Should().BeTrue();
        }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried, it says True instead of False, that is why I believe the test was wrong. Why was it False in any situation?

Copy link
Contributor

@eryeer eryeer Dec 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using this code, the test will pass. We need to change the return value of snapshot.Storages.GetAndChange(storageKey) instead of passing a StorageItem creating function into it. Because the initalization will add a storageItem into storage, trackable.Item won't be null.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anhhh, I see, but it would be better to DeleteKey, no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

storageKey.Key[0] = 15; this key is the blockedAccount key

private const byte Prefix_BlockedAccounts = 15;

Because it is private, we can only use the 15 instead of Prefix_BlockedAccounts

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I partially understood! Thanks for the explanation, @eryeer.

However, I still need to figure out this part:

We need to change the return value of snapshot.Storages.GetAndChange(storageKey) instead of passing a StorageItem creating function into it. Because the initalization will add a storageItem into storage, trackable.Item won't be null.

Copy link
Contributor

@eryeer eryeer Dec 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😄 Here is the initalization.

engine.Snapshot.Storages.Add(CreateStorageKey(Prefix_BlockedAccounts), new StorageItem
{
Value = new UInt160[0].ToByteArray()
});

So whenever we get the BlockedAccounts from Internal storage, the value will be new UInt160[0].ToByteArray() by default. We need to change the defult value instead of create a new one to add to storage.

Copy link
Member

@vncoelho vncoelho Dec 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that I understood.

This would also work:

        [TestMethod]
        public void TestCheckPolicy()
        {
            Transaction tx = Blockchain.GenesisBlock.Transactions[0];
            var snapshot = Blockchain.Singleton.GetSnapshot();
            StorageKey storageKey = new StorageKey
            {
                ScriptHash = NativeContract.Policy.Hash,
                Key = new byte[sizeof(byte)]
            };
            storageKey.Key[0] = 15;
        
            //var item = snapshot.Storages.GetAndChange(storageKey);
            //item.Value = new UInt160[] { tx.Sender }.ToByteArray();
            snapshot.Storages.Delete(storageKey);
            snapshot.Storages.GetAndChange(storageKey, () => new StorageItem
            {
                Value = new UInt160[] { tx.Sender }.ToByteArray()
            });
            snapshot.Commit();
            NativeContract.Policy.CheckPolicy(tx, snapshot).Should().BeFalse();
            snapshot = Blockchain.Singleton.GetSnapshot();

            snapshot.Storages.Delete(storageKey);
            snapshot.Storages.GetAndChange(storageKey, () => new StorageItem
            {
                Value = new UInt160[0].ToByteArray()
            });
            snapshot.Commit();

            NativeContract.Policy.CheckPolicy(tx, snapshot).Should().BeTrue();
        }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, congratulations!

@@ -487,8 +487,14 @@ public void TestUpdatePoolForBlockPersisted()
var key2 = CreateStorageKey(Prefix_FeePerByte);
key1.ScriptHash = NativeContract.Policy.Hash;
key2.ScriptHash = NativeContract.Policy.Hash;
snapshot.Storages.Add(key1, item1);
snapshot.Storages.Add(key2, item2);
snapshot.Storages.GetAndChange(key1, () => new StorageItem
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ixje, this change is expected, right? Since the value was previously existing.

Copy link
Contributor Author

@ixje ixje Dec 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vncoelho I don't remember, but it is most likely caused by the TestBlockchain isolation problem. I think the best way forward is to have that fixed first, reset all changes to the unit tests and try again. Some of the changes (like here) have likely been applied because of the isolation problem and might not be needed anymore once that is fixed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AHeauehuaea, @ixje, this change was done by me, please check the last commits.

The question was to be sure this was consistent.

All UTs were fixed, the PR is kind of ready to merge.

@vncoelho vncoelho dismissed erikzhang’s stale review December 9, 2019 22:54

Changed already done. Re-review requested.

Comment on lines +50 to +59
if (dictionary.TryGetValue(key, out Trackable trackable))
{
if (trackable.State != TrackState.Deleted)
throw new ArgumentException();
}
else
{
if (TryGetInternal(key) != null)
throw new ArgumentException();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we are using Add(), we assume that the data does not exist. If we can't assume that, we should use GetAndChange(). So, since we are using Add(), there is no need to add another data read operation.

Copy link
Contributor Author

@ixje ixje Dec 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea behind this PR (and the above code change) is to create consistent behaviour of the cache by not just validating that the value exists in cache, but to also validate that it does not exist in the real backend. You previously choose this exact solution (#1173 (comment)) instead of always overriding the data (described in the linked comment as option 1).

If we don't choose either option 1 or 2 we will have inconsistent behaviour as described in #1173

I don't mind which of the 2 we choose, but right now the feedback looks like you don't want to change anything?! 😕 I think it is wrong to assume that data does not exist and I think you agreed with that when you wrote the code initially, otherwise this check would not exists

    if (trackable.State != TrackState.Deleted)
                        throw new ArgumentException(); 

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overhead of this check is very small, so just do it by the way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not understand the point as well, @erikzhang.

The implementation here seems to be option 2 of that aforementioned issue.
Do you want to leave the code as in the original implementation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to change it. The purpose of Add() is to add records when they do not exist, and save the cost of a read operation. If you are not sure that the record exists or not, you should use GetAndChange().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, @erikzhang.
It looks like to be more efficient in the original code, yes? In the sense of avoiding a second check with if (TryGetInternal(key) != null).

@ixje, if that is the case, perhaps it is better to close PR and issue and open another one for commenting the code properly in order to avoid future misinterpretations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So everybody basically wasted their time in creating the PR, reviewing it and fixing unit tests (still thanks for that @vncoelho) because 1.5 months after opening the PR and the initial review we're suddenly trying to save performance on a db operation that takes < 10 microseconds (according to leveldb benchmarks for random reads) while elsewhere in the code base millisecond delays are accepted.

Don't get me wrong here. I'm all cool with keeping performance in mind for reaching the highest TPS, but how are we supposed to make a sound decision what makes sense to spend time on when performance criteria seems randomly applied?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ixje, I really apologize for that.

I really did not have enough knowledge for checking it before.
In fact, I would like to thank you for your dedication. In addition, your insights helped me to investigate further.

Perhaps, there are methods that are not being used from the DataCache, such as GetOrAdd.
We need to solve these questions and improve what needs to be improved.

@ixje
Copy link
Contributor Author

ixje commented Dec 10, 2019

"(...) then the value shouldn't be changed in the second snapshot, also not after committing the first snapshot. (...)"

Thanks is strange, neither after committing the first?

For LevelDB not. LevelDB makes a 100% separate copy and this copy will not listen anymore to new changes happening to the real DB. Here's what it looks like for LevelDB
leveldb
and this is for Postgresql
postgresql

Note how for postgresql there is a syncing between the real backend and any outstanding snapshots (=a network connection/session in their case).

I think the big problem with the tests is that this TestBlockchain is using the same Blockchain.singleton everytime. It needs to be a new, clean one for each call.

@vncoelho
Copy link
Member

Such a nice explanation and figure, @ixje. Did you create that?

@ixje
Copy link
Contributor Author

ixje commented Dec 10, 2019

Such a nice explanation and figure, @ixje. Did you create that?

Yes, with the free online https://www.draw.io/ platform

@@ -47,19 +47,19 @@ private bool CheckValidators(ApplicationEngine engine)
internal override bool Initialize(ApplicationEngine engine)
{
if (!base.Initialize(engine)) return false;
engine.Snapshot.Storages.Add(CreateStorageKey(Prefix_MaxBlockSize), new StorageItem
engine.Snapshot.Storages.GetAndChange(CreateStorageKey(Prefix_MaxBlockSize), () => new StorageItem
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be the same, if we already know that the item doesn't exists, why we need to call GetAndChange?

@ixje
Copy link
Contributor Author

ixje commented Dec 10, 2019

closing according to #1177 (comment)

@ixje ixje closed this Dec 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Why does DataCache.Add() not check the real backend?
6 participants