
itest: add optional 10k asset test #325

Merged: 10 commits into main, Jun 22, 2023

Conversation

@jharveyb (Collaborator) commented May 25, 2023:

Adds support for optional itests, starting with a stress test of the minter with a batch of 10,000 assets, each with 4kB of metadata.

Fixes #361.

@guggero (Member) left a comment:

Looks good! I like the idea with the two test lists, probably a bit easier to understand (and also to implement) than the build tag variant.

Review thread: Makefile (outdated)
@jharveyb (Collaborator, Author) replied:

> Looks good! I like the idea with the two test lists, probably a bit easier to understand (and also to implement) than the build tag variant.

Another motivating reason: using mutually exclusive build tags to swap out the test list caused IDE problems with gopls, which could lead to trouble later on.
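
For readers landing here later, a minimal sketch of what a two-list setup can look like; the struct, slice, and test function names below are illustrative, not necessarily the ones merged in this PR:

    // testCase pairs a human-readable name with a test function; the real
    // itest harness type may carry more fields.
    type testCase struct {
        name string
        test func(t *harnessTest)
    }

    // testCases holds the default tests that always run.
    var testCases = []*testCase{
        {name: "mint assets", test: testMintAssets},
    }

    // optionalTestCases holds the long-running tests that only run when
    // explicitly requested, so no mutually exclusive build tags (and no
    // gopls confusion) are needed.
    var optionalTestCases = []*testCase{
        {name: "mint batch stress", test: testMintBatchStress},
    }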

@jharveyb (Collaborator, Author) commented May 31, 2023:

Updated with a minimal batch minting stress test, which reliably reproduces the first performance issue mentioned in #314.

With a batchSize of 100, minting succeeds, and we can also reproduce #313.

TODO list:

  • Use a custom flag for triggering optional itests instead of testing.Short() (see the sketch after this list)
  • Extend the batch minting stress test with other assertions, like correct proof upload to the universe server, correct group anchor, correct syncing of all universe leaves to a second tapd, etc.
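
A hedged sketch of that flag idea, building on the lists above (the flag name and wiring are assumptions, not the final implementation):

    // optionalTests selects the optional list without overloading the
    // standard -test.short mechanism. "optional" is a hypothetical name.
    var optionalTests = flag.Bool(
        "optional", false, "run the optional (long-running) itest list",
    )

    // testsToRun picks which list the harness executes.
    func testsToRun() []*testCase {
        if *optionalTests {
            return optionalTestCases
        }
        return testCases
    }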

@jharveyb jharveyb changed the base branch from main to universe_rpc_fixes June 9, 2023 20:16
@jharveyb jharveyb marked this pull request as ready for review June 12, 2023 17:46
@jharveyb jharveyb changed the base branch from universe_rpc_fixes to main June 12, 2023 17:47
@jharveyb jharveyb requested a review from guggero June 12, 2023 17:47
@lightninglabs-deploy

@guggero: review reminder

@jharveyb jharveyb force-pushed the 10k_asset_test branch 2 times, most recently from 8ccece5 to 40073b8 Compare June 13, 2023 19:07
@guggero (Member) left a comment:

Very cool test, great to know we can mint that many assets in a decent time. And also great to have found some performance tweaks along the way with this!

Review threads:
  • itest/assertions.go (4 threads, 3 outdated)
  • itest/multi_asset_group_test.go (outdated)
  • itest/assets_test.go (2 threads, outdated)
  • universe/base.go (outdated)
  • itest/mint_batch_stress_test.go
@guggero (Member) commented Jun 20, 2023:

I pushed up a couple of fixes and confirmed the 1k test succeeds locally on my machine both with SQLite and Postgres. The 10k test still takes too long to finish within the 60m test timeout, so I think we first need to get some of the optimizations in.
But IMO all the fixes in this PR are enough to justify getting it in as it currently is.

So there are multiple fixes in here that address #361.

Quoted diff from tapdb/sqlc/queries/universe.sql:

    FROM universe_leaves leaves
    JOIN genesis_info_view gen
        ON leaves.asset_genesis_id = gen.gen_asset_id
    WHERE gen.asset_id = @asset_id
    RIGHT OUTER JOIN key_group_info_view groups
Comment (Member):

So I think we can simplify this, and just target the group key directly.

Retracing a bit of history here, the query first looked like this: https://github.com/lightninglabs/taproot-assets/blame/c037025ebb22c54087736589bd31fb570a04b738/tapdb/sqlc/queries/universe.sql#L101-L105

The issue with that is that the asset_id field there, for assets with a group key, is basically whichever asset was inserted first. As a result, if you tried to log the sync of one of those leaves, the operation could fail because it may have matched the wrong leaf.

We then changed it to target the leaf instead (since we know the asset id of the leaf), then use that to get the pointer ID back to the root.

However, if we have the group key, then we can just target that directly. So we can have a query that looks similar to the old one:

    SELECT id
    FROM universe_roots
    WHERE group_key = @group_key

So with this we don't need to modify that other view; we can just combine the queries and pick whichever doesn't return NULL:

SELECT COALESCE(
    (SELECT leaves.universe_root_id AS id
    FROM universe_leaves leaves
    JOIN genesis_info_view gen
        ON leaves.asset_genesis_id = gen.gen_asset_id
    WHERE gen.asset_id = @asset_id),
    (SELECT id
    FROM universe_roots
    WHERE group_key = @group_key)
) AS id;

Comment (Member):

Thanks a lot! I think that did the trick (I went with a slightly different approach, since I'm not sure how COALESCE behaves if there are no results or multiple results returned by a subquery). But the idea is the same.
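
Purely for illustration, one alternative shape that avoids leaning on COALESCE's subquery behavior is a UNION of the two lookups; this is an assumed variant, not necessarily the query that was merged:

    -- Hypothetical alternative: each branch matches at most one root, so
    -- the UNION yields the row from whichever lookup succeeds.
    SELECT leaves.universe_root_id AS id
    FROM universe_leaves leaves
    JOIN genesis_info_view gen
        ON leaves.asset_genesis_id = gen.gen_asset_id
    WHERE gen.asset_id = @asset_id
    UNION
    SELECT id
    FROM universe_roots
    WHERE group_key = @group_key
    LIMIT 1;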

Commit: In this commit, we add additional make commands to run optional itests, where specific test cases are still selected with the 'icase' variable. Additional optional itests are added in the same way as default itests. We also remove the old-style '// +build' tags that were superseded by the '//go:build' syntax in Go 1.17.
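
Sketch of what such a target can look like, assuming lnd-style Makefile helpers; the target name, flags, and match-all default for 'icase' are guesses, not the merged Makefile:

    itest-optional:
        @$(call print, "Running optional integration tests.")
        $(GOTEST) ./itest -v $(TEST_FLAGS) -optional -timeout=240m \
            -run="TestTaprootAssetsDaemon/$(icase)"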
Commit: In this commit, we update the logic used in all itests to mint assets, reducing the number of RPC calls made and the total time spent on assertions. We also separate waiting for the planter to change state from checking the daemon state.
@jharveyb jharveyb requested a review from Roasbeef June 21, 2023 16:10
@jharveyb (Collaborator, Author) commented:

LGTM, but not approving because I'm the original author.

@ffranr (Contributor) left a comment:
Nice work guys!

Review threads:
  • universe/base.go
  • universe/interface.go (outdated)
  • tapgarden/caretaker.go
  • tapdb/sqlc/queries/universe.sql
guggero and others added 7 commits June 22, 2023 08:40
Because the String() function just returns the hash of the ID, it is
hard to debug whether both the asset ID and group key are set during
universe sync. This commit adds a bit more information to certain log
lines.
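
To make the commit's intent concrete, a hypothetical version of the extra logging context (receiver, field, and helper names are assumptions, not the exact merged code):

    // StringForLog renders the identifier with both components visible,
    // instead of only the opaque hash from String().
    func (i Identifier) StringForLog() string {
        groupKey := "<nil>"
        if i.GroupKey != nil {
            groupKey = hex.EncodeToString(
                schnorr.SerializePubKey(i.GroupKey),
            )
        }
        return fmt.Sprintf("%v (asset_id=%x, group_key=%v)",
            i.String(), i.AssetID[:], groupKey)
    }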
In this commit, we add a test that mints a large batch of collectibles
with large metadata, and then checks that the batch mint succeeded.
This includes correctly updating the universe server of the minting
node, and syncing that universe tree to a second node.
If SQLITE_BUSY is returned, it means waiting for a transaction failed because too many other readers/writers are currently using the DB. We can detect that error, convert it into a serialization error, and handle it in the existing retry loop.
We also retry if creating or committing a transaction fails with a serialization error.
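
A sketch of that detection, assuming the modernc.org/sqlite driver; the serialization error type here is a stand-in for whatever tapdb's retry loop actually matches on:

    import (
        "errors"

        sqlite "modernc.org/sqlite"
        sqlite3 "modernc.org/sqlite/lib"
    )

    // mapSQLiteBusy converts SQLITE_BUSY into the same serialization error
    // the Postgres path produces, so one retry loop covers both backends.
    func mapSQLiteBusy(err error) error {
        var sqliteErr *sqlite.Error
        if errors.As(err, &sqliteErr) &&
            sqliteErr.Code() == sqlite3.SQLITE_BUSY {

            // ErrSerializationError is a hypothetical stand-in type.
            return &ErrSerializationError{DBError: err}
        }
        return err
    }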
@guggero guggero added this pull request to the merge queue Jun 22, 2023
Merged via the queue into main with commit 6c4bd43 Jun 22, 2023
11 checks passed
@guggero guggero deleted the 10k_asset_test branch June 22, 2023 07:50
Quoted from universe/base.go:

    // operations to prepare and validate the keys. But since we're now
    // actually starting to write, we want to limit this to a single writer
    // at a time.
    b.registrationMtx.Lock()
Comment (Member):

I think we should revert this change. We want to make sure that our db concurrency control can handle situations like this. For sqlite, there's only a single writer. For postgres, the new serialization mode should detect conflicts, then abort so it can retry safely.

Comment (Member):

Without this change we ran into "number of retries exhausted" errors quite quickly, both on SQLite and Postgres. That's mainly because we constantly sync multiple leaves at a time (number of CPUs in parallel).

Comment (Member):

Gotcha, perhaps we need to increase either the backoff timeout or the number of attempts (I just kinda chose 10 randomly). We'll want to tune these params, as otherwise we may end up hitting this in other areas that have more involved db transactions.
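
For context, the generic shape of such a retry loop; only the 10-attempt count comes from this thread, while the base delay and helper names are assumptions:

    import (
        "errors"
        "time"
    )

    const (
        numRetries = 10                    // attempt count mentioned above
        retryDelay = 50 * time.Millisecond // assumed base backoff
    )

    func withRetry(do func() error) error {
        delay := retryDelay
        for i := 0; i < numRetries; i++ {
            err := do()
            if err == nil || !isSerializationError(err) {
                return err
            }
            time.Sleep(delay)
            delay *= 2 // exponential backoff between attempts
        }
        return errors.New("number of retries exhausted")
    }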

Comment (Collaborator, Author):

Seems like batching these transactions would also work? E.g., commit batches of 5 issuance proofs instead of single proofs.

IIUC, in all cases where we hit this issue, we also don't need to write these proofs individually to recover from a restart.

Comment (Member):

Yeah, that'll work too. Right now we just do them one by one, but we can also batch the insertion. We can easily batch when adding to our own universe (that for loop in the planter), and then for RPC we can make it a repeated field.
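
One plausible shape for that batched path; the interface and field names are guesses for illustration, not the merged API:

    // IssuanceItem bundles one minted leaf with its universe identifier.
    type IssuanceItem struct {
        ID   Identifier
        Leaf *MintingLeaf
    }

    // BatchRegistrar accepts many issuance proofs in a single db
    // transaction instead of one commit per proof; over RPC the same idea
    // maps to a repeated field in the request message.
    type BatchRegistrar interface {
        RegisterNewIssuanceBatch(ctx context.Context,
            items []*IssuanceItem) error
    }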

Successfully merging this pull request may close these issues.

tapgarden+universe: DB issues with universe updates after minting or universe sync