
Hanging for transactionCollector#collect to finish all the yields #3

Closed
katat opened this issue May 23, 2020 · 18 comments · Fixed by #15

Comments

katat commented May 23, 2020

I am running a test that queries all the transactions with the lock args 0xdde7801c073dfb3464c7b1f05b806bb2bbb84e99 using the transactionCollector.collect() generator, but it seems to take a very long time to finish. I started the process 10+ minutes ago and it still has not finished.

Meanwhile, I noticed the CKB node seems to be busy handling the requests initiated by lumos, as it is using 100% of one CPU core.

Any ideas on what is happening behind the scenes?

Code: https://github.com/katat/lumos-performance-test/blob/2afa9f043e37865f97c3e042933076b1d707aae1/index.js#L41
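
For reference, the relevant part of that test looks roughly like the sketch below. This is a minimal reconstruction, not a copy of the linked file: the RPC URL, data path, and exact queryOptions shape are assumptions on my side.

```js
// Minimal sketch of the failing pattern: drain every transaction matching a
// lock script through TransactionCollector#collect.
const { Indexer, TransactionCollector } = require("@ckb-lumos/indexer");

const indexer = new Indexer("http://127.0.0.1:8114", "./indexed-data");
indexer.startForever();

async function countTransactions() {
  const collector = new TransactionCollector(indexer, {
    lock: {
      // standard secp256k1/blake160 lock
      code_hash:
        "0x9bd7e06f3ecf4be0f2fcd2188b23f1b9fcc88e5d4b65a8637b17723bbda3cce8",
      hash_type: "type",
      args: "0xdde7801c073dfb3464c7b1f05b806bb2bbb84e99",
    },
  });
  let count = 0;
  // collect() yields every matching transaction; this loop is what keeps
  // running for a very long time when the address has many transactions.
  for await (const tx of collector.collect()) {
    count += 1;
  }
  return count;
}
```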

katat changed the title from "Take a very long time for transactionCollector#collect to finish all the yields" to "Hanging for transactionCollector#collect to finish all the yields" on May 23, 2020
xxuejie added a commit that referenced this issue Jun 11, 2020
xxuejie (Collaborator) commented Jun 11, 2020

This shall be resolved in #15

xxuejie added a commit that referenced this issue Jun 11, 2020
katat (Author) commented Jun 11, 2020

Thanks @xxuejie

I had a quick look at the PR. I guess the major change is to turn the function that fetches the transaction hashes into an iterator, which should optimize the init time of transactionCollector#collect and make the data initialization more efficient.

Not sure if I got it right. Although this change can improve the init time of that function, for my example described above I estimate it would still take at least 2 hours to finish loading all the transactions.

xxuejie (Collaborator) commented Jun 11, 2020

You got it right, and yes, fetching all of the transactions will take a long time. But my question is: in what scenario do we need all 1.6 million transactions at once? I would personally assume the main use case here is to display the relevant transactions for a lock script, where you would naturally use paging to show only a small part of them first. The iterator is designed for this pattern.
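
As a hedged illustration of that pattern, a UI could pull just one page's worth of entries off the iterator and stop, instead of draining it; `pageSize` and the collector variable are placeholders for the sketch.

```js
// Sketch: consume only one page of results from the async generator, then
// stop; the remaining transactions are never fetched from the indexer.
async function firstPage(collector, pageSize = 10) {
  const page = [];
  for await (const tx of collector.collect()) {
    page.push(tx);
    if (page.length >= pageSize) break; // stop early instead of draining
  }
  return page;
}
```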

katat (Author) commented Jun 11, 2020

It is because we need to cache all the transactions in our current database (SQLite), so we can do flexible queries for the application's needs, such as querying for dead cells. If fetching all the transactions for some big wallets takes more than 2 hours, that is much longer than the time the indexer takes to index all the CKB blocks.

The other reason is the convenience of pagination when the data is cached in a database like SQLite. Suppose there are 100 pages of transactions; it is straightforward to get any one of the pages with a single SQL query, without loading all the data and filtering. I am not sure how to do such a page jump with this generator function.
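
For illustration, a page jump against such a SQLite cache is a single query; the table and column names below are made up for the sketch, and better-sqlite3 is just one possible driver.

```js
// Sketch: jump straight to an arbitrary page of cached transactions with one
// LIMIT/OFFSET query. Schema is illustrative only.
const Database = require("better-sqlite3");
const db = new Database("./cache.sqlite");

function fetchPage(lockArgs, page, pageSize = 10) {
  return db
    .prepare(
      `SELECT tx_hash, block_number
         FROM transactions
        WHERE lock_args = ?
        ORDER BY block_number DESC
        LIMIT ? OFFSET ?`
    )
    .all(lockArgs, pageSize, page * pageSize);
}
```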

xxuejie (Collaborator) commented Jun 11, 2020

I think we need to talk about your specific use cases. It is really an anti-pattern to do indexing on your own when you are already using lumos-indexer; keeping a second index such as SQLite is not something we would recommend here. You mentioned querying for dead cells; do you mind elaborating on what exactly is needed there?

Page jumps shouldn't be hard to support. For arbitrary jumps, lumos-indexer can provide another query parameter such as skip, which allows you to jump directly to, say, page 100 and go from there. In that sense I'd say there will be 2 querying patterns:

  1. When you go through consecutive pages, a single iterator will do the job.
  2. When you jump to non-consecutive pages, you can simply create a new iterator with the skip query option.

Of course it is possible to always create a new iterator, but the fundamental data structures used in lumos-indexer (and I believe the same holds for SQLite) are optimized for sequential reads, so using pattern 1 when you can will give better performance.
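
As a sketch of pattern 2, assuming a skip option is added to the queryOptions: the option name and shape below are the proposal here, not an existing lumos-indexer API.

```js
// Hypothetical sketch: jump to an arbitrary page by creating a fresh collector
// with a proposed `skip` query option, then reading one page from it.
async function jumpToPage(indexer, lockScript, page, pageSize = 10) {
  const collector = new TransactionCollector(indexer, {
    lock: lockScript,
    skip: page * pageSize, // proposed option, not yet available at this point
  });
  const results = [];
  for await (const tx of collector.collect()) {
    results.push(tx);
    if (results.length >= pageSize) break;
  }
  return results;
}
```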

katat (Author) commented Jun 11, 2020

It would be very useful to support a skip query parameter for the page jump. I am looking forward to it.

I think we need to talk about your specific use cases. It is really an anti-pattern to do indexing on your own when you are already using lumos-indexer; keeping a second index such as SQLite is not something we would recommend here. You mentioned querying for dead cells; do you mind elaborating on what exactly is needed there?

Sure, let me try to introduce some of the existing use cases here:

  1. In Neuron's DAO page, it needs to show both the deposits and the completed transactions, which requires querying dead cells.
  2. Neuron has a mechanism to support one address per transaction. Once an address has been used in a transaction, it is removed from the address pool for future transactions, and new addresses are created and put into the pool. This requires knowledge of the transaction history in order to properly organize the address pool.
  3. There are quite a number of places that need to filter cells by the combination of type script and lock script (a sketch follows this list). This kind of query is much more convenient and efficient with a SQL database.
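
For use case 3, the kind of query I mean looks roughly like the sketch below in lumos terms; whether the current CellCollector accepts both filters together like this is an assumption on my side.

```js
// Sketch: collect live cells matching both a lock script and a type script.
// Assumes queryOptions accepts `lock` and `type` at the same time.
const { CellCollector } = require("@ckb-lumos/indexer");

async function collectByLockAndType(indexer, lockScript, typeScript) {
  const collector = new CellCollector(indexer, {
    lock: lockScript,
    type: typeScript,
  });
  const cells = [];
  for await (const cell of collector.collect()) {
    cells.push(cell);
  }
  return cells;
}
```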

xxuejie (Collaborator) commented Jun 11, 2020

  1. If you are showing each individual transaction, the dead cells will be there, won't they? In addition, I think it's safe to assume those pages have pagination: you only show some number of transactions, not all of them. I think this is totally doable in the current design.
  2. I feel like this task is better suited to the notifier mechanism we talked about elsewhere. It should not be the job of an indexer.
  3. I personally don't feel those queries are complicated enough to require SQL; with some basic and/or logic, we should be able to support them without much problem.

katat (Author) commented Jun 11, 2020

If you are showing each individual transaction, the dead cells will be there, won't they? In addition, I think it's safe to assume those pages have pagination: you only show some number of transactions, not all of them. I think this is totally doable in the current design.

Then this would mean it needs to fetch all the transactions with the generator, which would lead to the potential performance issue we discussed above.

xxuejie (Collaborator) commented Jun 11, 2020

I don't see why you need to fetch all the transactions. If your page shows 10 entries, you can just fetch 10 transactions. You may also need to fetch the transaction behind each input cell in those 10 transactions, but that is it. A second page would need 10 more transactions, but that is also just another page. You won't need all of them to show the initial results.
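
To make the "also fetch the transaction behind each input cell" step concrete, here is a hedged sketch that talks to the CKB node's get_transaction RPC directly over JSON-RPC; the shape of the yielded transaction objects (transaction.inputs, previous_output.tx_hash) follows the standard CKB RPC format and is an assumption for the sketch.

```js
// Sketch: for one page of transactions, also fetch the transaction behind
// every input cell via the CKB `get_transaction` JSON-RPC call.
const fetch = require("node-fetch");

async function getTransaction(rpcUrl, txHash) {
  const res = await fetch(rpcUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      id: 1,
      jsonrpc: "2.0",
      method: "get_transaction",
      params: [txHash],
    }),
  });
  return (await res.json()).result;
}

async function fetchInputTransactions(rpcUrl, pageOfTxs) {
  const inputTxs = [];
  for (const tx of pageOfTxs) {
    for (const input of tx.transaction.inputs) {
      inputTxs.push(await getTransaction(rpcUrl, input.previous_output.tx_hash));
    }
  }
  return inputTxs;
}
```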

katat (Author) commented Jun 11, 2020

I don't see why you need to fetch all the transactions. If your page shows 10 entries, you can just fetch 10 transactions. You may also need to fetch the transaction behind each input cell in those 10 transactions, but that is it. A second page would need 10 more transactions, but that is also just another page. You won't need all of them to show the initial results.

In the context of the DAO page, it needs knowledge of the full transaction history in order to know which transactions have been performed for the deposit/withdraw/unlock operations and display them in the application. I am not sure how pagination could help in this case.

xxuejie (Collaborator) commented Jun 11, 2020

which transactions have been performed for the deposit/withdraw/unlock operations and display them in the application

I assume you mean which cells, not transactions. To me this part is quite simple:

  • From the transaction structure, you already know the state of an output cell, such as deposited or in withdraw step 1.
  • For each output cell, there is only one possible action: a deposited cell can only move to withdraw step 1, and a withdraw step 1 cell can only be withdrawn. It's also quite simple to tell whether the action has been performed: if the cell in question is still a live cell, the action has not yet been done; if it is no longer live, the action has already been performed.

Now the question becomes whether there is a way to tell if an OutPoint represents a live cell, and that is a really simple operation we can support in the indexer.

If I didn't miss anything, this workflow solves our problem here.
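
A hedged sketch of that "is this OutPoint still live" check, done directly against the CKB node's get_live_cell JSON-RPC call until the indexer exposes something equivalent:

```js
// Sketch: ask the CKB node whether an OutPoint is still a live cell.
// `status` is "live" for live cells; anything else means the cell has been
// consumed (or was never seen by the node).
const fetch = require("node-fetch");

async function isLiveCell(rpcUrl, outPoint) {
  // outPoint: { tx_hash: "0x...", index: "0x0" }
  const res = await fetch(rpcUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      id: 1,
      jsonrpc: "2.0",
      method: "get_live_cell",
      params: [outPoint, false], // false: don't return the cell data
    }),
  });
  const { result } = await res.json();
  return result.status === "live";
}
```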

katat (Author) commented Jun 11, 2020

Yes, the check will mainly be on the cells.

I am just thinking about a scenario in which the state transitions of the cell are as below:

  1. cell(deposited) => 2. cell(withdrawal) => 3. cell(unlocked) => 4. cell(input of a payment) => 5. cell

Assuming the cell at 5 is a live cell, how can we know its history of DAO operations?

xxuejie (Collaborator) commented Jun 11, 2020

First question: how is this a history of DAO operations? 3 -> 4 -> 5 can never be reliably checked. All you can check in a Nervos DAO operation is the path from a deposited cell, to a withdraw step 1 cell, to whatever cell is released in withdraw step 2; any step after that cannot be reliably checked. So the history we are talking about here has an upper bound of 3 generations, which can be fetched without problems.

katat (Author) commented Jun 11, 2020

3 -> 4 -> 5 can never be reliably checked. All you can check in a Nervos DAO operation is the path from a deposited cell, to a withdraw step 1 cell, to whatever cell is released in withdraw step 2; any step after that cannot be reliably checked.

If I got it right, this would mean it is not practical to check all the live cells for DAO-related operations.

xxuejie (Collaborator) commented Jun 11, 2020

If I got it right, this would mean it is not practical to check all the live cells for DAO-related operations.

Hmm I don't get what this means.

xxuejie (Collaborator) commented Jun 11, 2020

Also, I've checked the current Nervos DAO UI, and I think our conflict now is how to track all completed Nervos DAO operations under the current account. My gut feeling is that this might also be a task for the cell notifier module. We can think about how best to handle it in lumos, but I don't feel this is a task for lumos-indexer right now.

xxuejie (Collaborator) commented Jun 11, 2020

Upon further thinking, I believe #16 would provide the solution to the problem here. To gather all completed Nervos DAO operations, one just needs to query for all transactions with input_type set to the Nervos DAO script and input_lock set to the specified lock script. There might still be minimal filtering needed on the app side, but that should solve our issues here.
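
Assuming #16 lands with query options along those lines, the query might look roughly like the sketch below; the input_lock / input_type option names simply mirror the wording above and are not a finalized API.

```js
// Hypothetical sketch based on the #16 proposal: collect transactions whose
// inputs use the Nervos DAO type script together with the user's lock script.
async function completedDaoOperations(indexer, userLockScript, daoTypeScript) {
  const collector = new TransactionCollector(indexer, {
    input_lock: userLockScript, // names taken from the proposal above, may change
    input_type: daoTypeScript,
  });
  const txs = [];
  for await (const tx of collector.collect()) {
    txs.push(tx); // minimal app-side filtering may still be needed here
  }
  return txs;
}
```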

katat (Author) commented Jun 17, 2020

Hello @xxuejie

Could you please give an estimate of when skip will be supported for arbitrary page jumps? I would appreciate it if this feature could be prioritized a bit, since we have to handle addresses with a large volume of transactions or live cells in the upcoming release of Neuron.
