
Hanging for transactionCollector#collect to finish all the yields #3

Closed
katat opened this issue May 23, 2020 · 18 comments · Fixed by #15

Comments

katat commented May 23, 2020

I am running a test that queries all the transactions with the lock args 0xdde7801c073dfb3464c7b1f05b806bb2bbb84e99 using the transactionCollector.collect() generator, but it seems to take a very long time to finish. I started the process 10+ minutes ago and it still has not finished.

Meanwhile, I noticed the CKB node seems to be busy handling the requests initiated by lumos, as it is using 100% of one CPU core.

Any ideas on what is happening behind the scenes?

Code: https://github.com/katat/lumos-performance-test/blob/2afa9f043e37865f97c3e042933076b1d707aae1/index.js#L41
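
For reference, the relevant part of that test looks roughly like the sketch below. This is a minimal reconstruction, not a copy of the linked file: the RPC URL, data path, and exact queryOptions shape are assumptions on my side.

```js
// Minimal sketch of the failing pattern: drain every transaction matching a
// lock script through TransactionCollector#collect.
const { Indexer, TransactionCollector } = require("@ckb-lumos/indexer");

const indexer = new Indexer("http://127.0.0.1:8114", "./indexed-data");
indexer.startForever();

async function countTransactions() {
  const collector = new TransactionCollector(indexer, {
    lock: {
      // standard secp256k1/blake160 lock
      code_hash:
        "0x9bd7e06f3ecf4be0f2fcd2188b23f1b9fcc88e5d4b65a8637b17723bbda3cce8",
      hash_type: "type",
      args: "0xdde7801c073dfb3464c7b1f05b806bb2bbb84e99",
    },
  });
  let count = 0;
  // collect() yields every matching transaction; this loop is what keeps
  // running for a very long time when the address has many transactions.
  for await (const tx of collector.collect()) {
    count += 1;
  }
  return count;
}
```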

katat changed the title from "Take a very long time for transactionCollector#collect to finish all the yields" to "Hanging for transactionCollector#collect to finish all the yields" on May 23, 2020
xxuejie added a commit that referenced this issue Jun 11, 2020
xxuejie (Collaborator) commented Jun 11, 2020

This shall be resolved in #15

xxuejie added a commit that referenced this issue Jun 11, 2020
katat (Author) commented Jun 11, 2020

Thanks @xxuejie

I had a quick look at the PR. I guess the major change is to turn the function that fetches the transaction hashes into an iterator, which should optimize the init time of transactionCollector#collect and make the data initialization more efficient.

Not sure if I got it right. Although this change can improve the init time of that function, for my example described above I estimate it would still take at least 2 hours to finish loading all the transactions.

xxuejie (Collaborator) commented Jun 11, 2020

You got it right, and yes, fetching all of the transactions will take a long time. But my question is: in what scenario do we need all 1.6 million transactions at once? I would personally assume the main use case here is to display the relevant transactions for a lock script, where you would naturally use paging to show only a small part of them first. The iterator is designed for this pattern.
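
As a hedged illustration of that pattern, a UI could pull just one page's worth of entries off the iterator and stop, instead of draining it; `pageSize` and the collector variable are placeholders for the sketch.

```js
// Sketch: consume only one page of results from the async generator, then
// stop; the remaining transactions are never fetched from the indexer.
async function firstPage(collector, pageSize = 10) {
  const page = [];
  for await (const tx of collector.collect()) {
    page.push(tx);
    if (page.length >= pageSize) break; // stop early instead of draining
  }
  return page;
}
```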

katat (Author) commented Jun 11, 2020

It is because we need to cache all the transactions in our current database (SQLite), so we can do flexible queries for the application's needs, such as querying for dead cells. If fetching all the transactions for some big wallets takes more than 2 hours, that is much longer than the time the indexer takes to index all the CKB blocks.

The other reason is the convenience of pagination when the data is cached in a database like SQLite. Suppose there are 100 pages of transactions; it is straightforward to get any one of the pages with a single SQL query, without loading all the data and filtering. I am not sure how to do such a page jump with this generator function.
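
For illustration, a page jump against such a SQLite cache is a single query; the table and column names below are made up for the sketch, and better-sqlite3 is just one possible driver.

```js
// Sketch: jump straight to an arbitrary page of cached transactions with one
// LIMIT/OFFSET query. Schema is illustrative only.
const Database = require("better-sqlite3");
const db = new Database("./cache.sqlite");

function fetchPage(lockArgs, page, pageSize = 10) {
  return db
    .prepare(
      `SELECT tx_hash, block_number
         FROM transactions
        WHERE lock_args = ?
        ORDER BY block_number DESC
        LIMIT ? OFFSET ?`
    )
    .all(lockArgs, pageSize, page * pageSize);
}
```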

xxuejie (Collaborator) commented Jun 11, 2020

I think we need to talk about your specific use cases. It is really an anti-pattern to do indexing on your own when you are already using lumos-indexer; keeping a second index such as SQLite is not something we would recommend here. You mentioned querying for dead cells; do you mind elaborating on what exactly is needed there?

Page jumps shouldn't be hard to support. For arbitrary jumps, lumos-indexer can provide another query parameter such as skip, which allows you to jump directly to, say, page 100 and go from there. In that sense I'd say there will be 2 querying patterns:

  1. When you go through consecutive pages, a single iterator will do the job.
  2. When you jump to non-consecutive pages, you can simply create a new iterator with the skip query option.

Of course it is possible to always create a new iterator, but the fundamental data structures used in lumos-indexer (and I believe the same holds for SQLite) are optimized for sequential reads, so using pattern 1 when you can will give better performance.
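
As a sketch of pattern 2, assuming a skip option is added to the queryOptions: the option name and shape below are the proposal here, not an existing lumos-indexer API.

```js
// Hypothetical sketch: jump to an arbitrary page by creating a fresh collector
// with a proposed `skip` query option, then reading one page from it.
async function jumpToPage(indexer, lockScript, page, pageSize = 10) {
  const collector = new TransactionCollector(indexer, {
    lock: lockScript,
    skip: page * pageSize, // proposed option, not yet available at this point
  });
  const results = [];
  for await (const tx of collector.collect()) {
    results.push(tx);
    if (results.length >= pageSize) break;
  }
  return results;
}
```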

katat (Author) commented Jun 11, 2020

It would be very useful to support a skip query parameter for the page jump. I am looking forward to it.

I think we need to talk about your specific use cases. It is really an anti-pattern to do indexing on your own when you are already using lumos-indexer; keeping a second index such as SQLite is not something we would recommend here. You mentioned querying for dead cells; do you mind elaborating on what exactly is needed there?

Sure, let me try to introduce some of the existing use cases here:

  1. In Neuron's DAO page, it needs to show both the deposits and the completed transactions, which requires querying dead cells.
  2. Neuron has a mechanism to support one address per transaction. Once an address has been used in a transaction, it is removed from the address pool for future transactions, and new addresses are created and put into the pool. This requires knowledge of the transaction history in order to properly organize the address pool.
  3. There are quite a number of places that need to filter cells by the combination of type script and lock script (a sketch follows this list). This kind of query is much more convenient and efficient with a SQL database.
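
For use case 3, the kind of query I mean looks roughly like the sketch below in lumos terms; whether the current CellCollector accepts both filters together like this is an assumption on my side.

```js
// Sketch: collect live cells matching both a lock script and a type script.
// Assumes queryOptions accepts `lock` and `type` at the same time.
const { CellCollector } = require("@ckb-lumos/indexer");

async function collectByLockAndType(indexer, lockScript, typeScript) {
  const collector = new CellCollector(indexer, {
    lock: lockScript,
    type: typeScript,
  });
  const cells = [];
  for await (const cell of collector.collect()) {
    cells.push(cell);
  }
  return cells;
}
```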

xxuejie (Collaborator) commented Jun 11, 2020

  1. If you are showing each individual transaction, the dead cells will be there, won't they? In addition, I think it's safe to assume those pages have pagination: you only show some number of transactions, not all of them. I think this is totally doable in the current design.
  2. I feel like this task is better suited to the notifier mechanism we talked about elsewhere. It should not be the job of an indexer.
  3. I personally don't feel those queries are complicated enough to require SQL; with some basic and/or logic, we should be able to support them without much problem.

katat (Author) commented Jun 11, 2020

If you are showing each individual transaction, the dead cells will be there, won't they? In addition, I think it's safe to assume those pages have pagination: you only show some number of transactions, not all of them. I think this is totally doable in the current design.

Then this would mean it needs to fetch all the transactions with the generator, which would lead to the potential performance issue we discussed above.

xxuejie (Collaborator) commented Jun 11, 2020

I don't see why you need to fetch all the transactions. If your page shows 10 entries, you can just fetch 10 transactions. You may also need to fetch the transaction behind each input cell in those 10 transactions, but that is it. A second page would need 10 more transactions, but that is also just another page. You won't need all of them to show the initial results.
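
To make the "also fetch the transaction behind each input cell" step concrete, here is a hedged sketch that talks to the CKB node's get_transaction RPC directly over JSON-RPC; the shape of the yielded transaction objects (transaction.inputs, previous_output.tx_hash) follows the standard CKB RPC format and is an assumption for the sketch.

```js
// Sketch: for one page of transactions, also fetch the transaction behind
// every input cell via the CKB `get_transaction` JSON-RPC call.
const fetch = require("node-fetch");

async function getTransaction(rpcUrl, txHash) {
  const res = await fetch(rpcUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      id: 1,
      jsonrpc: "2.0",
      method: "get_transaction",
      params: [txHash],
    }),
  });
  return (await res.json()).result;
}

async function fetchInputTransactions(rpcUrl, pageOfTxs) {
  const inputTxs = [];
  for (const tx of pageOfTxs) {
    for (const input of tx.transaction.inputs) {
      inputTxs.push(await getTransaction(rpcUrl, input.previous_output.tx_hash));
    }
  }
  return inputTxs;
}
```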

katat (Author) commented Jun 11, 2020

I don't see why you need to fetch all the transactions. If your page shows 10 entries, you can just fetch 10 transactions. You may also need to fetch the transaction behind each input cell in those 10 transactions, but that is it. A second page would need 10 more transactions, but that is also just another page. You won't need all of them to show the initial results.

In the context of the DAO page, it needs knowledge of the full transaction history in order to know which transactions have been performed for the deposit/withdraw/unlock operations and display them in the application. I am not sure how pagination could help in this case.

xxuejie (Collaborator) commented Jun 11, 2020

which transactions have been performed for the deposit/withdraw/unlock operations and display them in the application

I assume you mean which cells, not transactions. To me this part is quite simple:

  • From the transaction structure, you already know the state of an output cell, such as deposited or in withdraw step 1.
  • For each output cell, there is only one possible action: a deposited cell can only move to withdraw step 1, and a withdraw step 1 cell can only be withdrawn. It's also quite simple to tell whether the action has been performed: if the cell in question is still a live cell, the action has not yet been done; if it is no longer live, the action has already been performed.

Now the question becomes whether there is a way to tell if an OutPoint represents a live cell, and that is a really simple operation we can support in the indexer.

If I didn't miss anything, this workflow solves our problem here.
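
A hedged sketch of that "is this OutPoint still live" check, done directly against the CKB node's get_live_cell JSON-RPC call until the indexer exposes something equivalent:

```js
// Sketch: ask the CKB node whether an OutPoint is still a live cell.
// `status` is "live" for live cells; anything else means the cell has been
// consumed (or was never seen by the node).
const fetch = require("node-fetch");

async function isLiveCell(rpcUrl, outPoint) {
  // outPoint: { tx_hash: "0x...", index: "0x0" }
  const res = await fetch(rpcUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      id: 1,
      jsonrpc: "2.0",
      method: "get_live_cell",
      params: [outPoint, false], // false: don't return the cell data
    }),
  });
  const { result } = await res.json();
  return result.status === "live";
}
```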

katat (Author) commented Jun 11, 2020

Yes, the check will mainly be on the cells.

I am just thinking about a scenario in which the state transitions of the cell are as below:

  1. cell(deposited) => 2. cell(withdrawal) => 3. cell(unlocked) => 4. cell(input of a payment) => 5. cell

Assuming the cell at 5 is a live cell, how can we know its history of DAO operations?

xxuejie (Collaborator) commented Jun 11, 2020

First question: how is this a history of DAO operations? 3 -> 4 -> 5 can never be reliably checked. All you can check in a Nervos DAO operation is the path from a deposited cell, to a withdraw step 1 cell, to whatever cell is released in withdraw step 2; any step after that cannot be reliably checked. So the history we are talking about here has an upper bound of 3 generations, which can be fetched without problems.

katat (Author) commented Jun 11, 2020

3 -> 4 -> 5 can never be reliably checked. All you can check in a Nervos DAO operation is the path from a deposited cell, to a withdraw step 1 cell, to whatever cell is released in withdraw step 2; any step after that cannot be reliably checked.

If I got it right, this would mean it is not practical to check all the live cells for DAO-related operations.

xxuejie (Collaborator) commented Jun 11, 2020

If I got it right, this would mean it is not practical to check all the live cells for DAO-related operations.

Hmm I don't get what this means.

xxuejie (Collaborator) commented Jun 11, 2020

Also, I've checked the current Nervos DAO UI, and I think our conflict now is how to track all completed Nervos DAO operations under the current account. My gut feeling is that this might also be a task for the cell notifier module. We can think about how best to handle it in lumos, but I don't feel this is a task for lumos-indexer right now.

xxuejie (Collaborator) commented Jun 11, 2020

Upon further thinking, I believe #16 would provide the solution to the problem here. To gather all completed Nervos DAO operations, one just needs to query for all transactions with input_type set to the Nervos DAO script and input_lock set to the specified lock script. There might still be minimal filtering needed on the app side, but that should solve our issues here.
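
Assuming #16 lands with query options along those lines, the query might look roughly like the sketch below; the input_lock / input_type option names simply mirror the wording above and are not a finalized API.

```js
// Hypothetical sketch based on the #16 proposal: collect transactions whose
// inputs use the Nervos DAO type script together with the user's lock script.
async function completedDaoOperations(indexer, userLockScript, daoTypeScript) {
  const collector = new TransactionCollector(indexer, {
    input_lock: userLockScript, // names taken from the proposal above, may change
    input_type: daoTypeScript,
  });
  const txs = [];
  for await (const tx of collector.collect()) {
    txs.push(tx); // minimal app-side filtering may still be needed here
  }
  return txs;
}
```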

katat (Author) commented Jun 17, 2020

Hello @xxuejie

Could you please give an estimate of when skip will be supported for arbitrary page jumps? I would appreciate it if this feature could be prioritized a bit, since we have to handle addresses with a large volume of transactions or live cells in the upcoming release of Neuron.
