Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix accounts_receivable RPC command with sorting and count #3845

Conversation

JerzyStanislawski
Copy link
Contributor

Until now accounts_receivable RPC command retrieved only first count blocks, then sorted them by amount. The correct flow is to get all the receivable blocks, sort them by amount and apply count at the end, which this PR is about.
Related issue: #3801

@clemahieu
Copy link
Contributor

clemahieu commented Jun 27, 2022

I'd like to improve the algorithm we use here since we're in the area.

The first issue I see is all the pending entries are added followed at the end by clamping to the requested size. This consumes memory proportional to the total number of pending entries but we can solve this a more efficient way. Every time we add an item to the map if we check to see if the map has exceeded the requested size we can erase the minimum item in the map.

The second issue is using conditional logic for checking if sorting was requested. This can be replaced with a strategy pattern that only needs to be constructed once and then the rest of the algorithm proceeds simply by iterating all pending entries and pasing them through the strategy object.

It would be something like: ledger pending entry -> strategy -> result map -> boost ptree

I see that across the RPCs we use 3 different ways to sort things, maps, sorting a vector, and sorting directly in the boost ptree. We should use maps unless there's a quantifyable reason not to since they'll generally be the most efficient. Stable sort is guaranteed by std::multimap "The order of the key-value pairs whose keys compare equivalent is the order of insertion and does not change. (since C++11)" https://en.cppreference.com/w/cpp/container/multimap

We would have a class "result_strategy" which has a pure virtual function result_strategy::operator () (std::pair<nano::pending_key, nano::pending_info>). The main rpc handler function would iterate all pending entries and pass in each item. The non-sorted version would simply insert each item in to the ptree. The sorted version would have an std::multimap<nano::pending_key, nano::pending_info> object state and operator () would insert/trim on each call. The sorted destructor would insert all items from the std::multimap in to the boost ptree.

Future work would be making this a template that could be used in the other places we do this same pattern to eliminate duplicate code.

@JerzyStanislawski
Copy link
Contributor Author

Thank you, good points, I'll try to work on that.

@JerzyStanislawski
Copy link
Contributor Author

@clemahieu do we want to introduce sorting field in wallet_receivable?

@JerzyStanislawski JerzyStanislawski force-pushed the rpc_accounts_receivable_sorting_fix branch from e8a3bde to fee4b88 Compare July 1, 2022 04:42
@JerzyStanislawski
Copy link
Contributor Author

PR updated, following your suggestion I used strategy pattern on two layers - for sorting/non-sorting and for response format (which fields to use in response).
I unified all three receivable commands, so that all of them support sorting and the same set of options - source, min_version, offset, sorting, threshold, count. Let me know if this is not desired for some reason and should be reverted.
The next step I think would be unification of tests, possibly covering more scenarios. Just wanted to know your opinion on implementation and unification first.

@JerzyStanislawski
Copy link
Contributor Author

Hi @clemahieu, I know it's been a long time and it's not a priority now, however would you be able to take a look at changes here and comment if this is what you expected?

@dsiganos
Copy link
Contributor

This is a great time to revisit this, just as a new cycle is about to begin.
@JerzyStanislawski , to make it easier to review your code, could you describe your design?
A diagram would be awesome, if possible.

@JerzyStanislawski
Copy link
Contributor Author

JerzyStanislawski commented Dec 27, 2022

Sure, so the issue is related to 3 methods - receivable, accounts_receivable and wallet_receivable, in which we apply pretty similar algorithm. In current solution we iterate through pending blocks and build result tree applying conditional logic based on input parameters. Then we sort the tree (if necessary - according to request parameter). This causes a few problems:

  1. Sorting with count parameter may return incorrect results - when stop processing after count iterations we may miss block with higher amount - that's the essence of this issue.
  2. Performance - iterating and sorting in the end is unnecessary when we can build sorted results at a time of iterating.
  3. Conditional logic - many ifs applied on each iteration.
  4. Code duplication across 3 above methods.

The solution that Colin suggested is based on strategy pattern, so instead of applying conditions on each iteration we define the behaviour in a separate class. In fact I created two levels of strategy - the behavioral one in which we define how we handle the blocks (do we sort them, do we apply count parameter, etc.) - this is result_strategy abstract class. It contains result_format_strategy - the abstract class in which implementations we define a func which describes what to return in result json - this is the second level.

So the RPC method implentations are pretty simple now:

  1. First, create proper result_strategy implementation object based on input parameters.
  • This involves creating result_format_strategy also based on input parameters.
  1. Iterate pending blocks:
  • Check with strategy object if iteration should be continued (depends on count parameter).
  • Check with strategy if block should be skipped according to offset parameter.
  • Apply strategy behaviour to the block.
  • In case of sorting_result_strategy its destructor (end of iterations) takes care of creating result from multimap. Other implementations work directly on result ptree.

This solution solves problems described above. Maybe the disadvantage is the number of additional classes to maintain? Maybe we should move them to separate namespace? Anyway I don't consider this as final PR, I would like to add more test cases but first I wanted to hear your opinion if this is a good direction at all. Hope it's understandable, let me know if you need further clarifications.

@pwojcikdev
Copy link
Contributor

pwojcikdev commented Dec 27, 2022

With the upcoming upgrade to C++20 this feature could be implemented using the new ranges library, it provides all the necessary tools for sorting, filtering and pagination without the need to reimplement our own. This will ease the burden of maintaining our own algorithms for that purpose.

@dsiganos
Copy link
Contributor

@JerzyStanislawski would you like to redo this using the C++20 ranges library as suggested by Piotr?
There is a C++20 branch in development right now:
https://github.com/clemahieu/nano-node/tree/cpp20

If yes, before, before you start coding, let's have a meeting to decide the architecture of the solution, so we are all on the same page.

@JerzyStanislawski
Copy link
Contributor Author

Yeah, happy to rework that. My initial understanding is that we should first filter pending transactions to get confirmed blocks for the account then apply other ranges functions according to input parameters (offset, sorting).
If this is correct I could redo one of the receivable methods and show you for review first, otherwise feel free to setup some meeting.

@dsiganos
Copy link
Contributor

Hi @JerzyStanislawski, the develop branch in now using C++20 and we can start using the ranges library of C++20 to implement this.
Do you have an account in our Nano discord, or some some chat account, where we can discuss this?

@JerzyStanislawski
Copy link
Contributor Author

The plan is to add 2 wrapper classes around pending_store to enable interaction with ranges. They must satisfy concepts required by ranges, something like:

class pending_store_iterator_wrapper
{
public:
    nano::pending_store_iterator_wrapper(nano::store_iterator<nano::pending_key, nano::pending_info> & store_iterator)
        store_iterator (store_iterator)
    {
    }

    nano::pending_store_iterator_wrapper & operator++ ()
    {
        ++store_iterator;
        return *this;
    }
// other methods/constructors/operators to satisfy input_iterator concept
private:
    nano::store_iterator<nano::pending_key, nano::pending_info> store_iterator;
}

class viewable_pending_store
{
public:
    nano::pending_store_iterator_wrapper begin() const;
    nano::pending_store_iterator_wrapper end() const;
// other methods/constructors/operators to satisfy viewable_range concept
private:
    nano::pending_store pending_store;
}

So viewable_pending_store.begin() will internally call pending_store.begin() and construct pending_store_iterator_wrapper from nano::store_iterator<nano::pending_key, nano::pending_info> returned by pending_store.

Once we have that we should be ready to use viewable_pending_store with ranges. We'll probably pipe operations according to input parameters:

  • filter with block_confirmed predicate (always)
  • drop for offset parameter
  • take for count parameter
  • sort if sort parameter
  • transform to construct output

I don't have strict plan on how to use ranges in receivable methods yet, first I'd like to code PoC with wrapper classes to ensure it works.

@pwojcikdev
Copy link
Contributor

pwojcikdev commented Jan 20, 2023

I wouldn't worry about making a dedicated wrapper for each store, we need to have the results in memory anyway for sorting and serialization. Simply creating a std::vector { store.begin(key), store.end() } should be fine in this case.

@pwojcikdev
Copy link
Contributor

pwojcikdev commented Jan 20, 2023

And if you want to adapt an existing store to work with ranges, maybe using the standard subrange class will do, like this auto view = std::ranges::subrange(store.begin(key), store.end())

@JerzyStanislawski
Copy link
Contributor Author

There is a lot of overhead with adapting so I'd be happy to avoid that, but can we afford that in terms of performance?
Correct me if I'm wrong but I think loading vector and then ranges operations would iterate all elements twice, while with adapting it would be once. Sorting is exception - with this it's always iterated twice.
I'm aware this gain is not big in comparison with loading to memory, but still... Please confirm.

@pwojcikdev
Copy link
Contributor

pwojcikdev commented Jan 20, 2023

Using the std::ranges::subrange should have the same behavior as your adapter class without writing any new code, I haven't tested it, but looking at the documentation it generates an adapter view without loading elements in memory. So I would go with this first. The overhead of iterating elements that are already in memory is minuscule in comparison to loading those from disk in the first place, so I definitely wouldn't worry about iterating the vector a second time either.

The rest of the code (filtering/sorting/transforming) should be the same regardless of the choice here, so it should be possible to easily profile those two approaches and see if there is any meaningful difference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants