Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TreeMap impl based on AVL-tree #154

Merged
merged 57 commits into from
Jun 2, 2020
Merged

TreeMap impl based on AVL-tree #154

merged 57 commits into from
Jun 2, 2020

Conversation

sergey-melnychuk
Copy link
Contributor

@sergey-melnychuk sergey-melnychuk commented May 16, 2020

TreeMap based on AVL-tree

Method Complexity (worst case)
get/contains_key O(1) // UnorderedMap lookup
insert/remove O(log(N))
min/max O(log(N))
lower/higher O(log(N))
range of K elements O(Klog(N))

Closes #146

@sergey-melnychuk
Copy link
Contributor Author

sergey-melnychuk commented May 17, 2020

@nearmax , please check out the draft (please note it is a draft: I'll catch up with all TODOs) - does it look like what's expected?

Heap incapsulates indices, thus HeapMap only maps keys to values, letting Heap do all housekeeping of min-heap. In final version insert & remove will be O(log(N)), and iterator will be O(Nlog(N)), as promised.

Half of CI checks fail (for env: NEAR_RELEASE=false) with error HostError(GasLimitExceeded), am I doing something wrong? :)

UPDATE: I also did some refactorings/cleanups (like function extraction to avoid code duplication), hope it is OK.

@MaksymZavershynskyi
Copy link
Contributor

Hey @sergey-melnychuk ! Thank you for working on it. Sorry for not reviewing it yesterday, I will review it today.

@sergey-melnychuk sergey-melnychuk marked this pull request as ready for review May 18, 2020 18:35
@sergey-melnychuk sergey-melnychuk changed the title [WIP] SortedMap based on Heap HeapMap impl May 18, 2020
@sergey-melnychuk
Copy link
Contributor Author

@nearmax , no problem, PR is ready for review now.

Copy link
Contributor

@MaksymZavershynskyi MaksymZavershynskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address the comments before merging.

near-sdk/src/collections/heap_map.rs Outdated Show resolved Hide resolved
near-sdk/src/collections/heap_map.rs Outdated Show resolved Hide resolved
near-sdk/src/collections/heap_map.rs Outdated Show resolved Hide resolved
near-sdk/src/collections/heap.rs Outdated Show resolved Hide resolved
near-sdk/src/collections/heap.rs Outdated Show resolved Hide resolved
near-sdk/src/collections/heap.rs Outdated Show resolved Hide resolved
near-sdk/src/collections/heap_map.rs Outdated Show resolved Hide resolved
Copy link

@SkidanovAlex SkidanovAlex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: I didn't initially realize that this is a PR for an accepted gitcoin proposal. Editing out parts of the comment which are irrelevant in the context.

What is the rationale behind using the heap?

If we are OK sorting each time we call to iter (which we are likely not), why not just keep a map and a vector, and sort it using whatever sorting algorithm whenever one calls to iter?

Depending on the API that we want to expose, the ordered map generally needs to allow one or both of the following:

  1. Iterate over the first (few) elements in the sorted order. The current implementation is not usable for that, because such an iterator is expected to work in O(k log n) to fetch k elements, not O(n log n)
  2. Iterate over the (few) elements after some key (lower_bound or range in most languages in which sorted dicts are implemented). Again, the current implementation doesn't support it.

If we want to support either of the scenarios above (and I'm sure we do want to support (1)), we should implement an AVL or a RedBlack tree or something along the lines.

If we actually just want to support iterating that sorts the entire collection we should just have a vector and a map, and sort each time we call to iter. It will be significantly simpler and cleaner than the Heap.

Using a heap is certainly not the right tool for this problem? Unless I'm missing something :)

@MaksymZavershynskyi
Copy link
Contributor

MaksymZavershynskyi commented May 19, 2020

I made a mistake when reviewing gitcoin proposal. I did remember the requirements from the partners, but I forgot, that it is not possible to iterate heap while sorting it (in log N per step) -- which is how I interpreted the proposal initially.

Since there is a value in Heap and HeapMap, for the cases when there are order books, but not when entire order book needs to be retrieved, I suggest we keep this code and payout the bounty since it was my fault. But then my next bounty would be to implement AVL, RB tree or BTree.

@MaksymZavershynskyi
Copy link
Contributor

@sergey-melnychuk

Half of CI checks fail (for env: NEAR_RELEASE=false) with error HostError(GasLimitExceeded), am I doing something wrong? :)

Your code is fine. We broke our CI recently: #156

Copy link
Contributor

@MaksymZavershynskyi MaksymZavershynskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Undoing the approval, since @SkidanovAlex found a bug when the same key is inserted twice.

@SkidanovAlex
Copy link

Heap::insert doesn't check if the key is already present, and HeapMap::insert doesn't do this check either. Thus, if the same key is inserted twice, it will be twice in the elements and once in the indices. If it is then removed, it will be once in elements and zero times in indices. At this point looking up by key will report the key missing, but iterating will report it present, i.e. the map will be inconsistent.

@MaksymZavershynskyi
Copy link
Contributor

@sergey-melnychuk We will be happy to pay out the bounty, given your code matches your proposal that we accepted. Unfortunately, we haven't realized at the time of acceptance that the heap is not a suitable data structure for the ordered map, and will not be merging the implementation in, seeking further to have a sorted map implementation.
Alternatively, we will also be happy to pay extra $250 ($750 total) if you want to implement the proper sorted map, based on AVL, RedBlack, or some other sort of self-balancing trees.
Also note that your Heap implementation was not covering a rather trivial corner case. Should you decide to implement the balanced tree, we will expect higher attention to details

@sergey-melnychuk
Copy link
Contributor Author

sergey-melnychuk commented May 19, 2020

@nearmax , @SkidanovAlex , thank you for the review, I will look into it today.

This is the initial approach, prototype, so work only starts here, not ends :) I was trying to prototype quickly - to get feedback and start for next iteration(s).

To summarize:

  • double-insertion bug - my bad, thanks for pointing out, I will fix it today (fix/test must be simple).
  • requirements: now desired features are more or less defined, so I can address them:
    • O(Klog(N)) for first K keys: I believe it is possible, heap seems like a good fit.
    • floor/ceil queries - unexpected in current impl, thus O(Nlog(N)).
  • O(Nlog(N)) for sorted view without the heap is totally valid point, any array/vector would do.

Next impl must fit these:

  • min/max queries in O(log(N)) worst case
  • floor/ceil (find closes key above/below) queries in O(log(N)) worst case
  • insert/remove/lookup in O(log(N)) worst case
  • iterate keys in sorted order in O(Nlog(N)) worst case

IMHO AVL is the best fit in performance/complexity tradeoff. Let me come up with a prototype of AVL based on Maps.

@MaksymZavershynskyi
Copy link
Contributor

@sergey-melnychuk

IMHO AVL is the best fit in performance/complexity tradeoff. Let me come up with a prototype of AVL based on Maps.

Thank you, AVL would be the best. And sorry for the mess up.

@sergey-melnychuk
Copy link
Contributor Author

@nearmax , nothing to be sorry about - it is much easier to clarify requirements in front of a prototype, than out of thin air :)

Added TreeMap skeleton, I believe in a few days I will make it ready for review. It looks promising, most operations seem to fit O(log(N)).

@sergey-melnychuk sergey-melnychuk changed the title HeapMap impl TreeMap impl based on AVL-tree May 20, 2020
@sergey-melnychuk sergey-melnychuk changed the title TreeMap impl based on AVL-tree [WIP] TreeMap impl based on AVL-tree May 21, 2020
@MaksymZavershynskyi
Copy link
Contributor

MaksymZavershynskyi commented May 29, 2020

@mikhailOK Is this PR good to go, can we merge it?

near-sdk/src/collections/tree_map.rs Outdated Show resolved Hide resolved
near-sdk/src/collections/tree_map.rs Outdated Show resolved Hide resolved
near-sdk/src/collections/tree_map.rs Outdated Show resolved Hide resolved
near-sdk/src/collections/tree_map.rs Outdated Show resolved Hide resolved
@MaksymZavershynskyi
Copy link
Contributor

@SkidanovAlex @mikhailOK LGTMed this PR. Is there anything from your side you want to comment on?

@SkidanovAlex
Copy link

The AVL tree looks good to me with all the new tests.

Can we remove Heap and the HeapMap? AVL tree is a strict superset of functionality.

Let's also get LGTM from Mikhail.

@MaksymZavershynskyi
Copy link
Contributor

The AVL tree looks good to me with all the new tests.

Can we remove Heap and the HeapMap? AVL tree is a strict superset of functionality.

Let's also get LGTM from Mikhail.

AFAIU heap has smaller constant in O(log N) than AVL, so it is beneficial to keep around. @mikhailOK already LGTMed, see above.

@mikhailOK
Copy link
Contributor

Heap is kind of weird right now, it's a priority queue but that also doesn't allow duplicate values. Make sense to remove it or make private until we decide what we want from it.

@MaksymZavershynskyi
Copy link
Contributor

@sergey-melnychuk Could you remove the heap? After that we will merge and I will release the bounty.

@sergey-melnychuk
Copy link
Contributor Author

@nearmax , heap is removed - I'll submit separate PR with priority queue, then you can decide if it is useful for you.

@MaksymZavershynskyi
Copy link
Contributor

@SkidanovAlex could you change your review to "approved" so that I don't force merge?

@MaksymZavershynskyi
Copy link
Contributor

@sergey-melnychuk Thank you very much for spending your time on implementing this collection together with us! I have paid out the bounty, please let us know if there any issues with payments. We are very satisfied with the result and hoping for the future collaboration together with you.

@sergey-melnychuk
Copy link
Contributor Author

@nearmax , Thank you, nice to know! Please feel free to ping me directly on GitHub if you need my help.

As promised, submitting a PR with Priority Queue: #170

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement ordered map for near sdk
4 participants