This repository has been archived by the owner. It is now read-only.

# Tip Selection - memory efficient algorithm for computing cumulative weights #558

Merged
merged 10 commits into from May 16, 2018

## Conversation

### GalRogozinski commented Feb 26, 2018 • edited by iotasyncbot

Context: The whitepaper describes a need to perform cumulative weight calculations on the transactions in order to run the MCMC algorithm.

Problem: There are two versions of the algorithm implemented in the code. The one in use is a time- and memory-efficient algorithm that performs a different calculation than the one described in the whitepaper: each transaction sums the weights of its direct ancestors and adds one for itself. This way indirect ancestors may be counted more than once, which can cause weights to grow exponentially.

Solution: A space- and time-efficient algorithm similar to algorithm (2). If you traverse the subtangle in topological order, you can dispose of the sets you have already used. See https://github.com/alongalky/iota-docs/blob/master/cumulative.md.
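The traversal-with-disposal idea can be sketched as follows. This is an illustrative version, not the IRI code: the names (`weights`, `approvers`) and the `String`-keyed input shape are assumptions. Each transaction's "future set" is itself plus every transaction approving it, directly or indirectly; its cumulative weight is that set's size. Processing approvers before the transactions they approve lets each set be built from its approvers' sets, and a set can be dropped once every transaction that will ever consume it has been processed — which is what bounds memory.

```java
import java.util.*;

class CumulativeWeightSketch {
    // topoOrder: approvers before the txs they approve (tips first).
    // approvers.get(tx) = txs that directly approve tx.
    static Map<String, Integer> weights(List<String> topoOrder,
                                        Map<String, List<String>> approvers) {
        // how many times each tx's future set will still be consumed
        Map<String, Integer> usesLeft = new HashMap<>();
        for (List<String> as : approvers.values())
            for (String a : as) usesLeft.merge(a, 1, Integer::sum);

        Map<String, Set<String>> futureSets = new HashMap<>();
        Map<String, Integer> weights = new HashMap<>();
        for (String tx : topoOrder) {
            Set<String> future = new HashSet<>();
            future.add(tx);
            for (String a : approvers.getOrDefault(tx, Collections.emptyList())) {
                future.addAll(futureSets.get(a));
                if (usesLeft.merge(a, -1, Integer::sum) == 0)
                    futureSets.remove(a);   // dispose: no later tx needs this set
            }
            futureSets.put(tx, future);
            weights.put(tx, future.size()); // cumulative weight = |future set|
        }
        return weights;
    }
}
```

On a diamond tangle (a genesis approved by two transactions, both approved by one tip), this yields weights 4, 2, 2, 1, and only one intermediate set is alive at a time beyond the frontier.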
self-assigned this Feb 26, 2018
reviewed
```java
public class TipsManager {
    public static final int MAX_ANCESTORS_SIZE = 1000;
```

#### alongalky Feb 26, 2018

Where does this number come from? Why 1000?

#### GalRogozinski Feb 26, 2018 Author Member

This is the max size of the ancestor set. It is limited to reduce the chance of `OutOfMemory` exception.
@alon-e and I agreed on it.

Is it the correct number to use?
Frankly I don't know.
If @paulhandy or @th0br0 have something to say about this figure then I am all ears.
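The PR's commit list includes "create BoundedHashSet". A minimal sketch of how such a cap could be enforced (illustrative, not the IRI source — the class and method names here are assumptions): once the cap is reached, further adds are silently ignored, so a runaway ancestor set cannot exhaust the heap.

```java
import java.util.Collection;
import java.util.HashSet;

class BoundedHashSetSketch<E> extends HashSet<E> {
    private final int maxSize;

    BoundedHashSetSketch(int maxSize) {
        this.maxSize = maxSize;
    }

    boolean isFull() {
        return size() >= maxSize;
    }

    @Override
    public boolean add(E e) {
        // refuse new elements once the cap is hit
        return !isFull() && super.add(e);
    }

    @Override
    public boolean addAll(Collection<? extends E> c) {
        boolean changed = false;
        for (E e : c) changed |= add(e);
        return changed;
    }
}
```

An `isFull()` check also gives callers a cheap way to short-circuit, which connects to the "pass MAX" suggestion below.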

#### alongalky Feb 26, 2018

Is there a memory consumption estimate? That would at least tell us whether we are in the ballpark, or being overly permissive or restrictive.
It could even be a back-of-the-envelope type calculation:
1000 transactions *
size of the subhash *
expected walk depth *
reasonable number of transactions between milestones (say with 100 tx/sec)

#### alongalky Feb 26, 2018 • edited

In addition, what happens when the sets are full? It seems silly to keep allocating ancestor sets; you should just somehow pass "MAX" to save memory and time.

#### GalRogozinski Mar 6, 2018 Author Member

hmmm

Let's say the size of a subhash is 32 bytes. 1,000 txs => ~32 KB per set.
The total amount of memory is 32 KB * num_of_unreleased_sets.
This depends on how wide the tangle is...
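The arithmetic above, written down for concreteness (pure arithmetic, no IRI code; the 500-set width used below is a made-up figure):

```java
class MemoryEstimate {
    // bytes per ancestor set = subhash size * max ancestors
    static long bytesPerSet(long subHashBytes, long maxAncestors) {
        return subHashBytes * maxAncestors;
    }

    // total = per-set cost * number of not-yet-disposed sets
    static long totalBytes(long bytesPerSet, long unreleasedSets) {
        return bytesPerSet * unreleasedSets;
    }
}
```

With 32-byte subhashes and 1,000 ancestors per set, each set is ~32 KB; at, say, 500 simultaneously live sets, that is roughly 15 MB.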

#### GalRogozinski Mar 8, 2018 Author Member

Will check on testnet how often we hit the hard limit + make it configurable

```java
log.debug("Topological sort done. Start traversing on txs in order and calculate weight");
Map cumulativeWeights = calculateCwInOrder(txHashesToRate, myApprovedHashes, confirmLeftBehind, analyzedTips);
log.debug("Cumulative weights calculation done in {} ms", System.currentTimeMillis() - start);
```

#### alongalky Feb 26, 2018

Is there a better way to measure times in Java? Maybe logs come with built-in timestamps anyway? You probably want profiling of every function call, not only the cumulative weight calculation.

#### GalRogozinski Feb 26, 2018 Author Member

This is the simplest way.
Real profiling will be done with tools like JProfile or YourKit.
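One small improvement worth noting: for elapsed-time measurement, `System.nanoTime()` is the more appropriate primitive than `System.currentTimeMillis()`, because it is monotonic and cannot jump backwards if the wall clock is adjusted mid-measurement. A sketch (the helper name is illustrative):

```java
class TimingSketch {
    // measure a task's elapsed time using the monotonic clock,
    // reporting milliseconds for log-friendliness
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

Usage would look like `long ms = TimingSketch.timeMillis(() -> calculateWeights());` — still crude next to JProfiler/YourKit, but safe against clock adjustments.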

```java
Map<Hash, Collection<Hash>> txToDirectApprovers = new HashMap<>();
stack.push(startTx);
while (CollectionUtils.isNotEmpty(stack)) {
```

#### alongalky Feb 26, 2018

I think `!stack.isEmpty()` is cleaner, and more consistent with the rest of the calls

#### GalRogozinski Feb 26, 2018 • edited Author Member

The idea of `CollectionUtils` is to be defensive and not fall for `NullPointerException`.

You can say that in this specific code segment `stack` will never be `null`, and even if it could be, the preceding line (`stack.push()`) would already throw an exception.

However, the reason I use it is purely out of habit, which I believe to be a good one.
Wherever it is fine to treat a `null` collection like an empty collection, one should use a null-safe method so that we get fewer pesky null pointer exceptions.
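For readers unfamiliar with the idiom, this is the contract the null-safe call provides (a stand-in with the same behavior as Apache Commons' `CollectionUtils.isNotEmpty`, written out here for illustration):

```java
import java.util.Collection;

class NullSafe {
    // false for null or empty, true otherwise — never throws
    static boolean isNotEmpty(Collection<?> c) {
        return c != null && !c.isEmpty();
    }
}
```

`NullSafe.isNotEmpty(null)` simply returns `false`, whereas calling `isEmpty()` on a `null` reference would throw `NullPointerException`.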

```java
    return cumulativeWeights;
}

private LinkedHashSet<Hash> sortTransactionsInTopologicalOrder(Hash startTx) throws Exception {
```

#### alongalky Feb 26, 2018

What algorithm are you using? Can you add a comment like `// based on DFS algorithm, taken from: https://en.wikipedia.org/wiki/Topological_sorting#Depth-first_search`?

#### alongalky Feb 26, 2018

Also, maybe it makes sense to take this function out of the module so you can write unit tests for it

#### GalRogozinski Feb 26, 2018 Author Member

I can add a comment.

A unit test may be a good idea, but currently it is hard to write,
because the topological order is not deterministic...

Thinking about this a little more now,
I can do a diamond shaped graph with 4 vertices. It will have only two possible orders and I can test whether one of them occurs.
If you have an idea of how to better test this please tell.
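One way to sidestep the non-determinism entirely is to assert the topological property itself rather than enumerate valid orders. A sketch (illustrative shape, not the IRI test code), assuming approvers are meant to precede the transactions they approve — flip the comparison if the sort emits the opposite direction:

```java
import java.util.*;

class TopoOrderProperty {
    // approvers.get(tx) = txs that directly approve tx (illustrative shape)
    static boolean holds(List<String> order, Map<String, List<String>> approvers) {
        Map<String, Integer> pos = new HashMap<>();
        for (int i = 0; i < order.size(); i++) pos.put(order.get(i), i);
        for (Map.Entry<String, List<String>> e : approvers.entrySet())
            for (String a : e.getValue())
                // every direct approver must appear before its approvee
                if (pos.get(a) >= pos.get(e.getKey())) return false;
        return true;
    }
}
```

This stays deterministic no matter which of the valid orders the sort produces, so the diamond test would not need to enumerate both possibilities.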

Another caveat (unrelated to your comment) to think about is the use of `LinkedHashMap`.
There was an earlier version of the code where the `continue` in L#356 was erroneously omitted. The result was multiple calls to `add`. In that early version a `List` and a `Set` were used instead of `LinkedHashMap`, so the bug was easy to find. However, if that `continue` is deleted now, the bug will not be so easy to see: the method will return the correct output but will be slower due to excessive `add` calls.

The advantage of using `LinkedHashMap` is that it is more memory-efficient than keeping two separate objects. This method, by the way, is a memory bottleneck.
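The iterative DFS being discussed can be sketched like this (illustrative, not the IRI source — the `String`-keyed shape is an assumption): peek at the stack top, push any not-yet-emitted direct approvers, and only emit a transaction once all of its approvers have been emitted. A `LinkedHashSet` gives both O(1) membership checks and insertion order in one object, which is the memory point made above.

```java
import java.util.*;

class TopoSortSketch {
    static LinkedHashSet<String> sort(String startTx,
                                      Map<String, List<String>> approvers) {
        LinkedHashSet<String> sorted = new LinkedHashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(startTx);
        while (!stack.isEmpty()) {
            String tx = stack.peek();
            if (sorted.contains(tx)) {   // duplicate stack entry; already emitted
                stack.pop();
                continue;
            }
            List<String> unvisited = new ArrayList<>();
            for (String a : approvers.getOrDefault(tx, Collections.emptyList()))
                if (!sorted.contains(a)) unvisited.add(a);
            if (unvisited.isEmpty()) {
                sorted.add(stack.pop()); // all approvers emitted first
            } else {
                unvisited.forEach(stack::push);
            }
        }
        return sorted;
    }
}
```

On the diamond graph this emits the tip first and the start transaction last, and omitting the `continue` would, exactly as described, still return a correct order — just with wasted re-checks of already-emitted entries.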

```java
Hash txHash = stack.peek();
if (!sortedTxs.contains(txHash)) {
    Collection<Hash> appHashes = getTxDirectApproversHashes(txHash, txToDirectApprovers);
    if (CollectionUtils.isNotEmpty(appHashes)) {
```

#### alongalky Feb 26, 2018

`!appHashes.isEmpty()`

#### GalRogozinski Feb 26, 2018 Author Member

defensive programming

```java
transaction1.store(tangle);
transaction2.store(tangle);
transaction3.store(tangle);
log.debug("printing transaction in diamond shape \n {} \n{} {}\n {}",
```

#### alongalky Feb 26, 2018

Do we need this log in a UT?

#### GalRogozinski Mar 6, 2018 Author Member

I think it is helpful to log. If others agree with you, I am fine with removing it.

```java
transaction4 = new TransactionViewModel(getRandomTransactionWithTrunkAndBranch(transaction2.getHash(),
        transaction3.getHash()), getRandomTransactionHash());
transaction1 = new TransactionViewModel(getRandomTransactionWithTrunkAndBranch(
        transaction.getHash(), transaction.getHash()), getRandomTransactionHash());
transaction2 = new TransactionViewModel(getRandomTransactionWithTrunkAndBranch(
```

#### alongalky Feb 26, 2018

Maybe it makes sense to factor these out to a function, something like:

```
generateTangle([[1, 0], [2, 1], [3, 2]])
```

which returns a store with the three different edges and vertices. The code appears in every test and is hard to read, which makes it difficult to understand the scenario that is being tested
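A minimal sketch of what such a helper could look like (hypothetical — `generateTangle` does not exist in the PR): it turns `[approver, approvee]` index pairs into an approval map, so a test scenario reads as a list of edges instead of repeated `TransactionViewModel` boilerplate. A real version would build stored transactions instead of bare indices.

```java
import java.util.*;

class TangleFixture {
    // returns approvee -> direct approvers; e.g. {1, 0} means tx 1 approves tx 0
    static Map<Integer, List<Integer>> generateTangle(int[][] edges) {
        Map<Integer, List<Integer>> approvers = new HashMap<>();
        for (int[] e : edges)
            approvers.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
        return approvers;
    }
}
```

`generateTangle(new int[][]{{1, 0}, {2, 1}, {3, 2}})` then describes a simple chain 0 ← 1 ← 2 ← 3 in one readable line.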

#### GalRogozinski Mar 6, 2018 Author Member

At some point in the future I will rewrite the test so it does not use random hashes (it is this way now because that is how it was before). Then I will also add this fix.

```java
Assert.assertEquals(ratings.get(transaction1.getHash()).size(), 4);
Assert.assertEquals(ratings.get(transaction2.getHash()).size(), 3);
log.info(String.format("Linear ordered hashes from tip %.4s, %.4s, %.4s, %.4s, %.4s", transaction4.getHash(),
```

log in test?

#### GalRogozinski Feb 26, 2018 Author Member

I think it helps when the test fails

```java
}

@Test
public void updateRatings2TestWorks() throws Exception {
    TransactionViewModel transaction, transaction1, transaction2, transaction3, transaction4;

public void testCalculateCumulativeWeightAlon() throws Exception {
```

#### alongalky Feb 26, 2018

Which Alon? :)
I think the name needs to say a bit more

#### GalRogozinski Feb 26, 2018 Author Member

It is a unit test with a tangle @alon-e made up to understand the PR better. I have no idea what name to give it, honestly.

Any name you can recommend will be fine.

#### alongalky Feb 26, 2018

Something to describe what it does.

```java
}
//@Test
// @Test
```

#### alongalky Feb 26, 2018

I recommend removing changes to this function, if it's only formatting and comments

reviewed
```java
//transition probability = ((Hx-Hy)^-3)/maxRating
walkRatings[i] = Math.pow(tipRating - ratings.getOrDefault(tips[i], 0L), -3);
walkRatings[i] = Math.pow(tipRating - cumulativeWeights.getOrDefault(subHash, 0), -3);
```

#### olaf-2 Mar 5, 2018

Performance could be improved by a factor of 20+. See #535

#### alongalky Mar 6, 2018

I recommend leaving this change out of this PR, it's big enough as it is. It's an unrelated issue.

#### GalRogozinski Mar 6, 2018 Author Member

We are supposed to switch to the formula described in the whitepaper (with the alpha). If that doesn't happen soon, we will merge your PR.
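For reference, a sketch of the alpha-based rule Gal mentions (illustrative, not IRI code): in the whitepaper the walk steps from x to approver y with probability proportional to exp(-alpha * (H_x - H_y)), where H is cumulative weight, rather than the power law in the current code. alpha tunes randomness: 0 gives a uniform walk, large alpha a nearly deterministic heaviest-subtangle walk.

```java
class TransitionWeights {
    // hx: cumulative weight of the current tx; hy[i]: weights of its approvers.
    // Returns normalized transition probabilities (sums to 1).
    static double[] normalizedWeights(long hx, long[] hy, double alpha) {
        double[] w = new double[hy.length];
        double sum = 0;
        for (int i = 0; i < hy.length; i++) {
            w[i] = Math.exp(-alpha * (hx - hy[i]));
            sum += w[i];
        }
        for (int i = 0; i < w.length; i++) w[i] /= sum;
        return w;
    }
}
```

Heavier approvers (smaller H_x - H_y) get a larger exponent and therefore a larger share of the probability mass.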

approved these changes

### paulhandy left a comment

 💯
``` create safeutils ```
``` 24602b0 ```
``` create BoundedHashSet ```
``` ceada96 ```
``` add apache-commons-collections4 ```
``` 22585aa ```
``` Transaction View Model Implements Equals and Hash Code ```
``` 2de9dba ```
``` Tip Selection - calculating cumulative weight while being memory efficient ```
``` b43446c ```
``` Tip Selection - unit tests for new weight algorithm ```
``` 76293f5 ```
``` Tip Selection - use subhashes to save memory ```
``` 83fdb60 ```
``` move getSubHash to IotaUtils ```
``` 884ca1e ```
``` remove unused import ```
``` b1e5958 ```
force-pushed the iotaledger:dev branch 2 times, most recently from `47d8c4a` to `00b7574` Apr 4, 2018
changed the title Tip Selection - memory efficient algorithm for computing cumulative weights IRI-301 ⁃ Tip Selection - memory efficient algorithm for computing cumulative weights Apr 17, 2018
changed the title IRI-301 ⁃ Tip Selection - memory efficient algorithm for computing cumulative weights Tip Selection - memory efficient algorithm for computing cumulative weights Apr 22, 2018
``` merge dev to cw-algo ```
``` d169028 ```
merged commit `086ec3e` into iotaledger:dev May 16, 2018
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
deleted the GalRogozinski:cw-algo branch May 16, 2018

4 participants