This repository has been archived by the owner on Aug 23, 2020. It is now read-only.

Tip Selection - memory efficient algorithm for computing cumulative weights #558

Merged
Merged 10 commits on May 16, 2018

Conversation

GalRogozinski
Contributor

@GalRogozinski GalRogozinski commented Feb 26, 2018

Context: The whitepaper describes a need to perform cumulative weight calculations on the transactions in order to perform the MCMC algorithm.

Problem: There are two versions of the algorithm implemented in the code:

In use - a time- and memory-efficient algorithm that performs a different calculation than the one described in the WP. Each transaction sums the weights of its direct ancestors and adds itself. This way indirect ancestors may be counted more than once, which can cause the weight to grow exponentially.

Solution: A space- and time-efficient algorithm similar to algorithm (2): if you traverse the subtangle in topological order, you can dispose of the sets you have already used. See https://github.com/alongalky/iota-docs/blob/master/cumulative.md.
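
For readers outside the review, here is a minimal, self-contained sketch of that idea in plain Java (names such as approversOf and CumulativeWeightSketch are hypothetical, this is not the PR's TipsManager code): walk the future cone of the entry point to get a topological order, then consume that order while building each transaction's approver set; the release of spent sets is only noted in a comment.

import java.util.*;

public class CumulativeWeightSketch {

    // approversOf maps a transaction hash to the hashes that directly approve it;
    // in IRI this lookup would come from the tangle storage.
    public static Map<String, Integer> cumulativeWeights(String entryPoint,
                                                         Map<String, List<String>> approversOf) {
        // 1) Topological order of the future cone via iterative DFS:
        //    a transaction is emitted only after all of its approvers.
        Deque<String> stack = new ArrayDeque<>();
        LinkedHashSet<String> order = new LinkedHashSet<>();
        stack.push(entryPoint);
        while (!stack.isEmpty()) {
            String tx = stack.peek();
            String unvisited = null;
            for (String approver : approversOf.getOrDefault(tx, Collections.emptyList())) {
                if (!order.contains(approver)) {
                    unvisited = approver;
                    break;
                }
            }
            if (unvisited != null) {
                stack.push(unvisited);
            } else {
                order.add(stack.pop());
            }
        }

        // 2) Consume the order: each transaction's approver set is the union of its
        //    direct approvers and their already-computed sets; weight = |set| + 1.
        Map<String, Set<String>> approverSets = new HashMap<>();
        Map<String, Integer> weights = new HashMap<>();
        for (String tx : order) {
            Set<String> set = new HashSet<>();
            for (String approver : approversOf.getOrDefault(tx, Collections.emptyList())) {
                set.add(approver);
                set.addAll(approverSets.getOrDefault(approver, Collections.emptySet()));
            }
            approverSets.put(tx, set);
            weights.put(tx, set.size() + 1); // +1 for the transaction itself
            // A real implementation would now dispose of sets that no remaining
            // transaction needs, which is where the memory saving comes from.
        }
        return weights;
    }
}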


public class TipsManager {

public static final int MAX_ANCESTORS_SIZE = 1000;


Where does this number come from? Why 1000?

Contributor Author


This is the maximum size of the ancestor set. It is limited to reduce the chance of an OutOfMemory error.
@alon-e and I agreed on it.

Is it the correct number to use?
Frankly I don't know.
If @paulhandy or @th0br0 have something to say about this figure then I am all ears.


Is there a memory consumption estimate? This will tell us at least whether we are in the ballpark or being overly permissive or restrictive.
It could even be a back-of-the-envelope type calculation:
1000 transactions *
size of the subhash *
expected walk depth *
reasonable number of transactions between milestones (say with 100 tx/sec)


@alongalky alongalky Feb 26, 2018


In addition, what happens when the sets are full? It seems silly to keep allocating ancestor sets; you should just somehow pass "MAX" to save memory and time.

Contributor Author


Hmmm,

Let's say the size of a subhash is 32 bytes. 1000 txs => ~32 KB per set.
The total amount of memory is 32 KB * num_of_unreleased_sets.
This depends on how wide the tangle is...
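
Written out, that estimate is roughly the following (numUnreleasedSets is a hypothetical stand-in for however many ancestor sets are held at once):

long bytesPerSet   = 32L * MAX_ANCESTORS_SIZE;        // 32 B * 1000 ≈ 32 KB per set
long totalEstimate = bytesPerSet * numUnreleasedSets; // grows with how wide the tangle is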

Contributor Author


Will check on testnet how often we hit the hard limit + make it configurable

log.debug("Topological sort done. Start traversing on txs in order and calculate weight");
Map<Buffer, Integer> cumulativeWeights = calculateCwInOrder(txHashesToRate, myApprovedHashes, confirmLeftBehind,
analyzedTips);
log.debug("Cumulative weights calculation done in {} ms", System.currentTimeMillis() - start);


Is there a better way to measure times in Java? Maybe the logs come with built-in timestamps anyway? You probably want profiling of every function call, not only the cumulative weight calculation.

Contributor Author


This is the simplest way.
Real profiling will be done with tools like JProfiler or YourKit.
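
For what it's worth, a monotonic-clock variant of the excerpt above would look like this (a sketch, not part of the PR; System.nanoTime() is unaffected by wall-clock adjustments):

long start = System.nanoTime();
Map<Buffer, Integer> cumulativeWeights = calculateCwInOrder(txHashesToRate, myApprovedHashes,
        confirmLeftBehind, analyzedTips);
log.debug("Cumulative weights calculation done in {} ms",
        (System.nanoTime() - start) / 1_000_000);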

Map<Hash, Collection<Hash>> txToDirectApprovers = new HashMap<>();

stack.push(startTx);
while (CollectionUtils.isNotEmpty(stack)) {


I think !stack.isEmpty() is cleaner, and more consistent with the rest of the calls

Contributor Author


The idea of CollectionUtils is to be defensive and not fall for a NullPointerException.

You can say that in this specific code segment stack will never be null, and even if it could be, the line above (stack.push()) would throw an exception first.

However, the reason I use it is purely out of habit, which I believe to be a good one.
Wherever it is fine to treat a null collection like an empty collection, one should use a null-safe method so we get fewer pesky NullPointerExceptions.
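
For readers unfamiliar with the utility: CollectionUtils.isNotEmpty(c) from Apache Commons Collections behaves like c != null && !c.isEmpty(), so it never throws on null. A minimal illustration (assuming the commons-collections4 artifact):

import org.apache.commons.collections4.CollectionUtils;
import java.util.Collection;

public class NullSafeCheck {
    public static void main(String[] args) {
        Collection<String> missing = null;
        System.out.println(CollectionUtils.isNotEmpty(missing)); // false, no exception
        // System.out.println(!missing.isEmpty());               // would throw NullPointerException
    }
}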

return cumulativeWeights;
}

private LinkedHashSet<Hash> sortTransactionsInTopologicalOrder(Hash startTx) throws Exception {


What algorithm are you using? Can you add a comment like // based on DFS algorithm, taken from: https://en.wikipedia.org/wiki/Topological_sorting#Depth-first_search?


Also, maybe it makes sense to take this function out of the module so you can write unit tests for it

Contributor Author


I can add a comment.

A unit test may be a good idea, but currently it is hard to write.
This is because the topological order is not deterministic...

Thinking about this a little more now,
I can do a diamond-shaped graph with 4 vertices. It will have only two possible orders, and I can test whether one of them occurs (see the sketch below).
If you have an idea of how to test this better, please tell me.
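
A sketch of what that assertion could look like, kept independent of IRI's test helpers (names are hypothetical): for a diamond root <- a, root <- b, {a, b} <- tip, a sort starting from root has exactly two valid orders, assuming the sort lists approvers before the transactions they approve.

List<String> order = new ArrayList<>(sortSubtangleAbove("root")); // hypothetical call under test
List<List<String>> validOrders = Arrays.asList(
        Arrays.asList("tip", "a", "b", "root"),
        Arrays.asList("tip", "b", "a", "root"));
Assert.assertTrue("expected one of the two valid diamond orders but got " + order,
        validOrders.contains(order));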

Another caveat (unrelated to your comment) to think about is the use of LinkedHashSet.
There was an earlier version of the code where the continue in L#356 was erroneously omitted. The result was multiple calls to add. In that early version a List and a Set were used instead of a LinkedHashSet, so the bug was easily found. However, if that continue is deleted now, the bug will not be so easy to see: the method will return the correct output but will be slower due to excessive add calls.

The advantage of using a LinkedHashSet is that it is more memory-efficient than two separate objects. This method, by the way, is a memory bottleneck.

Hash txHash = stack.peek();
if (!sortedTxs.contains(txHash)) {
Collection<Hash> appHashes = getTxDirectApproversHashes(txHash, txToDirectApprovers);
if (CollectionUtils.isNotEmpty(appHashes)) {


!appHashes.isEmpty()

Contributor Author


defensive programming

transaction1.store(tangle);
transaction2.store(tangle);
transaction3.store(tangle);
log.debug("printing transaction in diamond shape \n {} \n{} {}\n {}",


Do we need this log in a UT?

Contributor Author


I think it is helpful to log. If others agree with you, I am fine with removing it.

transaction4 = new TransactionViewModel(getRandomTransactionWithTrunkAndBranch(transaction2.getHash(), transaction3.getHash()), getRandomTransactionHash());
transaction1 = new TransactionViewModel(getRandomTransactionWithTrunkAndBranch(
transaction.getHash(), transaction.getHash()), getRandomTransactionHash());
transaction2 = new TransactionViewModel(getRandomTransactionWithTrunkAndBranch(


Maybe it makes sense to factor these out to a function, something like:

generateTangle([[1, 0], [2, 1], [3, 2]])

which returns a store with the three different edges and vertices. The code appears in every test and is hard to read, which makes it difficult to understand the scenario that is being tested
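
For illustration, such a helper could be sketched on top of the helpers already visible in this diff (the signature and index convention here are assumptions, not code from the PR):

// edges[i] = {trunkIndex, branchIndex} for transaction i + 1; index 0 is the given root.
private TransactionViewModel[] generateTangle(TransactionViewModel root, int[][] edges) throws Exception {
    TransactionViewModel[] txs = new TransactionViewModel[edges.length + 1];
    txs[0] = root;
    root.store(tangle);
    for (int i = 0; i < edges.length; i++) {
        txs[i + 1] = new TransactionViewModel(
                getRandomTransactionWithTrunkAndBranch(txs[edges[i][0]].getHash(), txs[edges[i][1]].getHash()),
                getRandomTransactionHash());
        txs[i + 1].store(tangle);
    }
    return txs;
}

Usage would then read something like generateTangle(root, new int[][]{{0, 0}, {1, 1}, {2, 2}}) for a simple chain, which is closer to the reviewer's [[1, 0], [2, 1], [3, 2]] notation than repeated constructor calls.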

Contributor Author


At some point in the future I will rewrite the test to not use random hashes (it is this way now because this is how it was before). Then I will also add this fix.

Assert.assertEquals(ratings.get(transaction1.getHash()).size(),4);
Assert.assertEquals(ratings.get(transaction2.getHash()).size(), 3);

log.info(String.format("Linear ordered hashes from tip %.4s, %.4s, %.4s, %.4s, %.4s", transaction4.getHash(),


log in test?

Contributor Author


I think it helps when the test fails

}

@Test
public void updateRatings2TestWorks() throws Exception {
TransactionViewModel transaction, transaction1, transaction2, transaction3, transaction4;
public void testCalculateCumulativeWeightAlon() throws Exception {


Which Alon? :)
I think the name needs to say a bit more

Contributor Author


It is a unit test with a tangle @alon-e made up to understand the PR better. I have no idea what name to give it, honestly.

Any name you can recommend will be fine.


Something to describe what it does.

}

//@Test
// @Test


I recommend removing changes to this function, if it's only formatting and comments

//transition probability = ((Hx-Hy)^-3)/maxRating
walkRatings[i] = Math.pow(tipRating - ratings.getOrDefault(tips[i],0L), -3);
walkRatings[i] = Math.pow(tipRating - cumulativeWeights.getOrDefault(subHash,0), -3);


Performance could be improved by a factor of 20+. See #535


I recommend leaving this change out of this PR, it's big enough as it is. It's an unrelated issue.

Contributor Author


We are supposed to switch to the formula described in the whitepaper (with the alpha). If that doesn't happen soon, then we will merge your PR.
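
For reference, the whitepaper's alpha-weighted rule replaces the inverse cube with an exponential. A hedged sketch reusing the names from the excerpt above (alpha is a new parameter that does not exist in this code yet):

// transition probability ~ exp(-alpha * (Hx - Hy)), as described in the whitepaper
walkRatings[i] = Math.exp(-alpha * (tipRating - cumulativeWeights.getOrDefault(subHash, 0)));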

Contributor

@paulhandy paulhandy left a comment


💯

@GalRogozinski GalRogozinski added P1 - Critical Priority - Do this as soon as you can S2 - High High Severity labels Mar 19, 2018
@iotasyncbot iotasyncbot changed the title Tip Selection - memory efficient algorithm for computing cumulative weights IRI-301 ⁃ Tip Selection - memory efficient algorithm for computing cumulative weights Apr 17, 2018
@anyong anyong changed the title IRI-301 ⁃ Tip Selection - memory efficient algorithm for computing cumulative weights Tip Selection - memory efficient algorithm for computing cumulative weights Apr 22, 2018
@GalRogozinski GalRogozinski deleted the cw-algo branch May 16, 2018 18:25
Labels: C-Memory, P1 - Critical (Do this as soon as you can), S2 - High (High Severity), T-Enhancement