Tip Selection - memory efficient algorithm for computing cumulative weights #558

Merged
merged 10 commits into iotaledger:dev from GalRogozinski:cw-algo on May 16, 2018

Conversation

@GalRogozinski
Member

GalRogozinski commented Feb 26, 2018

Context: The whitepaper describes the need to compute cumulative weights on the transactions in order to run the MCMC algorithm.

Problem: There are two versions of the algorithm implemented in the code:

In use - a time- and memory-efficient algorithm that performs a different calculation from the one described in the WP. Each transaction sums the weight of its direct ancestors and adds one for itself, so indirect ancestors may be counted more than once. This can cause weights to grow exponentially.

Solution: A space- and time-efficient algorithm similar to algo (2). If you traverse the subtangle in topological order, you can dispose of sets once they have been used. See https://github.com/alongalky/iota-docs/blob/master/cumulative.md.
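The exact calculation the linked document describes can be sketched like this. This is an illustrative Java sketch with assumed names (plain String ids for transactions, a `Map` of direct approvers), not the PR's code; the memoized recursion is equivalent to processing the subtangle in topological order, and each ancestor lands in a set exactly once, so weights cannot blow up exponentially:

```java
import java.util.*;

public class CumulativeWeight {
    // Weight of a transaction = number of distinct transactions that approve
    // it, directly or indirectly, plus one for the transaction itself.
    public static int weight(String tx, Map<String, List<String>> approvers) {
        return futureSet(tx, approvers, new HashMap<>()).size() + 1;
    }

    // Collect the set of all (direct and indirect) approvers of tx. Using a
    // Set means a transaction reachable along several paths is counted once.
    private static Set<String> futureSet(String tx, Map<String, List<String>> approvers,
                                         Map<String, Set<String>> memo) {
        Set<String> cached = memo.get(tx);
        if (cached != null) return cached;
        Set<String> set = new HashSet<>();
        for (String approver : approvers.getOrDefault(tx, Collections.emptyList())) {
            set.add(approver);
            set.addAll(futureSet(approver, approvers, memo));
        }
        memo.put(tx, set);
        return set;
    }

    public static void main(String[] args) {
        // Diamond: b and c approve a; tip d approves both b and c.
        Map<String, List<String>> approvers = new HashMap<>();
        approvers.put("a", Arrays.asList("b", "c"));
        approvers.put("b", Arrays.asList("d"));
        approvers.put("c", Arrays.asList("d"));
        // d reaches a along two paths but is counted once: weight is 4, not 5.
        System.out.println(CumulativeWeight.weight("a", approvers));
    }
}
```

The double-counting version would give the genesis of the diamond weight 5 (d counted via both b and c); the set-based version gives 4, matching the WP definition.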

public class TipsManager {
public static final int MAX_ANCESTORS_SIZE = 1000;

@alongalky

alongalky Feb 26, 2018

Where does this number come from? Why 1000?

@GalRogozinski

GalRogozinski Feb 26, 2018

Member

This is the max size of the ancestor set. It is limited to reduce the chance of an OutOfMemoryError.
@alon-e and I agreed on it.

Is it the correct number to use?
Frankly I don't know.
If @paulhandy or @th0br0 have something to say about this figure then I am all ears.

@alongalky

alongalky Feb 26, 2018

Is there a memory consumption estimate? That would tell us at least whether we are in the ballpark, or being overly permissive or restrictive.
It could even be a back-of-the-envelope type calculation:
1000 transactions *
size of the subhash *
expected walk depth *
reasonable number of transactions between milestones (say with 100 tx/sec)
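That back-of-the-envelope calculation can be written out as a tiny sketch. The 32-byte subhash size and the count of live sets are assumptions for illustration, not measured figures:

```java
public class MemoryEstimate {
    // Worst-case bytes held by one full ancestor set:
    // MAX_ANCESTORS_SIZE entries, each an (assumed) 32-byte subhash.
    static long bytesPerSet(long maxAncestors, long subHashBytes) {
        return maxAncestors * subHashBytes;
    }

    public static void main(String[] args) {
        long perSet = bytesPerSet(1000, 32); // 32,000 bytes, ~32 KB per set
        long liveSets = 10_000;              // assumed number of unreleased sets
        long totalMb = perSet * liveSets / (1024 * 1024);
        System.out.println(perSet + " bytes per full set, ~" + totalMb
                + " MB if " + liveSets + " sets are live at once");
    }
}
```

Under these assumptions a full set is ~32 KB, so total consumption is dominated by how many sets are alive at once, i.e. how wide the subtangle is.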

@alongalky

alongalky Feb 26, 2018

In addition, what happens when the sets are full? It seems silly to keep allocating ancestor sets; you should just somehow pass "MAX" to save memory and time.

@GalRogozinski

GalRogozinski Mar 6, 2018

Member

hmmm

Let's say the size of a subhash is 32 bytes. 1000 txs => ~32 KB per set.
The total amount of memory is 32 KB * num_of_unreleased_sets.
This depends on how wide the tangle is...

@GalRogozinski

GalRogozinski Mar 8, 2018

Member

Will check on testnet how often we hit the hard limit, and make it configurable.

log.debug("Topological sort done. Start traversing on txs in order and calculate weight");
Map<Buffer, Integer> cumulativeWeights = calculateCwInOrder(txHashesToRate, myApprovedHashes, confirmLeftBehind,
analyzedTips);
log.debug("Cumulative weights calculation done in {} ms", System.currentTimeMillis() - start);

@alongalky

alongalky Feb 26, 2018

Is there a better way to measure times in Java? Maybe logs come with built-in timestamps anyway? You probably want profiling of every function call, not only the cumulative weight calculation.

@GalRogozinski

GalRogozinski Feb 26, 2018

Member

This is the simplest way.
Real profiling will be done with tools like JProfiler or YourKit.

Map<Hash, Collection<Hash>> txToDirectApprovers = new HashMap<>();
stack.push(startTx);
while (CollectionUtils.isNotEmpty(stack)) {

@alongalky

alongalky Feb 26, 2018

I think !stack.isEmpty() is cleaner, and more consistent with the rest of the calls

@GalRogozinski

GalRogozinski Feb 26, 2018

Member

The idea of CollectionUtils is to be defensive and not fall prey to a NullPointerException.

You can say that in this specific code segment stack will never be null, and even if it could be, the line above (stack.push()) would throw an exception first.

However, the reason I use it is purely out of habit, which I believe to be a good one: wherever it is fine to treat a null collection like an empty collection, one should use a null-safe method so we get fewer pesky NullPointerExceptions.
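To illustrate the habit being defended, here is a minimal stand-in for CollectionUtils.isNotEmpty. The real method lives in Apache Commons Collections; this sketch only mirrors its null-safe behavior:

```java
import java.util.*;

public class NullSafeChecks {
    // Stand-in for CollectionUtils.isNotEmpty: a null collection is treated
    // like an empty one instead of triggering a NullPointerException.
    static boolean isNotEmpty(Collection<?> c) {
        return c != null && !c.isEmpty();
    }

    public static void main(String[] args) {
        Deque<String> stack = new ArrayDeque<>();
        System.out.println(isNotEmpty(stack)); // false: empty but non-null
        stack.push("startTx");
        System.out.println(isNotEmpty(stack)); // true
        // Calling isEmpty() on a null reference would throw NPE; the
        // null-safe helper simply reports false.
        System.out.println(isNotEmpty(null));  // false
    }
}
```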

return cumulativeWeights;
}
private LinkedHashSet<Hash> sortTransactionsInTopologicalOrder(Hash startTx) throws Exception {

@alongalky

alongalky Feb 26, 2018

What algorithm are you using? Can you add a comment like // based on DFS algorithm, taken from: https://en.wikipedia.org/wiki/Topological_sorting#Depth-first_search?

@alongalky

alongalky Feb 26, 2018

Also, maybe it makes sense to take this function out of the module so you can write unit tests for it

@GalRogozinski

GalRogozinski Feb 26, 2018

Member

I can add a comment.

A unit test may be a good idea, but currently it is hard to write, because a topological order is not deterministic...

Thinking about this a little more now: I can use a diamond-shaped graph with 4 vertices. It has only two possible orders, and I can test whether one of them occurs.
If you have an idea of how to test this better, please tell me.

Another caveat (unrelated to your comment) to think about is the use of LinkedHashSet.
There was an earlier version of the code where the continue in L#356 was erroneously omitted. The result was multiple calls to add. In that early version a List and a Set were used instead of a LinkedHashSet, so the bug was easily found. However, if that continue is deleted now, the bug will not be so easy to see: the method will return the correct output, but it will be slower due to excessive add calls.

The advantage of using a LinkedHashSet is that it is more memory-efficient than keeping two objects. This method, by the way, is a memory bottleneck.
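The diamond-graph test idea can be sketched as follows, assuming a DFS-based topological sort (the depth-first variant referenced above) and illustrative String ids rather than the project's Hash type. Because the diamond admits exactly two approver-first orders, the assertion stays deterministic:

```java
import java.util.*;

public class TopologicalSortSketch {
    // DFS-based topological sort: a transaction is appended only after all of
    // its direct approvers have been visited, so approvers precede approvees.
    static LinkedHashSet<String> sort(String start, Map<String, List<String>> approvers) {
        LinkedHashSet<String> sorted = new LinkedHashSet<>();
        visit(start, approvers, sorted);
        return sorted;
    }

    private static void visit(String tx, Map<String, List<String>> approvers,
                              LinkedHashSet<String> sorted) {
        if (sorted.contains(tx)) return; // already placed in the order
        for (String approver : approvers.getOrDefault(tx, Collections.emptyList())) {
            visit(approver, approvers, sorted);
        }
        sorted.add(tx); // all approvers are in, so tx may follow them
    }

    public static void main(String[] args) {
        // Diamond: b and c approve a; tip d approves both b and c.
        Map<String, List<String>> approvers = new HashMap<>();
        approvers.put("a", Arrays.asList("b", "c"));
        approvers.put("b", Arrays.asList("d"));
        approvers.put("c", Arrays.asList("d"));
        List<String> order = new ArrayList<>(sort("a", approvers));
        // Only [d, b, c, a] and [d, c, b, a] are valid, so a unit test can
        // assert membership in that two-element set.
        boolean valid = order.equals(Arrays.asList("d", "b", "c", "a"))
                || order.equals(Arrays.asList("d", "c", "b", "a"));
        System.out.println(order + " valid=" + valid);
    }
}
```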

Hash txHash = stack.peek();
if (!sortedTxs.contains(txHash)) {
Collection<Hash> appHashes = getTxDirectApproversHashes(txHash, txToDirectApprovers);
if (CollectionUtils.isNotEmpty(appHashes)) {

@alongalky

alongalky Feb 26, 2018

!appHashes.isEmpty()

@GalRogozinski

GalRogozinski Feb 26, 2018

Member

defensive programming

transaction1.store(tangle);
transaction2.store(tangle);
transaction3.store(tangle);
log.debug("printing transaction in diamond shape \n {} \n{} {}\n {}",

@alongalky

alongalky Feb 26, 2018

Do we need this log in a UT?

@GalRogozinski

GalRogozinski Mar 6, 2018

Member

I think it is helpful to log. If others agree with you, I am fine with removing it.

transaction4 = new TransactionViewModel(getRandomTransactionWithTrunkAndBranch(transaction2.getHash(), transaction3.getHash()), getRandomTransactionHash());
transaction1 = new TransactionViewModel(getRandomTransactionWithTrunkAndBranch(
transaction.getHash(), transaction.getHash()), getRandomTransactionHash());
transaction2 = new TransactionViewModel(getRandomTransactionWithTrunkAndBranch(

@alongalky

alongalky Feb 26, 2018

Maybe it makes sense to factor these out into a function, something like:

generateTangle([[1, 0], [2, 1], [3, 2]])

which returns a store with the three different edges and vertices. This code appears in every test and is hard to read, which makes it difficult to understand the scenario being tested.
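A minimal sketch of that suggestion; generateTangle is the reviewer's hypothetical name, and the integer edge-list representation is an assumption for illustration, not the project's actual fixture API:

```java
import java.util.*;

public class TangleFixture {
    // Hypothetical test helper: each pair [approver, approvee] becomes one
    // edge, so a test scenario reads as a plain edge list instead of repeated
    // TransactionViewModel construction boilerplate.
    static Map<Integer, List<Integer>> generateTangle(int[][] edges) {
        Map<Integer, List<Integer>> approvers = new HashMap<>();
        for (int[] edge : edges) {
            approvers.computeIfAbsent(edge[1], k -> new ArrayList<>()).add(edge[0]);
        }
        return approvers;
    }

    public static void main(String[] args) {
        // The chain from the comment: 1 approves 0, 2 approves 1, 3 approves 2.
        Map<Integer, List<Integer>> tangle =
                generateTangle(new int[][]{{1, 0}, {2, 1}, {3, 2}});
        System.out.println(tangle.get(0)); // direct approvers of vertex 0
    }
}
```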

@GalRogozinski

GalRogozinski Mar 6, 2018

Member

At some point in the future I will rewrite the test to not use random hashes (it is this way now because that is how it was before). Then I will also add this fix.

Assert.assertEquals(ratings.get(transaction1.getHash()).size(), 4);
Assert.assertEquals(ratings.get(transaction2.getHash()).size(), 3);
log.info(String.format("Linear ordered hashes from tip %.4s, %.4s, %.4s, %.4s, %.4s", transaction4.getHash(),

@GalRogozinski

GalRogozinski Feb 26, 2018

Member

I think it helps when the test fails

}
@Test
public void updateRatings2TestWorks() throws Exception {
TransactionViewModel transaction, transaction1, transaction2, transaction3, transaction4;
public void testCalculateCumulativeWeightAlon() throws Exception {

@alongalky

alongalky Feb 26, 2018

Which Alon? :)
I think the name needs to say a bit more

@GalRogozinski

GalRogozinski Feb 26, 2018

Member

It is a unit test with a tangle that @alon-e made up to understand the PR better. I honestly have no idea what name to give it.

Any name you can recommend will be fine.

@alongalky

alongalky Feb 26, 2018

Something to describe what it does.

}
//@Test
// @Test

@alongalky

alongalky Feb 26, 2018

I recommend removing changes to this function, if it's only formatting and comments

//transition probability = ((Hx-Hy)^-3)/maxRating
walkRatings[i] = Math.pow(tipRating - ratings.getOrDefault(tips[i],0L), -3);
walkRatings[i] = Math.pow(tipRating - cumulativeWeights.getOrDefault(subHash,0), -3);
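The transition weighting in the snippet above can be sketched in isolation. This assumes Hx > Hy for every candidate, which holds because an approvee's cumulative weight counts its approvers; names here are illustrative:

```java
public class TransitionProbabilities {
    // p_i proportional to (Hx - Hy_i)^-3, where Hx is the current site's
    // cumulative weight and Hy_i is the i-th approver's. A smaller weight gap
    // (a heavier approver) gets a larger transition probability.
    static double[] probabilities(int hx, int[] hy) {
        double[] p = new double[hy.length];
        double sum = 0;
        for (int i = 0; i < hy.length; i++) {
            p[i] = Math.pow(hx - hy[i], -3);
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) {
            p[i] /= sum; // normalize so the probabilities sum to 1
        }
        return p;
    }

    public static void main(String[] args) {
        // Two approvers with equal weight split the probability evenly.
        double[] p = probabilities(4, new int[]{2, 2});
        System.out.printf("%.2f %.2f%n", p[0], p[1]);
        // A heavier approver (gap 1 vs gap 3) is strongly favored: 1 vs 1/27.
        double[] q = probabilities(4, new int[]{3, 1});
        System.out.println(q[0] > q[1]);
    }
}
```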

@olaf-2

olaf-2 Mar 5, 2018

Performance could be improved by a factor of 20+. See #535

@alongalky

alongalky Mar 6, 2018

I recommend leaving this change out of this PR, it's big enough as it is. It's an unrelated issue.

@GalRogozinski

GalRogozinski Mar 6, 2018

Member

We are supposed to switch to the formula described in the whitepaper (with the alpha). If that doesn't happen soon, we will merge your PR.

@paulhandy

💯

@iotasyncbot iotasyncbot changed the title from Tip Selection - memory efficient algorithm for computing cumulative weights to IRI-301 ⁃ Tip Selection - memory efficient algorithm for computing cumulative weights Apr 17, 2018

@anyong anyong changed the title from IRI-301 ⁃ Tip Selection - memory efficient algorithm for computing cumulative weights to Tip Selection - memory efficient algorithm for computing cumulative weights Apr 22, 2018

@@ -7,6 +7,7 @@
import com.iota.iri.utils.Converter;
import java.io.Serializable;
import java.nio.ByteBuffer;

@Test
public void testCalculateCumulativeWeightDiamond() throws Exception {
TransactionViewModel transaction, transaction1, transaction2, transaction3;

* @param maxSize the max size
* @throws NullPointerException if the specified collection is null
*/
public BoundedHashSet(Collection<? extends E> c, int maxSize) {

public void updateRatings2TestWorks() throws Exception {
TransactionViewModel transaction, transaction1, transaction2, transaction3, transaction4;
public void testCalculateCumulativeWeightAlon() throws Exception {
TransactionViewModel transaction, transaction1, transaction2, transaction3, transaction4, transaction5,

}
@Override
public boolean addAll(Collection<? extends E> c) {
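The capped-set behavior discussed earlier (what happens when the sets hit MAX_ANCESTORS_SIZE) might look roughly like this. A rough sketch in the spirit of the PR's BoundedHashSet, not its actual implementation:

```java
import java.util.*;

public class BoundedHashSetDemo {
    // A HashSet capped at maxSize: once the cap is reached, further adds are
    // silently ignored, bounding the memory held by any single ancestor set.
    public static class BoundedHashSet<E> extends HashSet<E> {
        private final int maxSize;

        public BoundedHashSet(int maxSize) {
            this.maxSize = maxSize;
        }

        @Override
        public boolean add(E e) {
            if (size() >= maxSize) {
                return false; // set is full; drop the element
            }
            return super.add(e);
        }

        @Override
        public boolean addAll(Collection<? extends E> c) {
            boolean changed = false;
            for (E e : c) {
                changed |= add(e); // route through the capped add
            }
            return changed;
        }
    }

    public static void main(String[] args) {
        BoundedHashSet<Integer> set = new BoundedHashSet<>(2);
        set.addAll(Arrays.asList(1, 2, 3));
        System.out.println(set.size()); // the third element was dropped
    }
}
```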

@GalRogozinski GalRogozinski merged commit 086ec3e into iotaledger:dev May 16, 2018

1 check passed

continuous-integration/travis-ci/pr: The Travis CI build passed

@GalRogozinski GalRogozinski deleted the GalRogozinski:cw-algo branch May 16, 2018
