-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Question: Is the ThreadLocalRandom approach considerably safe on a clustered environment? #23
Comments
I don't consider it safe, because collisions can happen even if there are few nodes. Actually, the problem is not with the random number generator itself (ThreadLocalRandom), but with the unavailability to provide unique node identifiers for each TSID generator. Let's say today 10 machines are started in a cluster. Each TSID generator will get a random node ID. If all these TSID generators are given different random node numbers from each other, it can be said that no collision will occur. So it's safe for today. But if those machines are restarted for some reason tomorrow, the node IDs will be randomized again. If at least two TSID generators receive the same node number, say 42, the IDs generated by those two TSID generators can collide during the day. The class below has a test method that starts 10 threads, assigning a new TSID generator to each thread. package com.example;
import static org.junit.Assert.assertNull;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadLocalRandom;
import java.util.logging.Logger;
import org.junit.Test;
import com.github.f4b6a3.tsid.TsidFactory;
public class ClashTest {
static Logger LOGGER = Logger.getLogger(ClashTest.class.getName());
private TsidFactory newFactory(int nodeBits) {
return TsidFactory.builder().withRandomFunction(() -> ThreadLocalRandom.current().nextInt())
.withNodeBits(nodeBits) // 8 bits: 256 nodes; 10 bits: 1024 nodes...
.build();
}
@Test
public void test() throws InterruptedException {
int nodeBits = 8;
int threadCount = 10;
int iterationCount = 100_000;
CountDownLatch endLatch = new CountDownLatch(threadCount);
ConcurrentMap<Long, Integer> tsidMap = new ConcurrentHashMap<>();
for (int i = 0; i < threadCount; i++) {
final int threadId = i;
// one generator PER THREAD
TsidFactory factory = newFactory(nodeBits);
new Thread(() -> {
for (int j = 0; j < iterationCount; j++) {
Long tsid = factory.create().toLong();
assertNull("TSID clash detected!", tsidMap.put(tsid, (threadId * iterationCount) + j));
}
endLatch.countDown();
}).start();
}
LOGGER.info("Started threads");
endLatch.await();
LOGGER.info("Done");
}
} The code was adapted from the code implemented by Vlad Mihalcea in one of his excellent articles. You can change the |
Thanks for the quick reply, @fabiolimace. Taking into consideration the limits of library, the max value allowed, considering a 20 bit length size is 1048575, but this value is not validated in runtime. What happens to the Node Identifier used to generate the TSID if the parameter overflows this hard limit? Eg: Regards. |
Yes, the most significant extra bits are silently trimmed. Here is the line that strips off the extra bits: In your example, 1048576 & (2^20-1) == 0 It's the same as 1048576 % 2^20 == 0 Should it throw an exception or should the javadoc be fixed? Or both? My best regards. |
I think that a guard clause would avoid error prone by incorrect configurations and avoid sillently behavior to the client. And there is a precedence, for example withNodeBits function that has it limit validation assuring the correct setup. But one way or another it could be mentioned in README/JavaDoc. Regards!! |
A guard clause would avoid error prone by incorrect configurations and avoid sillently behavior to the client.
A guard clause would avoid error prone by incorrect configurations and avoid sillently behavior to the client.
Released v5.2.1. 🎉 |
Hello, first of all I'd like to congratulate the maintainer for this wonderful project.
I'm wondering if, due an unavailability to provide a node identifier, the
ThreadLocalRandom
approach described on README is significantly safe to generate ids on a small cluster, < 10 nodes, assuring low probability of collision.My best regards.
The text was updated successfully, but these errors were encountered: