Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question: Is the ThreadLocalRandom approach considerably safe on a clustered environment? #23

Closed
gustavodaquino opened this issue Jan 10, 2023 · 5 comments
Labels
question Further information is requested

Comments

@gustavodaquino
Copy link

gustavodaquino commented Jan 10, 2023

Hello, first of all I'd like to congratulate the maintainer for this wonderful project.
I'm wondering if, due an unavailability to provide a node identifier, the ThreadLocalRandom approach described on README is significantly safe to generate ids on a small cluster, < 10 nodes, assuring low probability of collision.

// use a random function that returns an array of bytes with a given length
TsidFactory factory = TsidFactory.builder()
    .withRandomFunction(length -> {
        final byte[] bytes = new byte[length];
        ThreadLocalRandom.current().nextBytes(bytes);
        return bytes;
    }).build();

// use the factory
Tsid tsid = factory.create();

My best regards.

@gustavodaquino gustavodaquino changed the title Question: Is the ThreadLocalRandom approach considerably safe in a clustered environment? Question: Is the ThreadLocalRandom approach considerably safe on a clustered environment? Jan 11, 2023
@fabiolimace
Copy link
Member

fabiolimace commented Jan 11, 2023

I don't consider it safe, because collisions can happen even if there are few nodes.

Actually, the problem is not with the random number generator itself (ThreadLocalRandom), but with the unavailability to provide unique node identifiers for each TSID generator.

Let's say today 10 machines are started in a cluster. Each TSID generator will get a random node ID. If all these TSID generators are given different random node numbers from each other, it can be said that no collision will occur. So it's safe for today.

But if those machines are restarted for some reason tomorrow, the node IDs will be randomized again. If at least two TSID generators receive the same node number, say 42, the IDs generated by those two TSID generators can collide during the day.

The class below has a test method that starts 10 threads, assigning a new TSID generator to each thread.

package com.example;

import static org.junit.Assert.assertNull;

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadLocalRandom;
import java.util.logging.Logger;

import org.junit.Test;

import com.github.f4b6a3.tsid.TsidFactory;

public class ClashTest {

    static Logger LOGGER = Logger.getLogger(ClashTest.class.getName());

    private TsidFactory newFactory(int nodeBits) {
        return TsidFactory.builder().withRandomFunction(() -> ThreadLocalRandom.current().nextInt())
                .withNodeBits(nodeBits) // 8 bits: 256 nodes; 10 bits: 1024 nodes...
                .build();
    }

    @Test
    public void test() throws InterruptedException {

        int nodeBits = 8;
        int threadCount = 10;
        int iterationCount = 100_000;

        CountDownLatch endLatch = new CountDownLatch(threadCount);
        ConcurrentMap<Long, Integer> tsidMap = new ConcurrentHashMap<>();

        for (int i = 0; i < threadCount; i++) {

            final int threadId = i;

            // one generator PER THREAD
            TsidFactory factory = newFactory(nodeBits);

            new Thread(() -> {
                for (int j = 0; j < iterationCount; j++) {
                    Long tsid = factory.create().toLong();
                    assertNull("TSID clash detected!", tsidMap.put(tsid, (threadId * iterationCount) + j));
                }

                endLatch.countDown();
            }).start();
        }

        LOGGER.info("Started threads");
        endLatch.await();

        LOGGER.info("Done");
    }
}

The code was adapted from the code implemented by Vlad Mihalcea in one of his excellent articles. You can change the nodeBits, threadCount, iterationCount variables during the tests.

@fabiolimace fabiolimace added the question Further information is requested label Jan 11, 2023
@gustavodaquino
Copy link
Author

gustavodaquino commented Jan 12, 2023

Thanks for the quick reply, @fabiolimace.
I have another doubt about node ids. At Factory builder function, javadoc says the function withNode(Integer node) has a limit defined in 2^nodeBits-1.

Taking into consideration the limits of library, the max value allowed, considering a 20 bit length size is 1048575, but this value is not validated in runtime.

What happens to the Node Identifier used to generate the TSID if the parameter overflows this hard limit? Eg: 1048576, or Int.MAX_VALUE (2.147.483.647), equivalent to 32 bit length? These extra bits are trimmed?

Regards.

@fabiolimace
Copy link
Member

Yes, the most significant extra bits are silently trimmed.

Here is the line that strips off the extra bits: TsidFactory(Builder builder).

In your example, 1048576 becomes 0 because:

1048576 & (2^20-1) == 0

It's the same as mod(1048576, 2^20):

1048576 % 2^20 == 0

Should it throw an exception or should the javadoc be fixed? Or both?

My best regards.

@gustavodaquino
Copy link
Author

I think that a guard clause would avoid error prone by incorrect configurations and avoid sillently behavior to the client.

And there is a precedence, for example withNodeBits function that has it limit validation assuring the correct setup.

But one way or another it could be mentioned in README/JavaDoc.

Regards!!

fabiolimace added a commit that referenced this issue Jan 28, 2023
A guard clause would avoid error prone by incorrect configurations and
avoid sillently behavior to the client.
fabiolimace added a commit that referenced this issue Jan 28, 2023
A guard clause would avoid error prone by incorrect configurations and
avoid sillently behavior to the client.
@fabiolimace
Copy link
Member

Released v5.2.1. 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants