Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch auto-generated IDs to Flake IDs from random UUIDs #7531

Closed
wants to merge 8 commits into from

Conversation

Projects
None yet
5 participants
@mikemccand
Copy link
Contributor

commented Sep 1, 2014

Flake IDs give better lookup performance in Lucene since they share
predictable prefixes (timestamp).

Closes #5941

This PR starts from @GaelTadh's original PR (#6004) and just folds in the last round of feedback ... I think it's ready?

GaelTadh and others added some commits Apr 28, 2014

Use a timestamp/macaddress based UUID
Using SecureRandom as a UUID generator is slow and doesn't allow us
to take adavantage of some lucene optimizations around ids with common
prefixes. This commit will allow us to use a
timestamp64bit-macAddr-counter UUID. Since the macAddr may be shared
among several nodes running on the same hardware we use an xor of the
macaddr with a SecureRandom number generated on startup.

See #5941
Use a timestamp/macaddres/sequence number as the uuid for objects wit…
…hout an incoming id.

Wire up the timestampUUID generator to indexing.

See #5941
Add time based UUIDs.
Incorporate some of the changes from @kimchy and @s1monw.
Move the UUID generators into their own classes and provide a common
interface as a first step to moving them under a singleton. Use a better
method of getting the mac address and fall back to a secure random address
if it fails. Add tests to test conccurency and shared prefix integrity of
UUIDs. Use PaddedAtomicLongs to hold the sequence number and lasttime.
Check to see if a time slip has occured as described by @s1monw in a CAS loop.
Next step is to move the impls under a singleton.

See #5941
Support for timebased flakelike uuids.
Reduce number of time bytes to 6 reducing total number of bytes to 20.
Validate that we have a mac address that contains data to avoid getting
addresses that are just 00:00:00:00:00:00 which can happen on virtualized
machines. Remove use of ByteBuffer on puts to reduce overhead. Add code to
attempt to prevent time slips.

See #5941
Remove unneeded variable.
Simplify mac address validation routing and remove unneed variable.
/** These are essentially flake ids (http://boundary.com/blog/2012/01/12/flake-a-decentralized-k-ordered-unique-id-generator-in-erlang) but
* we use 6 (not 8) bytes for timestamp, and use 3 (not 2) bytes for sequence number. */

class TimeBasedUUID implements UUIDGenerator {

This comment has been minimized.

Copy link
@pickypg

pickypg Sep 1, 2014

Member

Odd that it does not end in Generator, which makes it seem like a generated UUID (same for RandomBasedUUID).

return RANDOM_UUID_GENERATOR.getBase64UUID(random);
}

public static String base64UUID() {

This comment has been minimized.

Copy link
@pickypg

pickypg Sep 1, 2014

Member

Because people are not meant to depend on the underlying implementation, what should make someone choose randomBase64UUID versus base64UUID? I feel like the answer should be documented somewhere (potentially as JavaDoc since it's a utility).

@pickypg

This comment has been minimized.

Copy link
Member

commented Sep 1, 2014

@mikemccand LGTM. Just minor fluff.

@mikemccand mikemccand added the review label Sep 1, 2014

@mikemccand

This comment has been minimized.

Copy link
Contributor Author

commented Sep 1, 2014

Thanks @pickpg I pushed a new commit...


public class MacAddressProvider {

private static final ESLogger logger = Loggers.getLogger("MacAddressProvider");

This comment has been minimized.

Copy link
@jpountz

jpountz Sep 2, 2014

Contributor

Should it be Loggers.getLogger(MacAddressProvider.class) for consistency with other classes?


private static final ESLogger logger = Loggers.getLogger("MacAddressProvider");

public static byte[] getMacAddress() throws SocketException {

This comment has been minimized.

Copy link
@jpountz

jpountz Sep 2, 2014

Contributor

Can it be made private?

if (!nint.isLoopback()) {
//Pick the first non loopback address we find
byte[] address = nint.getHardwareAddress();
if (address != null && address.length == 6) {

This comment has been minimized.

Copy link
@jpountz

jpountz Sep 2, 2014

Contributor

maybe call isValidAddress instead so that we don't end up picking an address here that is rejected later on?


public static byte[] getMacAddress() throws SocketException {
Enumeration<NetworkInterface> en = NetworkInterface.getNetworkInterfaces();
while (en.hasMoreElements()) {

This comment has been minimized.

Copy link
@jpountz

jpountz Sep 2, 2014

Contributor

Maybe we should check for null before enumerating, NetworkInterface javadocs suggest that it can return null if no interfaces were found.

byte[] mungedBytes = new byte[6];
SecureRandomHolder.INSTANCE.nextBytes(mungedBytes);
for (int i = 0; i < 6; ++i) {
mungedBytes[i] = (byte) (0xff & (mungedBytes[i] ^ address[i]));

This comment has been minimized.

Copy link
@jpountz

jpountz Sep 2, 2014

Contributor

or just mungedBytes[i] ^= address[i];?

/** Puts the lower numberOfLongBytes from l into the array, starting index pos. */
private static void putLong(byte[] array, long l, int pos, int numberOfLongBytes) {
for (int i=0; i<numberOfLongBytes; ++i) {
array[pos+numberOfLongBytes-i-1] = (byte)( l >> (i*8) & 0xff);

This comment has been minimized.

Copy link
@jpountz

jpountz Sep 2, 2014

Contributor

Can we use an unsigned shift and remove the mask?

} else if (lastTimestamp.compareAndSet(last, timestamp)) {
break;
}
}

This comment has been minimized.

Copy link
@jpountz

jpountz Sep 2, 2014

Contributor

I might be wrong, but imagine that the clock is set to long in the future and then back again the the present. last will be greater than the current timestamp for a long time so timestamp will be always reset to last. And if we index more than 16M documents in that period of time then we will have duplicate identifiers?

This comment has been minimized.

Copy link
@jpountz

jpountz Sep 2, 2014

Contributor

Maybe the bug is that

timestamp++;

should be replaced with

timestamp = Math.max(timestamp, last) + 1;

to ensure that the timestamp actually increases when sequenceId wraps?

This comment has been minimized.

Copy link
@jpountz

jpountz Sep 2, 2014

Contributor

Maybe this method should be synchronized so that we can ensure that ids are actually monotonically increasing? The code in the synchronized block should be very fast (we can put the call to System.currentTimeMillis outside of it) so this would very unlikely be a bottleneck?

This comment has been minimized.

Copy link
@mikemccand

mikemccand Sep 2, 2014

Author Contributor

Thanks for all this great feedback @jpountz

Maybe this method should be synchronized so that we can ensure that ids are actually monotonically increasing?

+1, I'll do that.

@jpountz jpountz added review and removed review labels Sep 2, 2014

@mikemccand

This comment has been minimized.

Copy link
Contributor Author

commented Sep 2, 2014

Thanks @jpount, I pushed a new commit folding in your feedback...

@mikemccand mikemccand added the review label Sep 2, 2014

@jpountz

This comment has been minimized.

Copy link
Contributor

commented Sep 2, 2014

LGTM

@jpountz jpountz removed the review label Sep 2, 2014

@mikemccand mikemccand closed this in 9c1ac95 Sep 2, 2014

mikemccand added a commit that referenced this pull request Sep 2, 2014

Use Flake IDs instead of random UUIDs when auto-generating id field
Flake IDs give better lookup performance in Lucene since they share
predictable prefixes (timestamp).

Closes #7531

Closes #6004

Closes #5941

mikemccand added a commit that referenced this pull request Sep 8, 2014

Use Flake IDs instead of random UUIDs when auto-generating id field
Flake IDs give better lookup performance in Lucene since they share
predictable prefixes (timestamp).

Closes #7531

Closes #6004

Closes #5941

@clintongormley clintongormley changed the title Switch auto-ids to Flake IDs from random UUIDs Indexing: Switch auto-ids to Flake IDs from random UUIDs Sep 10, 2014

@clintongormley clintongormley changed the title Indexing: Switch auto-ids to Flake IDs from random UUIDs Switch auto-generated IDs to Flake IDs from random UUIDs Jun 6, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.