
currentTimeUUID creates duplicates when called at the same point in time #6208

Closed
haaawk opened this issue Apr 16, 2020 · 11 comments
Labels
area/internals

Comments

@haaawk
Contributor

haaawk commented Apr 16, 2020

The issue was reported on StackOverflow here

Internally, currentTimeUUID calls get_time_UUID, which builds a UUID from the current time and a clock_seq_and_node constant that is generated once per shard.

    static UUID get_time_UUID()
    {
        auto uuid = UUID(instance->create_time_safe(), clock_seq_and_node);
        assert(uuid.is_timestamp());
        return uuid;
    }
    const thread_local int64_t UUID_gen::clock_seq_and_node = make_clock_seq_and_node();
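
To make the failure mode concrete, here is a minimal, self-contained sketch (a simplified illustration, not Scylla code) of why a UUID built only from a millisecond timestamp plus a per-shard constant can collide when two coordinators hit the same millisecond with the same node value:

    // Hypothetical illustration, not Scylla code: if two coordinators happen to
    // share the same clock_seq_and_node value and compute a UUID in the same
    // millisecond, the resulting timeuuids are byte-for-byte identical.
    #include <cassert>
    #include <cstdint>
    #include <utility>

    using toy_uuid = std::pair<int64_t, int64_t>; // {timestamp msb, clock_seq_and_node lsb}

    toy_uuid make_time_uuid(int64_t timestamp_msb, int64_t clock_seq_and_node) {
        return {timestamp_msb, clock_seq_and_node};
    }

    int main() {
        const int64_t same_millis = 0x1ebc0de000000000; // same wall-clock millisecond on two nodes
        const int64_t same_node = 1;                    // node field derived only from the shard id
        assert(make_time_uuid(same_millis, same_node) == make_time_uuid(same_millis, same_node));
    }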
@haaawk
Contributor Author

haaawk commented Apr 16, 2020

This issue will affect CDC as well, because CDC generates TimeUUIDs for the clustering keys in the CDC log.

@haaawk haaawk self-assigned this Apr 16, 2020
@haaawk haaawk added this to the 4.1 milestone Apr 16, 2020
@haaawk
Contributor Author

haaawk commented Apr 16, 2020

Fix is here -> #6209

@tgrabiec
Contributor

It's not clear to me what the supposed bug is about. Please clarify in the description.

@haaawk
Contributor Author

haaawk commented Apr 16, 2020

https://stackoverflow.com/questions/61223391/how-to-guarantee-monotonically-increasing-timeuuid-when-selecting-from-scylla/61243270

The user used currentTimeUUID and got 20-40 duplicates per partition.

@tgrabiec
Contributor

I am confused as to what he means by that.

He says:

I tried using currentTimeUUID function and it seems to work (monotonically increasing within the same partition key)

which suggests that currentTimeUUID() doesn't generate duplicates.

Also, createdAt is a clustering key, so queries cannot return duplicates in that column.

@slivne slivne modified the milestones: 4.1, 4.2 May 31, 2020
@haaawk haaawk removed the area/cdc label Jun 1, 2020
@slivne slivne added the area/internals label Jun 1, 2020
@slivne slivne modified the milestones: 4.2, 4.x Jun 1, 2020
@nyh
Contributor

nyh commented Jul 20, 2020

I think you are misunderstanding the problem (or I am misunderstanding you...). The function you quoted,

    static UUID get_time_UUID()
    {
        auto uuid = UUID(instance->create_time_safe(), clock_seq_and_node);
        assert(uuid.is_timestamp());
        return uuid;
    }

doesn't need clock_seq_and_node to change to avoid returning the same id twice... It uses create_time_safe() which ensures that the same uuid is not generated even if called in the same millisecond on the same shard (unfortunately we use millisecond resolution, which sucks for other reasons, but wouldn't break uniqueness). This is the reason for the "safe()" in its name.

So I don't think that the C++ function get_time_UUID() actually has the problem that you think it has.

Another possibility is that our code doesn't call it but calls one of its variants which takes a time argument? Those variants indeed appear broken, because they don't check for ties. Those broken variants should be outright deleted - do you know why they even exist?

I propose that, before you do anything on this issue, you try to write a test which reproduces it (a unit test of the C++ functions, or a dtest or pytest to see the end-to-end problem in CQL), and then you can confirm you understand what the real problem is.
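
For reference, one common way a per-shard "safe" time source of the kind described above avoids handing out the same tick twice is to remember the last value it returned and bump it on a tie. The following is a sketch of that general technique, assumed for illustration; it is not a quote of create_time_safe():

    // General "safe timestamp" idea (assumption for illustration, not the actual
    // create_time_safe() implementation): never return the same tick twice on a
    // shard, even if the wall clock has not advanced since the previous call.
    #include <algorithm>
    #include <chrono>
    #include <cstdint>

    class safe_timestamp_source {
        int64_t _last = 0; // last value handed out on this shard/thread
    public:
        int64_t next() {
            int64_t now = std::chrono::duration_cast<std::chrono::nanoseconds>(
                std::chrono::system_clock::now().time_since_epoch()).count() / 100; // 100ns ticks
            _last = std::max(now, _last + 1); // on a tie (or clock step back), advance by one tick
            return _last;
        }
    };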

@nyh
Contributor

nyh commented Nov 23, 2020

@kostja observed, if I understand correctly, that the error causing the problem reported above isn't in the details mentioned above. Rather, it is in make_node() in utils/UUID_gen.cc.
This function's result should be different on different nodes - e.g., based on a MAC address - but we take the same number on all nodes (just the shard number), and the result is that two nodes that calculate a timestamp at around the same time produce the same UUID.
The code even has a FIXME that this needs to be fixed :-(

@kostja
Contributor

kostja commented Nov 27, 2020

Apart from filling in the host id, the patch should make sure the timeuuid compare respects the host id.

@nyh
Contributor

nyh commented Nov 29, 2020

Apart from filling in the host id, the patch should make sure the timeuuid compare respects the host id.

Do you suspect that compare_visitor (the timeuuid_type_impl& case) in types.cc ignores some parts of the timeuuid? Or by "make sure" do you just mean we need to check and verify that this is the case?

@kostja
Contributor

kostja commented Nov 29, 2020

The logic of the compare function confused me. Apart from being very inefficient, it falls back to a signed compare of the entire UUID if timeuuid_compare_bytes() returns 0. I didn't notice that at first.
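
The comparison order being discussed can be sketched as follows; this is an illustration of the intended ordering (timestamp first, then the remaining bytes, including the node/host id, as a tiebreaker), not the code in types.cc:

    // Illustrative sketch, not the types.cc implementation: compare by the
    // embedded timestamp first, then let the clock-seq-and-node half (which
    // includes the host id) break ties, so equal timestamps from different
    // nodes still order deterministically and distinctly.
    #include <cstdint>

    struct timeuuid_view {
        uint64_t timestamp;      // 60-bit time value extracted from the UUID
        uint64_t clock_and_node; // clock sequence + node/host id half of the UUID
    };

    int compare_timeuuid(const timeuuid_view& a, const timeuuid_view& b) {
        if (a.timestamp != b.timestamp) {
            return a.timestamp < b.timestamp ? -1 : 1;
        }
        if (a.clock_and_node != b.clock_and_node) {
            return a.clock_and_node < b.clock_and_node ? -1 : 1;
        }
        return 0;
    }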

nyh pushed a commit that referenced this issue Jan 20, 2021
Before this patch, the UUID generation code was not creating
sufficiently unique IDs: the 6-byte node identifier was mostly
empty, i.e. it only contained the shard id. This could lead to
collisions between queries executed concurrently at different
coordinators and, since timeuuid is used as the key in list append
and prepend operations, to lost updates.

To generate a unique node id, the patch uses a combination of the
hardware MAC address (or a random number if no hardware address is
available) and the current shard id.

The shard id is mixed into the higher bits of the MAC to reduce the
chance of a NIC collision within the same network.

With sufficiently unique timeuuids as list cell keys, such updates
are no longer lost, but a multi-value update can still be "merged"
with another multi-value update.

E.g. if node A executes SET l = l + [4, 5] and node B executes
SET l = l + [6, 7], the list value could be any of [4, 5, 6, 7],
[4, 6, 5, 7], [6, 4, 5, 7] and so on.

At least we are now less likely to get any value lost.

Fixes #6208.

@todo: initialize UUID subsystem explicitly in main()
and switch to using seastar::engine().net().network_interfaces()

test: unit (dev)
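
A rough sketch of the node-id construction the commit message describes; the function name and exact bit layout here are assumptions for illustration, not the actual patch:

    // Illustration of "shard id mixed into higher bits of MAC" (names and bit
    // positions are assumed, not taken from the Scylla code): pack the 48-bit
    // MAC (or a random 48-bit value when no NIC is available) and fold the
    // shard id into its higher bits.
    #include <array>
    #include <cstdint>

    uint64_t make_node_id(const std::array<uint8_t, 6>& mac, uint16_t shard_id) {
        uint64_t node = 0;
        for (auto b : mac) {
            node = (node << 8) | b;                    // pack the 48-bit MAC
        }
        node ^= static_cast<uint64_t>(shard_id) << 40; // fold the shard id into the high bits
        return node & 0xFFFFFFFFFFFFULL;               // keep 48 bits for the UUID node field
    }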
syuu1228 pushed a commit to syuu1228/scylla that referenced this issue Jan 25, 2021
@DoronArazii DoronArazii modified the milestones: 5.x, 5.0 Jul 14, 2022
@avikivity
Member

Fix present on all active branches, not backporting.
