Skip to content

fixed bulk recognize has assigned uid bug #2850

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed

fixed bulk recognize has assigned uid bug #2850

wants to merge 3 commits into from

Conversation

relunctance
Copy link

@relunctance relunctance commented Dec 28, 2018

rdf eg:

<0x4c4eab0>     <md5>   "a11aa8ed3695e504517ee1c4ea6f22c4"      .

live load work well , but bulk can't recognize that has been assigned uid "0x4c4eab0" .


This change is Reviewable

@CLAassistant
Copy link

CLAassistant commented Dec 28, 2018

CLA assistant check
All committers have signed the CLA.

@manishrjain
Copy link
Contributor

@codexnull Can you please review this? Two things to check for:

  1. Does Live load do this?
  2. The UID that's been handed, we need to make sure that bulk loader gets an appropriate lease from Zero to include that UID.

@manishrjain
Copy link
Contributor

This change does not work. And any small fixes won't work either. Say user's RDF has a pre-assigned UID X. This leads to 3 cases:

https://photos.app.goo.gl/MMpcjuMT4eEqUomf7

Case 1: We have assigned up to A uids. Bulk loader would in its entire run would assign up to B uids. X >> B, in which case, at the end of bulk loader, we need to ensure that UID lease > X, so we'll need to artificially jump the lease from B -> X, throwing the X - B UIDs on the floor (unused). This is annoying, but not a breaker.

Case 2: X falls between A and B, i.e. A < X < B. In that case, we need to ensure that X is not assigned to any other node in the system. This requires us to keep track of all the Xs (all predefined UIDs), and ensure that they don't get reassigned. This adds complexity.

Case 3: By the time mapper encounters X, A > X. That means X was already reassigned to some other node in the system. We can't go back and undo that assignment. So, bulk loader must now do a pre-pass through all the RDF files, looking for all Xs, before it starts mappers. That's just too much unnecessary work.

Given these cases, we decided not to support this use case, where a user wants to inject preassigned UIDs to bulk loader. In any case, bulk loader assumes a new cluster, so the idea of coming with a pre-assigned UID is wrong, to begin with.

@manishrjain manishrjain closed this Jan 3, 2019
@relunctance
Copy link
Author

Thanks .
Your analysis is very good . many problems can be avoided , It's safe to close it.
But It can use third parties to maintain assigned UID , In this way, The user knows what uid is available.
Maybe it only applies to new clusters, not existing ones to expand group.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants