Skip to content

Conversation

savil
Copy link
Collaborator

@savil savil commented Feb 2, 2023

Summary

Motivation:
From inspecting the segment logs, I am reminded that the userID is logged as a 64 char.

Segment says that ideally this userID is something we track in our main database. Longer term, I'm not sure we'd want our primary userIDs to be TEXT kind in Postgres data type terms, compared to smaller uuid or int variations.

Approach:
This PR amends the UserIDFromGithubUsername function to deterministically
return a UUID string (or empty string if not authenticated or error occurs).

It is important that the same UUID is returned for the same github username.

RFC: Breaking Change
Tradeoff: in Segment we'll not be able to track the same users' activity prior to this PR.

How was it tested?

copied this function into a devbox-project for golang and executed it multiple times
with different usernames.

@savil savil requested a review from loreto February 2, 2023 22:46
Copy link
Collaborator Author

savil commented Feb 2, 2023

Current dependencies on/for this PR:

This comment was auto-generated by Graphite.

@savil savil requested a review from ipince February 2, 2023 23:48
}

hash := hashFromUsername(username)
uid, err := uuid.FromBytes([]byte(hash)[:16])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this will create a UUID with a valid version/variant since all of the bytes are random. This might not matter for Sentry though (I'm not sure why they want a UUID specifically). uuid.NewHash might be a better way of hashing the username into a UUID.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Segment (not sentry) doesn't ask for a UUID. They do recommend choosing a UserID that will be stable for the same user, so we can track the same user over time.

Prior to this PR, I was using just the raw 64-char hash. But I felt that would be too large, and unstructured in case we choose to use that value later on in some internal database (hypothetically). So, I am exploring using a uuid type instead.

Thanks for pointing out the problem about the invalid version/variant. Will look into that.

const githubPrefix = "github:"
mac.Write([]byte(githubPrefix + username))

return hex.EncodeToString(mac.Sum(nil))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could not encode the hash to hex and use the byte slice directly.

@savil savil force-pushed the savil/truncate-user-id branch from 5aa722b to 9e19455 Compare February 6, 2023 19:56
@savil savil merged commit 7f087d4 into main Feb 6, 2023
@savil savil deleted the savil/truncate-user-id branch February 6, 2023 19:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants