Map bpo users to GitHub users #11

Closed
ezio-melotti opened this issue Oct 3, 2021 · 15 comments

@ezio-melotti
Member

On bpo, users can specify their GitHub username. If they do so, their bpo issues/comments can be mapped to their GitHub users; however, this only works for users that belong to the "python" organization.

For users with a GitHub username that don't belong to the "python" org and for users that haven't specified their GitHub usernames, a placeholder user (called mannequin) with either their GitHub or bpo username will be created.
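The rule described above can be sketched as a small decision function. This is only an illustration of the behavior as described in this issue, not the migration tool's actual code: `BpoUser`, `resolve_author`, and `org_members` are hypothetical names, and the case-insensitive comparison assumes GitHub logins are matched without regard to case.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class BpoUser:
    """Hypothetical record for a bpo account."""
    bpo_username: str
    # Self-declared on bpo and unverified; None if the user never set it.
    github_username: Optional[str] = None


def resolve_author(user: BpoUser, org_members: Set[str]) -> Tuple[str, str]:
    """Return (displayed_name, kind), where kind is 'mapped' or 'mannequin'.

    org_members is assumed to hold lowercase logins of "python" org members.
    """
    gh = user.github_username
    if gh and gh.lower() in org_members:
        # Declared username belongs to the org: mapped to the real account.
        return gh, "mapped"
    # Otherwise a mannequin is shown, using the GitHub username if one was
    # declared and falling back to the bpo username.
    return (gh or user.bpo_username), "mannequin"
```

With this sketch, a user who declared a GitHub username but is outside the org still ends up as a mannequin, which is the case the list below is concerned with.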

The mannequin will only show the username, so:

  • even if the user exists outside of the "python" org:
    • there is no direct way to know their real name
    • clicking on their username will not open their user page
    • they (probably) will not receive notifications for new comments
    • they won't be able to edit their old comments
  • if the bpo username is used:
    • there is no direct way to know their real name
    • it will not be possible to know their GitHub username (if they have one)
    • they will not receive notifications
    • it might create confusion if the original author comments with a different GitHub username
    • it might create confusion if a different GitHub user exists with the same username

Mannequins can be manually reclaimed after the import, but this might still be impossible if the users don't belong to the org. A possible workaround is to create a new org, add all the bpo users that have a GitHub username to that org (possibly without sending out notifications), perform the import there so that all the users get mapped, then copy all the issues to python/cpython and remove the new org. This might preserve the user mapping even if the users don't belong to the "python" org.

ezio-melotti added this to To do in GitHub migration via automation Oct 3, 2021
@gvanrossum

Do you know (roughly) how many users are in the python org and how many are not?

@ezio-melotti
Member Author

There are ~210 users and ~70 external contributors under the "python" org on GitHub. This should include all the core devs and possibly some staff, triagers, and people working on other projects under the org. On bpo there are ~33k users.

In other words, only core devs will be mapped, and all the other contributors won't be mapped unless we find a solution (GitHub is aware of the issue and looking into it).

@gvanrossum

Thanks for indicating the scale. Creating 33k dummy GitHub users (while surely a drop in the bucket of so many millions of users) seems suboptimal, so I hope GitHub finds a solution. (If they don't, I hope that the dummy users at least link to or provide a copy of that user's public bpo metadata -- IIRC there's at least an optional "real name"? And how bad would it be if they linked to the stated GH username, even if we can't verify that?)

@ezio-melotti
Member Author

As far as I understand, it doesn't create an actual dummy GitHub user -- it just shows the username and a (mannequin) tag next to it. Clicking or hovering on the username neither opens a popup with the user info nor opens the user page. I also tried to set the full name for the mannequin user while importing the data, but the name doesn't show up anywhere.

This is what it looks like:
[image: 20211003--01]

@gvanrossum

Good luck!

@warsaw

warsaw commented Oct 3, 2021 via email

@ezio-melotti
Member Author

I believe this is by design, and that the migration tool was designed with self-hosted GitHub instances in mind. In that situation, all the users should belong to the organization, and users that are not in the org are likely former employees that left the org and should therefore be replaced by mannequins.

GitHub is checking if there's a way to map external users too. I've also been thinking about potential security concerns, and the only thing I can think of is that a bpo user could create issues or write comments on bpo and set their GitHub username on bpo to the username of e.g. a core dev. After the migration it will appear as if the core dev wrote them, but the impersonator won't be able to post as the core dev anymore, and the core dev should be able to edit or delete the old messages. This could also be mitigated by reviewing duplicated GitHub usernames on bpo (in some cases conflating two bpo accounts into one GitHub account might be useful).
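The review of duplicated GitHub usernames could start from something like the sketch below. It assumes the bpo data has been exported as a plain mapping from bpo username to declared GitHub username; the function name and the data shape are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def find_shared_github_usernames(
    bpo_to_github: Dict[str, Optional[str]],
) -> Dict[str, List[str]]:
    """Group bpo accounts by declared GitHub username.

    Returns only the GitHub usernames claimed by more than one bpo account,
    each with the sorted list of bpo usernames claiming it.
    """
    claims: Dict[str, List[str]] = defaultdict(list)
    for bpo_name, gh_name in bpo_to_github.items():
        if gh_name:  # skip accounts without a declared GitHub username
            # Compare case-insensitively (assumption about GitHub logins).
            claims[gh_name.lower()].append(bpo_name)
    return {gh: sorted(names) for gh, names in claims.items() if len(names) > 1}
```

Each flagged entry still needs a human decision: it may be one person with two bpo accounts, or someone claiming a username that is not theirs.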

@gvanrossum

Yeah, I'm pretty sure I have two bpo accounts that I use interchangeably.

@ezio-melotti
Member Author

I discussed with the SC the approach suggested by GitHub, i.e. creating a dummy org, inviting people to join it, waiting 1/2-week, and then performing the import, but even though this should map the accounts properly, it sounds like it might not be very effective, so the idea has been abandoned.

A better approach might be:

  1. encourage people to add their GitHub username on bpo (by adding a banner on top of bpo, writing to python-dev, and possibly by mailing them directly)
  2. add to the migrated issues people that have their GitHub username set (either by directly mentioning them in their messages, or if possible, by adding them to the issue during the migration)

Currently there are ~33k total users, ~8k have a GitHub username set (I have to double-check this), and the remaining ~25k don't. With this approach:

  • the accounts of the ~200 core devs will be properly mapped
  • the accounts of the ~8k+ people with a GitHub username won't be directly mapped, but they will still be added to the issue and receive future notifications. However, they won't be able to reclaim their previous messages and mannequin accounts (since they are not part of the org).
  • the remaining ~25k accounts won't be linked to the new issues and won't receive notifications if new messages are added on GitHub, since we don't know their GitHub username (but they can follow a link from bpo to GitHub if they happen to go back to check bpo)
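As a rough illustration of the "directly mentioning them in their messages" option from step 2, a migrated message could carry a header naming the original author. The header wording and the `render_migrated_comment` function are hypothetical, and the sketch assumes that an @-mention is what would cause GitHub to notify the user.

```python
from typing import Optional


def render_migrated_comment(
    body: str, bpo_username: str, github_username: Optional[str] = None
) -> str:
    """Prefix a migrated bpo message with a line identifying its author.

    If a GitHub username was declared on bpo, it is included as an
    @-mention; otherwise only the bpo username is shown.
    """
    if github_username:
        header = f"Original author on bpo: {bpo_username} (GitHub: @{github_username})"
    else:
        header = f"Original author on bpo: {bpo_username}"
    return f"{header}\n\n{body}"
```

Because the declared username is unverified, a mention built this way could reach the wrong person if the bpo field was filled in incorrectly.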

I'm still waiting to hear back from GitHub to see if there are other options available.

@gvanrossum

gvanrossum commented Oct 18, 2021 via email

@FFY00

FFY00 commented Dec 10, 2021

FWIW, GitHub seems to have rolled out an update that enables this kind of data claim. I believe the API is not publicly documented as they are still iterating on the design, so you should probably reach out to them (I believe you are already in contact, but I may be wrong 😅).

[image: github-data-claim]

@ezio-melotti
Member Author

There is a way to claim contributions, but afaik it only works for members of the organization. Are you a member of the llvm org?

@FFY00

FFY00 commented Dec 10, 2021

Ah, alright. Yes, I am.

@asl

asl commented Dec 15, 2021

A possible workaround is to create a new org,

There is no need for a new org. Just create a new team and invite users there. Our (LLVM) experience shows that mannequin resolution is a manual process (there is no API for this). So, it's a tedious click-enter-click-click process. You do not want this for 33k users :) It would be better to create such a team in advance and start sending invitations. An invitation expires in 7 days, so some users will certainly miss it, but you can send it several times, say, within a month. This will streamline the process heavily.
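To make the resend idea concrete, here is a minimal sketch that computes the dates on which an invitation would be sent so that one is always pending during the window. The 7-day expiry and the roughly one-month window come from the comment above; the function name and the 30-day default are assumptions, and actually sending the invitations is out of scope.

```python
from datetime import date, timedelta
from typing import List


def invitation_schedule(
    start: date, window_days: int = 30, expiry_days: int = 7
) -> List[date]:
    """Return the dates on which to send (or resend) an invitation.

    A new invitation is scheduled each time the previous one would expire,
    for as long as the send date falls inside the window.
    """
    end = start + timedelta(days=window_days)
    dates = []
    current = start
    while current < end:
        dates.append(current)
        current += timedelta(days=expiry_days)
    return dates
```

For a 30-day window with 7-day expiry this yields five send dates per user, which gives a sense of the volume when multiplied by thousands of accounts.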

ezio-melotti self-assigned this Mar 22, 2022
@ezio-melotti
Member Author

Eventually we settled on what I listed here: #11 (comment)

Of the ~35k bpo users, ~8.8k have a linked GitHub username. Their username will be listed in the body of the message, but it's not possible to automatically subscribe them. #12 proposes a solution, and it should be possible to automatically mention them with an action.
Of the 190 users with the iscommitter bit, 131 have a linked GitHub username and 59 don't. The 59 that don't will still be able to reclaim their mannequin after the migration.

GitHub migration automation moved this from To do to Done Mar 22, 2022