Migrate to Github #422

Closed
ghost opened this issue Aug 29, 2015 · 12 comments

ghost commented Aug 29, 2015

Originally reported by: jaraco (Bitbucket: jaraco, GitHub: jaraco)


@RonnyPfannschmidt and others have on more than one occasion asked about migrating from Bitbucket to Github. I'm creating this ticket to track that proposal and execution for Setuptools.

First, I'm generally in support of migration to Github. The current two-repo system (for supporting Travis-based CI and Github contributions) is clumsy at best, and the dominance of Github is undeniable. When Mercurial was chosen for Setuptools, the choice was made to align better with the existing Distribute repo, to ease the transition for SVN users, and for some of the same reasons that Mercurial was chosen for Python itself. That rationale has little value going forward.

I've previously migrated a couple of projects from Bitbucket to Github including keyring and setuptools_scm. Here's the technique I used:

  1. Using recent versions of Mercurial, Dulwich, and hg-git, create a Git clone of the Hg repository. Verify that heads, branches, etc., are all represented properly.
  2. From Github, request a temporary exemption from the API rate throttling, as anything more than 10 or so issues/comments will hit the rate limits.
  3. Use bitbucket_issue_migration to migrate the issues. If one migrates the issues to a clean repository, the new issues will have the same numbering.
  4. Disable issue tracking on the original repo.
  5. Update references and links in the new and old repositories to direct users as appropriate. Cut a new release to publish these references with the package.
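Step 3 depends on a clean target repository handing out numbers sequentially. GitHub numbers issues and pull requests from one shared counter, so source numbers survive only if issues are imported in ascending order and no source numbers are missing. A minimal Python sketch of that planning step, assuming the source issues are available as a hypothetical `{number: title}` dict; filling gaps with placeholder issues is an assumption for illustration, not something bitbucket_issue_migration is known to do:

```python
def plan_import(issues):
    """Return [(number, title)] in the order issues should be imported.

    `issues` is a hypothetical {source_number: title} dict. A clean target
    repository assigns 1, 2, 3, ... in creation order, so importing in
    ascending order, with a placeholder for every missing number, keeps
    each migrated issue on its original number.
    """
    last = max(issues, default=0)
    plan = []
    for number in range(1, last + 1):
        # Hypothetical placeholder title for numbers absent from the source.
        title = issues.get(number, f"Placeholder for missing issue #{number}")
        plan.append((number, title))
    return plan


def missing_numbers(issues):
    """Source numbers that would need a placeholder to keep numbering aligned."""
    return [n for n in range(1, max(issues, default=0) + 1) if n not in issues]
```

For example, `plan_import({1: "a", 3: "c"})` returns `[(1, "a"), (2, "Placeholder for missing issue #2"), (3, "c")]`. Any pull request opened on the target before the import would also consume a number from the shared counter.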

While this process has been adequate for smaller projects, there are some issues that I suspect cannot be simply ignored in migrating larger projects like Setuptools.

issue attribution and timestamps

My biggest concern is about issue attribution and timestamps. The migration is not lossless. Every issue and comment gets a current timestamp and is attributed to the user under which the migration runs. The migration works around this by adding the original timestamp and a note of the original attribution to the body of the text, which makes these tickets harder to review and comprehend. This degradation of quality is acceptable for trivial repositories, but will be substantially less acceptable for a project like Setuptools with hundreds of open and closed tickets. Can this be improved?
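The workaround described above amounts to prepending a header to each migrated body. A minimal Python sketch, assuming timezone-aware `datetime` values; the header wording mirrors the "Originally reported by" / "Original comment by" lines used on this tracker, and the function names are hypothetical:

```python
from datetime import datetime, timezone


def attribution_header(bitbucket_user, github_user=None, is_comment=False):
    """Header naming the original author, in the style used on this tracker."""
    who = f"{bitbucket_user} (Bitbucket: {bitbucket_user}, GitHub: {github_user or 'Unknown'})"
    if is_comment:
        return f"Original comment by {who}:"
    return f"Originally reported by: {who}"


def attributed_body(body, bitbucket_user, created_on, github_user=None, is_comment=False):
    """Prepend the original author and timestamp to a migrated issue or comment body."""
    if created_on.tzinfo is None:
        # A naive datetime would be silently interpreted as local time.
        raise ValueError("created_on must be timezone-aware")
    stamp = created_on.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    header = attribution_header(bitbucket_user, github_user, is_comment)
    return f"{header}\n(originally posted {stamp})\n\n{body}"
```

The information survives, but only as text: GitHub still shows the migration user and the migration time on every item.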

open issues and pull requests remain in the old project

With hundreds of open tickets in the old project, users will be subscribed to those tickets and not to the new ones. It will take a great deal of effort to get those tickets closed and updated to refer to the migrated copy. Can this be done automatically (reference the migrated ticket, close it if not closed already, and finally disable the issues on the old repo)?
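Assuming the migrated issues keep their numbers (step 3 above), the redirect can be planned mechanically: every old issue gets a comment pointing at its new counterpart, and any issue still open is also closed. A minimal Python sketch that only computes those actions; the set of "closed" status names is an assumption about Bitbucket's issue states, the repository URL is a placeholder, and the Bitbucket API calls that would carry out each action are deliberately left out:

```python
from dataclasses import dataclass

# Assumed set of Bitbucket issue statuses that already count as closed.
CLOSED_STATUSES = {"resolved", "invalid", "duplicate", "wontfix", "closed"}


@dataclass
class OldIssue:
    number: int
    status: str


@dataclass
class RedirectAction:
    number: int
    comment: str
    close: bool


def plan_redirects(old_issues, new_repo_url):
    """One action per old issue: a pointer comment, plus closing it if still open."""
    base = new_repo_url.rstrip("/")
    return [
        RedirectAction(
            number=issue.number,
            comment=f"This issue was migrated to {base}/issues/{issue.number}",
            close=issue.status not in CLOSED_STATUSES,
        )
        for issue in old_issues
    ]
```

Once every action has been applied, issues can be disabled on the old repository.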

closed, anonymous heads

Mercurial allows for closed, anonymous heads in the repository. These heads will likely be omitted from the history when pushing to a Git repo. Perhaps that's acceptable, but it will mean a loss of history.


ghost commented Aug 31, 2015

Original comment by dstufft (Bitbucket: dstufft, GitHub: dstufft):


I'm not sure whether it's worthwhile, but when Go switched to Github they made a little web service that people could authorize, via OAuth tokens, to post issues and comments on their behalf. During the actual migration, the bot would check who created each issue (or comment on an issue); if that person had authorized the web service to post on their behalf, the bot posted as that user, and otherwise it fell back to a global migration user. You can see a few details here - https://groups.google.com/forum/#!topic/golang-dev/sckirqOWepg%5B1-25%5D. I'm not sure whether that tool is open source, or, if it isn't, whether anything like it is, but it's an option that could be explored to make the transition less lossy.
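The fallback described above could look roughly like this. The Go tool's actual code isn't referenced here, so this is a hypothetical Python sketch: `authorized_tokens` stands in for whatever OAuth tokens contributors granted the service, keyed by username, and posts made through the fallback account get an attribution line so the original author isn't lost:

```python
from dataclasses import dataclass


@dataclass
class Post:
    author: str
    body: str


def choose_poster(author, authorized_tokens, bot_token):
    """Return (token, posted_as_author) for a single issue or comment."""
    token = authorized_tokens.get(author)
    if token is not None:
        return token, True
    return bot_token, False


def plan_posts(posts, authorized_tokens, bot_token):
    """Pair each post with the token to use and the body to send."""
    planned = []
    for post in posts:
        token, as_author = choose_poster(post.author, authorized_tokens, bot_token)
        # Only the fallback account needs to say who really wrote the text.
        body = post.body if as_author else f"Originally posted by {post.author}\n\n{post.body}"
        planned.append((token, body))
    return planned
```

This gives real authorship where contributors opted in, but the regular issues API does not accept a creation date, so every post made this way carries the time of the migration.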

ghost commented Mar 4, 2016

Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco):


Issue #484 was marked as a duplicate of this issue.

ghost commented Mar 4, 2016

Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco):


Ronny has indicated that he'll be unable to pursue this issue, so I'll take it back and see what I can make work. Thanks to Donald for the suggestion. I'll look into that and see if it's a viable solution.

Currently, the migration process is also blocked by bitbucket_issue_migration 65 unless I use Ronny's fork or mithrandi's fork.

ghost commented Mar 7, 2016

Original comment by nickchammas (Bitbucket: nickchammas, GitHub: Unknown):


Just a note that may be of interest: I asked GitHub support about whether their new Import tool will be upgraded to also import issues, and I got this in response:

Thanks for reaching out. You could use a new GitHub API [1] endpoint we're working on which was designed with such migrations in mind. It should allow you to preserve the dates of issues and comments that you migrate and it doesn't trigger notifications (so it's not affected by the content-creation abuse rate limits [2]). We wrote up a walkthrough of it in this gist https://gist.github.com/jonmagic/5282384165e0f86ef105. Could you give that a try with a few issues and a test project, and see if it fits your needs?

This API doesn't support setting the author on issues or comments -- all issues and comments will be created with the user doing the import as the author. As I'm sure you understand, allowing you to set any user as the author of an issue or comment would be a security concern. The team is looking into ways to support setting authors if those authors approve of the import, but exposing that in a way that doesn't block the whole import (in case one author doesn't approve or doesn't respond) will take a bit of time (so I don't expect this to be available in the near future). For now, you can add the username of the original author in a short header or footer of the issue/comment body.

[1] https://developer.github.com/v3/

[2] https://developer.github.com/v3/#abuse-rate-limits

When I asked if the migration preserves the original issue author via a note in the issue text, I got this in response:

The migration doesn't do this automatically, no. If you'd like that to happen -- you'll need to add that text to the issue or comment body yourself.
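Based on the walkthrough gist linked above, a request to the preview issue-import endpoint carries an issue and its comments in one JSON document, together with their original dates. A minimal Python sketch that builds (but does not send) such a request; the endpoint path `/repos/{owner}/{repo}/import/issues`, the `golden-comet` preview media type, and the field names are recalled from that preview API and should be checked against the gist, and the owner, repo, and token values are placeholders. Because the author can't be set, the body should already carry any attribution note:

```python
import json
import urllib.request
from datetime import datetime, timezone


def iso8601(dt):
    """Format an aware datetime as a UTC timestamp such as 2015-08-29T12:00:00Z."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def import_payload(title, body, created_at, comments=(), closed_at=None, labels=()):
    """Build one issue-import document; `comments` is [(created_at, body), ...]."""
    issue = {
        "title": title,
        "body": body,
        "created_at": iso8601(created_at),
        "closed": closed_at is not None,
        "labels": list(labels),
    }
    if closed_at is not None:
        issue["closed_at"] = iso8601(closed_at)
    return {
        "issue": issue,
        "comments": [{"created_at": iso8601(at), "body": text} for at, text in comments],
    }


def build_import_request(owner, repo, token, payload):
    """Prepare the POST request without sending it."""
    return urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/import/issues",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github.golden-comet-preview+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the request to `urllib.request.urlopen` would submit it; the import is processed asynchronously, so the response reports a pending import rather than the finished issue.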

ghost commented Mar 7, 2016

Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco):


@stevenk_, @nakatoio: This is the ticket where we're working on the Github migration.

@nickchammas: The bitbucket_issue_migration tool (main fork and mithrandi's fork) does use the issue import API, having the benefits and limitations that Github describes, including the authorship and other details in the body of the comment.

Thinking about the technique dstufft describes, I'm guessing the migration was done by generating the comments through the regular API, which probably lost the original date/time of the comments.

Although I originally sought to retain attribution when restoring issues and their comments, I'm at this point relenting to accept whatever migration can complete with whatever fidelity that migration can muster. If we can get the issues to migrate, the move of code should be straightforward (I'll do that from my Mercurial clone which already has git hashes through the jaraco/setuptools github mirror).

I welcome the help on this effort. Thanks.

ghost commented Mar 17, 2016

Original comment by stevenk_ (Bitbucket: stevenk_, GitHub: Unknown):


I've been looking at this over the past few days.

Since I don't have enough access to bitbucket to perform an export of issues, I've hacked up the main fork of bitbucket_issue_migration to output lightly modified JSON that I then zip up and feed into mithrandi's fork. I've created a new repository on github to import into, which I'll then delete when we're done with it, and I've numbered it, so we can iterate if we need to.

https://github.com/s-t-e-v-e-n-k/test-setuptools-migration-1/issues
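The hand-built export described above can be packaged with the standard library alone. A minimal Python sketch, assuming the importer expects the same layout as Bitbucket's own issue export, i.e. a zip whose `db-1.0.json` member holds the issue and comment lists; that member name and the top-level keys are assumptions to check against the fork being used:

```python
import io
import json
import zipfile


def pack_export(issues, comments, member="db-1.0.json"):
    """Zip issue and comment records into an in-memory export archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(member, json.dumps({"issues": issues, "comments": comments}, indent=2))
    return buffer.getvalue()


def read_export(data, member="db-1.0.json"):
    """Round-trip helper: load the JSON document back out of the archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return json.loads(archive.read(member))
```

Writing the returned bytes to a file opened in `"wb"` mode produces the zipfile to feed into the importer; `read_export` is handy for checking the archive before each iteration.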

ghost commented Mar 17, 2016

Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco):


Steve,

I can grant whatever access you need to work this effectively. I didn't think any access was required at the source for public issue trackers such as setuptools has.

ghost commented Mar 17, 2016

Original comment by stevenk_ (Bitbucket: stevenk_, GitHub: Unknown):


Admin access to the setuptools repository is required to perform an export of the issues. I certainly understand if you're a little concerned to hand that out, which is why I'm happy enough with my hacked together zipfile. I'm more concerned with how my first shot import looks, and what we can do to improve it.

ghost commented Mar 29, 2016

Original comment by stevenk_ (Bitbucket: stevenk_, GitHub: Unknown):


Thanks for granting me access, Jason. I can see now after comparing the exported zipfile from Bitbucket versus my hacked together one that there are a number of differences. I have completed a second import, which is at:

https://github.com/s-t-e-v-e-n-k/test-setuptools-migration-2/issues

Let me know how it looks.

ghost commented Mar 29, 2016

Original comment by jaraco (Bitbucket: jaraco, GitHub: jaraco):


I did a spot check of a few issues, and it generally looks good to me.

One question - what user should we use for performing the migration? I suspect @stevenk_ doesn't want to be a de facto watcher on every Setuptools issue. Also, I've found it confusing when I go to a project and I can't readily distinguish between the comments I made and the comments I imported. I think I'd like to see them imported under a different account, perhaps a "migration" account. Pytest used pytestbot.

To that end, I created bb-migration. I'll send you the password in a PM. If you could run the migration once more using that account to a repo called 'setuptools', and then request a transfer of that repo to the pypa org, I'll handle pushing the code. We'll want to disable issues here in short order, so I'll want to cut a release shortly after the migration to communicate the change and to redirect users to the new site. Any other considerations?

ghost commented Mar 29, 2016

Original comment by stevenk_ (Bitbucket: stevenk_, GitHub: Unknown):


One problem is that you need to enable issues on jaraco/setuptools before I run the import. I think the plan should be to run the export, disable issues here, transfer the repo, and then cut a release. If you'd like to co-ordinate in real-ish time, I'm on IRC as StevenK in #pypa-dev. My only other consideration would be around open pull requests on bitbucket, but it's entirely up to you on how you handle it.

ghost commented Mar 29, 2016

Original comment by stevenk_ (Bitbucket: stevenk_, GitHub: Unknown):


Sorry, I misread your comment. I have created an empty setuptools repository on github, and I will perform a fresh export and migration in the morning, and will comment here when it's complete. I have added you as a collaborator, so you should be able to just push up code when you wish.

https://github.com/s-t-e-v-e-n-k/setuptools is the link.
