Reduce file size of repo #610

Closed
qrush opened this issue Oct 14, 2013 · 46 comments

qrush (Member) commented Oct 14, 2013

A fresh clone of rubygems.org is over 400MB. What can we do to reduce this? @sferik any ideas? :)

[master][~/Dev/rubygems.org] du -skh .
421M    .
skottler (Member) commented Oct 14, 2013

git gc --auto should reduce the size just due to the number of loose objects.

qrush (Member) commented Oct 14, 2013

This was off a fresh git clone. No garbage collection will help here!

skottler (Member) commented Oct 15, 2013

Ah yeah, I missed the 'fresh' part ;-)

sferik (Member) commented Oct 15, 2013

We could keep vendor/cache in a separate repo, which could be deployed as a git submodule. It wouldn't be necessary to update this submodule when doing bundle install, only when doing a bundle update.

Once we make this change, we’ll also need to squash all the commits in the main repo that add or update a gem in vendor/cache.
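
For illustration, a minimal sketch of that submodule wiring, assuming the gems have already been extracted into a separate repo (the rubygems.org-vendor name and URL here are placeholders):

# In the main repo: stop tracking the directory, then re-add it as a submodule.
git rm -r vendor/cache
git submodule add https://github.com/rubygems/rubygems.org-vendor.git vendor/cache
git commit -m "Move vendor/cache into a submodule"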

dreamr commented Nov 5, 2013

I agree. I just started working on some refactoring of rgo, and the initial download is silly. I know there's a division here between people like me, who don't think this kind of thing should be part of the repo, and others who disagree.

Just wanted to chime in.

qrush (Member) commented Nov 28, 2014

We're now over 500MB:

% du -skh .
552M    .

Any ideas on how to make this better?

@qrush qrush added the health label Nov 28, 2014

indirect (Member) commented Nov 28, 2014

The only solution I know of is what @sferik said: keep the gems in a submodule so that its history can periodically be rewritten to drop older gems.

qrush (Member) commented Nov 28, 2014

I think we should try that then. I don't think we should squash the history though. We could use git filter-branch and effectively write out every commit to vendor/cache.
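
A sketch of that split with standard filter-branch flags (untested against this repo; run it on throwaway clones):

# In a clone that will become the vendor-only repo: keep just vendor/cache history.
git filter-branch --prune-empty --subdirectory-filter vendor/cache -- --all

# In the main repo: strip vendor/cache out of every commit instead.
git filter-branch --prune-empty --index-filter \
  'git rm -r --cached --ignore-unmatch vendor/cache' -- --all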

dwradcliffe (Member) commented Nov 28, 2014

This will break history for everyone though, right?

qrush (Member) commented Nov 28, 2014

It sure will, but what's better: having a small repo that helps more people get started while still protecting against broken deploys, or one giant repo?

We could also try git-subtree:

http://ariya.ofilabs.com/2014/07/extracting-parts-of-git-repository-and-keeping-the-history.html

luislavena (Member) commented Nov 28, 2014

@qrush @dwradcliffe have you guys considered git-annex and the different kinds of remotes?

robertodecurnex commented Nov 28, 2014

@qrush trying to make the repo smaller is always good.

On the other hand, I can read between the lines that the repo size may cause deploy problems.

You should probably have archives (git archive branch_name) ready to deploy, ~27MB.

Or use shallow copies of the repo (git clone --depth=1); they are just ~23.64 MB. Shallow repos have improved a lot since Git 1.9: you can still pull/push changes as with a normal repo, so you can keep your current instance updated after the first deploy just by pulling, as usual.

For development, you can clone with --depth=1 and run git fetch --unshallow in parallel; you can start working right away.
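
The three suggestions above as concrete commands (the URL is illustrative; sizes will vary):

git archive -o deploy.tar.gz master                                # history-free snapshot for deploys
git clone --depth=1 https://github.com/rubygems/rubygems.org.git   # shallow clone for a quick start
git -C rubygems.org fetch --unshallow                              # backfill the full history later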

dwradcliffe (Member) commented Nov 28, 2014

We should be ok on deploys, but let's leave the production branch alone until we're sure.

evanphx (Member) commented Nov 29, 2014

I ran git filter-branch --index-filter 'git rm --cached --ignore-unmatch -r vendor/cache' HEAD, then cloned the result with git clone --no-hardlinks rubygems.org rubygems.org.2, and ran git gc and git prune:

zero :: git/rubygems.org.2> git prune
zero :: git/rubygems.org.2> du -sh .
231M    .
qrush (Member) commented Nov 29, 2014

That's still huge. Looks like we should check out some of these other approaches or try a submodule.

evanphx (Member) commented Nov 29, 2014

What would we put in a submodule? At this point, we've accumulated a lot of data, and that's what is contributing to the large history.

Just so we're clear, what is your goal, @qrush? Success isn't measured by repo size; that's just a way to achieve a goal. What is that goal?

indirect (Member) commented Nov 29, 2014

I think what @sferik and I are suggesting is that we use filter-branch to remove vendor/cache from the history of the rubygems.org repo (which brings it down to a very reasonable size for day-to-day development), and then use a git submodule for the vendor/cache directory from that point forward. The submodule will be optional for development (since bundle install can use the lock to deliver the same gems), but it allows deploys to succeed even when rubygems.org is down.

qrush (Member) commented Nov 29, 2014

I'd say the goal is to have a reasonably sized download for people to get started with. Ideally, under 100MB would be a good start.

If we ask people to start working on the site, say at a hack night or anywhere without an awesome bandwidth connection, downloading a 500MB repo would certainly make me think again about contributing.

evanphx (Member) commented Nov 29, 2014

I'll play devil's advocate: have people complained about the size? Has it been an impediment to development? There are certainly ways to manage for a hack party (seeding repos from a USB stick, etc.).

I don't think submodules are a solution, since they'll still need to be downloaded, so there are no net savings on a fresh clone (unless you do some weird submodule depth cloning).

Additionally, I showed that vendor/cache is 200MB of the total, and @qrush feels that the remaining 215MB is still too high. So how will we reduce that other 215MB?

evanphx (Member) commented Nov 29, 2014

For hack nights, doing a depth clone is the solution. The ~20MB it results in can't be beat by any other approach.

robertodecurnex commented Nov 29, 2014

People MUST get used to shallow clones. 500MB+ repos are getting more and more common as time passes, and for collaborators who do not really own the project, having the whole repo makes no sense at all. Maybe adding some guidelines about how to clone with --depth, how to fetch new branches, and even how to --unshallow afterwards would work. The only thing to warn people about is that they should be using git >= 1.9 (as most people are).

Moving vendor/cache out sounds reasonable too, but it doesn't really solve the problem on its own...

indirect (Member) commented Nov 29, 2014

I think recommending git 1.9 and shallow clones to new contributors is a really good idea! That said, I don’t think that “shallow clones exist” is a good argument against reducing the size of the repo if we can. Huge repos make lots of things more painful than they need to be, especially if the reason for the huge repo is completely transient (simply to be able to deploy even if rg.org is down).

evanphx (Member) commented Nov 29, 2014

One way to reduce the size of vendor/cache is simply updating it less often. I know @sferik likes to keep it updated often, but that contributes to the unnecessary churn, and thus the size, of vendor/cache.

We still haven't figured out how to reduce the other 215MB that is not vendor/cache, either.

evanphx (Member) commented Nov 29, 2014

We could also address the need for vendor/cache entirely, namely independence from rubygems.org itself. We could operate a separate gem server (maybe even use one of the existing services) to host the gems needed to run our software, and reference it from the Gemfile.

indirect (Member) commented Nov 29, 2014

I mean, we could even have the deploy process push the .gem files from the machine being deployed from. They have to be there if the bundle install succeeded.
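
One hedged reading of that idea, with a made-up host and path: sync the locally cached gems to the app server during deploy, so deploys never depend on rubygems.org being up.

bundle package   # ensure vendor/cache matches Gemfile.lock
rsync -a vendor/cache/ deploy@app1.example.com:/srv/rubygems.org/shared/vendor/cache/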

robertodecurnex commented Nov 29, 2014

@indirect you are right. The size issue is still there, but I don't think collaboration is a real problem here, nor is the deploy.

I agree 100% that vendor/cache does not belong in this repo. Moving it elsewhere and cleaning up its references will make a big difference. As mentioned before, it will also destroy the current tree, forcing everyone to prune their repos (kind of drastic, but probably worth it; there is actually git replace, which can be used to prevent that [insert creeps here]).

As I first mentioned (on Twitter), and as @evanphx suggests, having a private gem server just mirroring rubygems.org's deps would work fine. Having just the .gem files stored somewhere accessible from the rubygems.org server would also work (you don't really need all the complexity of the specs, since you should have all the dependencies resolved at development time; this is what Heroku does, they just take your Gemfile.lock and use their local gems).

I've even used private apt repos to handle gems without rubygems in the middle (while keeping version constraints).

dwradcliffe (Member) commented Nov 29, 2014

At this point I'd rather not maintain a second gem server for this. The simplest solution is just keeping this in a separate git repo and symlinking during deploy. I don't want to lose the local .gem cache - it makes deploys very very fast and of course helps prevent a circular dependency. :) We can probably even automate the collection of these gems into the new git repo.
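
A sketch of that deploy step, with hypothetical shared paths:

# Keep one shared checkout of the gem-cache repo, then link it into each release.
git -C /srv/rubygems.org/shared/vendor-cache pull origin master
ln -sfn /srv/rubygems.org/shared/vendor-cache /srv/rubygems.org/current/vendor/cache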

evanphx (Member) commented Nov 29, 2014

Putting the gems into another repo won't help the space problem; we'll just have the same data in a different place.

robertodecurnex commented Nov 29, 2014

If you use that repo just for deploys, it's OK. @dwradcliffe means real symlinks: clone the gems into a different directory and link them into the app directory. Done that way (without vendor/cache and without the submodule), you still get the smaller rubygems.org repo. On a dev machine you just clone rubygems.org and run Bundler locally.

evanphx (Member) commented Dec 1, 2014

@qrush I filtered out vendor/ entirely and it doesn't seem to have made a difference.

@arthurnn arthurnn self-assigned this Jan 9, 2015

arthurnn (Member) commented Jan 9, 2015

OK, I think I found the best of both worlds. Let me show the results and explain after:

[arthurnn@ralph arthurnn_rb]$ du -skh .
 11M    .
[arthurnn@ralph arthurnn_rb]$ ls vendor/cache/
[arthurnn@ralph arthurnn_rb]$ more .gitmodules
[submodule "vendor/cache"]
        path = vendor/cache
        url = git@github.com:arthurnn/rubygems.org-vendor.git

You can see that the repo, right after a clone, is only 11M! And vendor/cache is empty, as it is a submodule.
How it works: after you clone the repo, you have two choices (shown as commands below):

  • bundle install, and carry on (assuming you won't be changing gems)
  • git submodule init; git submodule update, which will clone the big (500M) repo, assuming you need it to change gems.
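
As commands, the two choices look roughly like this (sizes as reported above):

git clone https://github.com/arthurnn/rubygems.org.git   # ~11M
cd rubygems.org
bundle install                               # choice 1: gems come via Bundler
git submodule init && git submodule update   # choice 2: fetch the big (500M) gem cache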

That way, people can clone the main repo super fast and start making changes, and we also keep a vendor cache for all gems, with history.

You can test all this locally; clone my fork: https://github.com/arthurnn/rubygems.org.
The gem vendor cache folder is a separate repo, https://github.com/arthurnn/rubygems.org-vendor, in which I kept all the history that touched that folder.

Just for info, the folders I removed from history were:

  • vendor/bundler_gems
  • vendor/gems
  • vendor/rails
  • vendor/plugins
  • vendor/cache

Let me know if you guys like it, plus the pros/cons of this approach. The only con we need to consider is that all the SHAs are now new, and we won't be able to track down any GitHub link or comment that references an old SHA. But that's what happens when we rewrite history; at least we'd be doing it now.

indirect (Member) commented Jan 9, 2015

This is exactly my suggestion, so I am 👍 on it. :)

dwradcliffe (Member) commented Jan 11, 2015

Looks good - I think this is the best solution. Let's get a 👍 from @qrush, @evanphx and @sferik before we do it.

arthurnn (Member) commented Jan 11, 2015

Yes. Also, before rewriting all the history, we could save a fork of this repo, just in case we need it for reference or as a backup.

qrush (Member) commented Jan 16, 2015

So now we have to make sure people git clone --recursive? Every time I've dealt with submodules it's been awfully painful and confusing. But I'd say that is better than a huge repo.

arthurnn (Member) commented Jan 16, 2015

@qrush only people who are going to update gems will have to git clone --recursive, which is probably 15% of devs. The other 85% will install gems using Bundler and don't need the submodule.

qrush (Member) commented Jan 29, 2015

I think this one can be closed now. Big thanks to @arthurnn @dwradcliffe and others who helped make this happen!!

@qrush qrush closed this Jan 29, 2015

phoet (Contributor) commented Jan 29, 2015

@arthurnn 😘

@qrush qrush referenced this issue Jan 29, 2015

Merged

Some cleanup #859

sferik (Member) commented Jan 29, 2015

A new patch version of the bcrypt gem was just released and I just updated rubygems.org and the vendor repository. I just want to make sure I’m following the correct protocol for updating gems. Here is a script of what I did:

git clone git@github.com:rubygems/rubygems.org.git # fast
git clone git@github.com:rubygems/rubygems.org-vendor.git rubygems.org/vendor/cache # slow
cd rubygems.org/
bundle update bcrypt
cd vendor/cache/
git add -u bcrypt-3.1.9.gem
git add bcrypt-3.1.10.gem
git commit -m "Update bcrypt to version 3.1.10"
git push origin master
cd ../../
git add Gemfile.lock
git add vendor/cache
git commit -m "Update bcrypt to version 3.1.10"
git push origin master

Here are the two resulting commits: rubygems/rubygems.org-vendor@a64c6d3 and e0c7549.

Does this all look kosher? Am I missing any steps? Am I performing any unnecessary steps?

If this looks good, I will probably write a small rake task so I don’t have to perform these 11 steps every time I want to update a gem. Once I do that, should I add it to lib/tasks? I could also document this in the wiki.
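
Not the rake task itself, but a hedged shell sketch of the same steps, parameterized by gem name (the script name and the git add -A shortcut are my own; it assumes both repos are cloned as above):

#!/bin/sh
# update_gem.sh GEM -- hypothetical wrapper for the steps above
set -e
gem="$1"
bundle update "$gem"
(cd vendor/cache && git add -A . && git commit -m "Update $gem" && git push origin master)
git add Gemfile.lock vendor/cache
git commit -m "Update $gem"
git push origin master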

arthurnn (Member) commented Jan 29, 2015

@sferik that seems right. However, I am not sure a rake task is necessary. After you have the repo cloned, there are only two extra commands (the commit and push from inside the vendor/cache folder).

This is me doing an end-to-end gem update:
http://showterm.io/3f29906f765569bc48742

Also, I would say we don't need to update the submodule for every gem we update; we could update the submodule only before merging to the production branch. That way, people who create a PR to update a gem won't need to deal with submodules at all.

sferik (Member) commented Jan 29, 2015

Also, I would say we don't need to update the submodule for every gem we update; we could update the submodule only before merging to the production branch.

Okay, we just need to make sure this is part of our deploy process, so the Gemfile.lock and vendor/cache don't get out of sync. /cc @dwradcliffe

dwradcliffe (Member) commented Jan 29, 2015

Yeah, I suppose we could automate the vendor part so no one has to do it manually.

mockdeep (Contributor) commented Feb 4, 2015

One thing I've seen in the past is that it's possible for someone to unwittingly re-introduce the files. I'm not sure if git is better about managing it now, but I think if you were to merge the old history with the new, each commit would end up showing twice in the history, and the files would be back in the repo. When doing this on another project, I removed all of the contributors and made them open a pull request to confirm they were clean before adding them back.
