Proxy version #138

Merged
merged 24 commits into from Nov 21, 2013

reggieb
Member

@reggieb reggieb commented Nov 7, 2013

I'd like to bump the version to 0.12.0

This new version adds the facility to configure Geminabox as a RubyGems proxy. That is, in proxy mode, if a client requests a gem that is not stored locally, Geminabox will try to get the gem from RubyGems instead. If successful, the gem will be stored locally, so in proxy mode Geminabox also acts as a RubyGems cache.

Proxy mode is switched off by default, and can be switched on by either:

Setting RUBYGEMS_PROXY to true in the environment:

RUBYGEMS_PROXY=true rackup

Or in config.ru (before the run command), set:

Geminabox.rubygems_proxy = true
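
For illustration only, the two switches could be reconciled roughly as follows. `ProxyConfig` and its methods are hypothetical stand-ins, not Geminabox's actual code; only the `RUBYGEMS_PROXY` variable and the `rubygems_proxy` setting name come from the description above. The sketch assumes an explicit setting should win over the environment.

```ruby
# Hypothetical sketch, not Geminabox's real implementation.
module ProxyConfig
  class << self
    # Explicit setting, e.g. assigned in config.ru.
    attr_writer :rubygems_proxy

    # Proxy mode is off unless switched on explicitly or via the environment.
    def rubygems_proxy?(env = ENV)
      return @rubygems_proxy unless @rubygems_proxy.nil?

      env["RUBYGEMS_PROXY"].to_s.strip.downcase == "true"
    end
  end
end
```

Passing the environment in as an argument keeps the check easy to exercise without touching the real `ENV`.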

Whilst working on this modification, I also had some problems with the Geminabox namespacing, so I have refactored it. The main consequence is that the main Sinatra app is now Geminabox::Server.

I've also moved some gem loading from the gemspec to the Gemfile, specifically gems that are only needed in the test environment. This should reduce the geminabox gem's dependencies.

Unless anyone has any problems with this new version, I'll merge this and push a new gem in a few days' time.

@ghost ghost assigned reggieb Nov 7, 2013
@reggieb reggieb mentioned this pull request Nov 7, 2013
@flyinprogrammer
Contributor

so just cloned master --- how do we seed the fresh install with rubygems.org gems?

running: gem install bundler yields the gem not being found even though proxy is set to 'true'

@flyinprogrammer
Contributor

which is weird because /api/v1/dependencies?gems=bundler and /api/v1/dependencies.json?gems=bundler seems to work for days

@flyinprogrammer
Contributor

it's because we aren't splicing the tars i presume:
$ gem install bundler --verbose
HEAD http://assadm000.hahosting.local/latest_specs.4.8.gz
200 OK
GET http://assadm000.hahosting.local/latest_specs.4.8.gz
200 OK
ERROR: Could not find a valid gem 'bundler' (>= 0) in any repository
HEAD http://assadm000.hahosting.local/prerelease_specs.4.8.gz
200 OK
GET http://assadm000.hahosting.local/prerelease_specs.4.8.gz
200 OK
HEAD http://assadm000.hahosting.local/specs.4.8.gz
200 OK
GET http://assadm000.hahosting.local/specs.4.8.gz
200 OK

@reggieb
Member Author

reggieb commented Nov 7, 2013

Looks like I've got more work to do. I've been testing manually using a dummy app's Gemfile and bundler. That just relies on the contents of /api/v1/dependencies to look up what gems are available. I'll need to beef up the tests too.

My bad.

Thanks @flyinprogramer.

@flyinprogrammer
Contributor

this is the dirty - dirty way of doing things:

https://github.com/flyinprogramer/geminabox/blob/master/lib/geminabox.rb#L64-L88

notice that the gz's have all the real spec data in it; but the 'raws': http://rubygems.org/specs.4.8; only have a smidge of data around rubygems -- it seems like rubygems is doing some sort of hack -- but considered best practice or 'good enough' if you're using the latest version of rubygems?

And my version of splice_in_rubygems! is incomplete. It should look like this:

splice_specs!('specs.4.8.gz')
splice_specs!('latest_specs.4.8.gz')
splice_specs!('prerelease_specs.4.8.gz')
reset_raw_files!
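
As a rough idea of what splicing one index involves: each `*.4.8.gz` file is, as far as I know, a gzipped `Marshal` dump of `[name, Gem::Version, platform]` tuples, so merging comes down to decode, concatenate, de-duplicate, re-encode. The helper names below are made up for this sketch and are not the `splice_specs!` implementation linked above; fetching the remote bytes is left out.

```ruby
require "zlib"
require "rubygems"

# Decode a gzipped, marshalled index into an array of tuples.
def load_index(gz_bytes)
  Marshal.load(Zlib.gunzip(gz_bytes))
end

# Encode an array of tuples back into the gzipped index format.
def dump_index(tuples)
  Zlib.gzip(Marshal.dump(tuples))
end

# Merge local and remote indexes, keeping one entry per
# name/version/platform combination (local entries come first).
def splice_indexes(local_gz, remote_gz)
  merged = load_index(local_gz) + load_index(remote_gz)
  unique = merged.uniq { |name, version, platform| [name, version.to_s, platform] }
  dump_index(unique)
end
```

The same function would be applied once per index file (specs, latest_specs, prerelease_specs).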

@reggieb
Member Author

reggieb commented Nov 7, 2013

Thanks @flyinprogramer. I should be able to incorporate that tomorrow afternoon. Off to bed now.

@reggieb
Member Author

reggieb commented Nov 8, 2013

The gz files being pulled down are reasonably big, so that only wants to be done occasionally. We could just do the merge each time a new gem is uploaded, but then the list of remote gems would become stale fairly quickly. The list has to be updated regularly, but adding a scheduler would be a pain. If it's done each time rubygems is queried, it's going to generate a lot of traffic. So I think it needs to be cached. 5 minutes would probably be enough.
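
A minimal sketch of the kind of time-based cache described here, assuming a single process and an in-memory value. `TimedCache` is a name invented for this example; the 300-second figure in the usage note just mirrors the "5 minutes" above.

```ruby
# Hypothetical helper: remembers one value and recomputes it
# only after `ttl` seconds have passed.
class TimedCache
  def initialize(ttl:, clock: -> { Time.now })
    @ttl = ttl
    @clock = clock      # injectable so the expiry can be tested
    @stored_at = nil
    @value = nil
  end

  # Returns the cached value, running the block to refresh it
  # when nothing is stored yet or the stored value has expired.
  def fetch
    now = @clock.call
    if @stored_at.nil? || now - @stored_at >= @ttl
      @value = yield
      @stored_at = now
    end
    @value
  end
end
```

Usage would look something like `cache = TimedCache.new(ttl: 300)` followed by `cache.fetch { download_remote_index }`, where `download_remote_index` is whatever does the expensive fetch.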

Also the files are being stored locally, so there needs to be proxy and non-proxy versions of the local spec files. So that the list available matches the remote_proxy setting if that setting is toggled.

@flyinprogrammer
Contributor

> The gz files being pulled down are reasonably big, so that only wants to be done occasionally. We could just do the merge each time a new gem is uploaded, but then the list of remote gems would become stale fairly quickly. The list has to be updated regularly, but adding a scheduler would be a pain. If it's done each time rubygems is queried, it's going to generate a lot of traffic. So I think it needs to be cached. 5 minutes would probably be enough.

Splicing all of this takes 1 minute at worst; however, it's still a lot of work, and costly because the gz files can't be served while they're locked for writing. I would only have it re-splice after a 'full' reindex, as the 'partial' reindex seems to be good enough to unzip-add-zip the existing file as it is, no problem. I was playing around with making http://host/reindex simply fork a process to go do this and return a landing page that says 'we see you want a reindex; our gnomes have gone off to try and accomplish this work; wish us luck!' or something, because of course if rubygems.org is down this is going to fail and it's hard to notify the user of this.

Inside this block is what I'm talking about as being a 'full' reindex -- this is the part where it actually re-inspects '[data]/gems'; what's actually on disk; and treats all other files as garbage.
https://github.com/reggieb/geminabox/blob/master/lib/geminabox/server.rb#L42-L45

> Also the files are being stored locally, so there needs to be proxy and non-proxy versions of the local spec files. So that the list available matches the remote_proxy setting if that setting is toggled.

Why? -- unless you have 2 geminabox servers using the same data folder in 2 different modes (and that seems like a terrible idea) -- it doesn't matter if these zips get all of rubygems added to them; on a 'full' reindex w/o proxy you're not going to do the splice; and these are going to be 'fixed' to being back to what you have on disk.

@flyinprogrammer
Contributor

so with that commit; if i upload a gem via browser; it seems to totally ruin our latest_spec and spec gz's; and what's weird is that i went from having all the gems (rubygems + local) to 50; but i have 1912 gems in my data/gems folder...

@reggieb
Member Author

reggieb commented Nov 8, 2013

I've spotted one reason why this isn't working. get_from_rubygems_if_not_local should only apply when the request is for a gem. So it needs to be moved from the server method to the "/gems/*.gem" action:

  get "/gems/*.gem" do
    get_from_rubygems_if_not_local if Server.rubygems_proxy
    serve
  end

Where it was, if you have an empty data folder, the system tries to create data/latest_specs.4.8.gz via get_from_rubygems_if_not_local which fails because it isn't a valid gem. So what is needed is to call your splice_specs! method at the action where the gz is requested. Something like:

%w[specs.4.8.gz
   latest_specs.4.8.gz
   prerelease_specs.4.8.gz
].each do |index|
  get "/#{index}" do
    splice_specs!(index) if Server.rubygems_proxy
    content_type('application/x-gzip')
    serve
  end
end

Also we don't need to cache. This is only going to take a little longer than if you were fetching stuff directly from rubygems.

@reggieb
Member Author

reggieb commented Nov 10, 2013

I got a little further on Friday. I've got hostess handling the requests for gems, gz, and gemspec files. But pulling a single gem from the command line fails after that (haven't got the rz file handled correctly I think). I'll spend a few hours on it on Monday, but if that doesn't work out, I'm tempted to release this version as a bundler-only proxy, which I think will work for a lot of use cases (mine included), and return to it when I have some more time (or someone else may be able to pick it up).

@reggieb
Member Author

reggieb commented Nov 12, 2013

/api/v1/dependencies outputs a nice list of just what's needed for a list of gems, so I am surprised there is so much "get me everything" traffic. I wonder if Norman Stansfield is behind it.

@reggieb
Member Author

reggieb commented Nov 15, 2013

@flyinprogramer, I thought you'd like an update. I've implemented your code flyinprogramer/geminabox@727a3a4 and it works in that I can bundle using bundler 1.5.0.rc.1. But it also makes geminabox think it has all the gems locally that are stored on RubyGems, and http://localhost:9292 grinds almost to a halt before displaying a huge list of gems.

However, you've given me the way forward.

I think we need to keep the spliced files separated from the main local files, and serve the spliced files in proxy mode.

I think I'm also going to only splice if the file is small or has not been updated for a couple of minutes, as splicing takes a lot of work.
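
Something along these lines is what I have in mind for that check. It is only a sketch: the method name and the thresholds (1 KB, two minutes) are placeholders I picked for the example, not settled values.

```ruby
# Hypothetical check: should the spliced copy at `path` be rebuilt?
# Rebuild when it is missing, suspiciously small, or older than max_age seconds.
def needs_splice?(path, min_size: 1024, max_age: 120, now: Time.now)
  return true unless File.exist?(path)

  File.size(path) < min_size || (now - File.mtime(path)) > max_age
end
```

Treating a missing file as stale means a fresh data folder gets spliced on the first request.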

Thanks again for your help

@reggieb
Member Author

reggieb commented Nov 15, 2013

@flyinprogramer can you have a look at my develop branch: https://github.com/reggieb/geminabox/tree/develop

I've incorporated your splicing mechanism into a new class Splicer. Bundling now seems to work well, but there is still a problem with gem install.

I think the problem is the file types: /quick/Marshal.4.8/*.gemspec.rz, /yaml.Z, /Marshal.4.8.Z

I think with /quick/Marshal.4.8/*.gemspec.rz geminabox should serve the local file if there is one, and the remote one if not.

I have no idea of what to do with /yaml.Z, /Marshal.4.8.Z. Do you know how I can merge local and remote versions?

@flyinprogrammer
Contributor

So the issue is that when it goes to fetch a gemspec; you're always trying to make it a Gem.
https://github.com/reggieb/geminabox/blob/develop/lib/geminabox/hostess.rb#L62

with my version i see if the request ends in .gem; if it does; do that; if it doesn't; it's a normal file we don't care to cache; so treat it as such:
https://github.com/flyinprogramer/geminabox/blob/master/lib/geminabox/hostess.rb#L59-L65
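
stripped right down, the check is just this (a sketch with made-up names; the real code is at the link above):

```ruby
# Hypothetical helper: decide how a requested path should be treated.
# Only paths ending in ".gem" are gems worth fetching and caching;
# everything else (gemspec.rz, spec indexes, ...) is passed through.
def request_kind(path)
  path.end_with?(".gem") ? :gem : :passthrough
end
```

note that a path like /quick/Marshal.4.8/foo-1.0.gemspec.rz contains ".gem" but does not end with it, so it falls into the passthrough branch.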

as for those other files; the ones that end in .Z; i have never seen them requested in my life!

we also still definitely have race conditions with that splice file write; when you set up geminabox with unicorn to handle more than 1 request it's very possible to be writing those .gz files at the same time; and that causes weirdness. more than once i've seen our gemserver only register local gems vs. all of them.

also as far as slowness goes; yah idk what to do about that other than re-write/tweak the UI. for our server i have the home page just serve up the total count (that's how i know we have file sync issues; sometimes its 2000 (our local count) other times its 65k (local + remote)) -- and i've told my people to just use [hostname]/gems/(name of gem) to see if we are hosting it and what versions are available.

https://github.com/reggieb/geminabox/blob/develop/lib/geminabox/splicer.rb#L18
https://github.com/reggieb/geminabox/blob/develop/lib/geminabox/splicer.rb#L62

Somehow we need to sync/lock these files? idk i have never done things like this.
Or -- it'd be cooler if we just streamed the spliced zip to our clients; maybe even set up caching?
either way - we can leave this for v2 since we splice and over-write each request it's never an issue unless we're trying to use the web front end - in which case why are you doing that ?
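
for the record, the usual pattern for this kind of thing (as far as i understand it) is an exclusive lock around the rebuild plus a write-to-temp-then-rename, so readers never see a half-written file. a sketch, with helper names invented here and the assumption that the temp file lives on the same filesystem as the target:

```ruby
# Serialise writers: only one process at a time runs the block.
# The lock is released when the lock file is closed.
def with_exclusive_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |lock|
    lock.flock(File::LOCK_EX)
    yield
  end
end

# Write to a temporary sibling file, then rename it over the target.
# On POSIX filesystems the rename replaces the target in one step,
# so a reader gets either the old file or the new one.
def atomic_write(path, data)
  tmp = "#{path}.#{Process.pid}.tmp"
  File.binwrite(tmp, data)
  File.rename(tmp, path)
end
```

usage would be something like `with_exclusive_lock("data/splice.lock") { atomic_write("data/specs.4.8.gz", bytes) }`; both paths are just examples.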

@reggieb
Member Author

reggieb commented Nov 16, 2013

Ah! I overlooked your if path.end_with? '.gem' step. Good, that's easy to implement.

As each one is gem specific, I guess they don't change (much/at all?). I'm tempted to build the system so that it will try to get the local copy, then the spliced copy and if that fails, it will then get a copy from RubyGems, and copy it to spliced.

Performance became much less of an issue when I moved the remote files to the spliced folder. I think the system may have been trying to write the merged file to one of the files it was trying to merge into the new content, and this was causing problems that were hugely impacting the performance.

I think the web interface should present gems that are stored locally, and not those that are remotely hosted at RubyGems. So in proxy mode, it will list all the gems that have actually been pulled down through the server, but not all the gems that the geminabox instance knows about (that is, all listed in /specs.4.8.gz rather than those listed in spliced/.specs.4.8.gz).

It should be easy to drop in custom views, by specifying a location for custom versions via Geminabox.views. I'm very tempted to build a rake task that will copy the default view files to the Geminabox.views location, to provide templates for anyone wanting to do this.

Therefore, I am firmly of the belief that we should keep the views as simple as possible, and encourage users to roll their own.

@reggieb
Member Author

reggieb commented Nov 16, 2013

Oh and I'll local > spliced > remote the .Z files too.

@flyinprogrammer
Contributor

😂 😂 😂 😂 😂

@reggieb
Member Author

reggieb commented Nov 17, 2013

Can't sleep. 💤

@flyinprogramer glad you like it so far! 😄

I've realised that I can clean the code up a lot by splitting the proxy functionality out into a sub-module. Then choose to use Proxy at one point; that being Server where the choice is made to use Geminabox::Hostess or Geminabox::Proxy::Hostess. That means we have a nice clean Hostess (same as pre-proxy version), and ::Proxy::Hostess can drop having to check the proxy status all the time. It also separates the objects needed for proxy mode, out from the other objects. It then makes sense to keep the files Proxy uses in data/proxy.
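
Roughly the shape I mean, as a toy sketch. The `Sketch` module and the `sources` method are invented for illustration and are not the real Geminabox classes; the point is that the proxy check happens once, where the hostess is chosen.

```ruby
module Sketch
  # Plain hostess: knows only about local data, no proxy checks.
  class Hostess
    def sources
      [:local]
    end
  end

  module Proxy
    # Proxy hostess: builds on the plain one and adds a remote fallback.
    class Hostess < Sketch::Hostess
      def sources
        super + [:remote]
      end
    end
  end

  # The single place where proxy mode is consulted.
  def self.hostess(proxy_enabled)
    proxy_enabled ? Proxy::Hostess.new : Hostess.new
  end
end
```

Everything downstream just talks to whichever hostess it was given.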

@reggieb
Member Author

reggieb commented Nov 17, 2013

OK. I think I might be there now.

@flyinprogramer, can you have a look at my develop branch now: https://github.com/reggieb/geminabox/tree/develop

@flyinprogrammer
Contributor

this looks great -- i'll put it through some testing in a few hours; i just got done with a 3 day hacking fest --- 250 changed files with 23,780 additions and 6,672 deletions; I need sleep first.

@reggieb
Member Author

reggieb commented Nov 19, 2013

Just looking back through the comments, and I think it's worth making one point. Copier first looks for a proxy version, then local, then remote. When I started building it I realised that this made more sense than looking for local first.
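
To make that order concrete, here is a simplified sketch. It is not the actual Copier code: `locate`, its keyword arguments, and the injected `fetch_remote` callable are all assumptions made for the example.

```ruby
require "fileutils"

# Look for `name` in the proxy folder, then the local folder, and only
# then ask the remote source. A remote hit is saved into the proxy
# folder so the next lookup is served from disk.
# Returns the path to the file, or nil when nothing was found.
def locate(name, proxy_dir:, local_dir:, fetch_remote:)
  [proxy_dir, local_dir].each do |dir|
    candidate = File.join(dir, name)
    return candidate if File.exist?(candidate)
  end

  data = fetch_remote.call(name) # expected to return bytes or nil
  return nil if data.nil?

  target = File.join(proxy_dir, name)
  FileUtils.mkdir_p(File.dirname(target))
  File.binwrite(target, data)
  target
end
```

Passing the remote fetch in as a callable keeps the lookup order testable without any network access.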

@flyinprogrammer
Contributor

as you can see i forked yo edits --- i just got the server up; no changes needed thus far but i still need to test things.

i wanted to add some caching and maybe do locks on those proxy writes. of course those can be in v2; and are just an idea --- update to come when i have it!

@flyinprogrammer
Contributor

gem dependency bundler -r worked first boot!!!!
uploaded an 'internal' gem
gem dependency [internal gem name] -r works first try too -- and the webui is working!!!

i think we're getting ready for a release

@reggieb
Member Author

reggieb commented Nov 19, 2013

Good news!

However, I have a feeling gem install isn't working perfectly. I think the gemspecs contain the rubygems locations for the dependencies, and this leads to them being downloaded directly from rubygems. I have a feeling we'll need to open up the gemspec.rz files and modify them.

@flyinprogrammer
Contributor

what is your evidence for thinking this is true?

i modified my /etc/hosts so that rubygems.org resolved to localhost;
cleared out my system, and did:

gem install zeus --verbose

everything pointed to my local gemserver and installed correctly first try.

I know the specs do have 'website' fields in them; but that's how they populate this data:
https://www.evernote.com/shard/s9/sh/89a9ac25-3f7b-4900-9109-536d2d038f29/7134832e1bc62c11b333528e7ffd32e6

@reggieb
Member Author

reggieb commented Nov 20, 2013

Watching the server output as I did a gem install. To be honest, your test is better. Looks like I was wrong (which is good!)

I think we're ready then. Do you agree?

@flyinprogrammer
Contributor

I agree! I'm running it in production for the company I work for starting 10 minutes ago -- we can wait a day to see if they can topple it over; but i think this is going to work great!

@reggieb
Member Author

reggieb commented Nov 20, 2013

Tomorrow then! 😄

@flyinprogrammer
Contributor

no issues to report -- i'd say we're all systems go! 🚀

reggieb added a commit that referenced this pull request Nov 21, 2013
Proxy version: Adds facility for geminabox to act as a RubyGems proxy
@reggieb reggieb merged commit 442de59 into geminabox:master Nov 21, 2013
@reggieb
Member Author

reggieb commented Nov 21, 2013

It's done! I've pushed the gem too.

@flyinprogrammer
Contributor

#143 <--- found a bug; this is a quick patch
