Proxy version #138
so I just cloned master. how do we seed the fresh install with rubygems.org gems? running gem install bundler fails with the gem not being found, even though proxy is set to 'true'.
which is weird, because /api/v1/dependencies?gems=bundler and /api/v1/dependencies.json?gems=bundler seem to work perfectly.
it's because we aren't splicing the tars, i presume:
Looks like I've got more work to do. I've been testing manually using a dummy app's Gemfile and bundler. That just relies on the contents of /api/v1/dependencies to look up what gems are available. I'll need to beef up the tests too. My bad. Thanks @flyinprogramer.
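Bundler asks /api/v1/dependencies for just the gems named in a Gemfile, which is why bundling worked here while gem install did not. A minimal sketch of what such a response body looks like, assuming the Marshal-dumped array-of-hashes shape (:name, :number, :platform, :dependencies) used by the RubyGems dependency API; the INDEX constant, the gem names and the versions are made up for illustration:

```ruby
# Made-up index entries in the shape the dependency API returns:
# one hash per gem version, dependencies as [name, requirement] pairs.
INDEX = [
  { name: "gadget", number: "2.1.0", platform: "ruby", dependencies: [] },
  { name: "widget", number: "1.0.0", platform: "ruby",
    dependencies: [["gadget", ">= 2.0"]] },
  { name: "sprocket", number: "0.3.0", platform: "ruby", dependencies: [] }
].freeze

# Hypothetical handler body for /api/v1/dependencies?gems=widget,gadget:
# return only the versions of the requested gems, Marshal-dumped.
def dependencies_body(gems_param)
  wanted = gems_param.to_s.split(",")
  Marshal.dump(INDEX.select { |entry| wanted.include?(entry[:name]) })
end

names = Marshal.load(dependencies_body("widget,gadget")).map { |e| e[:name] }
puts names.inspect # => ["gadget", "widget"] (index order, not request order)
```

The client only ever receives the entries it asked for, so nothing in this exchange requires the full specs index.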
this is the dirty, dirty way of doing things: https://github.com/flyinprogramer/geminabox/blob/master/lib/geminabox.rb#L64-L88

notice that the gz's have all the real spec data in them, but the 'raws' (http://rubygems.org/specs.4.8) only have a smidge of data about rubygems. it seems like rubygems is doing some sort of hack; is that considered best practice, or 'good enough' if you're using the latest version of rubygems? also, my version of splice_in_rubygems! is incomplete. it should look like this:
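The splice being discussed amounts to merging two Marshal-dumped index arrays. A minimal sketch, assuming the specs.4.8.gz layout of gzipped [name, Gem::Version, platform] tuples; splice_specs, gzip_marshal and gunzip_marshal are hypothetical names, not the splice_in_rubygems! code from the fork:

```ruby
require "zlib"
require "rubygems"

# *.4.8.gz index files are gzipped Marshal dumps of an Array of
# [name, Gem::Version, platform] tuples.
def gzip_marshal(obj)
  Zlib.gzip(Marshal.dump(obj))
end

def gunzip_marshal(data)
  Marshal.load(Zlib.gunzip(data))
end

# Hypothetical splice: union of local and remote tuples, duplicates dropped,
# sorted by name, version and platform. Returns new gzipped bytes rather than
# overwriting either input.
def splice_specs(local_gz, remote_gz)
  merged = (gunzip_marshal(local_gz) + gunzip_marshal(remote_gz)).uniq
  merged.sort_by! { |name, version, platform| [name, version, platform.to_s] }
  gzip_marshal(merged)
end
```

Returning the merged bytes, instead of writing into one of the inputs, leaves the caller free to store the result in a separate file. For latest_specs.4.8.gz a real merge would also need to keep only the newest version of each gem; this sketch does not do that.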
Thanks @flyinprogramer. I should be able to incorporate that tomorrow afternoon. Off to bed now.
The gz files being pulled down are reasonably big, so pulling them should only happen occasionally. We could just do the merge each time a new gem is uploaded, but then the list of remote gems would become stale fairly quickly. The list has to be updated regularly, but adding a scheduler would be a pain. If it's done each time rubygems is queried, it's going to generate a lot of traffic. So I think it needs to be cached; 5 minutes would probably be enough. Also, the files are being stored locally, so there need to be proxy and non-proxy versions of the local spec files, so that the list of available gems matches the remote_proxy setting if that setting is toggled.
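The five-minute cache suggested above can be reduced to a freshness check on the spliced file's modification time. A minimal sketch; stale? and CACHE_TTL are hypothetical names, and using the file's mtime as the cache clock is an assumption:

```ruby
CACHE_TTL = 5 * 60 # seconds: the five-minute window suggested above

# Hypothetical check: a spliced index needs rebuilding if it is missing or
# was last written more than CACHE_TTL seconds ago.
def stale?(path, now: Time.now, ttl: CACHE_TTL)
  return true unless File.exist?(path)

  (now - File.mtime(path)) > ttl
end
```

A request for an index file would then re-splice only when stale?(path) is true, and otherwise serve the cached file as it is.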
Splicing all of this takes 1 minute at worst; however, it's still a lot of work, and it's costly because the gz files can't be served while they're locked for writing. I would only re-splice after a 'full' reindex, as the 'partial' reindex seems good enough: it unzips the existing file, adds to it and re-zips it, no problem. I was playing around with making http://host/reindex simply fork a process to do this and return a landing page that says 'we see you want a reindex; our gnomes have gone off to try and accomplish this work; wish us luck!' or something, because if rubygems.org is down this is going to fail, and it's hard to notify the user of that. Inside this block is what I mean by a 'full' reindex: this is the part where it actually re-inspects '[data]/gems' (what's actually on disk) and treats all other files as garbage.
Why? Unless you have 2 geminabox servers using the same data folder in 2 different modes (and that seems like a terrible idea), it doesn't matter if these zips get all of rubygems added to them: on a 'full' reindex without proxy you're not going to do the splice, so they'll be 'fixed' back to what you have on disk.
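The "our gnomes have gone off" idea above is a background job whose outcome is kept for later. A minimal sketch that uses a Ruby thread rather than the fork mentioned above; BackgroundReindex is a hypothetical class, and the block passed to start stands in for the real full reindex and splice:

```ruby
# Hypothetical sketch: run a slow job (e.g. a full reindex plus splice) on a
# background thread so the request can return a "gnomes are on it" page at
# once, and keep the outcome so a later page can report a failure.
class BackgroundReindex
  attr_reader :status, :error

  def initialize
    @mutex = Mutex.new
    @status = :idle
    @error = nil
    @thread = nil
  end

  # Starts the job unless one is already running; returns true if it started.
  def start(&job)
    @mutex.synchronize do
      return false if @status == :running

      @status = :running
      @error = nil
      @thread = Thread.new { run(job) }
    end
    true
  end

  # Blocks until the current job (if any) has finished.
  def wait
    @thread&.join
    self
  end

  private

  def run(job)
    job.call
    finish(:done, nil)
  rescue StandardError => e
    finish(:failed, e) # e.g. rubygems.org unreachable during the splice
  end

  def finish(status, error)
    @mutex.synchronize do
      @status = status
      @error = error
    end
  end
end
```

With several unicorn workers each process would hold its own instance, so this alone does not stop two workers from reindexing at the same time.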
this is dirty, but this commit works: https://github.com/flyinprogramer/geminabox/commit/3704d7efe3a46cf059351c5a5e9f2548b562344d
so with that commit, if i upload a gem via the browser, it seems to totally ruin our latest_specs and specs gz's. what's weird is that i went from having all the gems (rubygems + local) to 50, but i have 1912 gems in my data/gems folder...
I've spotted one reason why this isn't working. get_from_rubygems_if_not_local should only apply when the request is for a gem, so it needs to be moved from the server method to the "/gems/*.gem" action.
Where it was, if you have an empty data folder, the system tries to create data/latest_specs.4.8.gz via get_from_rubygems_if_not_local, which fails because it isn't a valid gem. So what is needed is to call your splice_specs! method in the action where the gz is requested. Something like:
Also, we don't need to cache. This is only going to take a little longer than fetching stuff directly from rubygems.
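The fix described here is a dispatch on the request path: only *.gem requests should fall back to RubyGems, while the index files get spliced when requested. A minimal hypothetical sketch of that classification; action_for and the returned symbols are illustrative names, not geminabox's routes:

```ruby
# Hypothetical classifier for incoming paths, so that only real gem
# downloads go through a "fetch from RubyGems if not local" step.
def action_for(path)
  case path
  when %r{\A/gems/[^/]+\.gem\z}
    :fetch_gem_if_missing # e.g. /gems/widget-1.0.0.gem
  when %r{\A/(?:latest_|prerelease_)?specs\.4\.8\.gz\z}
    :serve_spliced_index  # index files: splice local + remote on request
  else
    :serve_file           # gemspec.rz and everything else: plain file
  end
end
```

Keeping the decision in one function means an index request like /latest_specs.4.8.gz can never be mistaken for a gem download.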
I got a little further on Friday. I've got hostess handling the requests for gems, gz, and gemspec files. But pulling a single gem from the command line fails after that (I think I haven't got the rz file handled correctly). I'll spend a few hours on it on Monday, but if that doesn't work out, I'm tempted to release this version as a bundler-only proxy, which I think will work for a lot of use cases (mine included), and return to it when I have some more time (or someone else may be able to pick it up).
/api/v1/dependencies outputs a nice list of just what's needed for a list of gems, so I am surprised there is so much "get me everything" traffic. I wonder if Norman Stansfield is behind it.
@flyinprogramer, I thought you'd like an update. I've implemented your code flyinprogramer/geminabox@727a3a4, and it works in that I can bundle using bundler 1.5.0.rc.1. But it also makes geminabox think it has all the gems locally that are stored on RubyGems, and http://localhost:9292 grinds almost to a halt before displaying a huge list of gems. However, you've given me the way forward. I think we need to keep the spliced files separate from the main local files, and serve the spliced files in proxy mode. I think I'm also going to only splice if the file is small or has not been updated for a couple of minutes, as splicing takes a lot of work. Thanks again for your help.
@flyinprogramer can you have a look at my develop branch: https://github.com/reggieb/geminabox/tree/develop
I've incorporated your splicing mechanism into a new class, Splicer. Bundling now seems to work well, but there is still a problem with gem install. I think the problem is the file types: /quick/Marshal.4.8/*.gemspec.rz, /yaml.Z and /Marshal.4.8.Z. With /quick/Marshal.4.8/*.gemspec.rz, I think geminabox should serve the local file if there is one, and the remote one if not. I have no idea what to do with /yaml.Z and /Marshal.4.8.Z. Do you know how I can merge local and remote versions?
So the issue is that when it goes to fetch a gemspec, you're always trying to make it a Gem. with my version i check whether the request ends in .gem; if it does, do that; if it doesn't, it's a normal file we don't care to cache, so treat it as such:

as for those other files, the ones that end in .Z, i have never seen them requested in my life!

we also still definitely have race conditions with that splice file write. when you set up geminabox with unicorn to handle more than 1 request, it's very possible to be writing those .gz files at the same time, and that causes weirdness. more than once i've seen our gemserver only register local gems vs. all of them.

also, as far as slowness goes, yeah, idk what to do about that other than rewrite/tweak the UI. for our server i have the home page just serve up the total count (that's how i know we have file sync issues: sometimes it's 2000 (our local count), other times it's 65k (local + remote)), and i've told my people to just use [hostname]/gems/(name of gem) to see if we are hosting it and what versions are available. https://github.com/reggieb/geminabox/blob/develop/lib/geminabox/splicer.rb#L18 Somehow we need to sync/lock these files? idk, i have never done things like this.
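One common way to avoid the concurrent-write problem described above is to hold an exclusive lock while rebuilding the file, then publish it with a rename so readers never see a half-written .gz. A minimal sketch, assuming a POSIX filesystem where File#flock and same-directory File.rename behave as usual; write_index_atomically and the ".lock" sidecar file are hypothetical:

```ruby
require "fileutils"

# Hypothetical helper: only one writer at a time (exclusive flock on a
# sidecar ".lock" file), and readers only ever see a complete file because
# the new content is written elsewhere and then renamed into place.
def write_index_atomically(path, data)
  FileUtils.mkdir_p(File.dirname(path))
  File.open("#{path}.lock", File::RDWR | File::CREAT, 0o644) do |lock|
    lock.flock(File::LOCK_EX) # waits for any other writer to finish
    tmp = "#{path}.#{Process.pid}.tmp"
    File.binwrite(tmp, data)
    File.rename(tmp, path) # atomic replacement on POSIX filesystems
  end # closing the lock file releases the lock
end
```

Because flock is advisory, every process that writes the index has to go through the same helper for the lock to mean anything.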
Ah! I overlooked your if path.end_with? '.gem' step. Good, that's easy to implement. As each one is gem-specific, I guess they don't change (much, or at all?). I'm tempted to build the system so that it tries to get the local copy, then the spliced copy, and if that fails, gets a copy from RubyGems and copies it to spliced. Performance became much less of an issue when I moved the remote files to the spliced folder. I think the system may have been trying to write the merged file into one of the files it was merging, and this was causing problems that hugely impacted performance.

I think the web interface should present gems that are stored locally, and not those that are hosted remotely at RubyGems. So in proxy mode, it will list all the gems that have actually been pulled down through the server, but not all the gems that the geminabox instance knows about (that is, those listed in /specs.4.8.gz rather than those listed in spliced/.specs.4.8.gz). It should be easy to drop in custom views by specifying a location for custom versions via Geminabox.views. I'm very tempted to build a rake task that copies the default view files to the Geminabox.views location, to provide templates for anyone wanting to do this. So I firmly believe we should keep the views as simple as possible and encourage users to roll their own.
Oh, and I'll do local > spliced > remote for the .Z files too.
😂 😂 😂 😂 😂
Can't sleep. 💤 @flyinprogramer glad you like it so far! 😄 I've realised that I can clean the code up a lot by splitting the proxy functionality out into a sub-module. The choice to use Proxy is then made at a single point: Server, which picks either Geminabox::Hostess or Geminabox::Proxy::Hostess. That means we have a nice clean Hostess (the same as the pre-proxy version), and ::Proxy::Hostess no longer has to check the proxy status all the time. It also separates the objects needed for proxy mode from the other objects. It then makes sense to keep the files Proxy uses in data/proxy.
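The single switch point described here can be sketched as a class-level choice in Server. The empty classes below are stand-ins for the real Sinatra apps, and hostess_class is a hypothetical method name; only the shape of the decision is the point:

```ruby
# Stand-ins for the real Sinatra apps: empty classes, only the names matter.
module Geminabox
  class Hostess; end

  module Proxy
    class Hostess; end
  end

  class Server
    # Hypothetical: the one place that decides between plain and proxy mode,
    # so neither Hostess needs to check the proxy setting itself.
    def self.hostess_class(rubygems_proxy)
      rubygems_proxy ? Proxy::Hostess : Hostess
    end
  end
end

puts Geminabox::Server.hostess_class(true)  # => Geminabox::Proxy::Hostess
puts Geminabox::Server.hostess_class(false) # => Geminabox::Hostess
```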
OK. I think I might be there now. @flyinprogramer, can you have a look at my develop branch now: https://github.com/reggieb/geminabox/tree/develop |
this looks great; i'll put it through some testing in a few hours. i just got done with a 3-day hacking fest (250 changed files with 23,780 additions and 6,672 deletions), so I need sleep first.
Just looking back through the comments, I think it's worth making one point. Copier first looks for a proxy version, then local, then remote. When I started building it, I realised that this made more sense than looking for local first.
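A minimal sketch of that lookup order, assuming a callable that fetches a path from RubyGems and returns nil on a miss; this Copier is an illustration with hypothetical constructor arguments, not the class in the develop branch:

```ruby
require "fileutils"

# Hypothetical sketch: proxy copy first, then the locally uploaded file, and
# only then the remote source. A remote hit is saved into the proxy directory
# so the next request for the same path is served from disk.
class Copier
  def initialize(proxy_dir:, local_dir:, fetch_remote:)
    @proxy_dir = proxy_dir
    @local_dir = local_dir
    @fetch_remote = fetch_remote # callable: relative path -> String or nil
  end

  # Returns [source, content], or nil if no source has the file.
  def get(relative_path)
    proxy_path = File.join(@proxy_dir, relative_path)
    local_path = File.join(@local_dir, relative_path)
    return [:proxy, File.binread(proxy_path)] if File.file?(proxy_path)
    return [:local, File.binread(local_path)] if File.file?(local_path)

    content = @fetch_remote.call(relative_path)
    return nil unless content

    FileUtils.mkdir_p(File.dirname(proxy_path))
    File.binwrite(proxy_path, content)
    [:remote, content]
  end
end
```

Checking the proxy copy first means a file that was fetched once is never requested from RubyGems again, which is the behaviour the cache described in the PR relies on.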
as you can see, i forked your edits. i just got the server up; no changes needed thus far, but i still need to test things. i wanted to add some caching and maybe locks on those proxy writes. of course those can be in v2; they're just an idea. update to come when i have it!
gem dependency bundler -r worked on first boot!!!! i think we're getting ready for a release
Good news! However, I have a feeling gem install isn't working perfectly. I think the gemspecs contain the rubygems locations for the dependencies, which leads to them being downloaded directly from rubygems. I have a feeling we'll need to open up the gemspec.rz files and modify them.
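One way to check what a gemspec.rz actually carries is to round-trip one. A minimal sketch, assuming the quick index format of a zlib-deflated Marshal dump of a Gem::Specification; read_gemspec_rz and write_gemspec_rz are hypothetical helpers and the widget spec is made up:

```ruby
require "zlib"
require "rubygems"

# Hypothetical helpers for the quick index format: a *.gemspec.rz body is
# a zlib-deflated Marshal dump of a Gem::Specification.
def read_gemspec_rz(data)
  Marshal.load(Zlib::Inflate.inflate(data))
end

def write_gemspec_rz(spec)
  Zlib::Deflate.deflate(Marshal.dump(spec))
end

# Made-up gem used only for the round trip.
spec = Gem::Specification.new do |s|
  s.name = "widget"
  s.version = "1.0.0"
  s.summary = "Example gem"
  s.authors = ["Example Author"]
  s.homepage = "https://example.com/widget"
end

loaded = read_gemspec_rz(write_gemspec_rz(spec))
puts loaded.full_name # => widget-1.0.0
puts loaded.homepage  # => https://example.com/widget
```

Fields such as homepage are descriptive metadata; the spec has no field that says where dependencies should be downloaded from, which is consistent with the /etc/hosts test in the reply.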
what is your evidence for thinking this is true? i modified my /etc/hosts so that rubygems.org resolved to localhost; everything pointed to my local gemserver and installed correctly first try. I know the specs do have 'website' fields in them, but that's how they populate this data:
From watching the server output as I did a gem install. To be honest, your test is better. Looks like I was wrong (which is good!). I think we're ready then. Do you agree?
I agree! I've been running it in production for the company I work for, starting 10 minutes ago. we can wait a day to see if they can topple it over, but i think this is going to work great!
Tomorrow then! 😄
no issues to report; i'd say we're all systems go! 🚀
Proxy version: Adds facility for geminabox to act as a RubyGems proxy
It's done! I've pushed the gem too.
#143 <- found a bug; this is a quick patch
I'd like to bump the version to 0.12.0.
This new version adds the facility to configure Geminabox as a RubyGems proxy. That is, in proxy mode, if a client requests a gem that is not stored locally, Geminabox will try to get the gem from RubyGems instead. If successful, the gem will be stored locally, so in proxy mode Geminabox also acts as a RubyGems cache.
Proxy mode is switched off by default, and can be switched on by either:
Setting RUBYGEM_PROXY to true in the environment:
Or in config.ru (before the run command), set:
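How the two switches might combine can be sketched as a small resolver. This is a hypothetical helper, not geminabox's code; it assumes an explicit setting from config.ru takes precedence and that the environment variable only counts when it is exactly "true":

```ruby
# Hypothetical resolver (not geminabox's code): an explicit setting, as
# config.ru would provide, wins; otherwise RUBYGEM_PROXY must be exactly
# "true"; anything else leaves proxy mode off, which is the default.
def proxy_enabled?(explicit = nil, env = ENV)
  return explicit unless explicit.nil?

  env["RUBYGEM_PROXY"] == "true"
end
```

Passing env as an argument keeps the check testable without touching the real process environment.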
Whilst working on this modification, I also had some problems with the Geminabox namespacing, so I have also refactored it. The main consequence is that the main Sinatra app is now Geminabox::Server.
I've also moved some gem loading from the gemspec to the Gemfile, specifically gems that are only needed in the test environment. This should reduce the geminabox gem's dependencies.
Unless anyone has any problems with this new version, I'll merge this and push a new gem in a few days' time.