Fix missing dependencies (as seen from the api) #18

elskwid · 2012-11-24T10:08:17Z

Issue #17 describes a problem with gem
dependencies missing when viewed through the api.
These same dependencies are visible from the gem
and from the rubygems.org dependency list.

The fix is actually quite simple: the version_id
was not being captured in the case where new
versions are inserted for gems. As dependencies
are associated with both the gem and the version
(associations set up in the Rails app) the
API returns no dependencies when the version is
missing.

Also added the built_at attribute to be the same
as that used in rubygems.org - it was causing
postgres constraint violation in development.

(Fixes #17) Thanks to @phlipper for working with
me all day on this and to @rsutphin for doing a nice
job with the initial report.

Issue rubygems#17 describes a problem with gem dependencies missing when viewed through the api. These same dependencies are visible from the gem and from the rubygems.org dependency list. The fix is actually quite simple: the version_id was not being captured in the case where new versions are inserted for gems. As dependencies are associated with both the gem and the version (associations set up in the Rails app) the API returns no dependencies when the version is missing. Also added the built_at attribute to be the same as that used in rubygems.org - it was causing postgres constraint violation in development.

phlipper · 2012-11-24T10:21:49Z

-approved

indirect · 2012-11-24T17:43:23Z

Wow, very nice, guys. Thanks!

elskwid · 2012-11-24T17:46:08Z

@indirect, I'm currently putting together some tests around this. A little challenging because the specs are really wound up with the jobs.

This shores up the dependency fixes with specs to test expectations for existing behavior and new behavior. Verify that dependencies are _not_ created when a gem is added before its dependents. (Existing behavior) Verify that dependencies _are_ created when a gem is added after its dependents. (New behavior) Extras: * Update GemspecHelper#generate_gemspec to interpolate the name in more fields so we can create gems other than 'foo'. * Update GemBuilder#create_version to add built_at so we don't run foul of database constraints.

elskwid · 2012-11-24T19:40:36Z

And the specs are IN!

phlipper · 2012-11-24T19:40:38Z

👍 on the tests

elskwid · 2012-11-24T20:49:59Z

Something to keep in mind: this is a database issue so there will need to be a way to go back and update existing gems that currently have missing dependencies. (I'd be happy to help with that.)

supernullset · 2012-11-24T21:49:19Z

Nice work Gentlemen! ✨

indirect · 2012-11-24T22:23:54Z

Maybe we could write a job or migration to backfill the missing dependencies in production?

On Nov 24, 2012, at 12:50 PM, Don Morrison notifications@github.com wrote:

Something to keep in mind: this is a database issue so there will need to be a way to go back and update existing gems that currently have missing dependencies. (I'd be happy to help with that.)

�
Reply to this email directly or view it on GitHub.

Fix missing dependencies (as seen from the api)

hone · 2012-11-24T22:27:06Z

Thanks guys for catching the bug I introduced.

That first spec still seems like broken functionality in the sync code.

elskwid · 2012-11-25T23:15:59Z

@hone, if you're referring to the fact that dependencies will get skipped if the gems don't already exist, then yes. It seems like a bug but I wanted to catch the existing behavior.

Have you been able to deploy the fix to production? And if so, were you able to write up something to catch the database up? If not, I would like to take a stab at it. I just don't want to duplicate effort.

This is keeping people from being able to do their work. Saw someone say something in #bundler and my guess is we'll see more and more as the days go by.

indirect · 2012-11-26T00:02:17Z

As a temporary workaround, anyone can run "bundle install --full-index" in order to not use the Bundler API while it's acting up.

On Nov 25, 2012, at 3:16 PM, Don Morrison notifications@github.com wrote:

@hone, if you're referring to the fact that dependencies will get skipped if the gems don't already exist, then yes. It seems like a bug but I wanted to catch the existing behavior.

Have you been able to deploy the fix to production? And if so, were you able to write up something to catch the database up? If not, I would like to take a stab at it. I just don't want to duplicate effort.

This is keeping people from being able to do their work. Saw someone say something in #bundler and my guess is we'll see more and more as the days go by.

—
Reply to this email directly or view it on GitHub.

elskwid · 2012-11-26T00:11:30Z

@indirect, ah, good to know, that helps me in the short-term. I wonder how many know about that.

After digging through these issues I am keen on helping with this project. It is a hugely important piece of infrastructure and I have some ideas on how best to help. Is there a way I can dive in and help get the index updated?

indirect · 2012-11-26T00:49:11Z

Hopefully anyone who runs "bundle help install", but I'm not sure how many people that is. :)

I know that @hone has been working on the index stuff, so I'm going to ask him to chime in here… what would you suggest?

On Nov 25, 2012, at 4:11 PM, Don Morrison notifications@github.com wrote:

@indirect, ah, good to know, that helps me in the short-term. I wonder how many know about that.

After digging through these issues I am keen on helping with this project. It is a hugely important piece of infrastructure and I have some ideas on how best to help. Is there a way I can dive in and help get the index updated?

—
Reply to this email directly or view it on GitHub.

elskwid · 2012-11-26T01:19:01Z

@indirect, most of what I would like to do is optimize this project for use by bundler (and other applications that use the api to get dependency information). This probably isn't the best place for the comments, but you asked. :-) Also, I should mention, that I totally understand the circumstances under which this project was written - the time constraints and application considerations.

Most of my thoughts are around the complications of keeping this db in sync with rubygems.org and on responding to request quickly.

Sync
There are two times when gems should be updated in this application. On push and on yank. Otherwise there isn't a way to update a gem. The history just sits there, the same as it ever was. This feels essentially like cache busting and this site feels like a glorified cache.

The api is just polling for changes by checking when the spec index was last modified. Then comparing that to the local data. I know it's a future scenario but this looks like a good place for a hook. Just shoot off a message to the api site(s) and they can get what they need. Again, that's future but it's one we can enable right now with access to the rg codebase and this project.

In the absence of hooks we can optimize the local data for resolving that changed status so the sync can be faster and lightweight.

Data
The database on rg is optimized for inserting the data, not for reporting it. Classic transactional db vs. reporting problem.

Requests
The current method of quick queries against the database cuts down on the AR overhead, as it isn't present, but it still means that the database is queried for each gem that bundler cares about and the majority of this information hasn't changed in a long time.

I plan to spike on alternative ways to structure the data here that would make it easier to keep updated and faster to respond to requests. When looking at what the api serves up - it is the gem name, version and a list of dependencies. That's it. And all of that is contained in the gem spec. Feels like the database is overkill. We need a way to index them but couldn't we just store the data as it needs to be returned?

If I had to choose what to start with:

Use a queue/worker lib like celluloid to shore up the sync process
Modify the data to keep it closer to the desired result
Keep a good sync index so it can be faster to sync
Look into a hook/event based update approach for keeping the gems updated

So, yeah, braindump. I am very serious about helping though. Thanks for listening.

hone · 2012-11-26T04:38:00Z

You should check out the youtube video where we talk about some of this stuff. It'd be great to have help. As for 4, it's something we've talked about doing and is on the roadmap. The polling was just the easiest thing to get working off the ground since it was the least invasive to rubygems. When we originally built this, I didn't even know if it would go into production. I'm not sure what you mean by #1? The system is using a threaded consumer pool and a queue. Potentially, I might want to pull the resque 2 code for worker management in the future whenever we finish that. For #2, we're playing around with using flatfiles potentially on S3. @dpiddy can probably say more about that. The syncs are actually relatively fast once it's warmed up. I've been meaning to document most of the architecture but haven't gotten around to it.

I've deployed the fix to prevent further gems from being messed up. I'm still working on the rake task and will get something out by tomorrow to fix the existing data.

elskwid · 2012-11-26T06:52:49Z

@hone, thanks for the response. Again, I totally understand the situation you were in when you wrote it. Nothing I am saying is an attack on you or the decisions you have made and I am not suggesting anything I have here is unique or magical. It's working, right? So, seriously, great job! I have watched the video I was mainly just responding to @indirect since he asked what I would suggest.

For 1. I'm a huge fan of leveraging other libraries to do the pool and queue work for me - like girl_friday or celluloid. I know it's a matter of personal preference but I don't enjoy rolling those on my own. That's all.

Yeah 2, is just my gut reaction to seeing you keep a database that is the same as the production database to serve a different purpose. @evanphx mentioned spiking on some memcache work and the flat files on s3 could work too.

In the end, I'd love to be involved, I have a growing understanding of the problem, and a desire to contribute so direct me where I can be the most help and I'll jump in. This is such a new project that it seems like a good time to question things and discuss them.

hone · 2012-11-26T17:14:25Z

@elskwid oh I didn't take it that way, sorry if my response came off that way. I just wanted to show the current state of where we stood on all the points. I'd love your help to make this better. The new data model is interesting. Let's chat about it off this issue? ping me on #bundler in freenode.

hone added a commit that referenced this pull request Nov 24, 2012

Merge pull request #18 from elskwid/fix-dependencies

63dcb43

Fix missing dependencies (as seen from the api)

hone merged commit 6adc0b6 into rubygems:master Nov 24, 2012

indirect mentioned this pull request Nov 26, 2012

bundler incorrectly misses dependency rubygems/bundler#2185

Closed

TigerWolf mentioned this pull request Nov 26, 2012

composite_primary_keys gem v 5.0.10 missing dependencies in dependencies API #17

Closed

TigerWolf mentioned this pull request Nov 27, 2012

Gem complaining about dependency seattlerb/ruby_parser#88

Closed

dkubb mentioned this pull request Nov 29, 2012

Adamantium dep not being picked up by rubygems dkubb/equalizer#4

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix missing dependencies (as seen from the api) #18

Fix missing dependencies (as seen from the api) #18

elskwid commented Nov 24, 2012

phlipper commented Nov 24, 2012

indirect commented Nov 24, 2012

elskwid commented Nov 24, 2012

elskwid commented Nov 24, 2012

phlipper commented Nov 24, 2012

elskwid commented Nov 24, 2012

supernullset commented Nov 24, 2012

indirect commented Nov 24, 2012

hone commented Nov 24, 2012

elskwid commented Nov 25, 2012

indirect commented Nov 26, 2012

elskwid commented Nov 26, 2012

indirect commented Nov 26, 2012

elskwid commented Nov 26, 2012

hone commented Nov 26, 2012

elskwid commented Nov 26, 2012

hone commented Nov 26, 2012

Fix missing dependencies (as seen from the api) #18

Fix missing dependencies (as seen from the api) #18

Conversation

elskwid commented Nov 24, 2012

phlipper commented Nov 24, 2012

indirect commented Nov 24, 2012

elskwid commented Nov 24, 2012

elskwid commented Nov 24, 2012

phlipper commented Nov 24, 2012

elskwid commented Nov 24, 2012

supernullset commented Nov 24, 2012

indirect commented Nov 24, 2012

hone commented Nov 24, 2012

elskwid commented Nov 25, 2012

indirect commented Nov 26, 2012

elskwid commented Nov 26, 2012

indirect commented Nov 26, 2012

elskwid commented Nov 26, 2012

hone commented Nov 26, 2012

elskwid commented Nov 26, 2012

hone commented Nov 26, 2012