Skip to content
This repository has been archived by the owner on Mar 14, 2018. It is now read-only.

Fix missing dependencies (as seen from the api) #18

Merged
merged 2 commits into from
Nov 24, 2012

Conversation

elskwid
Copy link
Contributor

@elskwid elskwid commented Nov 24, 2012

Issue #17 describes a problem with gem
dependencies missing when viewed through the api.
These same dependencies are visible from the gem
and from the rubygems.org dependency list.

The fix is actually quite simple: the version_id
was not being captured in the case where new
versions are inserted for gems. As dependencies
are associated with both the gem and the version
(associations set up in the Rails app) the
API returns no dependencies when the version is
missing.

Also added the built_at attribute to be the same
as that used in rubygems.org - it was causing
postgres constraint violation in development.

(Fixes #17) Thanks to @phlipper for working with
me all day on this and to @rsutphin for doing a nice
job with the initial report.

Issue rubygems#17 describes a problem with gem 
dependencies missing when viewed through the api.
These same dependencies are visible from the gem
and from the rubygems.org dependency list.

The fix is actually quite simple: the version_id
was not being captured in the case where new
versions are inserted for gems. As dependencies
are associated with both the gem and the version
(associations set up in the Rails app) the 
API returns no dependencies when the version is
missing.

Also added the built_at attribute to be the same
as that used in rubygems.org - it was causing
postgres constraint violation in development.
@phlipper
Copy link

:neckbeard:-approved

@indirect
Copy link
Member

Wow, very nice, guys. Thanks!

@elskwid
Copy link
Contributor Author

elskwid commented Nov 24, 2012

@indirect, I'm currently putting together some tests around this. A little challenging because the specs are really wound up with the jobs.

This shores up the dependency fixes with specs
to test expectations for existing behavior and
new behavior. 

Verify that dependencies are _not_ created when
a gem is added before its dependents. (Existing
behavior)

Verify that dependencies _are_ created when a
gem is added after its dependents. (New behavior)

Extras: 

* Update GemspecHelper#generate_gemspec to 
  interpolate the name in more fields so we can
  create gems other than 'foo'.

* Update GemBuilder#create_version to add built_at
  so we don't run foul of database constraints.
@elskwid
Copy link
Contributor Author

elskwid commented Nov 24, 2012

And the specs are IN!

@phlipper
Copy link

👍 on the tests

@elskwid
Copy link
Contributor Author

elskwid commented Nov 24, 2012

Something to keep in mind: this is a database issue so there will need to be a way to go back and update existing gems that currently have missing dependencies. (I'd be happy to help with that.)

@supernullset
Copy link

Nice work Gentlemen! ✨

@indirect
Copy link
Member

Maybe we could write a job or migration to backfill the missing dependencies in production?

On Nov 24, 2012, at 12:50 PM, Don Morrison notifications@github.com wrote:

Something to keep in mind: this is a database issue so there will need to be a way to go back and update existing gems that currently have missing dependencies. (I'd be happy to help with that.)


Reply to this email directly or view it on GitHub.

hone added a commit that referenced this pull request Nov 24, 2012
Fix missing dependencies (as seen from the api)
@hone hone merged commit 6adc0b6 into rubygems:master Nov 24, 2012
@hone
Copy link
Contributor

hone commented Nov 24, 2012

Thanks guys for catching the bug I introduced.

That first spec still seems like broken functionality in the sync code.

@elskwid
Copy link
Contributor Author

elskwid commented Nov 25, 2012

@hone, if you're referring to the fact that dependencies will get skipped if the gems don't already exist, then yes. It seems like a bug but I wanted to catch the existing behavior.

Have you been able to deploy the fix to production? And if so, were you able to write up something to catch the database up? If not, I would like to take a stab at it. I just don't want to duplicate effort.

This is keeping people from being able to do their work. Saw someone say something in #bundler and my guess is we'll see more and more as the days go by.

@indirect
Copy link
Member

As a temporary workaround, anyone can run "bundle install --full-index" in order to not use the Bundler API while it's acting up.

On Nov 25, 2012, at 3:16 PM, Don Morrison notifications@github.com wrote:

@hone, if you're referring to the fact that dependencies will get skipped if the gems don't already exist, then yes. It seems like a bug but I wanted to catch the existing behavior.

Have you been able to deploy the fix to production? And if so, were you able to write up something to catch the database up? If not, I would like to take a stab at it. I just don't want to duplicate effort.

This is keeping people from being able to do their work. Saw someone say something in #bundler and my guess is we'll see more and more as the days go by.


Reply to this email directly or view it on GitHub.

@elskwid
Copy link
Contributor Author

elskwid commented Nov 26, 2012

@indirect, ah, good to know, that helps me in the short-term. I wonder how many know about that.

After digging through these issues I am keen on helping with this project. It is a hugely important piece of infrastructure and I have some ideas on how best to help. Is there a way I can dive in and help get the index updated?

@indirect
Copy link
Member

Hopefully anyone who runs "bundle help install", but I'm not sure how many people that is. :)

I know that @hone has been working on the index stuff, so I'm going to ask him to chime in here… what would you suggest?

On Nov 25, 2012, at 4:11 PM, Don Morrison notifications@github.com wrote:

@indirect, ah, good to know, that helps me in the short-term. I wonder how many know about that.

After digging through these issues I am keen on helping with this project. It is a hugely important piece of infrastructure and I have some ideas on how best to help. Is there a way I can dive in and help get the index updated?


Reply to this email directly or view it on GitHub.

@elskwid
Copy link
Contributor Author

elskwid commented Nov 26, 2012

@indirect, most of what I would like to do is optimize this project for use by bundler (and other applications that use the api to get dependency information). This probably isn't the best place for the comments, but you asked. :-) Also, I should mention, that I totally understand the circumstances under which this project was written - the time constraints and application considerations.

Most of my thoughts are around the complications of keeping this db in sync with rubygems.org and on responding to request quickly.

Sync
There are two times when gems should be updated in this application. On push and on yank. Otherwise there isn't a way to update a gem. The history just sits there, the same as it ever was. This feels essentially like cache busting and this site feels like a glorified cache.

The api is just polling for changes by checking when the spec index was last modified. Then comparing that to the local data. I know it's a future scenario but this looks like a good place for a hook. Just shoot off a message to the api site(s) and they can get what they need. Again, that's future but it's one we can enable right now with access to the rg codebase and this project.

In the absence of hooks we can optimize the local data for resolving that changed status so the sync can be faster and lightweight.

Data
The database on rg is optimized for inserting the data, not for reporting it. Classic transactional db vs. reporting problem.

Requests
The current method of quick queries against the database cuts down on the AR overhead, as it isn't present, but it still means that the database is queried for each gem that bundler cares about and the majority of this information hasn't changed in a long time.

I plan to spike on alternative ways to structure the data here that would make it easier to keep updated and faster to respond to requests. When looking at what the api serves up - it is the gem name, version and a list of dependencies. That's it. And all of that is contained in the gem spec. Feels like the database is overkill. We need a way to index them but couldn't we just store the data as it needs to be returned?

If I had to choose what to start with:

  1. Use a queue/worker lib like celluloid to shore up the sync process
  2. Modify the data to keep it closer to the desired result
  3. Keep a good sync index so it can be faster to sync
  4. Look into a hook/event based update approach for keeping the gems updated

So, yeah, braindump. I am very serious about helping though. Thanks for listening.

@hone
Copy link
Contributor

hone commented Nov 26, 2012

You should check out the youtube video where we talk about some of this stuff. It'd be great to have help. As for 4, it's something we've talked about doing and is on the roadmap. The polling was just the easiest thing to get working off the ground since it was the least invasive to rubygems. When we originally built this, I didn't even know if it would go into production. I'm not sure what you mean by #1? The system is using a threaded consumer pool and a queue. Potentially, I might want to pull the resque 2 code for worker management in the future whenever we finish that. For #2, we're playing around with using flatfiles potentially on S3. @dpiddy can probably say more about that. The syncs are actually relatively fast once it's warmed up. I've been meaning to document most of the architecture but haven't gotten around to it.

I've deployed the fix to prevent further gems from being messed up. I'm still working on the rake task and will get something out by tomorrow to fix the existing data.

@elskwid
Copy link
Contributor Author

elskwid commented Nov 26, 2012

@hone, thanks for the response. Again, I totally understand the situation you were in when you wrote it. Nothing I am saying is an attack on you or the decisions you have made and I am not suggesting anything I have here is unique or magical. It's working, right? So, seriously, great job! I have watched the video I was mainly just responding to @indirect since he asked what I would suggest.

For 1. I'm a huge fan of leveraging other libraries to do the pool and queue work for me - like girl_friday or celluloid. I know it's a matter of personal preference but I don't enjoy rolling those on my own. That's all.

Yeah 2, is just my gut reaction to seeing you keep a database that is the same as the production database to serve a different purpose. @evanphx mentioned spiking on some memcache work and the flat files on s3 could work too.

In the end, I'd love to be involved, I have a growing understanding of the problem, and a desire to contribute so direct me where I can be the most help and I'll jump in. This is such a new project that it seems like a good time to question things and discuss them.

@hone
Copy link
Contributor

hone commented Nov 26, 2012

@elskwid oh I didn't take it that way, sorry if my response came off that way. I just wanted to show the current state of where we stood on all the points. I'd love your help to make this better. The new data model is interesting. Let's chat about it off this issue? ping me on #bundler in freenode.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

composite_primary_keys gem v 5.0.10 missing dependencies in dependencies API
5 participants