missing data in production #606

shaiguitar

The Download.counts_by_day_for_version_in_date_range method looks in redis to see if the download history is there (https://github.com/rubygems/rubygems.org/blob/master/app/models/download.rb#L79-L107 is the method in question at time of this writing).

The test puts data in redis and confirms that the count returned by that method is correct. Both the test and the method work as intended. However, when the redis data is missing, the method falls back to searching VersionHistory records. Unfortunately, it looks like VersionHistory records are missing from rubygems.org's production data.
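
To make the fallback concrete, here is a minimal sketch of one way to read that behaviour. It is not the code in download.rb: the method name `counts_by_day` and both hashes are hypothetical stand-ins for redis and for the VersionHistory table, and the per-day fallback is an assumption.

```ruby
require "date"

# Hypothetical stand-ins: plain hashes play the role of redis and of the
# VersionHistory table. None of these names come from rubygems.org.
def counts_by_day(redis_counts, history_counts, from, to)
  (from..to).map do |day|
    key = day.to_s
    # Prefer the redis value, then the database record, then 0.
    count = redis_counts.fetch(key) { history_counts.fetch(key, 0) }
    [key, count]
  end
end

redis_counts   = { "2013-03-08" => 178 }  # redis only knows recent days
history_counts = {}                       # no VersionHistory rows at all

result = counts_by_day(redis_counts, history_counts,
                       Date.new(2013, 3, 6), Date.new(2013, 3, 8))
```

With both sources empty for a given day, the sketch reports 0 for that day rather than raising, which is how a gap in both stores could turn into a silently low total.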

Example: rails-2.3.5.

According to https://rubygems.org/gems/rails/versions/2.3.5, the total download count for that version is approximately 1M. However, the API returns a download count of only about 33k, which is a large discrepancy. Redis is probably missing the older data, and there are no VersionHistory records for this version in the database dump either. The dump is attached to this post on this branch.

Console with the dump:

1.9.3p0 :005 > VersionHistory.count
 => 350364
1.9.3p0 :004 > VersionHistory.last.day
 => Tue, 07 Aug 2012

But none for rails:

1.9.3p0 :013 > version.id
 => 113389
1.9.3p0 :014 > version.rubygem.name
 => "rails"
1.9.3p0 :015 > version.number
 => "2.3.5"
1.9.3p0 :016 > VersionHistory.where(:version_id => version.id)
 => []

I don't have the redis dump, but given the data coming back from that method, it seems redis is missing data as well.

It looks like redis does have some download info on this gem, but only from March of this year. The daily counts coming back from the API (which should sum to about 1M but only sum to ~33k) are all 0 from 2009-11-26 through 2013-03-07; the first nonzero count is on 2013-03-08.

1.9.3p0 :017 > Gems.downloads('rails','2.3.5',5.year.ago, Time.now)

# data returned starts way after the gem was actually built
# ["2009-11-26", 0]...["2013-03-07", 0], ["2013-03-08", 178]
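
A gap like this can be spotted programmatically. The sketch below assumes the daily counts are available as `[date, count]` pairs shaped like the output quoted above. The helper names are hypothetical, and the sample array contains only the three entries shown in this issue, not the elided days.

```ruby
# Only the three entries quoted in the issue; the elided days are omitted.
daily = [["2009-11-26", 0], ["2013-03-07", 0], ["2013-03-08", 178]]

# First day with at least one recorded download, or nil if there is none.
def first_nonzero_day(daily)
  pair = daily.find { |_day, count| count > 0 }
  pair && pair.first
end

# Number of zero-count days before the first recorded download.
def leading_zero_days(daily)
  daily.take_while { |_day, count| count.zero? }.length
end

# Sum of the daily counts, to compare against the total on the gem page.
def daily_total(daily)
  daily.map { |_day, count| count }.reduce(0, :+)
end

puts first_nonzero_day(daily)
puts leading_zero_days(daily)
puts daily_total(daily)
```

Comparing `daily_total` for the full range against the total shown on the gem's version page would flag versions whose history is incomplete.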

Some more inconsistencies: https://gist.github.com/shaiguitar/d2af997b7f58e24fd305

shaiguitar referenced this pull request in shaiguitar/gem_velocity Oct 21, 2013
@evanphx
Member

evanphx commented Oct 22, 2013

Do not merge this. It contains a giant dump file.

@shaiguitar
Author

Since this is not a PR but rather an issue, moving the discussion to #616 . Sorry for the confusion!

@shaiguitar shaiguitar closed this Oct 22, 2013