Idea of missing events ratio? #113

Closed
redox opened this issue Aug 10, 2015 · 4 comments

@redox

redox commented Aug 10, 2015

Hey @igrigorik,

Do you have an idea of how many events the crawler is missing (because of https://github.com/igrigorik/githubarchive.org/blob/master/crawler/crawler.rb#L77 and the polling mechanism)? I was pretty surprised not to find some new projects that have had fairly high public activity (several dozen events per day over the last few days).

Thank you for that project, very useful 👍

@igrigorik
Owner

Looking at the logs... It does happen, and when it does, it appears to happen in clusters (e.g. 3-4 times close together) due to some burst of activity. However, it's not that frequent: a few times per week.

We can and should fix that logic though: when we detect that all events are new we can paginate back until we find the overlap with previously seen events.
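As a rough illustration of that idea (not the crawler's actual code, which is Ruby), here is a minimal Python sketch. It assumes a hypothetical `fetch_page(page)` callable that returns one page of events, newest first, each with an `"id"` key, and a `seen_ids` set holding the IDs recorded by earlier polls. The `max_pages` cap is likewise an assumed safety limit.

```python
from typing import Callable, Dict, List, Set

Event = Dict[str, str]


def collect_new_events(
    fetch_page: Callable[[int], List[Event]],  # hypothetical page fetcher
    seen_ids: Set[str],                        # IDs recorded by earlier polls
    max_pages: int = 10,                       # assumed safety cap
) -> List[Event]:
    """Walk back through pages until one overlaps with previously seen events."""
    collected: List[Event] = []
    collected_ids: Set[str] = set()

    for page in range(1, max_pages + 1):
        events = fetch_page(page)
        if not events:
            break  # ran out of pages

        overlap_found = False
        for event in events:
            event_id = event["id"]
            if event_id in seen_ids:
                # We have reached events captured by an earlier poll.
                overlap_found = True
            elif event_id not in collected_ids:
                # Skip repeats caused by events shifting between pages
                # while we are paginating.
                collected_ids.add(event_id)
                collected.append(event)

        if overlap_found:
            break  # the gap is closed, no need to go further back

    return collected
```

If the cap is reached without finding any overlap, some events may still have been lost, so a caller would probably want to log that case.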

@redox
Author

redox commented Aug 13, 2015

OK, that's weird. I was talking about algolia/instantsearch.js (56 commits, lots of discussion on some issues), and still this query doesn't return anything: SELECT * FROM [githubarchive:github.timeline] WHERE repository_name = 'instantsearch.js'

If it's only 3-4 times a week, I don't think it's the root cause then. Do you know if their API could miss some public events?

@igrigorik
Owner

Err, recent activity? Note that we're no longer logging to github.timeline post Jan 1, 2015. See: https://www.githubarchive.org/#bigquery

@redox
Author

redox commented Aug 14, 2015

Oh alright, yes recent activity. That was my issue :) Thank you for clarifying things!

redox closed this as completed Aug 14, 2015