Missing Close Events on Some Issues #189

Open
sehtia opened this issue Jun 1, 2018 · 6 comments

Comments

@sehtia

sehtia commented Jun 1, 2018

Hello!
I'm trying to understand why some issues show up in GH Archive with an 'open' state even though they are closed on GitHub, as in the image below.
[image: Like this]
As far as I can tell, when certain issues are closed by a pull request merge that uses the closing keywords in its body, the issue is closed on the GitHub side but no 'closed' IssuesEvent is created, so to GH Archive the issue remains in its open state. However, there does seem to be a close record for some issues that were closed by a pull request, as shown below, so that explanation doesn't entirely hold.
[image: like this]
Would you be able to shed any light on this? I'm quite confused. Thank you!
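For anyone wanting to reproduce the observation above, here is a minimal sketch of how the last known state of each issue might be derived from GH Archive records. It assumes the 2015-and-later event format (one JSON object per line, with `type`, `created_at`, `repo.name`, `payload.action` and `payload.issue.number`) and an hourly `.json.gz` file that has already been downloaded; the function names are hypothetical and not part of GH Archive itself.

```python
import gzip
import json


def iter_archive_events(path):
    """Yield one event dict per non-empty line of a downloaded GH Archive .json.gz file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


def latest_issue_states(events):
    """Map (repo name, issue number) to 'open' or 'closed' from the newest IssuesEvent seen."""
    latest = {}
    for event in events:
        if event.get("type") != "IssuesEvent":
            continue
        action = event["payload"]["action"]
        if action not in ("opened", "closed", "reopened"):
            continue
        key = (event["repo"]["name"], event["payload"]["issue"]["number"])
        stamp = event["created_at"]  # ISO 8601 UTC strings sort chronologically
        if key not in latest or stamp >= latest[key][0]:
            latest[key] = (stamp, action)
    return {
        key: ("closed" if action == "closed" else "open")
        for key, (_stamp, action) in latest.items()
    }
```

An issue whose 'closed' event never reached the archive will come out of `latest_issue_states` as 'open', which matches the symptom described in this report.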

@igrigorik
Owner

Thanks for the detailed detective work!

Based on the fact that some issues do get their close events recorded, I'm inclined to think that we may be missing or dropping some activity through our polling mechanism. That said, @annafil, are you aware of any gotchas with close events, or can you think of any other reasons?

@annafil
Collaborator

annafil commented Jun 4, 2018

@sehtia Thanks for the sleuthing you've done on this so far!

Looking at the API output for this particular issue (https://api.github.com/repos/GoogleContainerTools/jib/issues/268/events), there seems to be a closed event that should have registered. That lends support to @igrigorik's theory that this might be due to dropping events because of polling issues.

Depending on what you're using the data for, and how much of it you need, there may be a workaround. Even if GH Archive is dropping some events from the /events API endpoint, the API provides all the events for a given issue in a given repo if you fetch them directly (as in the example link above). If you know the project or set of projects you are interested in, you can walk through each project's issues via the API and grab all associated issue events (including some that are not sent via /events by default, like 'assigned' and 'labeled'). There is no limit on the history of those events, unlike the /events endpoint, which only goes back 300 events per repo. If you need to do this for a large number of projects you'll probably run into rate limits, but you can use the built-in support mechanisms to throttle your requests and stay within the limit.
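A rough sketch of that workaround, fetching every event for a single issue and checking whether its most recent state change was a close. The helper names, the optional `token` argument and the injectable `get` parameter are assumptions made for illustration; pagination follows the `Link` response header.

```python
import json
import re
import urllib.request

API_ROOT = "https://api.github.com"


def http_get(url, token=None):
    """Return (parsed JSON body, lower-cased headers) for one GET request."""
    headers = {"Accept": "application/vnd.github+json", "User-Agent": "issue-events-sketch"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        lowered = {name.lower(): value for name, value in response.headers.items()}
        return json.load(response), lowered


def next_page_url(link_header):
    """Extract the rel="next" URL from a Link header, or None if there is none."""
    if not link_header:
        return None
    match = re.search(r'<([^>]+)>;\s*rel="next"', link_header)
    return match.group(1) if match else None


def fetch_issue_events(owner, repo, number, get=http_get):
    """Collect all events for one issue, following pagination links."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/issues/{number}/events?per_page=100"
    events = []
    while url:
        page, headers = get(url)
        events.extend(page)
        url = next_page_url(headers.get("link"))
    return events


def was_closed(events):
    """True if the newest 'closed'/'reopened' event is a 'closed' one."""
    changes = sorted(
        (e["created_at"], e["event"]) for e in events if e["event"] in ("closed", "reopened")
    )
    return bool(changes) and changes[-1][1] == "closed"
```

For the issue linked above this would be called as `fetch_issue_events("GoogleContainerTools", "jib", 268)`. Unauthenticated requests have a much lower rate limit, so for anything beyond a handful of issues a token is probably needed.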

@sehtia
Author

sehtia commented Jun 4, 2018

@igrigorik @annafil
No problem and thanks for getting back to me promptly!

Ah yes, that makes a lot more sense. @annafil My goal is to get all the currently open issues for a specific set of repositories that I want to track and analyze. I'm now trying to do this by mostly following your advice, specifically by querying the per-repo API (api.github.com/repos/user/repoName/issues?state=open) and working from there. However, I'm open to suggestions if you recommend a different approach, and to any guides on throttling requests to respect the rate limit, as it doesn't seem this will scale to a large number of repos.
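A minimal sketch of that per-repo approach, assuming the REST issues endpoint with the `state=open` query parameter and simple `page`-based pagination. The function names and the injectable `get_json` parameter are hypothetical. One thing to watch for: this endpoint also returns pull requests, which carry a `pull_request` key, so the sketch filters them out.

```python
import json
import urllib.request


def http_get_json(url):
    """Fetch one URL and return the parsed JSON body."""
    request = urllib.request.Request(url, headers={"User-Agent": "open-issues-sketch"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def list_open_issues(repo_full_name, get_json=http_get_json, per_page=100):
    """Return all open issues (pull requests excluded) for a repo given as 'owner/name'."""
    issues = []
    page = 1
    while True:
        url = (
            f"https://api.github.com/repos/{repo_full_name}/issues"
            f"?state=open&per_page={per_page}&page={page}"
        )
        batch = get_json(url)
        # Items with a "pull_request" key are pull requests, not plain issues.
        issues.extend(item for item in batch if "pull_request" not in item)
        if len(batch) < per_page:  # a short page means there is nothing further
            return issues
        page += 1


def open_issue_counts(repos, get_json=http_get_json):
    """Map each 'owner/name' repo to its number of open issues."""
    return {repo: len(list_open_issues(repo, get_json)) for repo in repos}
```

Each repo costs at least one request, so for a large set of repos the rate limit becomes the bottleneck, as noted above.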

Also, @igrigorik, is there any documentation on the polling mechanism GH Archive uses, and on how events get dropped, that I could read to understand the issue further (out of curiosity)?

@annafil
Collaborator

annafil commented Jun 4, 2018

@sehtia You can check out https://developer.github.com/v3/#rate-limiting for advice on working with REST API rate limits. If you can say a little more about what you're tracking/analyzing, and perhaps how many repos you estimate to poll, I can point you in a more specific direction :)
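As one possible way to act on that advice, the sketch below pauses when the `X-RateLimit-Remaining` response header reaches zero and waits until the time given by `X-RateLimit-Reset` (UTC epoch seconds). The function names, the one-second safety margin and the `(body, headers)` return shape of the wrapped `get` callable are assumptions made for illustration.

```python
import time


def seconds_to_wait(headers, now=None):
    """Return how long to pause before the next request, based on rate-limit headers."""
    now = time.time() if now is None else now
    lowered = {name.lower(): value for name, value in headers.items()}
    remaining = int(lowered.get("x-ratelimit-remaining", 1))
    reset_at = int(lowered.get("x-ratelimit-reset", 0))
    if remaining > 0:
        return 0.0
    # Quota exhausted: wait until the reset time, plus one second as a safety margin.
    return max(0.0, reset_at - now) + 1.0


def throttled(get, sleep=time.sleep):
    """Wrap a get(url) -> (body, headers) callable so it pauses when the quota runs out."""
    def wrapper(url):
        body, headers = get(url)
        delay = seconds_to_wait(headers)
        if delay:
            sleep(delay)
        return body, headers
    return wrapper
```

Sleeping after the response rather than before the next request keeps the wrapper stateless; a real client would probably also want to handle 403/429 responses and retries.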

@lucianoviola

lucianoviola commented May 18, 2021

@annafil @igrigorik I'm also experiencing issues with PRs that are missing events for "action=closed". For example:

In my investigation, I found this to be the case for 2509 PRs. Here are some examples:

But, unlike "issues", I can't get the past events for PRs.

This seems to be a bug that keeps recurring to this day. Are there any plans to resolve it? :)

@bored-engineer
Contributor

FYI #275 may at least explain why missed events haven't been identified/logged by the crawler, even if it doesn't actually solve the problem

lcy020625 mentioned this issue Jul 16, 2022