GitHub storage fails when content is >1MB #88

Open
parkr opened this issue Nov 13, 2018 · 9 comments

Comments

parkr (Contributor) commented Nov 13, 2018

The GitHub Contents API has a limit that causes file fetches, creations, and updates to fail when the content is larger than 1 MB. I have been hitting this recently:

2018/11/12 22:00:06 github: creating updates/1542060006221540759-check.json on branch 'master'
2018/11/12 22:00:07 GET https://api.github.com/repos/parkr/status/contents/updates/index.json?ref=heads%2Fmaster: 403 This API returns blobs up to 1 MB in size. The requested blob is too large to fetch via the API, but you can use the Git Data API to request blobs up to 100 MB in size. [{Resource:Blob Field:data Code:too_large Message:}]

The Git Data API requires creating the blob, tree, and commit objects manually, but provides a much more robust means of dealing with larger data. We should migrate the GitHub notifier to use this method instead.
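
For reference, the Git Data flow with go-github is roughly blob, then tree, then commit, then ref update. A minimal sketch follows; the method signatures match the go-github releases of that era and have shifted across major versions, so treat the exact types and arguments as assumptions rather than the final implementation:

```go
package storage

import (
	"context"

	"github.com/google/go-github/github"
)

// putFileViaGitData writes content to path on branch using the Git Data API
// (blob -> tree -> commit -> ref update), which accepts blobs up to 100 MB.
func putFileViaGitData(ctx context.Context, gh *github.Client, owner, repo, branch, path, content string) error {
	// Resolve the current tip of the branch.
	ref, _, err := gh.Git.GetRef(ctx, owner, repo, "heads/"+branch)
	if err != nil {
		return err
	}
	parentSHA := ref.Object.GetSHA()

	// Look up the parent commit so the new tree can be based on its tree.
	parent, _, err := gh.Git.GetCommit(ctx, owner, repo, parentSHA)
	if err != nil {
		return err
	}

	// Create a blob holding the (possibly >1MB) file contents.
	blob, _, err := gh.Git.CreateBlob(ctx, owner, repo, &github.Blob{
		Content:  github.String(content),
		Encoding: github.String("utf-8"),
	})
	if err != nil {
		return err
	}

	// Create a tree that adds/replaces the file at path.
	tree, _, err := gh.Git.CreateTree(ctx, owner, repo, parent.Tree.GetSHA(), []github.TreeEntry{{
		Path: github.String(path),
		Mode: github.String("100644"),
		Type: github.String("blob"),
		SHA:  blob.SHA,
	}})
	if err != nil {
		return err
	}

	// Create a commit pointing at the new tree, with the old tip as parent.
	commit, _, err := gh.Git.CreateCommit(ctx, owner, repo, &github.Commit{
		Message: github.String("update " + path),
		Tree:    tree,
		Parents: []github.Commit{{SHA: github.String(parentSHA)}},
	})
	if err != nil {
		return err
	}

	// Finally, move the branch ref to the new commit.
	ref.Object.SHA = commit.SHA
	_, _, err = gh.Git.UpdateRef(ctx, owner, repo, ref, false)
	return err
}
```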

Opening this in case someone else has time to do this before I do.

@parkr parkr changed the title GitHub notifier fails when content is >1MB GitHub storage fails when content is >1MB Nov 13, 2018
parkr (Contributor, Author) commented Nov 14, 2018

Seeing if the Google folks would be interested in streamlining this on their side. It's presently a significant amount of code to use the Git API! google/go-github#1052

DanielRuf (Contributor) commented
Any updates on this? Is this still relevant?

parkr (Contributor, Author) commented Apr 25, 2020

I still hit this occasionally, and have to remove the index file to fix it.

DanielRuf (Contributor) commented
Do you know whether this will happen with 3 services and a simple HTTP check every 10 minutes? Or can you say in which cases it occurs (e.g. x watched services/URLs at an interval of y)? I'd rather not run into this every time.

parkr (Contributor, Author) commented Apr 25, 2020

@DanielRuf This will eventually occur for all configurations using the GitHub storage backend, but doing fewer checks with fewer services will increase the lead time. I have 11 services checked every 30 minutes and it occurs every few months for me.

One automated way to fix this in the GitHub storage backend is to take the byte size of the index and trim it down to under 1 MB on every attempt to save.
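
A sketch of what that could look like, assuming the index is a flat map of result filenames to unix-nano timestamps (an assumption about the index format, not its actual type):

```go
package storage

import (
	"encoding/json"
	"math"
)

// trimIndex drops the oldest entries until the serialized index fits under
// maxBytes (e.g. 1 MB minus some headroom). The map[string]int64 shape
// (filename -> unix-nano timestamp) is assumed, not taken from the real code.
func trimIndex(index map[string]int64, maxBytes int) (map[string]int64, error) {
	for {
		b, err := json.Marshal(index)
		if err != nil {
			return nil, err
		}
		if len(b) <= maxBytes || len(index) == 0 {
			return index, nil
		}
		// Naive approach: find and drop the single oldest entry, then re-check the size.
		oldestName, oldestTS := "", int64(math.MaxInt64)
		for name, ts := range index {
			if ts < oldestTS {
				oldestName, oldestTS = name, ts
			}
		}
		delete(index, oldestName)
	}
}
```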

DanielRuf (Contributor) commented
Sounds like we need a data retention feature here (remove old entries / files).

@parkr
Copy link
Contributor Author

parkr commented Jun 26, 2020

This appears to only occur when the index itself gets too large, so we could certainly prune old entries in the index such that the serialized JSON is always <1MB.

The status page that I have only ever shows 24 hours, so we could also limit the index to the last 24 hours.
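
If the index only ever needs to cover what the status page renders, a time-based prune is even simpler; a sketch, under the same assumed index shape as above:

```go
package storage

import "time"

// pruneOlderThan removes index entries older than maxAge, e.g. 24 * time.Hour.
// Same assumed index shape as above: filename -> unix-nano timestamp.
func pruneOlderThan(index map[string]int64, maxAge time.Duration) {
	cutoff := time.Now().Add(-maxAge).UnixNano()
	for name, ts := range index {
		if ts < cutoff {
			delete(index, name)
		}
	}
}
```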

titpetric (Contributor) commented Jul 20, 2020

@parkr what are your expiry settings for the GH storage? The index should be cleaned up on maintain calls, depending on what you have in check_expiry. If this is zero/unset, your index won't get cleaned up, even if it does grow to be larger than GH limits.

titpetric (Contributor) commented
That being said, it's a workaround. Regardless of what you set in check_expiry, the index must be limited to ~1 MB in size, which does mean it would drop/delete data after some time. Are you storing your checks (historically) forever, or do you delete those too when you recreate the index?
