New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
/probot/stats endpoint is not scalable #380
Comments
I was working on an option that uses GraphQL here, but it basically ended up in me recreating a lot of the logic, so I noticed something. If the idea is that we're refreshing the stats on an interval and caching it, I'm not sure this part is right... Lines 18 to 21 in dcd0e32
If I'm understanding it correctly, that means every time the page is loaded, it will wait until another refresh is done. I removed the Also, what if we used the Search API to get the top 10 repos sorted by stars? We'd still need to use |
@tcbyrd Sorry for the delay here.
This logic is confusing, but it's actually only awaits for the stats to load the first time. Once the first load resolves, then
I think that's a great solution! We could even get top ~30 or so. As long as it fits in a single request, it'll be much faster than the current approach.
Why would it need to get the list of repositories? I think it would work to search for the user/org name. For example, @probot is:public will find public repos on the @probot org sorted by stars. |
@bkeepers I thought about that too. It just means we're defining popularity at the Org/User level, whether or not they chose to install the app on all their repositories. This is fine if all we want to know is "Users and Organizations with popular public repos" instead of "Users and Organizations that are using [app] on their public repos". I'm completely fine with the former, since in most cases I think the latter will still be true. If that's the case, then we can definitely axe that call entirely. |
Good point. It would be ideal if the stats only reflected the repositories that the app is installed on, but I don't think it's a big deal either way. The stats endpoint is just for vanity right now, so I think it's more important to make it more scalable than to get precise data. |
That's fair, but I think there'd be a significant difference in the data returned. Electron for example has a ton of repos; would that count as a ton of installations if the app is installed on one of those repos? Or am I misunderstanding? |
@JasonEtco In that scenario, we wouldn't run the |
Ah I see. I understand why, but its a shame we can't collect a number of installations; its something GitHub should have in the UI, because right now the API (and for my lazy self, the |
We can still collect number of installations by paginating
cc @jeffrafter @chobberoni @tarebyte since I briefly discussed the desire for App status with you a few weeks ago |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Could this be reopened and pinned? |
Can I assume this is the reason that the installation counter for delete-merged-branch is stuck for more than a month?
|
The stats APIs for some of my apps have been timing out, so I've replaced the endpoint with this: // set these env vars for the app
// DISABLE_STATS=true
// GITHUB_ACCESS_TOKEN=<token>
robot.router.get("/probot/stats", async function(req, res) {
const octokit = require("@octokit/rest")();
octokit.authenticate({
type: "token",
token: process.env.GITHUB_ACCESS_TOKEN
});
let github = await robot.auth();
const installations = await github.paginate(
github.apps.getInstallations({ per_page: 100 }),
res => res.data
);
// if (installations.length) {
// github = await robot.auth(installations[0].id);
// }
const popular = [];
for (const item of installations) {
const account = item.account;
const { data: repos } = await octokit.repos.getForUser({
username: account.login,
sort: "updated",
per_page: 30
});
if (!repos.length) {
continue;
}
account.stars = repos.reduce((stars, repository) => {
return stars + repository.stargazers_count;
}, 0);
popular.push(account);
}
res.json({
installations: installations.length,
popular: popular
.filter(item => item.stars > 0)
.sort((a, b) => b.stars - a.stars)
.slice(0, 10)
});
});
|
I see that the probot hosted apps may have solved this issue, but can't find any issues/PRs/code that would indicate how. Any insight into how this was solved for the apps hosted by probot on heroku? Planning on turning this into a long-running worker tasks/cron that caches result's for the web worker, was wondering the approach taken for other apps. |
@jakebolam Personally we end up just disabling stats on larger installations. The Marketplace is starting to show the exact same data anyway (number of installations and top customers), so ultimately listing your app on the marketplace gets you the same thing. |
With the change now that you can list apps on the Marketplace as "unverified", I'm going to close this issue. The Marketplace listing is the more production-ready solution for larger apps and also includes more data like unique visitors and conversion rate |
At app startup, the
/probot/stats
endpoint fetches all the repositories for all the installations that the app is installed on. This endpoint is used by probot.github.io to show the number of installations and show some of the more popular orgs using an app.It is frequently timing out on stale, which causes other tests to fail.
The text was updated successfully, but these errors were encountered: