Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix updated at time to use default branch commits and check monorepo directories #333

Closed

Conversation

alexbrazier
Copy link
Collaborator

@alexbrazier alexbrazier commented Jun 21, 2020

fixes #20

Why

The updated time was including any updates to the repo, such as issues created or commits in other branches. I think users would expect the updated time to either represent the last time a commit was made to the default branch e.g. master or the last publish time.

The updates would always be the same for monorepos, so every time large ones like expo had any updates the first few pages would be expo projects, even though only one of them had been updated.

Changes

  • Uses graphql api which allows us to make a single request to fetch the package.json and other details
  • Fixes updated time so it only checks commits on the default branch
  • Get the correct time for monorepos as it checks for commits within the folder

Checklist

If you added a feature or fixed a bug:

  • Documented in this PR how to use the feature or replicate the bug.
  • Documented in this PR how you fixed or created the feature.

Screenshot showing expo monorepo now has different updated dates that matches the folder location and master branch
image

Thanks to @kelset for providing initial graphql query in #50

- Uses graphql api which allows us to make a single request to fetch the package.json and other details
- Fixes updated time so it only checks commits on the default branch
- Get the correct time for monorepos as it checks for commits within the folder
@@ -1,8 +1,7 @@
{
"build": {
"env": {
"GITHUB_CLIENT_ID": "@github_client_id",
"GITHUB_CLIENT_SECRET": "@github_client_secret"
"GITHUB_TOKEN": "@github_token"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's possible to use a client id and secret with the graphql api so this needs to be updated to a github token.

// const repoRequestCache = {};

export const fetchGithubData = async data => {
const getUpdatedUrl = async url => {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes the graphql API doesn't pick up redirects. This function gets the updated url using the rest API to retry.

hasIssues: json.hasIssuesEnabled,
hasWiki: json.hasWikiEnabled,
hasPages: json.deployments.totalCount > 0,
hasDownloads: true,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find this in graphql, and I'm not really sure what it is meant to represent. I've set it to true to maintain backwards compatibility as every repository listed in the data was true

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it probably refers to the repo having files/installers attached to releases, but not sure tbh.

@@ -45,9 +45,9 @@ export type Library = {
license: {
key: string;
name: string;
spdx_id: string;
spdxId: string;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two fields have been updated by graphql. I couldn't see them being used anywhere but it made more sense to just leave them in camelcase to match the rest of the output rather than having to modify them. I can revert this if it's going to cause a breaking change.

@alexbrazier alexbrazier mentioned this pull request Jun 22, 2020
@kelset
Copy link
Member

kelset commented Jun 22, 2020

hey Alex, thank you so much for picking this back up 👍

@brentvatne I think this new approach is totally worth as the original problem highlighted in my issue seems to still stand.

@brentvatne
Copy link
Contributor

brentvatne commented Jun 23, 2020

i pulled the branch locally and i had trouble fetching data using yarn data:update with the current batch size of 100 - i believe the graphql api has different throttling rules. i reduced the batch size to 50 and that resolved the issue for me.

we should update fetchGithubRateLimit to use the graphql api to determine how many more node requests are available and also compare that with the number of nodes required per repo request * number of repos to bail out of updating data and deploying if we are throttled

aside from resource limit related things, this looks great :) thanks!

@brentvatne
Copy link
Contributor

merged in 4c1b864 - thanks a lot! if you can follow up with a fix for fetchGithubRateLimit and so on as described in previous comment that'd be amazing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pushedAt is not reliable
3 participants