Data incomplete? #120

Closed
litetex opened this issue Jun 10, 2022 · 6 comments


litetex commented Jun 10, 2022

Since ~2022-04-01, GitHub seems to return unreliable data:
[screenshot]

litetex added the bug label on Jun 10, 2022

litetex commented Jun 10, 2022

lol the API is completely broken:
https://api.github.com/search/repositories?q=stars%3A%3E1000&sort=stars&order=asc is not returning the items in ascending order xD

Both ASC and DESC are affected

litetex self-assigned this on Jun 10, 2022

litetex commented Jun 10, 2022

Bug reported to GitHub...
https://support.github.com/ticket/personal/0/1663259


litetex commented Jun 18, 2022

Thanks for reaching out to GitHub Support!

I suspect this is one of the cases where making a global search only returns estimates at best. A more accurate result is guaranteed for tightly scoped searches. An example of a tightly scoped search would be:

https://api.github.com/search/repositories?q=org:ORG_NAME&sort=stars&order=desc

I have asked the engineering team to help take a look at this specific report and I'll write to you again when I have any updates to share.
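The two kinds of query discussed in this thread (global and scoped to an organization) differ only in the `q` parameter. A minimal sketch in Python of how such URLs could be assembled; the helper name `build_search_url` is made up for illustration, and `org:google` is simply the organization queried later in this thread:

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.github.com/search/repositories"


def build_search_url(query: str, sort: str = "stars", order: str = "desc") -> str:
    """Build a repository search URL with a percent-encoded query string."""
    params = {"q": query, "sort": sort, "order": order}
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


# Tightly scoped search (one organization), as suggested by support.
print(build_search_url("org:google"))
# Global search, as used in the original report.
print(build_search_url("stars:>1000", order="asc"))
```

The second call reproduces the URL from the original report, including the `%3A%3E` encoding of `:>`.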


litetex commented Jul 24, 2022

Added some additional information, received the following feedback:

GitHub commented 2 days ago

Thank you for your patience.

The engineering team has implemented a fix and the issue is now resolved. Please let us know if you have any further concerns.


GitHub commented on 20 Jun

Thanks for the follow-up!

Thanks for sharing that additional information, I was able to confirm from the engineering team that this is a bug and an internal issue has been created to track the report.

I don't have a timeline for when a fix will be implemented but this is now in good hands and would write again to you if I have any news to share.

Let's see if it gets better within the next days :)


litetex commented Jul 24, 2022

The ordering is still broken:

curl -s 'https://api.github.com/search/repositories?q=org:google&sort=stars&order=desc&per_page=100&page=3' | jq '.items[] | "\(.stargazers_count) \(.name)"'
"1443 CausalImpact"
"1441 spatial-media"
"1422 uis-rnn"
"1421 ion"
"1416 cloudprober"
"1402 google-authenticator-libpam"
"1399 cel-spec"
"1389 android-emulator-container-scripts"
"1388 marl"
"1371 flogger"
"1350 shaderc"
"1343 brax"
"1320 pinject"
"1328 haskell-trainings"
"1303 firing-range"
"1299 gofuzz"
"1294 jsonapi"
"1292 boringssl"
"1290 json_serializable.dart"
"1287 highwayhash"
"1294 live-transcribe-speech-engine"
"1272 schism"
"1264 skicka"
"1267 fuzzer-test-suite"
"1269 UIforETW"
"1248 prettytensor"
"1245 crfs"
"1243 cel-go"
"1241 android-emulator-m1-preview"
"1233 sg2im"
"1225 mathfu"
"1206 codeworld"
"1201 perfetto"
"1198 go-jsonnet"
"1178 security-research"
"1169 XNNPACK"
"1170 proto-quic"
"1172 skylark"
"1146 argh"
"1139 glazier"
"1138 blockly-games"
"1133 brain-tokyo-workshop"
"1125 lullaby"
"1127 budou"
"1127 makani"
"1113 mozc"
"1111 site-kit-wp"
"1117 wwwbasic"
"1096 keyczar"
"1107 monster-mash"
"1060 BIG-bench"
"1055 conscrypt"
"1045 mundane"
"1041 woff2"
"1036 mozc-devices"
"1028 google-toolbox-for-mac"
"1021 re2j"
"1004 ssl_logger"
"1002 atheris"
"995 flutter.widgets"
"993 uncertainty-baselines"
"993 jsaction"
"993 vim-codefmt"
"991 adb-sync"
"1071 modernstorage"
"963 badwolf"
"941 double-conversion"
"948 textfsm"
"919 j2cl"
"915 mr4c"
"910 asylo"
"910 active-learning"
"906 rowhammer-test"
"898 clif"
"886 cityhash"
"881 android-arscblamer"
"870 quiver-dart"
"865 inception"
"858 inject.dart"
"854 nerfies"
"852 logger"
"909 intermock"
"833 budoux"
"827 flutter-provide"
"825 gcm"
"868 cmockery"
"820 namebench"
"814 certificate-transparency"
"812 fuzzbench"
"809 rappor"
"802 built_value.dart"
"798 ringdroid"
"797 pik"
"797 vsaq"
"791 android-gradle-dsl"
"793 emoji-scavenger-hunt"
"788 macops"
"781 jax-md"
"773 xls"
"772 tsunami-security-scanner-plugins"

At least the following entries are incorrectly ordered (indexes are zero-based):

  • Index 64 "1071 modernstorage"
  • Index 47 "1117 wwwbasic"
  • Index 67 "948 textfsm"
  • Index 49 "1107 monster-mash"
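Such positions can also be located mechanically. A minimal sketch in Python, assuming the star counts have already been extracted from the response; the function name `find_order_violations` is made up for illustration and the sample is only an excerpt of the output above. It flags every entry that has more stars than the entry before it, which is a simpler criterion than the manual list and can flag different positions:

```python
from typing import List, Tuple


def find_order_violations(counts: List[int]) -> List[Tuple[int, int]]:
    """Return (index, count) for every entry that exceeds its predecessor.

    In a correctly sorted descending list, no entry may be larger than
    the one before it.
    """
    violations = []
    for i in range(1, len(counts)):
        if counts[i] > counts[i - 1]:
            violations.append((i, counts[i]))
    return violations


# Excerpt of the output above: indexes 60 to 67 of page 3.
excerpt = [993, 993, 993, 991, 1071, 963, 941, 948]
for offset, count in find_order_violations(excerpt):
    print(f"index {60 + offset}: {count}")
# prints:
# index 64: 1071
# index 67: 948
```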

The same problem on the global search:

curl -s 'https://api.github.com/search/repositories?q=stars%3A%3E1000&sort=stars&order=asc' | jq '.items[] | "\(.stargazers_count) \(.name)"'
"1001 canvas"
"1006 AI-Programmer"
"1001 bashhub-client"
"1001 GOAD"
"1015 page-cache"
"1001 ImageResizer"
"1001 rust-bert"
"1008 lyra"
"1001 ZeusPlugin"
"1001 minitorch"
"1001 Sparky"
"1001 flutter_wechat_assets_picker"
"1001 awesome-AI-books"
"998 Refactorator"
"1001 computer_expert_paper"
"1001 vim-cpp-enhanced-highlight"
"1005 gopher"
"1028 mysigmail"
"1003 Stereogram.js"
"1001 ABD"
"1001 any-touch"
"1001 JSpider"
"1001 torrent-net"
"1001 eslint-plugin-simple-import-sort"
"1021 pluginbase"
"1001 Absinthe"
"1002 hacker-menu"
"1001 swift-tagged"
"1001 libcs50"
"1003 Gherkin"

It's even worse here: "998 Refactorator" shouldn't even be here - the filtering is now failing!
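The filter failure can be checked the same way as the ordering. A minimal sketch in Python, assuming the `(stars, name)` pairs have been copied from the output above; `find_filter_violations` is a hypothetical helper, not part of any API:

```python
from typing import List, Tuple


def find_filter_violations(
    items: List[Tuple[int, str]], min_exclusive: int
) -> List[Tuple[int, str]]:
    """Return the (stars, name) pairs that do not satisfy stars > min_exclusive."""
    return [(stars, name) for stars, name in items if stars <= min_exclusive]


# A few entries copied from the global search output above.
sample = [
    (1001, "awesome-AI-books"),
    (998, "Refactorator"),
    (1001, "computer_expert_paper"),
]
print(find_filter_violations(sample, min_exclusive=1000))
# prints: [(998, 'Refactorator')]
```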


litetex commented Aug 1, 2022

Reply from support:

I heard back from engineering to understand the fix implemented and below is the explanation they shared:

"The change made ensures that the search index will be updated every time a repo is starred or unstarred.

For example, the Refactorator repository has 998 stars, but it must have had over 1000 the last time the search index was updated. With the fix, no more inconsistencies of this kind can be introduced. However, existing inconsistencies will remain the same unless some update is made to the repos in question that causes them to be reindexed, such as someone starring or unstarring them.

We don't have a good mechanism for identifying and repairing all of the currently-inconsistent repos automatically, so this is the only approach available to us."

I hope this explanation helps!
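To make the explanation concrete, here is a toy model in Python of an index that can lag behind the live data. It is purely hypothetical and not GitHub's implementation: the class `ToySearchIndex`, the starting count of 1001, and the assumption that filtering uses the indexed count while the response shows the live count are all made up for illustration.

```python
from typing import Dict, List


class ToySearchIndex:
    """Toy model of a search index that can lag behind the live data."""

    def __init__(self, reindex_on_star_change: bool) -> None:
        self.reindex_on_star_change = reindex_on_star_change
        self.live: Dict[str, int] = {}     # actual star counts
        self.indexed: Dict[str, int] = {}  # counts as seen by the search

    def add_repo(self, name: str, stars: int) -> None:
        self.live[name] = stars
        self.indexed[name] = stars

    def change_stars(self, name: str, delta: int) -> None:
        self.live[name] += delta
        # The described fix: refresh the index entry on every star change.
        if self.reindex_on_star_change:
            self.indexed[name] = self.live[name]

    def search_min_stars(self, min_exclusive: int) -> List[str]:
        # Assumption: the filter is evaluated against the indexed counts.
        return sorted(n for n, s in self.indexed.items() if s > min_exclusive)


# Before the fix: unstarring does not refresh the index entry.
index = ToySearchIndex(reindex_on_star_change=False)
index.add_repo("Refactorator", 1001)
index.change_stars("Refactorator", -3)   # live count is now 998
print(index.search_min_stars(1000))      # prints: ['Refactorator']

# The fix is deployed, but the stale entry is not repaired by itself.
index.reindex_on_star_change = True
print(index.search_min_stars(1000))      # prints: ['Refactorator']

# The next star or unstar reindexes the repository.
index.change_stars("Refactorator", +1)   # live count is now 999
print(index.search_min_stars(1000))      # prints: []
```

In this model the stale entry disappears only once the repository is touched again, which matches the behavior described in the reply.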
