Expose full repository languages/file types inventory via GraphQL #2587
Each repository has a full "inventory", which is the result of processing all of its files, determining their languages, and counting the bytes of each language/file type. This is not exposed in GraphQL; only the top language is exposed (via
Also, it should include line counts, not just byte counts.
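To make the shape of such an inventory concrete, here is a minimal Python sketch that counts both bytes and lines per language. It is an illustration only: `EXT_TO_LANG`, `LangStats`, and `inventory` are hypothetical names, and the extension lookup is a stand-in assumption, not the existing language determination.

```python
from collections import defaultdict
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Dict

# Hypothetical extension-to-language table, used only for this sketch.
# Real language determination is more involved than an extension lookup.
EXT_TO_LANG = {".py": "Python", ".go": "Go", ".ts": "TypeScript", ".md": "Markdown"}


@dataclass
class LangStats:
    total_bytes: int = 0
    total_lines: int = 0


def inventory(files: Dict[str, bytes]) -> Dict[str, LangStats]:
    """Aggregate byte and line counts per language for a set of files.

    `files` maps a repository-relative path to the raw file contents.
    """
    stats: Dict[str, LangStats] = defaultdict(LangStats)
    for path, content in files.items():
        # Files with an unknown extension are grouped under "Other".
        lang = EXT_TO_LANG.get(PurePosixPath(path).suffix, "Other")
        entry = stats[lang]
        entry.total_bytes += len(content)
        # splitlines() counts a final line even without a trailing newline.
        entry.total_lines += len(content.splitlines())
    return dict(stats)
```

Calling `inventory({"a.py": b"x = 1\n"})` would yield one `Python` entry with 6 bytes and 1 line. Counting lines in the same pass as bytes keeps the extra cost small, since the file contents are already being read.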
Details: For every Git tree and blob, there should be a GraphQL field
The most common use case is to get this info for a repository at its HEAD commit, so there should be a
The language determination from the existing
The expected usage patterns are mainly:
https://app.hubspot.com/contacts/2762526/company/464956351 specifically asked for this, and they have that scale. They would like to be able to compute these stats over all repositories to know how their usage of languages is changing over time (e.g., "are we using more or less of $LANG?").
Based on demos and conversations with the other companies mentioned in the top comment (where the decision maker for deploying Sourcegraph would have seen immediate product value if Sourcegraph were able to give stats about language usage), it would also be an effective element of the initial demo. It would be something that directly gives the decision maker value, instead of just giving value to their team. For example, in the demo, we would ask "Do you even have a way to know which languages are in use?" "No." "Sourcegraph can tell you - here's a Python script [and in the future, a UI screen] that hits our API and tells you. Run it!" "Wow, this is a new capability that lets me plan hiring/training better...and there were these other questions I wanted answered: ..." (which leads to more good high-level product value conversations).
But the constraint is really that this feature shouldn't totally fail at 30k-repo scale, not that it needs to be fast at that scale. It is OK if it takes minutes or hours, and multiple requests, to compute for 30k repos when requested. For example, if it were all computed on the fly with a 60-second request timeout, that implementation would very likely not satisfy that customer need.
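One way a client could stay within that constraint is to spread the work over many small requests and keep a cursor, so no single request has to cover all repositories. The Python sketch below assumes a hypothetical `fetch_inventory` callable that returns a language-to-bytes mapping for one repository; it stands in for whatever API call ends up being exposed and is not an existing Sourcegraph function.

```python
from collections import Counter
from typing import Callable, Dict, List


def process_next_batch(
    repos: List[str],
    cursor: int,
    totals: Counter,
    fetch_inventory: Callable[[str], Dict[str, int]],
    batch_size: int = 100,
) -> int:
    """Process one batch of repositories and return the new cursor.

    `fetch_inventory` is a hypothetical per-repository call returning
    {language: byte_count}. The caller persists `cursor` and `totals`
    between batches, so the overall job can span many requests.
    """
    end = min(cursor + batch_size, len(repos))
    for repo in repos[cursor:end]:
        # Counter.update adds the per-repository counts to the running totals.
        totals.update(fetch_inventory(repo))
    return end


def aggregate_all(
    repos: List[str],
    fetch_inventory: Callable[[str], Dict[str, int]],
    batch_size: int = 100,
) -> Counter:
    """Drive process_next_batch until every repository has been covered."""
    totals: Counter = Counter()
    cursor = 0
    while cursor < len(repos):
        cursor = process_next_batch(repos, cursor, totals, fetch_inventory, batch_size)
    return totals
```

Because each batch is independent, a failed or timed-out batch can be retried from the last saved cursor without recomputing earlier repositories.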