Add Jsonnet language #2653

Closed
davidzchen wants to merge 1 commit

davidzchen

Jsonnet is a functional, formally-verified configuration generation language for JSON.

Examples in the wild: https://github.com/search?utf8=%E2%9C%93&q=extension%3Ajsonnet+NOT+nothack&type=Code&ref=searchresults

Documentation: https://google.github.io/jsonnet/doc/

Corresponding issue: google/jsonnet#43

@pchaigno
Contributor

I'm not from GitHub staff but I don't think there are enough files to add Jsonnet as a language. As per the contribution guidelines:

> In most cases we prefer that each new file extension be in use in hundreds of repositories before supporting them in Linguist.

I only count 12 repositories across 10 users in your search example ;)

@davidzchen
Author

Can someone from GitHub staff clarify the requirements for adding a language?

On #2348, @bkeepers said the following:

> We normally require hundreds of samples before adding support for an extension

which seems to imply that the requirement is having hundreds of files for a given language.

One of the accepted languages, JSONLD (added in #957), appears to have over 15k files, but I only count ~18 repositories with a few (~3) of the repositories containing the bulk of .jsonld files: https://github.com/search?p=1&q=extension%3Ajsonld+NOT+nothack&ref=searchresults&type=Code&utf8=%E2%9C%93

@larsbrinkhoff
Contributor

@davidzchen Note that there's a difference between adding a new language and adding a new extension to an already supported language. I believe @bkeepers was referring to the latter.

Also, the requirements have changed slightly over time, so the quote may be outdated.

@davidzchen
Author

@larsbrinkhoff That is a fair point. It definitely makes sense to have different requirements for adding an extension and adding a language, since the implications for the classifier and computational resources will be different.

In any case, it would be good to get some clarification from GitHub Staff about the requirements for both cases. There has been some discussion on this in #2657 as well, but as pointed out on both issues, having hundreds of repositories does not always seem to be a hard requirement, and other factors, such as the unlikelihood of collisions for a given file extension, are also considered. In this case, .jsonnet is a very unambiguous file extension.

I will leave it to GitHub Staff to make the final decision on whether the Jsonnet language can be added at this point or whether we should wait until there are more files in more repositories.

@sparkprime
Contributor

Ping!

It'd be good to know exactly how much adoption you're looking for. Note that the jsonnet repo has > 400 stars.

@davidzchen
Author

Rebased and resolved merge conflicts.

@bkeepers
Contributor

bkeepers commented Oct 7, 2015

Thanks for the pull request, @davidzchen! My preference would be to wait until there is a little more usage in the wild. There are a lot of samples, but they are in just a handful of repositories. Let's revisit in 3 months and see what it looks like.

@bkeepers bkeepers closed this Oct 7, 2015
@davidzchen
Author

Thanks for your feedback, @bkeepers. Sounds good, let's revisit this in 3 months.

@davidzchen
Author

@bkeepers It has been 3 months since this PR was first reviewed. Do you think there is now sufficient usage to allow adding Jsonnet to Linguist?

@bkeepers
Contributor

It looks like there is a little more usage, but it is still only in tens of repositories instead of hundreds. My preference would be to wait longer, but I'll defer to @arfon and others that have been more active in linguist recently.

@pchaigno
Contributor

> I only count 12 repositories across 10 users in your search example ;)

It's now 16 repositories among 14 users.

@davidzchen
Author

@bkeepers Thanks for the feedback.

@pchaigno Is there an easy way to get these numbers? The Repositories and Users numbers on the sidebar on the search results page don't seem very useful.

@pchaigno
Contributor

> Is there an easy way to get these numbers?

Not that I know of. I use a script which iterates on all pages for Code results... :/

@sparkprime
Contributor

Can you share the script? :)

@bkeepers
Contributor

> Can you share the script? :)

Feel free to submit a PR to add it to scripts/

@pchaigno pchaigno mentioned this pull request Jul 31, 2016
@devth

devth commented Jan 5, 2017

Bump. There are tons of results for extension:jsonnet now.

@sparkprime
Contributor

These days you'd have to also count the .libsonnet files although they're probably in the minority.

@pchaigno
Contributor

pchaigno commented Jan 8, 2017

I counted 46 repositories from 39 users for .jsonnet and 3 repositories from 3 users for .libsonnet.

@benley

benley commented Jan 8, 2017

For what it's worth, I suspect that several GitHub Enterprise users would benefit from this support. Jsonnet has had a fair bit of uptake as a config language within various companies, where the results don't generally show up in public repositories.

@kgraney

kgraney commented Feb 1, 2017

I'm a fan of this change as well. Are private and enterprise repositories being included in the counts? Jsonnet is actually quite nice for stuff like config files, which are perhaps more likely to live in a private repo than to be open sourced.

@sparkprime
Contributor

.libsonnet and .jsonnet are together now > 1000 results

https://github.com/search?utf8=%E2%9C%93&q=extension%3Alibsonnet+NOT+djhfjdhfdhfd&type=Code
https://github.com/search?p=97&q=extension%3Ajsonnet+NOT+djhfjdhfdhfd&type=Code&utf8=%E2%9C%93

If this is sufficient usage, we'll refresh the PR

@hausdorff

@bkeepers Do you have any specific guidance here? If the answer is still no, then it would be nice to have a clear usage goal so that we know when we can bother you again. :)

@lildude
Member

lildude commented Aug 18, 2017

> @bkeepers Do you have any specific guidance here? If the answer is still no, then it would be nice to have a clear usage goal so that we know when we can bother you again. :)

The same thing applies as before: we need to see large scale in-the-wild usage with the CONTRIBUTING.md suggesting "[i]n most cases we prefer that extensions be in use in hundreds of repositories before supporting them in Linguist."

Using a very crude script (I'll tidy it up and add it to this repo at some point) to search using the API...

... a search for the libsonnet extension found...

  • 149 files,
  • in 22 unique public repos,
  • spread across 18 different owners.

... and for the jsonnet extension, found...

  • 1012 files,
  • in 79 unique public repos,
  • spread across 65 different owners.

As you can see, this is still not popular enough.
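For readers who want to reproduce tallies like the ones above, here is a minimal sketch. It is not the script used for these numbers. It assumes each search result is a dict shaped like an item from GitHub's code search API, with `repository.full_name` and `repository.owner.login`; the helper name `tally` and the sample data are hypothetical.

```python
from typing import Iterable, Mapping


def tally(items: Iterable[Mapping]) -> dict:
    """Count files, unique repos, and unique owners in code search results."""
    files = 0
    repos = set()
    owners = set()
    for item in items:
        repo = item["repository"]  # assumed field layout, see lead-in
        files += 1
        repos.add(repo["full_name"])
        owners.add(repo["owner"]["login"])
    return {"files": files, "repos": len(repos), "owners": len(owners)}


# Hand-written sample data, not real search results.
sample = [
    {"path": "a.jsonnet", "repository": {"full_name": "alice/cfg", "owner": {"login": "alice"}}},
    {"path": "b.jsonnet", "repository": {"full_name": "alice/cfg", "owner": {"login": "alice"}}},
    {"path": "c.jsonnet", "repository": {"full_name": "alice/other", "owner": {"login": "alice"}}},
    {"path": "main.jsonnet", "repository": {"full_name": "bob/infra", "owner": {"login": "bob"}}},
]
print(tally(sample))
```

Counting owners separately from repos matters here because one owner with many repos would otherwise look like broad adoption.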

@lildude
Member

lildude commented Aug 18, 2017

Oops, found a bug in my crude script which meant the jsonnet tally was wrong. I've updated the note above with the correct results. Sadly, still not popular enough 😢

@hausdorff

@lildude that's good info, but what I'm really wondering is when we should come back. It sounds like you're saying to come back when there are 200 unique repositories?

@lildude
Member

lildude commented Aug 19, 2017

We can't really put it down to precise figures. It's more by general feel based on a combination of number of repos and uniqueness of the files, spread across the users. 200 copies (as opposed to forks) of the same repo is easy to achieve with 5 people. It doesn't make it widespread usage.

It's also quite common to see a lot of forks of the same repo for a new language with little variation in each fork, which is especially common in education environments. Yes, the number of repos is high, but the variation and real world usage isn't.

I'd love to be able to precisely quantify this as we could easily write a test for it, but we can't right now due to limitations of the API.

@sparkprime
Contributor

@lildude Did you ever check in your script? I'd like to see where we are now (6 months later). Thanks!

@lildude
Member

lildude commented Feb 15, 2018

@sparkprime Nope, and I can't as it triggers GitHub's abuse controls unless I use a whitelisted token, which I obviously can't share, and I don't feel comfortable sharing an abusive script 😄.

@sparkprime
Contributor

Any chance you can abuse GitHub for me? :)

@lildude
Member

lildude commented Feb 15, 2018

Sure.

.jsonnet:

Total files found: 2934
Unique public user/repos: 74
Unique owners: 63

.libsonnet:

Total files found: 418
Unique public user/repos: 66
Unique owners: 48

The API only returns 1000 results so a few repos can dramatically affect the total number of unique repos and users by increasing the number of files they have with that extension. The totals should also be taken with a pinch of salt as it'll include files that clearly aren't the desired language but have the same extension.

@sparkprime
Contributor

Thanks!

I'm surprised that the number of repos has gone down from 79 to 74. Are you doing more de-duping of repos now than last time you ran this?

@lildude
Member

lildude commented Feb 15, 2018

> Are you doing more de-duping of repos now than last time you ran this?

Nope. I noticed this too, hence I added the extra qualifying paragraph. In short, more active users with more files can push less active user/repos out of the first 1000 search results and that is probably what has happened here.

@sparkprime
Contributor

Ah, sorry I didn't get that the first time around :)

I just did a bunch of searches using the API that split the space into portions < 1000 files, e.g. by doing one with "NOT params" and one with "params". This was fairly arduous and non-automatic but it got me the following stats:

Number of repos: 159
Number of unique repo owners: 120

See you in another 6 months :)
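The partitioning approach described above (splitting one query into sub-queries that each stay under the 1000-result cap, then combining them) can be sketched as follows. This is only an illustration under assumptions: the result items are assumed to carry `path`, `repository.full_name` and `repository.owner.login` as in GitHub's code search API, the helper name `merge_partitions` is hypothetical, and the sample data is made up.

```python
def merge_partitions(*partitions):
    """Union results from several sub-queries, de-duplicating by (repo, path)."""
    seen = {}
    for items in partitions:
        for item in items:
            key = (item["repository"]["full_name"], item["path"])
            # Keep the first copy; a file matched by two sub-queries counts once.
            seen.setdefault(key, item)
    repos = {repo for repo, _path in seen}
    owners = {item["repository"]["owner"]["login"] for item in seen.values()}
    return {"files": len(seen), "repos": len(repos), "owners": len(owners)}


def hit(full_name, path):
    """Build a made-up result item with the assumed field layout."""
    owner = full_name.split("/")[0]
    return {"path": path, "repository": {"full_name": full_name, "owner": {"login": owner}}}


# Stand-ins for the results of e.g. "params" and "NOT params" sub-queries.
with_params = [hit("alice/cfg", "params.jsonnet"), hit("bob/infra", "lib/params.libsonnet")]
without_params = [
    hit("alice/cfg", "main.jsonnet"),
    hit("carol/site", "site.jsonnet"),
    hit("alice/cfg", "params.jsonnet"),  # deliberate overlap to show de-duplication
]
merged = merge_partitions(with_params, without_params)
print(merged)
```

In principle a term and its negation give disjoint result sets, so the de-duplication is only a safeguard against overlapping partitions.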

@tvi

tvi commented Aug 15, 2018

Can we get this now? 😄

@benley

benley commented Oct 3, 2018

Another 6 months have passed. Can we reopen this yet?

@pchaigno
Contributor

pchaigno commented Oct 3, 2018

I've added the .jsonnet extension to #4219. We're still not at hundreds of repositories (93). I'll keep tracking usage every few months.

@dancompton

dancompton commented Jan 19, 2019

Bump. Popularity is increasing. @pchaigno, please add .libsonnet as well and include it in the Jsonnet count.

@pchaigno
Contributor

pchaigno commented Mar 4, 2019

The .jsonnet file extension is now popular enough for inclusion (see #4219). @davidzchen Do you want me to reopen this pull request or would you prefer to resubmit as a new pull request?

@sparkprime
Contributor

It will need updating, I think, as there is new syntax over the last 5 years. The .libsonnet extension should also be included, as probably more files are written with it. I'm not sure it would have changed the number of active repos much though, since if a repo has a .libsonnet file it probably also has a .jsonnet file in it. Think of it as like .h and .cc.

@sparkprime
Contributor

I made a new PR: #4455
