Revisiting language groups #4291

Closed
Alhadis opened this issue Oct 9, 2018 · 40 comments · Fixed by #4979

@Alhadis
Collaborator

Alhadis commented Oct 9, 2018

This issue is a continuation of what @pchaigno started with #3093:

This pull request makes SASS a language of its own, distinct from CSS.
This change was requested several times in #2933, #3084, #2585 and #2650.

There are several languages on GitHub which presently fall under the usage statistics of another, "parent" language, which certainly deserve reconsideration — or at the very least, some public discussion for highlighting the reasons why they fall under another language's umbrella.

To start, here are the languages which I believe are valid candidates for degrouping. I'll extend this list over time as discussion from other users confirms other candidates:

Languages which should be degrouped

Candidate (currently grouped under): reason(s) for decoupling

  • Svelte (HTML): see #4291 (comment).
  • Sass/SCSS (CSS): Extremely different syntax and semantics. Sass has programmatic features and some "object-oriented" features; CSS is strictly declarative.
  • Less (CSS): See above. Less's syntax is much closer to "pure" CSS than Sass/SCSS, but it's still programmatic in nature and different enough to warrant separation.
  • JSON (JavaScript; fixed in #4345): JSON is a general-purpose data serialisation language, and virtually every modern programming language has support for reading and parsing JSON syntax to some extent (either natively or via a library). It's the closest thing we have to a universally interoperable data exchange format.

    Moreover, there's little point in retaining a connection with JavaScript. JSON is classed as a data language, so it won't appear in usage statistics anyway.

I've refrained from bringing up any languages I've never worked with or lack familiarity with (such as PostCSS and Stylus), either of which might be a candidate as well. Comments are welcome.

Good examples of language groups

Here are some languages which are justifiable in having a parent language:

  • HTML ← The various HTML+… languages (HTML+Django, HTML+ECR, HTML+ERB, etc.)
  • Python ← Python console, Python traceback
  • JSON ← JSON with comments
  • Haskell ← Literate Haskell
  • CoffeeScript ← Literate CoffeeScript
  • Pic ← Roff [1]

/cc @pchaigno, @lildude, @controversial, @nazar-pc, @EmmaRamirez, @plibither8

Footnotes

  1. Regarding an argument @arfon made in Distinguish SASS and CSS #3093:

My rationale for not doing this is that SASS is almost always used to generate CSS (please correct me if I'm wrong here) and so it makes sense (to me at least!) to have this listed under CSS for the general repository stats.

Pic is interesting because it can be compiled to other languages that aren't Roff (like SVG or TeX), but the language itself is based upon Roff syntax and even permits low-level Roff constructs to be used inline. In other words, it's not so cleanly separated, and it demonstrates why arguing from transpilation targets is fallacious reasoning w.r.t. whether Sass should be distinguished from CSS or not.

@plibither8

Analogous to the SCSS-CSS argument, Pug (Jade) and other templating languages that are radically different from HTML should also be considered, as they currently fall in the HTML group.

@Alhadis
Collaborator Author

Alhadis commented Oct 9, 2018

Agreed. Personally, I think most (if not all) templating languages should be decoupled from their target output. There's a reason they're templating languages, after all... and it isn't "just HTML" if I open a Pug template in a browser and see a weird mix of half-empty tags and loops. 😉

I think if a parent language is unambiguous and well-specified (as HTML and CSS are), a child language should be either a subset or a hybrid of different languages. Conversely, Assembly and Shell are umbrella terms of sorts which cover numerous dialects and implementations, so having them as parent languages makes more sense, IMHO.

@stale

stale bot commented Nov 8, 2018

This issue has been automatically marked as stale because it has not had activity in a long time. If this issue is still relevant and should remain open, please reply with a short explanation (e.g. "I have checked the code and this issue is still relevant because ___."). Thank you for your contributions.

@stale stale bot added the Stale label Nov 8, 2018
@Alhadis Alhadis removed the Stale label Nov 8, 2018
@pchaigno
Contributor

pchaigno commented Dec 5, 2018

I think this is a pretty good list to start with and I doubt we'll be able to have a comprehensive list (we don't know and use all languages Linguist supports ourselves). To move forward, should we agree on a short guideline to decide if languages should be grouped in the future (so that we can better handle future cases we missed here)?

I think that comment by @Alhadis is a pretty good starting point:

I think if a parent language is unambiguous and well-specified (as HTML and CSS are), a child language should be either a subset or a hybrid of different languages. Conversely, Assembly and Shell are umbrella terms of sorts which cover numerous dialects and implementations, so having them as parent languages makes more sense, IMHO.

@lildude What's your opinion on this?

@Alhadis
Collaborator Author

Alhadis commented Dec 5, 2018

I've pulled JSX from the list. For a start, it's not as clear-cut as TypeScript is (Flow typing and JSX tags both fall under the umbrella of "JSX", more or less). Plus the distinction itself is problematic for reasons I've explained here.

Alhadis added a commit that referenced this issue Dec 6, 2018
lildude pushed a commit that referenced this issue Dec 10, 2018
* Sever relationships between CSON/JSON and parents

Fixes: #4344

References: #4291

* Fix missing CodeMirror mode for JSONLD files
@lildude
Member

lildude commented Dec 14, 2018

Whoops, lost this in my inbox at some point and was just reminded by @Alhadis in #4353.

I think that comment by @Alhadis is a pretty good starting point:

I think if a parent language is unambiguous and well-specified (as HTML and CSS are), a child language should be either a subset or a hybrid of different languages. Conversely, Assembly and Shell are umbrella terms of sorts which cover numerous dialects and implementations, so having them as parent languages makes more sense, IMHO.

@lildude What's your opinion on this?

Seems reasonable to me.

Languages which should be degrouped

... as does this. Do it.

@Alhadis
Collaborator Author

Alhadis commented Dec 14, 2018

... as does this. Do it.

I'm gonna enjoy this...

@Alhadis
Collaborator Author

Alhadis commented Dec 14, 2018

Just an FYI: this might take a while because of conflicting colour proximities. 😅

@lildude
Copy link
Member

lildude commented Dec 14, 2018

Just an FYI: this might take a while because of conflicting colour proximities.

We might be able to get rid of that soon 🤞 I had a chat with a colleague and your suggestion at #4331 (comment) may become a thing 🔜.

@Alhadis
Collaborator Author

Alhadis commented Dec 14, 2018

Holy shit. 😮 🎉 🎉 ❤️

@Alhadis
Collaborator Author

Alhadis commented Dec 14, 2018

Guys, I've pushed a WIP branch for the degrouped languages I'm familiar with, but I'll hold off from submitting a PR until some time has elapsed (or until the potential changes have been reified).

In the meantime, feel free to push any changes you think are missing or necessary. 👍

Alhadis added a commit that referenced this issue Dec 14, 2018
@Alhadis
Collaborator Author

Alhadis commented Dec 14, 2018

... of course, when pushing topic branches, it'd help if I actually had commits to go with them.

Remind me not to leave changes staged for several hours, because my crap memory will have me believing they've already been committed. 😁

@Alhadis
Collaborator Author

Alhadis commented Jan 8, 2019

@lildude I realised another reason why the colour-proximity thing is strangling us: language authors gravitate toward vibrant colours when deciding their project's logo/branding/colour-scheme. So over time, more and more languages will be added to Linguist with clashing colours: bright red, dark blue, bright blue, purple, warm yellow, etc.

So the remaining "available colours" we can assign them will inevitably be sickly shades of pale green, washed-out red, white-ish pink, etc. The current constellation of colour choices is already proving this: when adding Asymptote, I noticed its official colour was #FF0000 (bright red). That clashed with PostScript, Mercury, Red (the language, lol), Ruby, and several others which were likely "pushed" away from their official colour shades due to the colour-proximity requirements.

Having said that, there's no way I'm gonna add 12 uncoloured/grey languages that were degrouped from their parent languages, most of which have branding with vibrant, distinctive colour choices. Nor do I want to drop 12 grossly inaccurate colour-choices into the language bar to represent Less, SASS, etc.
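For anyone unfamiliar with the constraint being discussed, a colour-proximity check boils down to "reject a new colour if it is too close to any existing one". The sketch below is only an illustration under stated assumptions: it uses plain Euclidean distance in RGB space and a hypothetical `MIN_DISTANCE` threshold, and the language names are hypothetical; it is not Linguist's actual test, which may use a perceptual colour metric instead.

```python
from __future__ import annotations

import math


def hex_to_rgb(hex_colour: str) -> tuple[int, int, int]:
    """Convert '#RRGGBB' into an (r, g, b) tuple of ints in 0..255."""
    h = hex_colour.lstrip("#")
    return int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16)


def rgb_distance(a: str, b: str) -> float:
    """Euclidean distance between two hex colours in RGB space (max ~441.67)."""
    return math.dist(hex_to_rgb(a), hex_to_rgb(b))


# Hypothetical threshold: colours closer than this count as clashing.
MIN_DISTANCE = 40.0


def clashes(new_colour: str, existing: dict[str, str]) -> list[str]:
    """Return the (sorted) names of languages whose colour is too close to new_colour."""
    return sorted(
        name
        for name, colour in existing.items()
        if rgb_distance(new_colour, colour) < MIN_DISTANCE
    )


if __name__ == "__main__":
    # Hypothetical registry: one near-red colour, one blue.
    registry = {"LangA": "#F00000", "LangB": "#0000FF"}
    print(clashes("#FF0000", registry))
```

With this model, "relaxing" the constraint means lowering `MIN_DISTANCE`, and removing it entirely means setting it to zero. The more languages already occupy vivid colours, the more a new vivid colour is likely to be rejected, which is the squeeze described above.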

@lildude
Member

lildude commented Jan 8, 2019

@Alhadis I hear you, and hopefully we can remove this once #4291 (comment) happens. It's on a team's radar, just need to see it come to fruition.

@lildude
Member

lildude commented Jan 9, 2019

Especially for you @Alhadis 😘

[Screenshot: Changelog entry]

@Alhadis
Collaborator Author

Alhadis commented Jan 9, 2019

This is the happiest day of my life, holy shit. 😀

What should we do about the colour-proximity check?

@wopian
Contributor

wopian commented Jan 9, 2019

Would this mean existing languages will be able to get their official colour after the proximity changes now that there's a separator?

@Alhadis
Collaborator Author

Alhadis commented Jan 9, 2019

Yes!

@pchaigno
Contributor

pchaigno commented Jan 9, 2019

Should we keep some semblance of color proximity detection though? If only to prevent all colors from becoming blue... I was thinking we could simply relax our proximity constraint?

@Alhadis
Collaborator Author

Alhadis commented Jan 9, 2019

If only to prevent all colors from becoming blue...

That's a non-issue, and only likely to be noticed in repositories which contain multiple languages that incidentally use almost identical colours.

@pchaigno
Contributor

pchaigno commented Jan 9, 2019

repositories which contain multiple languages that incidentally use almost identical colours

Isn't this a kind of birthday paradox and therefore the probability of that happening is actually higher than one might expect? :p

@Alhadis Do you think we should remove the constraint on colors entirely?
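The birthday-paradox point can be put into rough numbers. Assuming (a big simplification) that the k languages in a repository pick their colours independently and uniformly from n hypothetical "visually distinguishable" buckets, the chance that at least two of them land in the same bucket is 1 − ∏(n − i)/n for i = 0..k−1. The bucket counts below are illustrative assumptions, not measurements of Linguist's palette.

```python
def collision_probability(k: int, n: int) -> float:
    """Probability that at least two of k items share one of n equally likely buckets."""
    if k > n:
        return 1.0  # pigeonhole principle: a collision is guaranteed
    p_all_distinct = 1.0
    for i in range(k):
        # The (i+1)-th item must avoid the i buckets already taken.
        p_all_distinct *= (n - i) / n
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    print(collision_probability(23, 365))  # the classic birthday-problem setting
    print(collision_probability(5, 100))   # 5 languages, 100 hypothetical colour buckets
```

The first call gives roughly 0.507 (the textbook birthday result); the second gives roughly 0.097, so under these assumptions a near-collision is not rare even in a modest multi-language repository.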

@stale

stale bot commented Mar 5, 2019

This issue has been automatically marked as stale because it has not had activity in a long time. If this issue is still relevant and should remain open, please reply with a short explanation (e.g. "I have checked the code and this issue is still relevant because ___."). Thank you for your contributions.

@BenEmdon

BenEmdon commented Oct 2, 2019

Hey y'all 👋
Thought I'd add my thoughts on this too.

After using Ruby .erbs in a number of non-HTML ways (like code generation), I wonder if we should consider reclassifying them. What options are available?

CC: @Alhadis @pchaigno

@Alhadis
Collaborator Author

Alhadis commented Oct 2, 2019

Could you post a code sample of what you mean? I can't fathom how HTML markup could be making code generation easier...

In any case, unusual use-cases of a language benefit from a linguist-language override for the affected files... =)

@BenEmdon

BenEmdon commented Oct 2, 2019

.erbs just seem like a means to template text (of any kind). There doesn't appear to be anything HTML specific about them.

ERB filenames include the templated file type before the .erb extension. An HTML ERB file would be named name.html.erb, a Java ERB file name.java.erb, and a conf ERB file name.conf.erb.

It seems like we could infer the language type of an ERB file from the extension that precedes .erb in its filename. How do you feel about this? 👍 👎

Examples of ERBs being used for code generation
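A minimal sketch of the inference being proposed: look at the suffix just before `.erb` and map it to the templated language. The `INNER_EXTENSIONS` table and the function name are hypothetical, chosen only for illustration; they are not Linguist data or API.

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical mapping from the inner extension to the templated language.
INNER_EXTENSIONS = {
    ".html": "HTML",
    ".java": "Java",
    ".ini": "INI",
}


def templated_language(filename: str) -> Optional[str]:
    """For 'name.<ext>.erb', return the language implied by <ext>, or None if unknown."""
    suffixes = PurePath(filename).suffixes  # e.g. ['.java', '.erb']
    if len(suffixes) < 2 or suffixes[-1] != ".erb":
        return None
    return INNER_EXTENSIONS.get(suffixes[-2])


if __name__ == "__main__":
    for name in ["name.html.erb", "name.java.erb", "name.conf.erb", "name.erb"]:
        print(name, "->", templated_language(name))
```

Note this only identifies what is being templated. To be classified, each combination would still need its own registered language (HTML+ERB already exists; something like a Java+ERB would not), which is the limitation raised in the reply.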

@Alhadis
Collaborator Author

Alhadis commented Oct 3, 2019

Ah, I see. So it's really more of a generic templating system that (naturally) lends itself well to server-side HTML rendering? If it isn't HTML-centric, it might make sense to rename it to Embedded Ruby instead (as well as degrouping it).

However, that'd still be of minimal benefit to syntax highlighting and language classification. Because Linguist is limited to classifying languages that've been registered ahead of time, it'd be impossible to classify files as, say, Java+ERB or INI+ERB. So, the best we can do is rename it to something more appropriate and/or make it a child-language of Ruby.

I'm really not the right person to be discussing anything Ruby-related, though. Since I've no knowledge of what ERB files are really used for, I can't confidently assert my suggestions are suitable (are these code-generation cases only 10% of ERB-using repositories? A third? ~50%?). @lildude would be the right person to ask about this, but since he's currently @busydude, it's probably safer to leave this matter be for now. =)

@BenEmdon

BenEmdon commented Oct 3, 2019

The idea of renaming it to Embedded Ruby, a child language of Ruby, seems acceptable to me.

I would speculate that HTML+ERB is the dominant variant due to the popularity of Ruby on Rails. Should the HTML+ERB variant stay classified as an HTML-like language, since HTML rendering is still a major use case for ERBs?

@Alhadis
Collaborator Author

Alhadis commented Oct 3, 2019

Yes, I think so. Exceptions can always use a .gitattributes override to flag it as another language (which affects syntax highlighting too). Granted, this means they're limited to either Ruby or whatever language is being templated... but it's better than (mis)classing it as partly HTML.

@BenEmdon

BenEmdon commented Oct 3, 2019

I don’t mind proposing the change in a PR 😄
Are there other PRs which did something similar? Pointing me to another PR would help me get a head start!

@Alhadis
Collaborator Author

Alhadis commented Oct 3, 2019

Renaming a language is a simple procedure (though that wasn't always the case…). You can use #4171 as an example. Basically, it's just:

  1. Rename entry in ./lib/linguist/languages.yml. Then,
    1. Keep the list alphabetised.
      Ordering is case-sensitive (so sorted in binary order: uppercase before lowercase).
    2. Remove the entry's group: HTML field.
      If you want the entry to contribute to the usage statistics of Ruby, replace the line with group: Ruby instead.
  2. Rename samples/ directory: ./samples/Old Name/ → ./samples/New Name/
  3. Scan the following files/directories for mentions of the old name. Update each file accordingly (remember, language names are case-sensitive):
  4. Run bundle exec rake samples to update the samples database.
  5. Run script/list-grammars to regenerate the grammars list.
  6. Run bundle exec rake test to run Linguist's test suite. Anything you've missed will display loud hairy feedback: you'll know when you've covered everything. 👍
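On the "case-sensitive, binary order" point in step 1: for plain ASCII names, Python's default string comparison (by code point) gives exactly that ordering, with uppercase letters sorting before lowercase ones. The helper below is a hypothetical convenience for spotting a misplaced entry; it does not read languages.yml itself, so in practice you would feed it the file's top-level keys.

```python
from typing import List, Optional


def first_out_of_order(names: List[str]) -> Optional[int]:
    """Return the index of the first name that sorts before its predecessor
    in code-point (binary) order, or None if the list is already sorted."""
    for i in range(1, len(names)):
        if names[i] < names[i - 1]:
            return i
    return None


if __name__ == "__main__":
    # 'T' (U+0054) sorts before 'a' (U+0061), so "HTML+ERB" precedes "Haskell".
    print(first_out_of_order(["HTML", "HTML+ERB", "Haskell"]))
    # A case-insensitive sort would accept this order; binary order does not.
    print(first_out_of_order(["Haskell", "HTML"]))
```

The surprising case is the second one: anyone who alphabetises by eye, ignoring case, would put "Haskell" before "HTML".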

Sidenote

I just realised we could always keep HTML+ERB and add Embedded Ruby as a separate language. The extensions of HTML+ERB could target .html.erb and .html.erb.deface, whilst the new Embedded Ruby language could simply target .erb more broadly. This is much more complicated, and would necessitate the addition of heuristics and regression tests to disambiguate... however, this feels to me like it might be the winning solution.

Again, I'd wait for @lildude's input before rushing off to submit a PR. Should my solution be found preferable, well, your PR will have been in vain. 😉
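The split in the sidenote relies on the more specific extensions (.html.erb, .html.erb.deface) beating the broader .erb. A minimal sketch of that "longest matching extension wins" idea follows. The extension table mirrors the proposal, but the matching logic is a simplification: it does not model Linguist's real detection strategies or the heuristics mentioned above, and the case-insensitive matching is an assumption of the sketch.

```python
from typing import Dict, List, Optional

# Extension table following the sidenote's proposal (illustrative, not Linguist's data).
EXTENSIONS: Dict[str, List[str]] = {
    "HTML+ERB": [".html.erb", ".html.erb.deface"],
    "Embedded Ruby": [".erb"],
}


def classify(filename: str) -> Optional[str]:
    """Pick the language whose extension is the longest suffix of filename."""
    lowered = filename.lower()
    best_lang: Optional[str] = None
    best_len = 0
    for lang, exts in EXTENSIONS.items():
        for ext in exts:
            if lowered.endswith(ext) and len(ext) > best_len:
                best_lang, best_len = lang, len(ext)
    return best_lang


if __name__ == "__main__":
    for name in ["index.html.erb", "view.html.erb.deface", "Foo.java.erb", "README.md"]:
        print(name, "->", classify(name))
```

Anything the suffix alone can't settle (for instance, HTML templates saved with a bare .erb extension) is where the heuristics and regression tests would have to come in.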

@BenEmdon

BenEmdon commented Oct 3, 2019

I just realised we could always keep HTML+ERB and add Embedded Ruby as a separate language. The extensions of HTML+ERB could target .html.erb and .html.erb.deface, whilst the new Embedded Ruby language could simply target .erb more broadly. This is much more complicated, and would necessitate the addition of heuristics and regression tests to disambiguate... however, this feels to me like it might be the winning solution.

I agree with this 👍 I'll wait on @lildude's input before tackling this.

@ObserverOfTime
Contributor

Svelte should be removed from the HTML group. It's similar to Vue which is already on its own.

@hilder-vitor

The language Sage is really built on top of Python and their syntaxes are almost the same, but they are not 100% equal. For example, y^3 computes the cube of y in Sage rather than y XOR 3 as in Python.
Moreover, R.<t> = QQ['x'] is a valid line of code in Sage, while in Python it raises a SyntaxError.

Besides the syntax, the other features are very different. For instance, Python treats mathematical expressions numerically, while Sage treats them symbolically: 1/3 is 0.3333… in Python 3 but an exact fraction in Sage, and math.sqrt(2) is 1.4142… in Python but stays, well, sqrt(2) in Sage.

Even if one has never declared x anywhere, the following is a valid Sage script which prints -1:

sage: f = cos(x)
sage: f(x = pi)
-1

All that said, I would like to invite you to consider degrouping Python and Sage.
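For comparison, here is what plain Python 3 does with the expressions mentioned above. `Fraction` from the standard library is used only as an analogy for Sage's exact rational result; it is not how Sage implements its rationals, and Sage's own preparsing (which turns `^` into exponentiation) is not reproduced here.

```python
import math
from fractions import Fraction

y = 2
xor_result = y ^ 3            # bitwise XOR in Python: 0b10 ^ 0b11 == 0b01
cube = y ** 3                 # exponentiation is spelled ** in Python
third = 1 / 3                 # true division yields a float in Python 3
exact_third = Fraction(1, 3)  # exact rational, closer in spirit to Sage's 1/3
root_two = math.sqrt(2)       # numeric approximation, not a symbolic sqrt(2)

# Sage's ring-definition syntax is not valid Python at all.
try:
    compile("R.<t> = QQ['x']", "<sage>", "exec")
    parses_as_python = True
except SyntaxError:
    parses_as_python = False

if __name__ == "__main__":
    print(xor_result, cube, third, exact_third, root_two, parses_as_python)
```

The same source text means something different (or nothing at all) under the two languages, which is the core of the argument for degrouping.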

@Alhadis
Collaborator Author

Alhadis commented Jul 2, 2020

@hilder-vitor You should submit a pull-request to degroup them; this thread is chiefly for discussing languages whose "independence" is ambiguous and open to debate. There's clearly no ambiguity or room for subjectivity in what you've described.

@Alhadis
Collaborator Author

Alhadis commented Jul 2, 2020

@ObserverOfTime I missed your comment when you posted it. I've added Svelte to the list.

@github-linguist github-linguist locked as resolved and limited conversation to collaborators Jun 17, 2024