Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove samples/LANG/filenames as a source of truth #4078

Merged

Conversation

pchaigno
Copy link
Contributor

All filenames must now be explicitly listed in languages.yml.

Description

A recent classification error urged me to look into this. rust-lang/crates.io-index was incorrectly recognized as C because of file sc/ri/script. Although we do not have a script filename listed in languages.yml, we do have a script sample file in samples/C/filenames. (I probably incorrectly put this sample file there because of its shebang and lack of extension.) Linguist uses the list of filenames from samples/LANG/filenames/, in addition to those defined in languages.yml, to determine which files belong to a language:
https://github.com/github/linguist/blob/dd3b1eec911072ca051ecccabc03a47bafa2c76c/lib/linguist/language.rb#L542-L548

I think it would be better to rely on languages.yml as the single source of truth for languages and their files. I removed the culprit lines and added the appropriate filenames to languages.yml.
I also moved samples/C/filenames/script to its correct location, thus solving the classification with rust-lang/crates.io-index.

Checklist:

  • I am associating a language with a new file extension.

  • I am adding a new language.

  • I am fixing a misclassified language

    • I have included a new sample for the misclassified language:
      • Sample source(s):
        • [URL to each sample source, if applicable]
      • Sample license(s):
    • I have included a change to the heuristics to distinguish my language from others using the same extension.
  • I am changing the source of a syntax highlighting grammar

  • I am adding new or changing current functionality

    • I have added or updated the tests for the new or changed functionality.

Copy link
Member

@lildude lildude left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I think it might be a good idea to also use some sort of consistency around the use of quotes around the filenames in languages.yml. Some of your additions have quotes, others don't.

@pchaigno
Copy link
Contributor Author

@lildude Should I remove or add them everywhere? Only in my modifications, or everywhere in the file? script/set-language-ids adds them everywhere I think; we didn't have any before this script existed.

@lildude
Copy link
Member

lildude commented Mar 27, 2018

So as not to deviate from the original intent of this PR, lets stick with adding the quotes to just the lines you're adding. We can come back and do the others later, or as other peeps touch the various sections, we can ask them to do it.

@lildude
Copy link
Member

lildude commented Mar 28, 2018

CI should pass once #4081 has been merged.

All filenames must now be explicitly listed in languages.yml. A test
makes sure they are.
@pchaigno pchaigno force-pushed the assert-filenames-samples-listed branch from 2ef7d27 to e4855b3 Compare March 30, 2018 17:57
@pchaigno
Copy link
Contributor Author

I updated the branch.

@pchaigno
Copy link
Contributor Author

pchaigno commented Apr 2, 2018

@lildude Is anything else needed?

@lildude
Copy link
Member

lildude commented Apr 2, 2018

@lildude Is anything else needed?

Yup, consistent quoting for the lines you're adding - #4078 (comment)

@pchaigno
Copy link
Contributor Author

pchaigno commented Apr 2, 2018

Ah. Should have added a comment on that. I tried to be consistent with the remaining of the file: double quotes are only added (by Ruby's YAML) when there's a dot, i.e., for file extensions and some filenames.

@lildude
Copy link
Member

lildude commented Apr 2, 2018

Ah. Should have added a comment on that. I tried to be consistent with the remaining of the file: double quotes are only added (by Ruby's YAML) when there's a dot, i.e., for file extensions and some filenames.

Ah, I see now. 👍

Copy link
Member

@lildude lildude left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@pchaigno pchaigno merged commit 0bf4b8a into github-linguist:master Apr 2, 2018
@pchaigno pchaigno deleted the assert-filenames-samples-listed branch April 2, 2018 09:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants