Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new Hack file extensions. #4354

Merged
merged 1 commit into from
Nov 28, 2019
Merged

Conversation

azjezz
Copy link
Contributor

@azjezz azjezz commented Dec 14, 2018

Description

Add support for hack hhi extension.

Checklist:

Copy link
Contributor

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution!

I've not checked the usage for the .hack and .hhi extensions yet. Also, I did not found the change to the heuristic; did you push it?

Note: A change to the heuristic rule is only required if other languages share the same extensions.

lib/linguist/languages.yml Outdated Show resolved Hide resolved
@azjezz
Copy link
Contributor Author

azjezz commented Dec 15, 2018

@pchaigno i have removed the .hck extension, also i didn't add heuristic changes as the .hack and .hhi extensions are not used by any other language

@stale
Copy link

stale bot commented Feb 4, 2019

This pull request has been automatically marked as stale because it has not had recent activity, and will be closed if no further activity occurs. If this pull request was overlooked, forgotten, or should remain open for any other reason, please reply here to call attention to it and remove the stale status. Thank you for your contributions.

@stale stale bot added the Stale label Feb 4, 2019
@azjezz
Copy link
Contributor Author

azjezz commented Feb 4, 2019

cc @pchaigno

@stale stale bot removed the Stale label Feb 4, 2019
@pchaigno
Copy link
Contributor

pchaigno commented Feb 4, 2019

Usage for the .hhi extension seems fine. I've counted 830 repositories by 746 users using Harvester and the extension:hhi hh search query.

We'll have an issue with the .hack extension however, if we merge this pull request as-is. There are plenty of .hack files that contain just zeros and ones which will be classified as Hack with this pull request. I don't know what these files are. Either we can identify them and classify them as their own (data?) language, or we'll need to remove the .hack extension from Hack for the moment.

EDIT: Sorry for the delay in answering. I've had quite a busy January :(

@azjezz
Copy link
Contributor Author

azjezz commented Feb 5, 2019

@pchaigno

Problems with .php extension :

currently its hard to detect hack files that use .php extension, for example i noticed that today my repository was reporting 11% hack, when its a pure php app. the issue was that some .php files contained only HTML, since there's no <?php, linguist detects them as hack.

see : https://github.com/azjezz/web-php-mock-ups

Problems with .hh extension :

hh files used to be detected as c++ header files, but since we added a heuristic rule last time to check for <?hh it was fixed.

the issue here is that hack is not planning to keep the <?hh tag after hhvm 4.0

Solution :

.hack seems to be the only reasonable solution.
there's no other language using this extension, makes it easy to detect ( note: there's no <?hh tag in .hack files )

Solution for 01 files

the solution would be to identify them and classify them as their own (data or binary) language.


cc @fredemmott

@fredemmott
Copy link

We’ve not yet announced support for .hack files - an while the support was committed, it is about to change, and these pull request examples will no longer be considered valid - as of next monday’s release, having any ‘<?’ token in a .hack file is a syntax error. They’re always strict mode.

@azjezz
Copy link
Contributor Author

azjezz commented Feb 5, 2019

@fredemmott i suggested these examples before we discussed the <?hh tag in .hack files 😛 i have removed the <?hh tag.

@jeffomatic
Copy link

Just for context: the existing .hack files are almost overwhelmingly going to be from solutions to the Nand2Tetris course, a popular MOOC for computer science (I recommend it!). .hack is a "compiled" machine code format for the fictitious CPU that students implement. As such, it's not a language that can be syntax-highlighted in any meaningful way. The content of those files are ASCII 1's and 0's, plus whitespace.

@pchaigno
Copy link
Contributor

pchaigno commented Feb 5, 2019

Thank you all for the information!

I've checked 2817 .hack files from the extension:hack 0000000000010000 OR 0000000000000000 search query using Harvester and found 931 repositories among 918 users. It's enough to add support for that second language. Once we've added that language, it shouldn't be hard to distinguish between the two with 100% accuracy :-)
@jeffomatic Any idea what we should name it? Looks like the official name is Hack, so maybe "Nand2Tetris Hack"?

@fredemmott i suggested these examples before we discussed the <?hh tag in .hack files i have removed the <?hh tag.

We try to follow usage and not specifications in Linguist. If one language says some syntax is illegal but a large number of users still rely on that syntax, we'll try to support it. So I'm not sure we should remove the tags here. The best thing would be to preserve the sample files in their original form.

@fredemmott
Copy link

fredemmott commented Feb 5, 2019

We try to follow usage and not specifications in Linguist. If one language says some syntax is illegal but a large number of users still rely on that syntax, we'll try to support it. So I'm not sure we should remove the tags here. The best thing would be to preserve the sample files in their original form.

There's definitely not a large number of users; Hack uses a of variant PHP-style autoloading, and no release of the autoloader supports the .hack extension at all - and it's only been in master since Jan 15th. Any usage of <?hh in .hack files is extremely early adopters running a locally patched version or master from the last two weeks - as far as I'm aware, it's just @azjezz.

@fredemmott
Copy link

Though, <?hh should remain supported in .php, .hh, .hhi files etc.

@azjezz
Copy link
Contributor Author

azjezz commented Feb 5, 2019

@fredemmott the .hack files in @nuxed were generated using hh-to-hack so they don't use <?hh already

Alhadis added a commit to file-icons/atom that referenced this pull request Feb 14, 2019
@jeffomatic
Copy link

@jeffomatic Any idea what we should name it? Looks like the official name is Hack, so maybe "Nand2Tetris Hack"?

I think that's fine. It is pretty unlikely that anyone would intentionally configure a language context for these files since they are far closer to binaries than they are human-consumable text.

Caveat that I don't speak for the Nand2Tetris project or community in any real way other than having gone through the course as a hobbyist.

@pchaigno
Copy link
Contributor

Thanks @jeffomatic!

@azjezz - Could you add the Nand2Tetris Hack language to Linguist in this pull request as well, with a heuristic to disambiguate between the two .hack language? I think we can probably add Nand2Tetris as a data language.

Something along the lines of:

- extensions: ['.hack']
  rules:
  - language: Nand2Tetris Hack
    pattern: ^[01]+$
  - language: Hack

should be enough.

@azjezz
Copy link
Contributor Author

azjezz commented Feb 15, 2019

@pchaigno done 👍

Copy link
Contributor

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if I was unclear; you'll need to add the Nand2Tetris Hack language as well by following the contribution guidelines.

lib/linguist/languages.yml Show resolved Hide resolved
Copy link
Contributor

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Travis CIbuilds are failing. I commented the few things you'll need to change to pass the tests.

There's one failure I can't quite explain yet. I'll have a look when I'll be on my computer.

lib/linguist/languages.yml Outdated Show resolved Hide resolved
lib/linguist/languages.yml Outdated Show resolved Hide resolved
test/test_language.rb Outdated Show resolved Hide resolved
test/test_language.rb Outdated Show resolved Hide resolved
@stale
Copy link

stale bot commented Mar 18, 2019

This pull request has been automatically marked as stale because it has not had recent activity, and will be closed if no further activity occurs. If this pull request was overlooked, forgotten, or should remain open for any other reason, please reply here to call attention to it and remove the stale status. Thank you for your contributions.

@stale stale bot added Stale and removed Stale labels Mar 18, 2019
@pchaigno
Copy link
Contributor

pchaigno commented Mar 18, 2019

My guess would be that the tests are failing because the tokenizer is unable to extract tokens from the Nand2Tetris hack sample file. If that's the case, we'll probably need to adapt the tokenizer or add an exception mechanism. I haven't found time to debug this yet.

@pchaigno pchaigno self-assigned this Mar 18, 2019
@lexidor
Copy link

lexidor commented May 3, 2019

@pchaigno if the Nand2Tetris files are only ones and zeroes (and unlimited whitespace), then there are no real tokens. This language is not read as source code. Would it be okay if I did something dirty and just specified the token definition as "Each byte is the start of a new token"? Would this somehow cause something unrelated to break? I would like to have syntax highlighting in hack files, so sorry for not prioritising the Nand2Tetris hack format more.

@lexidor
Copy link

lexidor commented May 20, 2019

@pchaigno Are we following the route of not requiring tokenizers. Tokenizing such files is rather pointless, since there are no tokens. Hack code is being used in dothack files in many more places than it was at the release of HHVM4. This leaves entire repositories as black text on GitHub.

@azjezz
Copy link
Contributor Author

azjezz commented Jun 6, 2019

cc @pchaigno, any updates here ? after #4544 , the .hack files syntax is "highlightable", but requires using a gitattribute override to classify .hack as Hack, this PR solves that issue.

example : .gitattribute / Translator.hack

@lildude
Copy link
Member

lildude commented Nov 18, 2019

Wow! How did I miss this one for so long. Sorry folks.

Support for the .hack extension has now been added in #4588 and we completely forgot about the Nand2Tetris files. I don't think anyone has noticed... or really cares about those files being identified as Hack (famous last words 😆) so lets update this branch and change it to only add support for the .hhi extension.

@azjezz
Copy link
Contributor Author

azjezz commented Nov 18, 2019

@lildude i myself forgot about this PR :)

i have update the PR to only address .hhi extension.

@lildude lildude merged commit ca11839 into github-linguist:master Nov 28, 2019
@azjezz azjezz deleted the patch-1 branch December 14, 2020 15:02
@github-linguist github-linguist locked as resolved and limited conversation to collaborators Jun 17, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants