Added HAProxy configuration language #4303
Thanks for the contribution!
I've left a few comments inline.
Here is an overview of how syntax-highlighted files will appear on GitHub.com with the new grammar: Lightshow link.
@NickMRamirez Could you add one or two more sample files? I'm still seeing a lot of misclassifications by the Bayesian classifier between HAProxy and INI files. I expect it to do a better job with more training data, since HAProxy files contain a set of easily identifiable keywords. Files the same size as the one you created, or a bit larger, should be fine; try to avoid huge files, as the Bayesian classifier keeps all tokens in memory.
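To illustrate why more samples help a token-based Bayesian classifier, here is a toy multinomial naive Bayes sketch in Python. It is not Linguist's implementation (Linguist is written in Ruby and tokenizes differently); the tokenizer regex, the add-one smoothing, the uniform prior, and the sample snippets are all assumptions chosen for illustration. Each distinctive HAProxy keyword seen in training (`backend`, `balance`, `roundrobin`, ...) raises the HAProxy score for new files containing it, so more varied samples mean more such keywords.

```python
import math
import re
from collections import Counter

# Assumed tokenizer: identifiers plus the INI punctuation "[", "]" and "=".
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*|\[|\]|=")


def tokenize(text):
    """Split a config file into crude word/punctuation tokens (illustrative only)."""
    return TOKEN_RE.findall(text)


def train(samples):
    """samples maps language -> list of file contents; returns per-language token counts."""
    model = {}
    for lang, files in samples.items():
        counts = Counter()
        for text in files:
            counts.update(tokenize(text))
        model[lang] = counts
    return model


def classify(model, text):
    """Rank languages by log-likelihood with add-one smoothing and a uniform prior."""
    vocab = set()
    for counts in model.values():
        vocab.update(counts)
    scores = {}
    for lang, counts in model.items():
        total = sum(counts.values())
        scores[lang] = sum(
            math.log((counts[tok] + 1) / (total + len(vocab)))
            for tok in tokenize(text)
        )
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical training snippets, one per language.
samples = {
    "HAProxy": [
        "global\n    maxconn 4096\ndefaults\n    mode http\n"
        "backend app\n    balance roundrobin\n    server web1 10.0.0.1:80 check\n"
    ],
    "INI": ["[database]\nhost = localhost\nport = 5432\n[logging]\nlevel = info\n"],
}
model = train(samples)
print(classify(model, "frontend www\n    bind *:80\n    default_backend app\n"
                      "backend app\n    balance roundrobin\n"))
# prints ['HAProxy', 'INI']
```

With only one sample per language, any file whose keywords never appeared in training falls back to the smoothed baseline, which is one reason adding a few more representative samples can fix misclassifications.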
Added more sample files. Let me know if I should add more.
I've downloaded 2347 files from your search query and ran Linguist on them. As far as I can see, it detects all HAProxy files correctly. I'd say we're good to go!
Great to hear! Thanks for your work on this!
@pchaigno regarding:
Shouldn't we be using a heuristic first? This PR doesn't add a heuristic but introduces a duplicate extension.
@lildude Adding more samples was enough to improve the Bayesian classifier's accuracy significantly. We went from 1 to 4 samples and no longer have any misclassifications (in the 2347-file corpus I downloaded). In the past, we've kept heuristic rules for cases where the Bayesian classifier is unable to identify recurring keywords. Here, we have a set of recurring keywords (
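For comparison, the heuristic approach discussed here would be a deterministic rule checked before the Bayesian classifier. The Python below is a hypothetical sketch of such a rule, not code from this PR or from Linguist (which expresses heuristics as regex rules in Ruby-based tooling). It assumes HAProxy section keywords (`global`, `defaults`, `frontend`, `backend`, `listen`) start at column 0 and that INI files use bracketed `[section]` headers; both are simplifying assumptions.

```python
import re
from typing import Optional

# Assumption: HAProxy section keywords appear unindented at the start of a line.
HAPROXY_SECTION_RE = re.compile(
    r"^(global|defaults|frontend|backend|listen)\b", re.MULTILINE
)
# Assumption: INI section headers look like "[name]" on their own line.
INI_SECTION_RE = re.compile(r"^\s*\[[^\]]+\]\s*$", re.MULTILINE)


def guess_cfg_language(text: str) -> Optional[str]:
    """Hypothetical .cfg heuristic: return a language only when the signal is unambiguous."""
    has_haproxy = HAPROXY_SECTION_RE.search(text) is not None
    has_ini = INI_SECTION_RE.search(text) is not None
    if has_haproxy and not has_ini:
        return "HAProxy"
    if has_ini and not has_haproxy:
        return "INI"
    return None  # ambiguous or unknown: defer to the Bayesian classifier


print(guess_cfg_language("global\n    maxconn 4096\nbackend app\n    balance roundrobin\n"))
print(guess_cfg_language("[database]\nhost = localhost\n"))
```

Returning `None` on ambiguous input mirrors the idea that a heuristic should only decide when it is confident and otherwise leave the file to the statistical classifier.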
@pchaigno 👍 Thanks for the explanation and validation work. 🙇
LGTM. Welcome to Linguist and thanks for your contribution.
Adding a new language for HAProxy configuration files, e.g. haproxy.cfg. The ".cfg" file extension is already associated with the INI language, so I have added samples to both the HAProxy and INI sample folders to help differentiate them.
Checklist:
[ ] I am associating a language with a new file extension.
[X] I am adding a new language.
[X] The extension of the new language is used in hundreds of repositories on GitHub.com.
Search for the haproxy.cfg filename:
https://github.com/search?q=filename%3Ahaproxy.cfg&type=Code
Search for .cfg files that are not named haproxy.cfg:
https://github.com/search?q=roundrobin+extension%3Acfg+-filename%3Ahaproxy.cfg
[X] I have included a real-world usage sample for all extensions added in this PR:
[X] I have included a syntax highlighting grammar: https://github.com/abulimov/atom-language-haproxy
[ ] I have updated the heuristics to distinguish my language from others using the same extension.
[ ] I am fixing a misclassified language
[ ] I am changing the source of a syntax highlighting grammar
[ ] I am updating a grammar submodule
[ ] I am adding new or changing current functionality