Ragel in C++ Host is incorrectly identified as 'Ragel in Ruby Host' #1251

Closed
gsauthof opened this issue Jun 4, 2014 · 3 comments

gsauthof commented Jun 4, 2014

This search returns some false positives:

https://github.com/gsauthof/imapdl/search?l=ragel%20in%20ruby%20host

There is no Ruby code at all - the correct host language would be C++.


arfon commented Dec 3, 2014

Thanks for reporting this. Unfortunately, .rl is only defined as an extension for Ragel in Ruby Host, and the first detection method Linguist uses is the filename, so we're always going to detect .rl files as Ragel in Ruby Host.

If .rl is a common C++ extension then please consider opening a pull request adding this as an extension. You can read our contributing guidelines here

@arfon arfon closed this as completed Dec 3, 2014
@gsauthof
Author

Huh? Was this closed as won't-fix? The issue is not fixed but you have closed the bug. On the other hand you have suggested a pull request. Doesn't really make sense.

Regarding the by-filename detection: is this really how linguist works? Because this would be very limiting. I mean, what does linguist do about .h files? Just assume that it is always a C header file?

.h is an extension that is used for C header files but also one that is very often used for C++ headers. See also for example the discussion in #1250 which acknowledges this problem and references some work on heuristics.

.rl is the file extension which is often used for Ragel Machine Specifications. Ragel supports many host languages, including C/C++/D/Ruby etc. - although the emphasis is on C in its manual. Usually, a .rl file just contains small snippets of host language code in its actions/glue code. The ragel program compiles the specification to host language code.

Thus, as a first step, it would make a lot of sense to change 'Ragel in Ruby Host' to just 'Ragel'.

And if you want to get ambitious you can still add a more sophisticated detection of the language of the embedded action bodies/glue code.
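
To illustrate what such a detection could look like, here is a minimal sketch in Ruby. It is not part of Linguist: the `HOST_HINTS` table, the `guess_ragel_host` method, the patterns, and the returned labels are all hypothetical, and a real heuristic would need to be checked against many more .rl files. It counts a few host-language tokens in the specification and falls back to plain 'Ragel' when nothing matches.

```ruby
# Hypothetical heuristic, NOT part of Linguist: guess the host language of a
# Ragel specification from tokens that tend to show up in its glue code.
# The patterns below are illustrative assumptions, not a vetted rule set.
HOST_HINTS = {
  "C++"  => [/^\s*#include\s*<\w+>/, /\bstd::/, /\bnamespace\b/],
  "C"    => [/^\s*#include\s*<\w+\.h>/, /\bprintf\s*\(/],
  "Ruby" => [/^\s*def\s+\w+/, /^\s*require\s+['"]/, /\bputs\b/]
}.freeze

def guess_ragel_host(source)
  # Count how many patterns of each host language occur in the source.
  scores = HOST_HINTS.transform_values do |patterns|
    patterns.count { |re| source.match?(re) }
  end

  # Pick the host with the most hits; on a tie the first entry wins.
  best, score = scores.max_by { |_lang, hits| hits }

  # Without any hint, report the generic label instead of guessing.
  score.zero? ? "Ragel" : "Ragel (#{best} host)"
end
```

The point of the fallback is that a wrong host label is worse than a generic one: when no hint is found, the file is still reported as Ragel.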


arfon commented Dec 22, 2014

> Huh? Was this closed as won't-fix? The issue is not fixed but you have closed the bug. On the other hand you have suggested a pull request. Doesn't really make sense.

Hi @gsauthof - hopefully I've explained what the current situation is - .rl files are currently only defined in one place within Linguist.

> Regarding the by-filename detection: is this really how linguist works? Because this would be very limiting. I mean, what does linguist do about .h files? Just assume that it is always a C header file?

Linguist has a number of strategies for identifying languages. The first is by filename, which returns Ragel in Ruby Host for .rl files because no other language currently defines .rl as an extension.

For files such as .h, where many languages specify the same extension, we go through these strategies in order:

    STRATEGIES = [
      Linguist::Strategy::Filename,
      Linguist::Shebang,
      Linguist::Heuristics,
      Linguist::Classifier
    ]
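
How such a chain narrows the candidates can be sketched roughly as follows. This is a simplified illustration, not Linguist's actual implementation: the `EXTENSIONS` table, the two lambdas, `SKETCH_STRATEGIES`, and `detect_sketch` are hypothetical names, and the extension lists and the C++ pattern are assumptions made for the example.

```ruby
# Hypothetical, simplified model of a strategy chain. It only illustrates the
# idea of "first unambiguous answer wins"; it is NOT Linguist's real code.
EXTENSIONS = {
  ".rl" => ["Ragel in Ruby Host"],
  ".h"  => ["C", "C++", "Objective-C"]
}.freeze

# Each strategy takes (filename, content, candidates so far) and returns a
# list of language names, which may be empty.
by_filename = lambda do |name, _content, _candidates|
  EXTENSIONS.fetch(File.extname(name), [])
end

by_heuristics = lambda do |_name, content, candidates|
  # Assumed rule: a few C++-only keywords at the start of a line mean C++.
  if candidates.include?("C++") && content =~ /^\s*(class|namespace|template)\b/
    ["C++"]
  else
    []
  end
end

SKETCH_STRATEGIES = [by_filename, by_heuristics].freeze

def detect_sketch(name, content)
  candidates = []
  SKETCH_STRATEGIES.each do |strategy|
    found = strategy.call(name, content, candidates)
    # A single result is unambiguous, so later strategies are never consulted.
    return found.first if found.size == 1
    # Several results narrow the field for the next strategy.
    candidates = found unless found.empty?
  end
  nil # still ambiguous or unknown in this sketch
end
```

With this model, `detect_sketch("parser.rl", "...")` stops at the filename step, because only one language claims .rl, while a .h file falls through to the content-based step.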

As I said in my earlier comment, if .rl is commonly used within the C++ community then the next step is to open a pull request adding in this extension. If you'd like to do this then please take a look at our contributing guidelines here.

gsauthof added a commit to gsauthof/linguist that referenced this issue Jan 24, 2015
Source:

Download http://www.colm.net/files/ragel/ragel-6.9.tar.gz and look under
`examples`. There are a bunch of `.rl` files that all contain Ragel
specifications with some C or C++ in action snippets.  Not one example
includes Ruby.

Ragel (the state machine description compiler/language) also supports
languages besides C/C++, but a .rl file usually contains just small
snippets of the target language (C/C++ ...) in actions of a state machine.

Thus, the correct classification of `.rl` files is just 'Ragel' and
NOT 'Ragel in Ruby Host'.

See also my comment in the prematurely closed issue:

github-linguist#1251 (comment)
@github-linguist github-linguist locked as resolved and limited conversation to collaborators Jun 17, 2024