Add Prolog filetype support #3171
base: master
Conversation
This looks good, but, to my thinking, incomplete without support for Visual Prolog, which is what Lexilla's lexer actually targets. I opened techee#4 to give an idea of what a more inclusive file def might look like. |
@techee, I'm confident that SWI-Prolog users will be completely happy 👍🏼 They're getting more lexical categories than even the VS Code extension recognizes. Notice, however, that neither is as complete as prolog.vim, but I think we have to accept that regex-capable parsers like Vim and TextMate grammars are simply better than Scintilla's match-every-character-of-one-lexeme-at-a-time model. |
Scintilla lexers are C++, so in theory they could do anything, just somebody has to code it :-) |
I'm getting slightly lost in what you propose to do. I just took the VS Code keywords as you suggested, but basically I could merge all the keywords together, i.e.:
I just suspect not many people will use Geany for Visual Prolog, which seems to be Windows-only, proprietary, and to come with an official IDE (whose authors probably wrote the lexer and use it in their IDE).
Yeah, you should be able to do more things in the code than by using regular expressions. |
Sounds reasonable. Let's just settle for enough SWI-PL keywords to provide a common denominator between the Vim and VS Code implementations. Type specifiers are unique to Visual Prolog and could be better implemented by a tags parser anyway. Serious users would expect their custom types to be styled the way
Well, being easy but inefficient allowed Python and PHP to become the institutions they are today. There was a proposal made a long time ago to teach Scintilla how to consume flex files. |
So should I merge the Vim and VS Code keywords? Right now it's the VS Code keywords only. |
I recommended that set because it's much bigger overall, but it reflects only one syntactic category (builtin functions). |
I'm not sure I understand how Scintilla follows the "inefficient but easy" tradition; I would have said that writing everything in C++ follows the "difficult but efficient" tradition 😄

Recognising that words (identifiers/names/whatever your language calls them) can represent several different syntactic constructs, and that these tend to change as the language evolves, Scintilla provides the facility for the application (that's Geany) to supply several lists of words, plus facilities for the lexer to efficiently recognise if/which list a word is in, and members of those lists can be styled differently. Most lexers happily use this facility, but how many lists they support varies from lexer to lexer. This facility is even (mis)used by Geany to supply lists of typenames detected by the ctags parsers/tagfiles dynamically at runtime for some languages (e.g. C/C++).¹ The prolog lexer supports these lists:
I think @techee only provided for two in the filetype file. Maybe they can all be allowed since there is no ctags parser, so none need to be reserved for that. Then the lists might be better arranged. |
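As a sketch of the word-list mechanism described above (this is not Lexilla's actual API; the struct and function names here are hypothetical, and in a real lexer `WordList::InList()` does the membership test), the idea is simply that each supplied list carries its own style, and the first list containing a word wins:

```cpp
// Hypothetical illustration of Scintilla-style keyword lists:
// the application supplies several word lists, and the lexer
// styles a word according to the first list that contains it.
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct KeywordList {
    std::string styleName;           // e.g. "keyword_1" in a Geany filetype file
    std::set<std::string> words;     // the words the application supplied
};

// Return the style of the first list containing the word, or "identifier"
// when no list matches (mirroring a lexer's fall-through default).
std::string styleFor(const std::vector<KeywordList>& lists,
                     const std::string& word) {
    for (const auto& kl : lists)
        if (kl.words.count(word))
            return kl.styleName;
    return "identifier";
}
```

First-match-wins ordering matters when the same word appears in two lists, which is one reason the arrangement of the lists is worth getting right up front.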
Easy to implement:
Inefficient if you count the time spent chasing subtle bugs inside character-counting loops, the so-called "inextricable suite of if / elseif / goto code": https://sourceforge.net/p/scintilla/feature-requests/1074. Then again, maybe "Worse is Better" was the tradition I had in mind. |
On 1331, "Easy" to use: Neil must have had his tongue in cheek; just ignore the thousands of lines of SciTE, which is just a "test editor". But it's not easy to write a lexer for a language, which is what we were talking about. 1074 is one person's opinion, not gospel truth; only my opinion is gospel truth 😁 [end humility] "Worse is better" says the user experience is made poor to simplify the implementation, but I would argue the implementation difficulty of writing lexers in C++ results in a better user experience due to the speed: try opening a big HTML file in Geany and in gedit (which uses regex syntax highlighting). But certainly the lexer development experience is worse if you are not a C++ist. |
@rdipardo I merged the vim and vs code keywords into one - does it look good to you? |
@techee, I think we're finally done with keywords. 👍 Now, on to a messier issue I just detected. In SWI-Prolog, relational operators are prefixed with Visual Prolog interprets the

    if (sc.state == SCE_VISUALPROLOG_DEFAULT) {
        if (sc.Match('@') && isOpenStringVerbatim(sc.chNext, closingQuote)) {
            sc.SetState(SCE_VISUALPROLOG_STRING_VERBATIM);

I guess Geany could always intercept or override the |
Geany can't really override any styles since the lexer will just put them back each time it runs, and Scintilla, not Geany, controls when that happens and what range of the file is re-lexed. A style can't be "ignored", but it can be mapped to the default style, and we could do that for But then in the example in your image that would mean the second Which is the lesser of the two evils?
Well clearly someone uses Visual Prolog or they would not have contributed the lexer. Somebody could contribute a patch to Lexilla (controlled by a property) that changed |
The optional exception for SWI-Prolog would have to be opt-in for the sake of editors that already consume this lexer. I'm even less enthusiastic about that idea because the track record of "multi-lexers" is a lousy one. The implementation of JavaScript template strings remains blocked by the need to recognize
If Geany's users want to see their SWI-Prolog files in living colour, find them a SWI-Prolog lexer. |
I agree that mixed-language lexers/parsers tend to be problematic, and I can understand Neil's decision not to work on JavaScript when it became too complex (shudder; and I guess nobody else has stepped up either). But this is just a difference between compilers, not languages. There is prior art in having differences in the lexer to accommodate differences in tools; for example, LexASM.cpp allows comment characters to be varied to match the differing assemblers. In the meantime it can be left as is, or the [Edit: and Lexilla's Fortran lexer actually defines two lexers in LexFortran.cxx, one for old Fortran and one for F77, as another example of language variants being handled; the Haskell lexer allows for GHC compiler extensions that vary the usage of identifier characters in the language; the Matlab lexer has Matlab and Octave lexers in the same file; and that's only from a sample of the lexers, so handling common variations is accepted and can be packaged as multiple lexers in the one file (sharing most of their code) or selected by option.] |
I've just pushed a patch here disabling '@' as a literal string start character - it was pretty simple. There seem to be many lexers having configuration options like this so I don't expect there would be a problem upstream. If it works as expected here, I'll send a patch upstream. |
It would be ideal if |
Done. |
One more thing we might consider mapping when looking at the vim example is variables. Right now in filetypes.prolog we set
but we could use e.g.
(I'm not really sure what the right mapping is in this case.) |
Should be trivial, since a Prolog variable must begin with a capital letter or an underscore.¹ |
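The rule really is that small. As a minimal sketch (the function name here is hypothetical, not the lexer's), the whole classification comes down to inspecting the first character of the token:

```cpp
// A Prolog variable begins with an uppercase letter or an underscore;
// anything else (atoms, numbers, operators) is not a variable.
#include <cassert>
#include <cctype>
#include <string>

bool isPrologVariable(const std::string& token) {
    if (token.empty())
        return false;
    unsigned char first = static_cast<unsigned char>(token[0]);
    return first == '_' || std::isupper(first);
}
```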
@elextr Do you know how |
It's not something we need to implement - it's already implemented by the lexer. It's whether we should map it inside |
@rdipardo just to expand on @techee comment above, the way the highlighting works is:
This system allows colour schemes (aka themes) to specify just a single set of styles and have similar entities in all languages be styled the same, but it depends on filetype files mapping the syntactic entities in a sensible manner. There is no style name for a "variable" because almost no language allows identifiers to be classed as variables (as distinct from functions and types and other stuff) purely syntactically, so no colour schemes will define a style for it; that's why @techee suggested the "parameter" style name. It is possible to map to a style directly in the filetype file, but that then won't change when the colour scheme changes, so a dark colour that's fine on light schemes may not be visible on dark schemes. It's always better to map to existing style names, even if their semantics are slightly different. |
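A sketch of what that mapping looks like in practice (these key names are illustrative, not copied from the actual filetypes.prolog; the real keys come from the lexer's style list): the `[styling]` section of a Geany filetype file maps each lexer style to a named style that colour schemes define.

```ini
[styling]
# lexer style = named style from the colour scheme
default=default
comment=comment
keyword=keyword_1
string=string
# no scheme defines a "variable" style, so borrow a named style
# with close-enough semantics, as suggested above:
variable=parameter
```

Because the right-hand names are resolved by whichever colour scheme is active, the mapping keeps working when the user switches between light and dark themes.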
To be exact, underscores denote an unused variable (called a "singleton"). The lexer maps them to

    } else if (sc.Match('_')) {
        sc.SetState(SCE_VISUALPROLOG_ANONYMOUS);

Only words in proper title-case are styled as

    } else if (isUpperLetter(sc.ch)) {
        sc.SetState(SCE_VISUALPROLOG_VARIABLE);
|
That's possible in the filetype file, see the explanation above. |
Oh, of course, case closed :-)
Well, he is more prolific, but SWI-Prolog seems to mostly be Jan; then again, Lexilla shows as almost all Neil because of the patch-not-PR process the project used in the past, so it may be unfair to both Prologs. |
Agree, removed. |
Yes, that looks right. "Interpretive" styles like muted colours for singletons have some value for learners, but Prologists won't need visual aids.
Your PR will need unit tests, and, to separate the mutually incompatible lexer options, those will need to be configured with conditional properties. There are no existing Prolog lexer tests, so a language-specific It's now possible to easily assign lexer properties to a definite number of diverse file types, like this:
The test file doesn't have to be a coherent program. Like AllStyles.rb, it can simply iterate all the lexical classes. The same content can be saved twice, once each with the *.pro and *.pl extensions. The feature under test is that, for example, a sequence like

    # Visual Prolog properties
    match test01.pro
    lexer.visualprolog.verbatim.strings=1
    lexer.visualprolog.backquoted.strings=0

    # ISO/SWI-Prolog properties
    match test01.pl
    lexer.visualprolog.verbatim.strings=0
    lexer.visualprolog.backquoted.strings=1

You can simply copy the keyword groups from SciTE's default configuration, and assign them all to the *.pro extension. Leftover keyword groups can be filled with SWI-Prolog lexemes and assigned to the *.pl extension. |
@rdipardo Thanks! Do you have any good sources of sample files that could be used as unit tests? It probably involves both Visual Prolog samples and SWI-Prolog samples. |
Before anything, you'll need to apply this patch: Fix-EOL-splitting-in-LexVisualProlog.diff.txt

Lexilla's testing protocol has dramatically improved over the past year, and it now checks for consistency across EOL modes. A hard failure is raised if the CR and LF of a Windows EOL are in two different styles, e.g., In fairness to the lexer, the problem really comes from the flawed implementation of |
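The consistency rule being enforced can be illustrated with a few lines of code (this is an assumed restatement of the check, not Lexilla's actual test harness): given the style assigned to each character, no CR+LF pair may be split across two different styles.

```cpp
// Check that every Windows line ending (CR followed by LF) carries a
// single style, i.e. the lexer never styles the two halves apart.
#include <cassert>
#include <string>
#include <vector>

// styles[i] is the style id the lexer assigned to text[i].
bool crlfStyledAsUnit(const std::string& text, const std::vector<int>& styles) {
    for (size_t i = 0; i + 1 < text.size(); ++i)
        if (text[i] == '\r' && text[i + 1] == '\n' && styles[i] != styles[i + 1])
            return false;   // CR and LF in two different styles: hard failure
    return true;
}
```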
@rdipardo Just curious, don't you want to take over the Lexilla part of this PR and submit the necessary changes upstream? I'm not as familiar with either Scintilla or Prolog as you seem to be, and you will probably be a better person to explain the necessary changes to Neil. |
This is ultimately a feature request: #3086. The broken EOL styles are a side issue, yes, but they're blocking the addition of the SWI-Prolog features. In truth, any new features are premature until the lexer functions properly for its original purpose. I can take care of the EOL styles and the inaugural (Visual Prolog) tests that would require. But I'm not going to assume ownership of a feature request I didn't open. |
@rdipardo OK, I'll try to prepare something myself (once I have more time). |
No rush: I've already proposed a fix for the EOL splitting: ScintillaOrg/lexilla#83 |
The path is now clear for the SWI-PL feature request. All that's left is to: |
Geany normally does not carry local modifications on top of Scintilla/Lexilla releases; is the Lexilla patch in the latest release?
Committed only yesterday (Aussie time), but still too late for 5.1.7: ScintillaOrg/lexilla@3d02c15f |
@rdipardo Many thanks for the provided unit tests and all the other work here, it has been a great time-saver for me. I've just opened this PR upstream: |
The lexer changes have been merged upstream so I updated this PR with the upstream VisualProlog lexer. |
The upstream Prolog lexer may be undergoing a substantial refactoring soon. I can't seem to get Geany to build for me these days, but maybe now's a good time for @techee to synchronize his topic branch? |
Sorry to disappoint you, but any PR should use a version of a lexer that matches the Lexilla version in Geany; if it's a version that is several steps into the future, it might go backward if someone upgrades Lexilla one step. [Edit: basically we don't have a process for specifying the version of individual lexers] |
This patch adds basic Prolog support (only a Scintilla lexer, there's no ctags parser). I used swi-prolog for the compiler and run commands, which I believe is the most commonly used Prolog implementation. This patch contains keyword suggestions made by @rdipardo, thanks.
@rdipardo I just rebased this PR on top of master and also made the necessary updates related to the updated lexer. Since you are probably the most proficient Prolog user around, would you have a look at whether it looks alright? |
It's everything I would want, thanks! But where are the symbols? Or is tag parsing still on the TODO list? |
AFAICT from uctags nobody has written a parser for prolog. |
Not on my todo list :-). But if you want to create something, I can imagine a super-simple approach to implementing the parser which works just based on the common way Prolog files are indented: indented lines would be ignored by the parser, and for non-indented lines the string between the start of the line until https://github.com/geany/geany/blob/master/ctags/parsers/erlang.c

Before starting to work on something like that, it's better to ask at the universal-ctags project whether the maintainers would prefer the parser in the form of a regex-based parser.
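The indentation-based approach suggested above can be roughed out in a few lines (a hypothetical sketch, not an actual ctags parser; the function name and the exact truncation rule are assumptions): skip indented lines as clause bodies, skip comments and directives, and take the leading name of each non-indented line as the predicate tag.

```cpp
// Naive Prolog "tag" extraction: every non-indented, non-comment,
// non-directive line is assumed to start a clause, and the text up to
// the first '(' or whitespace is taken as the predicate name.
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

std::vector<std::string> prologTags(const std::string& source) {
    std::vector<std::string> tags;
    std::istringstream in(source);
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || std::isspace(static_cast<unsigned char>(line[0])))
            continue;                       // indented: body of a clause
        if (line[0] == '%' || line[0] == ':')
            continue;                       // comment, or directive like :- module(...)
        size_t end = line.find_first_of("( \t");
        std::string name = line.substr(0, end);
        // collapse consecutive clauses of the same predicate into one tag
        if (!name.empty() && (tags.empty() || tags.back() != name))
            tags.push_back(name);
    }
    return tags;
}
```

A real parser would also need to cope with quoted atoms, operators defined at the start of a line, and multi-line heads, which is exactly the kind of detail worth raising with the universal-ctags maintainers first.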
@b4n @eht16 OK to merge this PR? I'd like to avoid a situation like the one with the Raku parser (#3169, also updated on top of 2.0 now) where it rots in open PRs and then, before the release, it's too late to get it merged. I went through the various open PRs and open issues providing/requesting support for certain languages because some seem to be a bit neglected:
What do you think? Edit: Moved the discussion to #3651 |
Maybe move the above comment to a separate issue so it doesn't get lost when this is merged (maybe with ticklist of filetypes to add). This PR LGBI. |
Yep, done in #3651 |
Except for the "ticklist of filetypes to add", or tasklist, which GitHub can automatically generate for list items with a "tick box" beside them, e.g. - [ ] Needs a tick <!-- note the single space -->
- [X] Ticked! |
This patch adds basic Prolog support (only a Scintilla lexer, there's no
ctags parser). I used swi-prolog for the compiler and run commands which
I believe is the most commonly used prolog implementation. I used the
keywords from here:
https://github.com/mxw/vim-prolog/blob/master/syntax/prolog.vim
I only used Prolog at school many years ago but it's an interesting language and I believe Geany should support it (which is why #3086 resonated in my head).
Fixes #3086