Fortran. Character variables ending in backslash cause highlighting issues. #1508

Open
ecasglez opened this issue Aug 14, 2020 · 7 comments

@ecasglez
Contributor

I used the demo on your website to produce the following examples. The problem also occurs in a local installation of version 2.6.1.

I have some Fortran variables containing paths. If a path ends in any character other than '\', everything works fine, as in the following figure.

[Screenshot: IssueSlashOk2]

However, if the paths end in '\', the highlighting is wrong, as in the following figure. Line 5, containing WRITE(*,*) pathname, is shown in red, and so is pathname = on line 6; neither should be. In addition, the backslashes on line 6 have frames drawn around them.

[Screenshot: IssueSlashBad]

If there is only one line with a variable ending in '\', as in the following figure, line 5 (with the WRITE statement) is now highlighted correctly, but the frames around the backslashes remain.

[Screenshot: IssueSlashOk1]

I am attaching the code used for these examples:

IssueBackslash.zip

@kurtmckee
Contributor

@ecasglez, isn't this invalid Fortran code? A quick Google search for "fortran escape character" turns up Oracle's documentation for Fortran 77, which states that a \' combination cannot terminate a single-quoted string like this. It may be that all of the backslash characters should be escaped, like this:

```fortran
PROGRAM testslash
    IMPLICIT NONE
    CHARACTER(LEN=:),ALLOCATABLE :: pathname
    pathname = 'C:\\users\\user\\testslash\\'
    WRITE(*,*) pathname
END PROGRAM testslash
```

I don't yet see that this is a bug in the Fortran lexer.

@ecasglez
Contributor Author

ecasglez commented Sep 7, 2020

I would say this is valid Fortran, or at least it compiles without any warnings with gfortran.

I think the default treatment of backslashes is compiler-dependent in Fortran. The gfortran documentation here describes two options, -fbackslash and -fno-backslash, that control how this character is treated. The gfortran default is -fno-backslash, which treats a backslash as an ordinary character; with -fbackslash, backslashes are treated as escape characters. A deeper explanation of this option can be found here.
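The two readings can be compared with a minimal sketch that uses only Python's `re` module. `ESCAPING` is the escape-aware single-quoted string rule from the Fortran lexer (quoted later in this thread), which treats `\'` as an escaped apostrophe, much like `-fbackslash`. `PLAIN` is a hypothetical rule for the `-fno-backslash` reading, in which a backslash is ordinary and an apostrophe is embedded by doubling it. The three-line source and the `literals` helper are made up for illustration.

```python
import re

# Escape-aware rule, as quoted later in this thread from the Fortran lexer.
ESCAPING = re.compile(r"(?s)'(\\\\|\\[0-7]+|\\.|[^'\\])*'")
# Hypothetical -fno-backslash rule: backslash is ordinary, '' embeds an
# apostrophe, and a literal does not run past the end of the line.
PLAIN = re.compile(r"'(?:[^'\n]|'')*'")

# Made-up sample in the spirit of the report.
source = r"""pathname = 'C:\users\test\'
WRITE(*,*) pathname
pathname = 'D:\data\'
"""


def literals(pattern: re.Pattern, text: str) -> list[str]:
    """Return every substring that `pattern` would treat as a string literal."""
    return [m.group(0) for m in pattern.finditer(text)]


print(literals(ESCAPING, source))
# one literal that swallows line 2 and "pathname = " on line 3

for lit in literals(PLAIN, source):
    print(lit)
# 'C:\users\test\'
# 'D:\data\'
```

Under the escape-aware rule, the first literal only ends at the apostrophe that opens the path on line 3, which matches the red WRITE line and `pathname =` in the report. The backslashes left over at the end of line 3 fall outside any literal, which is presumably where the framed (error-token) backslashes come from.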

I understand from the link you provided that the Oracle compiler may behave the opposite way: by default, backslashes are escape characters, and you need to compile with the -xl option to treat the backslash as an ordinary character.

Here is an issue on Doxygen that is also related to strings ending in a backslash.

@Anteru
Collaborator

Anteru commented Sep 12, 2020

I don't see how we can solve this reliably in Pygments. I'd err on the side of caution and keep the current behavior for now.

@mbraakhekke

I have the same problem. I'm using the Intel Fortran compiler, which distinguishes between standard strings and C strings based on an optional C-string specifier. C strings use the backslash as an escape character, while regular strings do not.
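Intel's distinction could in principle be handled with two rules: a C-string rule that is tried first and honours backslash escapes, and an ordinary rule where the backslash is a plain character. This minimal sketch assumes the C-string specifier is a `C` written immediately after the closing apostrophe (as in `'text'C`). The names `C_STRING`, `PLAIN_STRING`, and `classify_strings` are hypothetical, not Pygments code, and the scan ignores comments and double-quoted literals.

```python
import re

# Hypothetical C-string rule: backslash escapes apply, and the literal must be
# followed directly by the C specifier (not by more identifier characters).
C_STRING = re.compile(r"'(?:[^'\\\n]|\\.)*'[cC](?![A-Za-z0-9_])")
# Hypothetical ordinary rule: backslash is plain, '' embeds an apostrophe.
PLAIN_STRING = re.compile(r"'(?:[^'\n]|'')*'")


def classify_strings(line: str) -> list[tuple[str, str]]:
    """Label each apostrophe-delimited literal on one line (comments ignored)."""
    found: list[tuple[str, str]] = []
    pos = 0
    while (start := line.find("'", pos)) != -1:
        if (m := C_STRING.match(line, start)) is not None:
            found.append(("c-string", m.group(0)))
        elif (m := PLAIN_STRING.match(line, start)) is not None:
            found.append(("plain", m.group(0)))
        else:
            found.append(("unterminated", line[start:]))
            break
        pos = m.end()
    return found


print(classify_strings(r"msg = 'C:\temp\' // 'tab\there'C"))
```

The first literal cannot be a C string (its escaped `\'` leaves no closing apostrophe followed by `C`), so it falls back to the plain rule and ends at the trailing backslash, while `'tab\there'C` is recognised as a C string.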

I discovered an additional issue that seems to be related. pygments.highlight() seemed to take forever on one file. After investigating, I found that this was caused by the combination of a string literal containing a backslash and a comment line with many backslashes. See here for a minimal example.

The problem lies in FortranLexer(), since I can reproduce it with the following code:

```python
from pygments.lexers import FortranLexer

source = r"""
foobar = 'foo'//'\'//'bar'
!\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
"""

for token in FortranLexer().get_tokens(source):
    print(token)
```

The run time increases rapidly with each additional backslash added to the comment line.
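The growth can be reproduced without Pygments. This minimal sketch times Python's `re` on the single-quoted string rule quoted later in this thread. It assumes (a reading of the example, not something stated here) that the lexer takes `'\'//'` as one literal, so the apostrophe after `bar` opens a literal that never closes and the engine has to give up on it while scanning the comment line. The helper name `time_failed_match` is hypothetical; absolute timings depend on the machine, but they should grow by roughly a factor of two for every two extra backslashes.

```python
import re
import time

# Single-quoted string rule quoted later in this thread from the Fortran lexer.
SINGLE_QUOTED = re.compile(r"(?s)'(\\\\|\\[0-7]+|\\.|[^'\\])*'")


def time_failed_match(n_backslashes: int) -> float:
    """Time one match attempt on an apostrophe that is never closed."""
    # Assumed tail of the reproduction: an unclosed quote, then a comment
    # line made of backslashes.
    text = "'\n!" + "\\" * n_backslashes + "\n"
    start = time.perf_counter()
    result = SINGLE_QUOTED.match(text)
    elapsed = time.perf_counter() - start
    if result is not None:
        raise ValueError("an unterminated literal should never match")
    return elapsed


for n in (16, 20, 24, 28):
    print(f"{n:2d} backslashes: {time_failed_match(n):.4f} s")
```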

I admit this is an edge case, but I would definitely add a +1 for solving the issue raised by the OP.

@jeanas
Contributor

jeanas commented Jul 28, 2022

That sounds a lot like catastrophic backtracking in the string regexp. I’ll take a look when I’m back home in a few days.

Regarding escaping versus ordinary backslashes, I don’t see any way to highlight correctly other than adding a lexer option for code in which backslashes are escape characters.
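A lexer option of the kind suggested here might select between two single-quoted string rules, as in this minimal sketch. The option name `backslash_escapes` and the helper `make_single_quote_pattern` are hypothetical, not existing Pygments settings. The escape-aware branch also rewrites the alternation so that each backslash can be consumed in only one way, which is one possible way to avoid the backtracking discussed in this thread; the plain branch follows the gfortran `-fno-backslash` reading and keeps a literal on one line.

```python
import re


def make_single_quote_pattern(backslash_escapes: bool) -> re.Pattern:
    """Return a single-quoted string rule for the chosen backslash semantics.

    `backslash_escapes` is a hypothetical lexer option, not an existing
    Pygments setting.
    """
    if backslash_escapes:
        # -fbackslash-style literals. Every backslash can only be consumed by
        # the "\\." branch, so the alternatives never overlap and a failed
        # match is abandoned after a linear number of steps. This accepts the
        # same literals as (\\\\|\\[0-7]+|\\.|[^'\\])*, because "\\." plus
        # "[^'\\]" already covers escaped backslashes and octal escapes.
        return re.compile(r"(?s)'(?:[^'\\]|\\.)*'")
    # -fno-backslash (the gfortran default): a backslash is an ordinary
    # character and an apostrophe is embedded by doubling it ('').
    return re.compile(r"'(?:[^'\n]|'')*'")


line = r"pathname = 'C:\users\user\testslash\'"

plain = make_single_quote_pattern(backslash_escapes=False)
print(plain.search(line).group(0))  # 'C:\users\user\testslash\'

escaping = make_single_quote_pattern(backslash_escapes=True)
print(escaping.search(line))  # None: the final \' reads as an escaped quote

# An unclosed quote followed by many backslashes is rejected quickly.
print(escaping.match("'\n!" + "\\" * 60 + "\n"))  # None
```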

jeanas self-assigned this Jul 28, 2022
@kurtmckee
Contributor

kurtmckee commented Jul 28, 2022

@mbraakhekke, you've discovered a catastrophic backtracking bug.

```python
'strings': [
    (r'(?s)"(\\\\|\\[0-7]+|\\.|[^"\\])*"', String.Double),
    (r"(?s)'(\\\\|\\[0-7]+|\\.|[^'\\])*'", String.Single),
],
```
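Why the run time explodes can be seen by counting decompositions. In the body `(\\\\|\\[0-7]+|\\.|[^'\\])*`, the first and third branches both match a pair of backslashes, so a run of 2k backslashes can be split in 2^k ways, and when no closing apostrophe follows, a backtracking engine may try all of them before failing. The function below is a hypothetical helper that counts those ways with dynamic programming; it models the quoted pattern only and is not Pygments code.

```python
from functools import lru_cache


def count_body_parses(body: str) -> int:
    r"""Count the ways (\\\\|\\[0-7]+|\\.|[^'\\])* can consume all of `body`.

    If no closing apostrophe follows, a backtracking engine such as Python's
    `re` may try every one of these decompositions before giving up.
    """
    n = len(body)

    @lru_cache(maxsize=None)
    def ways(i: int) -> int:
        if i == n:
            return 1
        ch = body[i]
        if ch == "'":
            return 0  # no branch accepts a bare apostrophe
        if ch != "\\":
            return ways(i + 1)  # [^'\\]
        total = 0
        if body.startswith("\\\\", i):
            total += ways(i + 2)  # \\\\ : exactly two backslashes
        j = i + 1
        while j < n and body[j] in "01234567":
            j += 1
            total += ways(j)  # \\[0-7]+ : every possible run of octal digits
        if i + 1 < n:
            total += ways(i + 2)  # \\. : a backslash plus any character
        return total

    return ways(0)


for k in (2, 4, 8, 16):
    print(k, count_body_parses("\\" * k))
# 2 2
# 4 4
# 8 16
# 16 256
```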

Would you open a new ticket that includes the sample code to reproduce the problem, and reference this issue as well? I'm interested in addressing this but likely can't jump into it immediately due to existing obligations.

Edit: Or @jean-abou-samra may address this before me! 🥳

@mbraakhekke

Sure, I'll open a new ticket. But note that there's a clear link with the backslash problem: if I use a double backslash instead of a single one in my example, the catastrophic backtracking problem is gone.
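The effect of doubling the backslash can be checked with a minimal sketch using Python's `re` module and the single-quoted rule quoted earlier in this thread, with the comment line shortened to eight backslashes so that the failing case stays fast. The helper name `literals` is illustrative.

```python
import re

# Single-quoted string rule quoted earlier in this thread (escape-aware).
SINGLE_QUOTED = re.compile(r"(?s)'(\\\\|\\[0-7]+|\\.|[^'\\])*'")

# The reproduction, with the comment line shortened to 8 backslashes.
single = r"""foobar = 'foo'//'\'//'bar'
!\\\\\\\\
"""
double = r"""foobar = 'foo'//'\\'//'bar'
!\\\\\\\\
"""


def literals(source: str) -> list[str]:
    """Return the substrings the rule accepts as string literals."""
    return [m.group(0) for m in SINGLE_QUOTED.finditer(source)]


print(" | ".join(literals(single)))  # 'foo' | '\'//'
print(" | ".join(literals(double)))  # 'foo' | '\\' | 'bar'
```

With a single backslash, `\'` is read as an escaped apostrophe, so `'\'//'` becomes one literal and the apostrophe after `bar` is left unclosed; that unclosed literal is what forces the engine to backtrack across the comment line. With `'\\'`, every apostrophe is paired and the rule never has to fail over the backslash run.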


5 participants