[flang] Inhibit case of false tokenization of Hollerith #79029

Merged 1 commit on Jan 26, 2024

klausler

#78927 contains a case of fixed-form source in which ordinary text is mistakenly tokenized as a Hollerith literal, leading to grief later due to apparently unbalanced parentheses.

The source looks like "REAL*8 R8HEAP(SCRSIZE)". Blanks are not significant in fixed form, so the tokenizer sees "8R8HEAP(SCRSIZE)", and "8HEAP(SCRS" is misrecognized as a Hollerith literal because it follows "8R". In order to properly tokenize Hollerith literals in old comma-free FORMAT statements like "1 FORMAT(3I5HFLANG)", the tokenizer in the prescanner treats a letter after an integer token ("3I") as a special case. The fix is to apply this special case only when the characters involved are nested in parentheses and Hollerith is a possibility.

Fixes #78927.
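
To illustrate the guard, here is a minimal toy model in C++. It is a sketch, not the flang prescanner: `ToyTokenize` and its `guarded` parameter are hypothetical names, and the model ignores `preventHollerith_`, statement labels, and continuation lines. It only shows how consuming a letter after an integer changes the token stream when it happens outside parentheses.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Toy model of fixed-form tokenization (hypothetical; NOT Prescanner::NextToken).
// guarded == true mimics the fix: a letter after an integer is folded into the
// integer token only when nested inside parentheses.
std::vector<std::string> ToyTokenize(const std::string &stmt, bool guarded) {
  // Blanks are insignificant in fixed form, so drop them up front.
  std::string s;
  for (char c : stmt) {
    if (c != ' ') s.push_back(c);
  }
  auto isDigit = [](char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; };
  auto isAlpha = [](char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; };

  std::vector<std::string> tokens;
  int nesting = 0;
  std::size_t i = 0;
  while (i < s.size()) {
    std::string tok;
    if (isDigit(s[i])) {
      std::size_t n = 0;
      while (i < s.size() && isDigit(s[i])) {
        n = n * 10 + static_cast<std::size_t>(s[i] - '0');
        tok.push_back(s[i++]);
      }
      if (i < s.size() && s[i] == 'H') {
        // Hollerith: nH followed by exactly n characters, whatever they are.
        tok.push_back(s[i++]);
        for (std::size_t k = 0; k < n && i < s.size(); ++k) tok.push_back(s[i++]);
      } else if (i < s.size() && isAlpha(s[i]) && (!guarded || nesting > 0)) {
        // FORMAT(3I5HFLANG): fold the edit descriptor letter into "3I" so that
        // "5HFLANG" starts a fresh token instead of becoming part of a name.
        tok.push_back(s[i++]);
      }
    } else if (isAlpha(s[i])) {
      // Identifier or keyword: letters, digits, underscores.
      while (i < s.size() && (isAlpha(s[i]) || isDigit(s[i]) || s[i] == '_')) {
        tok.push_back(s[i++]);
      }
    } else {
      if (s[i] == '(') ++nesting;
      if (s[i] == ')') --nesting;
      tok.push_back(s[i++]);
    }
    tokens.push_back(tok);
  }
  return tokens;
}
```

In this model, `ToyTokenize("REAL*8 R8HEAP(SCRSIZE)", false)` yields `REAL`, `*`, `8R`, `8HEAP(SCRS`, `IZE`, `)`, so the opening parenthesis is swallowed by the bogus Hollerith token. With `guarded` set to `true` the same input yields `REAL`, `*`, `8`, `R8HEAP`, `(`, `SCRSIZE`, `)`. The statement `FORMAT(3I5HFLANG)` tokenizes as `FORMAT`, `(`, `3I`, `5HFLANG`, `)` in both modes, because the `3I` sits inside parentheses.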

llvmbot added the flang (Flang issues not falling into any other category) and flang:parser labels on Jan 22, 2024
llvmbot commented Jan 22, 2024

@llvm/pr-subscribers-flang-parser

Author: Peter Klausler (klausler)


Full diff: https://github.com/llvm/llvm-project/pull/79029.diff

1 File Affected:

  • (modified) flang/lib/Parser/prescan.cpp (+5-4)
diff --git a/flang/lib/Parser/prescan.cpp b/flang/lib/Parser/prescan.cpp
index 68d7d9f0c53c475..029652adbca1df5 100644
--- a/flang/lib/Parser/prescan.cpp
+++ b/flang/lib/Parser/prescan.cpp
@@ -605,13 +605,14 @@ bool Prescanner::NextToken(TokenSequence &tokens) {
       do {
         EmitCharAndAdvance(tokens, *at_);
       } while (IsHexadecimalDigit(*at_));
-    } else if (IsLetter(*at_)) {
-      // Handles FORMAT(3I9HHOLLERITH) by skipping over the first I so that
-      // we don't misrecognize I9HOLLERITH as an identifier in the next case.
-      EmitCharAndAdvance(tokens, *at_);
     } else if (at_[0] == '_' && (at_[1] == '\'' || at_[1] == '"')) { // 4_"..."
       EmitCharAndAdvance(tokens, *at_);
       QuotedCharacterLiteral(tokens, start);
+    } else if (IsLetter(*at_) && !preventHollerith_ &&
+        parenthesisNesting_ > 0) {
+      // Handles FORMAT(3I9HHOLLERITH) by skipping over the first I so that
+      // we don't misrecognize I9HOLLERITH as an identifier in the next case.
+      EmitCharAndAdvance(tokens, *at_);
     }
     preventHollerith_ = false;
   } else if (*at_ == '.') {

clementval left a comment

LGTM

foxtran commented Jan 22, 2024

Thanks! Works fine for 1.5M LoC :)

klausler merged commit 776e25a into llvm:main on Jan 26, 2024
6 checks passed
klausler deleted the bug78927 branch on January 26, 2024 00:09
Successfully merging this pull request may close these issues.

[flang] legacy extension of kind selector does not work properly