Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[plsql] Ampersand '&' causes PMD processing error in sql file - Lexical error in file #195

Closed
andreasmarkussen opened this issue Jan 17, 2017 · 4 comments · Fixed by #3161
Labels
a:suggestion An idea, with little analysis on feasibility, to be considered an:enhancement An improvement on existing features / rules in:grammar About the grammar of a lexer or parser, eg, a parse/lex exception
Milestone

Comments

@andreasmarkussen
Copy link

andreasmarkussen commented Jan 17, 2017

When using PMD for [PLSQL] I am getting a Lexical error when parsing files with &.

Rule Set:

  • n/a - not a rule but in the lexical analysis.

Description:
It seems to be caused by the ampersand, since if I remove it, the script is processed without issues.
I do, however need the ampersand for variable inclusion.

Several places on oracle.com this is being used:
https://blogs.oracle.com/opal/entry/sqlplus_101_substitution_varia#9_4

Code Sample demonstrating the issue:

--both define and spool are SQL*Plus commands, and they should not be ended with a semi-colon.
define patch_name = acme_module
spool &patch_name..log

Running PMD through:
CLI

Using this command

    pmd -V -d spool_sub.sql -f html -R ruleset.xml

Gives this error

Counting for net.sourceforge.pmd.lang.plsql.ast.ExecutableCode
Counting for net.sourceforge.pmd.lang.plsql.ast.OracleObject
Loaded rule NcssObjectCount
Loaded rule NcssMethodCount
Loaded rule TooManyFields
Loaded rule ExcessivePackageSpecificationLength
Loaded rule NPathComplexity
Loaded rule TooManyMethods
Loaded rule CyclomaticComplexity
Loaded rule ExcessiveTypeLength
Loaded rule ExcessiveParameterList
Loaded rule ExcessiveObjectLength
Loaded rule ExcessiveMethodLength
Using PLSQL version: PLSQL
Counting for net.sourceforge.pmd.lang.plsql.ast.ExecutableCode
Counting for net.sourceforge.pmd.lang.plsql.ast.OracleObject
Loaded rule CyclomaticComplexity
Loaded rule NPathComplexity
Loaded rule ExcessiveParameterList
Loaded rule TooManyMethods
Loaded rule ExcessiveObjectLength
Loaded rule ExcessiveMethodLength
Loaded rule NcssMethodCount
Loaded rule ExcessivePackageSpecificationLength
Loaded rule NcssObjectCount
Loaded rule TooManyFields
Loaded rule ExcessiveTypeLength
Counting for net.sourceforge.pmd.lang.plsql.ast.ExecutableCode
Counting for net.sourceforge.pmd.lang.plsql.ast.OracleObject
Processing C:\db\spool_sub.sql
Error while processing file: C:\db\spool_sub.sql
about:blank
net.sourceforge.pmd.lang.ast.TokenMgrError: Lexical error in C:\db\spool_sub.sql at line 3, column 7.  Encountered: "&" (38), after : ""
	at net.sourceforge.pmd.lang.plsql.ast.PLSQLParserTokenManager.getNextToken(PLSQLParserTokenManager.java:4459)
	at net.sourceforge.pmd.lang.plsql.ast.PLSQLParser.jj_consume_token(PLSQLParser.java:39778)
	at net.sourceforge.pmd.lang.plsql.ast.PLSQLParser.SqlPlusCommand(PLSQLParser.java:384)
	at net.sourceforge.pmd.lang.plsql.ast.PLSQLParser.Input(PLSQLParser.java:198)
	at net.sourceforge.pmd.lang.plsql.PLSQLParser.parse(PLSQLParser.java:47)
	at net.sourceforge.pmd.SourceCodeProcessor.parse(SourceCodeProcessor.java:91)
	at net.sourceforge.pmd.SourceCodeProcessor.processSource(SourceCodeProcessor.java:138)
	at net.sourceforge.pmd.SourceCodeProcessor.processSourceCode(SourceCodeProcessor.java:76)
	at net.sourceforge.pmd.SourceCodeProcessor.processSourceCode(SourceCodeProcessor.java:43)
	at net.sourceforge.pmd.processor.PmdRunnable.call(PmdRunnable.java:78)
	at net.sourceforge.pmd.processor.PmdRunnable.call(PmdRunnable.java:25)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

<html><head><title>PMD</title></head><body>
<center><h3>PMD report</h3></center><center><h3>Problems found</h3></center><table align="center" cellspacing="0" cellpadding="3"><tr>
<th>#</th><th>File</th><th>Line</th><th>Problem</th></tr>
</table><hr/><center><h3>Processing errors</h3></center><table align="center" cellspacing="0" cellpadding="3"><tr>
<th>File</th><th>Problem</th></tr>
<tr bgcolor="lightgrey"> 
<td>C:\db\spool_sub.sql</td>
<td>Error while processing C:\db\spool_sub.sql</td>
</tr>
</table></body></html>
@jsotuyod jsotuyod changed the title Ampersand '&' causes PMD processing error in sql file - Lexical error in file [plsql] Ampersand '&' causes PMD processing error in sql file - Lexical error in file Jan 17, 2017
@jsotuyod jsotuyod added the a:bug PMD crashes or fails to analyse a file. label Jan 17, 2017
@jsotuyod jsotuyod added this to the 5.4.4 milestone Jan 17, 2017
@jsotuyod
Copy link
Member

As far as I can tell, the variable substitution is a feature purely of SQL*Plus, and not part of the PL/SQL language, and running such code directly on the database server would produce a similar error. As such, it's technically correct for the violation being reported.

Maybe we could allow such extensions by flagging some files as SQL*Plus... We would need to think this carefully though, SQL grammars vary wildly and this could open up way too many new issues / over complicate the grammar, plus we currently have no way of doing such flagging.... Maybe adding "sqlplus" as a "language version" for PLSQL?

@jsotuyod jsotuyod added an:enhancement An improvement on existing features / rules a:suggestion An idea, with little analysis on feasibility, to be considered and removed a:bug PMD crashes or fails to analyse a file. labels Jan 17, 2017
@adangel adangel removed this from the 5.4.4 milestone Jan 28, 2017
@hvbtup
Copy link
Contributor

hvbtup commented Feb 10, 2021

We often use lexical variables for defining the version number in a comment and in a constant. For example, our code looks like this:

DEFINE MODUL_VERSION='V 1.21.6'

CREATE OR REPLACE PACKAGE Package_Kunden_Subsys
/*
 * Modul        :  package_kunden_subsys.pck
 * Version      :  &MODUL_VERSION, $Revision: 200871 $
 *
 * Description:  Bla Blah
...
END PACKAGE Package_Kunden_Subsys;
/

CREATE OR REPLACE PACKAGE BODY Package_Kunden_Subsys AS

  -- Version des Moduls:                   Name                   Synonym            Version         Revision
  modul_version constant varchar2(100) := 'Package_Kunden_Subsys, Pck_Kunden_Subsys, &MODUL_VERSION, $Revision: 200871 $';
....

or for types in a slightly different way:

REM Modul        :  veroeffentlichungobj.typ
REM
REM Version      :  &MODUL_VERSION, $Revision: 146960 $
create or replace type VeroeffentlichungObj
...

The & inside the package comment does not cause any problems with PMD, but the comment inside the REM (SQL*Plus) comment causes the TokenMgrError.

Of course, it is not easy for the parser to handle lexical variable references like this. This is even more true if one keeps in mind that the interpretation of the characters & and \ depends on the SQL*Plus settings SET DEFINE and SET ESCAPE.

But is it perhaps possible to just ignore the & inside comments REM ... -- ... and /* ... */ ?

@hvbtup
Copy link
Contributor

hvbtup commented Feb 11, 2021

Regarding the dependency of "" and "&" from the environment (e.g. the SQ*Plus settings for DEFINE and ESCAPE), I came to the conclusion that probably the best way to avoid problems with these characters is to run a pre-processor to before presenting the source to PMD. The pre-processor is probably out of the scope of PMD - or is there a standard way to pre-process file within PMD before presenting it to the parser? - I don't know, I'm a PMD noob.
Anyway my idea of the pre-processor is like this:
The pre-processor (let's call it SPP for SQL pre-processor fort short) processes two files:

  1. A file called definitions.sql (the file name should be configurable).
  2. The current source code file

The SPP maintains the following state:

  • A string-string-dictionary d of the current lexical variable definitions
  • A boolean value defineActive (whether DEFINE is on or off)
  • A boolean value escapeActive (whether ESCAPE is on or off)
  • A character value escapeChar (the current ESCAPE character)

This state is initialized with SQLPlus factory defaults for DEFINE and ESCAPE.
The dictionary d is initialized empty, because the SQLPlus defaults like _DATE, _CONNECT and so on all depend on the environment (e.g. the DB connection).

The SPP processes file 1 (definitions.sql), then saves its state.
For all files given to PMD resp. the SPP, the state is re-initialized from this state before processing the file.

Processing a file, the SPP scans the file line by line.
If the line is a SET DEFINE or SET ESCAPE command, it updates its internal state (e.g. sets defineActive = true). It then outputs the line as is.
For all other lines:
The first step is to substitute lexical variables (if defineActive is true) and escape characters (if escapeActive is true) and then output the modified line. The next step is to check if the (modified) line is a DEFINE command. If yes, update the dictonary d.

When substituting lexical variables in the first step, this works the same as in SQL*Plus, including the special handling of a dot to terminate the variable name.
It may be useful to keep variables which are explicitly set empty in d instead of removing the variable name from d.

If the preprocessor finds a &NAME at a given time when defineActive is set and NAME is not a key in d, it emits a warning (because this means that the SQL script is depending on unknown conditions) and produces the same output as if the variable is empty. The developer should then add a DEFINE NAME=somevalue to the defines.sql script.

The only case (I think) where the behaviour of the SPP differs from SQLPlus is the command
COLUMN xxx NEW_VALUE yyy
This command (to be more precise, the following SQL query) reads the column xxx from the database to set the lexical variable yyy. Since the SPP has no DB connection, it can not do this. But anyway I don't think this is important for static code analysis. Furthermore an easy workround is to add a definition to definitions.sql

The SPP would solve the OP's problem with the start &filename.sql command and several of our problems (e.g. we use DEFINEs in some for our scripts to achieve the same effect as conditional compilation with $IF).

However, a & in a comment is still valid and this is independent of whether DEFINE is ON or OFF.
So I think that in SQL commands like SET, DEFINE, ECHO, START, REM etc (where we know that the command is always a single line), the & should be accepted as a valid character by the parser.

E.G. we have a & in our company name and thus the & is present in comments of the form

REM Copyright 2021 by Flourish&Blotts

after pre-processing (the original line could be Flourish&Blotts with DEFINE ON and ESCAPE ).

A side-effect of the SPP is that the columns in the pre-processed file may be different from the original file in those line that were actually modified by the SPP. However, this only matters if PMD finds errors in these lines, and I think that a developer should be clever enough to cope with this.

@adangel
Copy link
Member

adangel commented Feb 11, 2021

Thanks for looking into this!

But is it perhaps possible to just ignore the & inside comments REM ... -- ... and /* ... */ ?

Yes, that's definitely the first step. As far as I see, we should already ignore "&" inside "/* .../" and "/** .../" and "-- single line" style comments. We also have obviously some minimal support for some? SQLPlus commands, e.g. we support "REMARK", but not "REM". I think, that should be not too difficult, to add support for "REM" additionally (the doc is here: https://docs.oracle.com/en/database/oracle/oracle-database/19/sqpug/REMARK.html#GUID-F4BF8426-AFE4-49C9-B073-57CB91B440F8).
From what I see in our grammar, the original problem should not exist - since after "SPOOL" we should ignore all tokens until end of line.
But probably the problem is, that "&" is no token at all.... I'm wondering, whether this token has a meaning in SQL? We have it defined in some token, but the token is actually never used, so "&" is not allowed anywhere....
I didn't see, that "&" has actually any meaning in pl/sql - so I guess, we could declare this as a token. It would not be available for identifier names, column names, etc. But don't think, this character is used for that....

Your idea about the preprocessor is interesting. Currently this would be outside of PMD - that's not something we support at the moment. Maybe we can support something like this in the future (some kind of additional configurable processing stage before parsing).

You also found the side-effect, which is my concern, when adding a preprocessor - PMD would report violations on the preprocessed files and not on the original files. Yes, developers should be able to trace that back to the original file, but computers might not be, e.g. if this is used in a automated CI tool (code climate, etc.). This effect needs to have some thought. Not sure, what the solution would be here. The preprocessed files could be published as well for the PMD report, or if the preprocessor stage is integrated in PMD, we might be able to keep a mapping between original locations and preprocessed locations. This mapping could be actually created by the preprocessor outside of PMD as well - then a postprocessor for the PMD report is needed to change the reported locations back to the original.

@adangel adangel added this to the 6.33.0 milestone Mar 25, 2021
@adangel adangel added the in:grammar About the grammar of a lexer or parser, eg, a parse/lex exception label Mar 25, 2021
adangel added a commit to hvbtup/pmd that referenced this issue Mar 25, 2021
adangel added a commit to hvbtup/pmd that referenced this issue Mar 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a:suggestion An idea, with little analysis on feasibility, to be considered an:enhancement An improvement on existing features / rules in:grammar About the grammar of a lexer or parser, eg, a parse/lex exception
Projects
None yet
4 participants