Preprocessor fails on escaped double quote within quotes #13

Closed
tored opened this issue Apr 28, 2020 · 3 comments

Comments

@tored
Contributor

tored commented Apr 28, 2020

From the IUP header file iupkey.h, line 15:

https://sourceforge.net/p/iup/iup/HEAD/tree/trunk/iup/include/iupkey.h

$parser = new \PHPCParser\PreProcessor\Parser();
$parser->parse('iupkey.h', <<<HEADER
#define K_quotedbl    '\"'  /* 34 */
HEADER);
Fatal error: Uncaught RuntimeException: Unterminated " in PreProcessor\Parser.php:112

Removing the block comment produces a different error:

$parser = new \PHPCParser\PreProcessor\Parser();
$parser->parse('iupkey.h', <<<HEADER
#define K_quotedbl    '\"'
HEADER);
Fatal error: Uncaught LogicException: Unknown character literal escape sequence: '\\"' in PreProcessor\Tokenizer.php:101
@tored tored changed the title Preprocessor parser fails on escaped double quote within quotes Preprocessor fails on escaped double quote within quotes Apr 28, 2020
@ircmaxell
Owner

Is that even legal syntax?

Anyway, you can add a branch to this switch to check for the double quote: https://github.com/ircmaxell/php-c-parser/blob/master/lib/PreProcessor/Tokenizer.php#L93-L102

See if that works (for the second error, at least).

As for the first error, it comes from the comment-stripping pass, which has to track quoted strings while it looks for comments: https://github.com/ircmaxell/php-c-parser/blob/master/lib/PreProcessor/Parser.php#L100-L130

Basically, an if block would need to be added there to intercept ' characters and skip character literals as well.

@tored
Contributor Author

tored commented Apr 29, 2020

I'm far from an expert on C and the preprocessor, but if I'm reading the C standard correctly, it should be legal to escape a double quote like that in the preprocessor:

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf

The #define directive is specified as:

# define identifier replacement-list new-line

and replacement-list expands through pp-tokens, preprocessing-token, character-constant, c-char-sequence, c-char and escape-sequence, finally reaching simple-escape-sequence, which is one of:

\' \" \? \\ \a \b \f \n \r \t \v

Caveat, I normally don't read the C standard.

Given this test program:

#include <stdio.h>
#define K_quotedbl    '\"'  /* 34 */

int main()
{
    printf("%c", K_quotedbl);
    return 0;
}

gcc -E -std=c89 -pedantic -Wall -Wextra macro.c

expands the macro to:

printf("%c", '\"');

That program prints a double quote. If the escape is removed from the macro, the program expands to the following but prints the same result:

printf("%c", '"');

I'll see if I have some time next week to make a pull request that at least fixes the double quote.

@ircmaxell
Owner

Well, in that case, perhaps the first block should just eat any escape sequence. So the switch would also need cases for a, b, f, n, r, t and v. Something like:

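switch ($buffer[1]) {
    case 'a':
        $value = chr(0x07); // alert (bell)
        break;
    case 'b':
        $value = chr(0x08); // backspace
        break;
    case 'f':
        $value = chr(0x0C); // form feed
        break;
    case 'n':
        $value = chr(0x0A); // new line
        break;
    case 'r':
        $value = chr(0x0D); // carriage return
        break;
    case 't':
        $value = chr(0x09); // horizontal tab
        break;
    case 'v':
        $value = chr(0x0B); // vertical tab
        break;
    case 'x':
        // hexadecimal escape, e.g. \x41: digits start after the 'x'
        $value = chr(intval(substr($buffer, 2), 16));
        break;
    case '0': case '1': case '2': case '3':
    case '4': case '5': case '6': case '7':
        // octal escape, e.g. \0 or \101: digits start right after the backslash
        $value = chr(intval(substr($buffer, 1), 8));
        break;
    default:
        // \' \" \? \\ and anything else: default to the literal character
        $value = $buffer[1];
}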

Thanks for digging into this!

@bwoebi bwoebi closed this as completed in 910eebc Mar 22, 2022

2 participants