illegal primary in regular expression ^(|[^0-9])$ #4

garyfeng opened this Issue Apr 6, 2015 · 7 comments



garyfeng commented Apr 6, 2015

Got the following error when I tried json.awk on OSX.

awk: illegal primary in regular expression ^(|[^0-9])$ at [^0-9])$
 source line number 158 source file json.awk
 context is
        } else if (TOKEN ~ >>>  /^(|[^0-9])$/ <<< ) {
step- commented Apr 7, 2015

Hello garyfeng, thank you for logging this issue.

Removing '|' isn't the way to go. That regex is supposed to match either the null string or a single non-digit character, with the null string listed as the first alternative. If you remove '|' it might work for your particular input data, but it might fail later on other data.
This issue is most likely due to the specific awk version and derivative that runs on your OSX box. Unfortunately I am unable to test on OSX. If you give me your version information and a test case that reproduces the problem, I can try to reproduce it on Linux or Windows and come up with a patch.

What I need from you:

  1. OSX version
  2. awk version and if you know it, its derivative family (GNU, mawk, busybox, etc.)
  3. test case reproducing the reported error

Thank you

garyfeng commented Apr 7, 2015

FYI, OSX 10.10.2, awk version 20070501, and the script fails regardless of the data.

I realized that your code requires GNU AWK, which Apple doesn't support by default. I will have to look for a different solution. Thanks!

@garyfeng garyfeng closed this Apr 7, 2015
step- commented Apr 8, 2015

I'm surprised; what made you conclude that JSON.awk needs GNU awk? I believe it does not. For instance, on the Kindle platform JSON.awk runs on busybox awk. Also, quoting

Written for awk, does not require gawk extensions

Feel free to reopen this issue in case you change your mind. Thank you.

@garyfeng garyfeng reopened this Apr 9, 2015
garyfeng commented Apr 9, 2015

I may have misread the part about gawk, but this looks like an Apple thing. The following example fails:

awk '/^(|.*)$/ {print }' 

but removing the '|' works

awk '/^(.*)$/ {print }' 

OSX AWK version

>> awk --version
awk version 20070501

garyfeng commented Apr 9, 2015

It sounds like the OSX awk expects '|' to appear in the form A|B, and rejects |A or B|.

$ awk '/^([A-Z]|)$/ {print }' 
awk: illegal primary in regular expression ^([A-Z]|)$ at $
 source line number 1
 context is
     >>> /^([A-Z]|)$/ <<< 

but this is fine:

$ awk '/^([A-Z]|[0-9])$/ {print }' 
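
The experiments above can be folded into a small probe that reports whether a given awk accepts an empty alternative. This is only a sketch: the function name `probe_empty_alt` is hypothetical, and it assumes that an awk which rejects the regex exits with a non-zero status.

```shell
# probe_empty_alt AWK: print "accepted" if AWK compiles and matches a regex
# with an empty alternative, "rejected" otherwise. (Hypothetical helper name.)
probe_empty_alt() {
    # BEGIN-only program, so no input is read. The regex is a literal, which
    # means a strict awk is assumed to fail at parse time and exit non-zero.
    if "${1:-awk}" 'BEGIN { exit !("" ~ /^(|x)$/) }' 2>/dev/null; then
        echo accepted
    else
        echo rejected
    fi
}

probe_empty_alt awk
```

Passing a different command name as the argument (for example `gawk`, if it is installed) probes that implementation instead.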

This may be relevant: Quote:

From: Mike Frysinger <vapier at gentoo dot org>
To: libc-alpha at sourceware dot org
Date: Mon, 24 Dec 2007 14:24:38 -0500
Subject: [patch] use POSIX awk expression rather than GNU extension

in elf/Makefile, GNU awk accepts this expression:
but POSIX says that '|' occurring right after '(' or right before ')' produces 
undefined results.  a replacement expression is:
attached patch makes this change.

step- commented Apr 10, 2015

Thanks, you're on top of it.

Indeed POSIX awk utilizes extended regular expressions (ERE) and the very last two lines of the POSIX ERE specification read (bold emphasis and numerals are mine):

[The ERE grammar does not permit (1) several constructs that previous sections specify as having undefined results:]
'|' appearing first or last in an ERE, or immediately following '|' or '(', or immediately preceding ')'
Implementations are permitted (2) to extend the language to allow these. Conforming (3) applications cannot use such constructs.

So busybox awk and gawk, and even gawk --posix (I tested it), extend the language under permission (2) by accepting the construct that the grammar does not permit (1). It is JSON.awk, by relying on that construct, that is non-conforming (3).

I will push a fix for this issue in a short while. Thank you for following up so thoroughly.
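
One way to stay inside the POSIX ERE grammar is to replace the empty alternative with an optional item. The sketch below is an illustration only and not necessarily the fix that was pushed; `classify` is a hypothetical helper. It relies on `^[^0-9]?$` matching the same strings as `^(|[^0-9])$`, namely the empty string or a single non-digit character.

```shell
# classify TOKEN: print "match" if TOKEN is empty or one non-digit character.
# (Hypothetical helper, for illustration only.)
classify() {
    printf '%s\n' "$1" | awk '{
        # /^[^0-9]?$/ accepts the same strings as /^(|[^0-9])$/ but avoids
        # the empty alternative whose behaviour POSIX leaves undefined.
        if ($0 ~ /^[^0-9]?$/) {
            print "match"
        } else {
            print "no match"
        }
    }'
}

classify ""     # empty token
classify "x"    # single non-digit
classify "7"    # single digit
classify "xy"   # two characters
```

The optional-item form keeps the test in a single regex; writing it as two anchored alternatives, each non-empty, would be another way to satisfy the grammar.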

@step- step- self-assigned this Apr 10, 2015
@step- step- closed this Apr 10, 2015