Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

difference between --with-included-regex and without #2

Closed
grylem opened this issue Jul 21, 2021 · 39 comments
Closed

difference between --with-included-regex and without #2

grylem opened this issue Jul 21, 2021 · 39 comments

Comments

@grylem
Copy link

grylem commented Jul 21, 2021

Hi
please check this:
./configure --disable-nls --enable-all-extensions
make
make check
All 137 tests were successful.

printf %s\r\n a b c
printf '%s\n' ',s/\r$//' w q | ed -s file
The ed removes '\r'

make clean
./configure --disable-nls --enable-all-extensions --with-included-regex
make
make check
All 137 tests were successful.

printf %s\r\n a b c
printf '%s\n' ',s/\r$//' w q | ed -s file
In this case, the ed does not delete '\r'

and this
ed.1.gz
Regards.

@slewsys
Copy link
Owner

slewsys commented Jul 22, 2021

From man regcomp on macOS 10.13:

The regex implementation in Mac OS X 10.8 and later is based on a heavily modified subset of TRE (http://laurikari.net/tre/).

This is new since I worked on ed for macOS. TRE is a very interesting library; one that I wanted to adopt at one time. But it had many issues compared to GNU regex library when I looked at it . But since you're having better luck than I am linking against macOS regex, I guess I shouldn't advise you against using macOS regex :)

To answer your question about the behavioral differences between macOS regex and GNU regex, macOS regex has a compile-time macro, REG_ENHANCED, which ed uses when available. To read more about it, see ENHANCED FEATURES section of man re_format. In the particular case of the expression \r, REG_ENHANCED adds recognition of C-style literals like \r. This is a non-POSIX extension unique to macOS regex. It is not supported by GNU regex, which is what you observed.

The macro REG_ENHANCED occurs in the source file src/re.c. It can be disabled by applying the attached diff.
disable-REG_ENHANCED-diff.txt

@slewsys
Copy link
Owner

slewsys commented Jul 22, 2021

I'm unable to get ed to work reliably when linked against macOS regex using either GNU C or Clang. I assume you're using Xcode? If so, I'm curious what version (e.g., the output of xcodebuild -version).

@grylem
Copy link
Author

grylem commented Jul 22, 2021

Tested on
macOS 10.13.6
Xcode 10.1
Build version 10B61
with installed CLI

printf %s\\r\\n a b c >file

printf '%s\n' ',s/\r$//' w q | ed -s file
works for your version of the ed compiled without --with-included-regex and for version from AT&T AST. For Apple version and your version with --with-included-regex does not work.

printf 'g/\r*$/s///\nwq\n' | ed file
works for all versions

compiled version with and without GNU regex
ed.tar.gz

Regards.

@slewsys
Copy link
Owner

slewsys commented Jul 22, 2021

My macOS and Xcode versions are the same as yours. So it appears that the problem is my system, not ed or macOS regex!

My previous message was sent prematurely, I don't know if you saw the latest edit: macOS regex has an ENHANCED variant that this version of ed enables which adds C-style literals. So what you're seeing is expected behavior, but it's a POSIX extension not supported by GNU regex. If you don't care for macOS ENHANCED regex, I attached in the message above a patch to disable it.

@slewsys
Copy link
Owner

slewsys commented Jul 23, 2021

and this
ed.1.gz

Thank you for the corrections.

@grylem
Copy link
Author

grylem commented Jul 26, 2021

So today for macOS you recommend using configure without --with-included-regex but in this case, we return to the old problem :)
printf %s\\n a b c
printf '%s\n' ',s/\(a\|b\)/e/g' ,p q | ed file
e
e
c
And this does not posix-compliant or am I mistaken?

Thank you for the corrections.

I am glad to help.

@slewsys
Copy link
Owner

slewsys commented Jul 28, 2021

No, you are absolutely correct. I think I was distracted by my (initial) inability to link against macOS regex. Anyhow, I agree that basic regular expressions should not be extended in ed. Those extensions are explicitly disabled in GNU regex for that reason. So I've added a patch to disable extensions to basic regular expressions in macOS regex as well. Thank you for your perseverance in bringing this to my attention :)

Less clear is how extended regular expressions should be handled, since support for extended regular expressions in ed is already an extension of the POSIX standard. What was intended for GNU is to enable all extensions unless the environment variable POSIXLY_CORRECT exists, in which case POSIX extended regular expressions are used.

Likewise, when ed is configured with extended regular expression support and linked against macOS regex library, if extended regular expressions are requested (via option -E or -r), then all extensions are enabled by default.

@grylem
Copy link
Author

grylem commented Jul 29, 2021

Thanks, now
printf '%s\n' ',s/(a|b)/e/g' ,p q | ed -E file
works fine instead
printf '%s\n' ',s/\(a\|b\)/e/g' ,p q | ed file

but doesn't work
printf '%s\n' ',s/\r$//' w q | ed -s file
only
printf 'g/\r*$/s///\nwq\n' | ed file, or printf '%s\n' ',s/\r$//' w q | ed -Es file
which option matches the posix specification?
and I also have some warnings like
cbc.c:167:23: warning: incompatible pointer types passing 'int [2]' to parameter of type 'const char *'
signal.c:106:46: warning: result of comparison of constant 2147483647 with expression of type 'unsigned short' is always
and several others.

@slewsys
Copy link
Owner

slewsys commented Jul 29, 2021

doesn't work
printf '%s\n' ',s/\r$//' w q | ed -s file

This matches the POSIX spec.

only
printf 'g/\r*$/s///\nwq\n' | ed file, or printf '%s\n' ',s/\r$//' w q | ed -Es file
which option matches the posix specification?

Even POSIX extended regular expressions do not support C-style literals like \r, so this does NOT match the POSIX spec. Nonetheless most people probably don't want POSIX extended regular expressions, so extensions to POSIX are permitted in this case.

and I also have some warnings like

Yeah, the code base needs some attention. I'll update Gnulib and gettext and address some known bugs as time allows.

@grylem
Copy link
Author

grylem commented Jul 29, 2021

Sorry for perseverance If this
printf '%s\n' ',s/\r$//' w q | ed -s file
posixly correct, why it does not work in the latest version? Previously it worked :) . I checked it with a AT&T version of ed
printf '%s\n' ',s/\r$//' w q | ed -Ss file where -S is "Enable strict regular expression interpretation" and its worked like with your old version.
from https://pubs.opengroup.org/onlinepubs/9699919799/utilities/ed.html#top

The characters listed in XBD Escape Sequences and Associated Actions ( '\', '\a', '\b', '\f', '\r', '\t', '\v' ) shall be written as the corresponding escape sequence; the '\n' in that table is not applicable.

@slewsys
Copy link
Owner

slewsys commented Jul 30, 2021

Sorry for perseverance If this
printf '%s\n' ',s/\r$//' w q | ed -s file
posixly correct, why it does not work in the latest version?

What I meant to say is that, according to POSIX, that should NOT work. So the current ed behavior is correct. The previous behavior was incorrect.

@slewsys
Copy link
Owner

slewsys commented Jul 30, 2021

I checked it with a AT&T version of ed

The AT&T version of ed recognizes \r? Is that linked against AT&T's regular expressions library?

@slewsys
Copy link
Owner

slewsys commented Jul 30, 2021

Here is the latest official Open Group specification for regular expressions. POSIX standards are collaboration of The Open Group and ISO. You can verify that C-style escape sequences like \r are not part of the standard.

@slewsys
Copy link
Owner

slewsys commented Jul 30, 2021

For reference, GNU sed recognizes \r, even when invoked with the --posix option, which is supposed to disable all GNU extensions. This represents a bug, but illustrates that POSIX standards are often not respected in the open source world.

@grylem
Copy link
Author

grylem commented Jul 30, 2021

So the current ed behavior is correct. The previous behavior was incorrect.

Thanks.

The AT&T version of ed recognizes \r? Is that linked against AT&T's regular expressions library?

In my case

@grylem
Copy link
Author

grylem commented Jul 30, 2021

For the latest version of your ed
./configure --disable-nls --enable-all-extensions
make
CDPATH="${ZSH_VERSION+.}:" && cd . && /bin/sh '/Volumes/test/ed/missing' aclocal-1.16 -I m4
missing file lib/lc-charset-dispatch.c
configure.ac:110: error: expected source file, required through AC_LIBSOURCES, not found
m4/gnulib-comp.m4:711: gl_INIT is expanded from...
configure.ac:110: the top level
autom4te: /usr/bin/m4 failed with exit status: 1
aclocal-1.16: error: autom4te failed with exit status: 1
make: *** [aclocal.m4] Error 1

@slewsys
Copy link
Owner

slewsys commented Jul 30, 2021

Hm... I'm also seeing some strangeness that I can't fully explain. I ended up hacking the script `missing' to always succeed. I may need to check in with the GNU autoconf folks to get to the bottom of it. In the mean time, try a clean clone of HEAD:

git clone  https://github.com/slewsys/ed.git
cd ./ed
./configure  --disable-nls --enable-all-extensions
make

make check is still not working for me with the current commit. Hopefully you'll have better luck!

@grylem
Copy link
Author

grylem commented Aug 1, 2021

Hopefully you'll have better luck!

No all the same

@slewsys
Copy link
Owner

slewsys commented Aug 11, 2021

Ed builds and tests okay again on macOS. The script `missing' is still hacked, which needs to be addressed. If you're trying to update a git repository, you may need to first reset to a previous commit:

git reset --hard  3972b2f79f17e8a20e9e2d835949135fd9d6cd34
git pull
git clean -fd

@grylem
Copy link
Author

grylem commented Aug 11, 2021

Ed builds and tests okay again on macOS.

yes, with the exception of old warnings.

@slewsys
Copy link
Owner

slewsys commented Aug 12, 2021

I think the incompatible pointer is the main one? I'll address those as time permits. That the missing script needs to be hacked is annoying too.

@grylem
Copy link
Author

grylem commented Aug 12, 2021

I think the incompatible pointer is the main one?

No, I mean clang warning like
cbc.c:167:23: warning: incompatible pointer types passing 'int [2]' to parameter of type 'const char *'
signal.c:104:38: warning: result of comparison of constant 2147483647 with expression of type 'unsigned short' is always true [-Wtautological-constant-out-of-range-compare]
exec.c:1066:44: warning: format string is empty [-Wformat-zero-length]
etc.

@grylem
Copy link
Author

grylem commented Nov 4, 2021

HI
please fix
make[2]: *** No rule to make target malloc/dynarray.h', needed by malloc/dynarray.gl.h'. Stop.
Regards.

@slewsys
Copy link
Owner

slewsys commented Nov 11, 2021

Hi,
I'm unable to reproduce that with the following command sequence on vanilla macOS 10.13.6/Xcode 10.1 (build 10B61):

git clone https://github.com/slewsys/ed.git
cd ./ed
./configure --enable-all-extensions
make

Can you provide the commands and environment used to produce the error? Thank you!

@grylem
Copy link
Author

grylem commented Nov 12, 2021

Hi,
I tried on macOS Monterey (Apple Silicon M1) 12.0.1 Xcode 13.1
Regards.

@grylem
Copy link
Author

grylem commented Dec 23, 2021

Hi
any news about fix?
Regards.

@slewsys
Copy link
Owner

slewsys commented Dec 23, 2021

I apologize, not yet. The config script doesn't work right for 32-bit ARM either. But in your case, it may be simply a matter of upgrading the GNUlib library.

I hope to have some free time, maybe between now and New Year's, to investigate.

Thank you for your interest :)

@slewsys
Copy link
Owner

slewsys commented Dec 30, 2021

Hi,
The HEAD of branch main now compiles and tests OK on 32-bit ARM. It looks like I'm not able to run a GitHub Action CI for Apple M1 without Apple hardware, so if you'd be willing to try this, I'd be interested to know if it works for you.

In a previous commit, I hacked the script 'missing' for the Darwin platform. The real issue is that git does not preserve timestamps. So to build ed from a clone of the git repository, it's necessary to either touch the right files or else have a full complement of autoconf, automake, libtool, m4, etc.

To work around this, I've published a source archive: https://github.com/slewsys/ed/releases/download/v2.0.11/ed-2.0.11.gd16bf96.tgz

@grylem
Copy link
Author

grylem commented Dec 30, 2021

Hi
sorry no luck.
./configure make Makefile:1452: warning: overriding commands for target dist'
Makefile:1227: warning: ignoring old commands for target dist' /Applications/Xcode.app/Contents/Developer/usr/bin/make all-recursive Makefile:1452: warning: overriding commands for target dist'
Makefile:1227: warning: ignoring old commands for target dist' Making all in lib make[2]: *** No rule to make target malloc/dynarray.h', needed by malloc/dynarray.gl.h'. Stop. make[1]: *** [all-recursive] Error 1 make: *** [all] Error 2

if i add required files to lib/malloc
make .... In file included from mbrtowc.c:65: ./mbrtowc-impl.h:93:15: error: implicit declaration of function 'mbtowc_with_lock' is invalid in C99 [-Werror,-Wimplicit-function-declaration] res = mbtowc_with_lock (&wc, p, m); ^ 1 error generated. make[4]: *** [mbrtowc.o] Error 1 make[3]: *** [all-recursive] Error 1 make[2]: *** [all] Error 2 make[1]: *** [all-recursive] Error 1 make: *** [all] Error 2

@slewsys
Copy link
Owner

slewsys commented Dec 30, 2021

In theory, GNUlib (that's what is in the lib directory) shouldn't be needed at all. That's only there in case you want to compile against GNU regex.

On the other hand, it's also possible that M1 toolchain is missing some functions. In that case, we would need to add more components from GNUlib (presently only regex and supporting functions are included).

I'd like to first try removing GNUlib altogether. If the M1 toolchain is indeed missing something, then we can import what's missing from GNUlib as needed. To that end, I've created a new branch without-gnulib. Here is the source archive: https://github.com/slewsys/ed/releases/download/untagged-ebeb10d54a7114b10018/ed-2.0.11-without-gnulib.gfca6591.tgz

@grylem
Copy link
Author

grylem commented Dec 30, 2021

Sorry, I can't download your file but clone without-gnulib
./configure
make

In file included from aux.c:8:
./ed.h:98:20: error: redeclaration of 'sys_errlist' with a different type: 'const char *[]' vs 'const char *const []'
extern const char *sys_errlist[];
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/stdio.h:361:30: note: previous declaration is here
extern __const char __const sys_errlist[];
^
In file included from aux.c:8:
./ed.h:99:12: error: redeclaration of 'sys_nerr' with a different type: 'int' vs 'const int'
extern int sys_nerr;
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/stdio.h:360:20: note: previous declaration is here
extern __const int sys_nerr; /
perror(3) external variables */
^
2 errors generated.

define HAVE_STRERROR in config.h or remove strings from ed.h

buf.c:62:7: error: incomplete type 'void' is not assignable
STRTOL_THROW (l, ls, NULL, ERR);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./ed.h:767:15: note: expanded from macro 'STRTOL_THROW'
*(sp) = _s;
~~~~~ ^
buf.c:73:7: error: incomplete type 'void' is not assignable
STRTOL_THROW (l, cs, NULL, ERR);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./ed.h:767:15: note: expanded from macro 'STRTOL_THROW'
*(sp) = _s;
~~~~~ ^

@grylem
Copy link
Author

grylem commented Jan 2, 2022

Maybe this information will be useful
configure on intel (High Sierra) and after some mnipulation can be compiled on arm64(Monterey) as arm64 binary but many checks failed.
If ed compiled on High Sierra and running on Monterey(with Rosetta 2) many checks also failed.
Regards.

@slewsys
Copy link
Owner

slewsys commented Jan 2, 2022

I think the Gnulib (malloc/dynarray) issues have been sorted out. HEAD of branch main compiles and tests pass on several systems: *BSD, OmniOS, Linux (64-bit and 32-bit ARM) and macOS High Sierra. Perhaps if you can attach config.log from Apple M1, that might help me understand what's going on.

@grylem
Copy link
Author

grylem commented Jan 2, 2022

with and without gnulib
config.tar.gz

@slewsys
Copy link
Owner

slewsys commented Jan 2, 2022

If looks like configure is using both gcc and clang? Does setting CC envar help? For example:

@slewsys
Copy link
Owner

slewsys commented Jan 2, 2022

CC=clang ./configure  --enable-all-extensions [--with-included-regex]

@grylem
Copy link
Author

grylem commented Jan 2, 2022

CC=clang ./configure --enable-all-extensions [--with-included-regex]

no effect for ed-without-gnulib, same result like ./configure --enable-all-extensions

@grylem
Copy link
Author

grylem commented Jan 2, 2022

configure ed-without-gnulib on intel transfer source to arm
make
ld: the target architecture doesn't support executable stacks
cd src
cc -o ed *.o -lc
file ed
ed : Mach-O 64-bit executable arm64
cd ..
make check
...
All 137 tests were succesful.

@grylem grylem closed this as completed Jan 4, 2022
@slewsys
Copy link
Owner

slewsys commented Jan 4, 2022

Just a heads up: I pushed some patches to the branch without-gnulib and will maintain it until your work-around isn't requierd, maybe with the help of the autoconf group.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants