This repository has been archived by the owner. It is now read-only.

Fix declaration of cdsmem in rpp #60

Open
wants to merge 2 commits into base: master

Conversation

@olebole
Contributor

olebole commented May 15, 2017

This fixes a problem where rpp.e does not work correctly when built with optimization: since cdsmem is declared everywhere in ratlib as `integer mem(1)`, the C compiler (after f2c) assumes that the variable is not an array but a pointer to a single variable, so a sequence of writes can be collapsed into just the last one.

It should really be `integer mem(60000)` (resp. MEMSIZE in the Ratfor source).

This closes #59 and closes #73. The second patch also closes #53.

Fix declaration of cdsmem in rpp
This fixes a problem where rpp.e does not work correctly when built
with optimization: since cdsmem is declared everywhere as `integer
mem(1)`, the C compiler (after f2c) assumes that the variable is not
an array but a pointer to a single variable, so a sequence of writes
can be collapsed into just the last one.

It should really be `integer mem(60000)`.
@iraf

Owner

iraf commented May 15, 2017

@olebole

Contributor

olebole commented May 15, 2017

GCC 6.3.0, Debian Stretch, default optimization level (no change). Patches to the source as usual. I didn't change anything that is not in the pull requests.
No idea about Mem, since the fix was about building, not running IRAF (and Mem is in the IRAF libs, right?).
The built executables fail in many commands with an "Out of Memory" error; this may be connected to the Mem common. I'll open another issue once I have verified the failure; you may have a look then.

@olebole

Contributor

olebole commented May 15, 2017

If you are interested: in dsget.f there are two subsequent assignments to the cdsmem array, and the first one was not executed with an optimization level above zero (-O1 and up). After changing the cdsmem size in this file, it got executed.

@olebole

Contributor

olebole commented May 22, 2017

@iraf The mem common block is similarly affected, yes. See #73.

However, the origin of the mem common block declarations is difficult to find, since they are included literally by rpp.e, and the code that does this is preprocessed Ratfor code (unix/boot/spp/rpp/rpprat/poicod.r; the preprocessed Fortran file is unix/boot/spp/rpp/rppfor/poicod.f). Changing this is a mess; however, the second patch in this commit does it (e5f8b2c).

This patch is still buggy, since it does not address the main problem of the Mem common: this common is a fake; it is defined at address NULL (in zvsjmp.s). Instead, it addresses real pointers, which are just "translated" into (fake) indices. E.g. accessing memd[0] will create a segfault. The translated code is not ISO C and may produce unexpected results again in the future.

Instead, this common block should be transformed into normal pointer arithmetic. This should not be a problem, since it is translated to C via f2c anyway, and newer Fortran versions have pointers on board.

But as a first step, we create the arrays with 1024 entries. This closes #73; see the Travis results for this patch.

@iraf

Owner

iraf commented May 22, 2017

@olebole

Contributor

olebole commented May 22, 2017

Which means you agree with my arguments and will merge this PR? Or not?
As the Travis results show, the solution works fine for the 64-bit Linux version.
And I don't see a difference (for the port) whether you do the arithmetic with pointers or with an array. For standards compatibility (and therefore for the assurance that the code works with a standard-conforming compiler), the difference is, however, huge.
But IRAF and coding standards is an oxymoron, right?

@iraf

Owner

iraf commented May 22, 2017

@olebole

Contributor

olebole commented May 22, 2017

> No, it means there may not be support for GCC 6 unless you can find the specific optimization causing the problem and a corresponding flag to disable it, or some other systemic workaround.

Sorry, I don't see a problem here: the patch creates a working IRAF version using the traditional memory allocation scheme. The specific gcc optimization problem (not gcc-6, but gcc-4.8, which is already some years old) that created problems like #73 or #59 is solved by this patch, without touching the layout of the common block. If you don't believe this, please give me an example that works with 2.16.1 and (in your opinion) would not work with this patch. I will then include it in the test procedure.

Currently, the code compiles (and runs the tests!) on Ubuntu 14.04, Ubuntu 12.04 (both 64 bit), MacOSX 32 bit, Debian Squeeze, Debian Wheezy, Debian Jessie, and Debian Stretch (both 32 and 64 bit). So, please specify what you think doesn't work, and I will test (and, if needed, fix) on these architectures. (Testing on Ubuntu 32 bit is still not possible due to #72, and testing on MacOS 64 bit due to #50.)

> Binary distributions can be built with older compilers that will satisfy nearly every other user.

Optimization has improved over the years, so using an old compiler would reduce the achievable performance. If you look into the IRAF Roadmap voting from 2014, you find a significant number of users asking for "Performance Improvements". Modern compilers improve here. Why do you want to ignore that?

@olebole

Contributor

olebole commented May 29, 2017

@iraf

> Binary distributions can be built with older compilers that will satisfy nearly every other user.

One more comment here: the possibility to build your own (SPP) tasks is one of the features of IRAF, and they will be built with the compiler that is available on the user's system. Currently this is buggy and will produce errors that are very difficult to detect (assignments are lost, or array accesses are optimized out). Imagine that someone just changed a trivial parameter in their task and then recompiled -- the resulting binary may produce scientifically wrong results. Do you really want that?

The concept of IRAF (compilation as part of the user interface) requires that IRAF support the compiler on the user's system, not just one special outdated version on a single compilation machine worldwide.

@olebole

Contributor

olebole commented Jun 7, 2017

Another reason not to just stay with one precompiled binary is #85.

@olebole

Contributor

olebole commented Jun 28, 2017

@iraf

> Anything you do to do "real pointer arithmetic" will likely break everything in the 64-bit port and require extensive code changes because it all depends on the relative sizes of pointers.

Just to clarify: this PR does not change the pointer arithmetic. It just changes the declaration of the mem common from `integer mem(1)` to `integer mem(1024)`, to avoid the compiler assuming a single value here.

@olebole

Contributor

olebole commented Jul 3, 2017

@olebole wrote:

> Currently, the code compiles (and runs the tests!) on Ubuntu 14.04, Ubuntu 12.04 (both 64 bit), MacOSX 32 bit, Debian Squeeze, Debian Wheezy, Debian Jessie, and Debian Stretch (both 32 and 64 bit). So, please specify what you think doesn't work, and I will test (and, if needed, fix) on these architectures. (Testing on Ubuntu 32 bit is still not possible due to #72, and testing on MacOS 64 bit due to #50.)

To give an update here: in the meantime, the other problems are fixed, and so the code compiles and runs the tests on all Debian/Ubuntu/MacOS (and a few other) platforms, both 32 and 64 bit, both current and older versions. So I don't see your (@iraf) argument

> will likely break everything in the 64-bit port

and would ask you to provide a specific example of what you mean here.

Fix declaration of the Mem common
This again fixes a problem where, with optimization, if an array
declared with length 1 (or 2) is repeatedly accessed, some of the
accesses may be removed.

Generally, this code is translated into C (by f2c), and in C an
access outside of the declared array bounds leads to undefined
behaviour.

I still think that this patch is buggy, since it does not address the
main problem of the Mem common: this common is a fake; it is defined
at address NULL (in zvsjmp.s). Instead, it addresses __real__
pointers, which are just "translated" into (fake) indices.
E.g. accessing memd[0] will create a segfault.

Instead, this common block should be transformed into normal pointer
arithmetic. This should not be a problem, since it is translated to C
via f2c anyway, and newer Fortran versions have pointers on board.

But as a first step, we create the arrays with 1024 entries.