Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pyspline doesn't work with ifx #73

Closed
A-CGray opened this issue May 23, 2024 · 7 comments · Fixed by #75
Closed

pyspline doesn't work with ifx #73

A-CGray opened this issue May 23, 2024 · 7 comments · Fixed by #75
Labels
bug Something isn't working

Comments

@A-CGray
Copy link
Member

A-CGray commented May 23, 2024

Description

pyspline tests are failing on the latest intel docker images that use the ifx compiler, see test outputs here

Copied here for archiving

pyspline
------------------------------------------------------------------------------------------------------------------------
/home/***/repos/pyspline/tests/reg_tests/test_curves.py:Test.test  ... FAIL (00:00:0.05, 177 MB)
Traceback (most recent call last):
  File "/home/***/repos/pyspline/tests/reg_tests/test_curves.py", line 99, in test
    self.regression_test(handler)
  File "/home/***/repos/pyspline/tests/reg_tests/test_curves.py", line 156, in regression_test
    run_project_test(curve, handler, test_name)
  File "/home/***/repos/pyspline/tests/reg_tests/test_curves.py", line 77, in run_project_test
    handler.root_add_val("{} projection test for point {} solution".format(test_name, i), s[i], tol=1e-9)
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 182, in root_add_val
    self._add_values(name, values, **kwargs)
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 321, in _add_values
    self.assert_allclose(values, db[name], name, rtol, atol, full_name)
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 270, in assert_allclose
    np.testing.assert_allclose(actual, reference, rtol=rtol, atol=atol, err_msg=msg)
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/numpy/testing/_private/utils.py", line 1504, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/numpy/testing/_private/utils.py", line 797, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=1e-09, atol=1e-09
Failed value for: LMS test k=4 projection test for point 0 solution
Mismatched elements: 1 / 1 (100%)
Max absolute difference: 0.77068815
Max relative difference: 1.
 x: array(0.)
 y: array(0.770688)

/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/coverage/inorout.py:519: CoverageWarning: Module pyspline was previously imported, but not measured (module-not-measured)
  self.warn(msg, slug="module-not-measured")
/home/***/repos/pyspline/tests/reg_tests/test_surfaces.py:Test.test  ... FAIL (00:00:0.26, 181 MB)
Traceback (most recent call last):
  File "/home/***/repos/pyspline/tests/reg_tests/test_surfaces.py", line 165, in test
    self.regression_test(handler)
  File "/home/***/repos/pyspline/tests/reg_tests/test_surfaces.py", line 185, in regression_test
    run_project_test(surface, handler, test_name)
  File "/home/***/repos/pyspline/tests/reg_tests/test_surfaces.py", line 95, in run_project_test
    handler.root_add_val("{} point {} projection u".format(test_name, pt), u, tol=eps)
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 182, in root_add_val
    self._add_values(name, values, **kwargs)
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 321, in _add_values
    self.assert_allclose(values, db[name], name, rtol, atol, full_name)
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 270, in assert_allclose
    np.testing.assert_allclose(actual, reference, rtol=rtol, atol=atol, err_msg=msg)
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/numpy/testing/_private/utils.py", line 1504, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/numpy/testing/_private/utils.py", line 797, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=1e-08, atol=1e-08
Failed value for: surface with ku=2, kv=4, nCtlu=5, nCtlv=5 point [0, 0, 0] projection u
Mismatched elements: 1 / 1 (100%)
Max absolute difference: 3.71377595e-06
Max relative difference: 0.02136224
 x: array(0.00017)
 y: array(0.000174)

/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/coverage/inorout.py:519: CoverageWarning: Module pyspline was previously imported, but not measured (module-not-measured)
  self.warn(msg, slug="module-not-measured")
/home/***/repos/pyspline/tests/reg_tests/test_volumes.py:Test.test  ... FAIL (00:00:0.50, 179 MB)
Traceback (most recent call last):
  File "/home/***/repos/pyspline/tests/reg_tests/test_volumes.py", line 66, in test
    self.regression_test(handler)
  File "/home/***/repos/pyspline/tests/reg_tests/test_volumes.py", line 2220, in regression_test
    run_project_test(volume, handler, test_name, pts=test_pts_outside, volBounds=volBounds)
  File "/home/***/repos/pyspline/tests/reg_tests/test_volumes.py", line 41, in run_project_test
    handler.root_add_val("{} project point u for pt={}".format(test_name, pt), u, tol=eps)
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 182, in root_add_val
    self._add_values(name, values, **kwargs)
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 321, in _add_values
    self.assert_allclose(values, db[name], name, rtol, atol, full_name)
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/baseclasses/testing/pyRegTest.py", line 270, in assert_allclose
    np.testing.assert_allclose(actual, reference, rtol=rtol, atol=atol, err_msg=msg)
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/numpy/testing/_private/utils.py", line 1504, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/numpy/testing/_private/utils.py", line 718, in assert_array_compare
    flagged = func_assert_same_pos(x, y, func=isnan, hasval='nan')
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/numpy/testing/_private/utils.py", line 688, in func_assert_same_pos
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=1e-08, atol=1e-08
Failed value for: volume bounds with ku=2, kv=3, kw=4 project point u for pt=[ 0.16  -0.013  0.39 ]
x and y nan location mismatch:
 x: array(nan)
 y: array(0.5)

/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/coverage/inorout.py:519: CoverageWarning: Module pyspline was previously imported, but not measured (module-not-measured)
  self.warn(msg, slug="module-not-measured")


The following tests failed:
test_curves.py:Test.test
test_surfaces.py:Test.test
test_volumes.py:Test.test


Passed:  0
Failed:  3
Skipped: 0


Ran 3 tests using 8 processes
Wall clock time:   00:00:2.60

Name                                                                                         Stmts   Miss  Cover
----------------------------------------------------------------------------------------------------------------
/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/pyspline/pySurface.py     585    247    58%
/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/pyspline/pyCurve.py       401    [219](https://github.com/mdolab/docker/actions/runs/9069190501/job/25147882941#step:13:220)    45%
/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/pyspline/pyVolume.py      486    308    37%
/home/***/.pyenv/versions/3.11.9/lib/python3.11/site-packages/pyspline/utils.py         153     99    35%
----------------------------------------------------------------------------------------------------------------

Steps to reproduce issue

Current behavior

Expected behavior

Code versions

  • Operating System:
  • Python:
  • OpenMPI:
  • CGNS:
  • PETSc:
  • Compiler:
  • This repository:
@A-CGray A-CGray added the bug Something isn't working label May 23, 2024
@lamkina
Copy link

lamkina commented Aug 5, 2024

@eirikurj @A-CGray Here are the details I've uncovered:

  1. src/projections.F90 uses an uppercase 'F' extension and requires compiler pre-processing. This affects the line search because the line search parameters are defined as global constants using a #define statement:

    ! Define some parameters used for all projection functions
    #define LSFailMax 2
    #define wolfe .001
    #define nLine 20
  2. Our current compiler setup with ifx doesn't support the necessary pre-processing. I found this thread that somewhat explains the basics of the issue. This thread may also be relevant w.r.t compiler flags for pre-processing.

Either A) We figure out the correct compiler setup for the pre-processing or B) We slightly modify the projections so they don't require pre-processing. Let me know what you think.

@A-CGray
Copy link
Member Author

A-CGray commented Aug 5, 2024

Nice, thanks for finding this @lamkina . If this is as simple as changing what compiler flags are used to enable pre-processing, and the same flags will work for both ifort and ifx (and gcc?) then I'm fine with just doing that. Otherwise, maybe just switch to defining these as global constants so that we don't need to have another thing depend on the type of compiler we're using (I'm only familiar with doing this in C++ where I'd write something like const int LS_FAIL_MAX = 2;, but I assume you can do something similar in Fortran).

Also, I see adtProjections also has an F90 extension, but I didn't see any preprocessing commands in there, any idea what's going on there?

@lamkina
Copy link

lamkina commented Aug 5, 2024

I think the cleaner solution is to define these as constants and have all the extensions be lowercase f90.

I don't think the adtProjections need to be F90 so we can test that as part of this fix.

I'll wait for @eirikurj to comment then we can decide on the right way to fix this.

@eirikurj
Copy link
Contributor

I took a look, and I am not convinced that this is a preprocessing issue. To explicitly enable the preprocessor we can include the flag -fpp. This however has no effect and tests still fail. Even removing the define statements by manually search-replace the variables with values should alleviate this issue. However, it does not seem to have any effect as the tests fail.

Disabling compile optimization, i.e., setting -O0, will make the tests pass on an unmodified code, i.e., with the define statements (no -fpp flag used here). Any other optimization setting will make the tests fail. This suggests that optimization is changing the behavior, resulting in a different branch/path being taken (based on values) on the optimized code. Adding the flag -fp-model=precise, disables some optimizations making the floating point comparison more strict and value-safe, and with this option all tests pass regardless of optimization level. I briefly tested this with ifort and the behavior seems to be more inline with what gfortran behavior. Adding the same flag to ifort makes the behavior consistent with ifx using the same flag.

In code, the divergence between ifx and gfortran happens here

c = ((nfval - fval) - pgrad * step)

Running one of the test/points locally, the state of the values are shown below,

Variable ifx gfortran
fval 0.973086514811139 0.97308651481113850
nfval 0.973086514811139 0.97308651481113850
pgrad -4.045624659884222E-021 -4.0456103257696632E-021
step 1.00000000000000 1.00000000000000
c 0.000000000000000E+000 4.0456103257696632E-021

The inputs nfval,fval,step are the equal. pgrad is not equal, but it effectively zero (below machine zero).
For ifx since c is zero, the subsequent step is Infinty, effectively breaking the remainder of the iterations, resulting in the test failing. For gfortran the value for c is some very small number. Setting -fp-model=precise for ifx will result in the same value final value as gfortran. However, even if the final result is the same, the path is still not the same. ifx will continue the iteration in the lineLoop until hitting the maximum number of iterations, but not improve on the point. gfortran and ifort both exit the lineLoop earlier, but it seems to be due to inexact numerics.

Aside from compiler tweaks, I think the issue (at least in part) is due to sufficient decrease of the Wolfe condition (see here

if (nfval < fval + pgrad * wolfe * step) then
), since technically it should include equals as well, i.e., be a <= comparison, not <. Adding this correction to the if statement thus terminates the loop before this becomes an issue (since the gradient is zero), and the behavior is the same across all compilers ifx, gfortran, and ifort. Granted, this is a small change in behavior, but I think practically there should be no change in final results, and it should avoid unnecessary iterations.

I dont think digging more into the compilers is worth it, aside from just adding the compiler flags I mention above, but feel as I think its important to study ifx since its a new compiler for us. Moving the define to a module is possible, but I dont think is needed. The primary thing is the stopping condition (sufficient decrease), which I have opened a PR for here #75.

Please test this carefully and let me know what you think.

@marcomangano
Copy link
Contributor

Chiming in just to say that this is some incredible detective work! Updating the Wolfe condition here is a no-brainer. I wonder if we should look at other codes and see if there are "risky" if checks like this.

@A-CGray
Copy link
Member Author

A-CGray commented Aug 17, 2024

Wow @eirikurj this is a great find. I agree with your proposed fix in #75. On the preprocessing side, is -fpp on by default?

@eirikurj
Copy link
Contributor

@A-CGray yes preprocessing is automatically applied to a file if it has one of these extensions shown below (gfortran added for reference also).

  • ifort - .fpp or .FPP, .f or .F, .f90 or .F90, .for or .FOR, .ftn or .FTN, and .fpp or .FPP. (reference)
  • ifx - .FPP, .fpp, .F, .F90, .FOR, or .FTN on Linux, and file extension .fpp on Windows. (reference)
  • gfortran - .F, .FOR, .FTN, .fpp, .FPP, .F90, .F95, .F03 or .F08 (reference)

Since we are using F90 for our source files, we should be good across all comilers we use.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants