test_save_deg2 fails for FEniCS stable #94

Closed
johannesring opened this issue Oct 13, 2023 · 16 comments · Fixed by #98

Comments

@johannesring
Collaborator

The test test_save_deg2 fails for FEniCS stable with the following output:

Newton iteration 0: r (atol) = 4.276e+02 (tol = 1.000e-07), r (rel) = 5.250e+01 (tol = 1.000e-07)
Traceback (most recent call last):
  File "/usr/share/miniconda3/envs/turtleFSI/bin/turtleFSI", line 8, in <module>
    sys.exit(main())
  File "/usr/share/miniconda3/envs/turtleFSI/lib/python3.8/site-packages/turtleFSI/run_turtle.py", line 18, in main
    from turtleFSI import monolithic
  File "/usr/share/miniconda3/envs/turtleFSI/lib/python3.8/site-packages/turtleFSI/monolithic.py", line 195, in <module>
    vars().update(save_files_visualization(**vars()))
  File "/usr/share/miniconda3/envs/turtleFSI/lib/python3.8/site-packages/turtleFSI/problems/__init__.py", line 300, in save_files_visualization
    namespace["d_viz"].vector()[:] = namespace["dv_trans"]*d.vector()
RuntimeError:

*** -------------------------------------------------------------------------
*** DOLFIN encountered an error. If you are not able to resolve this issue
Compute Jacobian matrix
*** using the information listed below, you can ask for help at
***
***     fenics-support@googlegroups.com
***
*** Remember to include the error message listed below and, if possible,
*** include a *minimal* running example to reproduce the error.
***
*** -------------------------------------------------------------------------
*** Error:   Unable to successfully call PETSc function 'MatCreateVecs'.
*** Reason:  PETSc error code is: 98 (General MPI error).
*** Where:   This error was encountered inside /home/conda/feedstock_root/build_artifacts/fenics-pkgs_1696906530109/work/dolfin/dolfin/la/PETScBaseMatrix.cpp.
*** Process: 0
***
*** DOLFIN version: 2019.1.0
*** Git changeset:  2e001bd1aae8e14d758264f77382245e6eed04b0
*** -------------------------------------------------------------------------

Newton iteration 1: r (atol) = 5.941e-03 (tol = 1.000e-07), r (rel) = 9.318e+01 (tol = 1.000e-07)
Compute Jacobian matrix
Newton iteration 2: r (atol) = 2.873e-04 (tol = 1.000e-07), r (rel) = 1.181e-04 (tol = 1.000e-07)
Compute Jacobian matrix
Newton iteration 3: r (atol) = 2.515e-11 (tol = 1.000e-07), r (rel) = 3.317e-11 (tol = 1.000e-07)
Distance x: -2.527795e-06
Distance y: -3.826376e-10
Drag: 1.088597e+00
Lift: -9.584695e-04
FAILED

The test only fails for ubuntu-latest and for Python 3.8 and 3.10, while the test runs fine for Python 3.9 and 3.11.

The complete output is available here.

@keiyamamo
Collaborator

The error seems to come from MPI, which is a bit odd since we are not using MPI in the test. I don't know how to fix this right away, but maybe using an updated version of the container would help?

https://github.com/KVSlab/turtleFSI/blob/fc46e6d32e713de22784706f934ea8c13edc6b38/.github/workflows/test_turtle.yml#L20C16-L20C61

or @jorgensd might know how to fix this.

@johannesring
Collaborator Author

johannesring commented Oct 13, 2023

That is for the tests that are run against FEniCS master (test_turtle.yml). This is not failing, but it should probably be updated to the latest tag (2023-08-14). The failing tests are run against FEniCS stable (test_turtle_conda.yml).

@keiyamamo
Collaborator

I see. It seems there were some updates to the conda-forge fenics package in the last couple of weeks, so that might be related...

https://github.com/conda-forge/fenics-feedstock

@jorgensd
Collaborator

The conda envs are using PETSc 3.20, which could cause many issues for us, as PETSc no longer does garbage collection in Python (since 3.16).

@keiyamamo
Collaborator

If it's related to the PETSc version, I'm guessing there's little we can do to fix it. One option is to simply remove the test for save_deg=2. I guess the problem might be related to the 'PETScDMCollection.create_transfer_matrix' function.

@keiyamamo
Collaborator

This problem also affects conda-forge, as you can see here. Should we simply deactivate the test for save_deg=2 for now? @johannesring @jorgensd

@johannesring
Collaborator Author

johannesring commented Oct 24, 2023

@keiyamamo - I don't think the issue with turtle on conda-forge is related to this issue. When looking at the logs, you can see that all the tests are failing due to a segfault. This is the same problem I see when restarting the tests for Python 3.10. The first test, test_cfd, fails with a segfault (see here). I've also tried to run the tests locally using act, and for Python 3.10, all the tests fail with a segfault.

I'm not sure whether we should consider disabling the test_save_deg2. It looks to be a problem only with the conda packages for Python 3.8 (and possibly Python 3.10).

Note that all the various Python versions we test use the same PETSc version (3.20.0), so the issue doesn't look to be related to the PETSc version.
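To compare the environments across the Python versions in the test matrix, it can help to print the installed package versions in each CI job. The following is a minimal sketch using only the standard library; the distribution names (`petsc4py`, `mpi4py`, `h5py`, `fenics-dolfin`) are assumptions about what the conda environment exposes as Python packages, and C libraries such as hdf5 are not visible this way (`conda list` would show those).

```python
from importlib import metadata


def collect_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    report = {}
    for name in names:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed as a Python distribution in this environment.
            report[name] = None
    return report


if __name__ == "__main__":
    # Assumed distribution names; adjust to what the environment provides.
    names = ["petsc4py", "mpi4py", "h5py", "fenics-dolfin"]
    for name, version in collect_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this in each job of the matrix would make it easier to spot which package differs between the passing and failing Python versions.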

@jorgensd
Collaborator

I saw a related issue yesterday, due to using the latest version of mpi4py. Maybe try to pin it to a sensible version? @MariusCausemann, what version did you end up using?

@MariusCausemann

I solved my problem by using mpi4py 3.1.4, but I'm not sure if it's the same problem. Good luck!

@keiyamamo
Collaborator

Thank you @MariusCausemann! I tried with 3.1.4, but it did not solve the problem (keiyamamo#3).
I noticed that hdf5 has a different version than before, so that might be related as well.

@johannesring
Collaborator Author

Using PETSc 3.19.6 fixed the problem with Python 3.8, but it did not fix the problem with segfaults with Python 3.10.

@keiyamamo
Collaborator

keiyamamo commented Oct 24, 2023

I tested with hdf5 version 1.12.2, and that solved the problem with Python 3.8 and 3.10 but created a problem with 3.9... Maybe some combination of versions works fine.

@keiyamamo
Collaborator

Failure of save_deg=2 is related to PETScDMCollection.create_transfer_matrix and is reported here

@minrk

minrk commented Oct 30, 2023

I don't understand the interactions enough, but the segfault in conda-forge/fenics-feedstock#192 is because PetscInitialize is never called. This seems to clearly be a bug in fenics, as PetscInitialize seems like it should be required before using petsc at all.

Fixing that seems to be easy, and requires calling SubSystemsManager.init_petsc() before using PETSc. But that only lets you get as far as the MPI error 98, which I also don't understand, and which may be related to MPI initialization.
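As a rough sketch of that workaround, the call could be wrapped so that it runs once before any PETSc-backed object is created. The helper name `ensure_petsc_initialized` is hypothetical, and where such a call would belong in turtleFSI is an assumption; the import is guarded so the snippet also runs where legacy DOLFIN is not installed.

```python
def ensure_petsc_initialized():
    """Initialise PETSc through legacy DOLFIN, if DOLFIN is importable.

    Returns True if init_petsc() was called, False if dolfin is missing.
    """
    try:
        from dolfin import SubSystemsManager
    except ImportError:
        # No legacy DOLFIN in this environment; nothing to initialise.
        return False
    # Hypothetical placement: call before creating any PETSc matrices/vectors.
    SubSystemsManager.init_petsc()
    return True


if __name__ == "__main__":
    print("PETSc initialised via DOLFIN:", ensure_petsc_initialized())
```

As noted above, this would only address the missing PetscInitialize call, not the MPI error 98.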

@keiyamamo
Collaborator

Thank you @minrk for the explanation! I lowered the versions of hdf5 and mpi4py (here), and pytest passes now, so it seems to be somehow related to the recent version of mpi4py. It might be related to this?

@johannesring @jorgensd Do you think it would be okay to pin the hdf5 and mpi4py versions in environment.yml so that pytest passes for now?
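To guard against the pins silently drifting, a small check could compare what is installed against the pinned versions. This is only a sketch: the pin values below are the versions mentioned earlier in this thread (mpi4py 3.1.4, hdf5 1.12.2) and may differ from what ends up in environment.yml, the "installed" versions are made-up inputs for illustration, and `parse_version` and `check_pins` are hypothetical helper names.

```python
def parse_version(text):
    """Turn '3.1.4' into (3, 1, 4); non-numeric parts are not handled."""
    return tuple(int(part) for part in text.split("."))


def check_pins(installed, pins):
    """Return the names whose installed version differs from the pinned one."""
    mismatched = []
    for name, wanted in pins.items():
        found = installed.get(name)
        if found is None or parse_version(found) != parse_version(wanted):
            mismatched.append(name)
    return mismatched


# Illustrative pins taken from versions mentioned in this thread;
# the values actually used in environment.yml may differ.
PINS = {"mpi4py": "3.1.4", "hdf5": "1.12.2"}

if __name__ == "__main__":
    # Hypothetical installed versions, used only to exercise the check.
    print(check_pins({"mpi4py": "3.1.4", "hdf5": "1.14.2"}, PINS))
```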

@johannesring
Collaborator Author

@keiyamamo - Yes, I think that is okay for now.


5 participants