Flaky tests using MPI #600

Closed
renefritze opened this issue Feb 21, 2019 · 10 comments · Fixed by #817

@renefritze
Member

It looks like the 'MPI' CI job sometimes fails due to some intermittent system condition. I have not been able to reproduce it yet. A restart of the job fixes it.

@sdrave
Member

sdrave commented Feb 21, 2019

The job you have linked actually succeeded. Do you have an example for a failing job?

@renefritze
Member Author

Not at the moment. This is basically just an FYI for when you come across a failed MPI job.

@sdrave
Member

sdrave commented Feb 27, 2019

One example of a failing test is here, where the problem seems to be sqlite related:

https://zivgitlab.uni-muenster.de/pymor/pymor/-/jobs/32104

What is kind of strange: by default there is a timeout of 5 seconds on acquiring a lock, and I don't see any operation that might take longer than that.
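
For reference, the failure mode can be reproduced in isolation with the standard library alone. This is only a minimal sketch, not pyMOR's cache code: it assumes the 5 seconds mentioned above correspond to the default `timeout=5.0` of Python's `sqlite3.connect`, and the function name `write_while_locked` and the `entries` table are made up for illustration. One connection takes the write lock and holds it, and a second connection with a short timeout then fails with "database is locked".

```python
import os
import sqlite3
import tempfile
import time


def write_while_locked(timeout):
    """Try to write while another connection holds the write lock.

    Hypothetical helper for illustration only.
    Returns (error message or None, seconds spent waiting).
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "cache.db")

        # isolation_level=None selects autocommit mode, so transactions are
        # controlled explicitly with BEGIN / ROLLBACK below.
        holder = sqlite3.connect(path, isolation_level=None)
        holder.execute("CREATE TABLE entries (key TEXT PRIMARY KEY, value BLOB)")
        holder.execute("BEGIN IMMEDIATE")  # take the write lock and keep it
        holder.execute("INSERT INTO entries VALUES ('a', x'00')")

        # Second connection: waits up to `timeout` seconds for the lock.
        waiter = sqlite3.connect(path, timeout=timeout, isolation_level=None)
        start = time.monotonic()
        try:
            waiter.execute("INSERT INTO entries VALUES ('b', x'01')")
            error = None
        except sqlite3.OperationalError as exc:
            error = str(exc)
        waited = time.monotonic() - start

        # Release the lock and close both connections before the temporary
        # directory is cleaned up.
        holder.execute("ROLLBACK")
        holder.close()
        waiter.close()
        return error, waited


if __name__ == "__main__":
    print(write_while_locked(0.2))
```

If the MPI ranks share one database file, a rank that holds a write transaction open for longer than the timeout would produce this error on the other ranks, even if no single statement is slow.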

@renefritze
Member Author

Suggestion 1: I replace the SQLiteRegion implementation with a new DiskRegion based on https://github.com/grantjenks/python-diskcache
They claim they're process safe.

Suggestion 2: use an ORM like SQLAlchemy to abstract this into an SQLRegion implementation where the actual database connection is a configuration detail.

@sdrave
Member

sdrave commented Feb 28, 2019

python-diskcache looks kind of neat. Using it would allow us to remove that nasty SQL code from pyMOR. I am not sure whether it will solve the issues with the MPI tests, since sqlite should be process safe as well. Maybe it's worth giving it a try.

@sdrave sdrave added this to the 2019.2 milestone Feb 28, 2019
@renefritze renefritze self-assigned this Feb 28, 2019
@sdrave
Member

sdrave commented Mar 6, 2019

We haven't seen MPI tests failing since the adoption of diskcache. Let's hope for the best and close this issue.

@sdrave sdrave closed this as completed Mar 6, 2019
@renefritze renefritze reopened this Jul 15, 2019
@renefritze
Member Author

Since I haven't been able to quickly determine what's going on with the newly failing MPI tests, I propose to disable them until I've found a solution. That might take a couple of weeks, until I really have time for it.
Objections, @pymor/pymor-devs ?

@renefritze
Member Author

If there are none, I'll merge #735 when ready and rebase the open PR branches on that.

@pmli
Member

pmli commented Jul 15, 2019

None from me.

@renefritze
Member Author

I've now rebased the active PRs where jobs were failing in MPI on master.

3 participants