Core dump with VietorisRipsPersistence and joblib #59

Closed
arksch opened this issue Oct 22, 2019 · 15 comments

Comments

@arksch

arksch commented Oct 22, 2019

Description

Core dump when calling fit_transform on VietorisRipsPersistence with n_jobs=None or 1,
TerminatedWorkerError when n_jobs=2.

Steps/Code to Reproduce

import numpy as np
from giotto.homology import VietorisRipsPersistence
VietorisRipsPersistence(n_jobs=1).fit_transform(np.array([[[0, 0], [0, 1], [1, 0], [1, 1]]]))

Expected Results

No error is thrown.

Actual Results

Illegal instruction (core dumped)

Unreportable Reason: Cannot determine path of python module joblib.externals.loky.backend.popen_loky_posix

Or if n_jobs=2:

TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker. The exit codes of the workers are {SIGILL(-4)}
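The exit code in `{SIGILL(-4)}` follows the POSIX convention that Python's `subprocess` also uses: a negative return code `-N` means the process was killed by signal `N`, and signal 4 is `SIGILL` (illegal instruction), the same signal behind the `Illegal instruction (core dumped)` message above. Below is a minimal sketch of decoding such codes with the standard library. `describe_exit_code` is a hypothetical helper that only mimics the `NAME(code)` format; it is not joblib's code. The demo assumes a POSIX system.

```python
import signal
import subprocess
import sys


def describe_exit_code(code):
    """Hypothetical helper: render a process exit code as NAME(code)."""
    if code < 0:
        # Negative codes mean "terminated by signal -code" (POSIX convention)
        return f"{signal.Signals(-code).name}({code})"
    return f"exit({code})"


# SIGILL is signal 4, so -4 decodes to the label seen in the joblib error
print(describe_exit_code(-4))  # prints SIGILL(-4)

# Demo: a child that sends itself SIGTERM (no core dump) reports returncode -15
child = subprocess.run(
    [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGTERM)"]
)
print(describe_exit_code(child.returncode))  # prints SIGTERM(-15)
```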

Versions

Linux-4.15.0-65-generic-x86_64-with-Ubuntu-18.04-bionic
Python 3.6.8 (default, Oct 7 2019, 12:59:55)
[GCC 8.3.0]
NumPy 1.17.2
SciPy 1.3.1
Scikit-Learn 0.21.3
giotto-Learn 0.1.1

@arksch
Author

arksch commented Oct 22, 2019

Possibly related to joblib/joblib#849

@gtauzin
Collaborator

gtauzin commented Oct 22, 2019

Hi,

Thanks for the feedback. Could you provide your joblib version?

@gtauzin gtauzin mentioned this issue Oct 22, 2019
@arksch
Author

arksch commented Oct 22, 2019

joblib 0.14.0
Seems to be the most recent version.

@ulupo
Collaborator

ulupo commented Aug 23, 2020

@arksch was this ever resolved? Is it still happening with the latest version of giotto-tda?

@jlevy44

jlevy44 commented Sep 8, 2020

I'm getting this issue as well.

@ulupo
Collaborator

ulupo commented Sep 9, 2020

@jlevy44 thanks for the report! Could you please provide us with your setup and with a minimum working example so we can try to reproduce this issue?

To get your setup details, please run the following snippet and paste the output:

import platform; print(platform.platform())
import sys; print("Python", sys.version)
import numpy; print("NumPy", numpy.__version__)
import scipy; print("SciPy", scipy.__version__)
import joblib; print("Joblib", joblib.__version__)
import sklearn; print("Scikit-learn", sklearn.__version__)
import gtda; print("Giotto-tda", gtda.__version__)

@jlevy44

jlevy44 commented Sep 9, 2020

Sounds good.

Linux-3.10.0-1062.18.1.el7.x86_64-x86_64-with-debian-buster-sid
Python 3.7.6 (default, Jan  8 2020, 19:59:22) 
[GCC 7.3.0]
NumPy 1.19.1
SciPy 1.5.2
Joblib 0.16.0
Scikit-learn 0.23.2
Giotto-tda 0.2.2

Instead of posting a working example, I can send you my solution, which I was able to get to run! I ended up hacking your script and replacing the joblib backend with dask:

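import dask
import dask.diagnostics
from gtda.homology import VietorisRipsPersistence
from gtda.homology._utils import _postprocess_diagrams  # private helper

homology_dimensions = (0, 1)


def fit_persistence(x):
    # Fit a single-process transformer on one point cloud and compute its
    # raw diagram through the private _ripser_diagram method
    VR = VietorisRipsPersistence(
        metric='euclidean', max_edge_length=3000,
        homology_dimensions=homology_dimensions, n_jobs=1).fit([x])
    return VR._ripser_diagram(x)


# point_clouds is my collection of point clouds (not shown here);
# dask runs one task per point cloud in separate processes
with dask.diagnostics.ProgressBar():
    diagrams = dask.compute(*[dask.delayed(fit_persistence)(x) for x in point_clouds],
                            scheduler="processes")

# Post-process the list of raw diagrams with giotto-tda's private helper
diagrams = _postprocess_diagrams(diagrams, homology_dimensions, None, 1)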

This worked as intended. :)

@ulupo
Collaborator

ulupo commented Sep 9, 2020

@jlevy44 thanks for reporting your setup; it's interesting that you found a workaround with dask!

However, it is still important to have a minimal working example (if you are able to provide one) so that we can pinpoint the exact origin of the problem. Since we use joblib for now, switching to dask does not fix the issue on our side, so we need the example to track it down!

@ulupo
Collaborator

ulupo commented Sep 9, 2020

Additionally, I'm wondering if you were passing a collection with a single sample even before finding the solution with dask.

@jlevy44

jlevy44 commented Sep 9, 2020

No, I wasn't, if I'm understanding you correctly. Sometimes it (the joblib backend) would work, especially if I reduced the number of point clouds to ~30; other times it would fail, especially with close to 100 clouds and a higher max_edge_length.

I am unable to provide our working example right now, apologies!
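One plausible reason a higher max_edge_length makes failures more likely (consistent with the "excessive memory usage" hint in the TerminatedWorkerError message, though not confirmed in this thread): once max_edge_length exceeds a cloud's diameter, every pair of points is joined by an edge, and computing H1 also involves the 2-simplices (triangles). The back-of-the-envelope sketch below counts simplices of the full Vietoris-Rips complex up to dimension 2 for an n-point cloud. `rips_simplex_counts` is a hypothetical helper, and the counts assume the threshold is large enough that every subset of points spans a simplex.

```python
from math import comb


def rips_simplex_counts(n_points, max_dim=2):
    """Hypothetical helper: simplex counts of a full Vietoris-Rips complex.

    Assumes max_edge_length exceeds the cloud's diameter, so every subset
    of k + 1 points spans a k-simplex.
    """
    return [comb(n_points, k + 1) for k in range(max_dim + 1)]


for n in (100, 1_000):
    vertices, edges, triangles = rips_simplex_counts(n)
    print(f"n={n}: {vertices} vertices, {edges} edges, {triangles} triangles")
# n=100: 100 vertices, 4950 edges, 161700 triangles
# n=1000: 1000 vertices, 499500 edges, 166167000 triangles
```

Edge counts grow quadratically and triangle counts cubically in the number of points, whereas a small max_edge_length keeps only short edges and can prune most of these simplices.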

@ulupo
Collaborator

ulupo commented Sep 9, 2020

@jlevy44 thank you, I understand that you cannot provide this example. Could you perhaps give me information on the shape of the point clouds you use?

@ulupo
Collaborator

ulupo commented Sep 11, 2020

@jlevy44 aside from my above question about the shapes and number of point clouds used (so I can try my best to reproduce), I'm wondering if you could try the following, since you don't seem scared by the idea of changing the source code: in the call to Parallel in VietorisRipsPersistence.transform, could you try passing each of the allowed values of the keyword parameter backend (listed at https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html) right after n_jobs? It would help greatly to know whether this improves the situation.
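The experiment suggested above amounts to adding `backend=...` next to `n_jobs` in a `joblib.Parallel` call. Below is a minimal standalone sketch of that pattern outside giotto-tda. It uses `math.sqrt` as a stand-in for the per-diagram computation (an assumption, not the actual code in `VietorisRipsPersistence.transform`) and runs the `"loky"` and `"threading"` backends. The `"multiprocessing"` backend can be tried the same way, but per the joblib docs it may require the script's top-level code to be guarded by `if __name__ == "__main__":`.

```python
import math

from joblib import Parallel, delayed

values = [1.0, 4.0, 9.0, 16.0]

results = {}
for backend in ("loky", "threading"):
    # Same call shape as Parallel(n_jobs=...), with an explicit backend added
    results[backend] = Parallel(n_jobs=2, backend=backend)(
        delayed(math.sqrt)(v) for v in values
    )
    print(backend, results[backend])  # results come back in input order
```

If a crash disappears under `"threading"` but not under `"loky"`, that would point at process spawning or serialization rather than at the computation itself.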

@arksch
Author

arksch commented Sep 14, 2020

> @arksch was this ever resolved? Is it still happening with the latest version of giotto-tda?

I cannot reproduce it with the newest version. Solved from my side. Thanks!

Linux-4.15.0-117-generic-x86_64-with-Ubuntu-18.04-bionic
Python 3.6.9 (default, Jul 17 2020, 12:50:27)
[GCC 8.4.0]
NumPy 1.19.2
SciPy 1.5.2
Joblib 0.16.0
Scikit-learn 0.23.2
Giotto-tda 0.2.2

@ulupo
Collaborator

ulupo commented Sep 14, 2020

@arksch thank you so much for coming back to this thread, and for the report!

It would be good to have @jlevy44's answer so we can attempt to reproduce his issue and improve the experience for other users.

@ulupo
Collaborator

ulupo commented Feb 2, 2022

Closing as this is likely no longer an issue with the giotto-ph backend.

@ulupo ulupo closed this as completed Feb 2, 2022

4 participants