Setting progress_bar=True freezes execution for parallel_apply before reaching 1% completion on all CPU's #131

Closed
abhineetgupta opened this issue Jan 28, 2021 · 25 comments

Comments

@abhineetgupta

When progress_bar=True, I noticed that the execution of my parallel_apply task stopped right before all parallel processes reached the 1% progress mark.
Here are some further details of what I was encountering:

  • I turned on logging with DEBUG messages, but no messages were displayed when the execution stopped. There were no error messages either. The dataframe rows simply stopped processing further and the process seemed to be frozen.
  • I have two CPUs. It seems that the progress bar only updates in 1% increments. One of the progress bars reaches the 1% mark, but when the number of processed rows reaches the 2% mark (which I assume corresponds to the second progress bar updating to 1% as well), the process freezes.
  • The process runs fine with progress_bar=False.
@abhineetgupta changed the title from "Setting progress_bar=True freezes execution for parallel_apply` before reaching 1% completion on all CPU's" to "Setting progress_bar=True freezes execution for parallel_apply before reaching 1% completion on all CPU's" on Jan 28, 2021
@bmacher-discovery

Similar issue here, except that once one process reaches 100% all others get stuck at 99.99%. Problem is completely fixed by turning off the progress bars (but I don't look quite leet enough /s).

Specs:

  • SageMaker ml.m5.4xl
  • Data ~2.6M rows
  • Using parallel_apply with a function that transforms sentences to tokens, lemmatizes, and then checks for the presence of a token.

@Ronserruya

Same issue ^
21M rows
python 3.8, OSX 10.15.7,

I'm running parallel_apply; 2 out of 12 bars finish, the others get stuck, and I get a "python quit unexpectedly" error from the OS.

@chris-forbes

chris-forbes commented Feb 23, 2021

Similar issue, and I'm only working on about 12k rows.
It seems to get to about 300 completed items on each core then all of the forked processes just seem to die - almost like it's trying to create new threads but then it just sits there, all cores basically unused.

Python 3.6.9 on Ubuntu-18.04 WSL2

Edit:
I stopped enabling progress_bar in my little console application, and whatever deadlock was occurring seems to have disappeared. It is now progressing pretty well.

@zkx06111

Same issue here, I set the number of workers to 12 but 2 of them stopped with 1% progress.

@CptPirx

CptPirx commented May 12, 2021

I have the same issue, working on 111k rows, Python 3.8.

@skwde

skwde commented May 21, 2021

Same here. None of the processes make any progress.

I use parallel_apply on a groupby. It seems that the length of the groups is also not correctly recognized for the progress bar.

@quancore
Contributor

Same, is there any workaround for it?

@abhineetgupta
Author

Same, is there any workaround for it?

Setting progress_bar=False worked for me.
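
For reference, the workaround can be sketched as below. This is only an illustration, not code from the original report: the helper name `apply_rows`, the example function `row_total`, the worker count, and the fallback to plain `DataFrame.apply` when pandarallel is not installed are all assumptions made for the example.

```python
import pandas as pd

try:
    from pandarallel import pandarallel
except ImportError:  # assumption: fall back to plain pandas if pandarallel is missing
    pandarallel = None


def apply_rows(df: pd.DataFrame, func, nb_workers: int = 2) -> pd.Series:
    """Apply `func` to every row, in parallel when pandarallel is available.

    Hypothetical helper: the progress bar is kept off, which is the
    workaround reported in this thread.
    """
    if pandarallel is None:
        return df.apply(func, axis=1)
    # progress_bar=False is the setting that avoided the freeze for the reporters.
    pandarallel.initialize(progress_bar=False, nb_workers=nb_workers, verbose=0)
    return df.parallel_apply(func, axis=1)


def row_total(row):
    """Toy row function used only for this example."""
    return row["a"] + row["b"]


if __name__ == "__main__":
    frame = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})
    print(apply_rows(frame, row_total).tolist())
```

Calling `pandarallel.initialize` inside the helper keeps the example short; in a real script it would usually be called once at start-up.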

@neontty

neontty commented Sep 2, 2021

also experiencing this issue

Python 3.8
pandarallel 1.5.2
centos
~500k rows

It happens both with all bars below 1% and, sometimes, with most of them above 99%.

the workaround progress_bar=False also works for me, but it would be nice to have :)

@kylegilde

This happens to me too, but the workaround works.

@Lucas-Servi

Same here in a ".parallel_apply(lambda)"
Froze here:
[screenshot]

@nalepae
Owner

nalepae commented Mar 4, 2022

Could you please tell me the version of pandarallel you are using?

@Lucas-Servi

Name: pandarallel
Version: 1.5.5

@nalepae
Owner

nalepae commented Mar 4, 2022 via email

@Lucas-Servi

Sure, give me a minute...
Just for the record, the execution reaches a point where the cores stop working while the cell is still running:
[screenshot]

@nalepae
Owner

nalepae commented Mar 4, 2022

Sorry, I can't tell whether your issue is fixed with pandarallel 1.5.7.
If not, could you please provide:

  • Operating System:
  • Python version:
  • Pandas version:
  • Pandarallel version:

and a minimal code sample that reproduces the issue, so that I can investigate?
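
The details requested above can be collected with a short script. This is a minimal sketch using only the standard library (`importlib.metadata` needs Python 3.8+); the function names `installed_version` and `environment_report` and the extra "Logical CPUs" entry are assumptions for illustration, not part of pandarallel.

```python
import os
import platform
from importlib import metadata
from typing import Dict, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_report() -> Dict[str, Optional[str]]:
    """Collect the fields requested in the comment above (hypothetical helper)."""
    return {
        "Operating System": platform.platform(),
        "Python version": platform.python_version(),
        "Pandas version": installed_version("pandas"),
        "Pandarallel version": installed_version("pandarallel"),
        # Logical CPU count as seen by Python; may differ from physical cores.
        "Logical CPUs": str(os.cpu_count()),
    }


if __name__ == "__main__":
    for name, value in environment_report().items():
        print(f"{name}: {value}")
```

Reading the version from the installed package metadata reports what is actually installed in the active environment, which helps when several environments exist side by side.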

@Lucas-Servi

[screenshot]
Stopped here.

Operating System: Linux Mint 20.3
Kernel: Linux 5.13.0-27-generic
Python version: Python 3.9.5
Pandas version: 1.4.1
Pandarallel version: 1.5.7

I made a little folder with code + 2 dataframes used.
https://easyupload.io/w9mbcv

Hope it helps!

Thanks for Pandarallel, it's amazing :)!

@nalepae
Owner

nalepae commented Mar 7, 2022

Hello,

I can reproduce your issue with pandarallel 1.5.5, but not with pandarallel 1.5.7.
Are you completely sure you tried it with pandarallel 1.5.7?

To check which version of pandarallel you are using:

import pandarallel

pandarallel.__version__

To make sure you install the latest version of pandarallel:

pip install pandarallel --upgrade

(I guess you are not using pandarallel v1.5.7, since by default this version uses only half of the available CPUs. I see in your htop screenshot that you have 16 CPUs and also 16 progress bars.)

@Lucas-Servi

Yes, but I'm testing it on 8 or 4 cores now and it is still not working. This was my best shot after a clean install in a new env.
[screenshot]

Running on 1.5.7.
It usually runs perfectly; I just had trouble with this particular script. Thanks for the support, I'm going to try something different.
:)

@till-m
Collaborator

till-m commented Sep 12, 2022

I'm assuming this has been fixed.

@till-m till-m closed this as completed Sep 12, 2022
@parthpankajtiwary

@nalepae @till-m I am still encountering this issue in both version 1.5.7 and 1.6.3. Some cores fail to progress and freeze, both with progress_bar=True and with progress_bar=False.

@parthpankajtiwary

parthpankajtiwary commented Dec 16, 2022

@nalepae @till-m I am still encountering this issue in both version 1.5.7 and 1.6.3. Some cores fail to progress and freeze, both with progress_bar=True and with progress_bar=False.

I got it to work. A couple of observations:

  • I was working on Windows, so anything that touches the CUDA drivers before multiprocessing starts will not sit well with multiprocessing. In my case I was importing cudf, so I separated that logic.
  • I was passing a model (700 MB) as an argument to the function supplied to parallel_apply, and that seems to have been a bottleneck. As a workaround, I initialised the model as a global variable instead of passing it to the function, and it seems to work fine.
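
The second observation can be illustrated with a rough sketch of why a large argument may become a bottleneck. It assumes that arguments are serialized and shipped to the workers, which is typical for multiprocessing but is not confirmed here for pandarallel's internals. `MODEL`, `score_with_arg`, `score_with_global`, and `payload_bytes` are hypothetical names, and the small dictionary merely stands in for the 700 MB model.

```python
import pickle

# Stand-in for a large model; the real one in the comment was about 700 MB.
MODEL = {"weights": [0.5] * 100_000}


def score_with_arg(text: str, model: dict) -> float:
    """Variant that receives the model as an argument on every call."""
    return len(text) * model["weights"][0]


def score_with_global(text: str) -> float:
    """Variant that reads the module-level MODEL instead."""
    return len(text) * MODEL["weights"][0]


def payload_bytes(*args) -> int:
    """Size of the pickled arguments that would have to be sent to a worker."""
    return len(pickle.dumps(args))


# Arguments for one call of each variant.
with_model = payload_bytes("some sentence", MODEL)
without_model = payload_bytes("some sentence")

if __name__ == "__main__":
    print(f"argument payload with model:    {with_model} bytes")
    print(f"argument payload without model: {without_model} bytes")
```

Both variants compute the same result; only the amount of data travelling with each call differs. With a module-level global, each worker obtains the model once (inherited on fork, or rebuilt on import under spawn) rather than receiving it with the arguments.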

@LukebethamStonehaven

I am still getting this issue on pandarallel 1.6.5. If I set progress_bar=False I don't get any issues, but it would be great to be able to use this feature.

Using parallel_apply(), it just hangs here, and the data table I am using is tiny for testing (~1 MB):
[screenshot]

I am using an M2 Mac, but I think that should be fine from what I can see in the docs.

@till-m
Collaborator

till-m commented Jul 6, 2023

Hi @LukebethamStonehaven,

can you consistently reproduce the problem like this? If yes, can you send me an SSCCE?

@sahil-zepto

I am facing a similar issue of parallel_apply() freezing when running my code on an EC2 cluster. It was working fine until a few days ago, running every day on a schedule, but it has suddenly stopped working. The same code works fine on my local machine. I have also kept progress_bar=False. My pandarallel version is v1.6.4 on both local and EC2. Any ideas, guys?
