IndexError when there are fewer DataFrame rows than workers #67

Closed
elemakil opened this issue Jan 6, 2020 · 6 comments
Labels
bug Something isn't working

@elemakil

elemakil commented Jan 6, 2020

When the number of rows is below the number of workers, an IndexError is raised. Minimal example:

Code

import time
import pandas as pd
from pandarallel import pandarallel

pandarallel.initialize(progress_bar=True)

df = pd.DataFrame({'x':[1,2]})
df.parallel_apply(lambda row: print('A'), time.sleep(2), print('B'), axis=1)

Output

INFO: Pandarallel will run on 6 workers.
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.
B
   0.00%                                          |        0 /        1 |
   0.00%                                          |        0 /        1 |
Traceback (most recent call last):
  File "foo.py", line 8, in <module>
    df.parallel_apply(lambda row: print('A'), time.sleep(2), print('B'), axis=1)
  File "$VIRTUAL_ENV/lib/python3.7/site-packages/pandarallel/pandarallel.py", line 446, in closure
    map_result,
  File "$VIRTUAL_ENV/lib/python3.7/site-packages/pandarallel/pandarallel.py", line 382, in get_workers_result
    progress_bars.update(progresses)
  File "$VIRTUAL_ENV/lib/python3.7/site-packages/pandarallel/utils/progress_bars.py", line 82, in update
    self.__bars[index][0] = value
IndexError: list index out of range

I'm using Python 3.7.4 with pandas 0.25.3 and pandarallel 1.4.4.
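The traceback ends in the progress-bar update loop (self.__bars[index][0] = value), and the output shows only two bars (one per row) although six workers were started. The sketch below is hypothetical and is not pandarallel's code: it assumes one bar per chunk and a progress list with one entry per worker, and shows how the two lengths drift apart when rows are fewer than workers. The names chunk_sizes and ProgressBars are made up for illustration.

```python
# Hypothetical sketch (not pandarallel's code): one progress bar per chunk.


def chunk_sizes(nb_rows, nb_workers):
    """Split nb_rows into at most nb_workers chunks, never more chunks than rows."""
    nb_chunks = max(1, min(nb_rows, nb_workers))
    quotient, remainder = divmod(nb_rows, nb_chunks)
    # The first `remainder` chunks get one extra row.
    return [quotient + (1 if i < remainder else 0) for i in range(nb_chunks)]


class ProgressBars:
    """Minimal stand-in: each bar is stored as [current, total]."""

    def __init__(self, totals):
        self.bars = [[0, total] for total in totals]

    def update(self, progresses):
        # Raises IndexError when progresses has more entries than there are bars,
        # which is the failure mode shown in the traceback above.
        for index, value in enumerate(progresses):
            self.bars[index][0] = value


sizes = chunk_sizes(nb_rows=2, nb_workers=6)
bars = ProgressBars(sizes)
bars.update([1] * len(sizes))  # one progress value per chunk: fine

try:
    bars.update([1] * 6)  # one value per worker, including idle ones
except IndexError as exc:
    print("IndexError:", exc)
```

Sizing every per-worker list from the chunk list (rather than from the worker count) keeps the indices in range; the thread does not say whether the v1.4.8 fix works this way.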

@nalepae
Owner

nalepae commented Jan 6, 2020

Thanks!

@nalepae nalepae self-assigned this Jan 6, 2020
@nalepae nalepae added the bug Something isn't working label Jan 6, 2020
@Wizacorn

Wizacorn commented Jan 9, 2020

pandarallel==1.4.4
pandas==0.25.3
Python==3.6.8

This is the traceback I get when progress_bar is not enabled:

Traceback (most recent call last):
  File "<input>", line 81, in <module>
  File "/Users/user/Software/overinflation_analysis/venv/lib/python3.6/site-packages/pandarallel/pandarallel.py", line 446, in closure
    map_result,
  File "/Users/user/Software/overinflation_analysis/venv/lib/python3.6/site-packages/pandarallel/pandarallel.py", line 354, in get_workers_result
    message_type, message = queue.get()
  File "<string>", line 2, in get
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/managers.py", line 757, in _callmethod
    kind, result = conn.recv()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
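An EOFError raised from conn.recv() generally means the process on the other end of the connection closed it (for example, because it exited or crashed) before sending a reply; the thread does not establish why that happened here. A minimal stdlib demonstration with multiprocessing.Pipe, where the function name recv_after_peer_closed is illustrative only:

```python
import multiprocessing


def recv_after_peer_closed():
    """Show that recv() raises EOFError once the peer is closed and the pipe is drained."""
    receiver, sender = multiprocessing.Pipe(duplex=False)
    sender.send("partial result")  # data sent before the peer goes away is still readable
    sender.close()  # simulate the peer (worker or manager) disappearing
    received = [receiver.recv()]
    try:
        receiver.recv()  # nothing left to read and the other end is closed
    except EOFError:
        received.append("EOFError")
    finally:
        receiver.close()
    return received


if __name__ == "__main__":
    print(recv_after_peer_closed())
```

In the traceback above, the same EOFError surfaces through the manager proxy's queue.get(), which waits on such a connection to the manager process.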

@hardianlawi

I encountered the same issue as @Darkyj!

terminate called after throwing an instance of 'std::system_error'
  what():  Invalid argument

@nalepae
Owner

nalepae commented May 23, 2020

I cannot run your code even with standard pandas (without pandarallel) because of this line:

df.parallel_apply(lambda row: print('A'), time.sleep(2), print('B'), axis=1)

Nevertheless, your issue should be fixed in pandarallel v1.4.8.

Please reopen this ticket if needed!
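The reproducer fails with plain pandas because a lambda body ends at the first comma: Python passes three positional arguments (the lambda, the result of time.sleep(2), and the result of print('B')) plus axis=1. So print('B') runs once at the call site, which is why B appears in the output before any worker runs, and DataFrame.apply receives axis both positionally (None) and as a keyword, which raises TypeError. The sketch below shows the parse with stdlib only and a named function that presumably matches the intent; slow_row is a hypothetical name.

```python
import time

# How Python parses the original arguments: the lambda stops at the first comma,
# so this is a 3-tuple and print("B") runs immediately.
# (sleep shortened from 2 s to 0 s for the demo)
parsed_args = (lambda row: print("A"), time.sleep(0), print("B"))


def slow_row(row):
    """Presumed intent: print, sleep, print, once per row."""
    print("A")
    time.sleep(0.01)  # shortened from 2 s for the demo
    print("B")
    return row


if __name__ == "__main__":
    print(len(parsed_args))  # 3
    print(parsed_args[1:])  # (None, None)
    print([slow_row(r) for r in [1, 2]])
```

With the function defined, the call becomes df.parallel_apply(slow_row, axis=1), or df.apply(slow_row, axis=1) without pandarallel.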

@nalepae nalepae closed this as completed May 23, 2020
@hawktang

I have all of the rows, but I get the same issue here:

  File "/home/hawktang/anaconda3/envs/topic_classifier/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/hawktang/anaconda3/envs/topic_classifier/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

@jasonminsookim

I'm still having this issue.

6 participants