BUG: Modin iloc does not work when setting list as values #5358

Open
2 of 3 tasks
Maxl94 opened this issue Dec 6, 2022 · 1 comment
Labels
bug 🦗: Something isn't working
External: Pull requests and issues from people who do not regularly contribute to modin
P2: Minor bugs or low-priority feature requests
pandas concordance 🐼: Functionality that does not match pandas

Comments


Maxl94 commented Dec 6, 2022

Modin version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest released version of Modin.

  • I have confirmed this bug exists on the main branch of Modin. (In order to do this you can follow this guide.)

Reproducible Example

import modin.pandas as md
import numpy as np

### Example 1, set multiple values at once

df = md.DataFrame([1, 2, 3, 4], columns=['a'])
df['b'] = None

a = np.array([[1,2,3], [4,5,6]]).reshape(2,-1,3)
print(a.shape)

df.iloc[[1,2], [-1, -1]] = a.tolist()
df


### Example 2, set only one value
import modin.pandas as md

df = md.DataFrame([1, 2, 3, 4], columns=['a'])
df['b'] = None

a = np.array([7,8,9]).reshape(-1, 1)
print(a.shape)

df.iloc[1, -1] = a.tolist()
df

Issue Description

Assigning a list as a cell value through .iloc raises a ValueError in Modin, while the same code works as expected in pandas. The error occurs both when setting a single cell and when setting multiple cells at once.

Assigning a simple value (e.g. an int) instead of a list works as expected in Modin: the value is written to the DataFrame.

The error for example 1:

ValueError: could not broadcast input array from shape (2, 1, 3) into shape (2, 2)

The error for example 2:

ValueError: could not broadcast input array from shape (3, 1) into shape (1, 1)
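Both messages come from a failed NumPy broadcast, as the tracebacks below show (`np.broadcast_to(item, to_shape)`). The following is a minimal NumPy-only sketch of that step, not Modin code: `try_broadcast` is a hypothetical helper introduced for illustration, and the target shapes are taken from the error messages above.

```python
import numpy as np

# Values from the two examples, as nested lists.
value_1 = np.array([[1, 2, 3], [4, 5, 6]]).reshape(2, -1, 3).tolist()
value_2 = np.array([7, 8, 9]).reshape(-1, 1).tolist()

# Target shapes of the .iloc selections, as reported in the error messages.
cases = [(value_1, (2, 2)), (value_2, (1, 1))]


def try_broadcast(value, to_shape):
    """Return None on success, or the NumPy error message on failure."""
    try:
        np.broadcast_to(value, to_shape)
    except ValueError as exc:
        return str(exc)
    return None


for value, to_shape in cases:
    # NumPy interprets the nested list as an array with its own shape,
    # not as one opaque value per selected cell.
    print(np.shape(value), "->", to_shape, ":", try_broadcast(value, to_shape))

# A plain scalar broadcasts to any target shape, which is consistent with
# the report that assigning an int works.
print(try_broadcast(5, (1, 1)))
```

The list is treated as array data whose shape must match the selection, which is why a list value fails where a scalar succeeds.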

Expected Behavior

The list should be assigned as the value of the DataFrame at the selected row/column.
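To illustrate what "the list as a value" means at the array level, here is a small NumPy-only sketch for the single-cell case of example 2. It is an assumption about the desired semantics, not the pandas or Modin implementation: the list is stored as one Python object in an object-dtype array that already has the target shape (1, 1).

```python
import numpy as np

value = [[7], [8], [9]]

# Letting NumPy infer the structure unpacks the list into a (3, 1) array,
# which cannot fit a single cell.
unpacked = np.array(value, dtype=object)
print(unpacked.shape)

# Allocating the target shape first and assigning by integer position keeps
# the list intact as one opaque element.
cell = np.empty((1, 1), dtype=object)
cell[0, 0] = value
print(cell.shape, cell[0, 0])
```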

Error Logs

The error for example 1:

ValueError                                Traceback (most recent call last)
File ~/Projects/ki-for-mplus-v2/.venv/lib/python3.9/site-packages/modin/pandas/utils.py:330, in broadcast_item(obj, row_lookup, col_lookup, item, need_columns_reindex)
    329     else:
--> 330         return np.broadcast_to(item, to_shape)
    331 except ValueError:

File <__array_function__ internals>:180, in broadcast_to(*args, **kwargs)

File ~/Projects/ki-for-mplus-v2/.venv/lib/python3.9/site-packages/numpy/lib/stride_tricks.py:413, in broadcast_to(array, shape, subok)
    369 """Broadcast an array to a new shape.
    370 
    371 Parameters
   (...)
    411        [1, 2, 3]])
    412 """
--> 413 return _broadcast_to(array, shape, subok=subok, readonly=True)

File ~/Projects/ki-for-mplus-v2/.venv/lib/python3.9/site-packages/numpy/lib/stride_tricks.py:349, in _broadcast_to(array, shape, subok, readonly)
    348 extras = []
--> 349 it = np.nditer(
    350     (array,), flags=['multi_index', 'refs_ok', 'zerosize_ok'] + extras,
    351     op_flags=['readonly'], itershape=shape, order='C')
    352 with it:
    353     # never really has writebackifcopy semantics

ValueError: input operand has more dimensions than allowed by the axis remapping

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[40], line 9
      6 a = np.array([[1,2,3], [4,5,6]]).reshape(2,-1,3)
      7 print(a.shape)
----> 9 df.iloc[[1,2], [-1, -1]] = a.tolist()
     10 df

File ~/Projects/ki-for-mplus-v2/.venv/lib/python3.9/site-packages/modin/logging/logger_decorator.py:128, in enable_logging.<locals>.decorator.<locals>.run_and_log(*args, **kwargs)
    113 """
    114 Compute function with logging if Modin logging is enabled.
    115 
   (...)
    125 Any
    126 """
    127 if LogMode.get() == "disable":
...
    334         f"could not broadcast input array from shape {from_shape} into shape "
    335         + f"{to_shape}"
    336     )

ValueError: could not broadcast input array from shape (2, 1, 3) into shape (2, 2)
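The two-part traceback above ("During handling of the above exception, another exception occurred") is what Python prints when a new exception is raised inside an `except` block. Below is a rough sketch of that pattern based only on the source lines visible in the traceback; `broadcast_or_raise` is a hypothetical stand-in, not the actual `broadcast_item` function from Modin.

```python
import numpy as np


def broadcast_or_raise(item, to_shape):
    """Hypothetical helper mirroring the try/except visible in the traceback."""
    try:
        return np.broadcast_to(item, to_shape)
    except ValueError:
        # Raising here chains the NumPy error as __context__ of the new one.
        from_shape = np.array(item).shape
        raise ValueError(
            f"could not broadcast input array from shape {from_shape} "
            f"into shape {to_shape}"
        )


try:
    broadcast_or_raise([[[1, 2, 3]], [[4, 5, 6]]], (2, 2))
except ValueError as exc:
    message = str(exc)
    original = exc.__context__
    print(message)
    print(type(original).__name__)
```

The final message is the rewritten one; the original NumPy error stays reachable through `__context__`, which is why both appear in the log.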

The error for example 2:

```python-traceback
ValueError                                Traceback (most recent call last)
File ~/Projects/ki-for-mplus-v2/.venv/lib/python3.9/site-packages/modin/pandas/utils.py:330, in broadcast_item(obj, row_lookup, col_lookup, item, need_columns_reindex)
    329     else:
--> 330         return np.broadcast_to(item, to_shape)
    331 except ValueError:

File <__array_function__ internals>:180, in broadcast_to(*args, **kwargs)

File ~/Projects/ki-for-mplus-v2/.venv/lib/python3.9/site-packages/numpy/lib/stride_tricks.py:413, in broadcast_to(array, shape, subok)
    369 """Broadcast an array to a new shape.
    370 
    371 Parameters
   (...)
    411        [1, 2, 3]])
    412 """
--> 413 return _broadcast_to(array, shape, subok=subok, readonly=True)

File ~/Projects/ki-for-mplus-v2/.venv/lib/python3.9/site-packages/numpy/lib/stride_tricks.py:349, in _broadcast_to(array, shape, subok, readonly)
    348 extras = []
--> 349 it = np.nditer(
    350     (array,), flags=['multi_index', 'refs_ok', 'zerosize_ok'] + extras,
    351     op_flags=['readonly'], itershape=shape, order='C')
    352 with it:
    353     # never really has writebackifcopy semantics

ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (3,1) and requested shape (1,1)

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[39], line 9
      6 a = np.array([7,8,9]).reshape(-1, 1)
      7 print(a.shape)
----> 9 df.iloc[1, -1] = a.tolist()
     10 df

File ~/Projects/ki-for-mplus-v2/.venv/lib/python3.9/site-packages/modin/logging/logger_decorator.py:128, in enable_logging.<locals>.decorator.<locals>.run_and_log(*args, **kwargs)
    113 """
    114 Compute function with logging if Modin logging is enabled.
    115 
   (...)
    125 Any
    126 """
    127 if LogMode.get() == "disable":
...
    334         f"could not broadcast input array from shape {from_shape} into shape "
    335         + f"{to_shape}"
    336     )

ValueError: could not broadcast input array from shape (3, 1) into shape (1, 1)
```

</details>

### Installed Versions

<details>

INSTALLED VERSIONS

commit : 7f801ad
python : 3.9.12.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-48-generic
Version : #54-Ubuntu SMP Fri Aug 26 13:26:29 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

Modin dependencies

modin : 0.17.1
ray : 2.1.0
dask : None
distributed : None
hdk : None

pandas dependencies

pandas : 1.5.2
numpy : 1.23.5
pytz : 2022.6
dateutil : 2.8.2
setuptools : 65.6.3
pip : 22.0.2
Cython : None
pytest : 7.2.0
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.7.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : None
fsspec : 2022.11.0
gcsfs : 2022.11.0
matplotlib : 3.6.2
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 9.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.9.3
snappy : None
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None


</details>
Maxl94 added the bug 🦗 and Triage 🩹 labels on Dec 6, 2022
Collaborator

pyrito commented Dec 6, 2022

@Maxl94 thank you for reporting this bug! I was able to reproduce this on the latest master. I'll take a closer look to see what's causing the issue and open a PR with a fix!

pyrito added the pandas concordance 🐼 and P2 labels and removed the Triage 🩹 label on Dec 6, 2022
anmyachev added the External label on Apr 19, 2023
Projects
None yet
Development

No branches or pull requests

3 participants