I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest released version of Modin.
I have confirmed this bug exists on the main branch of Modin. (In order to do this you can follow this guide.)
Reproducible Example
import modin.pandas as pd

data1 = pd.read_excel('abc.xlsx', header=[0, 1])  # multiple headers

def anyFuncB(x):
    # do something
    return x

def anyFuncA(x):
    # the error is raised here; apply() results in a pd.Series
    return x.loc[data1[('col0', 'col1')].apply(anyFuncB)]

data = pd.read_excel('def.xlsx')
data.groupby(by='col0').apply(anyFuncA)
Issue Description
Calling dataframe0.apply(anyFunc0) on its own works fine.
The error appears when the function is applied through dataframe0.groupby().apply(anyFunc0) and, inside it, another DataFrame (dataframe1) with multi-level column headers runs dataframe1[('col0', 'col1')].apply(anyFunc1). This line:
  File "/usr/local/python3.10/lib/python3.10/site-packages/modin/pandas/series.py", line 713, in apply
    if result.name == self.index[0]:
raises ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(). Here result.name is a tuple with 2 items and self.index[0] is a numpy.int64, so the comparison returns an array of two boolean values instead of a single bool. My temporary fix is to add the following code:
elif return_type == "Series":
    try:
        if result.name == self.index[0]:
            result.name = None
    except ValueError:
        # the comparison returned an array (tuple name vs. scalar label)
        if (result.name == self.index[0]).all():
            result.name = None
Another solution could be to first determine whether result.name and self.index[0] are each a single value or not.
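A minimal sketch of that second idea, assuming it is enough to tell tuple labels apart from scalar labels before comparing. The helper name `names_match` is hypothetical and is not part of Modin; it only illustrates the type check.

```python
def names_match(name, label):
    """Return True only if `name` and `label` are the same kind of label and equal.

    Hypothetical helper, not Modin code: it avoids running `==` between a
    tuple (multi-level name) and a scalar label, which NumPy scalars may
    turn into an element-wise comparison.
    """
    name_is_tuple = isinstance(name, tuple)
    label_is_tuple = isinstance(label, tuple)
    if name_is_tuple != label_is_tuple:
        # A tuple name can never equal a scalar label, so skip `==` entirely.
        return False
    # Both tuples or both scalars: `==` yields a single truth value here.
    return bool(name == label)


print(names_match(("col0", "col1"), 0))                 # tuple name vs. scalar label
print(names_match(("col0", "col1"), ("col0", "col1")))  # two equal tuples
print(names_match(0, 0))                                # two equal scalars
```

With a check like this, the condition in Series.apply would become something like `if names_match(result.name, self.index[0]):`, though whether that fits Modin's intended semantics is for the maintainers to judge.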
Expected Behavior
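The comparison should evaluate to a single boolean even when result.name is a tuple and self.index[0] is a scalar, so that Series.apply does not raise.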
Error Logs
Traceback (most recent call last):
  File "/home/ecommerce_production_classification/database.py", line 46, in <module>
    print(data.loc[:5].groupby(by='company_id').apply(lambda x: detect_data(x)))
  File "/usr/local/python3.10/lib/python3.10/site-packages/modin/logging/logger_decorator.py", line 144, in run_and_log
    return obj(*args, **kwargs)
  File "/usr/local/python3.10/lib/python3.10/site-packages/modin/pandas/groupby.py", line 653, in apply
    if not isinstance(apply_res, Series) and apply_res.columns.equals(
  File "/usr/local/python3.10/lib/python3.10/site-packages/modin/pandas/base.py", line 4294, in __getattribute__
    attr = super().__getattribute__(item)
  File "/usr/local/python3.10/lib/python3.10/site-packages/modin/pandas/dataframe.py", line 315, in _get_columns
    return self._query_compiler.columns
  File "/usr/local/python3.10/lib/python3.10/site-packages/modin/core/storage_formats/pandas/query_compiler.py", line 104, in <lambda>
    return lambda self: self._modin_frame.columns
  File "/usr/local/python3.10/lib/python3.10/site-packages/modin/core/dataframe/pandas/dataframe/dataframe.py", line 727, in _get_columns
    columns, column_widths = self._columns_cache.get(return_lengths=True)
  File "/usr/local/python3.10/lib/python3.10/site-packages/modin/core/dataframe/pandas/metadata/index.py", line 194, in get
    index, self._lengths_cache = self._value()
  File "/usr/local/python3.10/lib/python3.10/site-packages/modin/core/dataframe/pandas/metadata/index.py", line 106, in <lambda>
    return lambda: dataframe_obj._compute_axis_labels_and_lengths(axis)
  File "/usr/local/python3.10/lib/python3.10/site-packages/modin/logging/logger_decorator.py", line 144, in run_and_log
    return obj(*args, **kwargs)
  File "/usr/local/python3.10/lib/python3.10/site-packages/modin/core/dataframe/pandas/dataframe/dataframe.py", line 835, in _compute_axis_labels_and_lengths
    new_index, internal_idx = self._partition_mgr_cls.get_indices(axis, partitions)
  File "/usr/local/python3.10/lib/python3.10/site-packages/modin/logging/logger_decorator.py", line 144, in run_and_log
    return obj(*args, **kwargs)
  File "/usr/local/python3.10/lib/python3.10/site-packages/modin/core/dataframe/pandas/partitioning/partition_manager.py", line 1193, in get_indices
    new_idx = cls.get_objects_from_partitions(new_idx)
  File "/usr/local/python3.10/lib/python3.10/site-packages/modin/logging/logger_decorator.py", line 144, in run_and_log
    return obj(*args, **kwargs)
  File "/usr/local/python3.10/lib/python3.10/site-packages/modin/core/dataframe/pandas/partitioning/partition_manager.py", line 1134, in get_objects_from_partitions
    return cls._execution_wrapper.materialize(
  File "/usr/local/python3.10/lib/python3.10/site-packages/modin/core/execution/ray/common/engine_wrapper.py", line 139, in materialize
    return ray.get(obj_id)
  File "/usr/local/python3.10/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/python3.10/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/python3.10/lib/python3.10/site-packages/ray/_private/worker.py", line 2630, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
  File "/usr/local/python3.10/lib/python3.10/site-packages/ray/_private/worker.py", line 863, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ValueError): ray::remote_exec_func() (pid=22666, ip=172.29.158.228)
At least one of the input arguments for this task could not be computed:
ray.exceptions.RayTaskError: ray::_deploy_ray_func() (pid=22664, ip=172.29.158.228)
  File "/usr/local/python3.10/lib/python3.10/site-packages/modin/core/execution/ray/implementations/pandas_on_ray/partitioning/virtual_partition.py", line 335, in _deploy_ray_func
    result = deployer(axis, f_to_deploy, f_args, f_kwargs, *deploy_args, **kwargs)
  File "/usr/local/python3.10/lib/python3.10/site-packages/modin/logging/logger_decorator.py", line 144, in run_and_log
    return obj(*args, **kwargs)
  File "/usr/local/python3.10/lib/python3.10/site-packages/modin/core/dataframe/pandas/partitioning/axis_partition.py", line 462, in deploy_axis_func
    raise err
  File "/usr/local/python3.10/lib/python3.10/site-packages/modin/core/dataframe/pandas/partitioning/axis_partition.py", line 457, in deploy_axis_func
    result = func(dataframe, *f_args, **f_kwargs)
  File "/usr/local/python3.10/lib/python3.10/site-packages/modin/core/dataframe/pandas/dataframe/dataframe.py", line 2078, in _tree_reduce_func
    series_result = func(df, *args, **kwargs)
  File "/usr/local/python3.10/lib/python3.10/site-packages/modin/core/dataframe/pandas/dataframe/dataframe.py", line 4261, in apply_func
    result = operator(df.groupby(by, **kwargs))
  File "/usr/local/python3.10/lib/python3.10/site-packages/modin/core/storage_formats/pandas/query_compiler.py", line 3976, in <lambda>
    operator=lambda grp: agg_func(grp, *agg_args, **agg_kwargs),
  File "/usr/local/python3.10/lib/python3.10/site-packages/modin/core/storage_formats/pandas/query_compiler.py", line 3957, in agg_func
    result = agg_method(grp, original_agg_func, *args, **kwargs)
  File "/usr/local/python3.10/lib/python3.10/site-packages/pandas/core/groupby/groupby.py", line 1824, in apply
    result = self._python_apply_general(f, self._selected_obj)
  File "/usr/local/python3.10/lib/python3.10/site-packages/pandas/core/groupby/groupby.py", line 1885, in _python_apply_general
    values, mutated = self._grouper.apply_groupwise(f, data, self.axis)
  File "/usr/local/python3.10/lib/python3.10/site-packages/pandas/core/groupby/ops.py", line 919, in apply_groupwise
    res = f(group)
  File "/usr/local/python3.10/lib/python3.10/site-packages/modin/utils.py", line 765, in wrapper
    result = func(*args, **kwargs)
  File "/home/ecommerce_production_classification/database.py", line 46, in <lambda>
    print(data.loc[:5].groupby(by='company_id').apply(lambda x: detect_data(x)))
  File "/home/ecommerce_production_classification/database.py", line 21, in detect_data
    return classification(data, _rulesDF)
  File "/home/ecommerce_production_classification/categorization.py", line 227, in classification
    data = categorization(data, rules)
  File "/home/ecommerce_production_classification/categorization.py", line 209, in categorization
    return process(data, rules, '分类')
  File "/home/ecommerce_production_classification/categorization.py", line 205, in process
    data[rules['赋值'].columns]=pd.DataFrame(data.apply(getCategories, axis=1).to_dict()).T
  File "/usr/local/python3.10/lib/python3.10/site-packages/pandas/core/frame.py", line 10374, in apply
    return op.apply().__finalize__(self, method="apply")
  File "/usr/local/python3.10/lib/python3.10/site-packages/pandas/core/apply.py", line 916, in apply
    return self.apply_standard()
  File "/usr/local/python3.10/lib/python3.10/site-packages/pandas/core/apply.py", line 1063, in apply_standard
    results, res_index = self.apply_series_generator()
  File "/usr/local/python3.10/lib/python3.10/site-packages/pandas/core/apply.py", line 1081, in apply_series_generator
    results[i] = self.func(v, *self.args, **self.kwargs)
  File "/home/ecommerce_production_classification/categorization.py", line 168, in getCategories
    _res = rules.loc[rules[('运算式','运算式')].apply(operationToBool)]
  File "/usr/local/python3.10/lib/python3.10/site-packages/modin/logging/logger_decorator.py", line 144, in run_and_log
    return obj(*args, **kwargs)
  File "/usr/local/python3.10/lib/python3.10/site-packages/modin/pandas/series.py", line 713, in apply
    if result.name == self.index[0]:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Solution modified to:
if isinstance(_ := (result.name == self.index[0]), np.ndarray):
    if _.all():
        result.name = None
elif _:
    result.name = None
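To try the modified check outside Modin, here is a rough stand-alone sketch. The function name `should_clear_name` and the sample values are assumptions made for illustration; only the ndarray test mirrors the snippet above. Depending on the NumPy version, comparing a tuple with a NumPy scalar may return either an array or a plain boolean, and the check is written to handle both.

```python
import numpy as np


def should_clear_name(name, first_label):
    """Hypothetical stand-alone version of the modified check (not Modin code)."""
    matches = name == first_label
    if isinstance(matches, np.ndarray):
        # Element-wise result: count it as a match only if every element is equal.
        return bool(matches.all())
    # Ordinary scalar comparison result.
    return bool(matches)


name = ("col0", "col1")    # Series name taken from a two-level column header
first_label = np.int64(0)  # first label of a default integer index

print(should_clear_name(name, first_label))  # tuple name vs. NumPy scalar label
print(should_clear_name("col0", "col0"))     # plain scalar comparison
```

Keeping the comparison result in a named variable instead of `_` is only a readability choice; the logic is the same.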
Installed Versions
UserWarning: Setuptools is replacing distutils.
INSTALLED VERSIONS
commit : c8bbca8
python : 3.10.12.final.0
python-bits : 64
OS : Linux
OS-release : 3.10.0-1160.108.1.el7.x86_64
Version : #1 SMP Thu Jan 25 16:17:31 UTC 2024
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
Modin dependencies
modin : 0.31.0
ray : 2.30.0
dask : None
distributed : None
pandas dependencies
pandas : 2.2.2
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 65.5.0
pip : 24.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : 1.4.6
psycopg2 : None
jinja2 : 3.1.4
IPython : 8.26.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.6.0
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.4
pandas_gbq : None
pyarrow : 16.1.0
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : 2.0.31
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None