I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.
Reproducible example
import numpy as np
import polars as pl
pl.DataFrame(np.array([[[5], 7]], np.object_), schema={"a": pl.List(pl.Int64), "b": pl.Int64}, orient="row")
# or, more simply:
pl.Series("k", np.array([5], np.object_), pl.Int64)
Log output
---------------------------------------------------------------------------
ComputeError Traceback (most recent call last)
Cell In[35], line 1
----> 1 pl.DataFrame(np.array([[5.6]], np.object_), schema={"a": pl.Float64}, orient="row")
File ~/.local/share/virtualenvs/sc-api-DipHMMiG/lib/python3.12/site-packages/polars/dataframe/frame.py:384, in DataFrame.__init__(self, data, schema, schema_overrides, strict, orient, infer_schema_length, nan_to_null)
379 self._df = series_to_pydf(
380 data, schema=schema, schema_overrides=schema_overrides, strict=strict
381 )
383 elif _check_for_numpy(data) and isinstance(data, np.ndarray):
--> 384 self._df = numpy_to_pydf(
385 data,
386 schema=schema,
387 schema_overrides=schema_overrides,
388 strict=strict,
389 orient=orient,
390 nan_to_null=nan_to_null,
391 )
393 elif _check_for_pyarrow(data) and isinstance(data, pa.Table):
394 self._df = arrow_to_pydf(
395 data, schema=schema, schema_overrides=schema_overrides, strict=strict
396 )
File ~/.local/share/virtualenvs/sc-api-DipHMMiG/lib/python3.12/site-packages/polars/_utils/construction/dataframe.py:1331, in numpy_to_pydf(data, schema, schema_overrides, orient, strict, nan_to_null)
1328 else:
1329 if orient == "row":
1330 data_series = [
-> 1331 pl.Series(
1332 name=column_names[i],
1333 values=(
1334 data
1335 if two_d and n_columns == 1 and shape[1] > 1
1336 else data[:, i]
1337 ),
1338 dtype=schema_overrides.get(column_names[i]),
1339 strict=strict,
1340 nan_to_null=nan_to_null,
1341 )._s
1342 for i in range(n_columns)
1343 ]
1344 else:
1345 data_series = [
1346 pl.Series(
1347 name=column_names[i],
(...)
1355 for i in range(n_columns)
1356 ]
File ~/.local/share/virtualenvs/sc-api-DipHMMiG/lib/python3.12/site-packages/polars/series/series.py:319, in Series.__init__(self, name, values, dtype, strict, nan_to_null)
316 return
318 if dtype is not None:
--> 319 self._s = self.cast(dtype, strict=strict)._s
321 elif _check_for_pyarrow(values) and isinstance(
322 values, (pa.Array, pa.ChunkedArray)
323 ):
324 self._s = arrow_to_pyseries(name, values, dtype=dtype, strict=strict)
File ~/.local/share/virtualenvs/sc-api-DipHMMiG/lib/python3.12/site-packages/polars/series/series.py:3992, in Series.cast(self, dtype, strict, wrap_numerical)
3990 # Do not dispatch cast as it is expensive and used in other functions.
3991 dtype = parse_into_dtype(dtype)
-> 3992 return self._from_pyseries(self._s.cast(dtype, strict, wrap_numerical))
ComputeError: cannot cast 'Object' type
Issue description
A DataFrame with a non-`pl.Object` schema cannot be created from NumPy arrays with dtype `object`.
Working in NumPy with `np.object_` is indispensable when other columns are strings or nested arrays, or to set "nulls" with NaN for integer columns.
However, since Polars dtypes are per column, I guess it should support converting each object column to the concrete dtype given in the schema.
Related to #17484
This isn't a bug, as the error message says ComputeError: cannot cast 'Object' type. You've got to do the conversion on the numpy/python side from object to a strict dtype that polars can work with.
@deanm000 thanks for the workaround.
Unfortunately, as I mentioned, how can I then import null values for string columns? With Polars 0.x it was possible through `None`s, but with Polars 1.x I still haven't found a way.
Expected behavior
It should work the same as
Installed versions
Same with numpy 2.0.1