Can only use .str accessor with string values, which use np.object_ dtype in pandas #439

Clem-D · 2019-01-29T08:27:23Z

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): CentOS Linux release 7.5.1804 (Core)
Modin installed from (source or binary): pip install modin
Modin version: 0.3.0
Python version: 3.5.2
Exact command to reproduce: df['foo'] = df['foo'].str.replace('.', ',')

Describe the problem

This issue follows #414.
And yes, df['foo'] = df['foo'].str.replace('.', ',') worked with pandas.
Actually, all my code used to work with pandas ^^

Source code / logs

Can only use .str accessor with string values, which use np.object_ dtype in pandas


devin-petersohn · 2019-01-29T19:46:45Z

Thanks @Clem-D, was this working on 0.2.5? We recently added a SeriesView class that may be interfering with the normal behavior.

Would you be able to tell me if this works instead:

df['foo'] = df['foo'].series.str.replace('.', ',')

This will call the literal pandas series code. If it is working, it is a fairly easy fix.

Clem-D · 2019-01-30T08:52:46Z

Hello @devin-petersohn, I don't know if it works in 0.2.5 because, as I said in #414, I got another error first (with the "encoding" keyword).
Anyway, using .series doesn't work either.

devin-petersohn · 2019-01-30T20:10:19Z

I see, the encoding does not allow testing on 0.2.5.

This is an interesting issue, because it is using pandas for that series code.

What does print(df['foo'].dtype) print? It is giving an error related to the dtype.

Also, does this fix the issue:

df['foo'] = df['foo'].apply(str).str.replace('.', ',')

It will force the column to string dtype because it is not recognizing it as a string column.
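
The dtype check and the apply(str) workaround can be sketched in plain pandas as below. This is only an illustration under assumptions: the Series named foo is a hypothetical stand-in holding floats, since the thread does not show what the real column contained, and infer_dtype is used as an extra diagnostic that was not part of the original suggestion.

```python
import pandas as pd
from pandas.api.types import infer_dtype

# Hypothetical stand-in for the 'foo' column: numbers, not strings.
foo = pd.Series([1.5, 2.25, 10.0])

print(foo.dtype)         # the dtype pandas stores for the column
print(infer_dtype(foo))  # the kind of values the column actually holds

# The .str accessor refuses to attach to a non-string column.
try:
    foo.str
    accessor_failed = False
except AttributeError as err:
    accessor_failed = True
    print(err)

# Converting every value with str() yields a column of strings,
# so the .str accessor becomes available.
as_text = foo.apply(str)
replaced = as_text.str.replace(".", ",", regex=False)
print(replaced.tolist())
```

If the accessor error disappears after the conversion, the column most likely contained non-string values to begin with.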

Clem-D · 2019-01-31T12:40:33Z

print(df['foo'].dtype) gives "object"
df['foo'] = df['foo'].apply(str).str.replace('.', ',') actually works :)

devin-petersohn · 2019-01-31T21:44:05Z

This is interesting. I will investigate to see if I can reproduce this. It may be that internally Modin is losing track of the dtype after encoding is set to latin1.

devin-petersohn · 2019-02-08T18:31:05Z

Hi @Clem-D, I have been trying to reproduce this error, but I haven't been successful. Here is some of the code I wrote to try to reproduce the error.

import pandas
import numpy as np

frame_data = np.random.randint(0, 100, size=(1000, 100))
df = pandas.DataFrame(frame_data).add_prefix("col")

# mix the dtypes to see if that is the issue
for i in range(len(df.columns)):
    df.iloc[:, i] = ["hi " + str(o) if o > 50 else o for o in df.iloc[:, i]]

df.to_excel("temp.xlsx", encoding="latin1")

import modin.pandas as pd
df = pd.read_excel("temp.xlsx", encoding="latin1")
df["col1"] = df["col1"].str.replace("hi", "hello")

Could you provide a sample of the column data that you're trying to replace? That would help a lot.

Clem-D · 2019-02-20T11:23:36Z

Sure! Here is an example file:
modinExample.xlsx

And here is the code I use:

python --version gives me 3.5.1
Then I typed python to open the interactive interpreter.

>>> import ray
/home/user/.pyenv/versions/3.5.1/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
WARNING: Not monitoring node memory since `psutil` is not installed. Install this with `pip install psutil` (or ray[debug]) to enable debugging of memory-related crashes.

>>> ray.init()
WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.
Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-02-20_12-04-38_32023/logs.
Waiting for redis server at 127.0.0.1:10671 to respond...
Waiting for redis server at 127.0.0.1:61758 to respond...
Starting Redis shard with 10.0 GB max memory.
Starting the Plasma object store with 6.6930343930000005 GB memory using /dev/shm.
Failed to start the UI, you may need to run 'pip install jupyter'.
{'object_store_address': '/tmp/ray/session_2019-02-20_12-04-38_32023/sockets/plasma_store', 'raylet_socket_name': '/tmp/ray/session_2019-02-20_12-04-38_32023/sockets/raylet', 'redis_address': '10.69.10.51:10671', 'node_ip_address': None, 'webui_url': None}

>>> ray.global_state.cluster_resources()["CPU"]
4.0 

>>> import modin.pandas as pd
>>> df = pd.read_excel("/home/user/Desktop/modinExample.xlsx", encoding="latin1")
/home/cdelestre/.pyenv/versions/noe/lib/python3.5/site-packages/modin/error_message.py:32: UserWarning: `read_excel` defaulting to pandas implementation.
To request implementation, send an email to feature_requests@modin.org.
  warnings.warn(message)

>>> df['Parent(s) (by id coma separated ie. «1,2»)'] = df['Parent(s) (by id coma separated ie. «1,2»)'].str.replace('.', ',')
File "<stdin>", line 1, in <module>
  File "/home/user/.pyenv/versions/noe/lib/python3.5/site-packages/modin/pandas/series.py", line 181, in __getattribute__
    method = self.series.__getattribute__(item)
  File "/home/user/.pyenv/versions/noe/lib/python3.5/site-packages/pandas/core/accessor.py", line 133, in __get__
    accessor_obj = self._accessor(obj)
  File "/home/user/.pyenv/versions/noe/lib/python3.5/site-packages/pandas/core/strings.py", line 1895, in __init__
    self._validate(data)
  File "/home/user/.pyenv/versions/noe/lib/python3.5/site-packages/pandas/core/strings.py", line 1917, in _validate
    raise AttributeError("Can only use .str accessor with string "
AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas

>>> df['foo'] = df['foo'].str.replace('.', ',')
File "<stdin>", line 1, in <module>
  File "/home/user/.pyenv/versions/noe/lib/python3.5/site-packages/modin/pandas/series.py", line 181, in __getattribute__
    method = self.series.__getattribute__(item)
  File "/home/user/.pyenv/versions/noe/lib/python3.5/site-packages/pandas/core/accessor.py", line 133, in __get__
    accessor_obj = self._accessor(obj)
  File "/home/user/.pyenv/versions/noe/lib/python3.5/site-packages/pandas/core/strings.py", line 1895, in __init__
    self._validate(data)
  File "/home/user/.pyenv/versions/noe/lib/python3.5/site-packages/pandas/core/strings.py", line 1917, in _validate
    raise AttributeError("Can only use .str accessor with string "
AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas

>>> df['bar'] = df['bar'].str.replace('.', ',')
File "<stdin>", line 1, in <module>
  File "/home/user/.pyenv/versions/noe/lib/python3.5/site-packages/modin/pandas/series.py", line 181, in __getattribute__
    method = self.series.__getattribute__(item)
  File "/home/user/.pyenv/versions/noe/lib/python3.5/site-packages/pandas/core/accessor.py", line 133, in __get__
    accessor_obj = self._accessor(obj)
  File "/home/user/.pyenv/versions/noe/lib/python3.5/site-packages/pandas/core/strings.py", line 1895, in __init__
    self._validate(data)
  File "/home/user/.pyenv/versions/noe/lib/python3.5/site-packages/pandas/core/strings.py", line 1917, in _validate
    raise AttributeError("Can only use .str accessor with string "
AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas

pip freeze gives me:

dask==1.1.0
(...)
modin==0.3.0
(...)
numpy==1.15.0
pandas==0.23.4
pathlib2==2.3.3
(...)
xlrd==1.0.0
XlsxWriter==1.0.2

(I filtered only interesting packages but I can give you the full output if you want).

devin-petersohn · 2019-03-21T18:56:19Z

Hi @Clem-D, sorry for the late reply. With the example you provided, I get the same error in pandas and Modin. Is there a different example that is also working in pandas?

Clem-D · 2019-04-15T08:38:12Z

Hello,
You're right, it's because I forgot to add the line
df[["Parent(s) (by id coma separated ie. «1,2»)"]] = df[["Parent(s) (by id coma separated ie. «1,2»)"]].astype(str)
before
df['Parent(s) (by id coma separated ie. «1,2»)'] = df['Parent(s) (by id coma separated ie. «1,2»)'].str.replace('.', ',')
which works with regular pandas.
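
A minimal sketch of that two-step sequence in plain pandas, for anyone landing here with the same error. The column name "parents" and its values are hypothetical stand-ins, not taken from the attached file. Passing regex=False is an addition to the original line: it makes the "." explicitly a literal dot rather than a regex wildcard, and it assumes a pandas version where str.replace accepts the regex keyword.

```python
import pandas as pd

# Hypothetical stand-in for the real column: numeric ids stored as floats.
df = pd.DataFrame({"parents": [1.2, 3.0]})

# Step 1: cast the column to strings so the .str accessor can be used.
df["parents"] = df["parents"].astype(str)

# Step 2: replace the literal dot with a comma.
df["parents"] = df["parents"].str.replace(".", ",", regex=False)

print(df["parents"].tolist())
```

Skipping step 1 on a numeric column is what triggers the AttributeError from the .str accessor.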

devin-petersohn · 2020-06-01T14:45:46Z

Closing this. Feel free to reopen if the discussion should continue or if the issue was not resolved.

devin-petersohn closed this as completed Jun 1, 2020
