Can only use .str accessor with string values, which use np.object_ dtype in pandas #439
Comments
Thanks @Clem-D, was this working on 0.2.5? We recently added a
Would you be able to tell me if this works instead:
df['foo'] = df['foo'].series.str.replace('.', ',')
This calls the underlying pandas Series code directly. If it works, the fix is fairly easy.
Hello @devin-petersohn, I don't know if it works in 0.2.5 because as I said in #414 I got another error before (with "encoding" keyword). |
I see, the encoding error prevents testing on 0.2.5. This is an interesting issue, because it is using pandas for that What does
Also, does this fix the issue:
df['foo'] = df['foo'].apply(str).str.replace('.', ',')
It forces the column to string dtype, because the column is not being recognized as a string column.
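A minimal sketch of the workaround suggested above, written against plain pandas rather than Modin. The column values are hypothetical stand-ins (the reporter's data is not shown here), chosen so that pandas infers a numeric dtype. The `regex=False` argument is an addition not present in the thread: it makes `'.'` a literal dot regardless of the pandas version's default.

```python
import pandas as pd

# Hypothetical data: all-numeric values, so the column gets a float dtype
# instead of the object dtype that the .str accessor expects.
df = pd.DataFrame({"foo": [1.5, 2.25, 3.0]})

# On a numeric column, accessing .str raises AttributeError.
try:
    df["foo"].str.replace(".", ",", regex=False)
    raised = False
except AttributeError:
    raised = True

# Converting every value to str first makes the .str accessor available.
fixed = df["foo"].apply(str).str.replace(".", ",", regex=False)
result = fixed.tolist()
```

If the column already holds strings, the `apply(str)` step leaves the values unchanged, so it is safe to apply when the dtype is uncertain, at the cost of an extra pass over the data.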
This is interesting. I will investigate to see if I can reproduce this. It may be that internally Modin is losing track of the |
Hi @Clem-D, I have been trying to reproduce this error, but I haven't been successful. Here is some of the code I wrote to try to reproduce the error.
import pandas
import numpy as np
frame_data = np.random.randint(0, 100, size=(1000, 100))
df = pandas.DataFrame(frame_data).add_prefix("col")
# mix the dtypes to see if that is the issue
for i in range(len(df.columns)):
    df.iloc[:, i] = ["hi " + str(o) if o > 50 else o for o in df.iloc[:, i]]
df.to_excel("temp.xlsx", encoding="latin1")
import modin.pandas as pd
df = pd.read_excel("temp.xlsx", encoding="latin1")
df["col1"] = df["col1"].str.replace("hi", "hello")
Could you provide a sample of the column data that you're trying to
Sure! Here is an example file, and this is the code I use:
>>> import ray
/home/user/.pyenv/versions/3.5.1/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
WARNING: Not monitoring node memory since `psutil` is not installed. Install this with `pip install psutil` (or ray[debug]) to enable debugging of memory-related crashes.
>>> ray.init()
WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.
Process STDOUT and STDERR is being redirected to /tmp/ray/session_2019-02-20_12-04-38_32023/logs.
Waiting for redis server at 127.0.0.1:10671 to respond...
Waiting for redis server at 127.0.0.1:61758 to respond...
Starting Redis shard with 10.0 GB max memory.
Starting the Plasma object store with 6.6930343930000005 GB memory using /dev/shm.
Failed to start the UI, you may need to run 'pip install jupyter'.
{'object_store_address': '/tmp/ray/session_2019-02-20_12-04-38_32023/sockets/plasma_store', 'raylet_socket_name': '/tmp/ray/session_2019-02-20_12-04-38_32023/sockets/raylet', 'redis_address': '10.69.10.51:10671', 'node_ip_address': None, 'webui_url': None}
>>> ray.global_state.cluster_resources()["CPU"]
4.0
>>> import modin.pandas as pd
>>> df = pd.read_excel("/home/user/Desktop/modinExample.xlsx", encoding="latin1")
/home/cdelestre/.pyenv/versions/noe/lib/python3.5/site-packages/modin/error_message.py:32: UserWarning: `read_excel` defaulting to pandas implementation.
To request implementation, send an email to feature_requests@modin.org.
warnings.warn(message)
>>> df['Parent(s) (by id coma separated ie. «1,2»)'] = df['Parent(s) (by id coma separated ie. «1,2»)'].str.replace('.', ',')
File "<stdin>", line 1, in <module>
File "/home/user/.pyenv/versions/noe/lib/python3.5/site-packages/modin/pandas/series.py", line 181, in __getattribute__
method = self.series.__getattribute__(item)
File "/home/user/.pyenv/versions/noe/lib/python3.5/site-packages/pandas/core/accessor.py", line 133, in __get__
accessor_obj = self._accessor(obj)
File "/home/user/.pyenv/versions/noe/lib/python3.5/site-packages/pandas/core/strings.py", line 1895, in __init__
self._validate(data)
File "/home/user/.pyenv/versions/noe/lib/python3.5/site-packages/pandas/core/strings.py", line 1917, in _validate
raise AttributeError("Can only use .str accessor with string "
AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
>>> df['foo'] = df['foo'].str.replace('.', ',')
File "<stdin>", line 1, in <module>
File "/home/user/.pyenv/versions/noe/lib/python3.5/site-packages/modin/pandas/series.py", line 181, in __getattribute__
method = self.series.__getattribute__(item)
File "/home/user/.pyenv/versions/noe/lib/python3.5/site-packages/pandas/core/accessor.py", line 133, in __get__
accessor_obj = self._accessor(obj)
File "/home/user/.pyenv/versions/noe/lib/python3.5/site-packages/pandas/core/strings.py", line 1895, in __init__
self._validate(data)
File "/home/user/.pyenv/versions/noe/lib/python3.5/site-packages/pandas/core/strings.py", line 1917, in _validate
raise AttributeError("Can only use .str accessor with string "
AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
>>> df['bar'] = df['bar'].str.replace('.', ',')
File "<stdin>", line 1, in <module>
File "/home/user/.pyenv/versions/noe/lib/python3.5/site-packages/modin/pandas/series.py", line 181, in __getattribute__
method = self.series.__getattribute__(item)
File "/home/user/.pyenv/versions/noe/lib/python3.5/site-packages/pandas/core/accessor.py", line 133, in __get__
accessor_obj = self._accessor(obj)
File "/home/user/.pyenv/versions/noe/lib/python3.5/site-packages/pandas/core/strings.py", line 1895, in __init__
self._validate(data)
File "/home/user/.pyenv/versions/noe/lib/python3.5/site-packages/pandas/core/strings.py", line 1917, in _validate
raise AttributeError("Can only use .str accessor with string "
AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
dask==1.1.0 (I kept only the relevant packages, but I can give you the full output if you want).
Hi @Clem-D, sorry for the late reply. With the example you provided, I get the same error in pandas and Modin. Is there a different example that is also working in pandas? |
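Since the error also appears in plain pandas, one way to narrow it down is to check which columns were read with a numeric dtype, because those are the ones where `.str` is rejected. The following is only a sketch under assumptions: the frame contents and the helper name `numeric_columns` are hypothetical, and only the column names `foo` and `bar` come from the thread.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def numeric_columns(frame):
    """Return the names of columns with a numeric dtype (hypothetical helper)."""
    return [name for name in frame.columns if is_numeric_dtype(frame[name])]


# Hypothetical frame: "foo" is read as numbers, "bar" as strings.
df = pd.DataFrame({"foo": [1.5, 2.5], "bar": ["a.b", "c.d"]})

# Columns listed here would raise the AttributeError on .str access.
numeric_cols = numeric_columns(df)
```

Printing `df.dtypes` gives the same information at a glance; the helper is only convenient when the check has to run inside a script.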
Hello, |
Closing this. Feel free to reopen if the discussion should continue or if issue was not resolved. |
System information
df['foo'] = df['foo'].str.replace('.', ',')
Describe the problem
This issue follows #414.
And yes,
df['foo'] = df['foo'].str.replace('.', ',')
worked with pandas. Actually, all my code used to work with pandas ^^
Source code / logs
Can only use .str accessor with string values, which use np.object_ dtype in pandas