pd.DataFrame doesn't always return the same dtype as in pandas #4060

Closed
c3-cjazra opened this issue Jan 26, 2022 · 3 comments
Labels
bug 🦗 Something isn't working External Pull requests and issues from people who do not regularly contribute to modin P2 Minor bugs or low-priority feature requests pandas concordance 🐼 Functionality that does not match pandas

Comments

@c3-cjazra

c3-cjazra commented Jan 26, 2022

System information

modin 0.12.0
Ray 1.7.1
python 3.9.7

Describe the problem

import modin.pandas as modin_pd
import pandas as pandas_pd


df_modin = modin_pd.DataFrame({"b": modin_pd.Series([], dtype='bool'),
                "a": modin_pd.Series([], dtype='int64')
            }, index=[])

df_pandas = pandas_pd.DataFrame({"b": pandas_pd.Series([], dtype='bool'),
                "a": pandas_pd.Series([], dtype='int64')
            }, index=[])

print(df_pandas.dtypes)
print(df_modin.dtypes)

---- df_pandas.dtypes
b     bool
a    int64
dtype: object

----df_modin.dtypes
b    object   <-- not like pandas
a    object   <-- not like pandas
dtype: object
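A small helper makes the mismatch explicit column by column. This is an illustrative sketch, not part of Modin or pandas: the function name dtype_mismatches is hypothetical, and because it uses only plain pandas, an object-typed frame stands in for the Modin result reported above rather than reproducing the bug.

```python
import pandas


def dtype_mismatches(expected: pandas.DataFrame, actual: pandas.DataFrame) -> dict:
    """Hypothetical helper: return {column: (expected_dtype, actual_dtype)}
    for every column of `expected` whose dtype differs in `actual`."""
    mismatches = {}
    for col, exp_dtype in expected.dtypes.items():
        if col not in actual.columns:
            mismatches[col] = (str(exp_dtype), "missing")
        elif actual.dtypes[col] != exp_dtype:
            mismatches[col] = (str(exp_dtype), str(actual.dtypes[col]))
    return mismatches


# The pandas frame from the report: empty bool and int64 columns.
expected = pandas.DataFrame(
    {"b": pandas.Series([], dtype="bool"), "a": pandas.Series([], dtype="int64")},
    index=[],
)

# Stand-in for the Modin result above, where both columns came back as object.
observed = expected.astype("object")

print(dtype_mismatches(expected, observed))
# {'b': ('bool', 'object'), 'a': ('int64', 'object')}
```

Returning a dict (empty when everything matches) keeps the check usable both in a quick print and in a test assertion.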

Also, _to_pandas() seems to change the dtypes to object. For example:

json = '{"schema":{"fields":[{"name":"index","type":"string"},{"name":"b","type":"boolean"},{"name":"a","type":"integer"}],"primaryKey":["index"],"pandas_version":"0.20.0"},"data":[]}'
df_modin = modin_pd.read_json(json, orient='table')
df_pandas = pandas_pd.read_json(json, orient='table')

assert df_pandas.dtypes.equals(df_modin.dtypes)  # passes

print(df_pandas.dtypes)
print(df_modin._to_pandas().dtypes)  # _to_pandas changes the dtypes to object

---- df_pandas.dtypes
b     bool
a    int64
dtype: object

---- df_modin._to_pandas().dtypes
b    object
a    object
dtype: object
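For the read_json/_to_pandas case, one possible stopgap is to cast the converted frame back to the dtypes of a reference frame. This is a sketch under stated assumptions: restore_dtypes is a hypothetical helper, the object-typed frame stands in for what df_modin._to_pandas() returned in this report, and the approach has not been verified against Modin 0.12 itself. The JSON string is wrapped in io.StringIO because newer pandas versions deprecate passing literal JSON to read_json.

```python
import io

import pandas

# Table-schema JSON from the report: an empty table with a string index,
# a boolean column "b", and an integer column "a".
TABLE_JSON = '{"schema":{"fields":[{"name":"index","type":"string"},{"name":"b","type":"boolean"},{"name":"a","type":"integer"}],"primaryKey":["index"],"pandas_version":"0.20.0"},"data":[]}'


def restore_dtypes(converted: pandas.DataFrame, reference: pandas.Series) -> pandas.DataFrame:
    """Hypothetical helper: cast columns of `converted` back to the dtypes in
    `reference` (a Series mapping column name -> dtype, e.g. `frame.dtypes`).
    Only columns whose dtype actually differs are cast."""
    wanted = {
        col: dtype
        for col, dtype in reference.items()
        if col in converted.columns and converted[col].dtype != dtype
    }
    return converted.astype(wanted) if wanted else converted


# Reference result from plain pandas (bool and int64 columns).
reference = pandas.read_json(io.StringIO(TABLE_JSON), orient="table")

# Stand-in for df_modin._to_pandas() in the report: every column is object.
converted = reference.astype("object")

fixed = restore_dtypes(converted, reference.dtypes)
print(fixed.dtypes)
```

With Modin, the reference would be df_modin.dtypes (Modin returns it as a pandas Series) and the frame to fix would be df_modin._to_pandas(); whether this fully restores the dtypes for empty frames in Modin 0.12 is an assumption.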
@mvashishtha mvashishtha added bug 🦗 Something isn't working pandas concordance 🐼 Functionality that does not match pandas labels Jan 27, 2022
@mvashishtha
Collaborator

mvashishtha commented Jan 27, 2022

@c3-cjazra Thank you for reporting this issue! I've massaged your code snippets a little to reproduce each issue. The first is:

import modin.pandas as pd
import pandas

df_modin = pd.DataFrame({"b": pd.Series([], dtype='bool'),
                "a": pd.Series([], dtype='int64')
            }, index=[])

df_pandas = pandas.DataFrame({"b": pandas.Series([], dtype='bool'),
                "a": pandas.Series([], dtype='int64')
            }, index=[])

print(df_pandas.dtypes)
print(df_modin.dtypes)

And the second snippet:

import modin.pandas as pd
import pandas

json ='{"schema":{"fields":[{"name":"index","type":"string"},{"name":"b","type":"boolean"},{"name":"a","type":"integer"}],"primaryKey":["index"],"pandas_version":"0.20.0"},"data":[]}'
df_modin = pd.read_json(json, orient='table')
df_pandas = pandas.read_json(json, orient='table')

assert df_pandas.dtypes.equals(df_modin.dtypes) 
 
print(df_pandas.dtypes)
print(df_modin._to_pandas().dtypes) 

@mvashishtha mvashishtha self-assigned this Jan 28, 2022
mvashishtha pushed a commit to mvashishtha/modin that referenced this issue Jan 28, 2022
Signed-off-by: mvashishtha <mahesh@ponder.io>
@mvashishtha mvashishtha removed their assignment Feb 11, 2022
@mvashishtha
Collaborator

mvashishtha commented Feb 11, 2022

Per the discussion in my attempted fix, #4108, we don't yet know how to fix this.

@vnlitvinov vnlitvinov added the P2 Minor bugs or low-priority feature requests label Aug 29, 2022
@anmyachev anmyachev added the External Pull requests and issues from people who do not regularly contribute to modin label Apr 19, 2023
@anmyachev
Collaborator

The problem is fixed on the current master (947e06b).


4 participants