get_dummies(dummy_na=True) treats int like float when creating dummy names #20693

@superxiao

Description

pd.get_dummies(pd.DataFrame({"id":[1,2,3]}), columns=["id"], dummy_na=True)
produces:

   id_1.0  id_2.0  id_3.0  id_nan
0       1       0       0       0
1       0       1       0       0
2       0       0       1       0

These column names differ from the ones produced without dummy_na=True, i.e. by pd.get_dummies(pd.DataFrame({"id":[1,2,3]}), columns=["id"]):

   id_1  id_2  id_3
0     1     0     0
1     0     1     0
2     0     0     1

It looks like this line

levels = np.append(levels, np.nan)

converts the int array into a float array, because np.nan is itself a float and the integer levels are upcast to match it.
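The upcast can be reproduced with NumPy alone. The snippet below is only a sketch of the suspected cause, not the pandas source: the variable names (`levels`, `names_float`, `names_object`) and the "id_" prefix formatting are made up for illustration, and appending to an object array is shown as one possible way to keep the integers intact, not as the fix pandas adopted.

```python
import numpy as np

# Integer levels, standing in for the unique values of the "id" column.
levels = np.array([1, 2, 3])

# Appending np.nan forces NumPy to pick a dtype that can hold NaN: float64.
as_float = np.append(levels, np.nan)
names_float = [f"id_{v}" for v in as_float.tolist()]
print(as_float.dtype)   # float64
print(names_float)      # ['id_1.0', 'id_2.0', 'id_3.0', 'id_nan']

# Illustrative alternative (an assumption, not pandas' actual fix):
# an object array holds ints and NaN side by side without upcasting.
as_object = np.append(levels.astype(object), np.nan)
names_object = [f"id_{v}" for v in as_object.tolist()]
print(as_object.dtype)  # object
print(names_object)     # ['id_1', 'id_2', 'id_3', 'id_nan']
```

The float version reproduces the "id_1.0" style names from the dummy_na=True output, while the object version matches the names produced without dummy_na.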

Metadata

Labels: Bug, Dtype Conversions (unexpected or buggy dtype conversions), Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)
