Initialize empty DataFrame with dataclass #60530

@brandonchinn178

Description

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I wish there were a way to ensure that a DataFrame built from a list of dataclass objects has the same schema that would be inferred from a non-empty list, even when the list happens to be empty.
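A minimal sketch of the problem, assuming a `Point2D` dataclass with an `int` field `x` and a `float` field `y` (the field names and types are illustrative assumptions; the issue only names the class). With at least one object, pandas infers both the columns and their dtypes; with an empty list, it has nothing to infer from.

```python
import dataclasses

import pandas as pd


@dataclasses.dataclass
class Point2D:  # fields are assumed for illustration
    x: int
    y: float


# Non-empty list: columns and dtypes are inferred from the dataclass objects.
non_empty = pd.DataFrame([Point2D(1, 2.0)])

# Empty list: there are no objects to inspect, so no columns are created.
empty = pd.DataFrame([])

print(non_empty.dtypes)
print(list(empty.columns))
```

Code that later selects `df["x"]` or relies on a column's dtype therefore behaves differently depending on whether the input list happened to be empty.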

Feature Description

One possibility is to allow passing the dataclass as pd.DataFrame(dtype=Point2D). Ref #4464.

Note that while this is similar to #4464 (and possibly blocked by #4464), I don't think it's a duplicate, since there's still the issue of building the same dtype schema that would be inferred for a dataclass object, even if we had compound dtypes.

Alternative Solutions

Current workaround:

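import dataclasses
import datetime
import typing

import numpy as np
import pandas as pd

T = typing.TypeVar("T")

def _dataframe_with_schema(data: list[T], dataclass: type[T]) -> pd.DataFrame:
    # Resolve the dataclass annotations (including string annotations) to types.
    types = typing.get_type_hints(dataclass)
    # Python types that need an explicit NumPy dtype in the schema.
    overrides = {
        datetime.datetime: np.dtype("datetime64[ns]"),
    }
    schema = {
        field.name: overrides.get(types[field.name], types[field.name])
        for field in dataclasses.fields(dataclass)
    }

    # Passing the columns explicitly keeps them present when data is empty.
    df = pd.DataFrame(data, columns=list(schema))
    return df.astype(schema)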
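The workaround rests on two steps: `columns=` fixes the column names even when there are no rows, and `astype` fixes the dtypes. The sketch below shows those two steps in isolation. The `Event` dataclass and the hand-written `schema` dict are hypothetical stand-ins for what the helper derives from type hints.

```python
import dataclasses
import datetime

import numpy as np
import pandas as pd


@dataclasses.dataclass
class Event:  # hypothetical dataclass, for illustration only
    name: str
    count: int
    at: datetime.datetime


# Hand-written equivalent of the schema the workaround builds from type hints.
schema = {"name": str, "count": int, "at": np.dtype("datetime64[ns]")}

# Empty input: column names come from `columns=`, dtypes from `astype`.
empty = pd.DataFrame([], columns=list(schema)).astype(schema)

# Non-empty input goes through the same two steps.
rows = [Event("a", 1, datetime.datetime(2024, 1, 1))]
full = pd.DataFrame(rows, columns=list(schema)).astype(schema)

print(empty.dtypes)
print(full.dtypes)
```

Because both frames pass through the same `astype` call, their dtypes should agree regardless of how many rows were supplied. Annotations that pandas cannot use as a dtype (for example `Optional[int]` or nested dataclasses) would presumably need additional entries in `overrides`.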

Additional Context

No response

Metadata

Assignees

No one assigned

Labels

Enhancement, Needs Triage (issue that has not been reviewed by a pandas team member)
