Proseg v3 transcript dataframe issue #1137

@niklasmueboe

Description

Proseg v3 writes its output directly as a spatialdata zarr-store. The Points dataframe (stored as parquet within the zarr) that contains the transcripts has a column called assignment, which stores the cell assignment as an integer. For transcripts assigned to background, however, this value is null. When the zarr-store is read with spatialdata, dask/pandas converts this column to float because of the null values.

In theory, this could easily be fixed by changing the dtype_backend in the read_parquet call used for the points. However, that currently fails the validation logic (apparently only numpy dtypes are allowed?) and may have further implications.

This issue does not occur when the zarr-store is written directly via spatialdata, because pandas stores a bunch of pandas-specific metadata in the parquet file, including the dtype backend for each column. Proseg, however, writes the dataframe directly from Rust with an Arrow writer, so this metadata is not available, and integer columns containing nulls are converted to float on load.
