Proseg v3 transcript dataframe issue

Proseg v3 writes its output directly as a SpatialData Zarr store. The Points dataframe that contains the transcripts (stored as Parquet within the Zarr store) has a column called `assignment` that stores the cell assignment as an integer. For transcripts assigned to background, however, this value is null. When the Zarr store is read with spatialdata, dask/pandas converts this column to float because of the null values.

In theory this could be fixed easily by changing the `dtype_backend` argument of the `read_parquet` call used for the points. However, the result currently fails the validation logic (apparently only numpy dtypes are allowed?), and the change may have further implications.

This issue does not occur when the Zarr store is written via spatialdata itself, because pandas stores a set of pandas-specific metadata in the Parquet file, including the dtype backend for each column. Proseg, however, writes the dataframe directly from Rust with an Arrow writer, so this metadata is not available and integer columns containing nulls are converted to float on load.

Proseg v3 transcript dataframe issue #1137
