spatialdata_io.xenium fails with 'str' object has no attribute 'decode' unless transcripts=False #90

@pakiessling

Hi there,

I am running into an error loading Xenium data generated this week with:

spatialdata-io | 0.0.7
Instrument software version | 1.7.1.0
Analysis version | xenium-1.7.0.2

Error Message:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)

----> 1 slide1_region1a = spatialdata_io.xenium("/work/rwth1209/projects/xenium_qc/output-XETG00229__0008633__Region_2__20231025__142922",
      2                      )

File /work/rwth1209/enviroments/spatial_data/lib/python3.11/site-packages/spatialdata_io/readers/xenium.py:125, in xenium(path, n_jobs, cells_as_shapes, nucleus_boundaries, transcripts, morphology_mip, morphology_focus, imread_kwargs, image_models_kwargs)
    123 points = {}
    124 if transcripts:
--> 125     points["transcripts"] = _get_points(path, specs)
    127 images = {}
    128 if morphology_mip:

File /work/rwth1209/enviroments/spatial_data/lib/python3.11/site-packages/spatialdata_io/readers/xenium.py:174, in _get_points(path, specs)
    171 table["feature_name"] = table["feature_name"].apply(lambda x: x.decode("utf-8"), meta=("feature_name", "object"))
    173 transform = Scale([1.0 / specs["pixel_size"], 1.0 / specs["pixel_size"]], axes=("x", "y"))
--> 174 points = PointsModel.parse(
    175     table,
    176     coordinates={"x": XeniumKeys.TRANSCRIPTS_X, "y": XeniumKeys.TRANSCRIPTS_Y, "z": XeniumKeys.TRANSCRIPTS_Z},
    177     feature_key=XeniumKeys.FEATURE_NAME,
    178     instance_key=XeniumKeys.CELL_ID,
    179     transformations={"global": transform},
    180 )
    181 return points

File /work/rwth1209/enviroments/spatial_data/lib/python3.11/functools.py:946, in singledispatchmethod.__get__.<locals>._method(*args, **kwargs)
    944 def _method(*args, **kwargs):
    945     method = self.dispatcher.dispatch(args[0].__class__)
--> 946     return method.__get__(obj, cls)(*args, **kwargs)

File /work/rwth1209/enviroments/spatial_data/lib/python3.11/site-packages/spatialdata/models/models.py:572, in PointsModel._(cls, data, coordinates, feature_key, instance_key, transformations, **kwargs)
    570 for c in set(data.columns) - {feature_key, instance_key, *coordinates.values()}:
    571     table[c] = data[c]
--> 572 return cls._add_metadata_and_validate(
    573     table, feature_key=feature_key, instance_key=instance_key, transformations=transformations
    574 )

File /work/rwth1209/enviroments/spatial_data/lib/python3.11/site-packages/spatialdata/models/models.py:600, in PointsModel._add_metadata_and_validate(cls, data, feature_key, instance_key, transformations)
    598 if is_categorical_dtype(data[c]) and not data[c].cat.known:
    599     try:
--> 600         data[c] = data[c].cat.set_categories(data[c].head(1).cat.categories)
    601     except ValueError:
    602         logger.info(f"Column `{c}` contains unknown categories. Consider casting it.")

File /work/rwth1209/enviroments/spatial_data/lib/python3.11/site-packages/dask/threaded.py:89, in get(dsk, keys, cache, num_workers, pool, **kwargs)
     86     elif isinstance(pool, multiprocessing.pool.Pool):
     87         pool = MultiprocessingPoolExecutor(pool)
---> 89 results = get_async(
     90     pool.submit,
     91     pool._max_workers,
     92     dsk,
     93     keys,
     94     cache=cache,
     95     get_id=_thread_get_id,
     96     pack_exception=pack_exception,
     97     **kwargs,
     98 )
    100 # Cleanup pools associated to dead threads
    101 with pools_lock:

File /work/rwth1209/enviroments/spatial_data/lib/python3.11/site-packages/dask/local.py:511, in get_async(submit, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, chunksize, **kwargs)
    509         _execute_task(task, data)  # Re-execute locally
    510     else:
--> 511         raise_exception(exc, tb)
    512 res, worker_id = loads(res_info)
    513 state["cache"][key] = res

File /work/rwth1209/enviroments/spatial_data/lib/python3.11/site-packages/dask/local.py:319, in reraise(exc, tb)
    317 if exc.__traceback__ is not tb:
    318     raise exc.with_traceback(tb)
--> 319 raise exc

File /work/rwth1209/enviroments/spatial_data/lib/python3.11/site-packages/dask/local.py:224, in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
    222 try:
    223     task, data = loads(task_info)
--> 224     result = _execute_task(task, data)
    225     id = get_id()
    226     result = dumps((result, id))

File /work/rwth1209/enviroments/spatial_data/lib/python3.11/site-packages/pandas/_libs/lib.pyx:2834, in pandas._libs.lib.map_infer()

File /work/rwth1209/enviroments/spatial_data/lib/python3.11/site-packages/spatialdata_io/readers/xenium.py:171, in _get_points.<locals>.<lambda>(x)
    169 def _get_points(path: Path, specs: dict[str, Any]) -> Table:
    170     table = read_parquet(path / XeniumKeys.TRANSCRIPTS_FILE)
--> 171     table["feature_name"] = table["feature_name"].apply(lambda x: x.decode("utf-8"), meta=("feature_name", "object"))
    173     transform = Scale([1.0 / specs["pixel_size"], 1.0 / specs["pixel_size"]], axes=("x", "y"))
    174     points = PointsModel.parse(
    175         table,
    176         coordinates={"x": XeniumKeys.TRANSCRIPTS_X, "y": XeniumKeys.TRANSCRIPTS_Y, "z": XeniumKeys.TRANSCRIPTS_Z},
   (...)
    179         transformations={"global": transform},
    180     )

AttributeError: 'str' object has no attribute 'decode'

Maybe something in the headers changed?
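
To illustrate what I suspect is going on, here is a minimal sketch in plain Python. It assumes (I have not verified this) that the newer transcripts file stores feature_name as str instead of bytes, so the unconditional x.decode("utf-8") from the traceback fails. The helper decode_if_bytes and the gene names are hypothetical and not part of spatialdata-io.

```python
def decode_if_bytes(value):
    """Return text for either bytes or str input (hypothetical helper)."""
    if isinstance(value, (bytes, bytearray)):
        return value.decode("utf-8")
    return value


# Mixed input: a bytes value (what the reader seems to expect) and a
# str value (what I assume the newer files contain). Placeholder names.
feature_names = [b"GENE_A", "GENE_B"]

# The unconditional decode fails on the str entry, as in the traceback.
error_message = None
try:
    [x.decode("utf-8") for x in feature_names]
except AttributeError as err:
    error_message = str(err)

# The type-tolerant version handles both kinds of value.
decoded = [decode_if_bytes(x) for x in feature_names]

print(error_message)
print(decoded)
```

A check like this could probably replace the lambda on line 171 of xenium.py, but whether that is the right fix depends on what actually changed in the file format.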

Interestingly, setting transcripts=False avoids the error and results in an intact dataset containing .X, .obs and .var.

Let me know if you want me to upload the (small) dataset somewhere.

Cheers!
