table_name= silently discarded when element already has a column with the same name #620

@timtreis

Environment: spatialdata-plot 0.3.4.dev (main, commit 5cfedc7), Python 3.13


Problem

When the user specifies table_name="t" to color shapes by a column in a specific table, the parameter is silently discarded if the element's own GeoDataFrame already has a column with the same name.

The code at utils.py:2843:

if not labels and col_for_color in sdata[element_name].columns:
    table_name = None   # user's explicit table_name is thrown away

This is a silent contract violation: table_name= is explicitly documented to select which table to draw color data from, but the API ignores it without any warning, error, or documentation note. The element column is always preferred, regardless of what the user requested.


Minimal reproducible example

import matplotlib; matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np, pandas as pd, geopandas as gpd, anndata as ad
import dask; dask.config.set({"dataframe.query-planning": False})
from shapely.geometry import Point
import spatialdata as sd
from spatialdata.models import ShapesModel, TableModel
import spatialdata_plot

# "cat" exists in BOTH element AND table — with different values
shapes = ShapesModel.parse(gpd.GeoDataFrame({
    "geometry": [Point(5, 5), Point(15, 5)],
    "radius": [2.0, 2.0],
    "cat": pd.Categorical(["X", "Y"]),   # element column
}))
obs = pd.DataFrame({
    "instance_id": [0, 1],
    "region": ["s1", "s1"],
    "cat": pd.Categorical(["A", "B"]),   # table column — different values
})
obs.index = obs.index.astype(str)
table = TableModel.parse(
    ad.AnnData(X=np.zeros((2, 1)), obs=obs),
    region=["s1"], region_key="region", instance_key="instance_id"
)
sdata = sd.SpatialData(shapes={"s1": shapes}, tables={"t": table})

fig, ax = plt.subplots()
# User explicitly requests table "t"
sdata.pl.render_shapes("s1", color="cat", table_name="t").pl.show(ax=ax)
legend = [t.get_text() for t in ax.get_legend().get_texts()]
print(f"legend = {legend}")
# Expected: ['A', 'B']  (from table "t")
# Actual:   ['X', 'Y']  (from element — table_name ignored)

Expected behaviour

table_name="t" should cause the color data to come from sdata["t"].obs["cat"], yielding legend entries ["A", "B"].

Actual behaviour

legend = ['X', 'Y'] — the element column "cat" is used, identical to a call with no table_name=. The table_name argument has zero effect.


Fix sketch

At utils.py:2843, respect an explicitly provided table_name. Fall back to the element column only when table_name is None:

# Current (broken):
if not labels and col_for_color in sdata[element_name].columns:
    table_name = None

# Fixed: only fall back when no explicit table was requested.
# The branch body is now a no-op, so this is equivalent to deleting the override.
if table_name is None and not labels and col_for_color in sdata[element_name].columns:
    pass  # element column path; table_name stays None

When table_name is not None AND the element has the same column, the explicit table should win (or at minimum emit a UserWarning explaining which source is being used).
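
The "explicit table wins, with a warning" variant could look roughly like the sketch below. It is a standalone illustration, not code from spatialdata-plot: resolve_color_source, its parameters, and the warning text are hypothetical, and the element's columns are passed in as a plain list so the logic can be read and run without a SpatialData object.

```python
import warnings


def resolve_color_source(element_columns, col_for_color, table_name=None, labels=False):
    """Return the table to read `col_for_color` from, or None for the element.

    Hypothetical helper for illustration only; not part of spatialdata-plot.
    """
    in_element = not labels and col_for_color in element_columns
    if table_name is not None:
        if in_element:
            # Both sources carry the column: honor the explicit request,
            # but tell the user which source is being used.
            warnings.warn(
                f"Column {col_for_color!r} exists on both the element and "
                f"table {table_name!r}; using the table because table_name "
                "was passed explicitly.",
                UserWarning,
                stacklevel=2,
            )
        return table_name
    # No explicit table requested: None keeps the existing element-column path.
    return None
```

With the reproducer above, element_columns would correspond to sdata["s1"].columns, so resolve_color_source(["geometry", "radius", "cat"], "cat", table_name="t") returns "t" and emits one UserWarning, while omitting table_name returns None and leaves the element column in charge.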


Triage tier: Tier 3
