# Tables in SpatialData

## Intro
SpatialData supports storing annotations of `SpatialElement`s in `AnnData` tables. A table can annotate one or more `SpatialElement`s at the same time, or no element at all. The latter is useful, for example, for storing a codebook. Multiple tables can be stored in the same `SpatialData` object. Using the blobs dataset shipped with SpatialData as an example, we will show the following:

1. How to retrieve information on which elements a table is annotating.
2. How to change the annotation metadata of a table.
3. How to construct tables that either annotate or do not annotate `SpatialElement`s, and how to add them to the `SpatialData` object.
4. How to perform SQL-like joins on tables and `SpatialElement`s in a `SpatialData` object.

## Tables annotating SpatialElements

### Retrieving information on existing tables in a SpatialData object
We will first show how to retrieve information from a table that already exists in a `SpatialData` object.

In [1]:
from spatialdata.datasets import blobs

sdata = blobs()
sdata

SpatialData object with:
├── Images
│     ├── 'blobs_image': SpatialImage[cyx] (3, 512, 512)
│     └── 'blobs_multiscale_image': MultiscaleSpatialImage[cyx] (3, 512, 512), (3, 256, 256), (3, 128, 128)
├── Labels
│     ├── 'blobs_labels': SpatialImage[yx] (512, 512)
│     └── 'blobs_multiscale_labels': MultiscaleSpatialImage[yx] (512, 512), (256, 256), (128, 128)
├── Points
│     └── 'blobs_points': DataFrame with shape: (<Delayed>, 4) (2D points)
├── Shapes
│     ├── 'blobs_circles': GeoDataFrame shape: (5, 2) (2D shapes)
│     ├── 'blobs_multipolygons': GeoDataFrame shape: (2, 1) (2D shapes)
│     └── 'blobs_polygons': GeoDataFrame shape: (5, 1) (2D shapes)
└── Tables
      └── 'table': AnnData (26, 3)
with coordinate systems:
▸ 'global', with elements:
        blobs_image (Images), blobs_multiscale_image (Images), blobs_labels (Labels), blobs_multiscale_labels (Labels), blobs_points (Points), blobs_circles (Shapes), blobs_multipolygons (Shapes), blobs_polygons (Shapes)

The blobs dataset contains a single table, named `table`. Let's see what it is annotating.

In [2]:
from spatialdata.models import get_table_keys

regions, region_column_name, instance_key_column_name = get_table_keys(sdata["table"])
print(regions, region_column_name, instance_key_column_name)

blobs_labels region instance_id


`regions` are the names of the `SpatialElement`s being annotated, `region_column_name` is the column in `.obs` that states for each row which `SpatialElement` that row annotates, and `instance_key_column_name` is the column that states for each row which index (instance) of that `SpatialElement` it annotates. The metadata itself is stored in `sdata["table"].uns["spatialdata_attrs"]`, while the two columns can be inspected in `sdata["table"].obs`.
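To make the role of the three keys concrete, here is a minimal pure-Python sketch, not the SpatialData implementation. It assumes a table can be reduced to a list of `.obs` rows plus a metadata dict with the keys `region`, `region_key` and `instance_key`; the function `resolve_annotations` is hypothetical.

```python
from typing import Any


def resolve_annotations(obs_rows: list[dict[str, Any]], attrs: dict[str, Any]) -> list[tuple[str, Any]]:
    """Map every table row to the (element name, instance index) pair it annotates."""
    region_key = attrs["region_key"]      # column naming the annotated element
    instance_key = attrs["instance_key"]  # column holding the index within that element
    regions = attrs["region"]
    # Assumption: the metadata stores either a single region name or a list of names.
    allowed = {regions} if isinstance(regions, str) else set(regions)
    resolved = []
    for row in obs_rows:
        element = row[region_key]
        if element not in allowed:
            raise ValueError(f"row annotates {element!r}, which is not listed in the metadata")
        resolved.append((element, row[instance_key]))
    return resolved


# Toy data mirroring the first rows of sdata["table"].obs.
attrs = {"region": "blobs_labels", "region_key": "region", "instance_key": "instance_id"}
obs_rows = [{"instance_id": i, "region": "blobs_labels"} for i in (1, 2, 3)]
print(resolve_annotations(obs_rows, attrs))
# [('blobs_labels', 1), ('blobs_labels', 2), ('blobs_labels', 3)]
```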

In [3]:
sdata["table"].obs.head()

|   | instance_id | region       |
|---|-------------|--------------|
| 1 | 1           | blobs_labels |
| 2 | 2           | blobs_labels |
| 3 | 3           | blobs_labels |
| 4 | 4           | blobs_labels |
| 5 | 5           | blobs_labels |


To retrieve any of these three directly, there are three helper functions: `get_annotated_regions`, `get_region_key_column` and `get_instance_key_column`. Each of them takes only the `AnnData` table as an argument. Below we show an example:

In [4]:
from spatialdata import SpatialData as sd

region_column = sd.get_region_key_column(sdata["table"])
print(region_column.head())

1    blobs_labels
2    blobs_labels
3    blobs_labels
4    blobs_labels
5    blobs_labels
Name: region, dtype: category
Categories (1, object): ['blobs_labels']


## Changing the annotation target of a table
We provide a helper method, `set_table_annotates_spatialelement`, to change the metadata describing which `SpatialElement` a table in a `SpatialData` object annotates. It takes the arguments `table_name` and `region`, and optionally `region_key` and `instance_key`. The latter two do not need to be specified if the table already annotates a `SpatialElement`; in that case their current values are reused. Every argument that is specified must exist at its respective location: `region` must be an element of the `SpatialData` object, and `region_key` and `instance_key` must be columns of the table.

In [5]:
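# Point every row of the region column at the new target element,
sdata["table"].obs["region"] = "blobs_circles"
# then update the table's annotation metadata to match.
sdata.set_table_annotates_spatialelement("table", region="blobs_circles")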
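The contract described above can be sketched in plain Python. This is a hypothetical illustration, not the implementation of `set_table_annotates_spatialelement`: `set_annotation_target` and its argument layout are our own, the table is reduced to a dict of `.obs` columns, and the final check (that the region column only names the new target) is an assumption that would explain why the cell above first overwrites `obs["region"]`.

```python
from typing import Any, Optional


def set_annotation_target(
    attrs: dict[str, Any],
    obs_columns: dict[str, list[Any]],
    element_names: set[str],
    region: str,
    region_key: Optional[str] = None,
    instance_key: Optional[str] = None,
) -> dict[str, Any]:
    """Return updated annotation metadata, reusing the current keys when not given."""
    region_key = region_key if region_key is not None else attrs.get("region_key")
    instance_key = instance_key if instance_key is not None else attrs.get("instance_key")
    if region_key is None or instance_key is None:
        raise ValueError("region_key and instance_key are required for a table without annotation metadata")
    if region not in element_names:
        raise KeyError(f"{region!r} is not an element of the SpatialData object")
    for key in (region_key, instance_key):
        if key not in obs_columns:
            raise KeyError(f"column {key!r} not found in the table's obs")
    # Assumed check: every row must already point at the new target.
    if set(obs_columns[region_key]) - {region}:
        raise ValueError("the region column must only contain the new target")
    return {"region": region, "region_key": region_key, "instance_key": instance_key}


attrs = {"region": "blobs_labels", "region_key": "region", "instance_key": "instance_id"}
obs_columns = {"instance_id": [1, 2, 3], "region": ["blobs_circles"] * 3}
elements = {"blobs_labels", "blobs_circles", "blobs_polygons"}
new_attrs = set_annotation_target(attrs, obs_columns, elements, region="blobs_circles")
print(new_attrs["region"])  # blobs_circles
```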

## Constructing and adding tables to a SpatialData object
Tables in `SpatialData` are `AnnData` tables. Creating a table that does not annotate a `SpatialElement` is as simple as constructing an `AnnData` table. In this case the table should not contain `region`, `region_key` and `instance_key` metadata. Here is an example of a table storing a dummy codebook:

In [6]:
from anndata import AnnData
from spatialdata.models import TableModel

codebook_table = AnnData(obs={"Gene": ["Gene1", "Gene2", "Gene3"], "barcode": ["03210", "23013", "30121"]})

# We don't specify arguments related to metadata as we don't annotate a SpatialElement.
sdata_table = TableModel.parse(codebook_table)
sdata["codebook"] = sdata_table
sdata["codebook"].obs

|   | Gene  | barcode |
|---|-------|---------|
| 0 | Gene1 | 03210   |
| 1 | Gene2 | 23013   |
| 2 | Gene3 | 30121   |


Now let us create a table that annotates multiple `SpatialElement`s. Note that the order of the rows in the table does not have to match the order of the indices in the `SpatialElement`; to showcase this, we randomly permute the rows. However, the `dtype` of the index of the `SpatialElement` must match the `dtype` of the `instance_key` column in the table. If this is not the case, adding the table to the `SpatialData` object results in an error. Lastly, not every index of the `SpatialElement` has to be present in the `instance_key` column of the table, and vice versa. We will later show functions to deal with these cases.
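A minimal sketch of these rules under our own simplifications: the hypothetical `check_instance_ids` compares Python value types as a stand-in for pandas dtypes and reports, in both directions, which ids have no counterpart. Row order plays no role because the comparison works on sets.

```python
def check_instance_ids(element_index: list, table_ids: list) -> dict[str, set]:
    """Compare an element's index with a table's instance_key column."""
    # Stand-in for the dtype check: compare the Python types of the values.
    element_types = {type(i) for i in element_index}
    table_types = {type(i) for i in table_ids}
    if element_types != table_types:
        raise TypeError(f"index types {element_types} do not match instance_key types {table_types}")
    element_set, table_set = set(element_index), set(table_ids)
    return {
        "unannotated": element_set - table_set,  # element instances without a table row
        "unmatched": table_set - element_set,    # table rows pointing at no instance
    }


element_index = [0, 1, 2, 3, 4]
table_ids = [3, 0, 4, 1, 7]  # shuffled order is fine; 2 is missing and 7 is extra
print(check_instance_ids(element_index, table_ids))
# {'unannotated': {2}, 'unmatched': {7}}
```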

In [7]:
polygon_index = list(sdata.shapes["blobs_polygons"].index)
# We have to do a compute here as points are lazily loaded using dask.
point_index = list(sdata["blobs_points"].index.compute())

region_column = ["blobs_polygons"] * len(polygon_index) + ["blobs_points"] * len(point_index)
instance_id_column = polygon_index + point_index

In [8]:
import numpy as np

RNG = np.random.default_rng()
table = AnnData(
    X=np.zeros((len(region_column), 1)),
    obs={"region": region_column, "instance_id": instance_id_column},
)
table = table[RNG.permutation(table.obs.index), :].copy()

# Now we have to specify all 3 annotation metadata fields.
sdata_table = TableModel.parse(
    table, region=["blobs_polygons", "blobs_points"], region_key="region", instance_key="instance_id"
)

# When adding the table now, it is validated for presence of annotation targets in the sdata object.
sdata["annotations"] = sdata_table


### Performing SQL-like joins
To retrieve the matching (or non-matching) parts of a `SpatialElement` and its annotating table, we can perform SQL-like join operations. For this we have the function `join_sdata_spatialelement_table`. It takes as arguments the `SpatialData` object, `spatial_element_name` as either a `str` or a list of `str`, `table_name`, and `how`, which indicates the kind of SQL-like join to perform (`left`, `left_exclusive`, `inner`, `right` or `right_exclusive`). Lastly, if you want the order of the table rows and the indices of the element to match, pass `left` or `right` to the `match_rows` argument. The default is `no`.
![sql_joins](attachments/joins_small.png)
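To make the five `how` options concrete, here is a conceptual sketch on bare index lists, not the library code. It assumes that `left` keeps the whole element and only the matching table rows, `inner` keeps only matches on both sides, `right` keeps the whole table and only the matching element instances, and the two exclusive variants keep only the non-matching part of one side. `join_ids` is a hypothetical helper; `None` stands for a side that an exclusive join drops entirely.

```python
from typing import Optional


def join_ids(
    element_ids: list[int], table_ids: list[int], how: str
) -> tuple[Optional[list[int]], Optional[list[int]]]:
    """Return (surviving element ids, surviving table ids) for a join type."""
    element_set, table_set = set(element_ids), set(table_ids)
    if how == "left":
        return list(element_ids), [i for i in table_ids if i in element_set]
    if how == "left_exclusive":
        return [i for i in element_ids if i not in table_set], None
    if how == "inner":
        return [i for i in element_ids if i in table_set], [i for i in table_ids if i in element_set]
    if how == "right":
        return [i for i in element_ids if i in table_set], list(table_ids)
    if how == "right_exclusive":
        return None, [i for i in table_ids if i not in element_set]
    raise ValueError(f"unknown join type: {how!r}")


element_ids = [0, 1, 2]      # an element reduced to 3 instances
table_ids = [1, 4, 0, 2, 3]  # shuffled instance_id column with two extra rows
for how in ("left", "left_exclusive", "inner", "right", "right_exclusive"):
    print(how, join_ids(element_ids, table_ids, how))
```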

Let us now showcase the function by first removing some shapes from `blobs_polygons` and then performing a join.

In [9]:
from spatialdata import join_sdata_spatialelement_table

# This leaves the element with 3 shapes
sdata["blobs_polygons"] = sdata["blobs_polygons"][:3]

# We can now do a join with one spatial element
element_dict, table = join_sdata_spatialelement_table(
    sdata, spatial_element_name="blobs_polygons", table_name="annotations", how="left"
)
print(element_dict["blobs_polygons"])
table.obs

                                            geometry
0  POLYGON ((340.197 258.214, 316.177 197.065, 29...
1  POLYGON ((284.141 420.454, 267.249 371.319, 25...
2  POLYGON ((203.195 229.528, 285.506 204.414, 19...



|   | region         | instance_id |
|---|----------------|-------------|
| 1 | blobs_polygons | 1           |
| 0 | blobs_polygons | 0           |
| 2 | blobs_polygons | 2           |


Above we see that the table only contains those annotations corresponding to shapes that are 
still in `blobs_polygons`. The `element_dict` only contains `SpatialElement`s used in the 
join. Let us now repeat the join but with the table rows matching the indices of 
`blobs_polygons`.

In [10]:
element_dict, table = join_sdata_spatialelement_table(
    sdata, spatial_element_name="blobs_polygons", table_name="annotations", how="left", match_rows="left"
)
print(element_dict["blobs_polygons"])
table.obs

                                            geometry
0  POLYGON ((340.197 258.214, 316.177 197.065, 29...
1  POLYGON ((284.141 420.454, 267.249 371.319, 25...
2  POLYGON ((203.195 229.528, 285.506 204.414, 19...



|   | region         | instance_id |
|---|----------------|-------------|
| 0 | blobs_polygons | 0           |
| 1 | blobs_polygons | 1           |
| 2 | blobs_polygons | 2           |
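The reordering done by `match_rows` can be sketched as follows, assuming `"left"` sorts the table rows into the order of the element index and `"right"` reorders the element to follow the table. Both helpers are hypothetical and operate on plain lists rather than on `AnnData` or `GeoDataFrame` objects.

```python
def order_table_like_element(
    element_ids: list[int], table_rows: list[dict], instance_key: str = "instance_id"
) -> list[dict]:
    """Sketch of match_rows="left": sort table rows by the element's index order."""
    position = {idx: pos for pos, idx in enumerate(element_ids)}
    # Assumes the join already removed rows whose id is absent from the element.
    return sorted(table_rows, key=lambda row: position[row[instance_key]])


def order_element_like_table(element_ids: list[int], table_ids: list[int]) -> list[int]:
    """Sketch of match_rows="right": list element ids in the table's row order."""
    present = set(element_ids)
    return [i for i in table_ids if i in present]


element_ids = [0, 1, 2]
table_rows = [{"instance_id": 1}, {"instance_id": 0}, {"instance_id": 2}]
print([row["instance_id"] for row in order_table_like_element(element_ids, table_rows)])
# [0, 1, 2]
print(order_element_like_table(element_ids, [2, 0, 1]))
# [2, 0, 1]
```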


Let us now add the filtered annotations back to the `SpatialData` object. This requires a slightly different function than the one we used earlier for changing the annotation metadata.

In [11]:
# Recompute the annotated regions from the values in the table's region column.
table = sd.update_annotated_regions_metadata(table)
sdata["filtered_annotations_blobs_polygons"] = table
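After the left join, the region column of the filtered table only contains `blobs_polygons`, while its metadata may still list every element annotated by the original `annotations` table. Below is a minimal sketch of the refresh performed in the cell above. It assumes the new `region` metadata is simply the distinct values of the region column, kept in order of first appearance. `refresh_region_metadata` is a hypothetical name, and the metadata is modeled as a plain dict.

```python
def refresh_region_metadata(attrs: dict, region_column: list[str]) -> dict:
    """Return a copy of attrs whose "region" lists the elements still present in the region column."""
    updated = dict(attrs)  # leave the original metadata untouched
    # dict.fromkeys keeps the first occurrence of each name, preserving order.
    updated["region"] = list(dict.fromkeys(region_column))
    return updated


attrs = {"region": ["blobs_polygons", "blobs_points"], "region_key": "region", "instance_key": "instance_id"}
filtered_region_column = ["blobs_polygons"] * 3
print(refresh_region_metadata(attrs, filtered_region_column)["region"])
# ['blobs_polygons']
```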

Lastly, we can also perform the join on multiple `SpatialElement`s at the same time.

In [12]:
element_dict, table = join_sdata_spatialelement_table(
    sdata,
    spatial_element_name=["blobs_polygons", "blobs_points"],
    table_name="annotations",
    how="left",
    match_rows="left",
)
sdata["multi_filtered_table"] = table
table.obs


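|     | region         | instance_id |
|-----|----------------|-------------|
| 0   | blobs_polygons | 0           |
| 1   | blobs_polygons | 1           |
| 2   | blobs_polygons | 2           |
| 5   | blobs_points   | 0           |
| 6   | blobs_points   | 1           |
| ... | ...            | ...         |
| 200 | blobs_points   | 195         |
| 201 | blobs_points   | 196         |
| 202 | blobs_points   | 197         |
| 203 | blobs_points   | 198         |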
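The output above lists the `blobs_polygons` rows first and the `blobs_points` rows second, each in the order of the element's index. The sketch below reproduces this ordering for a left join with `match_rows="left"`. It assumes that rows are grouped by the region column and that the groups are concatenated in the order the elements were passed. `multi_element_left_join` is hypothetical and works on plain dicts.

```python
from collections import defaultdict


def multi_element_left_join(
    elements: dict[str, list[int]],
    table_rows: list[dict],
    region_key: str = "region",
    instance_key: str = "instance_id",
) -> list[dict]:
    """Left-join several elements at once, ordering rows like each element's index."""
    by_region = defaultdict(list)
    for row in table_rows:
        by_region[row[region_key]].append(row)
    joined = []
    for name, element_ids in elements.items():  # dicts preserve insertion order
        position = {idx: pos for pos, idx in enumerate(element_ids)}
        rows = [r for r in by_region.get(name, []) if r[instance_key] in position]
        joined.extend(sorted(rows, key=lambda r: position[r[instance_key]]))
    return joined


elements = {"blobs_polygons": [0, 1, 2], "blobs_points": [0, 1, 2, 3]}
table_rows = [
    {"region": "blobs_points", "instance_id": 2},
    {"region": "blobs_polygons", "instance_id": 4},  # no longer in the element: dropped
    {"region": "blobs_polygons", "instance_id": 1},
    {"region": "blobs_points", "instance_id": 0},
    {"region": "blobs_polygons", "instance_id": 0},
    {"region": "blobs_points", "instance_id": 3},
    {"region": "blobs_polygons", "instance_id": 2},
    {"region": "blobs_points", "instance_id": 1},
]
for row in multi_element_left_join(elements, table_rows):
    print(row["region"], row["instance_id"])
```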


In [13]:
# This tells us which tables we have in the SpatialData object
sdata.tables

{'table': AnnData object with n_obs × n_vars = 26 × 3
    obs: 'instance_id', 'region'
    uns: 'spatialdata_attrs', 'codebook': AnnData object with n_obs × n_vars = 3 × 0
    obs: 'Gene', 'barcode', 'annotations': AnnData object with n_obs × n_vars = 205 × 1
    obs: 'region', 'instance_id'
    uns: 'spatialdata_attrs', 'filtered_annotations_blobs_polygons': AnnData object with n_obs × n_vars = 3 × 1
    obs: 'region', 'instance_id'
    uns: 'spatialdata_attrs', 'multi_filtered_table': AnnData object with n_obs × n_vars = 203 × 1
    obs: 'region', 'instance_id'
    uns: 'spatialdata_attrs'}

### Ending on a special note
For joins on `Shapes` and `Points` elements, every join type and every `match_rows` option is supported. For `Labels` elements, however, only the left join is supported, and `match_rows` only accepts `no` and `left`. This is because for `Labels` the SQL-like join behaviour would otherwise be complex to implement, and we did not foresee a use case for it. If you do have a use case for this, please get in touch with us by either opening an issue on [GitHub](https://github.com/scverse/spatialdata) or via our [discourse](https://discourse.scverse.org/).
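A left join on a `Labels` element only needs the set of label values present in the raster, because the raster itself is returned unchanged. The following pure-Python sketch illustrates this under our own assumptions: a nested list stands in for the label array, 0 is treated as background, and `labels_left_join` is a hypothetical name. A join that filters the `Labels` element would instead have to rewrite pixel values in the raster.

```python
def labels_left_join(
    label_image: list[list[int]],
    table_rows: list[dict],
    instance_key: str = "instance_id",
    background: int = 0,
) -> list[dict]:
    """Keep the table rows whose instance id occurs as a label value; the raster is not modified."""
    present = {value for image_row in label_image for value in image_row}
    present.discard(background)  # assumption: this value marks background pixels
    return [row for row in table_rows if row[instance_key] in present]


label_image = [
    [0, 1, 1],
    [0, 2, 2],
    [3, 3, 0],
]
table_rows = [{"instance_id": 2}, {"instance_id": 5}, {"instance_id": 1}]
print(labels_left_join(label_image, table_rows))
# [{'instance_id': 2}, {'instance_id': 1}]
```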