Here is my code for converting our data into the format iSTAR expects. Make sure you git clone the iSTAR GitHub repository (https://github.com/daviddaiweizhang/istar.git) into your project (the .gitignore will make sure it doesn't get uploaded). You should try to 
run the demo script in the istar repo first to make sure your environment is set up correctly. It took me a little longer than expected, so make sure you create a brand new environment just for iSTAR. 

You will definitely need to do this:

git clone https://github.com/daviddaiweizhang/istar.git

cd istar

pip install -r requirements.txt

./run_demo.sh

I highly recommend running this on a GPU, but if you can't, I think CPU should work fine. 

If you're like me, you will run into some dependency issues, but they can be resolved. Even after that, you will hit an issue in the code of one of the repo's files, I think hipt_model_utils.py (on my machine: "/gpfs/commons/home/svaidyanathan/istarc/istar/hipt_model_utils.py"). Where they load the weights for the model, you will have to 
add the parameter weights_only=False to that call. Text me if you have issues. 

After you're sure that the demo is working, you should be able to run this notebook and make the files from the sdata object that you saved out when you ran the starfysh_updated_notebook. (see below)

Once you make the files, find the run_demo.sh script in the istar repo and change the "prefix" variable to whichever folder you saved the outputs of this notebook to. Then run the demo script and you should see the outputs in that folder!

I put my visium mouse data folder inside the demo folder in the istar repo. You should probably do the same thing.
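
If you would rather not edit run_demo.sh by hand, the "prefix" change can be sketched in Python. This is only a sketch: it assumes the script assigns the variable on a line of the form prefix=..., and the script text and folder name in the example are made up for illustration, not copied from the repo. Check whether the existing value ends in a slash and keep the same style.

```python
import re


def set_prefix(script_text: str, new_prefix: str) -> str:
    """Return script_text with the first `prefix=...` assignment replaced.

    Assumption: the variable is assigned on its own line as `prefix=...`.
    """
    pattern = re.compile(r"^([ \t]*prefix=).*$", flags=re.MULTILINE)
    # A function replacement avoids backslash-escaping issues in new_prefix.
    new_text, n_replaced = pattern.subn(
        lambda m: f'{m.group(1)}"{new_prefix}"', script_text, count=1
    )
    if n_replaced == 0:
        raise ValueError("no line starting with 'prefix=' found")
    return new_text


# Hypothetical script contents, for illustration only.
example_script = 'set -e\nprefix="data/demo/"\necho ${prefix}\n'
print(set_prefix(example_script, "data/visium_mouse_brain/"))
```

To apply it for real, read run_demo.sh into a string, pass it through set_prefix, and write the result back (keep a backup copy of the original script).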

In [30]:
import scanpy as sc

adata = sc.read_h5ad("/gpfs/commons/home/svaidyanathan/istarc/results/st.h5ad")


In [31]:
from PIL import Image

img = Image.open("/gpfs/commons/home/svaidyanathan/istarc/data/visium_adult_mouse_brain/spatial/tissue_hires_image.png")  # original 10x file
print("hires size:", img.size)  # (width, height) – keep this

img = img.convert("RGB")  # drop alpha if present
img.save("/gpfs/commons/home/svaidyanathan/istarc/istar/data/visium_mouse_brain/he-raw.jpg", "JPEG", quality=95)


hires size: (1921, 2000)


In [32]:
img2 = Image.open("/gpfs/commons/home/svaidyanathan/istarc/istar/data/visium_mouse_brain/he-raw.jpg")
print("he-raw size:", img2.size)  # MUST match the hires size printed above

he-raw size: (1921, 2000)


In [33]:
import scanpy as sc
import pandas as pd
import json
from pathlib import Path


# 2) Grab scalefactors
# If you don't know the key, print(list(adata.uns["spatial"].keys()))
sample_key = list(adata.uns["spatial"].keys())[0]
sf = adata.uns["spatial"][sample_key]["scalefactors"]
scale_hires = sf["tissue_hires_scalef"]

# 3) Start from the *full-res* spot coordinates in adata.obsm["spatial"]
coords_full = adata.obsm["spatial"]          # shape (n_spots, 2)
x_full = coords_full[:, 0]
y_full = coords_full[:, 1]

# 4) Convert to hires-image pixel coordinates
x_hires = x_full * scale_hires
y_hires = y_full * scale_hires

# 5) Build spot IDs like "rowxcol"
rows = adata.obs["array_row"].astype(int).to_numpy()
cols = adata.obs["array_col"].astype(int).to_numpy()
spot_ids = [f"{r}x{c}" for r, c in zip(rows, cols)]

# 6) Assemble the locs-raw DataFrame
locs_raw = pd.DataFrame({
    "spot": spot_ids,
    "x": x_hires,
    "y": y_hires,
})

# (Optional) round to integer pixels, if the downstream code wants ints
locs_raw["x"] = locs_raw["x"].round().astype(int)
locs_raw["y"] = locs_raw["y"].round().astype(int)

# 7) Save as TSV
out_path = Path("/gpfs/commons/home/svaidyanathan/istarc/istar/data/visium_mouse_brain/locs-raw.tsv")
out_path.parent.mkdir(parents=True, exist_ok=True)
locs_raw.to_csv(out_path, sep="\t", index=False)

print(f"Wrote {out_path} with shape {locs_raw.shape}")


Wrote /gpfs/commons/home/svaidyanathan/istarc/istar/data/visium_mouse_brain/locs-raw.tsv with shape (2702, 3)


In [34]:
from PIL import Image
img = Image.open("/gpfs/commons/home/svaidyanathan/istarc/data/visium_adult_mouse_brain/spatial/tissue_hires_image.png")
img = img.convert("RGB")  # PNG may have alpha; JPEG cannot
img.save("/gpfs/commons/home/svaidyanathan/istarc/istar/data/visium_mouse_brain/he-raw.jpg", "JPEG", quality=95)

In [35]:
from PIL import Image

img = Image.open("/gpfs/commons/home/svaidyanathan/istarc/istar/data/visium_mouse_brain/he-raw.jpg")
W, H = img.size
print("image size:", W, H)

print("x range:", locs_raw["x"].min(), locs_raw["x"].max())
print("y range:", locs_raw["y"].min(), locs_raw["y"].max())


image size: 1921 2000
x range: 335 1458
y range: 213 1700
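
The ranges above are easy to eyeball, but the same check can be written as a small function that reports any spot falling outside the image. This is a sketch, not part of iSTAR; the function name and the optional radius argument are my own additions.

```python
def spots_outside_image(xs, ys, width, height, radius=0.0):
    """Return indices of spots whose disc of `radius` is not fully inside the image.

    Valid pixel coordinates are 0..width-1 in x and 0..height-1 in y.
    """
    bad = []
    for i, (x, y) in enumerate(zip(xs, ys)):
        inside = (
            x - radius >= 0
            and y - radius >= 0
            and x + radius <= width - 1
            and y + radius <= height - 1
        )
        if not inside:
            bad.append(i)
    return bad


# Extreme values taken from the ranges printed above; the image is 1921 x 2000.
xs = [335, 1458]
ys = [213, 1700]
print(spots_outside_image(xs, ys, width=1921, height=2000))
```

In the notebook you would pass locs_raw["x"], locs_raw["y"], W and H instead of the hard-coded lists; an empty list means every spot lands on the image.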


In [36]:
import scanpy as sc
import pandas as pd


# 1. Build spot IDs: "rowxcol"
rows = adata.obs["array_row"].astype(int).to_numpy()
cols = adata.obs["array_col"].astype(int).to_numpy()
spot_ids = [f"{r}x{c}" for r, c in zip(rows, cols)]

# 2. Extract the count matrix
# Assumes adata.X still holds raw counts (not normalized or log-transformed).
# For Visium, adata.X is usually sparse, so densify it before building the DataFrame.
X = adata.X.toarray() if hasattr(adata.X, "toarray") else adata.X

# 3. Get gene names
gene_names = adata.var_names.to_list()

# 4. Build DataFrame with expected format
df = pd.DataFrame(X, columns=gene_names)
df.insert(0, "spot", spot_ids)

# 5. Save as TSV
df.to_csv("/gpfs/commons/home/svaidyanathan/istarc/istar/data/visium_mouse_brain/cnts.tsv", sep="\t", index=False)

print("Wrote cnts.tsv with shape:", df.shape)
print(df.head())


Wrote cnts.tsv with shape: (2702, 4209)
     spot  0610030E20Rik  0610033M10Rik  0610040J01Rik  0610043K17Rik  \
0  50x102            1.0            0.0            0.0            0.0   
1    3x43            1.0            0.0            0.0            0.0   
2   59x19            0.0            0.0            0.0            0.0   
3   14x94            0.0            0.0            0.0            0.0   
4   42x28            1.0            0.0            0.0            0.0   

   1110008P14Rik  1110017D15Rik  1500032F14Rik  1600002K03Rik  1600014C10Rik  \
0            7.0            0.0            0.0            0.0            0.0   
1            4.0            1.0            0.0            0.0            0.0   
2            7.0            0.0            0.0            0.0            2.0   
3            5.0            1.0            0.0            0.0            0.0   
4            3.0            1.0            0.0            0.0            0.0   

   ...  Zfyve28  Zic1  Zic3  Zic4  Zmat4

In [37]:
spot_diam = sf["spot_diameter_fullres"]  # in pixels on the full-res image
radius_raw = spot_diam * 0.5 * sf["tissue_hires_scalef"]

with open("/gpfs/commons/home/svaidyanathan/istarc/istar/data/visium_mouse_brain/radius-raw.txt", "w") as f:
    f.write(f"{radius_raw:.6f}\n")


In [38]:
sample_key = list(adata.uns["spatial"].keys())[0]
sf = adata.uns["spatial"][sample_key]["scalefactors"]

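tissue_hires_scalef = sf["tissue_hires_scalef"]  # float
print(tissue_hires_scalef)

# 55.0 is the physical spot diameter in micrometers assumed here; spot_diam comes from the previous cell
pixel_size_raw = 55.0 / (spot_diam * tissue_hires_scalef)  # micrometers per hires pixel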

with open("/gpfs/commons/home/svaidyanathan/istarc/istar/data/visium_mouse_brain/pixel-size-raw.txt", "w") as f:
    f.write(f"{pixel_size_raw:.6f}\n")

print(pixel_size_raw)


0.17011142
3.614966093361933
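
The radius and pixel-size cells both come from the same two numbers, so the arithmetic can be put in one place. The sketch below uses a helper name of my own and keeps the notebook's assumption that the spot diameter is 55 micrometers; the input values in the example are round numbers for illustration, not our data.

```python
def hires_spot_geometry(spot_diameter_fullres, hires_scalef, spot_diameter_um=55.0):
    """Return (radius in hires pixels, micrometers per hires pixel).

    spot_diameter_um=55.0 is the physical spot diameter assumed in the notebook.
    """
    diameter_px = spot_diameter_fullres * hires_scalef  # spot diameter on the hires image
    radius_px = diameter_px / 2
    um_per_px = spot_diameter_um / diameter_px
    return radius_px, um_per_px


# Illustrative round numbers: a 100 px full-res spot and a 0.2 scale factor.
radius_px, um_per_px = hires_spot_geometry(100.0, 0.2)
print(radius_px, um_per_px)
```

A quick consistency check is that 2 * radius_px * um_per_px gives back 55. For the values printed above (pixel size of about 3.615 micrometers), that implies a spot diameter of roughly 15.2 hires pixels.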


In [39]:
import scanpy as sc
import pandas as pd
import numpy as np

# Load your AnnData from the SpatialData object as needed
# If you already have adata:
# adata = ...

# If you actually have a SpatialData "sdata" with an AnnData inside:
# from spatialdata import SpatialData
# sdata = SpatialData.read("your_sdata.zarr")
# adata = sdata["rna"]  # or the appropriate key

ad = adata

# 1. Get the cell type scores / logits / probabilities
qc = ad.obsm["qc_m"]            # shape (n_spots, n_celltypes)
ct_names = list(ad.uns["cell_types"])  # list of cell type labels

qc = np.asarray(qc)
assert qc.shape[1] == len(ct_names), "qc_m and cell_types length mismatch"

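# 2. Convert to per-spot proportions (0–1, sum to 1 per spot)
# If qc_m is already probabilities, you can skip this block.
# Row-sum normalization only makes sense for nonnegative scores; raw logits would need a softmax instead.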
row_sums = qc.sum(axis=1, keepdims=True)
# Avoid division by zero for empty rows
row_sums[row_sums == 0] = 1.0
proportions = qc / row_sums

# 3. Build spot IDs in the same way you did for cnts.tsv and locs-raw.tsv
rows = ad.obs["array_row"].astype(int).to_numpy()
cols = ad.obs["array_col"].astype(int).to_numpy()
spot_ids = [f"{r}x{c}" for r, c in zip(rows, cols)]

assert proportions.shape[0] == len(spot_ids), "n_spots mismatch between qc_m and obs"

# 4. Assemble DataFrame
df = pd.DataFrame(proportions, index=spot_ids, columns=ct_names)
df.reset_index(names="spot", inplace=True)

# 5. Save as TSV (relative path, so this is written to the notebook's working directory)
out_path = "spot_celltype_proportions.tsv"
df.to_csv(out_path, sep="\t", index=False)
print(f"Wrote {out_path} with shape {df.shape}")


Wrote spot_celltype_proportions.tsv with shape (2702, 41)
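
All of the tables written above key their rows by the same "rowxcol" spot IDs, so it is worth confirming that the files actually agree before running iSTAR. The sketch below uses only the standard library; the function names and the toy rows (including the gene name GeneA) are made up for the demonstration.

```python
import os
import tempfile


def read_spot_ids(path):
    """Return the first-column values of a TSV file, skipping the header row."""
    with open(path) as f:
        next(f, None)  # skip the header line
        return [line.split("\t", 1)[0].rstrip("\n") for line in f if line.strip()]


def compare_spot_ids(path_a, path_b):
    """Summarize how the spot IDs of two TSV files differ."""
    a, b = read_spot_ids(path_a), read_spot_ids(path_b)
    return {
        "same_order": a == b,
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
        "duplicates_in_a": len(a) - len(set(a)),
    }


# Toy files standing in for cnts.tsv and locs-raw.tsv.
with tempfile.TemporaryDirectory() as tmp:
    cnts_path = os.path.join(tmp, "cnts.tsv")
    locs_path = os.path.join(tmp, "locs-raw.tsv")
    with open(cnts_path, "w") as f:
        f.write("spot\tGeneA\n50x102\t1.0\n3x43\t0.0\n")
    with open(locs_path, "w") as f:
        f.write("spot\tx\ty\n50x102\t100\t200\n3x43\t300\t400\n")
    report = compare_spot_ids(cnts_path, locs_path)

print(report)
```

On the real data, point the two paths at cnts.tsv and locs-raw.tsv in the visium_mouse_brain folder; "same_order" should be true and the other entries should be empty or zero.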
