# Feature Map Workflow

<img 
    src="./assets/00_FeatureMap_app.png" 
    alt="FeatureMap App"
    align="center" 
    style="border: 2px solid #ccc; border-radius: 8px; padding: 5px; width: 100%; box-shadow: 0px 4px 8px rgba(0,0,0,0.1);">


___

This workflow tutorial walks you through creating a lightweight UMAP-style visualization application for AnnData objects. We'll build the application incrementally, starting with data loading and preprocessing, then layer on visualization functionality in an interactive interface.

## Imports and Configuration

In [None]:
import scanpy as sc
import anndata as ad
import pooch

import holoviews.operation.datashader as hd
import datashader as ds
import colorcet as cc
import panel as pn
from panel.io import hold
import numpy as np
import holoviews as hv

pn.extension()
hv.extension('bokeh')

## Loading and Inspecting the Data

We'll use data from bone marrow mononuclear cells of healthy human donors. The samples were measured using the 10X Multiome Gene Expression and Chromatin Accessibility kit.

In [None]:
EXAMPLE_DATA = pooch.create(
    path=pooch.os_cache("scverse_tutorials"),
    base_url="doi:10.6084/m9.figshare.22716739.v1/",
)
EXAMPLE_DATA.load_registry_from_doi()

In [None]:
%%time

samples = {
    "s1d1": "s1d1_filtered_feature_bc_matrix.h5",
    "s1d3": "s1d3_filtered_feature_bc_matrix.h5",
}
adatas = {}

for sample_id, filename in samples.items():
    path = EXAMPLE_DATA.fetch(filename)
    sample_adata = sc.read_10x_h5(path)
    sample_adata.var_names_make_unique()
    adatas[sample_id] = sample_adata

adata = ad.concat(adatas, label="sample")
adata.obs_names_make_unique()
adata.var_names_make_unique()
print(adata.obs["sample"].value_counts())

In [None]:
adata

Together, the two samples contain ~17,000 cells and 36,601 measured genes.

## Data Preprocessing

### Common Quality Control Metrics

We'll calculate quality control metrics for specific gene populations. Mitochondrial, ribosomal and hemoglobin genes are defined by distinct prefixes as listed below.

In [None]:
# mitochondrial genes, "MT-" for human, "mt-" for mouse
adata.var["mt"] = adata.var_names.str.startswith("MT-")
# ribosomal genes
adata.var["ribo"] = adata.var_names.str.startswith(("RPS", "RPL"))
# hemoglobin genes
adata.var["hb"] = adata.var_names.str.contains("^HB[^(P)]")
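To see what these patterns select, the sketch below applies the same tests to a handful of gene symbols using only the standard library. The gene list is made up for illustration, and `re.search` stands in for pandas' `str.contains`, which also treats its pattern as a regular expression.

```python
import re

# Illustrative gene symbols (not taken from the dataset)
genes = ["MT-CO1", "RPS6", "RPL13", "HBA1", "HBB", "HBP1", "ACTB"]

hb_pattern = re.compile(r"^HB[^(P)]")

is_mt = [g.startswith("MT-") for g in genes]
is_ribo = [g.startswith(("RPS", "RPL")) for g in genes]
# The character class [^(P)] rejects "(", "P" and ")" as the third character,
# so HBP1 is excluded while HBA1 and HBB match.
is_hb = [bool(hb_pattern.search(g)) for g in genes]

for gene, mt, ribo, hb in zip(genes, is_mt, is_ribo, is_hb):
    print(f"{gene:8s} mt={mt!s:5s} ribo={ribo!s:5s} hb={hb}")
```

Note that the parentheses inside the character class are literal characters, not a group, so the pattern is equivalent to "starts with `HB` and the next character is not `P`, `(` or `)`".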

In [None]:
sc.pp.calculate_qc_metrics(
    adata, qc_vars=["mt", "ribo", "hb"], inplace=True, log1p=True
)

### Filter by cells and genes

We filter out cells with fewer than 100 expressed genes and genes that are detected in fewer than 3 cells.

In [None]:
sc.pp.filter_cells(adata, min_genes=100)
sc.pp.filter_genes(adata, min_cells=3)
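As a rough illustration of what the two thresholds mean, here is a NumPy sketch on a toy count matrix. The matrix and the small thresholds are invented for the example, and the code mirrors the order of the two calls above (cells first, then genes); it is not scanpy's implementation.

```python
import numpy as np

# Toy count matrix: 4 cells (rows) x 5 genes (columns); values are made up
counts = np.array([
    [3, 0, 1, 0, 0],
    [0, 0, 2, 0, 0],
    [1, 4, 1, 0, 2],
    [2, 1, 5, 0, 0],
])

min_genes, min_cells = 2, 3  # toy thresholds, far smaller than in the tutorial

# Cells are filtered first: keep those expressing at least `min_genes` genes
genes_per_cell = (counts > 0).sum(axis=1)
kept_cells = genes_per_cell >= min_genes
counts = counts[kept_cells]

# Genes are then filtered on the remaining cells
cells_per_gene = (counts > 0).sum(axis=0)
kept_genes = cells_per_gene >= min_cells
counts = counts[:, kept_genes]

print(counts.shape)
```

Because the gene filter runs on the already reduced set of cells, swapping the two steps can give a slightly different result.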

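### Detect doublets

We'll use Scrublet to flag potential doublets. Note that `sc.pp.scrublet` only annotates cells, adding `doublet_score` and `predicted_doublet` columns to `adata.obs`. No cells are removed here; filter on `predicted_doublet` afterwards if you want to exclude them.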

In [None]:
%%time

sc.pp.scrublet(adata, batch_key="sample")

### Count Depth Scaling Normalization

We apply median count-depth normalization followed by a log1p transformation (log1PF).


In [None]:
# Saving count data
adata.layers["counts"] = adata.X.copy()

In [None]:
# Normalizing to median total counts
sc.pp.normalize_total(adata)
# Logarithmize the data
sc.pp.log1p(adata)
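The arithmetic behind these two calls can be sketched with NumPy on a tiny dense matrix. This is a simplified illustration with invented counts, assuming the target is the median of the per-cell totals as stated in the comment above; scanpy itself also handles sparse matrices and further options.

```python
import numpy as np

# Toy counts: 3 cells (rows) x 3 genes (columns); values are made up
counts = np.array([
    [10.0, 0.0, 10.0],   # total 20
    [5.0, 5.0, 0.0],     # total 10
    [20.0, 10.0, 10.0],  # total 40
])

totals = counts.sum(axis=1, keepdims=True)
target = np.median(totals)  # median of the per-cell totals

# Scale every cell so that its counts sum to the target
normalized = counts / totals * target

# log1p(x) = log(1 + x), which keeps zeros at zero
log_normalized = np.log1p(normalized)

print(normalized.sum(axis=1))
```

After scaling, every cell has the same total, so differences in sequencing depth no longer dominate comparisons between cells.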

### Dimensionality Reduction and Feature Selection

Reduce the dimensionality by keeping only the most informative genes. Here we flag the 2,000 most highly variable genes, taking the sample of origin into account.

In [None]:
sc.pp.highly_variable_genes(adata, n_top_genes=2000, batch_key="sample")

Reduce the dimensionality of the data by running principal component analysis (PCA), which reveals the main axes of variation and denoises the data.

In [None]:
sc.tl.pca(adata)

Inspect the contribution of single PCs to the total variance in the data. This gives us information about how many PCs we should consider in order to compute the neighborhood relations of cells.

In [None]:
sc.pl.pca_variance_ratio(adata, n_pcs=50, log=True)
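For intuition about what the variance ratio plot shows, the following sketch computes per-component variance ratios from a singular value decomposition of synthetic data. The data and variable names are assumptions for illustration; the ratios here are taken over all components of the toy matrix, and this is not how scanpy computes PCA internally.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 200 observations, 5 features with decreasing spread
X = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])

X_centered = X - X.mean(axis=0)
# Singular values of the centered matrix give the variance along each PC
singular_values = np.linalg.svd(X_centered, compute_uv=False)
explained_variance = singular_values**2 / (X.shape[0] - 1)
variance_ratio = explained_variance / explained_variance.sum()

print(np.round(variance_ratio, 3))
```

A sharp drop in the ratios (an "elbow") suggests that later components mostly capture noise.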

## Nearest neighbor graph construction

Let us compute the neighborhood graph of cells using the PCA representation of the data matrix.

In [None]:
sc.pp.neighbors(adata)
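Conceptually, the neighborhood graph links each cell to its closest cells in PCA space. The brute-force sketch below illustrates the idea on random stand-in data with exact Euclidean distances; it is only a mental model, since scanpy relies on faster neighbor search and also computes edge weights.

```python
import numpy as np

rng = np.random.default_rng(0)
pcs = rng.normal(size=(6, 3))  # stand-in for a few rows of adata.obsm['X_pca']
k = 2                          # toy neighborhood size

# Pairwise Euclidean distances between all cells
diff = pcs[:, None, :] - pcs[None, :, :]
dist = np.sqrt((diff**2).sum(axis=-1))
np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor

# Indices of the k closest cells for every cell
neighbors = np.argsort(dist, axis=1)[:, :k]
print(neighbors)
```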

This graph can then be embedded in two dimensions for visualization with UMAP (McInnes et al., 2018):

In [None]:
sc.tl.umap(adata)

The UMAP coordinates are now stored in `adata.obsm['X_umap']` and can be colored by `sample` to check how well the two samples mix.

## Clustering

We use the Leiden graph-clustering method (community detection based on optimizing modularity; Traag et al., 2019). Leiden clustering directly clusters the neighborhood graph of cells, which we already computed in the previous section.

In [None]:
%%time

# Note: Using the `igraph` implementation and a fixed number of iterations 
# can be significantly faster, especially for larger datasets
sc.tl.leiden(adata, flavor="igraph", n_iterations=2)

## Building the Feature Map Explorer

We'll build the explorer step by step, starting with the basic components and gradually adding more functionality.

In [None]:
# Optional checkpoint: save the processed object once, then reload it later
# to skip preprocessing.
# filename = 'adata_clustered.h5ad'
# adata.write_h5ad(filename)
# adata = ad.read_h5ad(filename)

### Step 1: Creating a Basic Plot Function

Let's start with a function that draws a basic dimensionality reduction plot. It takes an embedding array and returns a scatter plot of two of its dimensions, which will be the core of our application.

In [None]:
def create_basic_featuremap(x_data, x_dim=0, y_dim=1, xaxis_label='PC1', yaxis_label='PC2'):
    plot = hv.Points(
        (x_data[:, x_dim], x_data[:, y_dim]),
        [xaxis_label, yaxis_label]
    )
    
    plot = plot.opts(alpha=0.5, tools=['hover'])
    return plot

create_basic_featuremap(adata.obsm['X_pca'])

This creates a simple scatter plot using our PCA data, showing PC1 vs PC2.

### Step 2: Adding Data Type Specific Color Support

Now, let's enhance our plot by coloring the points by a variable. Any vector with one value per cell can serve as `color_data`: a column of `adata.obs` (for example cluster labels or a quality metric) or a gene's column of the `X` data matrix. Our function determines whether the provided `color_data` is categorical or continuous and sets a colormap accordingly.

In [None]:
def create_colored_featuremap(x_data, x_dim, y_dim, color_data, color_var, 
                             xaxis_label, yaxis_label):
    # Determine if the color data is categorical or continuous
    is_categorical = (
        color_data.dtype.name in ['category', 'categorical', 'bool'] or
        np.issubdtype(color_data.dtype, np.object_) or
        np.issubdtype(color_data.dtype, np.str_)
    )
    
    # Set colormap and options based on data type
    if is_categorical:
        n_categories = len(np.unique(color_data))
        cmap = cc.b_glasbey_category10[:n_categories]  # Color map for categorical data
        colorbar = False
        show_legend = True
    else:
        cmap = 'viridis'  # Color map for continuous data
        colorbar = True
        show_legend = False
    
    plot = hv.Points(
        (x_data[:, x_dim], x_data[:, y_dim], color_data),
        [xaxis_label, yaxis_label], color_var
    )
    
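    # Pass colorbar/show_legend so the choices made above take effect
    plot = plot.opts(color=color_var, cmap=cmap, alpha=0.5,
                     colorbar=colorbar, show_legend=show_legend,
                     tools=['hover'], legend_position='right',
                     frame_width=300, frame_height=300)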
    
    return plot, cmap

pca_data = adata.obsm['X_pca']
color_data = adata.obs['leiden'].values

colored_plot, cmap = create_colored_featuremap(
    pca_data, x_dim=0, y_dim=1, color_data=color_data, color_var='leiden',
    xaxis_label='PC1', yaxis_label='PC2',
)
colored_plot

Now our plot shows points colored by their Leiden cluster assignment. We've also added logic to handle both categorical and continuous color variables.
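The categorical-versus-continuous check is the part of this function that is easiest to get wrong, so it can help to try it in isolation. The helper name `is_categorical_like` is hypothetical; the body repeats the dtype test from the function above and is exercised here on plain NumPy arrays.

```python
import numpy as np

def is_categorical_like(values):
    """Return True if `values` should get a discrete colormap (hypothetical helper)."""
    dtype = values.dtype
    return (
        dtype.name in ("category", "categorical", "bool")
        or np.issubdtype(dtype, np.object_)
        or np.issubdtype(dtype, np.str_)
    )

print(is_categorical_like(np.array(["0", "1", "1"])))  # string labels
print(is_categorical_like(np.array([True, False])))    # boolean flags
print(is_categorical_like(np.array([0.1, 2.5])))       # float measurements
print(is_categorical_like(np.array([1, 2, 3])))        # integer codes
```

One consequence worth knowing: integer-coded labels are treated as continuous by this test, so cluster labels stored as integers would need to be converted to strings or a categorical type first.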

However, with this many points the plot suffers from overplotting, which can hide important structure, so let's apply Datashader rasterization.

### Step 3: Adding Datashader Support

For large datasets, we leverage Datashader's rasterization capabilities. Key components include:

- Aggregators: We use `ds.count_cat` for categorical data and `ds.mean` for continuous data, which determine how points are combined when they fall into the same pixel.
- Rasterization: The `hd.rasterize` function converts vector data to a raster image, greatly improving performance for large datasets.
- Dynamic Spreading: `hd.dynspread` automatically enhances visibility of sparse regions by adaptively spreading pixels.

This integration enables scalable visualization of millions of points while preserving the ability to see overall patterns and distributions.
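To make the two aggregation modes concrete, here is a NumPy-only sketch of what "mean per pixel" and "count per category per pixel" amount to on a 2x2 pixel grid. The points, values, and grid are invented, and `np.histogram2d` merely stands in for Datashader's much faster aggregation.

```python
import numpy as np

# Five points; the first three fall into the same pixel of a 2x2 grid
x = np.array([0.1, 0.2, 0.3, 0.8, 0.9])
y = np.array([0.1, 0.2, 0.3, 0.8, 0.1])
value = np.array([1.0, 2.0, 3.0, 10.0, 5.0])    # continuous variable
category = np.array(["a", "a", "b", "b", "a"])  # categorical variable

edges = [np.linspace(0, 1, 3), np.linspace(0, 1, 3)]  # 2x2 pixel grid

# Continuous: mean per pixel = sum of values / number of points
counts, _, _ = np.histogram2d(x, y, bins=edges)
sums, _, _ = np.histogram2d(x, y, bins=edges, weights=value)
with np.errstate(invalid="ignore"):
    mean_per_pixel = sums / counts  # empty pixels become NaN

# Categorical: one count image per category
count_per_category = {
    cat: np.histogram2d(x[category == cat], y[category == cat], bins=edges)[0]
    for cat in np.unique(category)
}

print(mean_per_pixel)
print(count_per_category)
```

In the categorical case, the per-category counts are what allow a pixel shared by several clusters to be colored as a mix rather than by whichever point happened to be drawn last.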

In [None]:
def create_datashaded_featuremap(x_data, x_dim, y_dim, color_data, color_var, 
                                xaxis_label, yaxis_label, width=300, height=300):
    is_categorical = (
        color_data.dtype.name in ['category', 'categorical', 'bool'] or
        np.issubdtype(color_data.dtype, np.object_) or
        np.issubdtype(color_data.dtype, np.str_)
    )
    
    if is_categorical:
        n_categories = len(np.unique(color_data))
        cmap = cc.b_glasbey_category10[:n_categories]
        colorbar = False
    else:
        cmap = 'viridis'
        colorbar = True
    
    plot = hv.Points(
        (x_data[:, x_dim], x_data[:, y_dim], color_data),
        [xaxis_label, yaxis_label], color_var
    )
    
    # Apply datashader based on data type
    if is_categorical:
        # For categorical data, count by category
        aggregator = ds.count_cat(color_var)
        plot = hd.rasterize(plot, aggregator=aggregator)
    else:
        # For continuous data, take the mean per pixel
        aggregator = ds.mean(color_var)
        plot = hd.rasterize(plot, aggregator=aggregator)
    
    # Apply dynamic spreading to make sparse regions more visible
    plot = hd.dynspread(plot, threshold=0.5)
    
    # Set plot options
    plot = plot.opts(
        cmap=cmap,
        colorbar=colorbar,
        tools=['hover'],
        frame_width=width,
        frame_height=height,
        title=f"{color_var}"
    )
    
    return plot

# Test with UMAP data and leiden clusters
umap_data = adata.obsm['X_umap']
color_data = adata.obs['leiden'].values

datashaded_plot = create_datashaded_featuremap(
    umap_data,
    x_dim=0,
    y_dim=1,
    color_data=color_data,
    color_var='leiden',
    xaxis_label='UMAP1',
    yaxis_label='UMAP2',
    width=300,
    height=300
)
datashaded_plot

This plot uses datashader to efficiently render many points, making the visualization scalable to large datasets.

### Step 4: Adding Labels Support

Now, let's add the ability to display labels at the median position for each cluster in categorical data. This is particularly helpful when we have more than a few clusters and a separate legend becomes difficult to visually map. We'll reuse our datashader function to create the featuremap plot and then layer the labels plot.

In [None]:
def create_labeled_featuremap(x_data, x_dim, y_dim, color_data, color_var, 
                             xaxis_label, yaxis_label, width=300, height=300):

    plot = create_datashaded_featuremap(
        x_data, x_dim, y_dim, color_data, color_var,
        xaxis_label, yaxis_label, width, height
    )
    
    # Only add labels for categorical data
    is_categorical = (
        color_data.dtype.name in ['category', 'categorical', 'bool'] or
        np.issubdtype(color_data.dtype, np.object_) or
        np.issubdtype(color_data.dtype, np.str_)
    )
    
    if is_categorical:
        # Calculate median positions for each category
        unique_categories = np.unique(color_data)
        labels_data = []
        
        for cat in unique_categories:
            # Find points in this category
            mask = color_data == cat
            # Calculate median position
            median_x = np.median(x_data[mask, x_dim])
            median_y = np.median(x_data[mask, y_dim])
            # Add to labels data
            labels_data.append((median_x, median_y, str(cat)))
        
        labels_element = hv.Labels(
            labels_data, 
            [xaxis_label, yaxis_label], 
            'Label'
        ).opts(
            text_font_size='8pt',
            text_color='black'
        )
        
        plot = plot * labels_element
    
    return plot

umap_data = adata.obsm['X_umap']
color_data = adata.obs['leiden'].values

labeled_plot = create_labeled_featuremap(
    umap_data,
    x_dim=0,
    y_dim=1,
    color_data=color_data,
    color_var='leiden',
    xaxis_label='UMAP1',
    yaxis_label='UMAP2',
    width=300,
    height=300
)
labeled_plot
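The label placement logic can also be tested on its own. The sketch below pulls the median computation into a small function, `median_label_positions`, which is a hypothetical name and not part of the tutorial's API, and runs it on made-up coordinates.

```python
import numpy as np

def median_label_positions(coords, categories, x_dim=0, y_dim=1):
    """Return (x, y, label) tuples at the median position of each category.

    Hypothetical helper that factors out the loop used above.
    """
    return [
        (
            float(np.median(coords[categories == cat, x_dim])),
            float(np.median(coords[categories == cat, y_dim])),
            str(cat),
        )
        for cat in np.unique(categories)
    ]

# Made-up embedding coordinates and cluster labels
coords = np.array([[0.0, 0.0], [2.0, 4.0], [4.0, 2.0], [10.0, 10.0], [12.0, 14.0]])
categories = np.array(["0", "0", "0", "1", "1"])

print(median_label_positions(coords, categories))
```

Using the median rather than the mean keeps a label close to the bulk of its cluster even when a few cells sit far away from it.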

### Step 5: Combining Functions into a Unified Single Plot API

Let's combine our previous functions into a single, flexible function that can handle all the options:

In [None]:
def create_featuremap_plot(
    x_data, color_data, x_dim, y_dim, color_var, xaxis_label, yaxis_label,
    width=300, height=300, datashading=True, labels=False,
    cont_cmap='viridis',
    cat_cmap=cc.b_glasbey_category10):
    """
    Create a comprehensive feature map plot with options for datashading and labels
    
    Parameters:
    - x_data: numpy.ndarray, shape n_obs by n_dimensions
    - color_data: numpy.ndarray, shape n_obs color values (categorical or continuous)
    - x_dim, y_dim: int, indices to use as x or y data
    - color_var: str, name to give the coloring dimension
    - xaxis_label, yaxis_label: str, labels for the axes
    - width, height: int, dimensions of the plot
    - datashading: bool, whether to apply datashader
    - labels: bool, whether to overlay labels at median positions
    - cont_cmap: str, colormap for continuous data
    - cat_cmap: list, colormap for categorical data
    """
    # Determine if the color data is categorical or continuous
    is_categorical = (
        color_data.dtype.name in ['category', 'categorical', 'bool'] or
        np.issubdtype(color_data.dtype, np.object_) or
        np.issubdtype(color_data.dtype, np.str_)
    )
    
    # Set colormap and options based on data type
    if is_categorical:
        n_unq_cat = len(np.unique(color_data))
        cmap = cat_cmap[:n_unq_cat]
        colorbar = False
        # Overlaid labels replace the legend
        show_legend = not labels
    else:
        cmap = cont_cmap
        show_legend = False
        colorbar = True
    
    plot = hv.Points(
        (x_data[:, x_dim], x_data[:, y_dim], color_data),
        [xaxis_label, yaxis_label], color_var
    )
    
    # Options for standard (non-datashaded) plot
    plot_opts = dict(
        color=color_var,
        cmap=cmap,
        size=1,
        alpha=0.5,
        colorbar=colorbar,
        padding=0,
        tools=['hover'],
        show_legend=show_legend,
        legend_position='right',
    )
    
    # Options for labels
    label_opts = dict(
        text_font_size='8pt',
        text_color='black'
    )
    
    # Apply datashading if requested
    if datashading:
        if is_categorical:
            # For categorical data, count by category
            aggregator = ds.count_cat(color_var)
            plot = hd.rasterize(plot, aggregator=aggregator)
            plot = hd.dynspread(plot, threshold=0.5)
            plot = plot.opts(cmap=cmap, tools=['hover'])
            
            if labels:
                # Add labels at median positions
                unique_categories = np.unique(color_data)
                labels_data = []
                for cat in unique_categories:
                    mask = color_data == cat
                    median_x = np.median(x_data[mask, x_dim])
                    median_y = np.median(x_data[mask, y_dim])
                    labels_data.append((median_x, median_y, str(cat)))
                labels_element = hv.Labels(labels_data, [xaxis_label, yaxis_label], 'Label').opts(**label_opts)
                plot = plot * labels_element
            else:
                # Create a custom legend for datashaded categorical plot
                unique_categories = np.unique(color_data)
                color_key = dict(zip(unique_categories, cmap[:len(unique_categories)]))
                legend_items = [
                    hv.Points([0,0], label=str(cat)).opts(
                        color=color_key[cat],
                        size=0
                    ) for cat in unique_categories
                ]
                legend = hv.NdOverlay({str(cat): item for cat, item in zip(unique_categories, legend_items)}).opts(
                    show_legend=True,
                    legend_position='right',
                    legend_limit=1000,
                    legend_cols=len(unique_categories) // 8 + 1,
                )
                plot = plot * legend
        else:
            # For continuous data, take the mean
            aggregator = ds.mean(color_var)
            plot = hd.rasterize(plot, aggregator=aggregator)
            plot = hd.dynspread(plot, threshold=0.5)
            plot = plot.opts(cmap=cmap, colorbar=colorbar)
    else:
        # Standard plot without datashading
        plot = plot.opts(**plot_opts)
        if is_categorical and labels:
            # Add labels for non-datashaded categorical plot
            unique_categories = np.unique(color_data)
            labels_data = []
            for cat in unique_categories:
                mask = color_data == cat
                median_x = np.median(x_data[mask, x_dim])
                median_y = np.median(x_data[mask, y_dim])
                labels_data.append((median_x, median_y, str(cat)))
            labels_element = hv.Labels(labels_data, [xaxis_label, yaxis_label], 'Label').opts(**label_opts)
            plot = plot * labels_element
    
    return plot.opts(
        title=f"{color_var}",
        tools=['hover'],
        show_legend=show_legend,
        frame_width=width,
        frame_height=height
    )

umap_data = adata.obsm['X_umap']
color_data = adata.obs['leiden'].values

unified_plot = create_featuremap_plot(
    umap_data,
    color_data,
    x_dim=0,
    y_dim=1,
    color_var='leiden',
    xaxis_label='UMAP1',
    yaxis_label='UMAP2',
    width=300,
    height=300,
    datashading=True,
    labels=True
)
unified_plot

### Step 6: Creating an Interactive App

Finally, let's create an interactive application using Panel. Our final interactive application will allow you to:

- Select different dimension reduction methods (PCA, UMAP, etc.)
- Choose which dimensions to display on x and y axes
- Color points by different variables (cluster assignments, gene expression, quality metrics)
- Toggle datashading for better performance and interpretability with large datasets
- Overlay legend labels on plot for categorical variables

Here are some of the HoloViz Panel concepts that we'll employ:

#### Reactive Programming Model
The application uses Panel's reactive programming model where changes to one component automatically trigger updates in dependent components. This is seen in how changing the dimension reduction method immediately updates the axis selectors and plot.

#### Widget Binding
We use `pn.bind()` to connect our plotting function to the widgets. This creates a reactive pipeline where any widget change automatically triggers a plot update. This declarative binding approach is much cleaner than manually handling events and updates.

#### Layout System
Panel provides a flexible layout system. Here, `pn.Row` and `pn.WidgetBox` organize the interface components, placing the widgets next to the plot.

#### Event Handling with param.watch
The app uses `.param.watch()` to observe changes to a widget parameter, here the `value` of the dimension reduction selector. This event-driven approach lets us respond to user interactions by updating related widgets, such as the axis selectors.

#### Decorator-enhanced Functions
The `@hold()` decorator is used to prevent intermediate redraws when updating multiple widget properties at once. This improves performance and user experience by batching updates.

In [None]:
def create_featuremap_app(
    adata,
    dim_reduction=None,
    color_by=None,
    datashade=True,
    width=300,
    height=300,
    labels=False,
    show_widgets=True,
):
    """
    Create a configurable feature map application
    
    Parameters:
    - adata: AnnData object
    - dim_reduction: str, initial dimension reduction method
    - color_by: str, initial coloring variable
    - datashade: bool, whether to enable datashading
    - width, height: int, dimensions of the plot
    - labels: bool, whether to show labels
    - show_widgets: bool, whether to show widgets
    
    Returns:
    - app: Panel application
    """
    
    dr_options = list(adata.obsm.keys())
    default_dr = dim_reduction or dr_options[0]
    
    color_options = list(adata.obs.columns)
    default_color = color_by or color_options[0]
    
    def get_dim_labels(dr_key):
        dr_label = dr_key.split('_')[1].upper()
        num_dims = adata.obsm[dr_key].shape[1]
        return [f"{dr_label}{i+1}" for i in range(num_dims)]
    
    initial_dims = get_dim_labels(default_dr)
    
    # Widgets
    dr_select = pn.widgets.Select(name='Reduction', options=dr_options, value=default_dr)
    x_axis = pn.widgets.Select(name='X-axis', options=initial_dims, value=initial_dims[0])
    y_axis = pn.widgets.Select(name='Y-axis', options=initial_dims, value=initial_dims[1])
    color = pn.widgets.Select(name='Color By', options=color_options, value=default_color)
    datashade_switch = pn.widgets.Checkbox(name='Datashader Rasterize', value=datashade)
    label_switch = pn.widgets.Checkbox(name='Overlay Legend Labels', value=labels)

    @hold()
    def reset_dimension_options(event):
        new_dims = get_dim_labels(event.new)
        x_axis.param.update(options=new_dims, value=new_dims[0])
        y_axis.param.update(options=new_dims, value=new_dims[1])
    
    # Connect update func to reduction widget
    dr_select.param.watch(reset_dimension_options, 'value')

    def create_plot(dr_key, x_value, y_value, color_value, datashade_value, label_value):
        x_data = adata.obsm[dr_key]
        dr_label = dr_key.split('_')[1].upper()
        
        if x_value == y_value:
            return pn.pane.Markdown("Please select different dimensions for X and Y axes.")
        
        # Extract indices from dimension labels
        try:
            x_dim = int(x_value.replace(dr_label, "")) - 1
            y_dim = int(y_value.replace(dr_label, "")) - 1
        except (ValueError, AttributeError):
            return pn.pane.Markdown(f"Error parsing dimensions. Make sure to select valid {dr_label} dimensions.")
        
        # Get color data from .obs or X cols
        try:
            color_data = adata.obs[color_value].values
        except KeyError:
            # Not an obs column: fall back to a gene's column of a sparse X
            try:
                color_data = adata.X.getcol(adata.var_names.get_loc(color_value)).toarray().flatten()
            except (KeyError, AttributeError):
                color_data = np.zeros(adata.n_obs)
                print(f"Warning: Could not find {color_value} in obs or var")
        
        return create_featuremap_plot(
            x_data,
            color_data,
            x_dim,
            y_dim,
            color_value,
            x_value,
            y_value,
            width=width,
            height=height,
            datashading=datashade_value,
            labels=label_value,
        )
    
    
    plot_pane = pn.bind(
        create_plot,
        dr_key=dr_select,
        x_value=x_axis,
        y_value=y_axis,
        color_value=color,
        datashade_value=datashade_switch,
        label_value=label_switch
    )
    
    widgets = pn.WidgetBox(
        dr_select,
        x_axis,
        y_axis,
        color,
        datashade_switch,
        label_switch,
        visible=show_widgets,
    )
    
    app = pn.Row(widgets, plot_pane)
    return app

app = create_featuremap_app(
    adata, 
    dim_reduction='X_umap', 
    color_by='leiden',
    width=300,
    height=300,
)
app
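One detail of the app that is easy to overlook is the round trip between axis labels and column indices of the embedding. The pure-Python sketch below isolates it; `dim_labels` and `label_to_index` are hypothetical names that mirror the string handling in `get_dim_labels` and `create_plot`.

```python
def dim_labels(dr_key, n_dims):
    """Build axis labels such as 'UMAP1', 'UMAP2' from an obsm key like 'X_umap'."""
    prefix = dr_key.split("_")[1].upper()
    return [f"{prefix}{i + 1}" for i in range(n_dims)]

def label_to_index(label, dr_key):
    """Invert `dim_labels`: 'UMAP2' -> 1 (zero-based column index)."""
    prefix = dr_key.split("_")[1].upper()
    return int(label.replace(prefix, "")) - 1

labels = dim_labels("X_pca", 3)
indices = [label_to_index(label, "X_pca") for label in labels]

print(labels)
print(indices)
```

This scheme assumes that every key in `adata.obsm` follows the `X_<name>` convention. A key without an underscore would raise an `IndexError` in `split("_")[1]`, so other entries in `.obsm` may need to be filtered out or renamed before they are offered in the reduction selector.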

### Conclusion
In this tutorial, we've built a simple but powerful feature map visualization tool for single-cell data stored in AnnData objects. We started with preprocessing the data and basic plotting, then incrementally added more sophisticated features and created an interactive app.