# Haplotype Network Plotting Examples
This notebook demonstrates the `plot_haplotype_network` function from the `malariagen_data` package, showcasing different ways to use the `color` parameter to visualize haplotype networks.

In [None]:
import malariagen_data

## Ag3

In [None]:
# Initialize Ag3 instance
ag3 = malariagen_data.Ag3(
    "simplecache::gs://vo_agam_release_master_us_central1",
    simplecache=dict(cache_storage="../gcs_cache"),
    debug=False,
)
ag3

N.B., manually specifying the `server_port` parameter does not seem to be necessary on Colab. It is needed when running locally in a Jupyter notebook, however: without it you get an "Address already in use" error and cannot run multiple plots in the same notebook.
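If you hit the "Address already in use" error locally, one option is to choose an unused port yourself. The sketch below uses only the Python standard library to ask the operating system for a free TCP port. `find_free_port` is a hypothetical helper and is not part of `malariagen_data`. Passing the result as `server_port` follows the note above, but that call is left commented out because it needs a live Ag3 instance and network access.

```python
import socket


def find_free_port() -> int:
    """Return a TCP port on localhost that is currently unused.

    Hypothetical helper (not part of malariagen_data): binding to port 0
    asks the operating system to assign any free port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


port = find_free_port()
print(port)

# Assumed usage, following the N.B. above (requires the Ag3 instance):
# ag3.plot_haplotype_network(
#     region="2L:2,358,158-2,431,617",
#     analysis="gamb_colu",
#     sample_sets="3.0",
#     color="country",
#     max_dist=2,
#     server_port=port,
# )
```

The port is released when the socket closes, so another process could in principle claim it before the plot starts. In a single interactive session this is rarely a problem.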

## Example 1: Direct Column Name (String)
Pass the name of a sample metadata column, here `"country"`, to color nodes by country.

In [None]:
# Plot haplotype network with country coloring
ag3.plot_haplotype_network(
    region="2L:2,358,158-2,431,617",
    analysis="gamb_colu",
    sample_sets="3.0",
    sample_query="taxon == 'coluzzii'",
    color="country",
    max_dist=2,
)

## Example 2: Cohorts Prefix (String)
In this example, `"admin1_iso"` is used, which the function interprets as `"cohorts_admin1_iso"`, a column typically available in cohort-annotated metadata.

In [None]:
ag3.plot_haplotype_network(
    region="2L:2,358,158-2,431,617",
    analysis="gamb_colu",
    sample_query="taxon == 'coluzzii'",
    sample_sets="3.0",
    color="admin1_iso",  # Implies "cohorts_admin1_iso"
    max_dist=2,
)
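The prefix fallback described above can be pictured with a small sketch. `resolve_color_column` is a hypothetical helper that illustrates only the rule stated in this notebook: use the column if it exists, otherwise try the `cohorts_`-prefixed name. It is not the library's actual implementation, and the column names in the example are invented.

```python
def resolve_color_column(columns, color):
    """Hypothetical illustration of the column lookup described above.

    Returns `color` if it is a metadata column, otherwise falls back to the
    "cohorts_"-prefixed name, and raises ValueError if neither exists.
    """
    if color in columns:
        return color
    prefixed = "cohorts_" + color
    if prefixed in columns:
        return prefixed
    raise ValueError(f"Neither {color!r} nor {prefixed!r} is a metadata column.")


# Invented column list, standing in for cohort-annotated sample metadata.
columns = ["sample_id", "country", "cohorts_admin1_iso", "cohorts_year"]

print(resolve_color_column(columns, "country"))     # prints: country
print(resolve_color_column(columns, "admin1_iso"))  # prints: cohorts_admin1_iso
```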

## Example 3: Custom Groups (Dictionary)
This example uses a dictionary to define custom color groups based on conditions applied to the `"country"` column.

In [None]:
color_mapping = {
    "Ghana": "country == 'Ghana'",
    "Other": "country != 'Ghana'"
}
ag3.plot_haplotype_network(
    region="2L:2,358,158-2,431,617",
    analysis="gamb_colu",
    sample_query="taxon == 'coluzzii'",
    sample_sets="3.0",
    color=color_mapping,
    max_dist=2,
)
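In `color_mapping`, each key becomes a group label and each value is a query string over the sample metadata. Assuming the strings are evaluated like pandas `DataFrame.query` expressions (an assumption about the internals, not confirmed here), you can preview the grouping on a toy table before plotting. The sample IDs and countries below are invented.

```python
import pandas as pd

# Toy stand-in for sample metadata; all values are invented.
df_samples = pd.DataFrame(
    {
        "sample_id": ["S1", "S2", "S3", "S4"],
        "country": ["Ghana", "Burkina Faso", "Ghana", "Mali"],
    }
)

color_mapping = {
    "Ghana": "country == 'Ghana'",
    "Other": "country != 'Ghana'",
}

# Evaluate each query and record the group label assigned to each sample.
# If queries overlapped, later labels would overwrite earlier ones here.
groups = {}
for label, query in color_mapping.items():
    for sample_id in df_samples.query(query)["sample_id"]:
        groups[sample_id] = label

print(groups)
```

The two queries are complements, so every sample lands in exactly one group.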

## Example 4: Default Coloring (None)
Setting `color=None` applies the default coloring scheme, typically a single uniform color across all nodes.

In [None]:
ag3.plot_haplotype_network(
    region="2L:2,358,158-2,431,617",
    analysis="gamb_colu",
    sample_query="taxon == 'coluzzii'",
    sample_sets="3.0",
    color=None,
    max_dist=2,
)

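## Example 5: External Server Mode
This repeats Example 1 but sets `server_mode="external"`. Rather than rendering the plot inline in the notebook output, this serves it externally, typically at a URL you open in a separate browser tab. That can help in environments where inline rendering does not work.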

In [None]:
ag3.plot_haplotype_network(
    region="2L:2,358,158-2,431,617",
    analysis="gamb_colu",
    sample_query="taxon == 'coluzzii'",
    sample_sets="3.0",
    color="country",
    max_dist=2,
    server_mode="external",
)

## Af1

In [None]:
# Initialize Af1 instance
af1 = malariagen_data.Af1(
    "simplecache::gs://vo_afun_release_master_us_central1",
    simplecache=dict(cache_storage="../gcs_cache"),
    debug=False,
)
af1

Here, nodes are colored based on the `"sample_set"` column.

In [None]:
af1.plot_haplotype_network(
    region="2RL:2,358,158-2,431,617",
    sample_query="country == 'Ghana'",
    sample_sets="1.0",
    color="sample_set",
    max_dist=2,
    height=500,
    width="90%",
)

Passing `"year"` means the function looks for a `"cohorts_year"` column in the metadata.

In [None]:
af1.plot_haplotype_network(
    region="2RL:2,358,158-2,431,617",
    sample_query="country == 'Ghana'",
    sample_sets="1.0",
    color="year",  # Implies "cohorts_year"
    max_dist=2,
)

A dictionary defines custom groups based on the `"year"` column (assuming year data is available).

In [None]:
color_mapping = {
    "2012": "year == 2012",
    "2014": "year == 2014"
}
af1.plot_haplotype_network(
    region="2RL:2,358,158-2,431,617",
    sample_query="country == 'Ghana'",
    sample_sets="1.0",
    color=color_mapping,
    max_dist=2,
)
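The mapping above names only 2012 and 2014, so samples from any other year match neither query. To see which samples would be left out, evaluate the queries on the metadata before plotting. The sketch below does this on an invented toy table and again assumes pandas-style query strings. In practice you would substitute the real sample metadata, for example the DataFrame returned by `af1.sample_metadata(...)`.

```python
import pandas as pd

# Invented toy metadata; in practice, use the real sample metadata instead.
df_samples = pd.DataFrame(
    {
        "sample_id": ["A", "B", "C", "D"],
        "year": [2012, 2013, 2014, 2014],
    }
)

color_mapping = {
    "2012": "year == 2012",
    "2014": "year == 2014",
}

# Collect every sample matched by at least one query.
matched = set()
for query in color_mapping.values():
    matched.update(df_samples.query(query)["sample_id"])

# Samples not covered by any group label.
unmatched = sorted(set(df_samples["sample_id"]) - matched)
print(unmatched)  # ['B']
```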

With `color=None`, the default coloring is applied.

In [None]:
af1.plot_haplotype_network(
    region="2RL:2,358,158-2,431,617",
    sample_query="country == 'Ghana'",
    sample_sets="1.0",
    color=None,
    max_dist=2,
)