In [None]:
# Genome Visualization with pygenomeviz

This notebook visualizes gene annotations across multiple genomes using the [pygenomeviz](https://github.com/moshi4/pygenomeviz) library (version **0.4.4**). It reads gene coordinates and functional annotations from a CSV file and draws a comparative plot with one track per genome, where each gene is colored by its annotation.


In [None]:
## Requirements

- Python ≥ 3.7  
- pandas  
- matplotlib  
- pygenomeviz==0.4.4  
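
To check whether a compatible version is already installed, you can query the package metadata. Below is a minimal sketch that uses the standard library's `importlib.metadata`, which requires Python 3.8 or later (on 3.7, the `importlib_metadata` backport provides the same functions). `installed_version` and `REQUIRED_PYGENOMEVIZ` are hypothetical names introduced here and are not part of pygenomeviz:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Version this notebook was written against (hypothetical constant name)
REQUIRED_PYGENOMEVIZ = "0.4.4"


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent.

    Hypothetical helper, not part of pygenomeviz.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("pygenomeviz")
if found is None:
    print("pygenomeviz is not installed.")
elif found != REQUIRED_PYGENOMEVIZ:
    print(f"pygenomeviz {found} is installed; this notebook targets {REQUIRED_PYGENOMEVIZ}.")
else:
    print(f"pygenomeviz {found} is installed.")
```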

You can install the required version of pygenomeviz with:

```bash
pip install pygenomeviz==0.4.4
```


In [None]:

## Input Data

The input CSV file `filtered_genomes_in_clade.csv` must contain one row per gene/feature and the following columns:

- `genome`: Identifier for each genome
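- `genome_length`: Length of the genome sequence, in the same coordinate units as `start` and `end`. The value must be identical on every row of a genome, because the plotting code reads only the first row's value.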
- `start`: Start position of the gene/feature
- `end`: End position of the gene/feature
- `strand`: Strand direction (`+` or `-`)
- `annotation_description`: Functional description of the gene. It sets the feature's color and its legend entry. Features with an empty value are drawn in gray.
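
The CSV stores the strand as `+`/`-`. This sketch assumes that pygenomeviz 0.4.4's `add_feature` expects an integer strand (`1` for forward, `-1` for reverse), as in the 0.4.x API docstrings. Check the 0.4.4 documentation to confirm this. A small conversion step bridges the two conventions. `to_pgv_strand` is a hypothetical helper name:

```python
def to_pgv_strand(value) -> int:
    """Map a CSV strand value to the integer convention assumed for pygenomeviz.

    Assumption: forward = 1, reverse = -1. Accepts '+'/'-' as well as values
    that are already numeric (1, -1, '1', '-1', 1.0, -1.0).
    """
    text = str(value).strip()
    if text in ("+", "1", "1.0"):
        return 1
    if text in ("-", "-1", "-1.0"):
        return -1
    raise ValueError(f"Unrecognized strand value: {value!r}")


print([to_pgv_strand(s) for s in ["+", "-", 1, -1]])  # [1, -1, 1, -1]
```

Inside the plotting loop, this helper would be called as `strand = to_pgv_strand(row['strand'])`. Raising an error on unknown values, instead of silently defaulting to one strand, makes malformed rows easy to spot.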

To plot your own data, replace the file name in the `pd.read_csv` call below with the path to a CSV file that has these columns.
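
Before plotting, it can help to check a table against the column list above. The sketch below shows one way to do that with pandas. `validate_genome_table` is a hypothetical helper. The checks it performs are consistent `genome_length` per genome, `+`/`-` strands, and `start <= end <= genome_length`. These are assumptions about what a well-formed input looks like, not requirements stated by pygenomeviz. The two-row table is made up for illustration.

```python
import io

import pandas as pd

REQUIRED_COLUMNS = ["genome", "genome_length", "start", "end", "strand", "annotation_description"]


def validate_genome_table(df: pd.DataFrame) -> list:
    """Return a list of human-readable problems found in `df` (empty if none).

    Hypothetical helper; the checks encode assumptions about well-formed input.
    """
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    # The plotting cell reads genome_length from the first row of each genome,
    # so every row of a genome should carry the same value.
    lengths_per_genome = df.groupby("genome")["genome_length"].nunique()
    inconsistent = lengths_per_genome[lengths_per_genome > 1].index.tolist()
    if inconsistent:
        problems.append(f"inconsistent genome_length for: {inconsistent}")

    bad_strand = ~df["strand"].isin(["+", "-"])
    if bad_strand.any():
        problems.append(f"{int(bad_strand.sum())} rows with strand not '+' or '-'")

    out_of_range = (df["start"] > df["end"]) | (df["end"] > df["genome_length"])
    if out_of_range.any():
        problems.append(f"{int(out_of_range.sum())} rows with start > end or end > genome_length")
    return problems


# Hypothetical two-row table, for illustration only (second row has no annotation)
sample = pd.read_csv(io.StringIO(
    "genome,genome_length,start,end,strand,annotation_description\n"
    "genomeA,5000,100,900,+,example function\n"
    "genomeA,5000,1200,2000,-,\n"
))
print(validate_genome_table(sample))  # prints [] for this sample
```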


In [None]:

# Required libraries
import pandas as pd
import matplotlib.pyplot as plt
from pygenomeviz import GenomeViz  # Requires version 0.4.4
import itertools
from matplotlib.colors import to_hex

# Load processed genome data
# This CSV should include columns: genome, genome_length, start, end, strand, annotation_description
final_df = pd.read_csv("filtered_genomes_in_clade.csv")

gv = GenomeViz(tick_style="axis")

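# Assign one color per distinct annotation. tab20 provides 20 colors; because
# itertools.cycle restarts the sequence, colors repeat once there are more
# than 20 distinct annotations.
unique_annotations = final_df['annotation_description'].dropna().unique()
color_map = itertools.cycle(plt.get_cmap("tab20").colors)
annotation_colors = {annotation: to_hex(next(color_map)) for annotation in unique_annotations}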

# Add one track per genome
for genome in final_df['genome'].unique():
    genome_df = final_df[final_df['genome'] == genome]

    # Extract genome length (assumes consistency across rows)
    genome_length = int(genome_df['genome_length'].iloc[0])

    # Create a track for the genome without displaying a label
    track = gv.add_feature_track(genome, genome_length, labelsize=0)

    # Add each gene/feature to the track
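    for _, row in genome_df.iterrows():
        start = int(row['start'])
        end = int(row['end'])
        # pygenomeviz expects the strand as an integer (1 = forward, -1 = reverse),
        # while the CSV stores '+' / '-'; any value other than '+' or 1 is treated as reverse
        strand = 1 if row['strand'] in ("+", 1, "1") else -1
        annotation = row['annotation_description']
        # Features without an annotation are drawn in gray
        facecolor = annotation_colors.get(annotation, "gray") if pd.notna(annotation) else "gray"
        edgecolor = "black"
        track.add_feature(start, end, strand, facecolor=facecolor, edgecolor=edgecolor)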

fig = gv.plotfig()

# Create a custom legend for annotation descriptions
legend_elements = [
    plt.Line2D([0], [0], color=color, lw=4, label=annotation)
    for annotation, color in annotation_colors.items()
]

# Add the legend below the plot
legend = fig.legend(
    handles=legend_elements,
    loc="center",
    bbox_to_anchor=(0.580, -0.35),
    ncol=4,
    fontsize=18,
    title="Marker gene functions",
    title_fontsize=18,
)

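# Optional: left-align the legend title. The x offset of -99 is hand-tuned for
# this figure and needs adjusting if the figure size, font size, or ncol changes.
# (With matplotlib >= 3.6, passing alignment="left" to fig.legend is a simpler alternative.)
if legend and legend.get_title():
    legend.get_title().set_ha("left")
    legend.get_title().set_position((-99, 0))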

# Leave space at the bottom of the figure for the legend
fig.subplots_adjust(bottom=0.6)
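
In [None]:
The legend is anchored at `bbox_to_anchor=(0.580, -0.35)`, which places it below the bottom edge of the figure. Jupyter's inline display saves figures with `bbox_inches="tight"` by default, so the legend is visible in the notebook. A plain `savefig` call, however, keeps the original figure bounds and can cut the legend off. Passing `bbox_inches="tight"` expands the saved area to include every drawn artist. The self-contained matplotlib sketch below demonstrates this without pygenomeviz. The legend labels and colors are placeholders, and the `demo_` names avoid overwriting the notebook's `fig`.

```python
import io

from matplotlib.figure import Figure
from matplotlib.lines import Line2D

demo_fig = Figure(figsize=(6, 2))
demo_ax = demo_fig.add_subplot()
demo_ax.plot([0, 1], [0, 1])

# Legend anchored below the figure's bottom edge (negative y in figure
# coordinates), mirroring the layout used in the plotting cell above.
handles = [
    Line2D([0], [0], color=color, lw=4, label=label)
    for label, color in [("function A", "#1f77b4"), ("function B", "#ff7f0e")]
]
demo_fig.legend(handles=handles, loc="center", bbox_to_anchor=(0.5, -0.35), ncol=2)

# bbox_inches="tight" resizes the saved area to cover every artist,
# including the legend that lies outside the original figure bounds.
buf = io.BytesIO()
demo_fig.savefig(buf, format="png", bbox_inches="tight")
png_bytes = buf.getvalue()
print(png_bytes[:8] == b"\x89PNG\r\n\x1a\n")  # True
```

For the figure returned by `gv.plotfig()`, the equivalent call would be `fig.savefig("genome_plot.png", dpi=300, bbox_inches="tight")`, where the file name and resolution are only examples.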
