<a href="https://colab.research.google.com/github/pachterlab/LSCHWCP_2023/blob/main/Notebooks/krona_plot/generate_krona_plot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Create a Krona plot of all viruses across all cells, grouped by animal and timepoint

In [None]:
!pip install -q anndata
import anndata
import pandas as pd
import numpy as np

def nd(arr):
    """
    Flatten a NumPy matrix (or any array-like) into a 1-D ndarray.
    """
    return np.asarray(arr).reshape(-1)
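
The helper is needed because summing an `np.matrix` (which SciPy sparse matrices can return from `.sum(axis=0)`) yields a 2-D object rather than a flat array. A minimal sketch with a toy matrix; the values are invented for illustration:

```python
import numpy as np

def nd(arr):
    """Flatten a NumPy matrix (or any array-like) into a 1-D ndarray."""
    return np.asarray(arr).reshape(-1)

# Toy 2 x 3 count matrix (cells x viruses); values are made up
toy = np.matrix([[1, 0, 2],
                 [3, 0, 1]])

column_sums = toy.sum(axis=0)  # still 2-D: shape (1, 3)
flat_sums = nd(column_sums)    # flattened: shape (3,)

print(column_sums.shape, flat_sums.shape)
print(flat_sums.tolist())
```

The flat array can then be zipped directly with the variable index, which is how the per-virus totals are built further down.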

___
# Install Krona
Also see:  
https://github.com/marbl/Krona/wiki/Installing  
https://github.com/marbl/Krona/wiki/Importing-text-and-XML-data

In [None]:
# Download release
!wget https://github.com/marbl/Krona/releases/download/v2.8.1/KronaTools-2.8.1.tar
# Untar
!tar -xvf KronaTools-2.8.1.tar
# Install
!cd KronaTools-2.8.1 && ./install.pl --prefix ./

# Path to Krona tool
ktImportText = "/content/KronaTools-2.8.1/bin/ktImportText"

___
# Load data
The count matrix was generated as shown [here](https://github.com/pachterlab/LSCHWCP_2023/tree/main/Notebooks/align_macaque_PBMC_data/7_virus_host_captured_dlist_cdna_dna).

In [None]:
# Download count matrix from Caltech Data


In [None]:
# anndata.read is a deprecated alias; read_h5ad is the explicit reader for .h5ad files
palmdb_adata = anndata.read_h5ad("virus_host-captured_dlist_cdna_dna.h5ad")
palmdb_adata

Only keep macaque viruses and cells that passed QC:

In [None]:
palmdb_adata = palmdb_adata[palmdb_adata.obs["celltype"].notnull(), palmdb_adata.var["v_type"] != "below_threshold"].copy()
palmdb_adata
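
AnnData indexing takes a row (cell) mask and a column (variable) mask. As a rough illustration of what the filter above selects, the sketch below applies the same pair of masks to plain pandas/NumPy objects. The toy `obs`, `var`, and `X` are invented; only the column names `celltype` and `v_type` and the label `below_threshold` come from the notebook.

```python
import numpy as np
import pandas as pd

# Toy per-cell metadata: a cell without a cell type annotation has a missing value
obs = pd.DataFrame({"celltype": ["T cell", None, "B cell"]},
                   index=["cell_0", "cell_1", "cell_2"])

# Toy per-virus metadata; "above_threshold" is a made-up label
var = pd.DataFrame({"v_type": ["above_threshold", "below_threshold"]},
                   index=["u1", "u2"])

# Toy count matrix (cells x viruses)
X = np.array([[5, 1],
              [2, 0],
              [0, 3]])

cell_mask = obs["celltype"].notnull().to_numpy()
virus_mask = (var["v_type"] != "below_threshold").to_numpy()

# np.ix_ selects the cross product of kept rows and kept columns
X_filtered = X[np.ix_(cell_mask, virus_mask)]
print(X_filtered)
```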

Load ID to phylogeny mapping:

In [None]:
# Load virus ID to taxonomy mapping
!wget https://raw.githubusercontent.com/pachterlab/LSCHWCP_2023/main/PalmDB/ID_to_taxonomy_mapping.csv
id2tax = pd.read_csv("ID_to_taxonomy_mapping.csv")

# Drop columns not needed here and drop taxonomy duplicates
id2tax = id2tax.drop(columns=["ID", "strandedness"])
id2tax = id2tax.drop_duplicates()

id2tax
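
To see why `drop_duplicates` is needed after dropping the `ID` column, consider a toy mapping in which two sequence IDs share one representative ID. That several IDs can map to the same `rep_ID` is an assumption made here for illustration, and all values below are invented:

```python
import pandas as pd

# Hypothetical mapping: "a1" and "a2" share the representative ID "u1"
toy_map = pd.DataFrame({
    "ID": ["a1", "a2", "b1"],
    "rep_ID": ["u1", "u1", "u2"],
    "species": ["virus A", "virus A", "."],
    "strandedness": ["+ssRNA", "+ssRNA", "-ssRNA"],
})

# Without the ID column, the first two rows are identical and collapse into one
toy_tax = toy_map.drop(columns=["ID", "strandedness"]).drop_duplicates()
print(toy_tax)
```

This leaves one taxonomy row per `rep_ID`, so mapping counts onto `rep_ID` later does not count the same virus more than once.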

___
# Generate Krona html

In [None]:
%%time
master = pd.DataFrame()
for idx, timepoint in enumerate(palmdb_adata.obs["dpi_clean_merged"].unique()):
    for animal_id in palmdb_adata.obs[palmdb_adata.obs["dpi_clean_merged"] == timepoint]["donor_animal"].unique():
        adata_temp = palmdb_adata[(palmdb_adata.obs["dpi_clean_merged"] == timepoint) & (palmdb_adata.obs["donor_animal"] == animal_id), :]

        # Add total number of counts (across all cells) for each virus ID to phylogeny data temp
        virus_ids = adata_temp.var.index.values
        total_counts = nd(adata_temp.X.sum(axis=0))
        total_count_dict = dict(zip(virus_ids, total_counts))

        phylogeny_data_temp = id2tax.copy()
        phylogeny_data_temp['total_count'] = phylogeny_data_temp['rep_ID'].map(total_count_dict)

        # Drop viruses not in filter list
        phylogeny_data_temp = phylogeny_data_temp.dropna()

        # Remove irrelevant columns and change order of columns
        phylogeny_data_temp = phylogeny_data_temp[["total_count", "phylum", "class", "order", "family", "genus", "species", "rep_ID"]]

        # Replace dots with NaN
        phylogeny_data_temp = phylogeny_data_temp.replace(".", np.nan)

        # Add column with timepoint
        phylogeny_data_temp["timepoint"] = timepoint

        # Add column with animal id
        phylogeny_data_temp["animal_id"] = animal_id

        # Append to master dataframe (master starts empty, so the first non-empty result initializes it)
        if master.empty:
            master = phylogeny_data_temp.copy()
        else:
            master = pd.concat([master, phylogeny_data_temp])

# Save counts + taxonomy data to a tab-separated txt file (no header, no index)
master.to_csv("krona.txt", sep="\t", header=False, index=False)

# Generate Krona plot
krona_out = "krona.html"
!$ktImportText krona.txt -o $krona_out -n "Virus-positive cells"
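
For reference, `ktImportText` reads one line per entry: a quantity followed by tab-separated hierarchy levels, outermost first (see the Krona wiki page linked above). The sketch below reproduces the map, `dropna`, and dot-replacement steps of the loop on toy data and prints the resulting rows. All IDs, taxa, and counts are invented, and only a subset of the taxonomy columns is used to keep it short.

```python
import io

import numpy as np
import pandas as pd

# Toy taxonomy table; "." marks an unassigned rank, as in the mapping file
tax = pd.DataFrame({
    "rep_ID": ["u1", "u2", "u3"],
    "phylum": ["Pisuviricota", "Negarnaviricota", "."],
    "species": ["virus A", ".", "."],
})

# Toy total counts per virus ID; "u2" is absent, e.g. because it was filtered out
counts = {"u1": 12.0, "u3": 4.0}

tax["total_count"] = tax["rep_ID"].map(counts)
tax = tax.dropna()  # drops u2, which has no count
tax = tax[["total_count", "phylum", "species", "rep_ID"]].replace(".", np.nan)

# Write to an in-memory buffer instead of krona.txt to inspect the rows
buffer = io.StringIO()
tax.to_csv(buffer, sep="\t", header=False, index=False)
lines = buffer.getvalue().splitlines()
for line in lines:
    print(repr(line))
```

Because `dropna` runs before the dots are replaced, only viruses without a count are removed; unassigned ranks survive and are written as empty fields.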

You can view the pre-computed Krona plot [here](https://htmlpreview.github.io/?https://github.com/pachterlab/LSCHWCP_2023/blob/main/krona_plot.html).