# Notebook title

Notebook description



<!-- replace template-for-colab in the url with whatever this notebook is called -->
<a href="https://githubtocolab.com/harmslab/topiary-examples/blob/main/notebooks/template-for-colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Initialize environment
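
Each setup cell below repeats the same check for whether the notebook is running in Google Colab. As a minimal sketch, that check could be wrapped in one helper. The name `running_in_colab` is hypothetical and not part of topiary. Unlike the cells below, this version lets any unexpected exception propagate unchanged rather than re-raising it with a custom message.

```python
def running_in_colab():
    """Return True if the google.colab module can be imported.

    Hypothetical helper (not part of topiary) that mirrors the
    try/except pattern used in the setup cells of this notebook.
    """
    try:
        import google.colab  # noqa: F401 -- only present inside Colab
    except ImportError:
        # Also covers ModuleNotFoundError, which subclasses ImportError
        return False
    return True

RUNNING_IN_COLAB = running_in_colab()
```

The cells below are kept self-contained on purpose, so that any of them can be deleted when running locally. A helper like this is only worth adopting if that constraint does not apply to your copy of the notebook.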

In [None]:
### THIS CELL SETS UP TOPIARY IN A GOOGLE COLAB ENVIRONMENT. 
### IF RUNNING THIS NOTEBOOK LOCALLY, IT MAY BE SAFELY DELETED.

#@title Install software

#@markdown #### Installation requires two steps.

#@markdown 1. Install the software by pressing the _Play_ button on the left.
#@markdown Please be patient. This will take several minutes. <font color='teal'>
#@markdown After the installation is complete, the kernel will reboot 
#@markdown and Colab will complain that the session crashed. This is normal.</font>
#@markdown <br/>
#@markdown 2. After this cell runs, run the "Initialize environment" cell that follows.

try:
    import google.colab
    RUNNING_IN_COLAB = True
except ImportError:
    RUNNING_IN_COLAB = False
except Exception as e: 
    err = "Could not figure out if running in a colab notebook\n"
    raise Exception(err) from e

if RUNNING_IN_COLAB:

    import os
    os.chdir("/content/")

    import urllib.request
    urllib.request.urlretrieve("https://raw.githubusercontent.com/harmslab/topiary-examples/main/notebooks/colab_installer.py",
                              "colab_installer.py")

    import colab_installer
    colab_installer.install_topiary(install_raxml=False,
                                    install_generax=False)

In [None]:
### IF YOU ARE RUNNING LOCALLY, make sure you installed topiary and
### make sure you activated the topiary conda environment. (If you
### did not start this notebook within that environment, close the
### session, activate the topiary environment, and restart). 

import topiary
import numpy as np
import pandas as pd 

### EVERYTHING AFTER THIS LINE IS USED TO SET UP TOPIARY IN A GOOGLE
### COLAB ENVIRONMENT. IF RUNNING THIS NOTEBOOK LOCALLY, THE LINES BELOW
### IN THIS CELL MAY BE SAFELY DELETED. 

#@title Initialize environment

#@markdown  Run this cell to initialize the environment after installation.
#@markdown (This cell can also be run if the kernel dies during a calculation,
#@markdown allowing you to reload modules without having to
#@markdown reinstall.) Re-run this cell if you have to re-run any subsequent
#@markdown cells so that your calculations are in the correct directory.

#@markdown We recommend setting up a working directory on your google drive. This is a 
#@markdown convenient way to pass files to topiary and will allow you to save
#@markdown your work. For example, if you type `topiary_work` into the form
#@markdown field below, topiary will save all of its calculations in the 
#@markdown `topiary_work` directory in MyDrive (i.e. the top directory at
#@markdown https://drive.google.com). This script will create the directory if 
#@markdown it does not already exist. If the directory already exists, any files
#@markdown that are already in that directory will be available to topiary. You could, 
#@markdown for example, put a file called `seed.csv` in `topiary_work` and then
#@markdown access it as "seed.csv" in all cells below.
#@markdown <br/><br/>
#@markdown Note: Google may prompt you for permission to access the drive. 
#@markdown To work in a temporary colab environment, leave this blank. 

# Select a working directory on google drive
google_drive_directory = "" #@param {type:"string"}

try:
    import google.colab
    RUNNING_IN_COLAB = True
except ImportError:
    RUNNING_IN_COLAB = False
except Exception as e: 
    err = "Could not figure out if running in a colab notebook\n"
    raise Exception(err) from e

if RUNNING_IN_COLAB:

    import os
    os.chdir("/content/")

    topiary._in_notebook = "colab"
    import colab_installer
    colab_installer.initialize_environment()
    colab_installer.mount_google_drive(google_drive_directory)

In [None]:
### IF RUNNING LOCALLY: set `seed_spreadsheet_file` to point to your desired csv or xlsx file. 
### Alternatively, set `seed_df` to a pandas dataframe holding the
### seed dataset. 

seed_spreadsheet_file = "https://raw.githubusercontent.com/harmslab/topiary-examples/main/data/ly86-ly96.csv"
seed_df = None

# -----------------------------------------------------------------------------
# COLAB SPECIFIC BLOCK

#@title Load seed dataset

#@markdown Before running this cell, do one of the following: 
#@markdown + Specify a file containing a seed dataset in your working
#@markdown directory (your google drive specified above).
#@markdown The default input file is an example LY86/LY96 seed dataset.
#@markdown + Clear the file field and select `upload_file` to upload a file directly from your computer. 

try:
    import google.colab
    RUNNING_IN_COLAB = True
except ImportError:
    RUNNING_IN_COLAB = False
except Exception as e: 
    err = "Could not figure out if running in a colab notebook\n"
    raise Exception(err) from e

if RUNNING_IN_COLAB:

    seed_spreadsheet_file = "https://raw.githubusercontent.com/harmslab/topiary-examples/main/data/ly86-ly96.csv" #@param {type:"string"}
    upload_file = False #@param {type:"boolean"}

    if isinstance(seed_spreadsheet_file,str):
        seed_spreadsheet_file = seed_spreadsheet_file.strip()

    if seed_spreadsheet_file != "" and upload_file:
        err = "Please give a seed_spreadsheet_file OR select upload file\n"
        raise ValueError(err)

    if seed_spreadsheet_file == "" and not upload_file:
        err = "Please either give a seed_spreadsheet_file or select upload file\n"
        raise ValueError(err)

    if upload_file:

        try:
            from google.colab import files
            uploaded_files = files.upload()
            keys = list(uploaded_files.keys())
            seed_spreadsheet_file = keys[0] #uploaded_files[keys[0]]
        except ImportError:
            pass

# END COLAB SPECIFIC BLOCK
# -----------------------------------------------------------------------------

# Read seed_df from the input file
if seed_df is None:

    # Try to read as csv first, then fall back to excel
    try:
        seed_df = pd.read_csv(seed_spreadsheet_file)
    except Exception:
        try:
            seed_df = pd.read_excel(seed_spreadsheet_file)
        except Exception as e:
            err = f"Could not read {seed_spreadsheet_file}. This should be a csv or xlsx file\n"
            raise ValueError(err) from e

seed_df 

## Tests

<font color='red'>This cell and below should be deleted when using this as a template notebook. (Note: all tests should run on both colab and a local computer.)</font>

### Generate an alignment from a seed dataframe

In [None]:
df_location = "https://raw.githubusercontent.com/harmslab/topiary-examples/main/data/example-seed.csv"
df = pd.read_csv(df_location)
df

In [None]:
topiary.seed_to_alignment(df_location)

### Get ancestors given an alignment

In [None]:
tiny_df_location = "https://raw.githubusercontent.com/harmslab/topiary-examples/main/data/tiny-phylo/initial-input/dataframe.csv"

topiary.alignment_to_ancestors(tiny_df_location,out_dir="ali-to-anc")

In [None]:
topiary.pipeline.bootstrap_reconcile("ali-to-anc",2,overwrite=True)

### Look at ancestral reconstruction output
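
The cells below print the ancestor FASTA file line by line. To work with the sequences as Python objects instead, the following is a minimal sketch of a FASTA parser. The `read_fasta` function is a hypothetical helper, not part of topiary, and the example sequences are invented for illustration.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a {header: sequence} dictionary.

    Hypothetical helper (not part of topiary). `lines` can be any iterable
    of strings, including an open file handle.
    """
    sequences = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # New record: drop the leading ">" and start collecting sequence
            header = line[1:]
            sequences[header] = []
        elif header is not None:
            # Sequence data may wrap over several lines
            sequences[header].append(line)
    return {name: "".join(parts) for name, parts in sequences.items()}

# Made-up example records, for illustration only
example = [">anc1\n", "MKT\n", "LLV\n", ">anc2\n", "MRT\n"]
print(read_fasta(example))  # {'anc1': 'MKTLLV', 'anc2': 'MRT'}
```

To use this on real output, pass an open file handle for the FASTA file, for example `read_fasta(f)` inside a `with open(...) as f:` block.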

In [None]:
topiary.draw.tree("ali-to-anc/05_reconcile-bootstraps")

In [None]:
anc_fasta = "ali-to-anc/03_ancestors/output/reconciled-tree_ancestors/ancestors.fasta"
with open(anc_fasta) as f:
    for line in f:
        print(line,end="")

In [None]:
df = pd.read_csv("ali-to-anc/03_ancestors/output/reconciled-tree_ancestors/ancestor-data.csv")
df[df.anc == "anc1"]

In [None]:
topiary.alignment_to_ancestors("seed_to_alignment_vetsAEDotS/05_clean-aligned-dataframe.csv")