<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Goal" data-toc-modified-id="Goal-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Goal</a></span></li><li><span><a href="#Var" data-toc-modified-id="Var-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Var</a></span></li><li><span><a href="#Init" data-toc-modified-id="Init-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Init</a></span></li><li><span><a href="#Load" data-toc-modified-id="Load-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Load</a></span></li><li><span><a href="#To-ultrametric" data-toc-modified-id="To-ultrametric-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>To ultrametric</a></span></li><li><span><a href="#Writing-trees" data-toc-modified-id="Writing-trees-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Writing trees</a></span></li><li><span><a href="#sessionInfo" data-toc-modified-id="sessionInfo-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>sessionInfo</a></span></li></ul></div>

# Goal

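* Run a PhyloMeasures analysis on the dataset, using each of the trees below as input
  * measures: MNTD (mean nearest taxon distance) & MPD (mean pairwise distance)
* This notebook: load the trees, convert each one to an ultrametric tree, and save them for that analysis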
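For orientation, here is a minimal sketch of the two measures in their unweighted (presence/absence) form. It assumes a species-by-species patristic distance matrix, for example from `ape::cophenetic.phylo()`, and a vector of the species present in one sample. `mpd_one()` and `mntd_one()` are hypothetical helpers written for illustration. They are not the PhyloMeasures API, which is designed to compute these measures efficiently on large trees.

```r
# Hypothetical helpers illustrating unweighted MPD and MNTD for a single
# community. `dist_mat` is a symmetric species x species matrix of
# patristic distances with row/column names; `present` is a character
# vector of the species observed in the sample (at least two).

# MPD: mean of all pairwise distances among the present species
mpd_one <- function(dist_mat, present) {
  d <- dist_mat[present, present, drop = FALSE]
  mean(d[lower.tri(d)])
}

# MNTD: for each present species, the distance to its closest other
# present species, averaged over species
mntd_one <- function(dist_mat, present) {
  d <- dist_mat[present, present, drop = FALSE]
  diag(d) <- Inf  # ignore self-distances
  mean(apply(d, 1, min))
}

# Toy ultrametric tree ((A:1,B:1):1,(C:1,D:1):1) as a distance matrix
spp <- c("A", "B", "C", "D")
D <- matrix(c(0, 2, 4, 4,
              2, 0, 4, 4,
              4, 4, 0, 2,
              4, 4, 2, 0),
            nrow = 4, dimnames = list(spp, spp))

mpd_one(D, c("A", "B", "C"))   # (2 + 4 + 4) / 3 = 3.333...
mntd_one(D, c("A", "B", "C"))  # (2 + 2 + 4) / 3 = 2.666...
```

MPD reflects the overall spread of a community across the tree, while MNTD is dominated by the closest relatives. This is why the two measures are usually reported together.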


# Var

```r
# working dir
work_dir = '/ebio/abt3_projects/databases_no-backup/curatedMetagenomicData/global_metagenomes/diversity/'

# bracken counts (not used in this notebook)
brk_file = '/ebio/abt3_projects/databases_no-backup/curatedMetagenomicData/global_metagenomes/diversity/bracken_filt2.qs'

# metadata (not used in this notebook)
## filtered
#metadata_filt_file = file.path(work_dir, 'CurMetDat_metadata_filt-n1846.tsv')
## all
base_in_dir = '/ebio/abt3_projects/small_projects/nyoungblut/public_data_retireval/'
metadata_file = file.path(base_in_dir, 'CurMetDat-metagenomes', 'files', 'metadata', 'Filtered_CurMetDat_ff.tsv')

# trees
## GTDB phylogeny
phy_file = '/ebio/abt3_projects/databases_no-backup/GTDB/release89/LLMGP-DB/phylogeny/ar122-bac120_r89_1per-GTDB-Spec_gte50comp-lt5cont_rn.nwk'
## trait phylogeny
phy_trt_file = '/ebio/abt3_projects/databases_no-backup/GTDB/release89/LLMGP-DB/genomes/LLG/phenotype/ultrametric/predictions_flat_majority-votes_combined_jaccard_rn.nwk'
## gene content
### COG content
phy_cog_file = '/ebio/abt3_projects/databases_no-backup/GTDB/release89/LLMGP-DB/genomes/LLG/phenotype/ultrametric/genes-per-COG_UniRef90_bray.nwk'
phy_cogcat_file = '/ebio/abt3_projects/databases_no-backup/GTDB/release89/LLMGP-DB/genomes/LLG/phenotype/ultrametric/genes-per-COGcat_UniRef90_bray.nwk'
### pfam content
phy_pfam_file = '/ebio/abt3_projects/databases_no-backup/GTDB/release89/LLMGP-DB/genomes/LLG/phenotype/ultrametric/genes-per-pfam_UniRef90_bray.nwk'
phy_pfamcat_file = '/ebio/abt3_projects/databases_no-backup/GTDB/release89/LLMGP-DB/genomes/LLG/phenotype/ultrametric/genes-per-pfamGrp_UniRef90_bray.nwk'

# params
threads = 8
my_seed = 68372
```
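Several long absolute paths are configured above, and a typo in any of them would only surface later, inside `read.tree()`. The following is a minimal base-R sketch of an up-front existence check. `check_inputs()` is a hypothetical helper and is not part of LeyLabRMisc.

```r
# Hypothetical helper: stop with a readable message listing any configured
# input files that do not exist; returns TRUE invisibly if all are present
check_inputs <- function(paths) {
  paths <- unlist(paths)  # named list of single paths -> named character vector
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("Missing input file(s):\n",
         paste(names(missing), missing, sep = ": ", collapse = "\n"))
  }
  invisible(TRUE)
}

# Usage with the variables defined above:
# check_inputs(list(gtdb_phy = phy_file, trt_phy = phy_trt_file,
#                   phy_cog = phy_cog_file, phy_cogcat = phy_cogcat_file,
#                   phy_pfam = phy_pfam_file, phy_pfamcat = phy_pfamcat_file))
```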

# Init

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(data.table)
library(tidytable)
library(ape)
library(LeyLabRMisc)
```

```r
df.dims()
setDTthreads(threads)                        # data.table threads
RhpcBLASctl::blas_set_num_threads(threads)   # BLAS threads
make_dir(work_dir)                           # create work_dir if it does not exist
```

```
Directory already exists: /ebio/abt3_projects/databases_no-backup/curatedMetagenomicData/global_metagenomes/diversity/ 
```


# Load

```r
# input trees: GTDB genome phylogeny, trait tree, and gene-content (COG/pfam) trees
trees = list(
    'gtdb_phy' = phy_file,
    'trt_phy' = phy_trt_file,
    'phy_cog' = phy_cog_file,
    'phy_cogcat' = phy_cogcat_file,
    'phy_pfam' = phy_pfam_file,
    'phy_pfamcat' = phy_pfamcat_file
)

# read each Newick file into an ape "phylo" object
trees = trees %>%
    lapply(read.tree)
trees
```

```
$gtdb_phy

Phylogenetic tree with 23360 tips and 23359 internal nodes.

Tip labels:
  s__Halorubrum_sp000296615, s__Halorubrum_distributum, s__Halorubrum_trapanicum, s__Halorubrum_tropicale, s__Halorubrum_coriense, s__Halorubrum_ezzemoulense, ...
Node labels:
  100.0, d__Archaea100.0, 97.0, 99.0, 100.0, 100.0-p__Halobacterota, ...

Rooted; includes branch lengths.

$trt_phy

Phylogenetic tree with 23360 tips and 23359 internal nodes.

Tip labels:
  s__Mikella_endobia, s__Prochlorococcus_B_sp003284185, s__Prochlorococcus_A_sp003282425, s__Eperythrozoon_A_wenyonii_A, s__GN02-872_sp003260325, s__SZUA-486_sp003251635, ...
Node labels:
  , 1, 1, 1, 1, 1, ...

Rooted; includes branch lengths.

$phy_cog

Phylogenetic tree with 23360 tips and 23359 internal nodes.

Tip labels:
  s__UBA9959_sp001799795, s__2-02-FULL-45-21_sp001805845, s__UBA11600_sp002717745, s__UBA11600_sp002714165, s__UBA11600_sp002730735, s__LS-NOB_sp002705185, ...
Node labels:
  , 1, 1, 1, 1, 1, ...

Rooted; includes branch
```
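The trees list their tips in different orders. Before MPD/MNTD values are compared across trees, it may be worth confirming that all trees contain the same set of species. The following is a minimal base-R sketch that relies only on the `tip.label` element carried by every `ape` `phylo` object. `same_tip_set()` and `tips_not_shared()` are hypothetical helpers.

```r
# Hypothetical helper: TRUE if every tree in a list has exactly the same
# set of tip labels (order does not matter)
same_tip_set <- function(trees) {
  ref <- sort(trees[[1]]$tip.label)
  all(vapply(trees, function(tr) identical(sort(tr$tip.label), ref), logical(1)))
}

# Hypothetical helper: tips present in some trees but not in all of them
tips_not_shared <- function(trees) {
  labs <- lapply(trees, `[[`, "tip.label")
  setdiff(Reduce(union, labs), Reduce(intersect, labs))
}

# Usage on the list loaded above:
# same_tip_set(trees)
# tips_not_shared(trees)
```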

# To ultrametric

```r
# make each tree ultrametric by extending its terminal branches (trees processed in parallel)
doParallel::registerDoParallel(threads)
trees = trees %>%
    plyr::llply(phytools::force.ultrametric, method = "extend",
                .parallel = TRUE)
trees
```

```
$gtdb_phy

Phylogenetic tree with 23360 tips and 23359 internal nodes.

Tip labels:
  s__Halorubrum_sp000296615, s__Halorubrum_distributum, s__Halorubrum_trapanicum, s__Halorubrum_tropicale, s__Halorubrum_coriense, s__Halorubrum_ezzemoulense, ...
Node labels:
  100.0, d__Archaea100.0, 97.0, 99.0, 100.0, 100.0-p__Halobacterota, ...

Rooted; includes branch lengths.

$trt_phy

Phylogenetic tree with 23360 tips and 23359 internal nodes.

Tip labels:
  s__Mikella_endobia, s__Prochlorococcus_B_sp003284185, s__Prochlorococcus_A_sp003282425, s__Eperythrozoon_A_wenyonii_A, s__GN02-872_sp003260325, s__SZUA-486_sp003251635, ...
Node labels:
  , 1, 1, 1, 1, 1, ...

Rooted; includes branch lengths.

$phy_cog

Phylogenetic tree with 23360 tips and 23359 internal nodes.

Tip labels:
  s__UBA9959_sp001799795, s__2-02-FULL-45-21_sp001805845, s__UBA11600_sp002717745, s__UBA11600_sp002714165, s__UBA11600_sp002730735, s__LS-NOB_sp002705185, ...
Node labels:
  , 1, 1, 1, 1, 1, ...

Rooted; includes branch
```
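With `method = "extend"`, the tree is made ultrametric by lengthening terminal branches rather than by refitting all edge lengths. The following is a minimal base-R sketch of that idea. It assumes an ape-style `phylo` list (tips numbered `1..n`, root `n + 1`) whose edges are in pre-order, i.e. parents before children, as produced by `ape::reorder.phylo(tree, "cladewise")`. Each tip's terminal edge is stretched by the gap between its root-to-tip depth and that of the deepest tip. `node_depths()` and `extend_terminal_edges()` are hypothetical helpers, not the phytools source.

```r
# Root-to-node depths for an ape-style phylo list; assumes edges are in
# pre-order, so each parent's depth is known before its children are visited
node_depths <- function(tree) {
  depth <- numeric(max(tree$edge))  # root starts at depth 0
  for (i in seq_len(nrow(tree$edge))) {
    parent <- tree$edge[i, 1]
    child  <- tree$edge[i, 2]
    depth[child] <- depth[parent] + tree$edge.length[i]
  }
  depth
}

# Hypothetical sketch of the "extend" idea: lengthen each terminal edge so
# every tip ends at the same depth as the deepest tip
extend_terminal_edges <- function(tree) {
  n_tip <- length(tree$tip.label)
  tip_depth <- node_depths(tree)[seq_len(n_tip)]
  gap <- max(tip_depth) - tip_depth
  term <- match(seq_len(n_tip), tree$edge[, 2])  # row of each tip's terminal edge
  tree$edge.length[term] <- tree$edge.length[term] + gap
  tree
}

# Toy tree ((A:1,B:2):1,C:1); built by hand: tips 1-3, root 4, internal node 5
toy <- list(
  edge = rbind(c(4, 5), c(5, 1), c(5, 2), c(4, 3)),
  edge.length = c(1, 1, 2, 1),
  tip.label = c("A", "B", "C")
)
ultra <- extend_terminal_edges(toy)
ultra$edge.length        # 1 2 2 3
node_depths(ultra)[1:3]  # 3 3 3
```

Internal edges are left untouched, so the topology and the relative depths of internal nodes are preserved. Only the tips are pushed out to a common height. On the real trees, the result can be checked with `ape::is.ultrametric()`.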

# Writing trees

```r
# save the list of ultrametric trees for the downstream PhyloMeasures analysis
out_file = file.path(work_dir, 'trees_ultrametric.RDS')
saveRDS(trees, out_file)
cat('File written:', out_file, '\n')
```

```
File written: /ebio/abt3_projects/databases_no-backup/curatedMetagenomicData/global_metagenomes/diversity//trees_ultrametric.RDS 
```
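`saveRDS()` stores the whole named list of `phylo` objects in a single file, so a downstream notebook can load all trees with one `readRDS()` call. The following is a minimal sketch of a round-trip check. `verify_rds()` is a hypothetical helper, demonstrated on a toy list so that it runs without the cluster paths.

```r
# Hypothetical round-trip check: re-read the RDS file and confirm the
# object is unchanged (names, tip labels and branch lengths included)
verify_rds <- function(obj, path) {
  identical(obj, readRDS(path))
}

# Toy demonstration
tmp <- tempfile(fileext = ".RDS")
x <- list(a = list(tip.label = c("A", "B"), edge.length = c(1.5, 2)))
saveRDS(x, tmp)
verify_rds(x, tmp)  # TRUE

# Usage in this notebook:
# verify_rds(trees, file.path(work_dir, 'trees_ultrametric.RDS'))
```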


# sessionInfo

```r
sessionInfo()
```

```
R version 4.1.1 (2021-08-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 18.04.6 LTS

Matrix products: default
BLAS/LAPACK: /ebio/abt3_projects2/global_metagenome_diversity/envs/phyloseq-phy/lib/libopenblasp-r0.3.17.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] LeyLabRMisc_0.2.0 ape_5.5           tidytable_0.6.5   data.table_1.14.2
[5] ggplot2_3.3.5     tidyr_1.1.4       dplyr_1.0.7      

loaded via a namespace (and not attached):
 [1] phangorn_2.7.1          pbdZMQ_0.3-5            tidyselect_1.1.1       
 [4] repr_1.1.3   
```