# Goal

* Design PCR primers for a clade of interest (here, the genus *Anaerotruncus*)

# Var

In [1]:
base_dir = '/ebio/abt3_projects/software/dev/ll_pipelines/llprimer/experiments/HMP_most-wanted/v0.3/'
clade = 'Anaerotruncus'
domain = 'Bacteria'
taxid = 244127 

# Init

In [2]:
library(dplyr)
library(tidyr)
library(ggplot2)
library(LeyLabRMisc)


Attaching package: ‘dplyr’


The following objects are masked from ‘package:stats’:

    filter, lag


The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union




In [3]:
df.dims()
work_dir = file.path(base_dir, clade)
make_dir(work_dir)

Created directory: /ebio/abt3_projects/software/dev/ll_pipelines/llprimer/experiments/HMP_most-wanted/v0.3//Anaerotruncus 


# Genome download

* Downloading all GenBank genome assemblies (FASTA) for the genus from NCBI with `ncbi-genome-download`

```
OUTDIR=/ebio/abt3_projects/software/dev/ll_pipelines/llprimer/experiments/HMP_most-wanted/v0.3/Anaerotruncus/
mkdir -p $OUTDIR
ncbi-genome-download -p 12 -s genbank -F fasta --genera Anaerotruncus -o $OUTDIR bacteria
```

# Genome quality

* Filtering genomes by quality

In [4]:
D = file.path(base_dir, clade, 'genbank')
files = list_files(D, '.fna.gz')
samps = data.frame(Name = files %>% as.character %>% basename,
                   Fasta = files,
                   Domain = domain,
                   Taxid = taxid) %>%
    mutate(Name = gsub('\\.fna\\.gz$', '', Name),
           Fasta = gsub('/+', '/', Fasta))
samps

# writing file
outfile = file.path(D, 'samples.txt')
write_table(samps, outfile)

Name,Fasta,Domain,Taxid
<chr>,<chr>,<fct>,<dbl>
GCA_000154565.1_ASM15456v1_genomic,/ebio/abt3_projects/software/dev/ll_pipelines/llprimer/experiments/HMP_most-wanted/v0.3/Anaerotruncus/genbank/bacteria/GCA_000154565.1/GCA_000154565.1_ASM15456v1_genomic.fna.gz,Bacteria,244127
GCA_000403395.2_Anae_bact_G3_V1_genomic,/ebio/abt3_projects/software/dev/ll_pipelines/llprimer/experiments/HMP_most-wanted/v0.3/Anaerotruncus/genbank/bacteria/GCA_000403395.2/GCA_000403395.2_Anae_bact_G3_V1_genomic.fna.gz,Bacteria,244127
⋮,⋮,⋮,⋮
GCA_902363605.1_MGYG-HGUT-00131_genomic,/ebio/abt3_projects/software/dev/ll_pipelines/llprimer/experiments/HMP_most-wanted/v0.3/Anaerotruncus/genbank/bacteria/GCA_902363605.1/GCA_902363605.1_MGYG-HGUT-00131_genomic.fna.gz,Bacteria,244127
GCA_904384175.1_Chicken_20_mag_154_genomic,/ebio/abt3_projects/software/dev/ll_pipelines/llprimer/experiments/HMP_most-wanted/v0.3/Anaerotruncus/genbank/bacteria/GCA_904384175.1/GCA_904384175.1_Chicken_20_mag_154_genomic.fna.gz,Bacteria,244127


File written: /ebio/abt3_projects/software/dev/ll_pipelines/llprimer/experiments/HMP_most-wanted/v0.3//Anaerotruncus/genbank/samples.txt 
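
For readers without the `LeyLabRMisc` helpers, the samples-table step above can be sketched in base R. This is a minimal sketch, not the notebook's code: `make_samples_table()` is a hypothetical helper and the example path is shortened for illustration. As with the `gsub()` calls above, it strips the `.fna.gz` extension to get the genome name and collapses repeated slashes in the path.

```r
# Hypothetical base-R helper mirroring the samples-table cell above
make_samples_table <- function(fasta_paths, domain, taxid) {
  fasta_paths <- gsub("/+", "/", fasta_paths)                   # collapse "//" into "/"
  data.frame(
    Name   = sub("\\.fna\\.gz$", "", basename(fasta_paths)),     # genome name = file name minus extension
    Fasta  = fasta_paths,
    Domain = domain,                                             # recycled to every row
    Taxid  = taxid,
    stringsAsFactors = FALSE
  )
}

# Example with one shortened, illustrative path
samps <- make_samples_table(
  "/data//genbank/bacteria/GCA_000154565.1/GCA_000154565.1_ASM15456v1_genomic.fna.gz",
  domain = "Bacteria", taxid = 244127)
samps$Name
```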


### LLG

#### Config

In [5]:
cat_file(file.path(work_dir, 'LLG', 'config.yaml'))

# table with genome --> fasta_file information
samples_file: /ebio/abt3_projects/software/dev/ll_pipelines/llprimer/experiments/HMP_most-wanted/v0.3/Anaerotruncus/genbank/samples.txt

# output location
output_dir: /ebio/abt3_projects/software/dev/ll_pipelines/llprimer/experiments/HMP_most-wanted/v0.3/Anaerotruncus/LLG/

# temporary file directory (your username will be added automatically)
tmp_dir: /ebio/abt3_scratch/

# batch processing of genomes for certain steps
## increase to better parallelize
batches: 2 

# Domain of genomes ('Archaea' or 'Bacteria')
## Use "Skip" if provided as a "Domain" column in the genome table
Domain: Skip

# software parameters
# Use "Skip" to skip any of these steps. If no params for rule, use ""
# dRep MAGs are not further analyzed, but you can de-rep & then use the de-rep genome table as input.
params:
  ionice: -c 3
  # assembly assessment
  seqkit: ""
  quast: Skip #""
  multiqc_on_quast: "" 
  checkm: ""
  # de-replication (requires checkm)
  drep: -

#### Run

```
(snakemake) @ rick:/ebio/abt3_projects/software/dev/ll_pipelines/llg
$ screen -L -S llg ./snakemake_sge.sh /ebio/abt3_projects/software/dev/ll_pipelines/llprimer/experiments/HMP_most-wanted/v0.3/Anaerotruncus/LLG/config.yaml 20 -F
```

### Samples table of high-quality genomes

In [6]:
# checkM summary
checkm = file.path(work_dir, 'LLG', 'checkM', 'checkm_qa_summary.tsv') %>%
    read.delim(sep='\t') 
checkm

Bin.Id,Marker.lineage,X..genomes,X..markers,X..marker.sets,Completeness,Contamination,Strain.heterogeneity,Genome.size..bp.,X..ambiguous.bases,⋯,X0,X1,X2,X3,X4,X5.,assembly.Id,assembler.Id,taxon.Id,File
<fct>,<fct>,<int>,<int>,<int>,<dbl>,<dbl>,<dbl>,<int>,<int>,⋯,<int>,<int>,<int>,<int>,<int>,<int>,<fct>,<fct>,<lgl>,<fct>
GCA_000154565.1_ASM15456v1_genomic,o__Clostridiales (UID1212),172,263,149,97.99,0.22,0,3719688,800,⋯,3,259,1,0,0,0,|ebio|abt3_projects|software|dev|ll_pipelines|llprimer|experiments|HMP_most-wanted|Anaerotruncus|LLG_output|checkM|1|checkm|markers_qa_summary.tsv.1,markers_qa_summary.tsv.1,,/ebio/abt3_projects/software/dev/ll_pipelines/llprimer/experiments/HMP_most-wanted/Anaerotruncus/LLG_output/checkM/1/checkm/markers_qa_summary.tsv.1
GCA_000433975.1_MGS528_genomic,o__Clostridiales (UID1212),172,263,149,97.99,0.00,0,1881083,732,⋯,3,260,0,0,0,0,|ebio|abt3_projects|software|dev|ll_pipelines|llprimer|experiments|HMP_most-wanted|Anaerotruncus|LLG_output|checkM|1|checkm|markers_qa_summary.tsv.2,markers_qa_summary.tsv.2,,/ebio/abt3_projects/software/dev/ll_pipelines/llprimer/experiments/HMP_most-wanted/Anaerotruncus/LLG_output/checkM/1/checkm/markers_qa_summary.tsv.2
⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋱,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮
GCA_900199635.1_PRJEB19460_genomic,o__Clostridiales (UID1212),172,263,149,99.33,0.00,0,3145951,14,⋯,1,262,0,0,0,0,|ebio|abt3_projects|software|dev|ll_pipelines|llprimer|experiments|HMP_most-wanted|Anaerotruncus|LLG_output|checkM|2|checkm|markers_qa_summary.tsv.14,markers_qa_summary.tsv.14,,/ebio/abt3_projects/software/dev/ll_pipelines/llprimer/experiments/HMP_most-wanted/Anaerotruncus/LLG_output/checkM/2/checkm/markers_qa_summary.tsv.14
GCA_902363605.1_MGYG-HGUT-00131_genomic,o__Clostridiales (UID1212),172,263,149,97.99,1.34,0,3306065,265,⋯,3,257,3,0,0,0,|ebio|abt3_projects|software|dev|ll_pipelines|llprimer|experiments|HMP_most-wanted|Anaerotruncus|LLG_output|checkM|2|checkm|markers_qa_summary.tsv.15,markers_qa_summary.tsv.15,,/ebio/abt3_projects/software/dev/ll_pipelines/llprimer/experiments/HMP_most-wanted/Anaerotruncus/LLG_output/checkM/2/checkm/markers_qa_summary.tsv.15


In [7]:
# dRep summary
drep = file.path(work_dir, 'LLG', 'drep', 'checkm_markers_qa_summary.tsv') %>%
    read.delim(sep='\t') %>%
    mutate(Bin.Id = gsub('.+/', '', genome),
           Bin.Id = gsub('\\.fna$', '', Bin.Id))
drep

genome,completeness,contamination,Bin.Id
<fct>,<dbl>,<dbl>,<chr>
/ebio/abt3_scratch/nyoungblut/LLG_62325884640/genomes/GCA_000154565.1_ASM15456v1_genomic.fna,97.99,0.22,GCA_000154565.1_ASM15456v1_genomic
/ebio/abt3_scratch/nyoungblut/LLG_62325884640/genomes/GCA_000433975.1_MGS528_genomic.fna,97.99,0.00,GCA_000433975.1_MGS528_genomic
⋮,⋮,⋮,⋮
/ebio/abt3_scratch/nyoungblut/LLG_62325884640/genomes/GCA_900199635.1_PRJEB19460_genomic.fna,99.33,0.00,GCA_900199635.1_PRJEB19460_genomic
/ebio/abt3_scratch/nyoungblut/LLG_62325884640/genomes/GCA_902363605.1_MGYG-HGUT-00131_genomic.fna,97.99,1.34,GCA_902363605.1_MGYG-HGUT-00131_genomic


In [8]:
# de-replicated genomes
drep_gen = file.path(work_dir, 'LLG', 'drep', 'dereplicated_genomes.tsv') %>%
    read.delim(sep='\t')
drep_gen

Name,Fasta
<fct>,<fct>
GCA_014284405.1_ASM1428440v1_genomic,/ebio/abt3_projects/software/dev/ll_pipelines/llprimer/experiments/HMP_most-wanted/v0.3/Anaerotruncus/LLG/drep/drep/dereplicated_genomes/GCA_014284405.1_ASM1428440v1_genomic.fna
GCA_900199635.1_PRJEB19460_genomic,/ebio/abt3_projects/software/dev/ll_pipelines/llprimer/experiments/HMP_most-wanted/v0.3/Anaerotruncus/LLG/drep/drep/dereplicated_genomes/GCA_900199635.1_PRJEB19460_genomic.fna
⋮,⋮
GCA_015554285.1_ASM1555428v1_genomic,/ebio/abt3_projects/software/dev/ll_pipelines/llprimer/experiments/HMP_most-wanted/v0.3/Anaerotruncus/LLG/drep/drep/dereplicated_genomes/GCA_015554285.1_ASM1555428v1_genomic.fna
GCA_000433975.1_MGS528_genomic,/ebio/abt3_projects/software/dev/ll_pipelines/llprimer/experiments/HMP_most-wanted/v0.3/Anaerotruncus/LLG/drep/drep/dereplicated_genomes/GCA_000433975.1_MGS528_genomic.fna


In [9]:
# GTDBTk summary
tax = file.path(work_dir, 'LLG', 'gtdbtk', 'gtdbtk_bac_summary.tsv') %>%
    read.delim(sep='\t') %>%
    separate(classification, 
             c('Domain', 'Phylum', 'Class', 'Order', 'Family', 'Genus', 'Species'),
             sep=';') %>%
    select(-note, -classification_method, -pplacer_taxonomy,
           -other_related_references.genome_id.species_name.radius.ANI.AF.)
tax

user_genome,Domain,Phylum,Class,Order,Family,Genus,Species,fastani_reference,fastani_reference_radius,⋯,fastani_af,closest_placement_reference,closest_placement_radius,closest_placement_taxonomy,closest_placement_ani,closest_placement_af,msa_percent,translation_table,red_value,warnings
<fct>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<fct>,<fct>,⋯,<fct>,<fct>,<fct>,<fct>,<fct>,<fct>,<dbl>,<int>,<fct>,<fct>
GCA_000154565.1_ASM15456v1_genomic,d__Bacteria,p__Firmicutes_A,c__Clostridia,o__Oscillospirales,f__Ruminococcaceae,g__Anaerotruncus,s__Anaerotruncus colihominis,GCF_000154565.1,95.0,⋯,1.0,GCF_000154565.1,95.0,d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Anaerotruncus;s__Anaerotruncus colihominis,100.0,1.0,97.18,11,,
GCA_000433975.1_MGS528_genomic,d__Bacteria,p__Firmicutes_A,c__Clostridia,o__Oscillospirales,f__Acutalibacteraceae,g__Eubacterium_R,s__Eubacterium_R sp000433975,GCA_000433975.1,95.0,⋯,1.0,GCA_000433975.1,95.0,d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Oscillospirales;f__Acutalibacteraceae;g__Eubacterium_R;s__Eubacterium_R sp000433975,100.0,1.0,96.57,11,,
⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋱,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮
GCA_900199635.1_PRJEB19460_genomic,d__Bacteria,p__Firmicutes_A,c__Clostridia,o__Oscillospirales,f__Ruminococcaceae,g__Anaerotruncus,s__Anaerotruncus massiliensis,GCF_900199635.1,95.0,⋯,1.0,GCF_900199635.1,95.0,d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Anaerotruncus;s__Anaerotruncus massiliensis,100.0,1.0,96.43,11,,
GCA_902363605.1_MGYG-HGUT-00131_genomic,d__Bacteria,p__Firmicutes_A,c__Clostridia,o__Oscillospirales,f__Ruminococcaceae,g__Anaerotruncus,s__Anaerotruncus rubiinfantis,GCF_900078395.1,95.0,⋯,0.91,GCF_900078395.1,95.0,d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Anaerotruncus;s__Anaerotruncus rubiinfantis,99.26,0.91,96.35,11,,
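
The `separate()` call above splits the semicolon-delimited GTDB-Tk `classification` string into one column per rank. A base-R sketch of the same idea, using a hypothetical `split_gtdb()` helper and one classification string taken from the table above:

```r
ranks <- c("Domain", "Phylum", "Class", "Order", "Family", "Genus", "Species")

# Hypothetical helper: one row per classification string, one column per rank
split_gtdb <- function(classification) {
  parts <- strsplit(as.character(classification), ";", fixed = TRUE)
  # pad/truncate each split to 7 ranks (missing ranks become NA)
  out <- do.call(rbind, lapply(parts, function(p) p[seq_along(ranks)]))
  colnames(out) <- ranks
  as.data.frame(out, stringsAsFactors = FALSE)
}

tax_str <- "d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Anaerotruncus;s__Anaerotruncus colihominis"
split_gtdb(tax_str)$Genus
```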


In [10]:
# checking that genome IDs overlap among the dRep, CheckM, and GTDB-Tk tables
cat('-- drep --\n')
overlap(basename(as.character(drep_gen$Fasta)), 
        basename(as.character(drep$genome)))
cat('-- checkm --\n')
overlap(drep$Bin.Id, checkm$Bin.Id)
cat('-- gtdbtk --\n')
overlap(drep$Bin.Id, tax$user_genome)

-- drep --
intersect(x,y): 27 
setdiff(x,y): 0 
setdiff(y,x): 4 
union(x,y): 31 
-- checkm --
intersect(x,y): 31 
setdiff(x,y): 0 
setdiff(y,x): 0 
union(x,y): 31 
-- gtdbtk --
intersect(x,y): 30 
setdiff(x,y): 1 
setdiff(y,x): 0 
union(x,y): 31 
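
Reading the overlap output: for dRep, `setdiff(y,x): 4` means 4 of the 31 genomes in the dRep CheckM table are absent from the de-replicated set (they were removed as redundant), and for GTDB-Tk, `setdiff(x,y): 1` means one genome has no row in `gtdbtk_bac_summary.tsv`. The `inner_join()` calls in the next cell therefore drop those genomes. The sketch below is a hypothetical base-R stand-in for `LeyLabRMisc::overlap()`, assuming it simply reports these four set sizes.

```r
# Hypothetical stand-in for LeyLabRMisc::overlap(): sizes of four set operations
overlap_counts <- function(x, y) {
  x <- unique(as.character(x))
  y <- unique(as.character(y))
  c(intersect  = length(intersect(x, y)),
    setdiff_xy = length(setdiff(x, y)),   # in x only
    setdiff_yx = length(setdiff(y, x)),   # in y only
    union      = length(union(x, y)))
}

overlap_counts(c("g1", "g2", "g3"), c("g2", "g3", "g4", "g5"))
```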


In [11]:
# joining based on Bin.Id
drep = drep %>%
    inner_join(checkm, c('Bin.Id')) %>%
    mutate(GEN = genome %>% as.character %>% basename) %>%
    inner_join(drep_gen %>% mutate(GEN = Fasta %>% as.character %>% basename),
               by=c('GEN')) %>%
    inner_join(tax, c('Bin.Id'='user_genome'))
drep

genome,completeness,contamination,Bin.Id,Marker.lineage,X..genomes,X..markers,X..marker.sets,Completeness,Contamination,⋯,fastani_af,closest_placement_reference,closest_placement_radius,closest_placement_taxonomy,closest_placement_ani,closest_placement_af,msa_percent,translation_table,red_value,warnings
<fct>,<dbl>,<dbl>,<chr>,<fct>,<int>,<int>,<int>,<dbl>,<dbl>,⋯,<fct>,<fct>,<fct>,<fct>,<fct>,<fct>,<dbl>,<int>,<fct>,<fct>
/ebio/abt3_scratch/nyoungblut/LLG_62325884640/genomes/GCA_000154565.1_ASM15456v1_genomic.fna,97.99,0.22,GCA_000154565.1_ASM15456v1_genomic,o__Clostridiales (UID1212),172,263,149,97.99,0.22,⋯,1.0,GCF_000154565.1,95.0,d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Anaerotruncus;s__Anaerotruncus colihominis,100.0,1.0,97.18,11,,
/ebio/abt3_scratch/nyoungblut/LLG_62325884640/genomes/GCA_000433975.1_MGS528_genomic.fna,97.99,0.00,GCA_000433975.1_MGS528_genomic,o__Clostridiales (UID1212),172,263,149,97.99,0.00,⋯,1.0,GCA_000433975.1,95.0,d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Oscillospirales;f__Acutalibacteraceae;g__Eubacterium_R;s__Eubacterium_R sp000433975,100.0,1.0,96.57,11,,
⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋱,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮
/ebio/abt3_scratch/nyoungblut/LLG_62325884640/genomes/GCA_900199635.1_PRJEB19460_genomic.fna,99.33,0.00,GCA_900199635.1_PRJEB19460_genomic,o__Clostridiales (UID1212),172,263,149,99.33,0.00,⋯,1.0,GCF_900199635.1,95.0,d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Anaerotruncus;s__Anaerotruncus massiliensis,100.0,1.0,96.43,11,,
/ebio/abt3_scratch/nyoungblut/LLG_62325884640/genomes/GCA_902363605.1_MGYG-HGUT-00131_genomic.fna,97.99,1.34,GCA_902363605.1_MGYG-HGUT-00131_genomic,o__Clostridiales (UID1212),172,263,149,97.99,1.34,⋯,0.91,GCF_900078395.1,95.0,d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Anaerotruncus;s__Anaerotruncus rubiinfantis,99.26,0.91,96.35,11,,


In [14]:
# filtering by quality
hq_genomes = drep %>%
    filter(completeness >= 90,
           contamination < 5,
           Strain.heterogeneity < 50,
           X..contigs < 300,
           Mean.contig.length..bp. > 10000)
hq_genomes

genome,completeness,contamination,Bin.Id,Marker.lineage,X..genomes,X..markers,X..marker.sets,Completeness,Contamination,⋯,fastani_af,closest_placement_reference,closest_placement_radius,closest_placement_taxonomy,closest_placement_ani,closest_placement_af,msa_percent,translation_table,red_value,warnings
<fct>,<dbl>,<dbl>,<chr>,<fct>,<int>,<int>,<int>,<dbl>,<dbl>,⋯,<fct>,<fct>,<fct>,<fct>,<fct>,<fct>,<dbl>,<int>,<fct>,<fct>
/ebio/abt3_scratch/nyoungblut/LLG_62325884640/genomes/GCA_000154565.1_ASM15456v1_genomic.fna,97.99,0.22,GCA_000154565.1_ASM15456v1_genomic,o__Clostridiales (UID1212),172,263,149,97.99,0.22,⋯,1.0,GCF_000154565.1,95.0,d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Anaerotruncus;s__Anaerotruncus colihominis,100.0,1.0,97.18,11,,
/ebio/abt3_scratch/nyoungblut/LLG_62325884640/genomes/GCA_000433975.1_MGS528_genomic.fna,97.99,0.00,GCA_000433975.1_MGS528_genomic,o__Clostridiales (UID1212),172,263,149,97.99,0.00,⋯,1.0,GCA_000433975.1,95.0,d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Oscillospirales;f__Acutalibacteraceae;g__Eubacterium_R;s__Eubacterium_R sp000433975,100.0,1.0,96.57,11,,
⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋱,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮
/ebio/abt3_scratch/nyoungblut/LLG_62325884640/genomes/GCA_900199635.1_PRJEB19460_genomic.fna,99.33,0.00,GCA_900199635.1_PRJEB19460_genomic,o__Clostridiales (UID1212),172,263,149,99.33,0.00,⋯,1.0,GCF_900199635.1,95.0,d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Anaerotruncus;s__Anaerotruncus massiliensis,100.0,1.0,96.43,11,,
/ebio/abt3_scratch/nyoungblut/LLG_62325884640/genomes/GCA_902363605.1_MGYG-HGUT-00131_genomic.fna,97.99,1.34,GCA_902363605.1_MGYG-HGUT-00131_genomic,o__Clostridiales (UID1212),172,263,149,97.99,1.34,⋯,0.91,GCF_900078395.1,95.0,d__Bacteria;p__Firmicutes_A;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Anaerotruncus;s__Anaerotruncus rubiinfantis,99.26,0.91,96.35,11,,
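
The quality thresholds above (completeness >= 90%, contamination < 5%, strain heterogeneity < 50, fewer than 300 contigs, mean contig length > 10 kb) amount to a single logical mask. A base-R sketch with made-up toy values; `hq_filter()` is hypothetical, and like `dplyr::filter()` it drops rows where a condition evaluates to `NA`:

```r
# Hypothetical base-R version of the quality filter above
hq_filter <- function(df) {
  keep <- df$completeness >= 90 &
          df$contamination < 5 &
          df$Strain.heterogeneity < 50 &
          df$X..contigs < 300 &
          df$Mean.contig.length..bp. > 10000
  df[keep & !is.na(keep), , drop = FALSE]   # NA conditions count as "drop"
}

# Toy genomes: B fails completeness, C fails contamination
toy <- data.frame(
  Bin.Id = c("genomeA", "genomeB", "genomeC"),
  completeness = c(97.99, 85.00, 99.33),
  contamination = c(0.22, 0.10, 6.00),
  Strain.heterogeneity = c(0, 0, 0),
  X..contigs = c(120, 50, 10),
  Mean.contig.length..bp. = c(30000, 40000, 300000),
  stringsAsFactors = FALSE)
hq_filter(toy)$Bin.Id
```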


In [15]:
# summarizing the taxonomy
df.dims(20)
hq_genomes %>%
    group_by(Family, Genus) %>%
    summarize(n_genomes = n(), .groups='drop')
df.dims()

| Family | Genus | n_genomes |
|---|---|---|
| f__Acutalibacteraceae | g__Eubacterium_R | 1 |
| f__Anaerovoracaceae | g__Emergencia | 1 |
| f__Ruminococcaceae | g__Anaerotruncus | 24 |


In [16]:
# writing samples table for LLPRIMER
outfile = file.path(work_dir, 'samples_genomes_hq.txt')
hq_genomes %>%
    select(Bin.Id, Fasta) %>%
    rename('Taxon' = Bin.Id) %>%
    mutate(Taxon = gsub('_chromosome.+', '', Taxon),
           Taxon = gsub('_bin_.+', '', Taxon),
           Taxon = gsub('_genomic', '', Taxon),
           Taxon = gsub('_annotated_assembly', '', Taxon),
           Taxid = taxid,
           Domain = domain) %>%
    write_table(outfile)

File written: /ebio/abt3_projects/software/dev/ll_pipelines/llprimer/experiments/HMP_most-wanted/v0.3//Anaerotruncus/samples_genomes_hq.txt 
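
The `gsub()` chain above turns assembly file names into shorter taxon labels; applied to the Bin.Ids here it yields the `taxon` values that appear later in `core_clusters_info.tsv` (e.g. `GCA_000154565.1_ASM15456v1`). A sketch of the same regular expressions, in the same order, wrapped in a hypothetical `clean_taxon()` helper:

```r
# Hypothetical helper applying the notebook's regexes in order
clean_taxon <- function(x) {
  x <- gsub("_chromosome.+", "", x)     # drop "_chromosome..." suffixes
  x <- gsub("_bin_.+", "", x)           # drop "_bin_..." suffixes
  x <- gsub("_genomic", "", x)          # drop the "_genomic" tag
  gsub("_annotated_assembly", "", x)    # drop "_annotated_assembly"
}

clean_taxon(c("GCA_000154565.1_ASM15456v1_genomic",
              "GCA_902363605.1_MGYG-HGUT-00131_genomic"))
```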


# Primer design

### Config

In [17]:
config_file = file.path(work_dir, 'primers', 'config.yaml')
cat_file(config_file)

#-- I/O --#
samples_file: /ebio/abt3_projects/software/dev/ll_pipelines/llprimer/experiments/HMP_most-wanted/v0.3/Anaerotruncus/samples_genomes_hq.txt

# output location
output_dir: /ebio/abt3_projects/software/dev/ll_pipelines/llprimer/experiments/HMP_most-wanted/v0.3/Anaerotruncus/primers/

# temporary file directory (your username will be added automatically)
tmp_dir: /ebio/abt3_scratch/

#-- software parameters --#
# See the README for a description
params:
  ionice: -c 3
  cgp:
    prokka: ""
    mmseqs:
      method: cluster    # or linclust (faster)
      run: --min-seq-id 0.8 -c 0.8
    vsearch: --id 0.94
    core_genes:
      cds: --perc-genomes-cds 100 --copies-per-genome-cds 1 --max-clusters-cds 500
      rRNA: --perc-genomes-rrna 100 --copies-per-genome-rrna 10 --max-clusters-rrna 500
    align:
      method: linsi
      params: --auto --maxiterate 1000
    primer3:
      consensus: --consensus-threshold 0.34
      number: --num-raw-primers 1000 --num-final-primers 10
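
`--consensus-threshold 0.34` appears to control how the alignment consensus handed to Primer3 is built. The sketch below is only an assumption about that behavior, not LLPRIMER's implementation: a base is included in a column's IUPAC code if its frequency among the aligned sequences (gaps ignored) is at least the threshold. Under that reading, 0.34 sits just above 1/3, so a variant carried by only one of three sequences is excluded. `degenerate_consensus()` and the toy alignment are hypothetical.

```r
# ASSUMPTION: a base enters a column's IUPAC code when its frequency among the
# aligned sequences (gaps ignored) is >= threshold. Hypothetical helper.
iupac <- c(A = "A", C = "C", G = "G", T = "T",
           AC = "M", AG = "R", AT = "W", CG = "S", CT = "Y", GT = "K",
           ACG = "V", ACT = "H", AGT = "D", CGT = "B", ACGT = "N")

degenerate_consensus <- function(seqs, threshold = 0.34) {
  # rows = sequences, columns = alignment positions (sequences must be aligned)
  mat <- do.call(rbind, strsplit(toupper(seqs), ""))
  cols <- apply(mat, 2, function(col) {
    col <- col[col %in% c("A", "C", "G", "T")]   # drop gaps and other symbols
    if (length(col) == 0) return("-")            # all-gap column
    freq <- table(col) / length(col)
    keep <- sort(names(freq)[freq >= threshold])
    if (length(keep) == 0) return("N")           # no base reaches the threshold
    iupac[[paste(keep, collapse = "")]]
  })
  paste(cols, collapse = "")
}

aln <- c("ACGT", "ACGA", "ACGA")
degenerate_consensus(aln, threshold = 0.34)  # "ACGA": T is in 1/3 < 0.34 of seqs
degenerate_consensus(aln, threshold = 0.30)  # "ACGW": A and T both pass
```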
     

### Run

```
(snakemake) @ rick:/ebio/abt3_projects/software/dev/ll_pipelines/llprimer
$ screen -L -S llprimer-An ./snakemake_sge.sh /ebio/abt3_projects/software/dev/ll_pipelines/llprimer/experiments/HMP_most-wanted/v0.3/Anaerotruncus/primers/config.yaml 30 --notemp
```

## Summary

### Primers

In [24]:
primer_info = read.delim(file.path(work_dir, 'primers', 'cgp', 'primers_final_info.tsv'), sep='\t')
primer_info %>% unique_n('primer sets', primer_set)
primer_info

No. of unique primer sets: 10 


gene_type,cluster_id,primer_set,amplicon_size_consensus,amplicon_size_avg,amplicon_size_sd,primer_id,primer_type,sequence,length,⋯,position_start,position_end,Tm_avg,Tm_sd,GC_avg,GC_sd,hairpin_avg,hairpin_sd,homodimer_avg,homodimer_sd
<fct>,<int>,<int>,<int>,<dbl>,<dbl>,<fct>,<fct>,<fct>,<int>,⋯,<int>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
cds,2,73,109,109,0,73f,PRIMER_LEFT,TGAAAGTHAGASCTTCYGTDAA,22,⋯,1,23,56.23671,1.915052,37.12121,3.787879,7.511429,16.79607,-27.56715,2.442498e+01
cds,2,73,109,109,0,73r,PRIMER_RIGHT,CCYTGWCKCTGTTTRTGCTT,20,⋯,90,110,57.99806,2.451058,47.50000,4.330127,0.000000,0.00000,-33.75316,1.421085e-14
⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋱,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮
cds,2,925,107,107,0,925f,PRIMER_LEFT,GAAAGTHAGASCTTCYGTDA,20,⋯,2,22,53.48922,2.087977,40.83333,4.166667,7.511429,16.79607,-28.87287,2.245080e+01
cds,2,925,107,107,0,925r,PRIMER_RIGHT,CYTGWCKCTGTTTRTGCTT,19,⋯,90,109,55.29904,2.585134,44.73684,4.558028,0.000000,0.00000,-33.75316,1.421085e-14
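
The amplicon sizes in this table are consistent with the reverse primer's `position_end` minus the forward primer's `position_start` (set 73: 110 - 1 = 109; set 925: 109 - 2 = 107). That relation is inferred from these rows, not taken from the LLPRIMER documentation. A base-R sketch using a hypothetical `amplicon_size()` helper and the coordinates shown above:

```r
# Hypothetical helper: amplicon length per primer set from primer coordinates
amplicon_size <- function(primers) {
  fwd <- primers[primers$primer_type == "PRIMER_LEFT",  c("primer_set", "position_start")]
  rv  <- primers[primers$primer_type == "PRIMER_RIGHT", c("primer_set", "position_end")]
  m <- merge(fwd, rv, by = "primer_set")            # one row per primer set
  setNames(m$position_end - m$position_start, m$primer_set)
}

primers <- data.frame(
  primer_set = c(73, 73, 925, 925),
  primer_type = c("PRIMER_LEFT", "PRIMER_RIGHT", "PRIMER_LEFT", "PRIMER_RIGHT"),
  position_start = c(1, 90, 2, 90),
  position_end = c(23, 110, 22, 109),
  stringsAsFactors = FALSE)
amplicon_size(primers)
```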


### Gene clusters

In [25]:
# target gene info
gene_annot = read.delim(file.path(work_dir, 'primers', 'cgp', 'core_clusters_info.tsv'), 
                        sep='\t') %>%
    semi_join(primer_info, c('cluster_id')) 
gene_annot %>% distinct(gene_type, cluster_id) %>% nrow %>% cat('No. of clusters', ., '\n')
gene_annot

No. of clusters 1 


gene_type,cluster_id,seq_uuid,seq_orig_name,contig_id,taxon,start,end,score,strand,annotation,cluster_name,clust_id
<fct>,<int>,<fct>,<fct>,<fct>,<fct>,<int>,<int>,<fct>,<fct>,<fct>,<fct>,<int>
cds,2,95cf4c86c7cf4893b51f3ba76fc4d9e9,KHIIGKDO_03304,DS544172.1,GCA_000154565.1_ASM15456v1,5547,5660,.,-,50S ribosomal protein L36,492e52027403480f8ebe523409d9267f,2
cds,2,663b3413e3c644b0bf9751a313c1cb7a,AAEGJLIP_00392,HF999344.1,GCA_000433975.1_MGS528,28097,28210,.,+,50S ribosomal protein L36,492e52027403480f8ebe523409d9267f,2
⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮
cds,2,5c515b50cccc4c8cb8d4d071d478865d,HDMAFNJB_00208,LT962687.1,GCA_900199635.1_PRJEB19460,100615,100728,.,+,50S ribosomal protein L36,492e52027403480f8ebe523409d9267f,2
cds,2,1f2a10cd3735420c99aa8058f1c4a798,LFEHCDIH_00979,CABJAH010000003.1,GCA_902363605.1_MGYG-HGUT-00131,74458,74571,.,+,50S ribosomal protein L36,492e52027403480f8ebe523409d9267f,2
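
The single target cluster (cluster 2, 50S ribosomal protein L36) appears once per genome in the rows shown, which is what the config's `core_genes` settings `--perc-genomes-cds 100 --copies-per-genome-cds 1` ask for. A base-R sketch of that selection, assuming those flags mean "present in 100% of genomes, with at most 1 copy per genome"; `core_clusters()` and the toy data are hypothetical:

```r
# ASSUMPTION: flags read as "present in perc_genomes % of genomes, at most
# max_copies copies per genome". Hypothetical helper.
core_clusters <- function(genes, n_genomes, perc_genomes = 100, max_copies = 1) {
  counts  <- table(genes$cluster_id, genes$taxon)     # copies: cluster x genome
  present <- rowSums(counts > 0) / n_genomes * 100    # % of genomes with the gene
  max_cp  <- apply(counts, 1, max)                    # highest copy number
  names(which(present >= perc_genomes & max_cp <= max_copies))
}

# Toy data: cluster 7 is missing from g3; cluster 9 has two copies in g3
genes <- data.frame(
  cluster_id = c(2, 2, 2, 7, 7, 9, 9, 9, 9),
  taxon      = c("g1", "g2", "g3", "g1", "g2", "g1", "g2", "g3", "g3"),
  stringsAsFactors = FALSE)
core_clusters(genes, n_genomes = 3)
```

`n_genomes` is passed explicitly because a genome that lacks every cluster would not show up in the count table.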


### Gene cluster annotations

In [26]:
# non-target cds gene annotations
gene_nontarget = read.delim(file.path(work_dir, 'primers', 'cgp', 'nontarget', 'cds_blastx.tsv'), 
                        sep='\t') %>%
    semi_join(primer_info, c('cluster_id')) 
gene_nontarget

cluster_id,query,subject,subject_name,pident,length,mismatch,qstart,qend,sstart,send,evalue,slen,qlen,sscinames,staxids,pident_rank
<int>,<fct>,<fct>,<fct>,<dbl>,<int>,<int>,<int>,<int>,<int>,<int>,<dbl>,<int>,<int>,<fct>,<fct>,<int>
2,492e52027403480f8ebe523409d9267f,WP_024729495.1,MULTISPECIES: 50S ribosomal protein L36 [Ruminococcaceae],100.000,37,0,1,111,1,37,4.88e-16,37,114,Anaerotruncus colihominis;Oscillospiraceae;Anaerotruncus colihominis DSM 17241;Provencibacterium massiliense;Harryflintia acetispora;Oscillospiraceae bacterium;Clostridiales bacterium;bacterium 1XD42-1;bacterium 1XD42-8;Oscillospiraceae bacterium NSJ-64,169435;216572;445972;1841868;1849041;1898205;1898207;2320101;2320102;2763678,30
2,492e52027403480f8ebe523409d9267f,MBC8570276.1,,97.297,37,1,1,111,1,37,7.32e-16,37,114,Oscillospiraceae bacterium NSJ-54,2763677,1
⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮
2,492e52027403480f8ebe523409d9267f,EFB76111.1,ribosomal protein L36 [Subdoligranulum variabile DSM 15176],89.189,37,4,1,111,10,46,6.75e-15,46,114,Subdoligranulum variabile DSM 15176,411471,28
2,492e52027403480f8ebe523409d9267f,WP_073261210.1,MULTISPECIES: 50S ribosomal protein L36 [Clostridiales],91.892,37,3,1,111,1,37,6.89e-15,37,114,Clostridium sp. ATCC 29733;Eubacteriales;Bittarella massiliensis,1507;186802;1720313,20


### Cluster taxonomic uniqueness

In [27]:
# closest seqID
gene_nontarget %>%
    filter(pident_rank == 1) %>%
    .$pident %>% summary_x('seq-ID to closest non-target')

| | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | sd | sd_err_of_mean |
|---|---|---|---|---|---|---|---|---|
| seq-ID to closest non-target | 97.297 | 97.297 | 97.297 | 97.297 | 97.297 | 97.297 | 0 | 0 |


### Primer quality

In [28]:
# summary
primer_info %>% unique_n('primer sets', primer_set)
primer_info %>% distinct(gene_type, cluster_id) %>% nrow %>% cat('No. of unique clusters:', ., '\n')

No. of unique primer sets: 10 
No. of unique clusters: 1 


In [29]:
# primers per cluster
primer_info %>%
    distinct(cluster_id, primer_set) %>%
    group_by(cluster_id) %>%
    summarize(n_primer_sets = n(), .groups='drop') %>%
    .$n_primer_sets %>% summary_x('primer sets per cluster')

| | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | sd | sd_err_of_mean |
|---|---|---|---|---|---|---|---|---|
| primer sets per cluster | 10 | 10 | 10 | 10 | 10 | 10 | 0 | 0 |


In [30]:
# primer quality
primer_info %>% filter(primer_type == 'PRIMER_LEFT') %>% .$amplicon_size_avg %>% summary_x('mean amplicon size')
primer_info %>% .$degeneracy %>% summary_x('degeneracy')
primer_info %>% .$degeneracy_3prime %>% summary_x('degeneracy (3-prime)')

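| | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | sd | sd_err_of_mean |
|---|---|---|---|---|---|---|---|---|
| mean amplicon size | 103 | 107.25 | 109 | 108.1 | 109.75 | 110 | 2.588 | 1.057 |


| | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | sd | sd_err_of_mean |
|---|---|---|---|---|---|---|---|---|
| degeneracy | 12 | 16 | 20 | 24.4 | 36 | 36 | 10.121 | 4.132 |


| | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | sd | sd_err_of_mean |
|---|---|---|---|---|---|---|---|---|
| degeneracy (3-prime) | 1 | 1 | 2 | 2.6 | 3 | 6 | 1.855 | 0.757 |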

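Degeneracy is the number of distinct sequences a degenerate primer encodes: the product, over positions, of the number of bases each IUPAC code stands for. For 73f (`TGAAAGTHAGASCTTCYGTDAA`), H, S, Y, and D give 3 × 2 × 2 × 3 = 36, the maximum in the summary above. A base-R sketch with hypothetical helpers; the 5-nt window for the 3' value is an assumption that reproduces the `degeneracy_3prime` of 2 reported for primers 546f and 546r in this notebook, not a documented LLPRIMER setting.

```r
# Number of bases each IUPAC nucleotide code can represent
iupac_n <- c(A = 1, C = 1, G = 1, T = 1, R = 2, Y = 2, S = 2, W = 2, K = 2, M = 2,
             B = 3, D = 3, H = 3, V = 3, N = 4)

# Product of per-position possibilities (NA if a non-IUPAC symbol is present)
degeneracy <- function(primer) {
  prod(iupac_n[strsplit(toupper(primer), "")[[1]]])
}

# ASSUMPTION: 3' degeneracy is computed over the last `window` nucleotides
degeneracy_3prime <- function(primer, window = 5) {
  n <- nchar(primer)
  degeneracy(substr(primer, max(1, n - window + 1), n))
}

degeneracy("TGAAAGTHAGASCTTCYGTDAA")       # 73f: 3 * 2 * 2 * 3 = 36
degeneracy("KCTGTTTRTGCTTBGGRTTT")         # 546r: 2 * 2 * 3 * 2 = 24
degeneracy_3prime("ATGAAAGTHAGASCTTCYGT")  # 546f, last 5 nt "TCYGT": 2
```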

In [31]:
# adding non-target pident info
tmp1 = gene_nontarget %>%
    filter(pident_rank == 1) %>%
    distinct(cluster_id, pident, sscinames) %>%
    mutate(gene_type = 'cds')

primer_info_j = primer_info %>%
    inner_join(tmp1, c('gene_type', 'cluster_id'))
primer_info_j

gene_type,cluster_id,primer_set,amplicon_size_consensus,amplicon_size_avg,amplicon_size_sd,primer_id,primer_type,sequence,length,⋯,Tm_avg,Tm_sd,GC_avg,GC_sd,hairpin_avg,hairpin_sd,homodimer_avg,homodimer_sd,pident,sscinames
<chr>,<int>,<int>,<int>,<dbl>,<dbl>,<fct>,<fct>,<fct>,<int>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<fct>
cds,2,73,109,109,0,73f,PRIMER_LEFT,TGAAAGTHAGASCTTCYGTDAA,22,⋯,56.23671,1.915052,37.12121,3.787879,7.511429,16.79607,-27.56715,24.42498,97.297,Oscillospiraceae bacterium NSJ-54
cds,2,73,109,109,0,73f,PRIMER_LEFT,TGAAAGTHAGASCTTCYGTDAA,22,⋯,56.23671,1.915052,37.12121,3.787879,7.511429,16.79607,-27.56715,24.42498,97.297,Oscillospiraceae;Hydrogenoanaerobacterium saccharovorans;Anaerotruncus sp. G3(2012);Oscillospiraceae bacterium;Marasmitruncus massiliensis;Anaerotruncus sp. X29;Anaerotruncus sp. 1XD22-93;Anaerotruncus sp. 1XD42-93
⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋱,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮,⋮
cds,2,925,107,107,0,925r,PRIMER_RIGHT,CYTGWCKCTGTTTRTGCTT,19,⋯,55.29904,2.585134,44.73684,4.558028,0,0,-33.75316,1.421085e-14,97.297,Eubacteriales;Candidatus Soleaferrea massiliensis AP7;Anaerotruncus massiliensis;Anaerotruncus rubiinfantis;Anaerotruncus sp. AF02-27;Anaerotruncus sp. 22A2-44;Oscillospiraceae bacterium;Clostridiaceae bacterium NSJ-31
cds,2,925,107,107,0,925r,PRIMER_RIGHT,CYTGWCKCTGTTTRTGCTT,19,⋯,55.29904,2.585134,44.73684,4.558028,0,0,-33.75316,1.421085e-14,97.297,Acetanaerobacterium elongatum
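
Several non-target subjects tie for the closest seq-ID (97.297%), so `filter(pident_rank == 1)` keeps several rows for cluster 2 and the `inner_join()` repeats every primer once per row; this is why each primer appears several times in the joined tables. One way to avoid this, sketched in base R with a hypothetical `collapse_hits()` helper and toy rows, is to collapse the tied organism names into one string per cluster before joining:

```r
# Hypothetical helper: one row per cluster_id/pident, tied names joined by ";"
collapse_hits <- function(hits) {
  aggregate(sscinames ~ cluster_id + pident, data = hits,
            FUN = function(x) paste(unique(x), collapse = ";"))
}

hits <- data.frame(
  cluster_id = c(2, 2, 2),
  pident = c(97.297, 97.297, 97.297),
  sscinames = c("Oscillospiraceae bacterium NSJ-54",
                "Acetanaerobacterium elongatum",
                "Oscillospiraceae bacterium NSJ-54"),
  stringsAsFactors = FALSE)
collapse_hits(hits)
```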


In [32]:
# arrange by "best" primers
primer_info_jf = primer_info_j %>%
    group_by(primer_set) %>%
    mutate(max_degeneracy_3prime = max(degeneracy_3prime),  # max of [Fwd,Rev]
           max_degeneracy = max(degeneracy)) %>%
    ungroup() %>%
    arrange(pident, max_degeneracy_3prime, max_degeneracy) %>%
    head(n=30)

df.dims(30,40)
primer_info_jf
df.dims()

gene_type,cluster_id,primer_set,amplicon_size_consensus,amplicon_size_avg,amplicon_size_sd,primer_id,primer_type,sequence,length,degeneracy,degeneracy_3prime,position_start,position_end,Tm_avg,Tm_sd,GC_avg,GC_sd,hairpin_avg,hairpin_sd,homodimer_avg,homodimer_sd,pident,sscinames,max_degeneracy_3prime,max_degeneracy
<chr>,<int>,<int>,<int>,<dbl>,<dbl>,<fct>,<fct>,<fct>,<int>,<int>,<int>,<int>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<fct>,<int>,<int>
cds,2,546,103,103,0,546f,PRIMER_LEFT,ATGAAAGTHAGASCTTCYGT,20,12,2,0,20,54.37386,1.763376,39.16667,3.435921,7.511429,16.79607,-27.56715,24.42498,97.297,Oscillospiraceae bacterium NSJ-54,2,24
cds,2,546,103,103,0,546f,PRIMER_LEFT,ATGAAAGTHAGASCTTCYGT,20,12,2,0,20,54.37386,1.763376,39.16667,3.435921,7.511429,16.79607,-27.56715,24.42498,97.297,Oscillospiraceae;Hydrogenoanaerobacterium saccharovorans;Anaerotruncus sp. G3(2012);Oscillospiraceae bacterium;Marasmitruncus massiliensis;Anaerotruncus sp. X29;Anaerotruncus sp. 1XD22-93;Anaerotruncus sp. 1XD42-93,2,24
cds,2,546,103,103,0,546f,PRIMER_LEFT,ATGAAAGTHAGASCTTCYGT,20,12,2,0,20,54.37386,1.763376,39.16667,3.435921,7.511429,16.79607,-27.56715,24.42498,97.297,Oscillospiraceae bacterium,2,24
cds,2,546,103,103,0,546f,PRIMER_LEFT,ATGAAAGTHAGASCTTCYGT,20,12,2,0,20,54.37386,1.763376,39.16667,3.435921,7.511429,16.79607,-27.56715,24.42498,97.297,Clostridium sp. CAG:169;Clostridium sp. CAG:242;Negativibacillus massiliensis,2,24
cds,2,546,103,103,0,546f,PRIMER_LEFT,ATGAAAGTHAGASCTTCYGT,20,12,2,0,20,54.37386,1.763376,39.16667,3.435921,7.511429,16.79607,-27.56715,24.42498,97.297,Ruminococcus flavefaciens;Anaerofustis stercorihominis;Oscillospiraceae bacterium;Clostridiales bacterium,2,24
cds,2,546,103,103,0,546f,PRIMER_LEFT,ATGAAAGTHAGASCTTCYGT,20,12,2,0,20,54.37386,1.763376,39.16667,3.435921,7.511429,16.79607,-27.56715,24.42498,97.297,Eubacteriales;Candidatus Soleaferrea massiliensis AP7;Anaerotruncus massiliensis;Anaerotruncus rubiinfantis;Anaerotruncus sp. AF02-27;Anaerotruncus sp. 22A2-44;Oscillospiraceae bacterium;Clostridiaceae bacterium NSJ-31,2,24
cds,2,546,103,103,0,546f,PRIMER_LEFT,ATGAAAGTHAGASCTTCYGT,20,12,2,0,20,54.37386,1.763376,39.16667,3.435921,7.511429,16.79607,-27.56715,24.42498,97.297,Acetanaerobacterium elongatum,2,24
cds,2,546,103,103,0,546r,PRIMER_RIGHT,KCTGTTTRTGCTTBGGRTTT,20,24,2,83,103,56.37333,2.279991,40.83333,4.930066,0.0,0.0,-33.75316,2.131628e-14,97.297,Oscillospiraceae bacterium NSJ-54,2,24
cds,2,546,103,103,0,546r,PRIMER_RIGHT,KCTGTTTRTGCTTBGGRTTT,20,24,2,83,103,56.37333,2.279991,40.83333,4.930066,0.0,0.0,-33.75316,2.131628e-14,97.297,Oscillospiraceae;Hydrogenoanaerobacterium saccharovorans;Anaerotruncus sp. G3(2012);Oscillospiraceae bacterium;Marasmitruncus massiliensis;Anaerotruncus sp. X29;Anaerotruncus sp. 1XD22-93;Anaerotruncus sp. 1XD42-93,2,24
cds,2,546,103,103,0,546r,PRIMER_RIGHT,KCTGTTTRTGCTTBGGRTTT,20,24,2,83,103,56.37333,2.279991,40.83333,4.930066,0.0,0.0,-33.75316,2.131628e-14,97.297,Oscillospiraceae bacterium,2,24
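
The ranking above scores each primer set by the larger degeneracy of its forward and reverse primers, then sorts ascending by closest non-target seq-ID, 3' degeneracy, and overall degeneracy, so lower values come first. A base-R sketch with `ave()`; `rank_primer_sets()` and the toy table (sets "A" and "B") are hypothetical:

```r
# Hypothetical base-R version of the ranking cell above
rank_primer_sets <- function(p) {
  # per primer set, take the max of [Fwd, Rev]
  p$max_degeneracy_3prime <- ave(p$degeneracy_3prime, p$primer_set, FUN = max)
  p$max_degeneracy        <- ave(p$degeneracy, p$primer_set, FUN = max)
  # ascending sort; remaining ties keep their original order
  p[order(p$pident, p$max_degeneracy_3prime, p$max_degeneracy), ]
}

p <- data.frame(
  primer_id = c("Bf", "Br", "Af", "Ar"),
  primer_set = c("B", "B", "A", "A"),
  degeneracy = c(36, 8, 12, 24),
  degeneracy_3prime = c(3, 1, 2, 2),
  pident = 97.297,
  stringsAsFactors = FALSE)
rank_primer_sets(p)$primer_id  # "Af" "Ar" "Bf" "Br"
```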


#### w/ gene annotations

In [33]:
# joining
primer_info_jfj = primer_info_jf %>%
    left_join(gene_annot %>% distinct(gene_type, cluster_id, annotation),
              c('gene_type', 'cluster_id'))

df.dims(30,50)
primer_info_jfj
df.dims()

gene_type,cluster_id,primer_set,amplicon_size_consensus,amplicon_size_avg,amplicon_size_sd,primer_id,primer_type,sequence,length,degeneracy,degeneracy_3prime,position_start,position_end,Tm_avg,Tm_sd,GC_avg,GC_sd,hairpin_avg,hairpin_sd,homodimer_avg,homodimer_sd,pident,sscinames,max_degeneracy_3prime,max_degeneracy,annotation
<chr>,<int>,<int>,<int>,<dbl>,<dbl>,<fct>,<fct>,<fct>,<int>,<int>,<int>,<int>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<fct>,<int>,<int>,<fct>
cds,2,546,103,103,0,546f,PRIMER_LEFT,ATGAAAGTHAGASCTTCYGT,20,12,2,0,20,54.37386,1.763376,39.16667,3.435921,7.511429,16.79607,-27.56715,24.42498,97.297,Oscillospiraceae bacterium NSJ-54,2,24,50S ribosomal protein L36
cds,2,546,103,103,0,546f,PRIMER_LEFT,ATGAAAGTHAGASCTTCYGT,20,12,2,0,20,54.37386,1.763376,39.16667,3.435921,7.511429,16.79607,-27.56715,24.42498,97.297,Oscillospiraceae;Hydrogenoanaerobacterium saccharovorans;Anaerotruncus sp. G3(2012);Oscillospiraceae bacterium;Marasmitruncus massiliensis;Anaerotruncus sp. X29;Anaerotruncus sp. 1XD22-93;Anaerotruncus sp. 1XD42-93,2,24,50S ribosomal protein L36
cds,2,546,103,103,0,546f,PRIMER_LEFT,ATGAAAGTHAGASCTTCYGT,20,12,2,0,20,54.37386,1.763376,39.16667,3.435921,7.511429,16.79607,-27.56715,24.42498,97.297,Oscillospiraceae bacterium,2,24,50S ribosomal protein L36
cds,2,546,103,103,0,546f,PRIMER_LEFT,ATGAAAGTHAGASCTTCYGT,20,12,2,0,20,54.37386,1.763376,39.16667,3.435921,7.511429,16.79607,-27.56715,24.42498,97.297,Clostridium sp. CAG:169;Clostridium sp. CAG:242;Negativibacillus massiliensis,2,24,50S ribosomal protein L36
cds,2,546,103,103,0,546f,PRIMER_LEFT,ATGAAAGTHAGASCTTCYGT,20,12,2,0,20,54.37386,1.763376,39.16667,3.435921,7.511429,16.79607,-27.56715,24.42498,97.297,Ruminococcus flavefaciens;Anaerofustis stercorihominis;Oscillospiraceae bacterium;Clostridiales bacterium,2,24,50S ribosomal protein L36
cds,2,546,103,103,0,546f,PRIMER_LEFT,ATGAAAGTHAGASCTTCYGT,20,12,2,0,20,54.37386,1.763376,39.16667,3.435921,7.511429,16.79607,-27.56715,24.42498,97.297,Eubacteriales;Candidatus Soleaferrea massiliensis AP7;Anaerotruncus massiliensis;Anaerotruncus rubiinfantis;Anaerotruncus sp. AF02-27;Anaerotruncus sp. 22A2-44;Oscillospiraceae bacterium;Clostridiaceae bacterium NSJ-31,2,24,50S ribosomal protein L36
cds,2,546,103,103,0,546f,PRIMER_LEFT,ATGAAAGTHAGASCTTCYGT,20,12,2,0,20,54.37386,1.763376,39.16667,3.435921,7.511429,16.79607,-27.56715,24.42498,97.297,Acetanaerobacterium elongatum,2,24,50S ribosomal protein L36
cds,2,546,103,103,0,546r,PRIMER_RIGHT,KCTGTTTRTGCTTBGGRTTT,20,24,2,83,103,56.37333,2.279991,40.83333,4.930066,0.0,0.0,-33.75316,2.131628e-14,97.297,Oscillospiraceae bacterium NSJ-54,2,24,50S ribosomal protein L36
cds,2,546,103,103,0,546r,PRIMER_RIGHT,KCTGTTTRTGCTTBGGRTTT,20,24,2,83,103,56.37333,2.279991,40.83333,4.930066,0.0,0.0,-33.75316,2.131628e-14,97.297,Oscillospiraceae;Hydrogenoanaerobacterium saccharovorans;Anaerotruncus sp. G3(2012);Oscillospiraceae bacterium;Marasmitruncus massiliensis;Anaerotruncus sp. X29;Anaerotruncus sp. 1XD22-93;Anaerotruncus sp. 1XD42-93,2,24,50S ribosomal protein L36
cds,2,546,103,103,0,546r,PRIMER_RIGHT,KCTGTTTRTGCTTBGGRTTT,20,24,2,83,103,56.37333,2.279991,40.83333,4.930066,0.0,0.0,-33.75316,2.131628e-14,97.297,Oscillospiraceae bacterium,2,24,50S ribosomal protein L36


#### Writing table

In [34]:
outfile = file.path(work_dir, 'best_primers.tsv')
write_table(primer_info_jfj, outfile)

File written: /ebio/abt3_projects/software/dev/ll_pipelines/llprimer/experiments/HMP_most-wanted/v0.3//Anaerotruncus/best_primers.tsv 


# sessionInfo

In [35]:
pipelineInfo('/ebio/abt3_projects/software/dev/ll_pipelines/llg/')

LLG
===

Ley Lab Genome analysis pipeline (LLG)

* Version: 0.2.4
* Authors:
  * Nick Youngblut <nyoungb2@gmail.com>
* Maintainers:
  * Nick Youngblut <nyoungb2@gmail.com>

--- conda envs ---
==> /ebio/abt3_projects/software/dev/ll_pipelines/llg//bin/envs/gtdbtk.yaml <==
channels:
- conda-forge
- bioconda
dependencies:
- pigz
- conda-forge::networkx
- bioconda::gtdbtk

==> /ebio/abt3_projects/software/dev/ll_pipelines/llg//bin/envs/checkm.yaml <==
channels:
- bioconda
dependencies:
- python=2.7
- pigz
- bioconda::prodigal
- bioconda::pplacer
- bioconda::checkm-genome

==> /ebio/abt3_projects/software/dev/ll_pipelines/llg//bin/envs/quast.yaml <==
channels:
- conda-forge
- bioconda
dependencies:
- bioconda::seqkit
- bioconda::quast>=5.0.0

==> /ebio/abt3_projects/software/dev/ll_pipelines/llg//bin/envs/sourmash.yaml <==
channels:
- conda-forge
- bioconda
dependencies:
- bioconda::sourmash=2.0.0a4

==> /ebio/abt3_projects/software/dev/ll_pipelines/llg//bin/envs/fastqc.yaml <==
channels:
-

In [36]:
pipelineInfo('/ebio/abt3_projects/software/dev/ll_pipelines/llprimer/')

LLPRIMER

Ley Lab Primer generation pipeline (LLPRIMER)

* Version: 0.3.0
* Authors:
  * Nick Youngblut <nyoungb2@gmail.com>
* Maintainers:
  * Nick Youngblut <nyoungb2@gmail.com>

--- conda envs ---
==> /ebio/abt3_projects/software/dev/ll_pipelines/llprimer//bin/envs/pdp.yaml <==
channels:
- conda-forge
- bioconda
dependencies:
- python=3.7
- intervaltree
- prodigal
- blast
- bedtools
- mafft
- mummer=3.23
- emboss
- primer3=1.1.4
- biopython<1.78
- pybedtools
- joblib
- tqdm
- openpyxl

==> /ebio/abt3_projects/software/dev/ll_pipelines/llprimer//bin/envs/genes.yaml <==
channels:
- bioconda
dependencies:
- pigz
- python=3
- numpy
- pyfaidx
- bioconda::seqkit
- bioconda::fasta-splitter
- bioconda::vsearch
- bioconda::mmseqs2
==> /ebio/abt3_projects/software/dev/ll_pipelines/llprimer//bin/envs/aln.yaml <==
channels:
- bioconda
- conda-forge
dependencies:
- pigz
- bioconda::revtrans
- bioconda::kalign3
- bioconda::mafft

==> /ebio/abt3_projects/software/dev/ll_pipelines/llprimer//bin/env

In [37]:
sessionInfo()

R version 3.6.3 (2020-02-29)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS/LAPACK: /ebio/abt3_projects/Georg_animal_feces/envs/tidyverse/lib/libopenblasp-r0.3.9.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] LeyLabRMisc_0.1.6 ggplot2_3.3.1     tidyr_1.1.0       dplyr_1.0.0      

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6     magrittr_1.5     munsell_0.5.0    tidyselect_1.1.0
 [5] uuid_0.1-4       colorspace_1.4-1 R6_2.4.1         rlang_0.4.6     
 [9] tools_3.6.3