09_GENOME_DEREPLICATION

eolesin edited this page Jun 8, 2021 · 21 revisions

Run anvi-summarize on the DAS Tool consensus bins

This exports the binned genomes from anvi'o as FASTA files. Each bin ends up in bin_by_bin/<bin>/<bin>-contigs.fa inside the sample's output folder.

Rename the bin contigs

Prefix every contig header with the original sample name, in addition to the bin info. Run this within the ANVI-SUMMARIZE output folder, where AMOR_2020_Good lists the sample names:

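# Each sample named in AMOR_2020_Good has its own summary folder here.
for i in $(cat AMOR_2020_Good); do
    for j in $(ls "${i}/bin_by_bin"); do
        # Prefix each FASTA header with the sample name.
        # Run only once: a second run would add the prefix again.
        sed -i "s/^>/>${i}_/" "${i}/bin_by_bin/${j}/${j}-contigs.fa"
    done
done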
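
Because `sed -i` edits the files in place, it can help to preview the new headers first. The following is a minimal sketch, not part of the original workflow: it builds a throwaway directory that mimics the bin_by_bin layout and prints the headers as they would look after renaming, without modifying the file. The names `SampleA`, `Bin_1` and the contig IDs are made-up placeholders.

```shell
# Throwaway demo layout; SampleA, Bin_1 and the contig IDs are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/SampleA/bin_by_bin/Bin_1"
printf '>c_000000000001\nACGT\n>c_000000000002\nTTGA\n' \
    > "$demo/SampleA/bin_by_bin/Bin_1/Bin_1-contigs.fa"

# Same substitution as the renaming loop, but without -i,
# so the result is only printed and the file stays untouched.
preview=$(
    cd "$demo" || exit 1
    for i in SampleA; do
        for fa in "$i"/bin_by_bin/*/*-contigs.fa; do
            grep '^>' "$fa" | sed "s/^>/>${i}_/"
        done
    done
)
echo "$preview"
```

On real data, replacing `SampleA` with `$(cat AMOR_2020_Good)` and running the loop from the ANVI-SUMMARIZE output folder should give the same kind of preview.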

Create a list of the full file paths of all genomes.

# Within the ANVI-SUMMARIZE output folder.
# ">" overwrites genome_paths, so rerunning does not duplicate entries.
find "$PWD" -type f -name "*-contigs.fa" > genome_paths
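
A quick sanity check on the resulting list can catch missing or empty FASTA files before dereplication. The sketch below is an illustration under stated assumptions rather than part of the original workflow: it creates a small demo layout (the names `SampleA`, `Bin_1` and `Bin_2` are hypothetical), writes genome_paths the same way, then counts the entries and checks that every listed file exists and is non-empty.

```shell
# Throwaway demo layout; SampleA, Bin_1 and Bin_2 are hypothetical names.
demo=$(mktemp -d)
for bin in Bin_1 Bin_2; do
    mkdir -p "$demo/SampleA/bin_by_bin/${bin}"
    printf '>c_1\nACGT\n' > "$demo/SampleA/bin_by_bin/${bin}/${bin}-contigs.fa"
done

cd "$demo" || exit 1
find "$PWD" -type f -name "*-contigs.fa" > genome_paths

# Count the listed genomes and flag any path that is missing or empty.
n_paths=$(wc -l < genome_paths | tr -d ' ')
missing=0
while IFS= read -r path; do
    [ -s "$path" ] || missing=$((missing + 1))
done < genome_paths
echo "paths listed: $n_paths, missing or empty: $missing"
```

On real data only the last block is needed, run from the folder that contains genome_paths; the number of paths should match the number of bins exported by anvi-summarize.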
