Producing 3G data

DeepSimulator installation

    git clone https://github.com/lykaust15/DeepSimulator.git
    cd ./DeepSimulator/
    ./install.sh

CAMISIM installation

Pre-requirements

    pip install biopython
    pip install biom
    pip install matplotlib
    pip install biom-format
    git clone https://github.com/CAMI-challenge/CAMISIM
    pip install scikit-learn==0.22.1
    wget https://github.com/bcgsc/NanoSim/releases/download/v2.5.1/NanoSim-v2.5.1.tar.gz
    cd /usr/lib/x86_64-linux-gnu
    sudo ln -s libncursesw.so.6.2 libncursesw.so.5

Genomes to make up community

The FASTA files are written to ~/CAMISIM/viruses/genomes, the folder referenced by virus_genome_to_id.tsv below.

    export ref_viruses_rep_genomes=/remote-storage/DBs/blastdb/ref_viruses_rep_genomes
    blastdbcmd -info -db $ref_viruses_rep_genomes
    mkdir -p ~/CAMISIM/viruses/genomes
    blastdbcmd -db $ref_viruses_rep_genomes -entry 89902998 > ~/CAMISIM/viruses/genomes/Ippy01.fasta
    blastdbcmd -db $ref_viruses_rep_genomes -entry 89902995 > ~/CAMISIM/viruses/genomes/Ippy02.fasta
    blastdbcmd -db $ref_viruses_rep_genomes -entry 9626732 > ~/CAMISIM/viruses/genomes/HepatitisA.fasta
    esearch -db nucleotide -query "AE017125" | efetch -format fasta > ~/CAMISIM/viruses/genomes/HelicobacterHepaticus.fasta

To obtain the taxon ID from an NCBI accession:

    esearch -db nucleotide -query "NC_007905.1" | efetch -format xml | xtract -pattern Dbtag -element Object-id_id

The metadata file and the genome-to-ID file look like this:

    emilio@frodo:~/CAMISIM/viruses$ more virus_metadata.tsv
    genome_ID  OTU  NCBI_ID  novelty_category
    Ippy01     1    55096    Ippy virus segment S, complete sequence
    Ippy02     2    55096    Ippy virus segment L, complete sequence
    HepatA     3    12092    Hepatitis A virus, complete genome
    Helico     4    51449    Helicobacter hepaticus ATCC 51449, complete genome

    emilio@frodo:~/CAMISIM/viruses$ more virus_genome_to_id.tsv
    Ippy01  /home/emilio/CAMISIM/viruses/genomes/Ippy01.fasta
    Ippy02  /home/emilio/CAMISIM/viruses/genomes/Ippy02.fasta
    HepatA  /home/emilio/CAMISIM/viruses/genomes/HepatitisA.fasta
    Helico  /home/emilio/CAMISIM/viruses/genomes/HelicobacterHepaticus.fasta

Configuration file for NanoSim data

    more viruses/virus_mini_config-NANO.ini

    [Main]
    seed=632741178
    phase=0
    max_processors=16
    dataset_id=RL
    output_directory=viruses_out_nano_small
    temp_directory=/tmp
    gsa=True
    pooled_gsa=True
    anonymous=True
    compress=1

    [ReadSimulator]
    readsim=/usr/share/NANOPORE-PKGs/NanoSim/src/simulator.py
    error_profiles=tools/nanosim_profile
    samtools=tools/samtools-1.3/samtools
    profile=
    size=0.1
    type=nanosim3
    fragments_size_mean=500
    fragment_size_standard_deviation=27

    [CommunityDesign]
    ncbi_taxdump=tools/ncbi-taxonomy_20170222.tar.gz
    strain_simulation_template=scripts/StrainSimulationWrapper/sgEvolver/simulation_dir
    number_of_samples=1

    [community0]
    metadata=viruses/virus_metadata.tsv
    id_to_genome_file=viruses/virus_genome_to_id.tsv
    id_to_gff_file=
    genomes_total=4
    genomes_real=4
    max_strains_per_otu=1
    ratio=1
    mode=differential
    log_mu=1
    log_sigma=2
    gauss_mu=1
    gauss_sigma=1
    view=False

Running the simulator

    python metagenomesimulation.py viruses/virus_mini_config-NANO.ini

When the run finishes, move to the viruses_out_nano_small folder, which contains the file taxonomic_profile_0.txt:

    more taxonomic_profile_0.txt
    @SampleID:
    @Version:0.9.1
    @Ranks:superkingdom|phylum|class|order|family|genus|species|strain
    @@TAXID  RANK  TAXPATH  TAXPATHSN  PERCENTAGE  _CAMI_genomeID  _CAMI_OTU
    10239  superkingdom  10239  Viruses  83.6522
    2759  superkingdom  2759  Eukaryota  16.3478
    35493  phylum  2759|35493  Eukaryota|Streptophyta  16.3478
    4447  class  2759|35493|4447  Eukaryota|Streptophyta|Liliopsida  16.3478
    464095  order  10239|||464095  Viruses|||Picornavirales  40.6404
    73496  order  2759|35493|4447|73496  Eukaryota|Streptophyta|Liliopsida|Asparagales  16.3478
    11617  family  10239||||11617  Viruses||||Arenaviridae  43.0118
    12058  family  10239|||464095|12058  Viruses|||Picornavirales|Picornaviridae  40.6404
    4668  family  2759|35493|4447|73496|4668  Eukaryota|Streptophyta|Liliopsida|Asparagales|Amaryllidaceae  16.3478
    1653394  genus  10239||||11617|1653394  Viruses||||Arenaviridae|Mammarenavirus  43.0118
    12091  genus  10239|||464095|12058|12091  Viruses|||Picornavirales|Picornaviridae|Hepatovirus  40.6404
    51449  genus  2759|35493|4447|73496|4668|51449  Eukaryota|Streptophyta|Liliopsida|Asparagales|Amaryllidaceae|Gilliesia  16.3478
    55096  species  10239||||11617|1653394|55096  Viruses||||Arenaviridae|Mammarenavirus|Ippy mammarenavirus  43.0118
    12092  species  10239|||464095|12058|12091|12092  Viruses|||Picornavirales|Picornaviridae|Hepatovirus|Hepatovirus A  40.6404
    55096.1  strain  10239||||11617|1653394|55096|55096.1  Viruses||||Arenaviridae|Mammarenavirus|Ippy mammarenavirus|Ippy mammarenavirus strain  39.8704  Ippy01  1
    55096.2  strain  10239||||11617|1653394|55096|55096.2  Viruses||||Arenaviridae|Mammarenavirus|Ippy mammarenavirus|Ippy mammarenavirus strain  3.1414  Ippy02  2
    12092.1  strain  10239|||464095|12058|12091|12092|12092.1  Viruses|||Picornavirales|Picornaviridae|Hepatovirus|Hepatovirus A|Hepatovirus A strain  40.6404  HepatA  3
    51449.1  strain  2759|35493|4447|73496|4668|51449||51449.1  Eukaryota|Streptophyta|Liliopsida|Asparagales|Amaryllidaceae|Gilliesia||Gilliesia strain  16.3478  Helico  4

Note that Helico is reported under Eukaryota (Gilliesia) in this profile: the NCBI_ID given for it in virus_metadata.tsv, 51449, is the ATCC strain number from the genome name, and in the NCBI taxonomy that number resolves to a different taxon.

The distributions folder contains the distribution file:

    more distribution_0.txt
    Ippy01  0.3987040526398704
    Ippy02  0.031414113703141414
    HepatA  0.4064039863406404
    Helico  0.16347784731634776

The folder DATE_sample_0/reads contains the file anonymous_reads.fq.gz. Run the following commands to prepare the input file for the signal simulator:

    gunzip anonymous_reads.fq.gz
    seqkit fq2fa anonymous_reads.fq > anonymous_reads.fasta

Change to the folder where DeepSimulator is installed and run the following command to create the fast5 files:

    ./deep_simulator.sh -i anonymous_reads.fasta -o out -B [1 or 2, depending on the CPU/GPU base caller to use]

Since deep_simulator can handle multi-FASTA files, another way to run the test is to concatenate the viral FASTA files into a single file and use it as input:

    cat ../CAMISIM/viruses/genomes/*.fasta > multi_viruses.fasta
    ./deep_simulator.sh -i multi_viruses.fasta -o out_multifasta -B [1 or 2, depending on the CPU/GPU base caller to use]

Note: while the first approach can take a very long time, about 15 h on a
common PC, depending on the machine you are using, the second one takes only about 10 minutes to complete.

Run the Vir-MinION pipeline using out_multifasta as the input. NB: it is suggested to create a new folder for each execution of Vir-MinION, in order to keep the results of each execution separate.

    mkdir -p SimulatedData
    mv out_multifasta SimulatedData
    cd SimulatedData
    time VirMinION-Pipe_V0.1.sh out_multifasta data_out dna_r9.4.1_450bps_hac.cfg EXP-NBD104 16 ALL

At the end of the execution, the folder structure will be similar to the following:

    (base) virminion@frodo:~/Simulated$ tree
    .
    ├── ass_flye_nano-corr
    │   ├── 00-assembly
    │   │   └── draft_assembly.fasta
    │   ├── flye.log
    │   └── params.json
    ├── ass_flye_nano-raw
    │   ├── 00-assembly
    │   │   └── draft_assembly.fasta
    │   ├── flye.log
    │   └── params.json
    ├── ass_spades
    ├── barcode01_fastq_runid_c2d19c211888bc09d8e077df271f325c911c1010_0_consensus
    │   ├── logfile.txt
    │   └── sorted.fastq
    ├── barcode01_fastq_runid_c2d19c211888bc09d8e077df271f325c911c1010_0.fasta
    ├── barcode01_fastq_runid_c2d19c211888bc09d8e077df271f325c911c1010_0.fastq
    ├── barcode01_fastq_runid_c2d19c211888bc09d8e077df271f325c911c1010_0_MegaAss
    │   ├── checkpoints.txt
    │   ├── done
    │   ├── final.contigs.fa
    │   ├── intermediate_contigs
    │   │   ├── k21.addi.fa
    │   │   ├── k21.addi.fa.info
    │   │   ├── k21.bubble_seq.fa
    │   │   ├── k21.bubble_seq.fa.info
    │   │   ├── k21.contigs.fa
    │   │   ├── k21.contigs.fa.info
    │   │   ├── k21.final.contigs.fa
    │   │   ├── k21.final.contigs.fa.info
    │   │   ├── k21.local.fa
    │   │   ├── k21.local.fa.info
    │   │   ├── k41.addi.fa
    │   │   ├── k41.addi.fa.info
    │   │   ├── k41.bubble_seq.fa
    │   │   ├── k41.bubble_seq.fa.info
    │   │   ├── k41.contigs.fa
    │   │   ├── k41.contigs.fa.info
    │   │   ├── k41.final.contigs.fa
    │   │   ├── k41.final.contigs.fa.info
    │   │   ├── k41.local.fa
    │   │   ├── k41.local.fa.info
    │   │   ├── k61.addi.fa
    │   │   ├── k61.addi.fa.info
    │   │   ├── k61.bubble_seq.fa
    │   │   ├── k61.bubble_seq.fa.info
    │   │   ├── k61.contigs.fa
    │   │   ├── k61.contigs.fa.info
    │   │   ├── k61.final.contigs.fa
    │   │   ├── k61.final.contigs.fa.info
    │   │   ├── k61.local.fa
    │   │   ├── k61.local.fa.info
    │   │   ├── k81.addi.fa
    │   │   ├── k81.addi.fa.info
    │   │   ├── k81.bubble_seq.fa
    │   │   ├── k81.bubble_seq.fa.info
    │   │   ├── k81.contigs.fa
    │   │   ├── k81.contigs.fa.info
    │   │   ├── k81.final.contigs.fa
    │   │   ├── k81.final.contigs.fa.info
    │   │   ├── k81.local.fa
    │   │   └── k81.local.fa.info
    │   ├── log
    │   └── options.json
    ├── multi_viruses_Demultiplexed_Guppy
    │   ├── barcode01_fastq_runid_c2d19c211888bc09d8e077df271f325c911c1010_0.fastq
    │   ├── barcoding_summary.txt
    │   ├── read_processor_log-2022-05-19_14-43-50.log
    │   ├── read_processor_log-2022-05-19_14-48-32.log
    │   ├── read_processor_log-2022-05-19_14-49-10.log
    │   ├── read_processor_log-2022-05-19_14-49-41.log
    │   ├── read_processor_log-2022-05-19_14-51-29.log
    │   ├── read_processor_log-2022-05-19_14-52-40.log
    │   └── unclassified
    │       └── fastq_runid_c2d19c211888bc09d8e077df271f325c911c1010_0.fastq
    ├── multi_viruses_fast5
    │   └── barcode01
    │       ├── signal_0_552d9ab0-322f-47ce-b5d0-5687492a447c.fast5
    │       ├── signal_100_c8c49c88-a808-402f-82ab-d55da477bbd0.fast5
    │       ├── signal_101_3eac6272-f085-4705-a2cb-e6c2e74da939.fast5
    │       ├── signal_94_a6aabc00-2571-402c-8399-a53f9ffc764f.fast5
    │       ...
    │       ├── signal_95_a0dea0e9-62a4-401f-a6e3-5aaa2d7c27df.fast5
    │       ├── signal_96_375f2564-e129-426e-88d2-22f6a91317a3.fast5
    │       ├── signal_97_23cbe9b3-ce65-4acf-9478-bab583ff372d.fast5
    │       ├── signal_98_fe1510f7-64a9-4473-b3b4-6bbc28e94cdc.fast5
    │       ├── signal_99_c0345e77-4078-4eec-82d9-80c6c787621e.fast5
    │       └── signal_9_beb6fe9f-3a6d-4250-bd7e-f6f869bcbeb0.fast5
    ├── multi_viruses_out
    │   ├── fail
    │   │   └── fastq_runid_c2d19c211888bc09d8e077df271f325c911c1010_0_0.fastq.gz
    │   ├── guppy_basecaller_log-2022-05-19_14-37-26.log
    │   ├── pass
    │   │   └── fastq_runid_c2d19c211888bc09d8e077df271f325c911c1010_0_0.fastq.gz
    │   ├── sequencing_summary.txt
    │   └── sequencing_telemetry.js
    ├── NanoFiltOut
    │   └── multi_viruses_filtered.fastq
    ├── NGSpecies_consensus
    │   ├── 1
    │   │   ├── cluster_origins.csv
    │   │   └── pre_clusters.csv
    │   ├── consensus_reference_0.fasta
    │   ├── consensus_reference_0.fasta.fai
    │   ├── consensus_reference_0.fasta.mmi
    │   ├── consensus_reference_10.fasta
    │   ├── consensus_reference_10.fasta.fai
    │   ├── consensus_reference_10.fasta.mmi
    │   ├── consensus_reference_8.fasta
    │   ├── consensus_reference_8.fasta.fai
    │   ├── consensus_reference_8.fasta.mmi
    │   ├── final_cluster_origins.tsv
    │   ├── final_clusters.tsv
    │   ├── logfile.txt
    │   ├── medaka_cl_id_0
    │   │   ├── calls_to_draft.bam
    │   │   ├── calls_to_draft.bam.bai
    │   │   ├── consensus.fasta
    │   │   ├── consensus_probs.hdf
    │   │   ├── stderr.txt
    │   │   └── stdout.txt
    │   ├── medaka_cl_id_10
    │   │   ├── calls_to_draft.bam
    │   │   ├── calls_to_draft.bam.bai
    │   │   ├── consensus.fasta
    │   │   ├── consensus_probs.hdf
    │   │   ├── stderr.txt
    │   │   └── stdout.txt
    │   ├── medaka_cl_id_8
    │   │   ├── calls_to_draft.bam
    │   │   ├── calls_to_draft.bam.bai
    │   │   ├── consensus.fasta
    │   │   ├── consensus_probs.hdf
    │   │   ├── stderr.txt
    │   │   └── stdout.txt
    │   ├── reads_to_consensus_0.fastq
    │   ├── reads_to_consensus_10.fastq
    │   ├── reads_to_consensus_8.fastq
    │   └── sorted.fastq
    ├── TaxoAss
    │   ├── krakViral_class.outbarcode01_fastq_runid_c2d19c211888bc09d8e077df271f325c911c1010_0_MegaAss
    │   ├── krakViral.kronabarcode01_fastq_runid_c2d19c211888bc09d8e077df271f325c911c1010_0_MegaAss
    │   ├── krakViral.krona.htmlbarcode01_fastq_runid_c2d19c211888bc09d8e077df271f325c911c1010_0_MegaAss
    │   ├── krakViral.outbarcode01_fastq_runid_c2d19c211888bc09d8e077df271f325c911c1010_0_MegaAss
    │   ├── krakViral_report.outbarcode01_fastq_runid_c2d19c211888bc09d8e077df271f325c911c1010_0_MegaAss
    │   ├── krakViral_unclass.outbarcode01_fastq_runid_c2d19c211888bc09d8e077df271f325c911c1010_0_MegaAss
    │   ├── reads_kaiju.kronabarcode01_fastq_runid_c2d19c211888bc09d8e077df271f325c911c1010_0_MegaAss
    │   ├── reads_kaiju.kron.htmlbarcode01_fastq_runid_c2d19c211888bc09d8e077df271f325c911c1010_0_MegaAss
    │   └── readskaiju.outbarcode01_fastq_runid_c2d19c211888bc09d8e077df271f325c911c1010_0_MegaAss
    ├── TaxoClust
    │   ├── krakViral_class.outNGSpecies_consensus_medaka_cl_id_0_Clust
    │   ├── krakViral_class.outNGSpecies_consensus_medaka_cl_id_10_Clust
    │   ├── krakViral_class.outNGSpecies_consensus_medaka_cl_id_8_Clust
    │   ├── krakViral.krona.htmlNGSpecies_consensus_medaka_cl_id_0_Clust
    │   ├── krakViral.krona.htmlNGSpecies_consensus_medaka_cl_id_10_Clust
    │   ├── krakViral.krona.htmlNGSpecies_consensus_medaka_cl_id_8_Clust
    │   ├── krakViral.kronaNGSpecies_consensus_medaka_cl_id_0_Clust
    │   ├── krakViral.kronaNGSpecies_consensus_medaka_cl_id_10_Clust
    │   ├── krakViral.kronaNGSpecies_consensus_medaka_cl_id_8_Clust
    │   ├── krakViral.outNGSpecies_consensus_medaka_cl_id_0_Clust
    │   ├── krakViral.outNGSpecies_consensus_medaka_cl_id_10_Clust
    │   ├── krakViral.outNGSpecies_consensus_medaka_cl_id_8_Clust
    │   ├── krakViral_report.outNGSpecies_consensus_medaka_cl_id_0_Clust
    │   ├── krakViral_report.outNGSpecies_consensus_medaka_cl_id_10_Clust
    │   ├── krakViral_report.outNGSpecies_consensus_medaka_cl_id_8_Clust
    │   ├── krakViral_unclass.outNGSpecies_consensus_medaka_cl_id_0_Clust
    │   ├── krakViral_unclass.outNGSpecies_consensus_medaka_cl_id_10_Clust
    │   ├── krakViral_unclass.outNGSpecies_consensus_medaka_cl_id_8_Clust
    │   ├── reads_kaiju.kronaNGSpecies_consensus_medaka_cl_id_0_Clust
    │   ├── reads_kaiju.kronaNGSpecies_consensus_medaka_cl_id_10_Clust
    │   ├── reads_kaiju.kronaNGSpecies_consensus_medaka_cl_id_8_Clust
    │   ├── reads_kaiju.kron.htmlNGSpecies_consensus_medaka_cl_id_0_Clust
    │   ├── reads_kaiju.kron.htmlNGSpecies_consensus_medaka_cl_id_10_Clust
    │   ├── reads_kaiju.kron.htmlNGSpecies_consensus_medaka_cl_id_8_Clust
    │   ├── readskaiju.outNGSpecies_consensus_medaka_cl_id_0_Clust
    │   ├── readskaiju.outNGSpecies_consensus_medaka_cl_id_10_Clust
    │   └── readskaiju.outNGSpecies_consensus_medaka_cl_id_8_Clust
    ├── taxonomy.log
    ├── TaxoRead
        ├── krakViral_class.outbarcode01_fastq_runid_c2d19c211888bc09d8e077df271f325c911c1010_0_Read
        ├── krakViral.kronabarcode01_fastq_runid_c2d19c211888bc09d8e077df271f325c911c1010_0_Read
        ├── krakViral.krona.htmlbarcode01_fastq_runid_c2d19c211888bc09d8e077df271f325c911c1010_0_Read
        ├── krakViral.outbarcode01_fastq_runid_c2d19c211888bc09d8e077df271f325c911c1010_0_Read
        ├── krakViral_report.outbarcode01_fastq_runid_c2d19c211888bc09d8e077df271f325c911c1010_0_Read
        ├── krakViral_unclass.outbarcode01_fastq_runid_c2d19c211888bc09d8e077df271f325c911c1010_0_Read
        ├── reads_kaiju.kronabarcode01_fastq_runid_c2d19c211888bc09d8e077df271f325c911c1010_0_Read
        ├── reads_kaiju.kron.htmlbarcode01_fastq_runid_c2d19c211888bc09d8e077df271f325c911c1010_0_Read
        └── readskaiju.outbarcode01_fastq_runid_c2d19c211888bc09d8e077df271f325c911c1010_0_Read

The taxonomic classifications in HTML format are saved in the files listed below. NB: the final tag of each file name (its suffix: _Clust, _MegaAss or _Read) indicates the data and strategy from which the classification was produced.
    ./TaxoClust/reads_kaiju.kron.htmlNGSpecies_consensus_medaka_cl_id_0_Clust
    ./TaxoClust/reads_kaiju.kron.htmlNGSpecies_consensus_medaka_cl_id_10_Clust
    ./TaxoClust/krakViral.krona.htmlNGSpecies_consensus_medaka_cl_id_8_Clust
    ./TaxoClust/krakViral.krona.htmlNGSpecies_consensus_medaka_cl_id_0_Clust
    ./TaxoClust/krakViral.krona.htmlNGSpecies_consensus_medaka_cl_id_10_Clust
    ./TaxoClust/reads_kaiju.kron.htmlNGSpecies_consensus_medaka_cl_id_8_Clust
    ./TaxoAss/reads_kaiju.kron.htmlbarcode01_fastq_runid_c2d19c211888bc09d8e077df271f325c911c1010_0_MegaAss
    ./TaxoAss/krakViral.krona.htmlbarcode01_fastq_runid_c2d19c211888bc09d8e077df271f325c911c1010_0_MegaAss
    ./TaxoRead/krakViral.krona.htmlbarcode01_fastq_runid_c2d19c211888bc09d8e077df271f325c911c1010_0_Read
    ./TaxoRead/reads_kaiju.kron.htmlbarcode01_fastq_runid_c2d19c211888bc09d8e077df271f325c911c1010_0_Read
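
Because each HTML report name is built from a report prefix, the sample name and a strategy suffix, the names can be split apart programmatically, for example to collect the reports by strategy. The following is a minimal Python sketch and is not part of Vir-MinION: the function names describe_report and group_by_strategy are hypothetical, and the prefix and suffix lists are assumed from the file names listed above rather than taken from the pipeline's source code.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Report-name prefixes and strategy suffixes as they appear in the file
# names listed above; assumed, not taken from the Vir-MinION source code.
HTML_PREFIXES = ("krakViral.krona.html", "reads_kaiju.kron.html")
SUFFIXES = ("_Clust", "_MegaAss", "_Read")


def describe_report(path):
    """Split a report path into (folder, report, sample, suffix)."""
    p = PurePosixPath(path)
    name = p.name
    report = next((x for x in HTML_PREFIXES if name.startswith(x)), None)
    suffix = next((s for s in SUFFIXES if name.endswith(s)), None)
    if report is None or suffix is None:
        raise ValueError(f"unrecognised report name: {name}")
    # The sample name is whatever sits between the prefix and the suffix.
    sample = name[len(report):len(name) - len(suffix)]
    return p.parent.name, report, sample, suffix


def group_by_strategy(paths):
    """Map each strategy suffix to the list of report paths carrying it."""
    groups = defaultdict(list)
    for path in paths:
        groups[describe_report(path)[3]].append(path)
    return dict(groups)


if __name__ == "__main__":
    example = [
        "./TaxoClust/krakViral.krona.htmlNGSpecies_consensus_medaka_cl_id_8_Clust",
        "./TaxoClust/reads_kaiju.kron.htmlNGSpecies_consensus_medaka_cl_id_0_Clust",
    ]
    for suffix, files in group_by_strategy(example).items():
        print(suffix, len(files))
```

Matching on a fixed list of prefixes and suffixes, rather than splitting on a separator, is a deliberate choice here: the file names contain no delimiter between the report name and the sample name, so a known prefix is the only reliable boundary.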