Using spike-in functionality #27

Closed
zuta-osa77 opened this issue Dec 8, 2021 · 3 comments

Comments

@zuta-osa77

Hello, do you have an example of using the -spk or --spikeIn functionality in miRge3.0? In particular, an example of the call and of how the bowtie index library looks.

Specifically

a) I have put the bowtie indices for the separate spike-in sequences where the human indices are located as instructed, but annotation.report.csv still reports 0 reads for the Spike-In category?

b) Can more than one spike-in sequence be used?

thanks

@arunhpatil
Collaborator

@zuta-osa77 ,

Yes, we have bowtie index data available for spike-in in the libraries (described below), and providing the parameter -spk will map the reads to the spike-in data along with the other databases. If you see 0 reads in the annotation report, that suggests the input FASTQ files contain spike-in reads different from the indexed ones, or contain none at all.

If you list the directory, the spike-in index files appear alongside the other index files (only the spike-in index files are shown below):
ls miRge3_Lib/human/index.Libs/

human_spike-in.1.ebwt      
human_spike-in.2.ebwt      
human_spike-in.3.ebwt      
human_spike-in.4.ebwt      
human_spike-in.rev.1.ebwt  
human_spike-in.rev.2.ebwt

Now, if you want to see the contents of it, you can use the following command (only showing the first four lines of output here):
bowtie-inspect miRge3_Lib/human/index.Libs/human_spike-in | head

>Hafner_Cal01
GTCCCACTCCGTAGATCTGTTC
>Hafner_Cal02
GATGTAACGAGTTGGAATGCAA

You may use less instead of head to browse more.
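If the annotation report shows 0 spike-in reads, a quick sanity check is to look for the indexed sequences directly in the FASTQ file. The following is a minimal sketch, not part of miRge3.0: the function names `read_fasta` and `count_spikein_reads` are hypothetical, the reads in the demo are made up, and it assumes plain 4-line FASTQ records. It only counts exact substring matches, so it is a rough indicator and not a substitute for the bowtie alignment that miRge3.0 performs.

```python
import io

def read_fasta(handle):
    """Return {name: sequence} from FASTA text, e.g. bowtie-inspect output."""
    records, name = {}, None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records

def count_spikein_reads(fastq_handle, spikeins):
    """Count reads containing each spike-in sequence as an exact substring."""
    counts = {name: 0 for name in spikeins}
    for i, line in enumerate(fastq_handle):
        if i % 4 != 1:  # the sequence is the 2nd line of each 4-line record
            continue
        read = line.strip().upper()
        for name, seq in spikeins.items():
            if seq in read:
                counts[name] += 1
    return counts

# In-memory demo. The two spike-in sequences are the ones shown above;
# the FASTQ reads are invented for illustration only.
fasta = io.StringIO(
    ">Hafner_Cal01\nGTCCCACTCCGTAGATCTGTTC\n"
    ">Hafner_Cal02\nGATGTAACGAGTTGGAATGCAA\n"
)
fastq_lines = [
    "@r1", "GTCCCACTCCGTAGATCTGTTCTGGAATTC", "+", "I" * 30,
    "@r2", "TCCCTGATACCCTAACTTGTGA", "+", "I" * 22,
]
fastq = io.StringIO("\n".join(fastq_lines) + "\n")

spikeins = read_fasta(fasta)
counts = count_spikein_reads(fastq, spikeins)
print(counts)
```

For real data, replace the `io.StringIO` objects with `open("my_spikein.fasta")` and `open("SRR772403.fastq")`. If every count is zero, the library most likely does not contain these particular spike-ins.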

An example usage would be:
miRge3.0 -s SRR772403.fastq -lib /mnt/d/Halushka_lab/Arun/datasets/miRge3_Lib -on human -db miRGeneDB -o output_dir -a illumina -spk

If you need to replace the spike-in data already indexed here, you may follow the commands below.
NOTE: If you want to append to the existing library instead, the instructions are provided in the next section: append*.

  1. rm -rf human_spike-in.* (removes the existing files with the prefix human_spike-in)
  2. bowtie-build [options]* <reference_in> <ebwt_outfile_base> (creates the new index files)
    example: bowtie-build --threads 6 my_spikein.fasta human_spike-in
    my_spikein.fasta: should contain the spike-in sequences in FASTA format.

This will create the bowtie ebwt files with name prefix human_spike-in and it is mandatory to retain this name format in the library directory.
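After rebuilding, it can be worth confirming that all six index files carry the expected prefix before running miRge3.0. The sketch below is an illustration only: `missing_index_files` is a hypothetical helper, and the suffix list is taken from the directory listing shown earlier in this thread. The prefix defaults to `human_spike-in`; the naming for other organisms is an assumption you would need to verify against your own library.

```python
import tempfile
from pathlib import Path

# Suffixes of the six human_spike-in files listed earlier in this thread.
EBWT_SUFFIXES = (
    ".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt",
    ".rev.1.ebwt", ".rev.2.ebwt",
)

def missing_index_files(index_dir, prefix="human_spike-in"):
    """Return the names of expected index files that are absent in index_dir."""
    index_dir = Path(index_dir)
    return [
        prefix + suffix
        for suffix in EBWT_SUFFIXES
        if not (index_dir / (prefix + suffix)).is_file()
    ]

# Demo in a temporary directory: create five of the six files on purpose.
with tempfile.TemporaryDirectory() as tmp:
    for suffix in EBWT_SUFFIXES[:-1]:
        (Path(tmp) / ("human_spike-in" + suffix)).touch()
    demo_missing = missing_index_files(tmp)

print(demo_missing)
```

On a real library you would call `missing_index_files("miRge3_Lib/human/index.Libs")`; an empty list means all six files are in place.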

append* :
To append to an existing spike-in library:

  1. bowtie-inspect miRge3_Lib/human/index.Libs/human_spike-in > my_spikein.fasta - This command copies the sequences from the ebwt index into a FASTA file named my_spikein.fasta.
  2. Append your own sequences at the bottom of this file. You could run cat inputFile.fasta >> my_spikein.fasta, or, if you are familiar with the vim editor, open the file and append at the bottom, or simply open it in a notepad-style editor (such as gedit) and append there. Please make sure the file stays in FASTA format.
  3. Index the files back to human_spike-in using the bowtie-build command shown earlier.
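Since the merged file has to be valid FASTA before it goes to bowtie-build, a small pre-flight check can catch the usual mistakes (a missing header, a duplicated name, stray characters). This is a hedged sketch: `check_fasta` is a hypothetical helper, and restricting sequences to A, C, G, T and N is my assumption, based on the existing entries being written in the DNA alphabet; loosen the pattern if your spike-ins need it.

```python
import io
import re

# Assumption: sequences use the DNA alphabet, like the existing entries.
VALID_SEQ = re.compile(r"^[ACGTN]+$", re.IGNORECASE)

def check_fasta(handle):
    """Return a list of problems found; an empty list means the file looks fine."""
    problems, seen = [], set()
    name, has_seq = None, False
    for lineno, raw in enumerate(handle, start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None and not has_seq:
                problems.append(f"record {name!r} has no sequence")
            header = line[1:].strip()
            name = header.split()[0] if header else ""
            if not name:
                problems.append(f"line {lineno}: empty header")
            elif name in seen:
                problems.append(f"line {lineno}: duplicate name {name}")
            seen.add(name)
            has_seq = False
        elif name is None:
            problems.append(f"line {lineno}: sequence before first header")
        else:
            if not VALID_SEQ.match(line):
                problems.append(f"line {lineno}: unexpected characters in {name}")
            has_seq = True
    if name is None:
        problems.append("no FASTA records found")
    elif not has_seq:
        problems.append(f"record {name!r} has no sequence")
    return problems

# Demo: my_spike_1 is an invented name and sequence, for illustration only.
good = ">Hafner_Cal01\nGTCCCACTCCGTAGATCTGTTC\n>my_spike_1\nACGTACGTACGTACGTACGT\n"
bad = ">my_spike_1\nACGU ACGT\n>my_spike_1\n"

good_problems = check_fasta(io.StringIO(good))
bad_problems = check_fasta(io.StringIO(bad))
print(good_problems)
print(bad_problems)
```

Running `check_fasta(open("my_spikein.fasta"))` before step 3 and proceeding only when the list is empty keeps a malformed file from being indexed.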

I hope this is clear, let me know if you need any additional information.

Thank you,
Arun.

@zuta-osa77
Author

Thanks Arun, the appending process worked!

With all the spike-in reads aggregated, however, we cannot resolve different concentrations of multiple spike-in sequences (e.g. for RNA isolation control). In this case we would need to generate counts for each spike-in sequence -- I can do this manually with bowtie, but I'm wondering if there's a hack within the miRge3.0 space to carry this out.

@arunhpatil
Collaborator

@zuta-osa77,

If you have different concentrations of spike-in across conditions, that should be reflected in the mapped.csv file.
See, for example, the row below: the spike-in column is empty, and the sequence TCCCTGATACCCTAACTTGTGA is annotated as an isomiR with 16 counts.

Sequence,annotFlag,exact miRNA,hairpin miRNA,mature tRNA,primary tRNA,snoRNA,rRNA,ncrna others,mRNA,isomiR miRNA,`spike-in`,SRR772403
TCCCTGATACCCTAACTTGTGA,1,,,,,,,,,Hsa-Mir-10-P3b_5p,,16

Then you could use this information across conditions and spike-in counts to your advantage.
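One way to get a count per spike-in sequence from mapped.csv is to group rows by the spike-in column and sum the sample columns. The sketch below rests on two assumptions that are not confirmed in this thread: that the spike-in column holds the name of the matched spike-in (such as Hafner_Cal01) for reads that map there, and that every column after it is a per-sample count. `spikein_counts` is a hypothetical helper, and the three spike-in rows in the demo are invented; only the isomiR row comes from the example above.

```python
import csv
import io
from collections import defaultdict

def spikein_counts(handle, spike_col="spike-in"):
    """Sum per-sample counts for each name found in the spike-in column."""
    reader = csv.reader(handle)
    # Strip stray backticks so a header written as `spike-in` still matches.
    header = [h.strip().strip("`") for h in next(reader)]
    spk = header.index(spike_col)
    samples = header[spk + 1:]  # assumption: sample columns follow spike-in
    totals = defaultdict(lambda: dict.fromkeys(samples, 0))
    for row in reader:
        if len(row) <= spk:
            continue
        name = row[spk].strip()
        if not name:  # row is not annotated as a spike-in
            continue
        for sample, value in zip(samples, row[spk + 1:]):
            totals[name][sample] += int(float(value or 0))
    return dict(totals)

def make_row(seq, spike_name, count):
    """Build a demo row: sequence, annotFlag, 9 empty columns, spike-in, count."""
    return ",".join([seq, "1"] + [""] * 9 + [spike_name, str(count)])

header = ("Sequence,annotFlag,exact miRNA,hairpin miRNA,mature tRNA,primary tRNA,"
          "snoRNA,rRNA,ncrna others,mRNA,isomiR miRNA,`spike-in`,SRR772403")
rows = [
    "TCCCTGATACCCTAACTTGTGA,1,,,,,,,,,Hsa-Mir-10-P3b_5p,,16",  # from the thread
    make_row("GTCCCACTCCGTAGATCTGTTC", "Hafner_Cal01", 120),   # invented
    make_row("GTCCCACTCCGTAGATCTGTTCT", "Hafner_Cal01", 5),    # invented
    make_row("GATGTAACGAGTTGGAATGCAA", "Hafner_Cal02", 40),    # invented
]
per_spikein = spikein_counts(io.StringIO("\n".join([header] + rows) + "\n"))
print(per_spikein)
```

With a real run, `spikein_counts(open("output_dir/.../mapped.csv"))` would give one dictionary per spike-in name, keyed by sample, which can then be compared across conditions. If the spike-in column in your output holds something other than the sequence name, the grouping key needs to change accordingly.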

Thank you,
Arun
