GitHub - SocialfiPanda/RaSs: RaSs is a benchmark for RNAseq mapping and pair-end read generator.

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
doc		doc
includes		includes
src/edu/unc/csbio		src/edu/unc/csbio
.gitattributes		.gitattributes
.gitignore		.gitignore
abundance.txt		abundance.txt
config.txt		config.txt
config3.txt		config3.txt
readme.txt		readme.txt

Repository files navigation

My Related Publication
        http://bioinformatics.oxfordjournals.org/content/29/13/i291.full.pdf

[Manual of RNAseqSim] 

NAME
	RNAseqSim - Simulate RNAseq Reads

SYNOPSIS
	java -jar RNAseqSim-version.jar [CONFIG] [OPTIONS]

DESCRIPTION
	RNAseqSim creates two FastQ files based on the user-defined settings.

CONFIG
	A text file that contains configuration RNAseqSim requires to generate reads.
	The avaiable settings in CONFIG file are the same as those in the OPTIONS.
	If CONFIG is not specified, "config.txt" will be used.

OPTIONS
	The key-value strings that specify configuration.
	OPTIONS are represeted in "-key=value" format. The available keys and values
	are listed in DETAILS.
	
DETAILS			
    	GTF_File
    		The required input file name of the annotation file in GTF(2.2) 
		format. 
    		It contains the exons, transcripts and genes annotation data.
		
	FASTA_File
		The required input file name of the genome sequence in Fasta format.
		It contains the sequence data of some chromosomes or the entire genome.
	
	Chromosome 
		The chromosome or chromosome patterns that are used.
		e.g., chr1 or 1 for exact matching, and [[0-9]* for fuzzy matching.

	Chromosome_Matching
		The matching method of chromosomes. 
		Available values are "Exact" and "Fuzzy".
		If "Exact" is used, the program will work on the chromosome that has
		the exact name of the Chromosome value. Otherwise, it will use regular
		expression to match the Chromosome value.
	
	Blacklist
		The file name of a list of ensembl transcript ids that are not included 
		in the simulation.
		It is a blank string by default.				

	Abundance_File 
		The file name of transcript abundance. Each line in the file contains 
		a transcriptId (consistent with the ones in GTF_File) and an abundance 
		value. The two fields are separated by a blank space.		
		If specified, the abundance value will be used to generate reads; 
		otherwise, the program will dump the abundance values randomly generated
		to this file. The default name is "abundance.txt".

	Abundance_Overwritten
		A switch that specifies whether the abundance file is overwritten if
		it exists.
		Available values are "Yes" and "No"(default).
	
	Expressed_Transcript_Percentage
		If "Abundance_File" is not specified, the program will randomly select 
		this percentage	of transcripts to generate abundance values.
	
	SV_Allowed
		A switch that specifies whether structure variants are included.
		Available values are "Yes" and "No".

	SV_File
		The input file name of structure variants in VCF(4.0) format.
					
	SV_Sample
		The name of the sample of which the structure variants in "SV_File"
		will be embeded in the sequence.

	Read_Length
		The length of each simulated read. Its default value is 100.

	Fragment_Minimum_Length
		The minimum length of a fragment simulated. Its default value is 150.
		It is better to specify a value greater than "Read_Length", because
		no read will be created for a fragment whose length is shorter than 
		"Read_Length".
	
	Fragment_Maximum_Length
		The maxium length of a fragment simulated. Its default value is 400.

	Coverage_Factor
		The coverage factor, together with the abundance of a transcript, 
		affects the number of reads generated in a unit of region.
	
	Read_ID_Prefix
		The prefix of each read ID.

	Flip_And_Reverse
		A switch that specifies whether to flip and reverse one of the ends
		of the reads. 
		Available values are "Yes" and "No".

	Quality_Generator 
		The quality score generator in use. perfect or real.
		Available values are "Perfect" for the maximum(best) quality score for 
		each read, and "Real" for sampling real Fastq files for quality score.
	
	Real_Quality_Score_Fastq_1
		The real Fastq file for generating quality score for the first pair end 
		read if "Quality_Generator" is "Real".

	Real_Quality_Score_Fastq_2
		The real Fastq file for generating quality score for the second pair end
		read if "Quality_Generator" is "Real".

	Unknown_Factor
		The probability of changing a base-pair. It is a factor of simulating
		random noise, unknown mismatches, RNA Editing, and etc.

	Output_Fastq_1
		The output of the first pair end reads.
	
	Output_Fastq_2
		The output of the second pair end reads.

	Input_Buffer_Size
		The size of input file buffer.
	
	Output_Buffer_Size
		The size of the output file buffer.
	
	Process_Window_Size
		The size of the window on the reference coordinate processed each time					
	
OUTPUT
	Two files, specified in "Output_Fastq_1" and "Output_Fastq_2", are the main
	output that contain all reads.

	If the abundance file does not exist, it will be created with the abundance
	value the program randomly chooses.
	
	An .INX file will also be built for fast accessing the "FASTA_File".
		
AUTHORS
	Shunping Huang <sphuang@cs.unc.edu>
	Jack Wang <zhew@live.unc.edu>

VERSION
	1.3