A collection of command-line functions for small, frequently required analyses.
mnshgl0110/hometools
usage: Collections of command-line functions to perform common pre-processing and analysis functions. [-h] {getchr,sampfa,exseq,getscaf,seqsize,filsize,subnuc,basrat,genome_ranges,get_homopoly,asstat,shannon,fachrid,faline,bamcov,pbamrc,splitbam,mapbp,bam2coords,ppileup,runsyri,syriidx,plthist,plotal,pltbar,asmreads,gfatofa,gfftrans,gffsort,vcfdp,getcol,smprow} ...

positional arguments:
  {getchr,sampfa,exseq,getscaf,seqsize,filsize,subnuc,basrat,genome_ranges,get_homopoly,asstat,shannon,fachrid,faline,bamcov,pbamrc,splitbam,mapbp,bam2coords,ppileup,runsyri,syriidx,plthist,plotal,pltbar,asmreads,gfatofa,gfftrans,gffsort,vcfdp,getcol,smprow}
    getchr         FASTA: Get specific chromosomes from the fasta file
    sampfa         FASTA: Sample random sequences from a fasta file
    exseq          FASTA: extract sequence from fasta
    getscaf        FASTA: generate scaffolds from a given genome
    seqsize        FASTA: get size of dna sequences in a fasta file
    filsize        FASTA: filter out smaller molecules
    subnuc         FASTA: Change character (in all sequences) in the fasta file
    basrat         FASTA: Calculate the ratio of every base in the genome
    genome_ranges  FASTA: Get a list of genomic ranges of a given size
    get_homopoly   FASTA: Find homopolymeric regions in a given fasta file
    asstat         FASTA: Get N50 values for the given list of chromosomes
    shannon        FASTA: Get Shanon entropy across the length of the chromosomes using sliding windows
    fachrid        FASTA: Change chromosome IDs
    faline         FASTA: Convert fasta file from single line to multi line or vice-versa
    bamcov         BAM: Get mean read-depth for chromosomes from a BAM file
    pbamrc         BAM: Run bam-readcount in a parallel manner by dividing the input bed file.
    splitbam       BAM: Split a BAM files based on TAG value. BAM file must be sorted using the TAG.
    mapbp          BAM: For a given reference coordinate get the corresponding base and position in the reads/segments mapping the reference position
    bam2coords     BAM: Convert BAM/SAM file to alignment coords
    ppileup        BAM: Currently it is slower than just running mpileup on 1 CPU. Might be possible to optimize later. Run samtools mpileup in parallel when pileup is required for specific positions by dividing the input bed file.
    runsyri        syri: Parser to align and run syri on two genomes
    syriidx        syri: Generates index for syri.out. Filters non-SR annotations, then bgzip, then tabix index
    plthist        Plot: Takes frequency output (like from uniq -c) and generates a histogram plot
    plotal         Plot: Visualise pairwise-whole genome alignments between multiple genomes
    pltbar         Plot: Generate barplot. Input: a two column file with first column as features and second column as values
    asmreads       GFA: For a given genomic region, get reads that constitute the corresponding assembly graph
    gfatofa        GFA: Convert a gfa file to a fasta file
    gfftrans       GFF: Get transcriptome (gene sequence) for all genes in a gff file. WARNING: THIS FUNCTION MIGHT HAVE BUGS.
    gffsort        GFF: Sort a GFF file based on the gene start positions
    vcfdp          VCF: Get DP and DP4 values from a VCF file.
    getcol         Table:Select columns from a TSV or CSV file using column names
    smprow         Table:Select random rows from a text file

optional arguments:
  -h, --help       show this help message and exit
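The help text names two statistics without defining them: N50 (asstat) and sliding-window Shannon entropy (shannon). The sketch below shows the standard definitions of both. It is an illustration only, not the code used by this repository. The function names n50 and shannon_windows and the window and step parameters are hypothetical and do not correspond to any command-line option documented here.

```python
import math
from collections import Counter


def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input: no sequences, so no N50


def shannon_windows(seq, window=4, step=2):
    """Yield (start, entropy_in_bits) for each full sliding window of seq."""
    for start in range(0, len(seq) - window + 1, step):
        counts = Counter(seq[start:start + window].upper())
        # H = sum over symbols of p * log2(1 / p), with p = count / window
        entropy = sum((c / window) * math.log2(window / c) for c in counts.values())
        yield start, entropy


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))
    print(list(shannon_windows("AAAACGTA", window=4, step=4)))
```

A window holding a single repeated base has entropy 0, and a window with all four bases in equal proportion has entropy 2 bits, so low values flag low-complexity stretches of a chromosome.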