Fasta Utilities

A collection of scripts developed to interact with FASTA, FASTQ and SAM files. All the scripts use the ReadFastx module I wrote, which reads either a FASTA or FASTQ file record by record. They also use the FileBar module, which displays a terminal progress bar for a file as it is processed. ReadSam is the SAM equivalent of ReadFastx.
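To illustrate what reading "by record" means, here is a minimal Python sketch. It is not the ReadFastx module (the scripts themselves are Perl); the function name `read_fastx` is hypothetical, and the sketch assumes FASTQ records occupy exactly four lines each.

```python
import io


def read_fastx(handle):
    """Yield (header, sequence, quality) tuples from a FASTA or FASTQ handle.

    quality is None for FASTA records. Assumes four-line FASTQ records.
    """
    line = handle.readline()
    while line:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # FASTA: the sequence may span several lines until the next '>'
            header = line[1:]
            parts = []
            line = handle.readline()
            while line and not line.startswith(">"):
                parts.append(line.strip())
                line = handle.readline()
            yield header, "".join(parts), None
        elif line.startswith("@"):
            # FASTQ: header, sequence, '+' separator, quality
            header = line[1:]
            seq = handle.readline().strip()
            handle.readline()  # skip the '+' separator line
            qual = handle.readline().strip()
            yield header, seq, qual
            line = handle.readline()
        else:
            # Skip blank or unexpected lines
            line = handle.readline()


# Usage: iterate over records from an in-memory FASTA file
for header, seq, qual in read_fastx(io.StringIO(">seq1\nACGT\nAC\n")):
    print(header, len(seq))
```

Yielding one record at a time keeps memory use flat regardless of file size, which is the main reason to read by record instead of loading the whole file.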


  • - takes a BED, SAM, or wiggle file and creates a big version of it to upload to UCSC
  • - converts FASTQ to FASTA
  • - converts SAM format to FASTQ format
  • - converts mate pair reads to paired end orientation
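As a rough illustration of the FASTQ to FASTA conversion listed above, the following Python sketch drops the separator and quality lines and swaps the `@` header prefix for `>`. The function name `fastq_to_fasta` is hypothetical, and the sketch assumes four-line FASTQ records; it is not the script's actual implementation.

```python
def fastq_to_fasta(lines):
    """Convert four-line FASTQ records into a list of FASTA lines."""
    lines = [line.rstrip("\n") for line in lines]
    out = []
    # Step through the input one four-line record at a time
    for i in range(0, len(lines) - 3, 4):
        header, seq = lines[i], lines[i + 1]
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header at line {i + 1}")
        out.append(">" + header[1:])
        out.append(seq)
    return out


# Usage
print("\n".join(fastq_to_fasta(["@r1", "ACGT", "+", "IIII"])))
```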


  • - fixes the FASTQ header by removing spaces and optionally appending a suffix
  • - takes a file with tab delimited mappings and substitutes each of the first terms for each of the second terms
  • - renames FASTA files from NCBI into UCSC nomenclature (chr##)
  • - reads a FASTA file and ensures all of the names are unique
  • - limits FASTA lines to 80 characters
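Two of the operations above, wrapping sequence lines at 80 characters and making record names unique, can be sketched in Python as follows. The function names and the `_N` suffix scheme are assumptions for illustration; the scripts may resolve duplicate names differently.

```python
def wrap_sequence(seq, width=80):
    """Split a sequence into lines of at most `width` characters."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


def make_unique(names):
    """Return names with numeric suffixes appended to any repeats.

    Assumed scheme: the first occurrence is kept as is, later ones
    become name_1, name_2, ... skipping any suffix already in use.
    """
    used = set()
    out = []
    for name in names:
        candidate, n = name, 0
        while candidate in used:
            n += 1
            candidate = f"{name}_{n}"
        used.add(candidate)
        out.append(candidate)
    return out


# Usage
print(wrap_sequence("ACGT" * 50))
print(make_unique(["chr1", "chr1", "chr2"]))
```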


  • - bisulfite converts the sequences given to it
  • - merges all the input records into one record; by default it uses the name of the first record, but this can be changed with -name
  • - takes sequences and reverse complements them
  • - trims a fastx file to x bp
  • - takes two files of reads sorted by header, and outputs two files containing those reads which have pairs
  • - gets the pairs of the reads in the first file from the second file; pairs are matched by header name
  • - applies the given regex to the FASTA headers or sequence
  • - removes ambiguity codes from FASTA files
  • - splices a FASTA file given a GFF file
  • - splits a multi FASTA file into multiple files, can split in different ways
  • - subsets a FASTA file
  • - translates a FASTA cDNA to protein
  • - creates a random FASTA file
  • - generates a consensus FASTA file from a BAM file
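For readers unfamiliar with two of the transformations above, here is a small Python sketch of reverse complementing and in silico bisulfite conversion. The function names are hypothetical. The bisulfite function assumes the simplest model, in which every cytosine on the given strand is unmethylated and converted to thymine; the script may handle strands or methylation context differently.

```python
# Complement table for DNA/RNA bases; N stays N, case is preserved.
# Other IUPAC ambiguity codes are left unchanged by translate().
_COMPLEMENT = str.maketrans("ACGTUNacgtun", "TGCAANtgcaan")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def bisulfite_convert(seq):
    """Convert every C to T (assumes full conversion, no methylation)."""
    return seq.replace("C", "T").replace("c", "t")


# Usage
print(reverse_complement("AACG"))
print(bisulfite_convert("ACGC"))
```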


  • - filters alignments from a BAM or SAM file
  • - filters aligned reads from a file by mapping with Bowtie 2
  • - selects FASTA records which match or don't match a pattern
  • - reads a list of headers, and a fastx file and outputs records which are in the list
  • - returns the sequences with lengths between the values specified by -low and -high
  • - sorts a FASTA file using GNU sort; can sort by header, sequence, length, etc.
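The length and header-list filters above amount to simple predicates over records. A minimal Python sketch, assuming records are `(header, sequence)` tuples and that the `low` and `high` bounds are inclusive (the scripts' exact boundary behaviour is not stated here):

```python
def filter_by_length(records, low=0, high=float("inf")):
    """Keep records whose sequence length lies in [low, high]."""
    return [(h, s) for h, s in records if low <= len(s) <= high]


def select_by_headers(records, wanted):
    """Keep records whose header appears in the collection `wanted`."""
    wanted = set(wanted)  # set lookup keeps this linear in the record count
    return [(h, s) for h, s in records if h in wanted]


# Usage
records = [("a", "ACGT"), ("b", "AC"), ("c", "ACGTACGT")]
print(filter_by_length(records, low=3, high=5))
print(select_by_headers(records, ["b", "c"]))
```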


  • - gets the average coverage per sequence from a BAM file
  • - length of each record
  • - takes a file of FASTA lengths, or a FASTA or FASTQ file directly, and calculates the NX of the file (N50 by default)
  • - counts the number of CpGs in a FASTA file
  • - gets the within-group bitscore distance of all the records in a FASTA file using BLAST
  • - emulates Unix head for FASTA and FASTQ files
  • - emulates Unix tail for FASTA and FASTQ files
  • - calculates percent GC for each FASTA record in a file, as well as the total GC content
  • - gets the sequence lengths from a SAM file
  • - gets the total size of a FASTA file and the number of sequences
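Several of the statistics above have short definitions that a sketch makes concrete. NX is the length L such that sequences of length L or longer contain at least X percent of all bases. The Python below is an illustration with hypothetical function names, not the scripts' code; `percent_gc` assumes every character of the sequence counts toward the denominator.

```python
def nx(lengths, x=50):
    """Return the NX statistic (N50 by default) for a list of lengths."""
    if not lengths:
        raise ValueError("no lengths given")
    threshold = sum(lengths) * x / 100
    running = 0
    # Walk from the longest sequence down until X% of bases are covered
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length


def percent_gc(seq):
    """Percentage of G and C bases in a sequence (case-insensitive)."""
    if not seq:
        return 0.0
    upper = seq.upper()
    return 100.0 * (upper.count("G") + upper.count("C")) / len(upper)


def count_cpg(seq):
    """Number of CG dinucleotides in a sequence (case-insensitive)."""
    return seq.upper().count("CG")


# Usage
print(nx([2, 3, 4, 5, 6, 7, 8, 9, 10]))
print(percent_gc("GGCCAATT"), count_cpg("ACGCGT"))
```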

Bed scripts

  • - takes a file with the chromosome and location and a file of chromosome sizes, and converts the coordinates to an absolute scale for plotting
  • - converts a BED file to an IGV snapshot script
  • - combines BED files
  • - converts a GFF file to a BED file
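Converting per-chromosome coordinates to an absolute scale, as described in the first item above, is typically done by adding the summed lengths of all preceding chromosomes to each position. A minimal Python sketch under that assumption (function names are hypothetical, and chromosomes are assumed to be laid out in the order given by the sizes file):

```python
def chromosome_offsets(sizes):
    """Map each chromosome to the total length of those before it.

    sizes: list of (chromosome, length) pairs in plotting order.
    """
    offsets = {}
    total = 0
    for chrom, length in sizes:
        offsets[chrom] = total
        total += length
    return offsets


def to_absolute(chrom, pos, offsets):
    """Convert a (chromosome, position) pair to an absolute coordinate."""
    return offsets[chrom] + pos


# Usage
offsets = chromosome_offsets([("chr1", 100), ("chr2", 50)])
print(to_absolute("chr2", 10, offsets))
```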


  • - given the input filename and the output filename, figures out the last line using tail, then greps for that header in the input, and works out the percentage that way
  • - gets sequence information from GI numbers from a BLAST results file
  • - downloads a number of sequences from an Entrez query
  • - downloads FASTA sequences from NCBI and outputs a FASTA file
  • - downloads the SRA sequences from NCBI using Aspera and outputs a FASTQ file
  • - remaps FASTA sequences from the first file to FASTA sequences from the second file, matches by hashing the sequence
  • - parses an mpileup file and gets the base counts
  • - renames a file, changing any references to the old name in the file to the new name
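Matching records between two files "by hashing the sequence", as in the remapping item above, can be sketched as a lookup table keyed on a digest of each sequence. The Python below is an illustration only: the function names, the choice of MD5, and the case-insensitive comparison are all assumptions, and records are assumed to be `(header, sequence)` tuples.

```python
import hashlib


def sequence_digest(seq):
    """Hash a sequence, ignoring case (assumed normalisation)."""
    return hashlib.md5(seq.upper().encode("ascii")).hexdigest()


def remap_headers(first, second):
    """Map each header in `first` to the header in `second` that has
    an identical sequence, or None when there is no match."""
    by_digest = {}
    for header, seq in second:
        # Keep the first header seen for any repeated sequence
        by_digest.setdefault(sequence_digest(seq), header)
    return {header: by_digest.get(sequence_digest(seq)) for header, seq in first}


# Usage
first = [("a", "ACGT"), ("b", "GGGG")]
second = [("x", "acgt"), ("y", "TTTT")]
print(remap_headers(first, second))
```

Hashing keeps the lookup table small even for long sequences, at the cost of a vanishingly unlikely digest collision.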


Install with an optional prefix; omit the prefix if you want to install system-wide.

perl Makefile.PL PREFIX=$HOME
make install




