Error correction for Oxford Nanopore data



Error correction for Oxford Nanopore reads

        BLAST must be available in your PATH
        SGE or a similar scheduler must be available
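A quick way to confirm these requirements before installing is to look the tools up in PATH. The following is a minimal sketch, not part of nanocorr; the executable names `blastn` and `qsub` are assumptions (your BLAST or scheduler installation may use different names).

```shell
# Print the name of each required tool that is not found in PATH.
# Returns non-zero if at least one tool is missing.
missing_tools() {
    local tool status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Assumed executable names (adjust to your setup):
# missing_tools blastn qsub || echo "install the tools listed above first" >&2
```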

        Clone the repository to a shared filesystem on a cluster
        >git clone
        >cd nanocorr        
        Create a virtual environment to install Python dependencies

        >virtualenv nanocorr_ve
        >source nanocorr_ve/bin/activate
        Install the following packages using pip:

            pip install git+
            pip install numpy
            pip install h5py
            pip install git+
            pip install git+
            pip install git+
            pip install git+
        #Finally install the nanocorr package itself
        > python install

        Make sure you are in the virtualenv
        >source nanocorr/nanocorr_ve/bin/activate
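If you want a script to verify this step, one option is to test the `VIRTUAL_ENV` variable, which virtualenv's `activate` script exports. This is a small sketch; the helper name `in_virtualenv` is made up for illustration.

```shell
# Succeeds only when a virtualenv has been activated in this shell
# (activate exports VIRTUAL_ENV; it is normally unset otherwise).
in_virtualenv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

# Example guard at the top of a job script:
# in_virtualenv || { echo "activate nanocorr_ve first" >&2; exit 1; }
```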

        Partition your reads for distributed processing
        >python 100 500 nanopore_reads.fa
        The partitioning creates a series of directories
        [0001,0002,...]. In each directory, run the script
        on SGE or a similar system that sets the SGE_TASK_ID environment
        variable. Set the -t parameter to the number of files in the
        directory:
        >qsub -cwd -v PATH,LD_LIBRARY_PATH -t 1:500 -j y -o nanocorr_out /path/to/ query.fa reference.fa
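To plan how many directories and array tasks to expect, the arithmetic can be sketched as below. This is an illustration only and rests on an assumption that this README does not state: that the two integers given to the partition script are the number of reads per file and the number of files per directory. The helper names are hypothetical.

```shell
# ASSUMPTION: partition arguments are <reads per file> <files per directory>.

# Count FASTA records (header lines start with '>').
count_reads() {
    grep -c '^>' "$1"
}

# Ceiling division for positive integers.
ceil_div() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Print the expected number of files and directories.
# $1: total reads, $2: reads per file, $3: files per directory
estimate_partition() {
    local n_reads="$1" reads_per_file="$2" files_per_dir="$3"
    local n_files n_dirs
    n_files=$(ceil_div "$n_reads" "$reads_per_file")
    n_dirs=$(ceil_div "$n_files" "$files_per_dir")
    echo "files=$n_files dirs=$n_dirs"
}

# Example:
# estimate_partition "$(count_reads nanopore_reads.fa)" 100 500
```

Under that assumption, the last directory can hold fewer files than the others, so its -t range may need to be smaller.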
        The query file will be "blasted" against each previously partitioned read.
        The query file can be any sequence data useful for correction;
        Illumina data is what is currently used.
        The corrected reads will be in the resulting "fa" files in each partition
        directory.
        If you supply a reference genome, the corrected reads will be blasted
        against it and a ".refblast6.q" file will be created for each partition,
        containing the corrected reads aligned to the reference. Make sure
        the BLAST database has been created for the reference beforehand.
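Checking for the reference database before submitting jobs can save a failed run. The sketch below assumes a BLAST+ installation and a single-volume nucleotide database built under the default name (the FASTA path itself), which produces `.nhr`, `.nin` and `.nsq` files; neither assumption is confirmed by this README, and the helper name is hypothetical.

```shell
# Succeeds if single-volume nucleotide BLAST database files exist
# next to the given FASTA path (ASSUMES BLAST+ default naming).
has_blast_db() {
    local ref="$1" ext
    for ext in nhr nin nsq; do
        [ -e "$ref.$ext" ] || return 1
    done
}

# Example:
# has_blast_db reference.fa || makeblastdb -in reference.fa -dbtype nucl
```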

Non-SGE Environment:
        If you don't have SGE installed, you can use GNU parallel to run nanocorr on
        a single machine. This is not the recommended method, because alignment
        can be very compute intensive, but it can be tractable for small
        genomes (bacteria).

        For each of the directories created by the partition script (0001..000N),
        cd into the directory and run:

        $>for j in {1..500}; do 
              echo "SGE_TASK_ID=$j TMPDIR=/tmp query.fa reference.fa"; 
          done  | parallel -j <# of compute cores>
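To avoid repeating the loop by hand in every partition directory, the task lines for several directories can be generated in one pass. This is a sketch under assumptions: `emit_tasks` is a hypothetical helper, and the command string you pass stands for whatever per-task command you use above. Because each task changes into its directory first, the query and reference paths must resolve from there (absolute paths are the simplest choice).

```shell
# Print one shell command line per (directory, task) pair, suitable
# for piping into GNU parallel, which runs each input line as a command.
# $1: tasks per directory, $2: command to run, remaining args: directories
emit_tasks() {
    local n_tasks="$1" cmd="$2" dir j
    shift 2
    for dir in "$@"; do
        j=1
        while [ "$j" -le "$n_tasks" ]; do
            echo "cd $dir && SGE_TASK_ID=$j TMPDIR=/tmp $cmd"
            j=$((j + 1))
        done
    done
}

# Example (replace <command> with your per-task command line):
# emit_tasks 500 "<command> /abs/query.fa /abs/reference.fa" 0001 0002 | parallel -j 4
```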