Full run command & expected output #21

Closed
slimsuite opened this issue Jul 18, 2017 · 18 comments

@slimsuite

Is there anywhere with a full run command including all the parameters that need to be set? Also, what is the nature and format of the output?

The -h information is pretty limited. I can see that I need to set -l seed_length_cutoff and -ml merging_length_cutoff, but I don't know how they are used, so I can't make an educated guess about what values to set.

@esolares
Collaborator

Hi,

I recommend reading through the readme file. I have copied and pasted an excerpt for you here:

-l: controls the length cutoff for anchor contigs. A good rule of thumb is to start with the N50 of the self assembly. E.g. if the N50 of your self assembly is 2Mb then use 2000000 as your cutoff. Lowering this value may lead to more merging but may increase the probability of mis-joins.

-ml: controls the minimum alignment length to be considered for merging. This is especially helpful for repeat-rich genomes. Default is 0 but higher values (>5000) are recommended.

Added note: we recommend choosing the -ml value based on the expected repeat lengths. It should be large enough to span an entire repeat.

Most of the time you will want to only modify the -l and -ml parameters.
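
As a rough illustration of the rule of thumb above (start -l at the N50 of the self assembly), here is a minimal sketch that computes N50 from a FASTA file. It assumes a plain, uncompressed FASTA; the function names `read_contig_lengths` and `n50` are made up for this example and are not part of quickmerge.

```python
def read_contig_lengths(fasta_path):
    """Return the length of every sequence in a plain-text FASTA file."""
    lengths = []
    current = None  # None until the first header line is seen
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length of the contig at which the running total (longest first)
    reaches at least half of the total assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input
```

Calling `n50(read_contig_lengths("self_assembly.fasta"))` would give a starting value for -l; whether to lower it from there is the trade-off described above (more merging versus more risk of mis-joins).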

I also recommend reading through our paper, linked below:
http://nar.oxfordjournals.org/content/early/2016/07/25/nar.gkw654.full

If you still have questions after reading through the paper, please feel free to follow up with us again.

Thank you

@esolares
Collaborator

Also, to answer the first two questions:

The full run command is also listed in the readme. L and M are positive integer values:
quickmerge -d out.rq.delta -q hybrid_assembly.fasta -r self_assembly.fasta -hco 5.0 -c 1.5 -l $L -ml $M
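
For scripting, the command above can be assembled programmatically. This is only a sketch: the helper `build_quickmerge_command` is hypothetical, it uses just the flags that appear in this thread (-d, -q, -r, -hco, -c, -l, -ml, and the optional -p prefix from a later comment), and it enforces the "positive integer" requirement stated above.

```python
import shlex


def build_quickmerge_command(delta, hybrid_fasta, self_fasta,
                             length_cutoff, merging_length_cutoff,
                             hco=5.0, c=1.5, prefix=None):
    """Build the quickmerge argument list shown in this thread.

    length_cutoff maps to -l and merging_length_cutoff maps to -ml.
    """
    for name, value in (("-l", length_cutoff), ("-ml", merging_length_cutoff)):
        if not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    cmd = [
        "quickmerge",
        "-d", delta,
        "-q", hybrid_fasta,
        "-r", self_fasta,
        "-hco", str(hco),
        "-c", str(c),
        "-l", str(length_cutoff),
        "-ml", str(merging_length_cutoff),
    ]
    if prefix:
        cmd += ["-p", prefix]
    return cmd


def as_shell_string(cmd):
    """Quote the argument list so it can be pasted into a shell."""
    return " ".join(shlex.quote(part) for part in cmd)
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with long paths.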

The expected output is a fasta file named merge.fasta

There are also summary files:

aln_summary.tsv
anchor_summary.txt
summaryOut.txt

These contain summary information of alignments and overlaps.

@KevinMcKernan

I'm getting a merged fasta file, but all of the summary files are empty. I also don't see the -d out.JLXCBDrx.delta folder?

./quickmerge -d /home/tools/quickmerge/quickmerge/out.JLxCBDrx.delta -q /home/genome/V6_Paper_3.8Mb_181018/Jamaican_Lion_polished_V6_181018.merged.fa -r /home/genome/CBDrx/CBDrx/CBDrx_consensus_polished.contigs.fasta -hco 5.0 -c 1.5 -l 3800000 -ml 5000 -p JL_X_CBDrx

@KevinMcKernan

/home/tools/quickmerge/quickmerge$ ls
LICENSE aln_summary_JL_X_CBDrx.tsv merge_wrapper.py param_summary_JL_X_CBDrx.txt
MUMmer3.23 anchor_summary_JL_X_CBDrx.txt merged_JL_X_CBDrx.fasta quast_results
README.md make_merger.sh merger quickmerge
/home/tools/quickmerge/quickmerge$ more aln_summary_JL_X_CBDrx.tsv
REF QUERY REF-LEN Q-LEN REF-ST REF-END Q-ST Q-END
/home/tools/quickmerge/quickmerge$ more anchor_summary_JL_X_CBDrx.txt
REF_NAME Q_NAME REF_LENGTH Q_LENGTH REF-ST REF-END Q-ST Q-END
/home/tools/quickmerge/quickmerge$ more param_summary_JL_X_CBDrx.txt
REF QUERY REF_START REF_END Q_START Q_END ORIENTATION INNIE(1/0) OVERLAP_LEN OVERLAP_PROP NO_OVERLAP_AT_ENDS OVERHANG
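
The listing above shows summary files that contain a header line and nothing else. A quick way to check this across several files is sketched below; `count_data_rows` is a hypothetical helper, and it assumes each summary file has exactly one header line, as in the output shown here.

```python
def count_data_rows(summary_path):
    """Count non-blank lines after the header in a summary file.

    Returns 0 for a header-only (or completely empty) file, which is
    what the listing above shows.
    """
    with open(summary_path) as handle:
        non_blank = [line for line in handle if line.strip()]
    return max(len(non_blank) - 1, 0)


def header_only_files(paths):
    """Return the subset of paths that contain no data rows."""
    return [path for path in paths if count_data_rows(path) == 0]
```

If every summary file comes back header-only, no alignments reached the merging step, so the alignment input (the delta file) is the next thing to inspect.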

@mahulchak
Owner

mahulchak commented Dec 9, 2018 via email

@KevinMcKernan

KevinMcKernan commented Dec 9, 2018

nucmer doesn't fire on the command line. I have MUMmer installed and can fire it off with ./nucmer.
PATH problem?
/home/tools/quickmerge/quickmerge/MUMmer3.23$ ./nucmer

USAGE: nucmer [options]

Try './nucmer -h' for more information.
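
One way to test the PATH question is to ask Python where each tool resolves. This is a sketch using the standard-library `shutil.which`; the tool names are the ones mentioned in this thread, and `find_tools` is a hypothetical helper name.

```python
import shutil

# Executables named in this thread; adjust to whatever your pipeline calls.
TOOLS = ("nucmer", "delta-filter", "mummer", "mgaps")


def find_tools(names=TOOLS, path=None):
    """Map each tool name to the full path found on PATH, or None.

    `path` can be set to a custom search string (same format as the
    PATH environment variable) to test a specific install directory.
    """
    return {name: shutil.which(name, path=path) for name in names}


def missing_tools(names=TOOLS, path=None):
    """Return the tool names that could not be found."""
    return [name for name, found in find_tools(names, path).items() if found is None]
```

A tool that only runs as `./nucmer` from inside its own directory will show up as missing here. The resolved paths also reveal which copy is picked up when more than one MUMmer installation exists.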

@KevinMcKernan

nucmer is now running on the command line, but the results are the same as above. How do I test the delta-filter?

@KevinMcKernan

I tried using the python script and it may have provided more diagnostics.

/home/tools/quickmerge/quickmerge$ python merge_wrapper.py /home/genome/V6_Paper_3.8Mb_181018/Jamaican_Lion_polished_V6_181018.merged.fa /home/genome/CBDrx/CBDrx/CBDrx_consensus_polished.contigs.fasta
1: PREPARING DATA
2,3: RUNNING mummer AND CREATING CLUSTERS

reading input file "out.ntref" of length 746105410

construct suffix tree for sequence of length 746105410

(maximum reference length is 536870908)

(maximum query length is 4294967295)

process 7461054 characters per dot

/usr/bin/mummer: suffix tree construction failed: textlen=746105410 larger than maximal textlen=536870908
ERROR: mummer and/or mgaps returned non-zero
ERROR: Could not parse delta file, out.delta
error no: 400
Traceback (most recent call last):
  File "merge_wrapper.py", line 176, in <module>
    subprocess.call(mergercall)
  File "/usr/lib/python2.7/subprocess.py", line 172, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/usr/lib/python2.7/subprocess.py", line 394, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1047, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
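
The key line in the log above is the mummer message about `textlen` exceeding the maximal `textlen`. A small sketch for pulling those two numbers out of a captured log follows. The regular expression is written against the exact message shown here and may not match other MUMmer versions; `parse_textlen_error` is a hypothetical name.

```python
import re

# Matches the message format seen in the log above.
TEXTLEN_ERROR = re.compile(r"textlen=(\d+) larger than maximal textlen=(\d+)")


def parse_textlen_error(log_text):
    """Return (actual, limit, excess) if the suffix-tree size error is
    present in the log text, otherwise None."""
    match = TEXTLEN_ERROR.search(log_text)
    if match is None:
        return None
    actual = int(match.group(1))
    limit = int(match.group(2))
    return actual, limit, actual - limit
```

Applied to the log above, this reports that the reference input is about 209 Mbp over the limit reported by this particular mummer binary (/usr/bin/mummer), so the empty summaries come from the alignment step failing rather than from the -l or -ml settings.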

@mahulchak
Owner

mahulchak commented Dec 9, 2018 via email

@esolares
Collaborator

esolares commented Dec 9, 2018 via email

@KevinMcKernan

drwxrwxr-x 6 ubuntu ubuntu 4096 Dec 9 16:13 .
drwxrwxr-x 3 ubuntu ubuntu 4096 Dec 9 14:28 ..
drwxrwxr-x 8 ubuntu ubuntu 4096 Dec 9 14:28 .git
-rw-rw-r-- 1 ubuntu ubuntu 97 Dec 9 14:28 .quickmergerc
-rw-rw-r-- 1 ubuntu ubuntu 35142 Dec 9 14:28 LICENSE
drwxrwxr-x 6 ubuntu ubuntu 4096 Dec 9 14:28 MUMmer3.23
-rw-rw-r-- 1 ubuntu ubuntu 7125 Dec 9 14:28 README.md
-rw-rw-r-- 1 ubuntu ubuntu 50 Dec 9 16:13 aln_summary_out.tsv
-rw-rw-r-- 1 ubuntu ubuntu 62 Dec 9 16:13 anchor_summary_out.txt
-rw-rw-r-- 1 ubuntu ubuntu 1333462610 Dec 9 16:12 hybrid_oneline.fa
-rw-rw-r-- 1 ubuntu ubuntu 823 Dec 9 14:28 make_merger.sh
-rwxrwxr-x 1 ubuntu ubuntu 6434 Dec 9 14:28 merge_wrapper.py
-rw-rw-r-- 1 ubuntu ubuntu 1333462610 Dec 9 16:13 merged_out.fasta
drwxrwxr-x 2 ubuntu ubuntu 4096 Dec 9 14:28 merger
-rw-rw-r-- 1 ubuntu ubuntu 69 Dec 9 16:13 nucmer.error
-rw-rw-r-- 1 ubuntu ubuntu 1 Dec 9 16:13 out.mgaps
-rw-rw-r-- 1 ubuntu ubuntu 758540563 Dec 9 16:13 out.ntref
-rw-rw-r-- 1 ubuntu ubuntu 0 Dec 9 16:13 out.rq.delta
-rw-rw-r-- 1 ubuntu ubuntu 118 Dec 9 16:13 param_summary_out.txt
drwxrwxr-x 4 ubuntu ubuntu 4096 Dec 9 14:51 quast_results
lrwxrwxrwx 1 ubuntu ubuntu 17 Dec 9 14:28 quickmerge -> merger/quickmerge
-rw-rw-r-- 1 ubuntu ubuntu 746129254 Dec 9 16:13 self_oneline.fa

@KevinMcKernan

We do have another copy of MUMmer installed.

@esolares
Collaborator

esolares commented Dec 9, 2018 via email

@esolares
Collaborator

esolares commented Dec 9, 2018 via email

@KevinMcKernan

KevinMcKernan commented Dec 9, 2018

Thank you! It seems to be running now. How much compute is required for a 1 Gb x 1 Gb genome merge?

@esolares
Collaborator

esolares commented Dec 9, 2018 via email

@KevinMcKernan

I'll let it go overnight. The same box is assembling the organelle genomes with Canu.
If it's not done tomorrow AM, I'll fire it off on another box.

@esolares
Collaborator

esolares commented Dec 10, 2018 via email
