gzip and core dumping #39
Hi Pavel,
There are likely two separate problems with the input. Something is causing
gzip to report an unexpected end of file. This could be a hidden character
or some other error introduced when the files were first created; I'm not
sure which. rainbow is giving you an error because these reads appear to
have been trimmed and are not of uniform length, which both dDocent and
rainbow require for assembly.
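
For the first point, one way to confirm whether the archives themselves are damaged is to decompress each one to the end and see whether an error is raised. The following is a minimal Python sketch, not part of dDocent; the helper name `gzip_is_complete` is hypothetical, and it only reports that a file is unreadable, not why.

```python
import gzip
import zlib


def gzip_is_complete(path, chunk_size=1 << 20):
    """Return True if the gzip file at `path` decompresses cleanly to its end.

    Hypothetical helper for illustration. A truncated archive raises EOFError
    while reading; other corruption surfaces as OSError or zlib.error.
    Note that a missing file also raises OSError and is reported as False.
    """
    try:
        with gzip.open(path, "rb") as handle:
            # Read and discard the data in chunks so large files fit in memory.
            while handle.read(chunk_size):
                pass
    except (EOFError, OSError, zlib.error):
        return False
    return True
```

Calling `gzip_is_complete("FRS_001.F.fq.gz")` on each input file would show whether the archive can be read through to its end. If it returns False, re-compressing from the original uncompressed reads is the likely next step.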
Hope that helps,
Jon
--
Jon Puritz, PhD
Assistant Professor
Department of Biological Sciences
University of Rhode Island
120 Flagg Road, Kingston, RI 02881
Webpage: MarineEvoEco.com
Email:
jpuritz@gmail.com
Cell: 401-338-8739
Work: 401-874-9020
"The most valuable of all talents is that of never using two words when one
will do.” -Thomas Jefferson
On July 14, 2018 at 10:41:49 PM, Pavel V. Dimens (notifications@github.com) wrote:
I needed to run some older Illumina reads through dDocent, and I am getting
these two specific errors that are preventing the pipeline from completing:
1. The first appears when trimming starts:
Trimming reads and simultaneously assembling reference sequences
Removing the _1 character and replacing with /1 in the name of every sequence
gzip: FRS_001.F.fq.gz: unexpected end of file
gzip: FRS_002.F.fq.gz: unexpected end of file
gzip: FRS_003.F.fq.gz: unexpected end of file
gzip: FRS_004.F.fq.gz: unexpected end of file
It's unclear why this is happening, as the files were compressed with gzip
using the default settings. The trim logs seem to indicate that trimming
proceeds to completion.
2. This may be related to the first issue: after trimming completes and the
assembly parameters are entered at the gnuplot prompts, this rainbow error
appears and shuts the whole process down:
Now sit back, relax, and wait for your analysis to finish
/home/saillantslab/miniconda3/envs/ddocent/bin/dDocent: line 841: 1269 Segmentation fault (core dumped) rainbow div -i rcluster -o rbdiv.out -f 0.5 -K 10
/home/saillantslab/miniconda3/envs/ddocent/bin/dDocent: line 882: / 100 + 1: syntax error: operand expected (error token is "/ 100 + 1")
After the process shuts down, here is what's left in the working directory,
with file sizes in bytes on the left:
4096 unpaired
0 trim.log
1027337 rcluster.gz
1040 assemble.trim.log
109509 uniq.F.fasta.gz
1104453 totaluniqseq.gz
1112 cdhit.log
1255 FRS_003.trim.log
1256 FRS_001.trim.log
1256 FRS_002.trim.log
1256 FRS_004.trim.log
1264501 uniq.fasta.gz
136 sort.contig.cluster.ids
164 uniqseq.data
215 xxx
24 uniqseq.peri.data
30 rbdiv.out.gz
324 lengths.txt
32 namelist
4373229 contig.cluster.totaluniqseq
43948224 FRS_003.uniq.seqs
4415013 uniq.k.4.c.2.seqs
44420402 FRS_002.uniq.seqs
46093742 FRS_001.uniq.seqs
475 dDocent.runs
50114803 FRS_004.uniq.seqs
50720656 FRS_003.R2.fq.gz
50840802 uniq.seqs.gz
51329776 FRS_003.R.fq.gz
51936612 FRS_002.R2.fq.gz
5234827 uniq.full.fasta
52537196 FRS_002.R.fq.gz
58284180 FRS_001.R2.fq.gz
58939121 FRS_001.R.fq.gz
60670176 FRS_003.R1.fq.gz
61136939 FRS_003.F.fq.gz
62141619 FRS_002.R1.fq.gz
62677396 FRS_002.F.fq.gz
63877600 FRS_004.R2.fq.gz
64686395 FRS_004.R.fq.gz
69506759 FRS_001.R1.fq.gz
70096018 FRS_001.F.fq.gz
76271641 FRS_004.R1.fq.gz
76923721 FRS_004.F.fq.gz
890 xxx.clstr
9863900 uniqCperindv
9939 dDocent_main.LOG
To be honest, this is old data that someone else preprocessed somewhat
(years ago) to remove UMI elements, so it's unclear to me whether the issue
is with the software or with the inputs I am giving dDocent. For the sake
of being thorough, here is what the input FASTQ files look like:
@cluster_1562 CATCTCCT
ATGAAGGGAACTACATTTCCCATATTTCATGAAAAGAGTGGGTGAGCATGATGTTTTCACACCAACTTTCAGGTGTCGTTC
+
?BB@?A:3@??A=>@eeb?<??=>EEBA@BB@DC===AAB;@ab=B:7B@<9@DEEBA636?:A=>DEBA:A@9@A=0EB?
@cluster_1579 GATATGGT
TTGCGAAGCATCTAGTATTGTCACACTCCGTTACTCAACACTATGTATGATGCGCTTTTCTGTGATATCTCGTGGTACTCTTTTT
+
A<->:03<,4.5@1692=2/119>:=400/4B0/592<4=.26244.55/1)/B74;3;?@4:1/1-94D.<A6/;3244BDD@C
Any insight as to where the issue stems from would be appreciated. Thank
you!
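
Because the reply above points to non-uniform read lengths as the likely cause of the rainbow failure, tallying the sequence lengths in each input file is a quick way to check. This is a rough Python sketch that assumes standard four-line FASTQ records like the ones shown above; the function name `read_length_counts` is hypothetical and not part of dDocent or rainbow.

```python
import gzip
from collections import Counter


def read_length_counts(path):
    """Tally sequence lengths in a FASTQ file (plain or gzip-compressed).

    Hypothetical helper for illustration. Assumes four-line records:
    header, sequence, '+', quality.
    """
    opener = gzip.open if path.endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # The second line of every four-line record holds the sequence.
            if line_number % 4 == 1:
                counts[len(line.rstrip("\r\n"))] += 1
    return counts
```

If the returned counter has more than one key, the file contains reads of differing lengths. A truncated archive will still raise an error partway through, so the gzip problem needs to be resolved first.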