iu-merge-pairs error #3

Closed
nikolay12 opened this issue Sep 10, 2016 · 3 comments

nikolay12 commented Sep 10, 2016

Here is my case: my V1-V3 data were sequenced by an external provider, who say they used the Illumina CASAVA pipeline, version 1.8.3.

To start, I tried to generate a config file by following the steps listed at https://github.com/meren/illumina-utils:

I first generated a TAB-delimited file listing all the sample names and their corresponding paired-end FASTQ files. Then I ran iu-gen-configs and, to my surprise, instead of a single config file it generated one config file per sample.

Then I decided to merge the paired-end FASTQ files for each sample with iu-merge-pairs, using the --compute-qual-dicts option. When I ran it on the first sample, it produced the following error:

Error: Your input FASTQ files do not seem to be generated by CASAVA 1.8. Please use --ignore-deflines parameter.

I added the parameter as requested, and then I got another error message:

$ iu-merge-pairs --compute-qual-dicts --ignore-deflines 16001_posD09_CCTAAGACACTGCATA.ini
Traceback (most recent call last):
  File "/usr/local/bin/iu-merge-pairs", line 770, in <module>
    sys.exit(merger.run())
  File "/usr/local/bin/iu-merge-pairs", line 398, in run
    tile_number = self.input_1.entry.tile_number
  File "/Library/Python/2.7/site-packages/IlluminaUtils/lib/fastqlib.py", line 82, in __getattr__
    return getattr(self, '_'.join(['process', key]))()
(...)
  File "/Library/Python/2.7/site-packages/IlluminaUtils/lib/fastqlib.py", line 82, in __getattr__
    return getattr(self, '_'.join(['process', key]))()
  File "/Library/Python/2.7/site-packages/IlluminaUtils/lib/fastqlib.py", line 73, in __getattr__
    if key in ['__str__']: 
RuntimeError: maximum recursion depth exceeded in cmp
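The recursion in the traceback above is the kind of loop a lazy `__getattr__` dispatch can fall into. Below is a minimal, hypothetical sketch (not the illumina-utils code; `LazyEntry`, `GuardedEntry`, `fields`, and `parse_deflines` are invented for illustration) that mirrors the `getattr(self, '_'.join(['process', key]))()` line: when the header was never parsed, `process_tile_number` reads a missing attribute, which routes back into `__getattr__` as `process_fields`, then `process_process_fields`, and so on until Python gives up. It assumes Python 3.5+, where the error is `RecursionError` (a subclass of the `RuntimeError` seen on Python 2.7).

```python
class LazyEntry:
    """Hypothetical FASTQ entry that computes header fields lazily."""

    def __init__(self, header, parse_deflines=True):
        self.header = header
        if parse_deflines:
            # Only populated when the header is parsed as a CASAVA 1.8 defline.
            self.fields = header[1:].split(' ')[0].split(':')

    def __getattr__(self, key):
        # Called only when normal attribute lookup fails; dispatches
        # e.g. `tile_number` to `process_tile_number()`.
        return getattr(self, '_'.join(['process', key]))()

    def process_tile_number(self):
        # Needs `fields`; if it was never set, the lookup re-enters __getattr__.
        return int(self.fields[4])


class GuardedEntry(LazyEntry):
    """Same dispatch, but a missing process_* method fails fast."""

    def __getattr__(self, key):
        if key.startswith('process_'):
            raise AttributeError(key)
        return super().__getattr__(key)


ok = LazyEntry('@D4ZHLFP1:36:C10H4ACXX:8:2203:21201:39665 1:N:0:CCAT')
print(ok.tile_number)  # 2203

bad = LazyEntry('@HWI-M04481:31:000000000-ARVP8:1:1101:7941:1899/1',
                parse_deflines=False)
try:
    bad.tile_number
except RecursionError:
    print('maximum recursion depth exceeded')
```

With the guard, the same access raises a plain `AttributeError: process_fields` instead of exhausting the stack, which makes the real cause (unparsed deflines) much easier to spot.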

I'm not sure what to do next. Could you advise?

meren (Member) commented Sep 10, 2016

Hi,

Can you please provide some example files? The first 1,000 lines of the R1 and R2 reads for one of your samples would be ideal (e.g., example-R1.fastq and example-R2.fastq).
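Cutting such an excerpt can be sketched as follows, assuming the reads sit in gzipped or plain FASTQ files; `head_fastq` and the input file name in the usage line are hypothetical. Since every FASTQ record spans four lines, 1,000 lines correspond to 250 reads, so the excerpt never ends mid-record.

```python
import gzip
from itertools import islice


def head_fastq(src, dst, n_lines=1000):
    """Copy the first n_lines lines of a FASTQ file (gzipped or plain) to dst."""
    # Pick a text-mode opener based on the file extension.
    opener = gzip.open if src.endswith('.gz') else open
    with opener(src, 'rt') as fin, open(dst, 'w') as fout:
        # islice stops after n_lines without reading the rest of the file.
        fout.writelines(islice(fin, n_lines))


# Hypothetical input name; run once for R1 and once for R2.
# head_fastq('sample_R1.fastq.gz', 'example-R1.fastq')
```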

Thanks,

nikolay12 (Author) replied

Thanks for your quick reply. I'm attaching the first 1,000 lines of the R1 and R2 files:
16001_posD09_CCTAAGACACTGCATA_R1_1000.fastq.gz
16001_posD09_CCTAAGACACTGCATA_R2_1000.fastq.gz

meren (Member) commented Sep 11, 2016

Hi,

I found what causes the error, and I will add a check for it in the next version when I have a chance. Clearly, using the --compute-qual-dicts flag together with --ignore-deflines should not be allowed :( So your short-term solution is to drop --compute-qual-dicts. Sorry about that.
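One way such a check could look is a mutually exclusive `argparse` group, sketched below. Only the two flag names come from this thread; `build_parser`, the program name, and the `config` positional are hypothetical, and this is not the actual iu-merge-pairs parser or the fix that was eventually shipped.

```python
import argparse


def build_parser():
    """Hypothetical parser that refuses the two flags together."""
    parser = argparse.ArgumentParser(prog='iu-merge-pairs-sketch')
    parser.add_argument('config', help='per-sample .ini file')
    # argparse exits with status 2 if both members of the group are given.
    group = parser.add_mutually_exclusive_group()
    group.add_argument('--compute-qual-dicts', action='store_true')
    group.add_argument('--ignore-deflines', action='store_true')
    return parser


parser = build_parser()
args = parser.parse_args(['16001_posD09_CCTAAGACACTGCATA.ini', '--ignore-deflines'])
```

Rejecting the combination at parse time turns the deep recursion into an immediate, readable usage error before any reads are processed.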

You are forced to use --ignore-deflines with these files because their headers are not what we expect from CASAVA 1.8+. This is what a header should look like:

@D4ZHLFP1:36:C10H4ACXX:8:2203:21201:39665 1:N:0:CCAT

and this is what yours look like:

@HWI-M04481:31:000000000-ARVP8:1:1101:7941:1899/1

If you look at the specification on Illumina's support site, you will see that the format described there matches the first one:
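The difference between the two headers can be checked mechanically. The sketch below assumes the CASAVA 1.8 layout `@<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>`; the regular expression and `looks_like_casava_18` are hypothetical and deliberately loose, not what illumina-utils uses internally. The provider's header fails because it ends in `/1` rather than a space followed by the `1:N:0:...` block.

```python
import re

# Hypothetical, loose pattern for a CASAVA 1.8+ FASTQ header line.
CASAVA_18_HEADER = re.compile(
    r'^@[^\s:]+:\d+:[^\s:]+:\d+:\d+:\d+:\d+'  # instrument:run:flowcell:lane:tile:x:y
    r' \d:[YN]:\d+:\S*$'                      # read:is_filtered:control:index
)


def looks_like_casava_18(header):
    """Return True if a header line matches the CASAVA 1.8+ layout above."""
    return CASAVA_18_HEADER.match(header.rstrip('\n')) is not None


print(looks_like_casava_18('@D4ZHLFP1:36:C10H4ACXX:8:2203:21201:39665 1:N:0:CCAT'))  # True
print(looks_like_casava_18('@HWI-M04481:31:000000000-ARVP8:1:1101:7941:1899/1'))     # False
```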

http://support.illumina.com/help/SequencingAnalysisWorkflow/Content/Vault/Informatics/Sequencing_Analysis/CASAVA/swSEQ_mCA_FASTQFiles.htm

Best,
