stability.contigs.groups not sorted #451

Closed
ghost opened this issue May 4, 2018 · 6 comments

Comments

@ghost

commented May 4, 2018

Hi,
I'm having issues with the make.contigs command.
With version 1.39.5, the outputs "stability.contigs.fasta" and "stability.contigs.groups" were both sorted in the same way, but with the current 1.40.2 version, "stability.contigs.groups" is left unsorted, which leads to an error when I run the pre.cluster command later. Here is my code:

mothur "#make.contigs(file=stability.files); summary.seqs()"
mothur "#screen.seqs(fasta=stability.trim.contigs.fasta, group=stability.contigs.groups, summary=stability.trim.contigs.summary, maxambig=0, maxlength=300); summary.seqs()"
mothur "#unique.seqs(fasta=stability.trim.contigs.good.fasta); count.seqs(); summary.seqs()"
mothur "#align.seqs(fasta=stability.trim.contigs.good.unique.fasta, template=$PATH/silva.seed_v132.align); filter.seqs(); unique.seqs()"
mothur "#pre.cluster(fasta=stability.trim.contigs.good.unique.filter.unique.fasta, count=ID.trim.contigs.good.count_table, diffs=2)"

And the final pre.cluster error:
Error reading fasta file...please correct.

This did not happen with version 1.39.5 and this same code. I have checked that every file used as input (stability.trim.contigs.good [...] .fasta and the group_tables) is perfectly sorted in the older version but not in the new one, except for "stability.contigs.fasta".

Any idea how to solve this issue?
Thanks a lot.

@mothur-westcott

Contributor

commented May 8, 2018

The group file is not required to be sorted. Could you post the logfile? I suspect there is a file mismatch happening.
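One way to test for the kind of file mismatch suspected here is to compare the sequence names in the fasta file against those in the count file. The following is only a minimal sketch in Python, not part of mothur; the function names are hypothetical, and it assumes that a sequence name is the first whitespace-delimited token after ">" in the fasta file and the first column of the count file, whose header row starts with "Representative_Sequence".

```python
def fasta_names(lines):
    """Collect sequence names from FASTA header lines.

    Assumption: the name is the first whitespace-delimited token after '>'.
    """
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def count_table_names(lines):
    """Collect sequence names from the first column of a count table.

    Assumption: the header row starts with 'Representative_Sequence'.
    Blank lines and lines starting with '#' are skipped defensively.
    """
    names = set()
    for line in lines:
        fields = line.split()
        if not fields or line.startswith("#"):
            continue
        if fields[0] == "Representative_Sequence":
            continue
        names.add(fields[0])
    return names


def compare_names(fasta_lines, count_lines):
    """Return (names only in the fasta, names only in the count table)."""
    in_fasta = fasta_names(fasta_lines)
    in_count = count_table_names(count_lines)
    return sorted(in_fasta - in_count), sorted(in_count - in_fasta)
```

To use it on real files, pass open file handles, for example `compare_names(open(fasta_path), open(count_path))`, where both paths are placeholders for your own files. Two empty lists mean the two files name the same set of sequences, regardless of their order.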

@ghost

Author

commented May 25, 2018

The logfile does not say much in this case:

Linux version

Using ReadLine

Using Boost

Running 64Bit Version

mothur v.1.40.2
Last updated: 05/25/2018
by
Patrick D. Schloss

Department of Microbiology & Immunology

University of Michigan
http://www.mothur.org

When using, please cite:
Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41.

Distributed under the GNU General Public License

Type 'help()' for information on the commands that are available

For questions and analysis support, please visit our forum at https://www.mothur.org/forum

Type 'quit()' to exit program

Script Mode


mothur > pre.cluster(fasta=$PATH/0.15.01610.011_R1_paired.trim.contigs.good.unique.filter.unique.fasta, count=$PATH/0.15.01610.011_R1_paired.trim.contigs.good.count_table, diffs=2, processors=12)

Using 12 processors.
When using running without group information mothur can only use 1 processor, continuing.
Error reading fasta file...please correct.

mothur > quit()

mothur > quit()

Hope this helps, thank you for your answer.

@mothur-westcott

Contributor

commented May 29, 2018

Does your count file have group data? Could you post your input files?

@ghost

Author

commented May 29, 2018

Sure. Here are the heads of the files:

The .fasta file:

>MSQ-M01442_200_000000000-ALPHL_1_1112_20005_7942
....G-A-T-A-T-G-T-G-C-CA-G-C-A-G-C--CGC--GGTAATACGT-AGGGG--GCGA-GCGTTGTCCGGAA-TGATTGGGCGT-AAA-GG-GC-GC-GTAG-GC-GG-C-CT-G-T-TAA--G-T-C-TG-G-AG-TG-AAA-GT-C-C-TG-TTTT-CAA-G-A-T-G-G-G-A-A-T-T-G--CTTTG-GATACT-GATG-GG-C-T-TGAGT--G-C-AGGA-GAGGT-TATC-GG-AATT-C-CC-G-GTGTAG-CGGTGAAATGCGTAGAGATCG-G-G-AG-G-AACACC--AG-T-G-GC-GAA-GGC-GG--G-T-A-A--CTGGACT--GCA-ACTGACGCTG-A-GGCGCGAAAGTG-TGGG-GAGCAAACAGGATTAG-ATA--CCC-G-T-GTA-GTCC...............................................................................................................................................................................................................................................................................................
>MSQ-M01442_200_000000000-ALPHL_1_2113_9482_23839
....T-T-G-G-A-G-T-G-C-CA-G-C-A-G-C--CGC--GGTAATACGT-AGGGG--GCGA-GCGTTGTCCGGAA-TGATTGGGCGT-AAA-GG-GC-GC-GTAG-GC-GG-C-CT-G-G-TAA--G-T-T-AG-G-AG-TG-AAA-GT-C-C-TG-TTTT-CAA-G-A-T-G-G-G-A-A-T-T-G--CTTTT-AATACT-GTCG-GG-C-T-GGAGT--A-C-AGGA-GAGGA-AAGC-GG-AATT-A-CC-G-GTGTAG-CGGTGAAATGCGTAGAGATCG-G-T-AG-G-AACACC--AG-T-G-GC-GAA-GGC-GG--C-T-T-T--CTGG-ACT-GAA-ACTGACGCTG-A-GGCGCGAAAGCG-TGGG-GAGCAAACAGGATTAG-ATA--CCC-T-G-GTA-GTCC...............................................................................................................................................................................................................................................................................................

The "trim.contigs.good.count_table" (the count file):

Representative_Sequence total
MSQ-M01442_200_000000000-ALPHL_1_2111_21155_7675        1
MSQ-M01442_200_000000000-ALPHL_1_2113_24641_21567       1
MSQ-M01442_200_000000000-ALPHL_1_2111_4031_7685 1
MSQ-M01442_200_000000000-ALPHL_1_2111_4331_7691 1
MSQ-M01442_200_000000000-ALPHL_1_2104_25797_8695        1
MSQ-M01442_200_000000000-ALPHL_1_2111_23476_7701        1
MSQ-M01442_200_000000000-ALPHL_1_2102_14290_21355       2
MSQ-M01442_200_000000000-ALPHL_1_1105_18361_13834       1
MSQ-M01442_200_000000000-ALPHL_1_2108_5015_13418        1

I assume that the 2nd column has the group data.

@mothur-westcott

Contributor

commented May 29, 2018

The second column is the total count, not the group info. The count file, https://www.mothur.org/wiki/Count_File, can also have group breakdowns of the total.

`
Representative_Sequence total F003D000 F003D002 F003D004 F003D006 F003D008 F003D142 F003D144 F003D146 F003D148 F003D150
GQY1XT001CFHYQ          467   325      40       22       30       24       6        7        3        7        3
GQY1XT001C44N8          3677  323      132      328      318      232      579      448      426      381      510
GQY1XT001C296C          4652  356      877      754      794      284      538      361      313      0        375
GQY1XT001ARCB1          2202  203      391      220      155      308      126      33       191      289      286
`

From this you can see that GQY1XT001CFHYQ represents 467 reads: 325 belong to sample F003D000, 40 to sample F003D002, etc.
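To tell whether a given count file carries group data, you can look at its header: any column after "total" is a group. Below is a minimal sketch in Python, not mothur code; the function names are hypothetical, and it assumes a whitespace-delimited table whose first row is the header, laid out as in the example above. It also checks that each total equals the sum of its per-group counts.

```python
def parse_count_table(lines):
    """Parse a whitespace-delimited count table.

    Returns (groups, rows): groups is the list of column names after
    'total' (empty when the file has no group data), and rows maps each
    sequence name to its list of integer counts, total first.
    """
    header = None
    rows = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if header is None:
            header = fields  # first non-blank line is the header
            continue
        rows[fields[0]] = [int(value) for value in fields[1:]]
    groups = header[2:] if header else []
    return groups, rows


def inconsistent_totals(groups, rows):
    """List sequences whose total differs from the sum of group counts."""
    if not groups:
        return []  # nothing to compare without group columns
    return [name for name, counts in rows.items()
            if counts[0] != sum(counts[1:])]


# The example table from this comment.
EXAMPLE = """\
Representative_Sequence total F003D000 F003D002 F003D004 F003D006 F003D008 F003D142 F003D144 F003D146 F003D148 F003D150
GQY1XT001CFHYQ 467 325 40 22 30 24 6 7 3 7 3
GQY1XT001C44N8 3677 323 132 328 318 232 579 448 426 381 510
GQY1XT001C296C 4652 356 877 754 794 284 538 361 313 0 375
GQY1XT001ARCB1 2202 203 391 220 155 308 126 33 191 289 286
""".splitlines()

groups, rows = parse_count_table(EXAMPLE)
```

For the example table, `groups` holds the ten sample names and `inconsistent_totals(groups, rows)` is empty. For a file with only the "Representative_Sequence" and "total" columns, like the one posted earlier in this thread, `groups` comes back empty, which means there is no group data.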

Mothur should still be able to process the dataset as a single group. I will fix that issue and put up a new version in the next day or two.

@mothur-westcott

Contributor

commented May 31, 2018

Can you give our latest version a try, https://github.com/mothur/mothur/releases/tag/v1.40.4?
