Issue in identifying protein from the peptide sequence #39

Open
zrpeak opened this Issue Dec 10, 2018 · 14 comments

zrpeak commented Dec 10, 2018

Dear developers,

I found some errors in the peptide list of the results. The program may assign a peptide to proteins that cannot produce that peptide by enzymatic cleavage.

For example, in my results the peptide "GCLLYK" is assigned to proteins Q9CY00, Q6P1C1, Q91Z96, Q3UHJ0, Q99J38 and A2AKQ8. However, only in Q6P1C1 is there a lysine before the peptide (...LPKGCLLYK...; trypsin was used in the experiment). In the other proteins, the amino acid before the peptide is neither lysine nor arginine (e.g. in Q9CY00 it is ...VNMGCLLYK...). Is this an error in the program, or does the default setting treat the peptide as a product of nonspecific cleavage?

Many thanks.
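
The check described in the report above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `tryptic_contexts` is hypothetical, the inputs are the short fragments quoted in the report rather than full protein sequences, and the rule is reduced to "the residue before the peptide must be K or R (or the peptide starts the protein)". It ignores the proline rule and missed cleavages, and it says nothing about how pLink itself infers proteins.

```python
def tryptic_contexts(protein: str, peptide: str):
    """List (start, preceding_residue, n_term_is_tryptic) for each occurrence."""
    results = []
    start = protein.find(peptide)
    while start != -1:
        prev = protein[start - 1] if start > 0 else None
        # The N-terminal side is consistent with trypsin if the peptide
        # starts the protein or directly follows K or R.
        results.append((start, prev, prev is None or prev in "KR"))
        start = protein.find(peptide, start + 1)
    return results


# Fragments quoted in the report, not full sequences.
fragments = {"Q6P1C1": "LPKGCLLYK", "Q9CY00": "VNMGCLLYK"}
for accession, fragment in fragments.items():
    print(accession, tryptic_contexts(fragment, "GCLLYK"))
```

For the two quoted fragments, only the Q6P1C1 context has K before the peptide; the Q9CY00 context has M.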

Contributor

01joy commented Dec 13, 2018

Dear @zrpeak ,

Thanks for your feedback.

The default setting does not include nonspecific cleavage peptides, so this might be an error in protein inference.

Could you provide me with your fasta, raw data, and parameter file, so that we can debug it?

zrpeak commented Dec 13, 2018

Dear @01joy ,

Thank you for the response.

The fasta file and raw data are in the attached file. I put one spectrum in the MGF file and only the inferred proteins in the fasta file because the original data is too big. The inference error can still be reproduced. All parameters are kept at their defaults, except that a hydroxymethyl modification on cysteine is searched. The modification is the same as Hydroxymethyl[N] in the preinstalled modifications, with only the residue changed to C.

InferError.zip

Contributor

01joy commented Dec 13, 2018

Dear @zrpeak ,

I have received your data. Did you add Hydroxymethyl[C] yourself?

Is Hydroxymethyl[C] set as variable modification or fixed modification?

What's the cross-linker, BS3?

zrpeak commented Dec 13, 2018

Dear @01joy ,

Yes, I added Hydroxymethyl[C] myself and searched it as a variable modification. This modification is actually the mono-linked form of my cross-linker, and I treated it as a modification so that I could search for mono-linked peptides first and get some results before the search for cross-linked peptides finished (it's much slower). In this case no cross-linker was added to the search. Sorry for the confusion.

P.S. I have tried setting Hydroxymethyl[C] both as a modification and as the mono-linked form of the cross-linker, and the results were the same. Both reported proteins that should not be identified.

Contributor

01joy commented Dec 14, 2018

Dear @zrpeak ,

I searched your data as you described, but nothing was identified for this spectrum.

Please find the results attached and help me check the parameters.

pLink_task_2018.12.14.09.46.11.zip

zrpeak commented Dec 14, 2018

Dear @01joy ,

I used your parameter file to repeat the search and was able to find the peptide. Maybe there's some difference in the modification settings?
Here is the text in my modification ini file:
name1611=Hydroxymethyl[C] 0
Hydroxymethyl[C]=C NORMAL 30.010565 30.010565 0 H(2)C(1)O(1)
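
As a sanity check on the modification line above, the mass can be recomputed from the composition string. The sketch below is an assumption-laden illustration: the helper name `composition_mass` is hypothetical, the `Element(count)` parsing is inferred only from the string `H(2)C(1)O(1)` shown here, and the element masses are standard monoisotopic values. It is not a description of how pLink reads its ini files.

```python
import re

# Standard monoisotopic masses (Da) for the elements needed here.
MONOISOTOPIC = {
    "H": 1.00782503207,
    "C": 12.0,
    "N": 14.0030740048,
    "O": 15.99491461956,
}


def composition_mass(formula: str) -> float:
    """Sum monoisotopic masses for a composition written like 'H(2)C(1)O(1)'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)\((-?\d+)\)", formula):
        total += MONOISOTOPIC[element] * int(count)
    return total


mass = composition_mass("H(2)C(1)O(1)")
# Compare against the value written in the ini line.
print(f"{mass:.6f}", abs(mass - 30.010565) < 1e-5)
```

The computed value agrees with the 30.010565 in the ini line to within rounding, so the mass and composition fields are at least consistent with each other.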

Contributor

01joy commented Dec 14, 2018

Dear @zrpeak ,

Thanks for your feedback, I have reproduced your problem.

We will fix this bug in the next version.

Contributor

01joy commented Dec 28, 2018

Dear @zrpeak ,

We have fixed this bug; attached are the new search results.

The new version will be released on New Year's Day 2019.
pLink_task_2018.12.28.21.10.15.zip

zrpeak commented Dec 28, 2018

Dear @01joy ,

Thank you for the new year present:)

By the way, there is a bug when pLabel reads a large MGF file (maybe >1 GB?). It can read the spectra in the head of the file, but when I select spectra in the tail, a "runtime error" window pops up. I found this bug when I tried to open the results from an MGF file with ~600000 spectra, and the error window appeared after the ~280000th spectrum. I think the bug is due to the size of the file rather than the number of spectra, as there's no problem with a test MGF file with ~600000 spectra (each spectrum's content is one line, so the file is only ~70 MB).

Contributor

01joy commented Dec 29, 2018

Dear @zrpeak ,

The new version, pLink2.3.5, has been released; please download it at http://pfind.ict.ac.cn/software/pLink/index.html. pLink2.3.5 fixes the protein inference bug.

pLabel is a 32-bit program, so it can use at most 4 GB of memory, usually 2 GB. The runtime error is probably due to the large size of your MGF. We will consider fixing this bug in the next version.

A typical MGF I have seen contains about 50000 spectra. What instrument did you use?

Thanks for your feedback.

01joy added the bug label Dec 29, 2018

zrpeak commented Dec 30, 2018

Dear @01joy ,

The instrument is an AB Sciex TripleTOF 5600.
I tried a file with fewer spectra and the bug can still be reproduced.
The file size is 2.66 GB. It has ~109000 MS2 spectra and ~390000 exported precursors.
Another file, 1.47 GB in size with ~107000 MS2 spectra and ~370000 exported precursors, has no problem.
Windows Task Manager shows that the program didn't request much memory (only ~60 MB when opening the file and ~300 MB when opening the spectra list).
My guess is that pLabel reads only the Title part of the MGF file into memory and locates a spectrum in the file by its title when it is chosen. Perhaps the address cannot be located when the file is too large?
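
The guess above can be illustrated with a small title-to-offset index. This is a hypothetical sketch, not pLabel's actual implementation, which is not shown in this thread. The names `index_mgf`, `read_spectrum` and `fits_int32` are made up for illustration, and the signed 32-bit offset limit is an assumption that merely happens to be consistent with the 2.66 GB file failing while the 1.47 GB file works.

```python
import io

INT32_MAX = 2**31 - 1  # largest offset a signed 32-bit integer can hold


def index_mgf(handle):
    """Map each TITLE to the byte offset of its BEGIN IONS line."""
    index, begin = {}, None
    while True:
        pos = handle.tell()
        line = handle.readline()
        if not line:
            break
        text = line.strip()
        if text == b"BEGIN IONS":
            begin = pos
        elif text.startswith(b"TITLE=") and begin is not None:
            index[text[len(b"TITLE="):].decode()] = begin
    return index


def read_spectrum(handle, offset):
    """Seek to a stored offset and return the lines of one spectrum."""
    handle.seek(offset)
    lines = []
    for line in iter(handle.readline, b""):
        lines.append(line.decode().rstrip("\r\n"))
        if line.strip() == b"END IONS":
            break
    return lines


def fits_int32(offset: int) -> bool:
    """True if the offset could be stored in a signed 32-bit integer."""
    return 0 <= offset <= INT32_MAX


# Tiny in-memory MGF standing in for a real file opened in binary mode.
data = (b"BEGIN IONS\nTITLE=a\n100.0 1.0\nEND IONS\n"
        b"BEGIN IONS\nTITLE=b\n200.0 2.0\nEND IONS\n")
handle = io.BytesIO(data)
index = index_mgf(handle)
print(read_spectrum(handle, index["b"]))
# Offsets near the end of each reported file size (decimal GB assumed).
print(fits_int32(int(1.47e9)), fits_int32(int(2.66e9)))
```

Only the titles and offsets are held in memory in this scheme, which would match the low memory use seen in Task Manager. Any spectrum stored past the 2^31-byte mark would be unreachable if the offsets were kept in a signed 32-bit field.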

zrpeak commented Jan 2, 2019

Update: in pLabel from pLink2.3.5, the 2.66 GB file cannot be opened; a pop-up window says "Can not open file successfully! Make sure there is a spectrum in file."

Contributor

01joy commented Jan 2, 2019

Dear @zrpeak ,

I'm sorry that we only fixed the protein inference bug; there was not enough time to fix the pLabel bug for pLink2.3.5, which was released on New Year's Day 2019.

We will consider fixing the pLabel bug in the next version.

Thanks for your feedback.

zrpeak commented Jan 3, 2019

Dear @01joy ,

I wrote the update only because the error message is different from the one in the 2.3.4 version and may help locate the bug. Sorry if it sounded like pressure for an update. I can still read the results by opening the MGF file in a text editor and copying only the labeled spectra into a smaller file.
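
The manual workaround above (copying only the labeled spectra into a smaller MGF) could also be scripted. The following is a minimal sketch under stated assumptions: `filter_mgf` is a hypothetical helper, spectra are assumed to be delimited by `BEGIN IONS` / `END IONS` with a `TITLE=` line inside, and the titles to keep would have to be collected from the search results by the user.

```python
import io


def filter_mgf(src, dst, wanted_titles):
    """Copy spectra whose TITLE is in wanted_titles; return how many were copied."""
    wanted = set(wanted_titles)
    block, keep, in_block, copied = [], False, False, 0
    for line in src:  # streams line by line, so large files are fine
        stripped = line.strip()
        if stripped == "BEGIN IONS":
            block, keep, in_block = [line], False, True
            continue
        if not in_block:
            continue  # skip anything outside a spectrum block
        block.append(line)
        if stripped.startswith("TITLE="):
            keep = stripped[len("TITLE="):] in wanted
        elif stripped == "END IONS":
            if keep:
                dst.writelines(block)
                copied += 1
            in_block = False
    return copied


# In-memory demonstration; real use would pass open text-mode file objects.
source = io.StringIO(
    "BEGIN IONS\nTITLE=a\n100.0 1.0\nEND IONS\n"
    "BEGIN IONS\nTITLE=b\n200.0 2.0\nEND IONS\n"
)
target = io.StringIO()
print(filter_mgf(source, target, {"b"}))
print(target.getvalue())
```

Because the input is read one line at a time and only one spectrum is buffered, the script's memory use does not depend on the size of the original MGF.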
