Protein FDR calculation #1

ohickl · 2019-07-05T09:13:12Z

Hi,
I am a bit confused by the values calculated for the final protein report. With my data, for example, it looks like this:

Numbers of proteins before filtering
Decoy_Proteins_Before_Filtering = 241
Target_Proteins_Before_Filtering = 37016
Numbers of proteins after filtering
Decoy_Proteins_After_Filtering = 60
Target_Proteins_After_Filtering = 12462
Protein FDR = Decoy_Proteins_After_Filtering / Target_Proteins_After_Filtering
Protein_FDR = 0.96%

The ~12500 proteins, including the 60 decoys, are reported afterwards. But how does it end up with a 0.96% decoy FDR? If it only found 241 decoys among almost 40k proteins before filtering, it was already way below 1%. Or am I missing something?

Also, I get the following error when trying to produce a pepXML file:

python2.7 /opt/sipros/Scripts/sipros_psm_tabulating.py -i /scratch/maxquant/OH/Sipros/method_test/markert_strap_brp_01/output -o /scratch/maxquant/OH/Sipros/method_test/markert_strap_brp_01/output -c /scratch/maxquant/OH/Sipros/method_test/markert_strap_brp_01/20190703_method_test.cfg -x
[Fri Jul 5 11:11:30 2019] Beginning Sipros Ensemble Tabulating (1.0.1 (Alpha))
[Step 1] Parse options and get config file: Running -> Done!
[Step 2] Generate PSM table: Running -> Done!
[Step 3] Merge Protein list: Running -> Done!
[Step 4] Generate Pepxml: Running -> Traceback (most recent call last):
  File "/opt/sipros/Scripts/sipros_psm_tabulating.py", line 662, in <module>
    sys.exit(main())
  File "/opt/sipros/Scripts/sipros_psm_tabulating.py", line 647, in main
    writePepxml(base_out + '.tab', config_dict, modification_dict, element_modification_list_dict, output_folder)
  File "/opt/sipros/Scripts/sipros_psm_tabulating.py", line 406, in writePepxml
    psm_obj.score_process()
  File "/opt/sipros/Scripts/sipros_psm_tabulating.py", line 348, in score_process
    diff = (pep.scorelist[idx1]/l1[0].scorelist[idx1]) - 1
ZeroDivisionError: float division by zero
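
The failing line divides a candidate's score by the corresponding score of the first entry in the list, so a zero in that position raises the exception. Below is a minimal sketch of a guard, not the maintainers' fix: the helper name `relative_score_diff` is hypothetical, and returning `None` for a zero reference score is an assumption about how such a case could be flagged.

```python
from typing import Optional


def relative_score_diff(score: float, reference_score: float) -> Optional[float]:
    """Return score / reference_score - 1, or None when the reference is zero.

    Hypothetical helper: mirrors the arithmetic of the failing line, but
    leaves the decision about zero reference scores to the caller.
    """
    if reference_score == 0:
        return None  # assumption: caller skips or special-cases this PSM
    return score / reference_score - 1


print(relative_score_diff(3.0, 4.0))  # ordinary case
print(relative_score_diff(5.0, 0.0))  # zero reference score, no exception
```

Whether skipping such a PSM or substituting a default value is appropriate depends on how the pepXML writer uses the difference, which the traceback alone does not show.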

Also, are you still actively working on Sipros Ensemble?

Love Sipros Ensemble and the results so far!

Cheers

Oskar

guo-xuan · 2019-07-27T20:18:13Z

Hi Oskar,

Thank you for your questions. The FDR is 0.96% because we use half of the decoy PSMs to train a machine learning model, so the estimated number of decoy proteins has to be doubled, i.e., FDR = 60*2/12462. There are a few other parameters for protein filtering, such as the minimum number of required unique peptides. Some of these 37016 proteins may be supported only by shared peptides, so they get grouped together and are counted just once.

I hope this helps, and I am happy to answer any further questions.

Bests,
Xuan
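
The arithmetic above can be checked with a short sketch. The function name `protein_fdr` and the `decoy_scale` parameter are illustrative assumptions, not part of the Sipros Ensemble scripts; the factor of 2 reflects the statement that half of the decoy PSMs are held out for training.

```python
def protein_fdr(decoys: int, targets: int, decoy_scale: float = 2.0) -> float:
    """Estimate protein FDR as decoy_scale * decoys / targets.

    decoy_scale = 2.0 compensates for half of the decoys being used
    for model training (hypothetical parameter name).
    """
    if targets <= 0:
        raise ValueError("targets must be positive")
    return decoy_scale * decoys / targets


# Counts after filtering, taken from the report in this thread.
print(f"doubled decoys: {protein_fdr(60, 12462):.2%}")       # 0.96%
print(f"naive ratio:    {protein_fdr(60, 12462, 1.0):.2%}")  # 0.48%
```

The naive ratio of 60/12462 gives 0.48%, which is why the reported 0.96% only makes sense once the decoy count is doubled.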

ohickl · 2019-07-31T07:29:59Z

Hi Xuan,

got it. Thanks!
Do you plan on implementing protein-level FDR filtering? I think I read something about it in the readme or the publication. I tried it by setting FDR_Filtering = Protein in the config file, but it still seems to filter on 1% peptide FDR.
I would like to do that because I tend to get a protein-level FDR above 1% when requiring at least one unique peptide. The effect is especially strong when searching large databases (e.g. the one I tried contained about 18*10^6 target sequences).
Thanks for your time!

Oskar

guo-xuan · 2019-08-05T14:19:01Z

Hi Oskar, I am a little confused. Do you want 1% FDR at protein level or peptide level? Xuan

ohickl · 2019-08-13T09:28:48Z

Hi Xuan,

sorry about that. I would like to filter on protein level.

guo-xuan · 2019-08-22T23:14:31Z

Hi Oskar,

Sorry for the late reply. I am very busy these days. I don't have a publicly available protein FDR control script. If 1% protein FDR is desired, what I would do is try a set of peptide FDRs to see which one gives exactly 1% protein FDR, or the closest to it. I have a Python script for this purpose, but it is not user-friendly. I attached that script to this email anyway. Note that the comments in the script may not be helpful. I may be able to upgrade the script, but I don't know when I will have time to do that.

Bests,
Xuan
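
The search described above (try several peptide FDR cutoffs and keep the one whose protein FDR lands closest to 1%) could look roughly like the sketch below. This is not the script mentioned in the reply, which was never posted in the thread. The names `closest_peptide_fdr` and `protein_fdr_at` are hypothetical, and `toy_protein_fdr` is a made-up stand-in for re-running the real filtering at a given peptide FDR.

```python
from typing import Callable, Iterable, Tuple


def closest_peptide_fdr(
    candidates: Iterable[float],
    protein_fdr_at: Callable[[float], float],
    target: float = 0.01,
) -> Tuple[float, float]:
    """Return (peptide_fdr, protein_fdr) with protein FDR closest to target."""
    results = [(cutoff, protein_fdr_at(cutoff)) for cutoff in candidates]
    if not results:
        raise ValueError("no candidate peptide FDRs given")
    return min(results, key=lambda pair: abs(pair[1] - target))


def toy_protein_fdr(peptide_fdr: float) -> float:
    """Made-up relationship for illustration only; a real run would
    re-filter the PSMs at this peptide FDR and count decoy proteins."""
    return 2.5 * peptide_fdr


candidates = [0.001, 0.002, 0.004, 0.006, 0.008, 0.01]
best_cutoff, best_protein_fdr = closest_peptide_fdr(candidates, toy_protein_fdr)
print(best_cutoff, best_protein_fdr)
```

With real data, each call to `protein_fdr_at` would be expensive, so a coarse grid followed by a finer one around the best cutoff may be more practical than a dense scan.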

ohickl · 2020-01-23T15:17:00Z

Hey Xuan,

sorry for the late reply. I am still interested in your Python script. Could you send it to me at oskar.hickl@uni.lu? Your last reply went to GitHub and there was no file attached.
Is there any news regarding the development of Sipros Ensemble? I'd love to see it continued!

Cheers
Oskar
