Protein FDR calculation #1
Hi Oskar,
Thank you for your questions.
The reason for 0.96% as the FDR is that we use half of the decoy PSMs to train a machine learning model, so the estimated number of decoy proteins should be doubled, i.e., FDR = 60*2/12462.
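As a quick check of that arithmetic, here is a minimal sketch in Python. The function name `protein_fdr` and the `decoy_scale` parameter are hypothetical names chosen for illustration, not part of Sipros Ensemble; the factor of 2 is taken from the explanation above.

```python
def protein_fdr(decoy_proteins, target_proteins, decoy_scale=2.0):
    """Estimate protein FDR as (scaled decoy count) / (target count).

    decoy_scale=2.0 follows the explanation in this thread: half of the
    decoy PSMs are used for training, so the surviving decoys are doubled.
    This is an illustrative helper, not code from Sipros Ensemble.
    """
    if target_proteins <= 0:
        raise ValueError("target_proteins must be positive")
    return decoy_scale * decoy_proteins / target_proteins


# Values from the protein report quoted in this issue.
fdr = protein_fdr(decoy_proteins=60, target_proteins=12462)
print(f"Protein FDR = {fdr:.2%}")  # prints "Protein FDR = 0.96%"
```

Without the doubling (`decoy_scale=1.0`) the same counts would give roughly half that value, which is where the apparent mismatch in the report comes from.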
There are a few other parameters for protein filtering, such as the minimum number of required unique peptides. Some of these 37016 proteins may be supported only by shared peptides, so they get grouped together and are counted just once.
I hope this helps you and I am happy to answer if you have any further questions.
Bests,
Xuan
Hi Xuan, got it. Thanks! Oskar
Hi Oskar,
I am a little confused. Do you want 1% FDR at protein level or peptide level?
Xuan
On Wednesday, July 31, 2019, 0ssH wrote:
Hi Xuan,
got it. Thanks!
Do you plan on implementing protein-level FDR filtering? I think I read something about it in the readme or the publication. I tried it by setting FDR_Filtering = Protein in the config file, but it still seems to filter at 1% peptide FDR.
I would like to do that because I tend to get a protein-level FDR above 1% when requiring at least one unique peptide. The effect is especially strong when searching large databases (e.g. the one I tried contained about 18*10^6 target sequences).
Thanks for your time!
Oskar
Hi Xuan, sorry about that. I would like to filter on protein level.
Hi Oskar,
Sorry for the late reply. I am hell busy these days.
I don't have a publicly available protein FDR control script. If 1% protein FDR is desired, what I would do is try a set of peptide FDRs to see which one gives exactly 1% protein FDR, or the closest to it. I have a Python script for this purpose, but it is not user-friendly. I attached that script to this email anyway. Note that the comments in the script may not be helpful.
I may be able to upgrade this script, but I don't know when I will have time to do that.
Bests,
Xuan
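The search described in this reply (try several peptide FDRs and keep the one whose protein FDR lands closest to 1%) could be sketched roughly as follows. This is not the script mentioned above, which was never attached to the thread. `closest_peptide_fdr` and `protein_fdr_at` are hypothetical names, and `toy_protein_fdr` is a made-up stand-in for an actual filtering run.

```python
from typing import Callable, Iterable, Tuple


def closest_peptide_fdr(
    protein_fdr_at: Callable[[float], float],
    candidates: Iterable[float],
    target: float = 0.01,
) -> Tuple[float, float]:
    """Return (peptide_fdr, protein_fdr) whose protein FDR is closest to target.

    protein_fdr_at is a hypothetical callback: it is assumed to rerun the
    filtering at the given peptide FDR and return the resulting protein FDR.
    """
    best = None
    for peptide_fdr in candidates:
        result = protein_fdr_at(peptide_fdr)
        if best is None or abs(result - target) < abs(best[1] - target):
            best = (peptide_fdr, result)
    if best is None:
        raise ValueError("no candidate peptide FDRs were given")
    return best


# Toy stand-in for a real filtering run (invented relationship, illustration only).
def toy_protein_fdr(peptide_fdr: float) -> float:
    return 2.5 * peptide_fdr


candidates = [0.001, 0.002, 0.004, 0.006, 0.008, 0.01]
best_peptide_fdr, best_protein_fdr = closest_peptide_fdr(toy_protein_fdr, candidates)
print(best_peptide_fdr, best_protein_fdr)
```

Note that "closest" can land slightly above the target. If the protein FDR must not exceed 1%, one would instead keep the largest candidate whose result stays at or below the target.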
On Tuesday, August 13, 2019, Oskar Hickl wrote:
Hi Xuan,
sorry about that. I would like to filter on protein level.
Hey Xuan, sorry for the late reply. I am still interested in your Python script. Could you send it to me at oskar.hickl@uni.lu? Your last reply went to GitHub and there was no file attached. Cheers
Hi,
I am a bit confused by the values calculated for the final protein report. It looks like this with my data, for example:
Numbers of proteins before filtering
Decoy_Proteins_Before_Filtering = 241
Target_Proteins_Before_Filtering = 37016
Numbers of proteins after filtering
Decoy_Proteins_After_Filtering = 60
Target_Proteins_After_Filtering = 12462
Protein FDR = Decoy_Proteins_After_Filtering / Target_Proteins_After_Filtering
Protein_FDR = 0.96%
The ~12500 proteins with the 60 decoys are reported afterwards. But how does it end up with a 0.96% decoy FDR? If it only found 241 decoys among almost 40k proteins before filtering, it was already way below 1%, or am I missing something?
Also, I get the following error when trying to produce a pepXML file:
python2.7 /opt/sipros/Scripts/sipros_psm_tabulating.py -i /scratch/maxquant/OH/Sipros/method_test/markert_strap_brp_01/output -o /scratch/maxquant/OH/Sipros/method_test/markert_strap_brp_01/output -c /scratch/maxquant/OH/Sipros/method_test/markert_strap_brp_01/20190703_method_test.cfg -x
[Fri Jul 5 11:11:30 2019] Beginning Sipros Ensemble Tabulating (1.0.1 (Alpha))
[Step 1] Parse options and get config file: Running -> Done!
[Step 2] Generate PSM table: Running -> Done!
[Step 3] Merge Protein list: Running -> Done!
[Step 4] Generate Pepxml: Running -> Traceback (most recent call last):
  File "/opt/sipros/Scripts/sipros_psm_tabulating.py", line 662, in <module>
    sys.exit(main())
  File "/opt/sipros/Scripts/sipros_psm_tabulating.py", line 647, in main
    writePepxml(base_out + '.tab', config_dict, modification_dict, element_modification_list_dict, output_folder)
  File "/opt/sipros/Scripts/sipros_psm_tabulating.py", line 406, in writePepxml
    psm_obj.score_process()
  File "/opt/sipros/Scripts/sipros_psm_tabulating.py", line 348, in score_process
    diff = (pep.scorelist[idx1]/l1[0].scorelist[idx1]) - 1
ZeroDivisionError: float division by zero
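The last frame suggests that the top-ranked score used as the divisor was zero. One possible workaround, sketched below, is to guard the division. `relative_score_diff` is a hypothetical helper and not part of Sipros Ensemble, and the fallback value of 0.0 is an assumption that may not be the right choice for the pepXML output.

```python
def relative_score_diff(score, top_score, default=0.0):
    """Return score / top_score - 1, or `default` when top_score is zero.

    Hypothetical helper, not Sipros Ensemble code. The default of 0.0 is an
    assumption; the appropriate fallback depends on how the value is used.
    """
    if top_score == 0:
        return default
    return score / top_score - 1


print(relative_score_diff(5.0, 10.0))  # -0.5
print(relative_score_diff(0.0, 0.0))   # 0.0 (fallback instead of ZeroDivisionError)
```

Whether a zero top score is legitimate input or a sign of an earlier problem in the pipeline is a question only the maintainers can settle.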
Also, are you still actively working on Sipros Ensemble?
Love Sipros Ensemble and the results so far!
Cheers
Oskar