Peptides table: "accession" value if peptide assigned to multiple proteins #5

Closed
GoogleCodeExporter opened this issue Aug 6, 2015 · 3 comments


@GoogleCodeExporter

In the current specification it's stated (page 26): "The protein's accession 
the peptide is associated with. In case no protein section is present in the 
file or the peptide was not assigned to a protein the field should be filled 
with “NA”."

It's not clear from this description how peptides shared by several proteins 
should be treated. Should the value be NA (but then the "unique" column doesn't 
make sense, since it would be true iff the accession is not NA), or should it be 
a comma-separated list of the protein accessions? In the latter case the 
"unique" column also looks redundant; maybe it could be replaced by a column 
specifying the number of proteins the peptide could be assigned to, e.g. 
"num_proteins_shared".

Original issue reported on code.google.com by astuka...@gmail.com on 30 Nov 2012 at 3:04

@GoogleCodeExporter

Only one main protein accession needs to be provided. The others can be listed 
in "ambiguity_members". This was done in a very generic way for the sake of 
simplicity. This is the definition of "ambiguity_members":

A comma-delimited list of protein accessions. This field should be set in the 
representative protein of the ambiguity group (the protein identified through 
the accession in the first column). The accessions listed in this field should 
identify proteins that could also be identified through these peptides but were 
not chosen by the researcher or resource. The members of the ambiguity group 
are not reported in the protein table for the respective unit. The exact 
semantics of how the ambiguity members were defined depends on the resource.

The only way to report all the protein accessions a peptide maps to at the same 
level of the hierarchy is to repeat the same peptide entry in separate rows, 
one per protein.
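
As an illustration only, repeating a shared peptide in one row per protein could be sketched as follows. The column names "sequence" and "accession" follow this thread; the function name `expand_peptide_rows`, the "abundance" column, and the example accessions are hypothetical and not taken from the specification.

```python
def expand_peptide_rows(sequence, accessions, shared_values=None):
    """Return one row (dict) per protein accession for a single peptide.

    shared_values holds the remaining columns, copied unchanged into every
    row. If no accession is given, "NA" is used, as the quoted text from
    the specification requires.
    """
    shared = dict(shared_values or {})
    targets = list(accessions) or ["NA"]
    return [{"sequence": sequence, "accession": acc, **shared} for acc in targets]


# Hypothetical peptide shared by two proteins.
rows = expand_peptide_rows("PEPTIDEK", ["P12345", "Q67890"], {"abundance": "12.5"})
for row in rows:
    print("\t".join(row.values()))
```

Every row carries exactly one accession, so the shared columns are duplicated once per protein.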

Original comment by javizca74@gmail.com on 30 Nov 2012 at 4:42

@GoogleCodeExporter

Thanks for the clarification!

The "ambiguity_members" column addresses a slightly different problem. There 
could be peptides shared by several unambiguously identified proteins.

Of course, it's possible to duplicate the peptide information for each protein, 
but that would increase the size of the file, and there is a risk (or at least 
a source of confusion) that the quantitative information would differ between 
rows describing the same peptide. BTW, does the specification impose a 
uniqueness constraint on the peptides table anywhere (i.e. specify a "compound 
unique key")?
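
A minimal sketch of how such a discrepancy could be detected, assuming the peptide rows have already been parsed into dicts. The function name, the "abundance" column, and the choice of "sequence" as the grouping key are all assumptions made for illustration; the grouping key in particular would need to be adapted to the data.

```python
from collections import defaultdict


def find_inconsistent_peptides(rows, key_columns=("sequence",), ignore=("accession",)):
    """Return the keys whose duplicated rows disagree outside the ignored columns."""
    groups = defaultdict(set)
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        # Everything except the key and the ignored columns must match.
        payload = tuple(sorted(
            (name, value) for name, value in row.items()
            if name not in key_columns and name not in ignore
        ))
        groups[key].add(payload)
    return [key for key, payloads in groups.items() if len(payloads) > 1]


# Hypothetical rows: the same peptide is reported for two proteins,
# but with different quantitative values.
rows = [
    {"sequence": "PEPTIDEK", "accession": "P12345", "abundance": "12.5"},
    {"sequence": "PEPTIDEK", "accession": "Q67890", "abundance": "13.1"},
    {"sequence": "AAK", "accession": "P12345", "abundance": "4.0"},
]
print(find_inconsistent_peptides(rows))  # [('PEPTIDEK',)]
```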

Original comment by astuka...@gmail.com on 30 Nov 2012 at 5:03

@GoogleCodeExporter

Yes, one entry in the peptide table ("one peptide") must be assigned to only 
one protein. The "accession" column must contain a single protein accession, so 
the peptide->protein relation is unique. Of course, one protein can have 
multiple peptide entries with exactly the same sequence (for example, if 
identified from different spectra).
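
The single-accession rule could be checked with something like the sketch below. It assumes rows parsed into dicts and treats a comma in the "accession" value as a sign that several accessions were packed into one field; that heuristic and the function name are illustrative assumptions, not part of the specification.

```python
def accession_problems(rows):
    """Return (row_index, message) pairs for rows that break the one-accession rule."""
    problems = []
    for index, row in enumerate(rows):
        value = row.get("accession", "").strip()
        if not value:
            problems.append((index, "empty accession; expected one accession or NA"))
        elif "," in value:
            problems.append((index, "several accessions in one field: " + value))
    return problems


# Hypothetical rows: the second one lists two proteins in a single field.
rows = [
    {"sequence": "PEPTIDEK", "accession": "P12345"},
    {"sequence": "PEPTIDEK", "accession": "P12345,Q67890"},
    {"sequence": "AAK", "accession": "NA"},
]
for index, message in accession_problems(rows):
    print(index, message)
```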

BTW, there is no unique key defined for the peptide table. 

Original comment by javizca74@gmail.com on 2 Dec 2012 at 9:03
