In the current specification it's stated (page 26): "The protein's accession
the peptide is associated with. In case no protein section is present in the
file or the peptide was not assigned to a protein the field should be filled
with “NA”."
It's not clear from this description how peptides shared by several proteins
should be treated. Should the field be NA (but then the "unique" column doesn't
make sense, since it is true iff the accession is not NA), or should it be a
comma-separated list of the protein accession codes? In the latter case the
"unique" column also looks redundant; maybe it could be replaced by a column
specifying the number of proteins the peptide could be assigned to, e.g.
"num_proteins_shared".
Original issue reported on code.google.com by astuka...@gmail.com on 30 Nov 2012 at 3:04
Only one main protein accession needs to be provided. The others can be listed
in "ambiguity_members". This was done in a very generic way for the sake of
simplicity. This is the definition of "ambiguity_members":
A comma-delimited list of protein accessions. This field should be set in the
representative protein of the ambiguity group (the protein identified through
the accession in the first column). The accessions listed in this field should
identify proteins that could also be identified through these peptides but were
not chosen by the researcher or resource. The members of the ambiguity group
are not reported in the protein table for the respective unit. The exact
semantics of how the ambiguity members were defined depends on the resource.
The only way to report all the protein accessions a peptide maps to at the same
level of hierarchy is to replicate the same peptide entry in different rows,
one row per protein.
Original comment by javizca74@gmail.com on 30 Nov 2012 at 4:42
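To illustrate the workaround described in the comment above, here is a minimal Python sketch that replicates one peptide entry once per protein, so that each row carries exactly one accession. Rows are modelled as plain dicts rather than parsed mzTab lines; the helper name `replicate_peptide`, the example sequence, and the accessions are hypothetical and not taken from the specification.

```python
def replicate_peptide(peptide_row, accessions):
    # One copy of the row per protein; each copy holds a single accession.
    # The input row is left unchanged.
    return [{**peptide_row, "accession": acc} for acc in accessions]


# Hypothetical peptide shared by two proteins.
shared = {"sequence": "ELVISK", "accession": "NA"}
rows = replicate_peptide(shared, ["PROT_A", "PROT_B"])
for row in rows:
    print(row["sequence"], row["accession"])
```

Every other field of the entry is copied as-is, which is exactly what makes the file grow with the number of proteins a peptide maps to.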
Thanks for the clarification!
The "ambiguity_members" column addresses a slightly different problem: there
could be peptides shared by unambiguously identified proteins.
Of course, it's possible to duplicate the peptide information for each protein,
but that would increase the size of the file, and there is a risk (or, at
least, room for confusion) that the quantitative information would differ
between the rows describing the same peptide. BTW, does the specification
impose a uniqueness constraint on the peptide table anywhere (i.e. specify a
"compound unique key")?
Original comment by astuka...@gmail.com on 30 Nov 2012 at 5:03
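The concern about diverging quantitative information can be made concrete with a small check. The sketch below flags sequences whose rows disagree on a quantitative value. It assumes that rows sharing a sequence are meant to describe the same peptide, which the specification does not guarantee, and the column name `abundance` and the function name `inconsistent_peptides` are hypothetical.

```python
from collections import defaultdict


def inconsistent_peptides(rows, key="sequence", value="abundance"):
    # Collect the distinct quantitative values seen for each peptide sequence.
    seen = defaultdict(set)
    for row in rows:
        seen[row[key]].add(row[value])
    # A sequence is flagged when its rows carry more than one distinct value.
    return sorted(seq for seq, values in seen.items() if len(values) > 1)


# Hypothetical rows: ELVISK is replicated for two proteins with different values.
rows = [
    {"sequence": "ELVISK", "accession": "PROT_A", "abundance": 10.0},
    {"sequence": "ELVISK", "accession": "PROT_B", "abundance": 12.5},
    {"sequence": "LIVESK", "accession": "PROT_A", "abundance": 3.0},
]
conflicts = inconsistent_peptides(rows)
print(conflicts)
```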
Yes, one entry in the peptide table ("one peptide") must only be assigned to
one protein. The "accession" column must only contain one single protein
accession. So the relation peptide->protein is unique. Of course, one protein
can have multiple peptides with the exact same sequence (if identified from
different spectra for example).
BTW, there is no unique key defined for the peptide table.
Original comment by javizca74@gmail.com on 2 Dec 2012 at 9:03
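The rule stated in the last comment (one accession per peptide entry, no unique key on the table) could be checked along these lines. This is a sketch under assumptions: rows are plain dicts, a comma is taken as the sign that several accessions were packed into one field, and the function name `has_single_accession` is hypothetical.

```python
def has_single_accession(row):
    # "NA" is accepted: it marks a peptide not assigned to any protein.
    accession = row["accession"].strip()
    # Assumption: a comma means a list of accessions was written into the field.
    return bool(accession) and "," not in accession


# Hypothetical rows; the first two are duplicates, which is allowed because
# no unique key is defined for the peptide table.
rows = [
    {"sequence": "ELVISK", "accession": "PROT_A"},
    {"sequence": "ELVISK", "accession": "PROT_A"},
    {"sequence": "LIVESK", "accession": "NA"},
    {"sequence": "SHAREDK", "accession": "PROT_A,PROT_B"},
]
invalid = [row["sequence"] for row in rows if not has_single_accession(row)]
print(invalid)
```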