Best way to represent the protein data #20

jorainer · 2016-10-05T08:54:53Z

Extracting the protein annotations as a data.frame or DataFrame is straightforward; the question, however, is what type of object would best represent the protein annotation.

The object should be something similar to a GRanges, possibly the Proteins class from the Pbase (https://github.com/ComputationalProteomicsUnit/Pbase) package?

I've got:

(Ensembl) protein ID with sequence.
1:n mapping of protein ID to Uniprot ID.
n:m mapping between protein ID and protein domain ID, which provides in addition the position of the protein domain within the protein sequence.
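The three pieces listed above could be sketched as a small data model. This is only an illustrative Python sketch, not the ensembldb or Pbase implementation: the class names, field names, IDs and sequences are all hypothetical, and the 1-based inclusive domain coordinates are an assumption.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DomainHit:
    """One protein domain placed on a protein sequence.

    Coordinates are assumed to be 1-based and inclusive.
    """
    domain_id: str  # hypothetical domain identifier
    source: str     # e.g. "pfam", "prosite", "smart"
    start: int
    end: int


@dataclass
class ProteinRecord:
    """A protein ID with its sequence, its Uniprot IDs and its domains."""
    protein_id: str
    sequence: str
    uniprot_ids: list = field(default_factory=list)  # 1:n, may be empty
    domains: list = field(default_factory=list)      # this protein's side of n:m

    def domain_sequence(self, hit):
        """Return the amino acids covered by a domain hit."""
        return self.sequence[hit.start - 1:hit.end]


def proteins_by_domain(records):
    """Invert the mapping: domain ID -> list of protein IDs carrying it."""
    index = {}
    for rec in records:
        for hit in rec.domains:
            index.setdefault(hit.domain_id, []).append(rec.protein_id)
    return index


# Made-up example data, for illustration only.
records = [
    ProteinRecord("PROT_1", "MKTAYIAKQR", ["UP_A", "UP_B"],
                  [DomainHit("DOM1", "pfam", 2, 5)]),
    ProteinRecord("PROT_2", "MSTNPKPQRK", [],
                  [DomainHit("DOM1", "pfam", 1, 3),
                   DomainHit("DOM2", "smart", 4, 8)]),
]
```

Keeping the domain hits on each protein and deriving the reverse index on demand covers both directions of the n:m mapping without storing it twice.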

@lgatto any suggestions/preferences here?

lgatto · 2016-10-07T19:53:25Z

I would have thought Proteins would have been the best choice, but given the circular dependency, this might not be an option. But maybe something more low-level will suffice, and we can then make use of it.

Do you want a single data structure for all the points above?

What do you mean by protein domain ID? Functional domains, or transcript exons start/end sites?

jorainer · 2016-10-07T20:16:35Z

Regarding the protein domain ID: from Ensembl I get, for each protein-coding transcript, its translation, i.e. the protein (amino acid, AA) sequence along with its ID (the Ensembl protein_id). For each protein_id I can then fetch the Uniprot IDs (none, one or more), and I fetch all protein domains from the various sources (Pfam, prosite, Smart); these have start and end coordinates on the AA sequence of the protein. That's why I thought Proteins might be a good data structure, as it allows adding features to each AA sequence.

In a first version I will return protein results from the database as an AAStringSet with all additional annotations in the mcols.
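The idea of a sequence set with annotations in metadata columns can be illustrated with a rough Python analogue. This is a sketch only and does not reproduce the Biostrings API: the class `AnnotatedSequenceSet`, its methods, the column names and the example values are hypothetical. It assumes one metadata value per sequence, with the 1:n Uniprot mapping stored as a list-valued column.

```python
class AnnotatedSequenceSet:
    """Named sequences plus metadata columns, one value per sequence."""

    def __init__(self, names, sequences, mcols=None):
        if len(names) != len(sequences):
            raise ValueError("names and sequences must have the same length")
        self.names = list(names)
        self.sequences = list(sequences)
        self.mcols = {}
        for column, values in (mcols or {}).items():
            self.add_column(column, values)

    def add_column(self, column, values):
        """Attach a metadata column; it must be parallel to the sequences."""
        if len(values) != len(self.sequences):
            raise ValueError(f"column {column!r} needs one value per sequence")
        self.mcols[column] = list(values)

    def widths(self):
        """Length of each sequence."""
        return [len(s) for s in self.sequences]

    def subset(self, keep):
        """Keep the named sequences; metadata rows follow along."""
        idx = [i for i, name in enumerate(self.names) if name in keep]
        return AnnotatedSequenceSet(
            [self.names[i] for i in idx],
            [self.sequences[i] for i in idx],
            {c: [v[i] for i in idx] for c, v in self.mcols.items()},
        )


# Made-up example data, for illustration only.
prots = AnnotatedSequenceSet(
    ["PROT_1", "PROT_2"],
    ["MKTAYIAKQR", "MSTNP"],
    {"tx_id": ["TX_1", "TX_2"], "uniprot_ids": [["UP_A", "UP_B"], []]},
)
```

The point of this layout is that subsetting the sequences keeps the annotations aligned, which a separate data.frame would not guarantee.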

lgatto · 2016-10-07T20:52:12Z

Fantastic! Let me know when this becomes available and I will update Pbase to make use of it.

jorainer mentioned this issue Oct 6, 2016

Proteins object good to represent protein annotation? lgatto/Pbase#26

Closed

jorainer closed this as completed Nov 11, 2016
