An R package to obtain data from the EMBL-EBI PRoteomics IDEntifications (PRIDE) database, covering both PRIDE Archive and PRIDE Cluster. It retrieves the data through their RESTful web services, PRIDE Archive WS and PRIDE Cluster WS.
Currently, the following domain entities are supported:
- Projects as S4 objects, including methods to get them from PRIDE by accession and as.data.frame
- Assays as S4 objects, including methods to get them from PRIDE by accession and as.data.frame
- Files as S4 objects, including methods to get them from PRIDE by project and assay accession and as.data.frame
- Protein identifications associated with a project, as S4 objects, including methods to get them from PRIDE by project accession and as.data.frame
- PSM identifications associated with a project, as S4 objects, including methods to get them from PRIDE by project accession and as.data.frame
- PRIDE Cluster ClusterSummary entities as S4 objects, including as.data.frame
First, we need to install devtools:
install.packages("devtools")
library(devtools)
Then we install prideR from GitHub and load it:
install_github("PRIDE-R/prideR")
library(prideR)
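The two installation steps above can be combined so that devtools is only installed when it is missing. This is a minimal sketch: `install_if_missing` is a hypothetical helper written for this example, not part of devtools or prideR, and the GitHub installation is only attempted in an interactive session because it needs network access.

```r
# Hypothetical helper, not part of devtools or prideR:
# install a CRAN package only if it is not already available.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report (invisibly) whether the package is available afterwards.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# Installing prideR from GitHub needs network access, so it is only
# attempted when R is running interactively.
if (interactive()) {
  install_if_missing("devtools")
  devtools::install_github("PRIDE-R/prideR")
  library(prideR)
}
```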
Get project PXD000001 summary:
get.ProjectSummary("PXD000001")
Search for at most 20 projects by the term "blood". The results are returned as a list of ProjectSummary objects:
search.list.ProjectSummary("blood",0,20)
Get the list of results from it:
project.list(search.list.ProjectSummary("blood",0,20))
Get them as a data.frame:
as.data.frame(search.list.ProjectSummary("blood",0,20))
Get the first 50 proteins for project PXD000001 as a list of ProteinDetail objects:
protein.list(list.ProteinDetailList("PXD000001", 0, 50))
Or as a data.frame:
as.data.frame(list.ProteinDetailList("PXD000001",0, 50))
Plot some counts:
plot(list.ProteinDetailList("PXD000001",0, 50))
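To collect more than one batch of proteins, the same call can be repeated for several pages and the results stacked. The sketch below assumes that the second argument of `list.ProteinDetailList` is a zero-based page number and the third is the page size, and that each page converts to a data.frame with the same columns; neither is confirmed here. `collect_pages` is a hypothetical helper, not part of prideR.

```r
# Hypothetical helper, not part of prideR: fetch several pages and stack them.
# fetch_page is any function that takes a page number and returns a data.frame.
collect_pages <- function(fetch_page, pages) {
  do.call(rbind, lapply(pages, fetch_page))
}

# Run the prideR example only when the package is installed.
# Assumption: the second argument is a zero-based page number and the third
# is the page size.
if (requireNamespace("prideR", quietly = TRUE)) {
  library(prideR)
  proteins <- collect_pages(
    function(page) as.data.frame(list.ProteinDetailList("PXD000001", page, 50)),
    0:2
  )
  nrow(proteins)
}
```

Passing the fetch step in as a function keeps the helper independent of any one entity, so the same loop could be tried with the other list functions.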
Get 5 PSMs for project PXD000001 as a list of PsmDetail objects:
get.list.PsmDetail("PXD000001", 5)
There are also count methods for each of the PRIDE Archive entities.
Get page 0 with a size of 20 clusters for peptide sequence LSVDYGK:
search.ClusterSearchResults("LSVDYGK", 0, 20)
As a data frame:
as.data.frame(search.ClusterSearchResults("LSVDYGK", 0, 20))
Plot results:
plot(search.ClusterSearchResults("LSVDYGK", 0, 20))
Some things to be done sooner rather than later:
- Check mandatory parameters
- Deal with SpectrumDetail entities when available
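As an illustration of what checking mandatory parameters might look like, here is a minimal sketch. Both functions are hypothetical and not part of prideR, and the accession pattern (PXD followed by six digits) is an assumption based on the accession used in the examples above; other accession formats would need their own pattern.

```r
# Hypothetical helpers, not part of prideR.

# Stop unless 'accession' is a single, non-missing character string that
# looks like the accession used in the examples (PXD plus six digits).
check_accession <- function(accession) {
  if (missing(accession) || !is.character(accession) ||
      length(accession) != 1 || is.na(accession)) {
    stop("'accession' is mandatory and must be a single character string")
  }
  if (!grepl("^PXD[0-9]{6}$", accession)) {
    stop("'accession' does not look like a PXD accession: ", accession)
  }
  invisible(TRUE)
}

# Stop unless 'page' is a whole number >= 0 and 'size' is a whole number >= 1.
check_paging <- function(page, size) {
  is_whole <- function(x) {
    is.numeric(x) && length(x) == 1 && !is.na(x) && x == round(x)
  }
  if (missing(page) || !is_whole(page) || page < 0) {
    stop("'page' is mandatory and must be a non-negative whole number")
  }
  if (missing(size) || !is_whole(size) || size < 1) {
    stop("'size' is mandatory and must be a positive whole number")
  }
  invisible(TRUE)
}

check_accession("PXD000001")   # passes silently
check_paging(0, 50)            # passes silently
```

Calling checks like these at the top of each request function would fail early with a readable message instead of sending an incomplete request to the web service.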
Find out more about us on our GitHub profiles:
Jose A. Dianes
Rui Wang
- Vizcaíno, J. A., Côté, R. G., Csordas, A., Dianes, J. A., Fabregat, A., Foster, J. M., ... & Hermjakob, H. (2013). The PRoteomics IDEntifications (PRIDE) database and associated tools: status in 2013. Nucleic Acids Research, 41(D1), D1063-D1069.