Olssen Web Service
Our server provides a RESTful-like API to perform on-line spectral search for proteomics spectral data. It is based on the SpectraST algorithm for spectral search and uses PRIDE Cluster spectral libraries.
The server is built for scalability and performance when working with big datasets. It runs Flask on top of CherryPy's server and performs its spectral searches using an engine based on Apache Spark clusters. The server has a very simple deployment cycle (see next).
server/server.py starts a CherryPy server that runs app.py, a RESTful web application wrapping the Spark-based context defined in engine.py. Through its API we can perform on-line spectral searches on proteomics data.
In order to have the server working properly, the PRIDE Cluster libraries must be downloaded and processed into the right folders. See Getting the Libraries.
Once you have the libraries ready, run the server using:
Or have a look at the provided start_server.sh script as a guide.
After loading the Spark context and the spectral search library, the server is ready to be queried at the following endpoints (speaking JSON):
GET /stats: returns statistics about the spectral libraries that have been loaded, including their names and peptide counts.
POST /search: runs a spectral search for a given peak list, supplied as a file of (mz, intensity) pairs (MGF file format).
Where the file query.mgf contains a list of peaks to search for.
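As a rough illustration of how a client might talk to these two endpoints, here is a minimal Python sketch that uses only the standard library. Several things in it are assumptions rather than documented behaviour: the base URL passed by the caller, the multipart form field name ("file"), and the shape of the JSON that comes back. The helper names (peaks_to_mgf, build_multipart, get_stats, search) are hypothetical and not part of this project.

```python
import json
import uuid
from urllib import request


def peaks_to_mgf(peaks, title="query", pepmass=None, charge=None):
    """Render (mz, intensity) pairs as a single-spectrum MGF document."""
    lines = ["BEGIN IONS", f"TITLE={title}"]
    if pepmass is not None:
        lines.append(f"PEPMASS={pepmass}")
    if charge is not None:
        lines.append(f"CHARGE={charge}+")
    lines += [f"{mz} {intensity}" for mz, intensity in peaks]
    lines.append("END IONS")
    return "\n".join(lines) + "\n"


def build_multipart(field_name, filename, payload):
    """Build a one-file multipart/form-data body; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    disposition = (
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"'
    )
    body = b"\r\n".join([
        f"--{boundary}".encode(),
        disposition.encode(),
        b"Content-Type: text/plain",
        b"",
        payload,
        f"--{boundary}--".encode(),
        b"",
    ])
    return body, f"multipart/form-data; boundary={boundary}"


def get_stats(base_url):
    """GET /stats and decode the JSON response."""
    with request.urlopen(f"{base_url}/stats") as resp:
        return json.load(resp)


def search(base_url, mgf_text, field_name="file"):
    """POST /search with the MGF text as an uploaded file.

    The field name "file" is an assumption; check app.py for the real one.
    """
    body, content_type = build_multipart(field_name, "query.mgf", mgf_text.encode())
    req = request.Request(
        f"{base_url}/search",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

With a server running, a call would look something like search(base_url, peaks_to_mgf([(100.5, 20.0), (200.25, 80.0)])), where base_url is whatever host and port the server was started on.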
Getting the libraries
For the server to work properly, the PRIDE Cluster spectrum libraries must be available to the Spark cluster. Each library has to be downloaded using tools/download_and_split_lib.py and processed using tools/create_lib_file.py. In the current version, the final location of the libraries is hardcoded in the server. So, from the folder where the server is started:
How to use the two Python scripts is described below.
The download_and_split_lib.py script downloads a PRIDE Cluster library and splits it into a local folder. Example of use:
python download_and_split_lib.py ftp://ftp.pride.ebi.ac.uk/pride/data/cluster/spectrum-libraries/1.0.1/Contaminants.msp.gz ./contaminants
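To give an idea of what the splitting step involves, the following is a minimal sketch, not the actual implementation of download_and_split_lib.py. It assumes the usual MSP layout, in which every spectrum entry begins with a "Name:" line, and it assumes a fixed number of entries per output file. The function names, the part-NNNNN.msp file naming, and the entries_per_file parameter are all hypothetical.

```python
from pathlib import Path


def iter_msp_entries(lines):
    """Yield each MSP entry as a list of lines; an entry starts at 'Name:'."""
    entry = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # blank lines only separate entries
        if line.startswith("Name:") and entry:
            yield entry
            entry = []
        entry.append(line)
    if entry:
        yield entry


def split_msp(lines, out_dir, entries_per_file=1000):
    """Write the entries into numbered files of at most entries_per_file each."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    chunk, paths = [], []

    def flush():
        # Hypothetical naming scheme; the real tool may name parts differently.
        path = out / f"part-{len(paths):05d}.msp"
        path.write_text("\n\n".join("\n".join(e) for e in chunk) + "\n")
        paths.append(path)
        chunk.clear()

    for entry in iter_msp_entries(lines):
        chunk.append(entry)
        if len(chunk) == entries_per_file:
            flush()
    if chunk:
        flush()
    return paths
```

For a downloaded Contaminants.msp.gz, the lines argument could be the file object returned by gzip.open(path, "rt"), so the library is streamed rather than loaded into memory at once.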
The create_lib_file.py Spark/Python script loads a split folder (created with the download_and_split_lib.py tool) into an RDD and persists it as a pickle file for later use by the server. Use it through the provided shell script, for example:
./create_lib_file.sh ../spectrumlibs/contaminants ../spectrumlibs/contaminants/lib.file