A Deep Learning Framework for sequence-based Protein Crystallization Prediction



Protein structure determination has primarily been performed using X-ray crystallography. To overcome its high cost, high attrition rate, and series of trial-and-error settings, many in-silico methods have been developed to predict the crystallization propensity of proteins from their sequences. However, the majority of these methods build their predictors by extracting features from protein sequences, which is computationally expensive and can explode the feature space.

We propose DeepCrystal, a deep learning framework for sequence-based protein crystallization prediction. It uses deep learning to identify proteins that can produce diffraction-quality crystals without the need to manually engineer additional biochemical and structural features from sequences. Our model is based on Convolutional Neural Networks (CNNs), which can exploit k-mer structure and interactions among sets of k-mers in the raw protein sequences.
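To make the k-mer idea concrete, here is a minimal pure-Python sketch of how a single convolutional filter of width k scores every k-mer window of a one-hot encoded sequence. It is an illustration only and not the DeepCrystal architecture: the toy sequence, the hand-built filter and all function names are assumptions, and a trained CNN would learn many such filters rather than use a hand-written one.

```python
# Illustrative only: a hand-written filter that scans a sequence for one k-mer.
# A trained CNN learns its filter weights; nothing here is taken from DeepCrystal.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(sequence):
    """Encode each residue as a 20-dimensional 0/1 vector."""
    return [[1.0 if aa == residue else 0.0 for aa in AMINO_ACIDS]
            for residue in sequence]


def kmer_filter(kmer):
    """Build a (k x 20) weight matrix that responds most strongly to `kmer`."""
    return one_hot(kmer)


def convolve(encoded, filt):
    """Slide the filter along the sequence; return one score per window of length k."""
    k = len(filt)
    scores = []
    for start in range(len(encoded) - k + 1):
        window = encoded[start:start + k]
        score = sum(w * x
                    for w_row, x_row in zip(filt, window)
                    for w, x in zip(w_row, x_row))
        scores.append(score)
    return scores


sequence = "MKVLAAGKVL"  # hypothetical toy sequence
scores = convolve(one_hot(sequence), kmer_filter("KVL"))
print(scores)  # the score at each position counts residues matching "KVL"
```

The score peaks wherever the window matches the filter's k-mer exactly, which is the sense in which a convolutional layer can act as a k-mer detector over the raw sequence.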


Our model surpasses previous sequence-based protein crystallization predictors in terms of recall, F-score, accuracy and MCC on three independent test sets. Compared with its closest competitors, Crysalis II and Crysf, DeepCrystal achieves average improvements in recall of 1.4% and 12.1%, respectively. In addition, DeepCrystal attains average improvements of 2.1% and 6.0% in F-score, 1.9% and 3.9% in accuracy, and 3.8% and 7.0% in MCC over Crysalis II and Crysf, respectively, on the independent test sets.
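For readers unfamiliar with these metrics, the sketch below computes recall, F-score, accuracy and MCC (Matthews correlation coefficient) from a binary confusion matrix using their standard definitions. It assumes that "F-score" means the F1 score, and the counts passed in are hypothetical values chosen for illustration, not results from DeepCrystal or its competitors.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1: harmonic mean of precision and recall (assumed meaning of "F-score").
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # MCC ranges from -1 to 1 and stays informative on imbalanced classes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"recall": recall, "f_score": f_score,
            "accuracy": accuracy, "mcc": mcc}


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```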

A web server is also available at



Be sure the following tools are installed on your machine:

  • wget, git, unzip

Setting up environment using conda

Install Anaconda

1- Get the Anaconda (64-bit) Python 3.x installer for Linux:
2- Run the installer with bash, and follow the instructions to install Anaconda in your preferred directory.

Creating deepCrystal environment

Run the following commands:

To test DeepCrystal on a fasta file, you need to run it from inside the deepCrystal environment.

To deactivate the deepCrystal environment, run the following command:

  • source deactivate deepCrystal

Run DeepCrystal on a New Test File (Fasta file)

1- Protein sequences have to be saved in fasta format, similar to the following:


where '>Seq1' represents the fasta id and the second line is the protein sequence.
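Before running predictions, it can help to check that an input file follows this layout. The sketch below is a small stand-alone reader, not part of DeepCrystal: the function name `read_fasta` and the example sequence are hypothetical. It also accepts sequences wrapped over several lines, which is an assumption beyond the two-line layout described above.

```python
import io


def read_fasta(handle):
    """Yield (sequence_id, sequence) pairs from an open fasta file."""
    seq_id, chunks = None, []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            seq_id, chunks = line[1:].strip(), []
        elif seq_id is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


# Hypothetical record: '>Seq1' is the fasta id, the next line is the sequence.
example = io.StringIO(">Seq1\nMKVLAAGKVL\n")
records = list(read_fasta(example))
print(records)
```

To check a real file, pass an open file object instead of the `io.StringIO` example, for instance `read_fasta(open("test.fasta"))`.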

2- Download the model files (all *.hdf5 and *.json files) by running the following command:


3- Run the following two commands after downloading the model files:

  • unzip download
  • rm download

4- To test your protein sequences, run the following command:

$ python <file.fasta>

5- The output will be generated in the current working directory. The name of the output file is prediction_results.csv.

Sequence ID Prediction
Seq1 0.7230646491

6- If you run DeepCrystal on the test.fasta file included in this GitHub repository, you can compare the results with Expected_Prediction_Result.csv, which is also included in the repository.
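The comparison in step 6 can be automated. The sketch below assumes that both files are comma-separated, start with a header row, and hold the sequence ID in the first column and the prediction in the second; none of this is confirmed by the repository, so adjust the parsing if the real files differ. The function names, the tolerance and the in-memory example data (including `Seq2`) are hypothetical.

```python
import csv
import io
import math


def load_predictions(handle):
    """Map sequence ID -> predicted score, skipping the header row."""
    reader = csv.reader(handle)
    next(reader, None)  # assumed header row, e.g. "Sequence ID,Prediction"
    return {row[0]: float(row[1]) for row in reader if row}


def compare(actual, expected, tolerance=1e-6):
    """Return IDs that are missing or whose scores differ by more than `tolerance`."""
    mismatches = []
    for seq_id, expected_score in expected.items():
        score = actual.get(seq_id)
        if score is None or not math.isclose(score, expected_score,
                                             abs_tol=tolerance):
            mismatches.append(seq_id)
    return mismatches


# In-memory stand-ins for the two CSV files (hypothetical contents).
actual = load_predictions(io.StringIO(
    "Sequence ID,Prediction\nSeq1,0.7230646491\n"))
expected = load_predictions(io.StringIO(
    "Sequence ID,Prediction\nSeq1,0.7230646\nSeq2,0.25\n"))
mismatches = compare(actual, expected)
print(mismatches)  # IDs that are missing or outside the tolerance
```

With real files, open them with `open("prediction_results.csv", newline="")` and `open("Expected_Prediction_Result.csv", newline="")` in place of the `io.StringIO` objects. A small tolerance is used because floating-point scores can differ slightly between machines and library versions.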

7- When you run DeepCrystal, you will see some warnings, which will not affect your results. Examples of these warnings are in expected_warnings.txt.

To Train a Model (Optional)

  • After following the steps in the section "Creating deepCrystal environment", you can train a model on your own data using

  • and the fasta file have to be in the same directory.

  • To train the model on your own data, run the following command:
    $ python <file.fasta>

A simple example of how the fasta file should look:
>Seq1 Crystallizable

>Seq2 Non Crystallizable
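The headers above carry both the sequence ID and its class label. As a rough sketch of how such headers could be split, the helper below assumes the label is everything after the first space and maps "Crystallizable" to 1 and "Non Crystallizable" to 0. The function name and the numeric encoding are hypothetical and may not match what the training script does internally.

```python
# Assumed label encoding, for illustration only.
LABELS = {"crystallizable": 1, "non crystallizable": 0}


def parse_header(header):
    """Split a header such as '>Seq1 Crystallizable' into (sequence_id, label)."""
    seq_id, _, label_text = header.lstrip(">").strip().partition(" ")
    label_key = " ".join(label_text.split()).lower()  # normalise spacing and case
    if label_key not in LABELS:
        raise ValueError(f"unrecognised label in header: {header!r}")
    return seq_id, LABELS[label_key]


parsed = [parse_header(h) for h in (">Seq1 Crystallizable",
                                    ">Seq2 Non Crystallizable")]
print(parsed)
```

Raising an error on an unrecognised label, rather than guessing a class, makes a mislabelled training file fail early.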

This file contains the architecture of the DeepCrystal model.

