Picker: DeepPicker
DeepPicker is available as a GitHub repo. We used Python 3.6 for this installation.
The DeepPicker paper can be found here. Refer also to the README for more information. The following guide is drawn in part from these resources.
We have tested DeepPicker successfully on RHEL (Red Hat Enterprise Linux) 7.7. According to the maintainers, it only supports Ubuntu 12.0+, CentOS 7.0+, and RHEL 7.0+.
First, clone the DeepPicker repo into pickers/deeppicker/ using
git clone https://github.com/nejyeah/DeepPicker-python.git pickers/deeppicker
In some cases, we have found that DeepPicker may not run properly as-is. If you would like to apply our patches to DeepPicker, run the script included in cryo-docs/patches/deeppicker. This script replaces several DeepPicker source files with our patched versions (keeping the others as cloned in the previous step).
sh patches/deeppicker/patch-deeppicker.sh pickers/deeppicker/
Create and activate a new conda environment using Python version 3.6 (since tensorflow versions 1.12.0 and earlier do not support Python 3.7).
conda create -n deeppicker matplotlib scipy==1.2.1 python=3.6
conda activate deeppicker
DeepPicker requires the tensorflow machine learning package, which in turn requires cudatoolkit and cudnn in order to support CUDA-compatible GPUs (if your system already has global installations of CUDA and cuDNN, feel free to skip the following discussion). These can be installed manually, but it is important to use the correct version of each to avoid errors. See this compatibility chart for more information. The DeepPicker GitHub page indicates that cudatoolkit 7.5 and cudnn 4 should be used, but these versions are quite old and may not be compatible with modern GPU hardware (e.g., CUDA 7.5 does not support Pascal or later GPUs).
To install the latest version of tensorflow-gpu and all of its compatible dependencies (including cudatoolkit and cudnn), use
conda install tensorflow-gpu
or, to specify a tensorflow-gpu version (which in turn will pull the correct versions of cudatoolkit and cudnn), use
conda install tensorflow-gpu=#.##.##
Note that conda search tensorflow-gpu can be used to see which versions of tensorflow-gpu are available with conda. If the version you would like to install is not available in your conda channels, you can either specify a different channel or install everything manually with pip (see the release history for tensorflow-gpu).
DeepPicker is a trainable particle picker: given particle coordinate files together with the micrographs they came from, it can produce a new .h5 model file, which DeepPicker can then use to make refined picks.
Which steps to follow therefore depends on the intended use:
- To pick with the general model that ships with DeepPicker, skip to the Pick using pretrained model section.
- To train a new model and then pick with that model, start with Training a new model, then follow Pick using pretrained model (but set the pre_trained_model parameter to the name of the newly created model, which will be an .h5 file).
An .h5 file is a file in HDF5 format, a binary container format for large arrays of data. HDF5 data is easy to read into NumPy arrays in Python, which DeepPicker uses. In DeepPicker's case, the file stores the information needed to identify particles.
DeepPicker requires correctly formatted .star files as input. Our coord_converter.py script, which converts .box ground truth coordinate files from EMPIAR into readable .star files, may be of help. The .star files must be in the same folder as the micrographs they correspond to. Each must also have the same name as the corresponding .mrc file, with an optional suffix (e.g., the micrograph Falcon_2012_06_12-15_27_22_0.mrc would correspond with the coordinate file Falcon_2012_06_12-15_27_22_0_cnnPick.star).
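Before training, it can help to confirm that every micrograph has a matching coordinate file. The following is a minimal sketch of such a check, not part of DeepPicker itself; the function name find_unpaired_micrographs is hypothetical, and it assumes the naming rule described above (micrograph name plus an optional suffix, in the same folder).

```python
from pathlib import Path


def find_unpaired_micrographs(data_dir, suffix=""):
    """Return the names of .mrc files in data_dir with no matching .star file.

    A micrograph `name.mrc` counts as paired when `name<suffix>.star`
    exists in the same directory (hypothetical helper, not a DeepPicker API).
    """
    data_dir = Path(data_dir)
    missing = []
    for mrc in sorted(data_dir.glob("*.mrc")):
        expected = data_dir / f"{mrc.stem}{suffix}.star"
        if not expected.is_file():
            missing.append(mrc.name)
    return missing
```

For example, find_unpaired_micrographs("input_dir", "_cnnPick") returns an empty list when every micrograph in input_dir has a corresponding _cnnPick.star file.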
The following command trains a new model; its parameters are described below.
Note: The paths to train.py and to the input directory are relative to your current directory. Either run the command from the directory containing train.py, or give the path to it (the command below is written to be run from cryodocs/).
python pickers/deeppicker/train.py --train_type 1 --train_inputDir "input_dir" --particle_size 160 --mrc_number -1 --particle_number -1 --coordinate_symbol 'some_string' --model_save_dir 'output_dir' --model_save_file 'output_model_name'
Descriptions of each flag are as follows:
- --train_inputDir: input directory of .star and corresponding .mrc files
- --train_type: options are 1, 2, 3, or 4. 1 (recommended) is for new specimen-specific models, 2 is for multiple molecules, and 3 is for iterative training.
- --mrc_number: number of .mrc files to pick from the directory specified; default=-1 refers to all
- --particle_size: the size of the particle
- --coordinate_symbol: suffix that identifies the .star file for each .mrc file; refer to Specifics
- --model_save_dir: the directory to save the model .h5 file to
- --model_save_file: the name of the model .h5 file
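When training is launched from a script rather than typed by hand, the flags above can be assembled programmatically. This is only an illustrative sketch: build_train_command is a hypothetical helper, the flag names and default values are copied from the command shown earlier, and the default script path assumes you are running from the directory used in that command.

```python
import sys


def build_train_command(input_dir, model_dir, model_name, particle_size,
                        coordinate_symbol,
                        train_script="pickers/deeppicker/train.py"):
    """Build the argument list for DeepPicker's train.py (hypothetical helper).

    Uses train type 1 and -1 (meaning "all") for --mrc_number and
    --particle_number, as in the example command in this guide.
    """
    return [
        sys.executable, train_script,
        "--train_type", "1",
        "--train_inputDir", str(input_dir),
        "--particle_size", str(particle_size),
        "--mrc_number", "-1",
        "--particle_number", "-1",
        "--coordinate_symbol", coordinate_symbol,
        "--model_save_dir", str(model_dir),
        "--model_save_file", model_name,
    ]
```

The resulting list can be passed to subprocess.run(cmd, check=True) from within the activated deeppicker environment. Passing a list rather than a single string avoids shell-quoting problems with directory names.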
Start by collecting the micrograph files (*.mrc) to be picked in a directory (assuming they are not already available in their own directory). If you would like to use an existing public data set, our guide to the EMPIAR database may be helpful.
mkdir -p name_of_data_set/mrc
mv path/to/your_mrc_files/*.mrc name_of_data_set/mrc
Here we will use the micrographs located in demo_data/ as an example. Create another directory, in which any output, temporary, or configuration files will be saved by the picker.
mkdir demo_data/deeppicker_out
Use the following command to pick all micrographs in demo_data/mrc/. A description of the parameters is given below.
Note: As with training, the path to autoPick.py is relative to your current directory. Either run the command from the directory containing autoPick.py, or give the path to it (the command below is written to be run from cryodocs/).
python pickers/deeppicker/autoPick.py --inputDir 'demo_data/mrc/' --pre_trained_model 'pretrained_or_created_model' --particle_size 176 --mrc_number -1 --outputDir 'demo_data/deeppicker_out' --coordinate_symbol '_dp' --threshold 0.5
Parameters
- --inputDir: input directory of .mrc files
- --pre_trained_model: the .h5 model file
- --mrc_number: number of .mrc files to pick from the directory specified; default=-1 refers to all
- --particle_size: the size of the particle
- --outputDir: output directory to save the coordinate .star files
- --coordinate_symbol: suffix to be appended to the filenames of the output coordinate files (e.g., with the suffix _cnnPick, the input micrograph Falcon_2012_06_12-15_27_22_0.mrc would correspond with an output coordinate file named Falcon_2012_06_12-15_27_22_0_cnnPick.star)
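After picking, a quick way to verify the run is to check which micrographs actually received an output coordinate file. The sketch below is not part of DeepPicker; map_picks_to_micrographs is a hypothetical helper, and it assumes the output files are written directly into the output directory and named as described for --coordinate_symbol.

```python
from pathlib import Path


def map_picks_to_micrographs(input_dir, output_dir, suffix):
    """Map each .mrc filename in input_dir to its output .star path, or None.

    The expected output for `name.mrc` is `output_dir/name<suffix>.star`
    (an assumption based on the naming rule in this guide).
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    result = {}
    for mrc in sorted(input_dir.glob("*.mrc")):
        star = output_dir / f"{mrc.stem}{suffix}.star"
        result[mrc.name] = star if star.is_file() else None
    return result
```

For the example command in this guide, map_picks_to_micrographs("demo_data/mrc", "demo_data/deeppicker_out", "_dp") would list every micrograph; any entry mapped to None has no coordinate file, which may point to a mistyped output directory or suffix.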