Training DeepNovo

DeepNovo model training is performed with the Postnovo train_deepnovo subcommand. DeepNovo is automatically trained multiple times, once per fragment mass tolerance parameterization, and Postnovo can distribute these processes via Slurm on a compute cluster. train_deepnovo requires that a directory for each fragment mass tolerance (e.g., postnovo/DeepNovo/DeepNovo.low.0.2/) already exists, as created by downloading the default DeepNovo model with the Postnovo setup command. train_deepnovo overwrites key model files in the train directories for the resolution under consideration (e.g., for low-resolution models, postnovo/DeepNovo/DeepNovo.low.0.2/train/, postnovo/DeepNovo/DeepNovo.low.0.3/train/, ...), so copy any files you want to keep elsewhere before training.
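
For example, here is a minimal sketch of backing up the low-resolution train directories before retraining; the backup destination is arbitrary and only illustrative.

    # Copy the current low-resolution train directories to a backup location
    # before train_deepnovo overwrites them (the destination path is an example).
    mkdir -p ~/deepnovo_backup
    for dir in postnovo/DeepNovo/DeepNovo.low.*/train; do
        cp -r "$dir" ~/deepnovo_backup/$(basename $(dirname "$dir"))
    done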

A properly formatted MGF file of the DeepNovo training spectra must be created (see Wiki: MGF Input File Formatting: 2.d., 2.e.). Training takes ~12 hours with ~100,000 low-res spectra or ~50,000 high-res spectra when the training tasks for each fragment mass tolerance (six tasks for low-res, five for high-res) are distributed to nodes of 12 CPUs.
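
Roughly, each training spectrum in the MGF file carries the peptide sequence in a SEQ header alongside the usual spectrum metadata, as in the placeholder block below; see the MGF Input File Formatting page for the exact field order and requirements. All values here are placeholders.

    BEGIN IONS
    TITLE=Run_1.2345.2345.2
    PEPMASS=721.35
    CHARGE=2+
    SCANS=2345
    RTINSECONDS=1234.5
    SEQ=PEPTIDEK
    204.087 1500.0
    303.156 820.0
    END IONS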

It can take a few minutes for the DeepNovo processes to start at each fragment mass tolerance parameterization. The Postnovo command can be run in the background (by appending " &" to the command); the spawned DeepNovo processes will not exit when you log out of the server.
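
For example, the following backgrounded run (with output redirected to an arbitrarily named log file) follows that pattern; prefixing the command with nohup is a common additional safeguard against hangup signals on logout.

    # Run training in the background; logging out will not stop the
    # spawned DeepNovo processes. The log file name is just an example.
    python main.py train_deepnovo --container /path/to/tensorflow.simg --mgf /path/to/training_spectra.mgf --frag_resolution low --cpus 32 > train_deepnovo.log 2>&1 &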

  1. Here is an example of training on a single machine, with tasks for the different fragment mass tolerances performed in sequence.

    python main.py train_deepnovo --container /path/to/tensorflow.simg --mgf /path/to/spectra.mgf --frag_resolution low --cpus 32

  2. Here is an example of training at two of the six low-resolution parameterizations. Training at specific parameterizations can be useful if some of the jobs for the whole set of parameterizations failed, say, by running out of memory.

    python main.py train_deepnovo --container /path/to/tensorflow.simg --mgf /path/to/training_spectra.mgf --frag_resolution low --frag_mass_tols 0.2 0.6 --cpus 32

  3. Here is an example of training on a compute cluster via Slurm. The jobs run on nodes of the "bigmem" partition, each job using 16 CPUs and a maximum of 48 GB of memory for up to 36 hours.

    python main.py train_deepnovo --container /path/to/tensorflow.simg --mgf /path/to/training_spectra.mgf --frag_resolution high --slurm --partition bigmem --cpus 16 --mem 48 --time_limit 36

  4. The memory requirements of DeepNovo training vary significantly by parameterization, with 0.01 Da requiring ~48 GB, 0.03 and 0.05 Da requiring ~32 GB, and >0.05 Da requiring ~16 GB. Accordingly, it may be convenient to split training across nodes with different memory limits, as in the commands below.

    python main.py train_deepnovo --container /path/to/tensorflow.simg --mgf /path/to/training_spectra.mgf --frag_resolution high --frag_mass_tols 0.01 --slurm --partition bigmem --cpus 16 --mem 48 --time_limit 36

    python main.py train_deepnovo --container /path/to/tensorflow.simg --mgf /path/to/training_spectra.mgf --frag_resolution high --frag_mass_tols 0.03 0.05 --slurm --partition medmem --cpus 16 --mem 32 --time_limit 36

    python main.py train_deepnovo --container /path/to/tensorflow.simg --mgf /path/to/training_spectra.mgf --frag_resolution high --frag_mass_tols 0.1 0.5 --slurm --partition smallmem --cpus 16 --mem 16 --time_limit 36
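
The submitted Slurm jobs can be monitored with standard Slurm commands (these are generic Slurm tools, not part of Postnovo):

    # List your queued and running jobs.
    squeue -u $USER
    # Summarize elapsed time, peak memory, and state of a completed job.
    sacct -j <jobid> --format=JobID,Elapsed,MaxRSS,State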