Showing 43 changed files with 1,593 additions and 200 deletions.
53 changes: 53 additions & 0 deletions
downloads-generation/models_class1_allele_specific_ensemble/GENERATE.sh
@@ -0,0 +1,53 @@
#!/bin/bash

if [[ $# -eq 0 ]] ; then
    echo 'WARNING: This script is intended to be called with additional arguments to pass to mhcflurry-class1-allele-specific-ensemble-train'
    echo 'See README.md'
fi

set -e
set -x

DOWNLOAD_NAME=models_class1_allele_specific_ensemble
SCRATCH_DIR=/tmp/mhcflurry-downloads-generation
SCRIPT_ABSOLUTE_PATH="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)/$(basename "${BASH_SOURCE[0]}")"
SCRIPT_DIR=$(dirname "$SCRIPT_ABSOLUTE_PATH")
export PYTHONUNBUFFERED=1

mkdir -p "$SCRATCH_DIR"
rm -rf "$SCRATCH_DIR/$DOWNLOAD_NAME"
mkdir "$SCRATCH_DIR/$DOWNLOAD_NAME"

# Send stdout and stderr to a logfile included with the archive.
exec > >(tee -ia "$SCRATCH_DIR/$DOWNLOAD_NAME/LOG.txt")
exec 2> >(tee -ia "$SCRATCH_DIR/$DOWNLOAD_NAME/LOG.txt" >&2)

# Log some environment info
date
pip freeze
git rev-parse HEAD
git status

cd "$SCRATCH_DIR/$DOWNLOAD_NAME"

mkdir models

cp "$SCRIPT_DIR/models.py" .
python models.py > models.json

time mhcflurry-class1-allele-specific-ensemble-train \
    --ensemble-size 16 \
    --model-architectures models.json \
    --train-data "$(mhcflurry-downloads path data_combined_iedb_kim2014)/combined_human_class1_dataset.csv" \
    --min-samples-per-allele 20 \
    --out-manifest selected_models.csv \
    --out-model-selection-manifest all_models.csv \
    --out-models models \
    --verbose \
    "$@"

bzip2 all_models.csv
cp "$SCRIPT_ABSOLUTE_PATH" .
tar -cjf "../${DOWNLOAD_NAME}.tar.bz2" *

echo "Created archive: $SCRATCH_DIR/$DOWNLOAD_NAME.tar.bz2"
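As a rough illustration of what `--ensemble-size 16` and `--min-samples-per-allele 20` might imply, the following is a minimal Python sketch, not the mhcflurry implementation. It assumes that alleles with fewer than the minimum number of measurements are skipped and that each ensemble member gets its own random half/half split of an allele's data. The function name `plan_ensemble_splits` and the `(allele, peptide, ic50)` tuple layout are hypothetical.

```python
import random
from collections import defaultdict


def plan_ensemble_splits(measurements, ensemble_size=16,
                         min_samples_per_allele=20, seed=0):
    """Return {allele: [(train_half, selection_half), ...]}.

    `measurements` is a list of (allele, peptide, ic50) tuples; this layout
    is an assumption made for illustration only.
    """
    rng = random.Random(seed)

    # Group measurements by allele.
    by_allele = defaultdict(list)
    for row in measurements:
        by_allele[row[0]].append(row)

    plan = {}
    for allele, rows in by_allele.items():
        # Assumed meaning of --min-samples-per-allele: skip sparse alleles.
        if len(rows) < min_samples_per_allele:
            continue
        splits = []
        for _ in range(ensemble_size):
            # One independent random half/half split per ensemble member.
            shuffled = rows[:]
            rng.shuffle(shuffled)
            half = len(shuffled) // 2
            splits.append((shuffled[:half], shuffled[half:]))
        plan[allele] = splits
    return plan
```

In this sketch, architectures would be fit on the first half of each split and compared on the second half, one split per ensemble member.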
29 changes: 29 additions & 0 deletions
downloads-generation/models_class1_allele_specific_ensemble/README.md
@@ -0,0 +1,29 @@
# Class I allele-specific models (ensemble)

This download contains trained MHC Class I allele-specific MHCflurry models. For each allele, an ensemble of predictors is trained, each on a random half of the training data. Model architectures are selected based on performance on the other half of the dataset, so in general each ensemble contains predictors of different architectures. At prediction time, the geometric mean of the predicted IC50 values is taken over the trained models. The training data is in the [data_combined_iedb_kim2014](../data_combined_iedb_kim2014) MHCflurry download.

The training script supports multi-node parallel execution using the [kubeface](https://github.com/hammerlab/kubeface) library.

To use kubeface, create a Google Cloud Storage bucket and pass it with the `--kubeface-storage` argument, as in the command below.

To generate this download we run:

```
./GENERATE.sh \
    --parallel-backend kubeface \
    --target-tasks 200 \
    --kubeface-backend kubernetes \
    --kubeface-storage gs://kubeface-tim \
    --kubeface-worker-image hammerlab/mhcflurry-misc:latest \
    --kubeface-kubernetes-task-resources-memory-mb 10000 \
    --kubeface-worker-path-prefix venv-py3/bin \
    --kubeface-max-simultaneous-tasks 200 \
    --kubeface-speculation-max-reruns 3
```

To debug locally:
```
./GENERATE.sh \
    --parallel-backend local-threads \
    --target-tasks 1
```
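The README states that the geometric mean IC50 is taken over the trained models at prediction time. A minimal sketch of that averaging step follows, assuming each model in the ensemble returns a positive IC50 value for a given peptide. The function name `geometric_mean_ic50` is hypothetical and is not part of the mhcflurry API.

```python
import math


def geometric_mean_ic50(ic50_predictions):
    """Geometric mean of positive IC50 values, computed in log space."""
    if not ic50_predictions:
        raise ValueError("need at least one prediction")
    if any(value <= 0 for value in ic50_predictions):
        raise ValueError("IC50 values must be positive")
    # Averaging the logs and exponentiating is equivalent to taking the
    # n-th root of the product, but avoids overflow on large products.
    log_mean = sum(math.log(value) for value in ic50_predictions) / len(ic50_predictions)
    return math.exp(log_mean)


# Two hypothetical models predicting 10 nM and 1000 nM give roughly 100 nM,
# whereas the arithmetic mean would be 505 nM.
combined = geometric_mean_ic50([10.0, 1000.0])
```

Because IC50 values span several orders of magnitude, averaging in log space keeps a single weak-binding prediction from dominating the ensemble output.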
24 changes: 24 additions & 0 deletions
downloads-generation/models_class1_allele_specific_ensemble/models.py
@@ -0,0 +1,24 @@
import sys
from mhcflurry.class1_allele_specific_ensemble import HYPERPARAMETER_DEFAULTS
import json

models = HYPERPARAMETER_DEFAULTS.models_grid(
    impute=[False, True],
    activation=["tanh"],
    layer_sizes=[[12], [64], [128]],
    embedding_output_dim=[8, 32, 64],
    dropout_probability=[0, .1, .25],
    fraction_negative=[0, .1, .2],
    n_training_epochs=[250],

    # Imputation arguments
    impute_method=["mice"],
    imputer_args=[
        # Arguments specific to imputation method (mice)
        {"n_burn_in": 5, "n_imputations": 50, "n_nearest_columns": 25}
    ],
    impute_min_observations_per_peptide=[3],
    impute_min_observations_per_allele=[3])

sys.stderr.write("Models: %d\n" % len(models))
print(json.dumps(models, indent=4))
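To show how a grid like the one in `models.py` can expand into individual architectures, here is a standalone sketch that takes the Cartesian product of the listed values. It is an assumption about the general technique, not the actual `models_grid` implementation, and the helper name `expand_grid` is hypothetical. The single-valued imputation arguments are omitted because they would not change the count.

```python
import itertools


def expand_grid(**param_lists):
    """Return one dict per combination of the given parameter values."""
    names = sorted(param_lists)
    value_lists = [param_lists[name] for name in names]
    return [
        dict(zip(names, combination))
        for combination in itertools.product(*value_lists)
    ]


grid = expand_grid(
    impute=[False, True],
    activation=["tanh"],
    layer_sizes=[[12], [64], [128]],
    embedding_output_dim=[8, 32, 64],
    dropout_probability=[0, .1, .25],
    fraction_negative=[0, .1, .2],
    n_training_epochs=[250],
)
print(len(grid))  # 2 * 1 * 3 * 3 * 3 * 3 * 1 = 162
```

Under this product semantics, each added value in any list multiplies the number of candidate architectures, which is one reason the README runs training across many parallel tasks.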