Training on different biological contexts #45
Nice PR - LGTM. A couple of minor comments
26-describe_recount2.Rmd
Outdated
tissue.accessions <- ConvertToRecountSampleName(tissue.samples,
conversion.df)
tissue.file <- file.path("data", "sample_info",
"recount2_tissue_accessions.tsv")
indentation a bit off
26-describe_recount2.Rmd
Outdated
ConvertToRecountSampleName(cancer.samples,
conversion.df)
cancer.file <- file.path("data", "sample_info",
"recount2_cancer_accessions.tsv")
indentation here too
26-describe_recount2.Rmd
Outdated
blood.samples <- names(blood.samples[which(unlist(blood.samples))])
blood.accessions <- ConvertToRecountSampleName(blood.samples, conversion.df)
blood.file <- file.path("data", "sample_info",
"recount2_blood_accessions.tsv")
indentation
help = "Number of repeats to perform"),
make_option(c("-s", "--seed"), type = "integer", default = 123,
help = "Number of repeats to perform"),
make_option(c("-u", "--use_sample_list"), type = "logical", default = FALSE,
For a logical option, you can use the argument `action='store_true'`. This way, when it's called here, all that is needed is `--use_sample_list` instead of `--use_sample_list TRUE`.
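A minimal sketch of that suggestion, using optparse's `action = "store_true"`. The help text and the standalone parser here are illustrative assumptions, not the option list actually used in `scripts/subsampling_PLIER.R`:

```r
library(optparse)

# Illustrative option list; only the flag name comes from the diff above,
# the help text is a placeholder.
option.list <- list(
  make_option(c("-u", "--use_sample_list"), action = "store_true",
              default = FALSE,
              help = "Use a supplied sample list (placeholder help text)")
)
parser <- OptionParser(option_list = option.list)

# Flag absent: the default (FALSE) applies
opt.absent <- parse_args(parser, args = character(0))

# Flag present: no trailing TRUE is needed
opt.present <- parse_args(parser, args = "--use_sample_list")
```

With this form, the calling shell script would pass only `--use_sample_list` when the behavior is wanted and omit the flag otherwise.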
scripts/subsampling_PLIER.R
Outdated
smpl.exprs <- prepped.data[[1]][, sample.index]

plier.results <- PLIERWrapper(exprs = smpl.exprs,
pathway.mat = prepped.data[[2]],
indentation
Can confirm that changes introduced with dd93688 gave me the same md5 checksum for one of the models.
Related to: #39
We want to train models on different biological contexts and see what pathways are recovered. Here, I modify `26-describe_recount2` such that we can identify samples that are predicted to be from specific contexts in MetaSRA and convert from the identifiers used by MetaSRA (e.g., `SRSxxxxx`) to the sample/column names used in recount2 (e.g., `SRPxxxxx.SRRxxxxx`).

Contexts are:
I've added `scripts/subsampling_PLIER.R`, which allows us to do the subsampling experiments two ways: `n` number of samples; this is repeated `r` times (default is 5) using different random seeds (related PR forthcoming).

Finally, I'm adding `28-train_different_biological_contexts.sh`, the shell script for training all the models.
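For readers unfamiliar with the setup, the repeated random subsampling mentioned above (`n` samples, `r` repeats, a different seed per repeat) might look roughly like the base R sketch below. `SubsampleColumns` and the toy matrix are hypothetical and are not the code in `scripts/subsampling_PLIER.R`. The only details taken from this PR are the column-wise indexing, the default of 5 repeats, and the default seed of 123.

```r
# Hypothetical helper: draw r random column subsets of size n,
# each with its own seed derived from an initial seed.
SubsampleColumns <- function(exprs, n, r = 5, seed = 123) {
  set.seed(seed)
  # One distinct seed per repeat so each repeat is independently reproducible
  repeat.seeds <- sample(1:10000, r)
  lapply(repeat.seeds, function(s) {
    set.seed(s)
    sample.index <- sample(seq_len(ncol(exprs)), n)
    exprs[, sample.index, drop = FALSE]
  })
}

# Toy expression matrix (4 genes x 10 samples), for illustration only
toy.exprs <- matrix(rnorm(40), nrow = 4,
                    dimnames = list(paste0("gene", 1:4),
                                    paste0("sample", 1:10)))

subsamples <- SubsampleColumns(toy.exprs, n = 3, r = 5)
```

Each element of `subsamples` is a genes-by-`n` matrix that could then be passed to model training. Rerunning with the same `seed` yields the same subsets.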