Predict mutation status #51

envest · 2021-08-18T20:43:35Z

I'm putting some thoughts down here as a working document and starting point for synchronous/asynchronous discussion 😺

Currently, this project does subtype classification as the supervised learning task. We want to expand to include prediction of mutation status in three genes: PIK3CA, PTEN, and TP53.

I see two main avenues to get this done.

1. Copy existing scripts into new scripts, which we edit to fit the mutation paradigm.
2. Modify existing scripts (and the structure of clinical data inputs) to make scripts work for subtype or mutation prediction.

I think option 1 might be easier, but option 2 is better because any change that applies to all supervised learning tasks would only need to be made in one place. Option 2 may even turn out to be easier.

Steps to accomplish modification of existing scripts (option 2):

1. Combine clinical info (with subtype) and mutation data such that each sample has one row with all of its information. This way, one clinical file is read in for all prediction tasks and then the relevant column can be selected.
   - Mutations are currently encoded as 0/1, but could be "Has Mutation"/"No Mutation", or "TP53 Mutation"/"No Mutation"
2. Add an option to the scripts for what is being predicted ("subtype" or the gene name corresponding to a column of the input clinical data).
   - Use check_options() functions to make sure the given option is valid
   - The script option would apply to steps 0, 1, 2, 3
3. Each script creates output read by the next script. Use the prediction task ("subtype" or gene name) in the output file name so the next script knows what to read as input. Alternatively, create subdirectories for the task and keep file names the same.
4. In general, replace "subtype" with "category" in variable names, etc. to make it clear the prediction task is not limited to subtype prediction.
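
To make the steps above concrete, here is a rough sketch in Python. It is illustration only: the project's scripts may well be in another language, and every name here (`VALID_TASKS`, `sample_id`, `combine_clinical_and_mutation`, `check_task`, `output_path`, the file naming pattern) is hypothetical. `check_task` is a stand-in for the kind of validation the existing check_options() functions would do, not a copy of them.

```python
from pathlib import Path

# Hypothetical task names: "subtype" plus one column per gene of interest.
VALID_TASKS = ("subtype", "PIK3CA", "PTEN", "TP53")


def combine_clinical_and_mutation(clinical_rows, mutation_rows):
    """Join clinical records and mutation calls into one row per sample.

    Samples with no entry in mutation_rows get None (NA) for every gene.
    """
    mutations_by_sample = {row["sample_id"]: row for row in mutation_rows}
    genes = [task for task in VALID_TASKS if task != "subtype"]
    combined = []
    for row in clinical_rows:
        calls = mutations_by_sample.get(row["sample_id"], {})
        merged = dict(row)
        for gene in genes:
            merged[gene] = calls.get(gene)  # None when the sample has no calls
        combined.append(merged)
    return combined


def check_task(task, available_columns):
    """Fail early if the requested prediction task is not a usable column."""
    if task not in VALID_TASKS:
        raise ValueError(f"Unknown prediction task: {task!r}")
    if task not in available_columns:
        raise ValueError(f"Column {task!r} is missing from the clinical data")
    return task


def output_path(results_dir, step_name, task):
    """Put the task in the file name so the next script knows what to read."""
    return Path(results_dir) / f"{step_name}.{task}.tsv"
```

The subdirectory alternative would only change `output_path`, e.g. `Path(results_dir) / task / f"{step_name}.tsv"`, which keeps file names identical across tasks.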

⚠️ The overlap between samples present in MC3 and samples with gene expression data is not complete (i.e. there are a few samples with gene expression but no mutation calls whatsoever in MC3). Theoretically, a sample could have been analyzed by MC3 but had 0 mutations to report, and thus is not present in the MAF. However, I would regard such cases as highly suspect, especially in BRCA and GBM. So for mutation prediction, we need to reduce our set of samples to only those with actual mutation calls in MC3 before splitting into testing/training (each sample must have a mutation status of 0 or 1, not NA). This was not a problem previously because all samples in our data have a subtype associated with them.
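
The order of operations matters here (filter first, then split), so a minimal Python sketch may help. The function names and the row layout (one dict per sample, with `None` standing in for NA) are assumptions for illustration. The plain random split is only a placeholder, since this issue does not describe how the project's actual train/test split is done.

```python
import random


def drop_samples_without_calls(rows, task):
    """Keep only samples whose mutation status for `task` is 0 or 1 (not NA)."""
    return [row for row in rows if row.get(task) in (0, 1)]


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Placeholder random split; applied only after NA samples are removed."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


samples = [
    {"sample_id": "S1", "TP53": 1},
    {"sample_id": "S2", "TP53": 0},
    {"sample_id": "S3", "TP53": None},  # expression data but no MC3 calls
    {"sample_id": "S4", "TP53": 0},
]
usable = drop_samples_without_calls(samples, "TP53")
train, test = split_train_test(usable)
```

Filtering before the split means neither the training nor the testing set can contain a sample with an NA label.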

Beyond this, I want to know:

1. What are the known and observed associations between subtypes and these mutations?
2. What are the pathway genes up or down regulated in mutated samples, and is this picked up by the model?
3. For samples seemingly misclassified (especially samples with no mutation that are predicted to have one), what other aberrations might be causing a "mutation-like" gene expression profile?

jaclyn-taroni · 2021-08-19T12:46:02Z

I agree that option 2 is better. To me, it seems like that will be easier long term such that the additional time cost up front may be worth it, but let's discuss this afternoon. Specifically, I'm interested in how much more work you think option 2 would be over option 1 to see if our perceptions are the same.

envest · 2021-08-19T18:45:48Z

Based on our virtual meeting, here is what we discussed:

- Option 2 is preferred
- Okay to drop samples with no observed mutations in MC3
- Drop PTEN due to potential signaling from copy number change (1/3 have deep deletion of PTEN in GBM, 1/2 in BRCA)
- Follow up on associations between subtype and mutation. If there are associations, include subtype as a model covariate.
- Pathway questions go beyond the scope of this paper. Try to avoid the misclassification problem by dropping PTEN.
- For Point 2 (pathway stuff), look at the TCGA DNA damage repair paper with respect to TP53 pathway genes
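
For the subtype/mutation association follow-up, a first look could be a simple cross-tabulation. The sketch below is one possible approach in Python, not something we agreed on in the meeting. The function names and row layout are hypothetical. It computes only the Pearson chi-square statistic, which would still need to be compared against a chi-square distribution with (rows - 1) × (columns - 1) degrees of freedom, or replaced by a proper test from a statistics library.

```python
from collections import Counter


def crosstab(rows, task):
    """Count samples for each (subtype, mutation status) pair."""
    return Counter((row["subtype"], row[task]) for row in rows)


def chi_square_statistic(counts):
    """Pearson chi-square statistic for a subtype-by-status table of counts."""
    row_totals = Counter()
    col_totals = Counter()
    for (subtype, status), n in counts.items():
        row_totals[subtype] += n
        col_totals[status] += n
    total = sum(counts.values())
    statistic = 0.0
    for subtype, row_n in row_totals.items():
        for status, col_n in col_totals.items():
            expected = row_n * col_n / total  # count expected under independence
            observed = counts.get((subtype, status), 0)
            statistic += (observed - expected) ** 2 / expected
    return statistic
```

A statistic near 0 means mutation status looks independent of subtype in the table. Larger values point toward an association, in which case subtype would go in as a covariate.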

envest mentioned this issue Oct 1, 2021

Envest/calculate and plot delta kappa #72

Merged

envest closed this as completed Oct 15, 2021
