Isolated immune cell reconstruction evaluation #5

Merged: 6 commits into greenelab:master from cell-type-recon, Apr 12, 2018

jaclyn-taroni
Collaborator

This PR adds 04-isolated_immune_cell_reconstruction.

In this notebook, I evaluate the reconstruction of the sorted leukocyte microarray dataset introduced in 03-isolated_cell_type_populations (E-MTAB-2452) as compared to a sorted leukocyte RNA-seq dataset (SRP045500) that is included in the recount2 dataset and is, therefore, part of the training set for the PLIER model under consideration.

During #3, the idea of using only high-weight genes for reconstruction came up (see #4). I've chosen not to explore this at this time because my goal is to test how the subset of latent variables used for reconstruction affects the results (all LVs vs. only pathway-associated LVs vs. only those LVs that are not significantly associated with any gene sets, which I assume capture variation from technical factors), rather than to improve reconstruction performance. I think improving reconstruction performance would probably require a deeper dive than makes sense for this particular project. Please let me know what you think.

Here's the notebook HTML file for easy viewing:
04-isolated_immune_cell_reconstruction.nb.html.zip

I've made a few changes upstream of this notebook in 02-recount2_PLIER_exploration to save the recount2 reconstructed expression data and associated evaluation metrics, as well.

@gwaybio left a comment

only minor comments

gs.file <- file.path("data", "expression_data",
                     "E-MTAB-2452_hugene11st_SCANfast_with_GeneSymbol.pcl")
exprs.df <- readr::read_tsv(gs.file)
exprs.mat <- as.matrix(exprs.df[, 3:ncol(exprs.df)])

Why start at column 3? Maybe add a comment about what the first 2 columns are.
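One way to address this is to drop the annotation columns by name instead of by position, so the code documents itself. The sketch below uses a toy data frame; the column names `GeneID` and `GeneSymbol` are assumptions for illustration only, since the thread does not say what the first two columns of the PCL file actually contain.

```r
# Toy stand-in for the PCL file; column names are hypothetical.
exprs.df <- data.frame(
  GeneID = c("g1", "g2", "g3"),
  GeneSymbol = c("CD4", "CD8A", "CD19"),
  Sample1 = c(5.1, 6.3, 7.2),
  Sample2 = c(4.9, 6.8, 7.0),
  stringsAsFactors = FALSE
)

# Assumed annotation (non-expression) columns, named explicitly
annot.cols <- c("GeneID", "GeneSymbol")

# Keep only the sample columns and convert them to a numeric matrix
exprs.mat <- as.matrix(exprs.df[, !(colnames(exprs.df) %in% annot.cols)])
rownames(exprs.mat) <- exprs.df$GeneSymbol
```

Selecting by name gives the same matrix as `exprs.df[, 3:ncol(exprs.df)]` here, but it keeps working if the column order changes.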

height = 11, width = 8.5)
```

### E-MTAB-2452 Boxplots

The x-axis tick labels seem to bleed a bit into one another. Is it possible to rename them? For example, when there is an n = in the label, I tend to put it on a new line.

ggplot2::ggsave(plot.file, plot = ggplot2::last_plot())
```

## Summary

Are all datasets trained together? Or are different models trained on each individually? Or, are the models trained using a single dataset and the other datasets are transformed into this space?

Collaborator Author

> Or, are the models trained using a single dataset and the other datasets are transformed into this space?

This one -- a single PLIER model is trained on the recount2 dataset, which includes SRP045500. E-MTAB-2452 is transformed into the recount2 PLIER space.

I'm working with LVs from this recount2 PLIER model exclusively, but in some cases I'm using only LVs that are significantly associated with a pathway or only those LVs that are not associated with a pathway.
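To make the "transform into the model's space, then reconstruct with a subset of LVs" idea concrete, here is a minimal sketch on simulated matrices. It assumes the projection is a ridge-penalized least-squares fit of the new data onto the trained loadings, which is not stated in this thread; `Z`, `new.exprs`, `lambda2`, and the LV index sets are all hypothetical names and values.

```r
set.seed(1)
n.genes <- 50
n.lvs <- 5
n.samples <- 8

# Z: gene-by-LV loadings from the already trained model (simulated here)
Z <- matrix(abs(rnorm(n.genes * n.lvs)), nrow = n.genes)
# new.exprs: gene-by-sample matrix from the held-out dataset, rows matched to Z
new.exprs <- matrix(rnorm(n.genes * n.samples), nrow = n.genes)
lambda2 <- 1  # hypothetical ridge penalty

# Assumed projection: B = (Z'Z + lambda2 * I)^-1 Z'Y
new.B <- solve(crossprod(Z) + lambda2 * diag(n.lvs), crossprod(Z, new.exprs))

# Hypothetical split of LVs into pathway-associated and not
pathway.lvs <- c(1, 2, 3)
other.lvs <- setdiff(seq_len(n.lvs), pathway.lvs)

# Reconstruct expression from all LVs or from one subset only
recon.all <- Z %*% new.B
recon.pathway <- Z[, pathway.lvs, drop = FALSE] %*% new.B[pathway.lvs, , drop = FALSE]
recon.other <- Z[, other.lvs, drop = FALSE] %*% new.B[other.lvs, , drop = FALSE]
```

Because the reconstruction is linear in the LVs, the two subset reconstructions sum to the all-LV reconstruction, which is a handy sanity check.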

small doc change, relabel boxplot x axis ticks
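The relabeling mentioned in this commit, following the review note about overlapping tick labels, could be done along these lines. This is only a sketch: the label strings are made up, and the notebook's actual labels and approach may differ.

```r
# Hypothetical x-axis labels that carry a sample size
tick.labels <- c("All LVs, n = 10", "Pathway LVs, n = 6", "Other LVs, n = 4")

# Move the "n = ..." part onto its own line
wrapped.labels <- sub(", n = ", "\nn = ", tick.labels, fixed = TRUE)

# The wrapped labels could then be passed to a plot, for example:
# ggplot2::scale_x_discrete(labels = wrapped.labels)
```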
@huqiwen0313 left a comment

Looks good! Only some minor comments.

dplyr::mutate(MASE = as.numeric(as.character(MASE)),
`Spearman correlation` =
as.numeric(as.character(`Spearman correlation`)))

I don't quite understand why MASE needs the as.numeric(as.character(MASE)) conversion.

Collaborator Author

When binding together the columns, MASE and the Spearman correlation end up as factors.
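A small illustration of why the double conversion matters once a numeric column has become a factor. The data are made up, and `stringsAsFactors = TRUE` is set explicitly to reproduce the factor behavior regardless of the R version's default.

```r
# cbind() on mixed character/numeric vectors yields a character matrix
metrics.mat <- cbind(dataset = c("recount2", "E-MTAB-2452", "recount2"),
                     MASE = c(0.5, 1.2, 0.9))
metrics.df <- as.data.frame(metrics.mat, stringsAsFactors = TRUE)

is.factor(metrics.df$MASE)  # TRUE

# as.numeric() on a factor returns the integer level codes, not the values
wrong <- as.numeric(metrics.df$MASE)                # 1 3 2
# Going through character recovers the original numbers
right <- as.numeric(as.character(metrics.df$MASE))  # 0.5 1.2 0.9
```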

ggplot2::theme_bw() +
ggplot2::scale_fill_manual(values = c("white", "gray50", "black")) +
ggplot2::ggtitle(paste("All, n =", ncol(z.matrix)))
```

In the plot, does recount2 mean reconstructing the gene expression of the recount2 samples based on the recount2 PLIER model?

Collaborator Author

Yes, that is correct.

"E-MTAB-2452_reconstruction_error_recount2_model.pdf")
ggplot2::ggsave(plot.file, plot = ggplot2::last_plot())
```

Maybe add statistics to show that the distributions are significantly different (t-test or ANOVA)? One benefit is that it provides a quantitative way to support the conclusion (e.g., that the pre- and post-reconstruction correlation values are much more similar between the two datasets), but it is up to you, since the difference is clear from the plot.

Collaborator Author

pairwise.t.test coming up in the next commit
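For reference, a call to `pairwise.t.test` on per-sample correlation values grouped by LV subset might look like the sketch below. The values are simulated purely for illustration, the column names are hypothetical, and the Holm adjustment (the function's default) is an assumption; the thread does not say which settings the follow-up commit uses.

```r
set.seed(123)
# Simulated per-sample Spearman correlations for three hypothetical LV subsets
recon.df <- data.frame(
  lv.subset = rep(c("all", "pathway", "non-pathway"), each = 20),
  spearman = c(rnorm(20, mean = 0.9, sd = 0.03),
               rnorm(20, mean = 0.8, sd = 0.03),
               rnorm(20, mean = 0.5, sd = 0.03))
)

# Pairwise t-tests between groups with multiple-testing correction
test.res <- pairwise.t.test(recon.df$spearman, recon.df$lv.subset,
                            p.adjust.method = "holm")

# Lower-triangular matrix of adjusted p-values (NA elsewhere)
test.res$p.value
```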

@jaclyn-taroni merged commit 384dcf6 into greenelab:master Apr 12, 2018
@jaclyn-taroni deleted the cell-type-recon branch April 12, 2018 01:08