Exploring ImmuneSigDB #4
WEHI has separated out human and mouse data and made files available for loading into R. Unfortunately they use data from MSigDB 5.2; 6.1 appears to be the latest. Of course the Broad invented a new format, the .gmt format. |
I'll be using Bioconductor's GSEABase to see if I can explore this data. Note that Bioconductor wants rJava, and you'll need to ... Also, apparently GSEABase can't read .gmt files. I found ... and sure, it does work, but .gmt files do not have enough information: just the experiment name and a list of gene IDs. It is useful to run queries over experiment names though, e.g.
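As a rough illustration of querying experiment names in a .gmt file, here is a minimal sketch in Python rather than the R used in this thread. It assumes the usual GMT layout of one gene set per line, tab-separated as name, description, then members. The two records and the `parse_gmt` helper are made up for the example and are not taken from MSigDB or GSEABase.

```python
import io

def parse_gmt(handle):
    """Parse GMT text: one gene set per line, tab-separated as
    name, description, then member gene IDs."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = [g for g in genes if g]
    return gene_sets

# Two made-up records in GMT layout (names and members are illustrative only).
example = io.StringIO(
    "GSE0000_TREATED_VS_CTRL_CD4_TCELL_UP\thttp://example.org/a\tGENE1\tGENE2\n"
    "GSE0000_TREATED_VS_CTRL_BCELL_DN\thttp://example.org/b\tGENE3\n"
)
gene_sets = parse_gmt(example)

# Query over experiment names: keep only the sets whose name mentions T cells.
tcell_sets = sorted(name for name in gene_sets if "TCELL" in name)
print(tcell_sets)
```

With a real file, the `io.StringIO` stand-in would be replaced by `open(path)` on the downloaded .gmt file.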
|
I think what I want is "ZIPped MSigDB v6.1 file set" from http://software.broadinstitute.org/gsea/downloads.jsp, which contains ... Ugh, of course this XML file is just a table and could have been a TSV:
So in R
Now we can finally ask interesting questions
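To make the "XML is just a table" point concrete, here is a hedged sketch in Python (not the R from this thread) that flattens gene set elements into rows, keeps the C7 sets, writes them as TSV, and counts gene sets per PMID. The element name `GENESET` and the attribute names `STANDARD_NAME`, `CATEGORY_CODE`, and `PMID` are assumptions about the MSigDB XML layout, and the three records are invented.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Tiny stand-in for the MSigDB XML; attribute names are assumed, values are made up.
xml_text = """<MSIGDB>
  <GENESET STANDARD_NAME="SET_A" CATEGORY_CODE="C7" PMID="111" />
  <GENESET STANDARD_NAME="SET_B" CATEGORY_CODE="C7" PMID="111" />
  <GENESET STANDARD_NAME="SET_C" CATEGORY_CODE="C2" PMID="222" />
</MSIGDB>"""

# Each element's attributes form one table row.
root = ET.fromstring(xml_text)
rows = [dict(el.attrib) for el in root.iter("GENESET")]

# Keep only the C7 (ImmuneSigDB) gene sets.
c7 = [r for r in rows if r.get("CATEGORY_CODE") == "C7"]

# Write the rows as TSV, using the union of attribute names as columns.
columns = sorted({key for r in c7 for key in r})
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=columns, delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerows(c7)

# One possible question: which publications contribute the most gene sets?
pmid_counts = Counter(r["PMID"] for r in c7)
print(pmid_counts.most_common(10))
```

For the real file, `ET.parse(path).getroot()` would replace `ET.fromstring(xml_text)`, and the attribute names should be checked against the actual XML first.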
|
For future efforts, here's a TSV of the MSigDB 6.1 XML metadata with only the C7 (i.e. ImmuneSigDB) gene sets: c7.tsv.zip When loading, be sure to use |
Decided to take a look at the data from the top PMID turned up in #4 (comment), 14607935. The PubMed page links to the associated GEO datasets (nice!). GDS1290 is just metadata, while GSE2770 is legit data. What's more, GEO2R will generate R code to import this data into R! Had to |
So, w/
We've lost a lot of metadata, and a lot of replicates; only GSM60348 to GSM60381 made it. I think we're going to have to tidy this thing ourselves. |
After reviewing the GEO data model and how to work with S4 objects, I have a better handle on this data. To tidy it up:
Note that ...
The most important next steps: understand ...
Gene
Some packages that may help
Annotations for the 3 platforms from this paper
Reading the ImmuneSigDB paper, they ...
In other words, add a column for HUGO gene, group on it, and keep the max value within the group. Maybe fastest way to map, after
Here's another way to do it using
Value
Since some genes have multiple probes mapped to them, we need to take the max value from all probes mapped to a gene to be compatible with the ImmuneSigDB analysis.
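The probe-to-gene collapse described above (map each probe to its gene, group by gene, keep the max) can be sketched as follows. This is a minimal Python illustration rather than the R used in this thread. The probe IDs, gene names, values, and the `collapse_probes_to_genes` helper are all hypothetical.

```python
def collapse_probes_to_genes(probe_values, probe_to_gene):
    """Return one value per gene: the max over all probes mapped to that gene.
    Probes without a gene annotation are dropped."""
    gene_values = {}
    for probe, value in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # probe has no gene annotation
        if gene not in gene_values or value > gene_values[gene]:
            gene_values[gene] = value
    return gene_values

# Hypothetical expression values for one sample, keyed by probe ID.
probe_values = {"probe_1": 5.0, "probe_2": 7.5, "probe_3": 2.0, "probe_4": 9.9}
# Hypothetical annotation: two probes map to GENE_A, probe_4 is unannotated.
probe_to_gene = {"probe_1": "GENE_A", "probe_2": "GENE_A", "probe_3": "GENE_B"}

collapsed = collapse_probes_to_genes(probe_values, probe_to_gene)
print(collapsed)
```

Here GENE_A takes the larger of its two probe values, and the unannotated probe_4 is dropped rather than guessed at.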
Before I bother with normalization, I thought it would be fun to make some plots, inspired by Figure 1 of the paper!
ImmuneSigDB normalized CEL files with |
More information on why I couldn't find TBX21 in the previous analysis:
|
More information on the top 10 T cell gene signatures turned up in #4 (comment):
|
Okay time to work with CEL files!
Okay, I am going to have to read and normalize the CEL files from the 3 different platforms separately. It's not obvious that differential expression analysis across platforms is a good idea; cf. Cross-study validation and combined analysis of gene expression microarray data (2008). To get the |
Let's just get the CEL files used to generate the GSE2770_IL12_AND_TGFB_ACT_VS_ACT_CD4_TCELL_6H_DN gene set. First, I'll need to figure out which samples to use as input. I'm going to try working with the phenotype data from GEO. This involves interacting w/ ExpressionSet and AnnotatedDataFrame objects from Bioconductor, sadly.
|
Okay let's try to figure out why there's no overlap.
|
Downloading MSigDB XML files from the CLI:
cf. curl manpage:
|
From the MCP-counter paper:
Looks like frma will allow us to compare across array platforms. |
Also known as MSigDB C7. Paper