getSignificance very memory intensive when fit = ANOVA #63

Dragonmasterx87 · 2023-02-10T21:22:08Z

Hi Nick!

Amazing tool! I was wondering if you could help me with an issue I am having with a large dataset.
I have a Seurat object containing sample data integrated across 15 individuals: 50k+ high-quality, doublet-free cells after subsetting.
In this dataset I have celltype metadata (15 types), further broken down by race (2 types) and sex (2 types), which gives a joined metadata slot with 60 groups (celltype by sex by race).

Now I have already downloaded the entire msigdb for cat C5 (gene ontologies), using:

# molecular Signature database for msigdb
gene.sets <- getGenesets(org = "hsa",
                         db = c("msigdb"), #https://rdrr.io/bioc/EnrichmentBrowser/man/getGenesets.html
                         gene.id.type = "SYMBOL", #idTypes(org = "hsa")
                         cat = c("C5"), # C5 is gene ontology
                         cache = TRUE, 
                         return.type = "list")

and successfully ran enrichIt:

# Enrichment calculation
ES <- enrichIt(obj = pancreas.combined.h.s, 
               gene.sets = gene.sets, 
               groups = 1000, 
               cores = 30, 
               min.size = NULL)

At this point, I have successfully reproduced all of the plotting options from your vignette that I have tested, after first transferring the enormous C5 enrichment output (the ES object) to the Seurat object as metadata.

Now I am interested in running a statistical test across all 60 samples in various configurations to test for pathways that are more activated in some sex_ancestral_celltype vs another sex_ancestral_celltype (obviously the comparison is for the same cell type but differing sex and ancestry). In order to achieve this I run:

# Significance testing
significant_pathways <- getSignificance(ES2, group = "celltype.sample", fit = "ANOVA")

As I write this, I have just crossed 450 GB of RAM utilization. Is there any way I can reduce this colossal computational cost?
I am thinking the only way is probably to subset each cell type individually, calculate the significance, and then add that metadata iteratively back to the primary Seurat object. Or is there some way to tell getSignificance which combinations to test?
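To get a feel for why testing all 60 groups at once grows so quickly, here is a back-of-the-envelope count. It assumes (this is not confirmed from the escape source) that the ANOVA fit is followed by a post-hoc comparison of every pair of groups for each gene set; the helper name n_pairs is hypothetical.

```r
# Number of unordered group pairs for a given number of groups.
n_pairs <- function(n_groups) choose(n_groups, 2)

all_groups   <- n_pairs(60)        # all 60 celltype/sex/race groups at once
per_celltype <- n_pairs(4)         # 2 sexes x 2 races within one cell type
total_subset <- 15 * per_celltype  # 15 cell types, each tested separately

print(c(all_at_once     = all_groups,
        per_celltype    = per_celltype,
        subsetted_total = total_subset))
```

Under that assumption, the combined run evaluates 1770 pairs per gene set, most of them across different cell types, while the per-cell-type approach evaluates 90 in total.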

Thanks again for the wonderful tool.

Cheers,

🐉


Dragonmasterx87 · 2023-02-11T00:53:33Z

Update: I have subsetted each of the individual cell types, and am now running getSignificance for each file separately. So far it seems to be less computationally intensive. The trick will be to concatenate the data when I have it. Will update you later.

Update: Using some meta-data kung-fu, I was able to segregate the cell types into individual Seurat objects and successfully run getSignificance on each of them. With the simpler per-object computation, RAM usage dropped from >1.2 TB to <20 GB.
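The split-test-combine pattern described in this update can be sketched in base R as follows. This is a minimal illustration on simulated data, not the code actually used in the thread: it works on a plain data frame rather than a Seurat object, it replaces getSignificance with stats::oneway.test, and every name in it (toy, anova_by_celltype, GO_SET_1, and so on) is hypothetical.

```r
set.seed(1)

# Simulated stand-in for an enrichment table: one row per cell, two gene-set
# score columns plus metadata. All names and values are hypothetical.
n <- 400
toy <- data.frame(
  celltype = sample(c("alpha", "beta"), n, replace = TRUE),
  sex      = sample(c("M", "F"), n, replace = TRUE),
  race     = sample(c("A", "B"), n, replace = TRUE),
  GO_SET_1 = rnorm(n),
  GO_SET_2 = rnorm(n)
)
toy$group <- paste(toy$sex, toy$race, sep = "_")

# Run a one-way ANOVA per gene set within each cell type, then stack results.
anova_by_celltype <- function(df, score_cols,
                              celltype_col = "celltype", group_col = "group") {
  pieces <- split(df, df[[celltype_col]])          # one data frame per cell type
  res <- lapply(names(pieces), function(ct) {
    sub <- pieces[[ct]]
    g <- factor(sub[[group_col]])
    per_set <- lapply(score_cols, function(gs) {
      # var.equal = TRUE gives the classical one-way ANOVA F-test
      fit <- oneway.test(sub[[gs]] ~ g, var.equal = TRUE)
      data.frame(celltype = ct, gene_set = gs,
                 F_value = unname(fit$statistic), p_value = fit$p.value)
    })
    do.call(rbind, per_set)
  })
  out <- do.call(rbind, res)
  out$FDR <- p.adjust(out$p_value, method = "fdr") # adjust across all tests
  out
}

results <- anova_by_celltype(toy, c("GO_SET_1", "GO_SET_2"))
print(results)
```

Because each subset is tested independently, only one cell type's data needs to be in the working set at a time, and the per-cell-type tables can be concatenated with rbind at the end.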

Regarding interpretation: for the sake of argument, say that in cell type X, for the GO term 'increased carbohydrate synthesis', I am comparing males to females and the p-value is <0.001 (the column name is malesvsfemales.pval). Say also that the median is 4000 for males and 2000 for females. Would that suggest that this specific GO term is significantly up-regulated in males (in the males_vs_females direction), owing to the higher median in males?
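As a general illustration (not a statement about how getSignificance names or computes its columns), a p-value from a two-sided test only says that two groups differ; the direction has to be read from a summary such as the group medians. A minimal sketch with simulated scores, where all values and names are hypothetical:

```r
set.seed(2)

# Hypothetical enrichment scores for one gene set within one cell type.
males   <- rnorm(200, mean = 4000, sd = 500)
females <- rnorm(200, mean = 2000, sd = 500)

# Two-sided Wilcoxon rank-sum test: is there a difference at all?
test <- wilcox.test(males, females)

# The direction comes from comparing the medians, not from the p-value.
direction <- if (median(males) > median(females)) {
  "higher in males"
} else {
  "higher in females"
}

cat("median males:  ", round(median(males)), "\n")
cat("median females:", round(median(females)), "\n")
cat("significant at 0.001:", test$p.value < 0.001, "\n")
cat("direction:", direction, "\n")
```

In this simulated case, a small p-value together with a higher male median would be read as the gene set scoring higher in males.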

Cheers,

🐉

ncborcherding · 2023-02-16T11:12:42Z

Hey Fahd,

Apologies for the delay -

Thanks for reaching out and giving an extensive summary (with follow-up) of the problem. You are completely correct: much of the additional testing carries a large computational requirement, as getSignificance() depends on R's stats package. I think this will become a larger issue as datasets expand, so I will mark this as help wanted because the issue is persistent.

Please let me know if you have any suggestions from your experience and I will test some ideas I have in the meantime.
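One possible direction, sketched here under assumptions rather than taken from escape: if much of the memory goes to holding one fitted model object per gene set, the one-way ANOVA F-test can instead be computed for all gene sets from group sums alone, keeping only summary statistics. The function name light_anova and the simulated inputs are hypothetical, and the sketch omits any post-hoc pairwise comparisons.

```r
# One-way ANOVA F-test for every column of `scores` against `group`,
# computed from sums of squares without storing any model objects.
light_anova <- function(scores, group) {
  X <- as.matrix(scores)
  g <- factor(group)
  n_g <- as.vector(table(g))            # cells per group, in level order
  k <- nlevels(g)
  N <- nrow(X)

  group_means <- rowsum(X, g) / n_g     # k x p matrix of group means
  grand_means <- colMeans(X)

  # Between-group, total, and within-group sums of squares per gene set.
  ss_between <- colSums(n_g * sweep(group_means, 2, grand_means)^2)
  ss_total   <- colSums(sweep(X, 2, grand_means)^2)
  ss_within  <- ss_total - ss_between

  f <- unname((ss_between / (k - 1)) / (ss_within / (N - k)))
  data.frame(gene_set = colnames(X),
             F_value  = f,
             p_value  = pf(f, k - 1, N - k, lower.tail = FALSE))
}

set.seed(3)
scores <- data.frame(GO_SET_1 = rnorm(300), GO_SET_2 = rnorm(300))
group  <- sample(paste0("g", 1:6), 300, replace = TRUE)

res <- light_anova(scores, group)
print(res)
```

For a very wide score matrix, the same function could be applied to blocks of columns in turn so that the intermediate copies stay small.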

Nick

Dragonmasterx87 · 2023-02-17T07:04:41Z

Cheers thanks legend!!

ncborcherding added the help wanted label Feb 16, 2023

Dragonmasterx87 closed this as completed Jul 10, 2023
