How to get report of average and percentage gene expression from a list of genes across entire dataset instead of per cluster #4497

ksaunders73 · 2021-05-20T01:36:28Z

I found code from #3521 which has allowed me to plot a list of genes onto one dotplot or featureplot, instead of having separate featureplots for each gene.

nkfocuslist <- list(c("TFF1", "MB", "ANKRD30B",
"LINC00173", "DSCAM-AS1", "IGHG1", "SERPINA5"))
sobj <- AddModuleScore(object = sobj, features = nkfocuslist, name = "NK_Focus_List")
FeaturePlot(object = sobj, features = "NK_Focus_List1")

From #1888 I found you can get the percentage and average gene expression from dotplot information. I would like to do something similar to the above where I can get the gene expression across the whole dataset instead of the expression per cluster:

a <- DotPlot(object = sobj, features = c("TFF1", "MB", "ANKRD30B",
"LINC00173", "DSCAM-AS1", "IGHG1", "SERPINA5"))
a$data

In addition, is there a way to get the p-values associated with these expression values, and to set thresholds for either?

Hopefully this makes sense, and thanks for reading!

samuel-marsh · 2021-05-21T12:49:05Z

Hi,

I'm not a member of the dev team, but hopefully this helps. For the average expression, I suggest taking a look at the AverageExpression function.

Alternatively, you can run a modified version of the code used to create the DotPlot:

seurat/R/visualization.R

Lines 3462 to 3476 in 4e868fc

data.plot <- lapply(
  X = unique(x = data.features$id),
  FUN = function(ident) {
    data.use <- data.features[data.features$id == ident, 1:(ncol(x = data.features) - 1), drop = FALSE]
    avg.exp <- apply(
      X = data.use,
      MARGIN = 2,
      FUN = function(x) {
        return(mean(x = expm1(x = x)))
      }
    )
    pct.exp <- apply(X = data.use, MARGIN = 2, FUN = PercentAbove, threshold = 0)
    return(list(avg.exp = avg.exp, pct.exp = pct.exp))
  }
)

The modified version, which runs across an entire Seurat object instead of per identity, would simply be:

# Add the PercentAbove helper function to the environment
PercentAbove <- function(x, threshold) {
  return(length(x = x[x > threshold]) / length(x = x))
}
# Pull data on features of interest (replace obj_name and the gene names with your own)
data.features <- FetchData(object = obj_name, vars = c("Gene1", "Gene2"))
# Calculate average expression (NOTE: can also be done simply using the `AverageExpression` function)
avg.exp <- apply(X = data.features, MARGIN = 2, FUN = function(x) {return(mean(x = expm1(x = x)))})
# Calculate the fraction of cells expressing each gene (multiply by 100 for a percentage)
pct.exp <- apply(X = data.features, MARGIN = 2, FUN = PercentAbove, threshold = 0)
# Combine the data
combined_avg_pct <- list(avg.exp = avg.exp, pct.exp = pct.exp)
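On the thresholds part of the question, here is a rough sketch of one way to put the two vectors into a single table and filter it. It uses only base R on a small made-up data frame standing in for the `FetchData` output (cells in rows, genes in columns, log-normalized values). The gene names, the numbers, and the `min.avg` / `min.pct` cutoffs are all hypothetical, so adjust them to your data.

```r
# Toy stand-in for FetchData() output: rows are cells, columns are genes.
# Values are log1p-transformed, mimicking log-normalized expression (made-up numbers).
data.features <- data.frame(
  GeneA = log1p(c(0, 2, 4, 0)),
  GeneB = log1p(c(1, 1, 1, 1)),
  GeneC = log1p(c(0, 0, 0, 8))
)

# Same helper as above: fraction of values above a threshold
PercentAbove <- function(x, threshold) {
  length(x[x > threshold]) / length(x)
}

# One row per gene: mean on the non-log scale and percent of cells expressing
summary.df <- data.frame(
  gene = colnames(data.features),
  avg.exp = vapply(data.features, function(x) mean(expm1(x)), numeric(1)),
  pct.exp = 100 * vapply(data.features, PercentAbove, numeric(1), threshold = 0),
  row.names = NULL
)

# Hypothetical cutoffs: keep genes with mean expression >= 1.2
# that are detected in at least 50% of cells
min.avg <- 1.2
min.pct <- 50
passing <- summary.df[summary.df$avg.exp >= min.avg & summary.df$pct.exp >= min.pct, ]

print(summary.df)
print(passing)
```

With a real object, `data.features` would come from the `FetchData` call above and the rest should work unchanged.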

I'm not sure what you mean by p-values associated with these values. A p-value comes from a statistical test, and if you are performing these calculations across the whole dataset then you aren't comparing the values to anything.
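To illustrate why a comparison is needed, here is a minimal base R sketch in which a p-value only appears once one gene's expression is compared between two groups of cells. The two groups and their values are simulated and purely hypothetical. I use a Wilcoxon rank-sum test because, as far as I know, it is the default test in Seurat's `FindMarkers`, but this is not a substitute for running that function on your object.

```r
set.seed(1)

# Simulated log-normalized expression of a single gene in two groups of cells
# (hypothetical data: group.a is drawn from a higher mean than group.b)
group.a <- log1p(rpois(50, lambda = 5))
group.b <- log1p(rpois(50, lambda = 1))

# Wilcoxon rank-sum test between the groups; exact = FALSE uses the normal
# approximation, which avoids the warning about ties in count-like data
test <- wilcox.test(group.a, group.b, exact = FALSE)
print(test$p.value)
```

With a single pooled group there is no second sample to pass to the test, which is why the whole-dataset averages have no p-value attached.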

Best,
Sam

ksaunders73 · 2021-05-26T02:08:12Z

@samuel-marsh This worked! Thanks so much for your help!!

samuel-marsh · 2021-05-26T14:34:48Z

@ksaunders73 Happy to help!

torkencz closed this as completed May 21, 2021
