Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to get report of average and percentage gene expression from a list of genes across entire dataset instead of per cluster #4497

Closed
ksaunders73 opened this issue May 20, 2021 · 3 comments

Comments

@ksaunders73
Copy link

ksaunders73 commented May 20, 2021

I found code from #3521 which has allowed me to plot a list of genes onto one dotplot or featureplot, instead of having separate featureplots for each gene.

nkfocuslist <- list(c("TFF1", "MB", "ANKRD30B",
"LINC00173", "DSCAM-AS1", "IGHG1", "SERPINA5"))
sobj <- AddModuleScore(object = sobj, features = nkfocuslist, name = "NK_Focus_List")
FeaturePlot(object = sobj, features = "NK_Focus_List1")
image

From #1888 I found you can get the percentage and average gene expression from dotplot information. I would like to do something similar to the above where I can get the gene expression across the whole dataset instead of the expression per cluster:

a <- DotPlot(object = sobj, features = c("TFF1", "MB", "ANKRD30B",
"LINC00173", "DSCAM-AS1", "IGHG1", "SERPINA5"))
a$data
image

In addition, is there a way to get the p-values associated with these expression values? As well as set thresholds for either?

Hopefully this makes sense, and thanks for reading!

@samuel-marsh
Copy link
Collaborator

samuel-marsh commented May 21, 2021

Hi,

Not member of dev team but hopefully this helps. In terms of getting the average expression I suggest taking look at AverageExpression function.

Alternatively you can run a modified version of the code used to create the DotPlot

seurat/R/visualization.R

Lines 3462 to 3476 in 4e868fc

data.plot <- lapply(
X = unique(x = data.features$id),
FUN = function(ident) {
data.use <- data.features[data.features$id == ident, 1:(ncol(x = data.features) - 1), drop = FALSE]
avg.exp <- apply(
X = data.use,
MARGIN = 2,
FUN = function(x) {
return(mean(x = expm1(x = x)))
}
)
pct.exp <- apply(X = data.use, MARGIN = 2, FUN = PercentAbove, threshold = 0)
return(list(avg.exp = avg.exp, pct.exp = pct.exp))
}
)

The modified version to perform across an entire seurat object would simply be:

# PercentAbove function to environment
PercentAbove <- function(x, threshold) {
  return(length(x = x[x > threshold]) / length(x = x))
}
# Pull data on features of interest
data.features <- FetchData(object = obj_name, vars = c("Gene1", "Gene2", etc))
# Calculate average expression (NOTE can also be done simply using `AverageExpression` function.)
avg.exp <- apply(X = data.features, MARGIN = 2, FUN = function(x) {return(mean(x = expm1(x = x)))})
# Calculate % expressing
pct.exp <- apply(X = data.features, MARGIN = 2, FUN = PercentAbove, threshold = 0)
# combine the data
combined_avg_pct <- list(avg.exp = avg.exp, pct.exp = pct.exp)```

I'm not sure what you mean by p-values associated with these values. p-value is based on statistical test and if you are performing these calculations across the whole dataset then you aren't comparing them to anything.

Best,
Sam

@ksaunders73
Copy link
Author

@samuel-marsh This worked! Thanks so much for your help!!

@samuel-marsh
Copy link
Collaborator

@ksaunders73 Happy to help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants