filter out/subset features #4958

Closed
Leonrunning opened this issue Aug 18, 2021 · 6 comments

Comments

@Leonrunning

Hi All,

The CreateSeuratObject function filters out low-quality cells and features from the raw data set.

For a specific purpose, I do not want to create a new object, but I would like to filter out some sparsely expressed features in the downstream analysis.

Are there any suggestions for doing so? Is it possible to calculate the percentage of cells expressing a specific feature so that we can subset based on that? Any suggestion would be highly appreciated.

thanks

Best,

Leon

@mhkowalski
Contributor

Hi,

To calculate what percentage of cells express each gene, you could do something like this.

counts <- GetAssayData(seurat_obj, slot="counts", assay="RNA")   
genes.percent.expression <- rowMeans(counts>0 )*100   

However, you cannot filter out certain genes unless you create a new Seurat object, like this.

genes.filter <- names(genes.percent.expression[genes.percent.expression > 1])  # select genes expressed in more than 1% of cells
counts.sub <- counts[genes.filter,]
new_seurat_object <- CreateSeuratObject(counts=counts.sub)

Depending on what your downstream analysis is, it might be possible to select features without creating a new Seurat object. For example, the FindMarkers() command has a features argument that you can use to perform DE only on the genes you choose.
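As a rough illustration of the percentage-and-filter steps above, here is a minimal base-R sketch on a made-up counts matrix. The gene names, cell names, values, and the 50% cutoff are all assumptions chosen for the toy data; a real Seurat counts matrix is sparse and far larger, but the same expressions should apply.

```r
# Toy counts matrix: rows are genes, columns are cells (all values are made up).
counts <- matrix(
  c(0, 0, 0, 1,   # geneA: detected in 1 of 4 cells
    2, 0, 5, 1,   # geneB: detected in 3 of 4 cells
    0, 0, 0, 0),  # geneC: never detected
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("cell", 1:4))
)

# Percentage of cells in which each gene has a nonzero count.
genes.percent.expression <- rowMeans(counts > 0) * 100

# Keep genes detected in more than 50% of cells (cutoff chosen only for this toy data).
genes.filter <- names(genes.percent.expression[genes.percent.expression > 50])

# drop = FALSE keeps the result a matrix even when a single gene survives.
counts.sub <- counts[genes.filter, , drop = FALSE]
```

Here only geneB passes the cutoff, so counts.sub has one row and four columns.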

@Leonrunning
Author

That's very clear. Thanks

@Leonrunning
Author

Hi mhkowalski,

The idea is very clear and applicable to my analysis. I appreciate it!

Would you mind explaining a bit more about what "rowMeans(counts>0)" does? It seems to calculate the mean count of a feature among the selected cells (counts>0), rather than the fraction of cells expressing it.

Sorry, I am new to R; please let me know if my understanding is wrong.

thanks

Leon

@mhkowalski
Contributor

counts>0 returns a logical matrix in which each entry is TRUE if the corresponding entry of the counts matrix exceeds 0 and FALSE otherwise. Because TRUE counts as 1 and FALSE as 0, performing rowMeans on that matrix gives you, for each gene, the number of cells with a count > 0 divided by the total number of cells, which is the fraction of cells expressing that gene (multiplying by 100 gives the percentage).
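To make the distinction concrete, the following base-R sketch compares the fraction of cells with a nonzero count against two quantities it could be confused with. The matrix values and the variable names (detected, frac.detected, mean.count, mean.nonzero) are made up for illustration.

```r
# Toy counts matrix: rows are genes, columns are cells (all values are made up).
counts <- matrix(
  c(0, 4, 0, 8,
    1, 1, 1, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), paste0("cell", 1:4))
)

# Logical matrix: TRUE where a count exceeds 0.
detected <- counts > 0

# Fraction of cells with a nonzero count (TRUE is averaged as 1, FALSE as 0).
frac.detected <- rowMeans(detected)

# Mean count over all cells, zeros included.
mean.count <- rowMeans(counts)

# Mean count over only the cells where the gene is detected.
mean.nonzero <- apply(counts, 1, function(x) mean(x[x > 0]))
```

For geneA these give 0.5, 3, and 6 respectively, so rowMeans(counts > 0) is the detection fraction and not an average expression level.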

@Leonrunning
Author

Awesome, that perfectly solved my issue. Thanks so much for your help!

Leon

@whiteorchid

Dear authors,

Is there a recommended threshold for the filter? Say 5% or 10%, or some other value instead of 1% for genes.percent.expression?

Thank you very much!
