subset(downsample= X) #3033

Closed
chrismahony opened this issue May 19, 2020 · 4 comments

@chrismahony

Can you tell me, when I use the downsample argument of subset(), how does Seurat choose which cells to keep and which to exclude?

Thanks
Chris

@yuhanH
Collaborator

yuhanH commented May 22, 2020

Hi,
You can set invert = TRUE; subset() will then exclude the input cells instead of keeping them.
For example:

    select.cell <- WhichCells(pbmc_small, idents = 0)
    pbmc_small <- subset(x = pbmc_small, cells = select.cell, invert = TRUE)

@yuhanH yuhanH closed this as completed May 22, 2020
@chrismahony
Author

Thanks for this, but I really want to understand more about how the downsample option actually works. If I have an input of 2000 cells and downsample to 500, how are the other 1500 cells excluded? What parameters determine which cells are excluded? Thanks

@yuhanH
Collaborator

yuhanH commented Jun 30, 2020

downsample is an argument of WhichCells, which documents it as:

Maximum number of cells per identity class, default is Inf; downsampling will happen after all other operations, including inverting the cell selection
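
To see the per-class cap in action, something like the following should work on the pbmc_small example object used above. This is a minimal sketch, assuming Seurat is installed; the cap of 10 and the variable name `small` are arbitrary choices for illustration.

```r
# Guarded so the snippet does nothing when Seurat is not installed
if (requireNamespace("Seurat", quietly = TRUE)) {
  library(Seurat)
  # Cells per identity class before downsampling
  print(table(Idents(pbmc_small)))
  # Keep at most 10 randomly chosen cells from each identity class
  small <- subset(x = pbmc_small, downsample = 10)
  # No identity class should now have more than 10 cells
  print(table(Idents(small)))
}
```

Comparing the two tables shows which classes were trimmed to the cap and which were already small enough to be left alone.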

@bug1303

bug1303 commented Jun 11, 2021

It's a closed issue, but I stumbled across the same question and went on to find the answer.

You can inspect the code that is actually called by printing SeuratObject:::subset.Seurat, which in turn calls SeuratObject:::WhichCells.Seurat (as @yuhanH mentioned).

It first performs all of the cell selection and any inversion; the downsampling then happens in this part:

    cells <- CellsByIdentities(object = object, cells = cells)
    cells <- lapply(X = cells, FUN = function(x) {
        if (length(x = x) > downsample) {
            x <- sample(x = x, size = downsample, replace = FALSE)
        }
        return(x)
    })

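So indeed, it groups the cells by identity class (e.g. clusters, or whichever idents are active), and for each class that contains more than the requested number of cells it calls sample without replacement. Classes at or below the limit are kept whole. In other words, it is just a random selection within each class, and downsample caps the number of cells per identity class rather than the total number of cells.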
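
The same logic can be tried out without Seurat. Below is a rough base-R sketch applied to the 2000-cell case from the question; `downsample_by_identity`, the class labels A/B/C, and the class sizes are all made up for illustration, and `split()` stands in for CellsByIdentities.

```r
# Hypothetical helper mirroring the snippet above; not a Seurat function
downsample_by_identity <- function(cells, idents, downsample = Inf) {
  # Group cell names by identity class (stand-in for CellsByIdentities)
  groups <- split(x = cells, f = idents)
  groups <- lapply(X = groups, FUN = function(x) {
    # Only classes larger than the cap are sampled; smaller ones pass through
    if (length(x = x) > downsample) {
      x <- sample(x = x, size = downsample, replace = FALSE)
    }
    return(x)
  })
  unlist(x = groups, use.names = FALSE)
}

set.seed(42)  # only to make this toy example repeatable
cells  <- paste0("cell", seq_len(2000))
idents <- rep(c("A", "B", "C"), times = c(1200, 600, 200))

kept <- downsample_by_identity(cells, idents, downsample = 500)
# Count the kept cells per class: A and B are capped at 500, C keeps all 200
print(table(idents[match(kept, cells)]))
```

With these made-up class sizes, 1200 cells survive rather than 500, because the cap applies to each identity class separately. If all 2000 cells shared a single identity, exactly 500 would be kept.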
