IntegrateData error: number of items to replace is not a multiple of replacement length #6341


Closed
XinmiaoYan opened this issue Aug 24, 2022 · 13 comments


@XinmiaoYan

I used the code below to integrate data with rPCA, but an error occurred. I've tried many approaches but still couldn't solve this problem. I'd appreciate any help.

##### Perform integration #######
sc.anchors <- FindIntegrationAnchors(object.list = sc.list, anchor.features = features, reduction = "rpca")
# this command creates an 'integrated' data assay
sc.combined <- IntegrateData(anchorset = sc.anchors)

Merging dataset 2 into 4
Extracting anchors for merged samples
Finding integration vectors
Finding integration vector weights
Error in idx[i, ] <- res[[i]][[1]] :
number of items to replace is not a multiple of replacement length

This error couldn't be solved by adjusting parameters such as l2.norm and k.filter.
I'd appreciate any help.

@Gesmira
Contributor

Gesmira commented Aug 26, 2022

Hi, what are the sizes of your datasets in sc.list? k.weight (the number of neighbors to consider when weighting anchors) in IntegrateData() defaults to 100 so if one of your datasets is smaller than 100 it will cause this error to occur. Have you tried adjusting the k.weight parameter to the size of your smallest dataset? Otherwise, you may want to remove very small datasets.
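A quick way to check this is to inspect the size of each dataset before integrating. This is a minimal sketch, assuming your list of Seurat objects is named `sc.list` and your anchors are in `sc.anchors` as in the code above; the k.weight value of 50 is only an illustration and should be set below your smallest dataset's cell count:

```r
# How many cells does each dataset contribute?
sapply(sc.list, ncol)

# Keep k.weight smaller than the smallest dataset
# (IntegrateData() defaults to k.weight = 100)
sc.combined <- IntegrateData(anchorset = sc.anchors, k.weight = 50)
```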

@XinmiaoYan
Author

@Gesmira, yes, I tried adjusting k.weight. The smallest object in my sc.list has 130 cells; even when I lowered k.weight to 50, the error persisted.

@Gesmira
Contributor

Gesmira commented Aug 26, 2022

Have you also tried setting k.filter in FindIntegrationAnchors() before you run IntegrateData()?
For example:

sc.anchors <- FindIntegrationAnchors(object.list = sc.list, anchor.features = features, reduction = "rpca", k.filter = 100)
sc.combined <- IntegrateData(anchorset = sc.anchors, k.weight = 100)

@XinmiaoYan
Author

Yes, I tried k.filter = 30; it didn't resolve the error.

@Gesmira
Contributor

Gesmira commented Aug 26, 2022

Just to see if the issue is due to the small dataset, are you able to run the integration through without it?

@XinmiaoYan
Author

I removed the smallest object (130 cells), and the integration worked this time. The smallest remaining object has 205 cells. I'm wondering what the cutoff is for the smallest object that can be used in integration? Thanks.

@XinmiaoYan
Author

XinmiaoYan commented Oct 11, 2022 via email

@decodebiology

The smallest object in my list has 3,770 cells, and I get the same error. I've tried playing with different parameters (k.filter, k.anchor, k.score), with no luck so far.

@ajynair

ajynair commented Jan 12, 2023

I also got a similar error with my dataset integration. When I removed two samples with fewer than 120 cells, and removed the old 'integrated' assay from the Seurat object, the error went away. Just noting it here.
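For anyone wanting to try the same workaround, this is a sketch of both steps. The object name `sc.obj`, the list name `sc.list`, and the 120-cell threshold are illustrative, not from the original post:

```r
# Drop samples below a minimum cell count before integrating
sc.list <- sc.list[sapply(sc.list, ncol) >= 120]

# Remove a stale 'integrated' assay left over from a previous run
DefaultAssay(sc.obj) <- "RNA"
sc.obj[["integrated"]] <- NULL
```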

@levinhein

levinhein commented Jun 20, 2023

I got the same error. The lowest cell number was 375; I removed that sample and the same error persisted.
I then removed all samples with fewer than 1,000 cells and the error still shows up. The remaining samples are in the 1k-3k+ range.

@ajynair

ajynair commented Jul 3, 2023

@levinhein, I have also seen this problem occur when samples have large differences in cell numbers (10-fold). I was wondering if we could artificially split the large samples so that all samples have comparable numbers of cells.

@Gesmira
Contributor

Gesmira commented Jul 7, 2023

Hi,
We are currently adding a more informative error message for this issue in the development branch. It seems to occur when the number of anchor cells is smaller than the k.weight parameter (the number of neighbors to use when weighting anchors). For now, I would recommend reducing k.weight, combining samples that have very few cells, or adjusting the parameters of FindIntegrationAnchors (which can also be provided as inputs to IntegrateLayers), such as increasing k.anchor to increase the number of cells which act as anchors. If you change these parameters, we recommend checking the results of your integration to ensure they are satisfactory.

@tud03125

I'm also facing the same error. This is the code I'm using:

apply_rpca <- function(counts, batch, conditions) {
  
  # Create Seurat object
  seurat_obj <- CreateSeuratObject(counts = counts)
  seurat_obj$batch <- batch
  seurat_obj$condition <- conditions
  
  # Find variable features and normalize the data using NormalizeData
  seurat_obj <- FindVariableFeatures(seurat_obj)
  seurat_obj <- NormalizeData(seurat_obj)
  
  # Split object into a list by batch
  seurat_list <- SplitObject(seurat_obj, split.by = "batch")
  
  # Check and combine small batches
  cell_counts <- sapply(seurat_list, ncol)
  min_cells <- 50
  small_batches <- names(cell_counts[cell_counts < min_cells])
  
  if (length(small_batches) > 0) {
    warning(paste("Merging small batches:", paste(small_batches, collapse = ", ")))
    
    # Normalize each small batch separately using NormalizeData
    seurat_list <- lapply(seurat_list, function(obj) {
      obj <- NormalizeData(obj)
      return(obj)
    })
    
    # Then merge the normalized batches
    combined <- merge(seurat_list[[small_batches[1]]], y = seurat_list[small_batches[-1]])
    
    # Run FindVariableFeatures on the combined batch after normalization
    combined <- FindVariableFeatures(combined)
    print("Checking 'combined' variable's dimensions:")
    print(dim(combined))  # Check the dimensions of the combined object
    
    seurat_list <- seurat_list[!names(seurat_list) %in% small_batches]
    seurat_list[["combined_batch"]] <- combined
  }
  
  # Print updated cell counts after combining batches
  cell_counts <- sapply(seurat_list, ncol)
  print("Checking cell counts after combining batches")
  print(cell_counts)
  
  # Cap the number of PCs at 10, or one less than the number of cells in the smallest batch
  npcs <- min(10, min(cell_counts) - 1)
  
  print(paste("Calculated npcs:", npcs))
  npcs <- min(npcs, 2)  # Further cap npcs at 2 for very small batches
  print(paste("Using npcs =", npcs))
  
  # Scale data and Run PCA on the individual Seurat objects
  seurat_list <- lapply(seurat_list, function(obj) {
    obj <- ScaleData(obj)   # Scale the data before PCA
    obj <- RunPCA(obj, npcs = npcs, approx = FALSE)
    return(obj)
  })
  
  # Select features for integration
  features <- SelectIntegrationFeatures(object.list = seurat_list)
  
  # Reduce k.anchor for small batches (capped at 2 here)
  min_cells_in_batch <- min(cell_counts)
  k.anchor <- min(2, min_cells_in_batch)
  print(paste("Using k.anchor =", k.anchor))
  
  # Set dims based on the number of cells in the smallest batch (e.g., fewer than 10)
  rpca_result <- FindIntegrationAnchors(
    object.list = seurat_list,
    reduction = "rpca",
    dims = 1:2,  # Use fewer dimensions based on the smallest batch size
    k.anchor = k.anchor,  # Use the adjusted k.anchor
    anchor.features = features
  )
  
  # Integrate datasets
  integrated_data <- IntegrateData(anchorset = rpca_result, dims = 1:5)
  
  # Scale and run PCA on integrated data
  integrated_data <- ScaleData(integrated_data, features = rownames(integrated_data))
  integrated_data <- RunPCA(integrated_data, npcs = 5)
  print("Checking Dimension Reductions:")
  print(integrated_data@reductions)
  print(head(integrated_data@reductions$pca@cell.embeddings))
  
  # Run UMAP, plot, and return the integrated object
  integrated_data <- RunUMAP(integrated_data, dims = 1:5)
  print(DimPlot(integrated_data, reduction = "umap", group.by = "batch"))
  return(integrated_data)
}

And this is the error message:

> apply_rpca(combined_counts, batches, combined_conditions)
Warning: Data is of class data.frame. Coercing to dgCMatrix.
Finding variable features for layer counts
Calculating gene variances
Calculating feature variances of standardized and clipped values
Normalizing layer: counts
Performing log-normalization (repeated for each of the three layers)
Finding variable features for layer counts.1
Finding variable features for layer counts.2
[progress bars trimmed]
[1] "Checking 'combined' variable's dimensions:"
[1] 50358    10
[1] "Checking cell counts after combining batches"
combined_batch 
            10 
[1] "Calculated npcs: 9"
[1] "Using npcs = 2"
Centering and scaling data matrix
  |=================================================================================================================| 100%
PC_ 1 
Positive:  ENSMUSG00000029314, ENSMUSG00000039361, ENSMUSG00000021895, ... (loadings trimmed)
Negative:  ENSMUSG00000066071, ENSMUSG00000068457, ENSMUSG00000022181, ...
PC_ 2 
Positive:  ENSMUSG00000037826, ENSMUSG00000027513, ENSMUSG00000037795, ...
Negative:  ENSMUSG00000020538, ENSMUSG00000038146, ENSMUSG00000055413, ...
[1] "Using k.anchor = 2"
Scaling features for provided objects
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s  
Computing within dataset neighborhoods
  |                                                  | 0 % ~calculating  Error in idx[i, ] <- res[[i]][[1]] : 
  number of items to replace is not a multiple of replacement length
In addition: Warning messages:
1: In .refine_k(k, precomputed, query = TRUE) :
  'k' capped at the number of observations
2: In .refine_k(k, precomputed, query = TRUE) :
  'k' capped at the number of observations
3: In print.DimReduc(x = reduction.data, dims = ndims.print, nfeatures = nfeatures.print) :
  Only 2 dimensions have been computed.
4: In check_numbers(k = k, nu = nu, nv = nv, limit = min(dim(x)) -  :
  more singular values/vectors requested than available
5: In .refine_k(k, precomputed, query = TRUE) :
  'k' capped at the number of observations
6: In .refine_k(k, precomputed, query = TRUE) :
  'k' capped at the number of observations
7: In apply_rpca(combined_counts, batches, combined_conditions) :
  Merging small batches: I, IA
8: In print.DimReduc(x = reduction.data, dims = ndims.print, nfeatures = nfeatures.print) :
  Only 2 dimensions have been computed.

Found more than one class "dist" in cache; using the first, from namespace 'BiocGenerics'
Also defined by ‘spam’
>
