IntegrateData error: number of items to replace is not a multiple of replacement length #6341


Closed
XinmiaoYan opened this issue Aug 24, 2022 · 13 comments


@XinmiaoYan

I used the code below to integrate data with rPCA, but an error occurred. I've tried many approaches but still couldn't solve this problem. I'd appreciate any help.

##### Perform integration #######
sc.anchors <- FindIntegrationAnchors(object.list = sc.list, anchor.features = features, reduction = "rpca")
# this command creates an 'integrated' data assay
sc.combined <- IntegrateData(anchorset = sc.anchors)

Merging dataset 2 into 4
Extracting anchors for merged samples
Finding integration vectors
Finding integration vector weights
Error in idx[i, ] <- res[[i]][[1]] :
number of items to replace is not a multiple of replacement length

This error couldn't be solved by adjusting parameters such as l2.norm and k.filter.
I'd appreciate any help.

@Gesmira
Contributor

Gesmira commented Aug 26, 2022

Hi, what are the sizes of your datasets in sc.list? k.weight (the number of neighbors to consider when weighting anchors) in IntegrateData() defaults to 100 so if one of your datasets is smaller than 100 it will cause this error to occur. Have you tried adjusting the k.weight parameter to the size of your smallest dataset? Otherwise, you may want to remove very small datasets.
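A quick way to check this is to inspect the size of each dataset before integrating. This is a minimal sketch, assuming your list of Seurat objects is named `sc.list` and your anchors are in `sc.anchors` as in the code above; the k.weight value of 50 is only an illustration and should be set below your smallest dataset's cell count:

```r
# How many cells does each dataset contribute?
sapply(sc.list, ncol)

# Keep k.weight smaller than the smallest dataset
# (IntegrateData() defaults to k.weight = 100)
sc.combined <- IntegrateData(anchorset = sc.anchors, k.weight = 50)
```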

@XinmiaoYan
Author

@Gesmira, yes, I tried adjusting k.weight. The smallest object in my sc.list has 130 cells; even when I lowered k.weight to 50, the error persisted.

@Gesmira
Contributor

Gesmira commented Aug 26, 2022

Have you also tried setting k.filter in FindIntegrationAnchors() before you run IntegrateData()?
For example:

sc.anchors <- FindIntegrationAnchors(object.list = sc.list, anchor.features = features, reduction = "rpca", k.filter = 100)
sc.combined <- IntegrateData(anchorset = sc.anchors, k.weight = 100)

@XinmiaoYan
Author

Yes, I tried k.filter = 30; it didn't resolve the error.

@Gesmira
Contributor

Gesmira commented Aug 26, 2022

Just to see if the issue is due to the small dataset, are you able to run the integration through without it?

@XinmiaoYan
Author

I removed the smallest object (130 cells), and the integration worked this time. The smallest remaining object has 205 cells. I'm wondering what the cutoff is for the smallest object that can be used in integration? Thanks.

@XinmiaoYan
Author

XinmiaoYan commented Oct 11, 2022 via email

@decodebiology

The smallest object in my list has 3,770 cells, and I get the same error. I've tried playing with different parameters (k.filter, k.anchor, k.score), with no luck so far.

@ajynair

ajynair commented Jan 12, 2023

I also got a similar error with my dataset integration. When I removed two samples with fewer than 120 cells, and removed the old 'integrated' assay from the Seurat object, the error went away. Just noting it here.
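For anyone wanting to try the same workaround, this is a sketch of both steps. The object name `sc.obj`, the list name `sc.list`, and the 120-cell threshold are illustrative, not from the original post:

```r
# Drop samples below a minimum cell count before integrating
sc.list <- sc.list[sapply(sc.list, ncol) >= 120]

# Remove a stale 'integrated' assay left over from a previous run
DefaultAssay(sc.obj) <- "RNA"
sc.obj[["integrated"]] <- NULL
```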

@levinhein

levinhein commented Jun 20, 2023

I got the same error. The lowest cell number was 375; I removed that sample and the same error persisted.
I then removed all samples with fewer than 1,000 cells and the error still shows up. The remaining samples are in the 1k-3k+ range.

@ajynair

ajynair commented Jul 3, 2023

@levinhein, I have also seen this problem occur when samples have large differences in cell numbers (10-fold). I was wondering if we could artificially split the large samples so that all samples have comparable numbers of cells.

@Gesmira
Contributor

Gesmira commented Jul 7, 2023

Hi,
We are currently adding a more informative error message for this issue in the development branch. It seems to occur when the number of anchor cells is smaller than the k.weight parameter (the number of neighbors to use when weighting anchors). For now, I would recommend reducing k.weight, combining samples that have very few cells, or adjusting the parameters of FindIntegrationAnchors (which can also be provided as inputs to IntegrateLayers), such as increasing k.anchor to increase the number of cells which act as anchors. If you change these parameters, we recommend checking the results of your integration to ensure they are satisfactory.

@tud03125

I'm also facing the same error. This is the code I'm using:

apply_rpca <- function(counts, batch, conditions) {
  
  # Create Seurat object
  seurat_obj <- CreateSeuratObject(counts = counts)
  seurat_obj$batch <- batch
  seurat_obj$condition <- conditions
  
  # Find variable features and normalize the data using NormalizeData
  seurat_obj <- FindVariableFeatures(seurat_obj)
  seurat_obj <- NormalizeData(seurat_obj)
  
  # Split object into a list by batch
  seurat_list <- SplitObject(seurat_obj, split.by = "batch")
  
  # Check and combine small batches
  cell_counts <- sapply(seurat_list, ncol)
  min_cells <- 50
  small_batches <- names(cell_counts[cell_counts < min_cells])
  
  if (length(small_batches) > 0) {
    warning(paste("Merging small batches:", paste(small_batches, collapse = ", ")))
    
    # Normalize each small batch separately using NormalizeData
    seurat_list <- lapply(seurat_list, function(obj) {
      obj <- NormalizeData(obj)
      return(obj)
    })
    
    # Then merge the normalized batches
    combined <- merge(seurat_list[[small_batches[1]]], y = seurat_list[small_batches[-1]])
    
    # Run FindVariableFeatures on the combined batch after normalization
    combined <- FindVariableFeatures(combined)
    print("Checking 'combined' variable's dimensions:")
    print(dim(combined))  # Check the dimensions of the combined object
    
    seurat_list <- seurat_list[!names(seurat_list) %in% small_batches]
    seurat_list[["combined_batch"]] <- combined
  }
  
  # Print updated cell counts after combining batches
  cell_counts <- sapply(seurat_list, ncol)
  print("Checking cell counts after combining batches")
  print(cell_counts)
  
  # Cap the number of PCs at 10, or one less than the number of cells in the smallest batch
  npcs <- min(10, min(cell_counts) - 1)
  
  print(paste("Calculated npcs:", npcs))
  npcs <- min(npcs, 2)  # Further cap npcs at 2 for very small batches
  print(paste("Using npcs =", npcs))
  
  # Scale data and Run PCA on the individual Seurat objects
  seurat_list <- lapply(seurat_list, function(obj) {
    obj <- ScaleData(obj)   # Scale the data before PCA
    obj <- RunPCA(obj, npcs = npcs, approx = FALSE)
    return(obj)
  })
  
  # Select features for integration
  features <- SelectIntegrationFeatures(object.list = seurat_list)
  
  # Reduce k.anchor for small batches (capped at 2 here)
  min_cells_in_batch <- min(cell_counts)
  k.anchor <- min(2, min_cells_in_batch)
  print(paste("Using k.anchor =", k.anchor))
  
  # Set dims based on the number of cells in the smallest batch (e.g., fewer than 10)
  rpca_result <- FindIntegrationAnchors(
    object.list = seurat_list,
    reduction = "rpca",
    dims = 1:2,  # Use fewer dimensions based on the smallest batch size
    k.anchor = k.anchor,  # Use the adjusted k.anchor
    anchor.features = features
  )
  
  # Integrate datasets
  integrated_data <- IntegrateData(anchorset = rpca_result, dims = 1:5)
  
  # Scale and run PCA on integrated data
  integrated_data <- ScaleData(integrated_data, features = rownames(integrated_data))
  integrated_data <- RunPCA(integrated_data, npcs = 5)
  print("Checking Dimension Reductions:")
  print(integrated_data@reductions)
  print(head(integrated_data@reductions$pca@cell.embeddings))
  
  # Run UMAP, plot, and return the integrated object
  integrated_data <- RunUMAP(integrated_data, dims = 1:5)
  print(DimPlot(integrated_data, reduction = "umap", group.by = "batch"))
  return(integrated_data)
}

And this is the error message:

> apply_rpca(combined_counts, batches, combined_conditions)
Warning: Data is of class data.frame. Coercing to dgCMatrix.
Finding variable features for layer counts
Calculating gene variances
Calculating feature variances of standardized and clipped values
Normalizing layer: counts
Performing log-normalization (repeated for each of the three layers)
Finding variable features for layer counts.1
Finding variable features for layer counts.2
[progress bars trimmed]
[1] "Checking 'combined' variable's dimensions:"
[1] 50358    10
[1] "Checking cell counts after combining batches"
combined_batch 
            10 
[1] "Calculated npcs: 9"
[1] "Using npcs = 2"
Centering and scaling data matrix
  |=================================================================================================================| 100%
PC_ 1 
Positive:  ENSMUSG00000029314, ENSMUSG00000039361, ENSMUSG00000021895, ... (loadings trimmed)
Negative:  ENSMUSG00000066071, ENSMUSG00000068457, ENSMUSG00000022181, ...
PC_ 2 
Positive:  ENSMUSG00000037826, ENSMUSG00000027513, ENSMUSG00000037795, ...
Negative:  ENSMUSG00000020538, ENSMUSG00000038146, ENSMUSG00000055413, ...
[1] "Using k.anchor = 2"
Scaling features for provided objects
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s  
Computing within dataset neighborhoods
  |                                                  | 0 % ~calculating  Error in idx[i, ] <- res[[i]][[1]] : 
  number of items to replace is not a multiple of replacement length
In addition: Warning messages:
1: In .refine_k(k, precomputed, query = TRUE) :
  'k' capped at the number of observations
2: In .refine_k(k, precomputed, query = TRUE) :
  'k' capped at the number of observations
3: In print.DimReduc(x = reduction.data, dims = ndims.print, nfeatures = nfeatures.print) :
  Only 2 dimensions have been computed.
4: In check_numbers(k = k, nu = nu, nv = nv, limit = min(dim(x)) -  :
  more singular values/vectors requested than available
5: In .refine_k(k, precomputed, query = TRUE) :
  'k' capped at the number of observations
6: In .refine_k(k, precomputed, query = TRUE) :
  'k' capped at the number of observations
7: In apply_rpca(combined_counts, batches, combined_conditions) :
  Merging small batches: I, IA
8: In print.DimReduc(x = reduction.data, dims = ndims.print, nfeatures = nfeatures.print) :
  Only 2 dimensions have been computed.

Found more than one class "dist" in cache; using the first, from namespace 'BiocGenerics'
Also defined by ‘spam’
>
