Nebula is failing to recognize that there are the same number of subject ids as count columns #25

Closed
AngCamp opened this issue Jun 15, 2023 · 6 comments

Comments

@AngCamp

AngCamp commented Jun 15, 2023

I created a list like the sample_data you provide, with the model matrix. Here is its structure:

>str(dkkl1_nebula_g)

List of 4
 $ count :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. ..@ i       : int [1:8465852] 2 3 7 9 11 14 17 20 21 23 ...
  .. ..@ p       : int [1:1165] 0 7897 16819 24435 32635 40432 48513 55924 60459 64383 ...
  .. ..@ Dim     : int [1:2] 23355 1164
  .. ..@ Dimnames:List of 2
  .. .. ..$ : chr [1:23355] "00R-AC107638.2" "0610005C13Rik" "0610007P14Rik" "0610009B22Rik" ...
  .. .. ..$ : chr [1:1164] "B1_T6_K7_S83_mouse1" "D6_T3_H15_S91_mouse1" "E3_T6_A10_S146_mouse1" "B7_T6_A8_S144_mouse1" ...
  .. ..@ x       : num [1:8465852] 57 35 1 48 42 13 2 17 103 14 ...
  .. ..@ factors : list()
 $ id    : num [1:1164] 1 1 1 1 1 1 1 1 1 1 ...
 $ pred  : num [1:1164, 1:9] 1 1 1 1 1 1 1 1 1 1 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:1164] "1" "2" "3" "4" ...
  .. ..$ : chr [1:9] "(Intercept)" "ConditionContext-Only:LabeltdT+" "ConditionFear-Only:LabeltdT+" "ConditionFear-Recall:LabeltdT+" ...
 $ offset: num [1:1164] 1 1 1 1 1 1 1 1 1 1 ...

I have grouped it with group_cell(), but for some reason when I run nebula on it, it does not recognize that the number of subject IDs equals the number of columns (cells) in the data. What am I doing wrong? The only difference I see between my object and your sample_data object is that mine contains the cell names.

Running nebula on the list above produces this error:

results.dkkl1.nebula <- nebula(dkkl1_nebula_g$count, dkkl1_nebula_g$sid,
                               pred=dkkl1_nebula_g$pred, ncore=2)

Error message:

Error in nebula(dkkl1_nebula_g$count, dkkl1_nebula_g$sid, pred = dkkl1_nebula_g$pred, : The length of subject IDs should be equal to the number of columns of the count matrix.
Traceback:

1. nebula(dkkl1_nebula_g$count, dkkl1_nebula_g$sid, pred = dkkl1_nebula_g$pred, 
 .     ncore = 2)
2. stop("The length of subject IDs should be equal to the number of columns of the count matrix.")

EDIT: Not sure if this could also be the issue, but the model I am trying to fit is as follows:
dkkl1.nebula.df = model.matrix(~Condition:Label, data=dkkl1_nebula$pred)
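
One check worth running against the structure printed above: the list has an element named `id`, while the `nebula()` call passes `dkkl1_nebula_g$sid`. In R, `$` on a list returns `NULL` for a name that does not exist, so the ID argument would have length 0, which may be what the error is reporting. A minimal sketch with a toy list (the values are made up; only the element names mirror the `str()` output):

```r
# Toy stand-in for the grouped list; element names follow the str() output above.
toy <- list(
  count  = matrix(0, nrow = 3, ncol = 4),  # 3 genes x 4 cells (made-up values)
  id     = c(1, 1, 2, 2),                  # one subject ID per cell
  offset = rep(1, 4)
)

ids_wrong <- toy$sid  # no element called "sid", so this is NULL
ids_right <- toy$id

is.null(ids_wrong)                    # TRUE
length(ids_wrong) == ncol(toy$count)  # FALSE: length 0 vs 4 columns
length(ids_right) == ncol(toy$count)  # TRUE
```

Comparing `names()` of the list against the names used in the call is usually the quickest way to spot this kind of mismatch.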

@AngCamp AngCamp changed the title Nebula is failing to recognize that the subject ids match the cells. Nebula is failing to recognize that the subject ids match the cell names Jun 15, 2023
@AngCamp AngCamp changed the title Nebula is failing to recognize that the subject ids match the cell names Nebula is failing to recognize that there are the same number of subject ids as count columns Jun 15, 2023
@Raghav1881
Collaborator

The pred element of your list, dkkl1_nebula_g$pred, should not contain the model matrix. It should hold only the per-cell predictors that you use to build dkkl1.nebula.df, i.e. metadata from the original object. If your original object were a Seurat object, for example, you would set dkkl1_nebula_g$pred <- seurat_object$predictor and then build your model matrix from dkkl1_nebula_g$pred.
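
A minimal sketch of that workflow, assuming a hypothetical per-cell metadata data frame `meta` with `Condition` and `Label` columns. The rows and the `tdT-` level are invented for illustration, and the formula is the one quoted in the original post. The raw predictors stay in `pred`, and the design matrix is derived from them with `model.matrix()`:

```r
# Hypothetical per-cell metadata: one row per cell, in the same order as
# the columns of the count matrix. Values are made up for illustration.
meta <- data.frame(
  Condition = factor(c("Context-Only", "Fear-Only", "Fear-Recall",
                       "Context-Only", "Fear-Only", "Fear-Recall")),
  Label     = factor(c("tdT-", "tdT-", "tdT-", "tdT+", "tdT+", "tdT+"))
)

pred   <- meta                                          # raw predictors only
design <- model.matrix(~ Condition:Label, data = pred)  # built from pred

nrow(design) == nrow(pred)  # the design matrix keeps one row per cell
```

Presumably `design`, rather than `pred` itself, is then what gets passed as the `pred =` argument of `nebula()`, while the list keeps the untransformed metadata.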

@lhe17
Owner

lhe17 commented Jun 19, 2023 via email

@AngCamp
Author

AngCamp commented Jun 23, 2023

Thanks, I will try these things out.

@AngCamp
Author

AngCamp commented Jun 23, 2023

Thanks, these two solutions fixed it. I think it's worth noting that it's a little unnecessarily confusing that you use data$sid in your tutorial. Also, I know most people will probably use a Seurat object, but it may be useful to explain how people working with standard CSVs can make an object that works with your package. Most data on GEO is also stored as .csv, so people working with publicly available data often won't be using sparse matrices, at least not for simple preprocessing like gene filtering.

I did the following:

# create counts for cell type(s) of interest, do gene filtering first
# in my case this gave me a data frame called dkkl1.counts.df
# this can now be made into the counts matrix

library(Matrix)  # provides Matrix() for building the sparse matrix

dkkl1_nebula <- vector(mode = "list", length = 4)
dkkl1_nebula$count <- Matrix(as.matrix(dkkl1.counts.df), sparse = TRUE)
dim(dkkl1_nebula$count)
dkkl1_nebula$count[1:5, 1:5]
23355 1164
5 x 5 sparse Matrix of class "dgCMatrix"
               B1_T6_K7_S83_mouse1 D6_T3_H15_S91_mouse1 E3_T6_A10_S146_mouse1
00R-AC107638.2                   .                    .                     .
0610005C13Rik                    .                    .                     .
0610007P14Rik                   57                   13                     6
0610009B22Rik                   35                   27                    32
0610009E02Rik                    .                    .                     .
               B7_T6_A8_S144_mouse1 B4_T8_I19_S47_mouse1
00R-AC107638.2                    .                    .
0610005C13Rik                     .                    .
0610007P14Rik                   116                   26
0610009B22Rik                    76                    .
0610009E02Rik                     .                    6
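
For readers starting from a CSV, the same conversion can be tried on a small made-up table first. This is only a sketch: `counts_df` and `subject_id` are hypothetical stand-ins for a filtered count table (genes in rows, cells in columns) and its per-cell subject IDs.

```r
library(Matrix)  # provides Matrix() and the sparse matrix classes

# Made-up counts as they might look after read.csv(): genes x cells.
counts_df <- data.frame(
  cell1 = c(0, 57, 35),
  cell2 = c(0, 13, 27),
  cell3 = c(0, 6, 32),
  cell4 = c(0, 116, 76),
  row.names = c("geneA", "geneB", "geneC")
)

# Dense data frame -> numeric matrix -> sparse matrix.
counts <- Matrix(as.matrix(counts_df), sparse = TRUE)
dim(counts)  # 3 genes x 4 cells

# Hypothetical subject ID for each cell; it must line up with the columns.
subject_id <- c(1, 1, 2, 2)
stopifnot(length(subject_id) == ncol(counts))
```

Zero entries print as `.` in a sparse matrix, which is why the output above shows dots instead of zeros.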

Just a suggestion; it could save a user some googling. Many of your users are also going to be biologists (like me) with limited programming experience who may not be familiar with sparse matrices. It might increase the user base if you can save them time with little things like this. Idiot-proofing the tutorial for people like me can go a long way.

@AngCamp
Author

AngCamp commented Jun 23, 2023

It may help to add a small paragraph to the tutorial explaining the object nebula is expecting. I know it's easy to deduce by running str(sample_data) and reading the function documentation, but it's easy to miss little things if they are not explicitly spelled out. A short paragraph could save a user a lot of time trawling through your documentation, arguably unnecessarily, since it would be quite easy to explain. Also, just to reiterate, many users are going to be biologists with limited programming experience. It will not occur to them to do the things I listed above. Seurat has a wide user base not just because it is the "best" package (arguably it is not), but because it has the best tutorials. Users can easily pick the package up and learn to use it.
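
As a rough illustration of what such a paragraph could spell out, here is a sketch of the list layout seen in the `str()` output earlier in this thread (`count`, `id`, `pred`, `offset`), together with a hypothetical helper, `check_nebula_input()`, that is not part of nebula and only checks that each per-cell component matches the number of columns in `count`:

```r
# Hypothetical helper (not part of nebula): returns a character vector of
# problems, or character(0) if every per-cell component lines up.
check_nebula_input <- function(x) {
  n_cells  <- ncol(x$count)
  problems <- character(0)
  if (is.null(x$id) || length(x$id) != n_cells) {
    problems <- c(problems, "id: need one subject ID per column of count")
  }
  if (!is.null(x$pred) && NROW(x$pred) != n_cells) {
    problems <- c(problems, "pred: need one row per column of count")
  }
  if (!is.null(x$offset) && length(x$offset) != n_cells) {
    problems <- c(problems, "offset: need one value per column of count")
  }
  problems
}

# Toy object with made-up values: 3 genes x 4 cells from 2 subjects.
toy <- list(
  count  = matrix(0, nrow = 3, ncol = 4),
  id     = c(1, 1, 2, 2),
  pred   = data.frame(Condition = c("a", "a", "b", "b")),
  offset = rep(1, 4)
)
check_nebula_input(toy)  # character(0) when everything lines up
```
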

Thanks for the help =) btw, it's appreciated.

@lhe17
Owner

lhe17 commented Jun 23, 2023 via email

@AngCamp AngCamp closed this as completed Jun 23, 2023