Nebula is failing to recognize that there are the same number of subject ids as count columns #25

AngCamp · 2023-06-15T00:09:36Z

I created a list like the sample_data object you provide, including the model matrix. Here is its structure:

> str(dkkl1_nebula_g)

List of 4
 $ count :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. ..@ i       : int [1:8465852] 2 3 7 9 11 14 17 20 21 23 ...
  .. ..@ p       : int [1:1165] 0 7897 16819 24435 32635 40432 48513 55924 60459 64383 ...
  .. ..@ Dim     : int [1:2] 23355 1164
  .. ..@ Dimnames:List of 2
  .. .. ..$ : chr [1:23355] "00R-AC107638.2" "0610005C13Rik" "0610007P14Rik" "0610009B22Rik" ...
  .. .. ..$ : chr [1:1164] "B1_T6_K7_S83_mouse1" "D6_T3_H15_S91_mouse1" "E3_T6_A10_S146_mouse1" "B7_T6_A8_S144_mouse1" ...
  .. ..@ x       : num [1:8465852] 57 35 1 48 42 13 2 17 103 14 ...
  .. ..@ factors : list()
 $ id    : num [1:1164] 1 1 1 1 1 1 1 1 1 1 ...
 $ pred  : num [1:1164, 1:9] 1 1 1 1 1 1 1 1 1 1 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:1164] "1" "2" "3" "4" ...
  .. ..$ : chr [1:9] "(Intercept)" "ConditionContext-Only:LabeltdT+" "ConditionFear-Only:LabeltdT+" "ConditionFear-Recall:LabeltdT+" ...
 $ offset: num [1:1164] 1 1 1 1 1 1 1 1 1 1 ...

I have grouped it with group_cell(), but for some reason, when I run nebula on it, it does not recognize that the subject IDs are the same length as the number of columns (cells) in the data. What am I doing wrong? The only difference I see between my object and your sample_data object is that mine contains the cell names.

Running nebula on the list above produces this error:

results.dkkl1.nebula <- nebula(dkkl1_nebula_g$count, dkkl1_nebula_g$sid,
                               pred=dkkl1_nebula_g$pred, ncore=2)

Error message:

Error in nebula(dkkl1_nebula_g$count, dkkl1_nebula_g$sid, pred = dkkl1_nebula_g$pred, : The length of subject IDs should be equal to the number of columns of the count matrix.
Traceback:

1. nebula(dkkl1_nebula_g$count, dkkl1_nebula_g$sid, pred = dkkl1_nebula_g$pred, 
 .     ncore = 2)
2. stop("The length of subject IDs should be equal to the number of columns of the count matrix.")

EDIT: I'm not sure whether this could also be part of the issue, but the model I am trying to fit is as follows:
dkkl1.nebula.df = model.matrix(~Condition:Label, data=dkkl1_nebula$pred)


Raghav1881 · 2023-06-16T16:55:17Z

The pred element of your list (dkkl1_nebula_g$pred) should not contain the model matrix. It should hold only the per-cell predictors that you use to build dkkl1.nebula.df, i.e. metadata from the original object. For example, if your original object was a Seurat object, you would set dkkl1_nebula_g$pred <- seurat_object$predictor and then build your model matrix from dkkl1_nebula_g$pred.
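A minimal base-R sketch of this advice, using a small hypothetical metadata table (the cell names and the Condition/Label values are invented for illustration; only the column names come from the formula in this thread). The raw predictors stay in the list as a per-cell data frame, and the design matrix is built from them as a separate step.

```r
# Hypothetical per-cell metadata (values invented for illustration):
# one row per cell, in the same order as the count matrix columns.
meta <- data.frame(
  Condition = factor(c("Context-Only", "Fear-Only", "Fear-Recall", "Context-Only")),
  Label = factor(c("tdT+", "tdT-", "tdT+", "tdT-")),
  row.names = c("cell1", "cell2", "cell3", "cell4")
)

# Store the raw predictors, not the design matrix, in the list.
obj <- list(pred = meta)

# Build the design matrix from the stored predictors as a separate step;
# this matrix is what would then be passed to nebula(..., pred = design).
design <- model.matrix(~ Condition:Label, data = obj$pred)
dim(design)
```

Because model.matrix() produces one row per row of the metadata, keeping the metadata aligned with the count columns keeps the design matrix aligned as well.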

lhe17 · 2023-06-19T07:28:14Z

Hi AngCamp, I'm not sure why my previous reply from four days ago does not show up on this thread. I think the error is in passing dkkl1_nebula_g$sid as the subject IDs to nebula; it should be dkkl1_nebula_g$id. Best regards, Liang
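Why this triggers the error: the grouped list stores the subject IDs under the name id (see the str() output at the top of the thread), so dkkl1_nebula_g$sid asks for an element that does not exist. In R, `$` on a list with no matching name returns NULL, whose length is 0 rather than 1164. The sketch below reproduces this with a small hypothetical list standing in for dkkl1_nebula_g; the length comparison mirrors the rule stated in nebula's error message, not nebula's actual source code.

```r
# Hypothetical stand-in for the list returned by group_cell():
# 3 genes x 4 cells, with subject IDs stored under the name "id".
counts <- matrix(c(5, 0, 2, 1, 0, 3, 4, 0, 0, 7, 1, 2), nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))
grouped <- list(count = counts, id = c(1, 1, 2, 2), offset = rep(1, 4))

# There is no element called "sid", so `$` returns NULL (length 0).
is.null(grouped$sid)   # TRUE
length(grouped$sid)    # 0

# The rule stated in the error message: one subject ID per count column.
ids_match <- function(count, id) length(id) == ncol(count)
ids_match(grouped$count, grouped$sid)  # FALSE: the case that triggers the error
ids_match(grouped$count, grouped$id)   # TRUE
```

With the real object, the fix is simply to pass dkkl1_nebula_g$id in place of dkkl1_nebula_g$sid in the nebula() call.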


AngCamp · 2023-06-23T06:11:45Z

Thanks, I will try these things out.

AngCamp · 2023-06-23T18:09:29Z

Thanks, these two solutions fixed it. I think it's worth noting that it's a little unnecessarily confusing that your tutorial uses data$sid while the grouped object stores the IDs as $id. Also, I know most people will probably use a Seurat object, but it may be useful to explain, for people working with standard CSV files, how to make an object that works with your package. Most data on GEO is also stored as CSV files, so people working with publicly available data often won't be using sparse matrices, at least not for simple preprocessing like gene filtering.

I did the following:

# create counts for the cell type(s) of interest; do gene filtering first
# in my case this gave me a data frame called dkkl1.counts.df (genes x cells)
# this can now be made into the sparse count matrix
library(Matrix)

# start from an empty list; vector(mode = "list", length = 4) would leave
# four unnamed NULL elements in front of the named ones added below
dkkl1_nebula <- list()
dkkl1_nebula$count <- Matrix(as.matrix(dkkl1.counts.df), sparse = TRUE)
dim(dkkl1_nebula$count)
dkkl1_nebula$count[1:5, 1:5]

23355 1164
5 x 5 sparse Matrix of class "dgCMatrix"
               B1_T6_K7_S83_mouse1 D6_T3_H15_S91_mouse1 E3_T6_A10_S146_mouse1
00R-AC107638.2                   .                    .                     .
0610005C13Rik                    .                    .                     .
0610007P14Rik                   57                   13                     6
0610009B22Rik                   35                   27                    32
0610009E02Rik                    .                    .                     .
               B7_T6_A8_S144_mouse1 B4_T8_I19_S47_mouse1
00R-AC107638.2                    .                    .
0610005C13Rik                     .                    .
0610007P14Rik                   116                   26
0610009B22Rik                    76                    .
0610009E02Rik                     .                    6
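For readers starting from a CSV file (as with many GEO datasets), here is a self-contained sketch of the same conversion. It writes a tiny hypothetical genes-by-cells CSV to a temporary file, reads it back with gene names as row names, filters genes, and converts the result to a sparse matrix with the same Matrix() call used above. The file contents, the "mouseN" cell-name suffix used to derive subject IDs, and the two-element list are illustrative assumptions; the only nebula rule confirmed in this thread is one subject ID per count column.

```r
library(Matrix)

# Hypothetical genes x cells CSV in the layout many GEO count tables use:
# first column = gene names, remaining columns = cells.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "gene,c1_mouse1,c2_mouse1,c3_mouse2,c4_mouse2",
  "geneW,0,0,0,0",
  "geneX,0,3,0,1",
  "geneY,57,13,6,0",
  "geneZ,35,0,32,27"
), csv_path)

# row.names = 1 turns the gene column into row names, leaving only counts;
# check.names = FALSE keeps the cell names exactly as written in the header.
counts_df <- read.csv(csv_path, row.names = 1, check.names = FALSE)

# Simple gene filtering on the data frame: drop genes with no counts at all.
counts_df <- counts_df[rowSums(counts_df) > 0, , drop = FALSE]

# Convert to a sparse matrix, as in the thread (a dgCMatrix for this input).
count <- Matrix(as.matrix(counts_df), sparse = TRUE)

# Hypothetical subject IDs: parse the "mouseN" suffix from each cell name.
id <- sub(".*_(mouse[0-9]+)$", "\\1", colnames(count))

nebula_input <- list(count = count, id = id)

# One subject ID per column, the condition nebula's error message refers to.
stopifnot(length(nebula_input$id) == ncol(nebula_input$count))
```

A per-cell pred data frame and an offset vector can be added to the list in the same way. nebula expects cells from the same subject to be contiguous; in the thread this was handled with group_cell().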

Just a suggestion; it could save users some googling. Many of your users will also be biologists (like me) with limited programming experience who may not be familiar with sparse matrices. Saving them time with small things like this might increase your user base. Making the tutorial foolproof for people like me can go a long way.

AngCamp · 2023-06-23T18:18:49Z

It may help to add a short paragraph to the tutorial explaining the object nebula expects. I know it's easy to deduce by running str(sample_data) and reading the documentation of the functions, but it's easy to miss little things if they are not spelled out explicitly. A short paragraph could save users a lot of time trawling through your documentation, arguably unnecessarily, since it would be quite easy to explain. Also, just to reiterate, many users are going to be biologists with limited programming experience, and it will not occur to them to do the things I listed above. Seurat has a wide user base not just because it is the "best" package (arguably it is not), but because it has the best tutorials: users can easily pick the package up and learn to use it.
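As an illustration of the kind of explanation suggested here, the following hypothetical helper (check_nebula_input() is not part of nebula) spells out the consistency rules visible in this thread. One subject ID per count column is the rule behind the error message above; checking one design-matrix row and one offset per cell is an assumption about sensible inputs, not a check quoted from nebula's source.

```r
# Hypothetical helper (not part of nebula): lists inconsistencies between the
# count matrix (genes x cells) and the per-cell inputs. Works for base R
# matrices and for Matrix objects such as dgCMatrix, since both support ncol().
check_nebula_input <- function(count, id, pred = NULL, offset = NULL) {
  n_cells <- ncol(count)
  problems <- character(0)
  if (length(id) != n_cells) {
    problems <- c(problems, sprintf(
      "id has length %d but count has %d columns", length(id), n_cells))
  }
  # NROW() treats a plain vector as a one-column matrix.
  if (!is.null(pred) && NROW(pred) != n_cells) {
    problems <- c(problems, sprintf(
      "pred has %d rows but count has %d columns", NROW(pred), n_cells))
  }
  if (!is.null(offset) && length(offset) != n_cells) {
    problems <- c(problems, sprintf(
      "offset has length %d but count has %d columns", length(offset), n_cells))
  }
  problems
}

# Example: 3 genes x 4 cells.
counts <- matrix(1:12, nrow = 3)
check_nebula_input(counts, id = NULL)  # reports the missing/empty id
check_nebula_input(counts, id = c(1, 1, 2, 2), pred = matrix(1, 4, 1),
                   offset = rep(1, 4))  # character(0): no problems found
```

Returning a vector of messages, rather than stopping at the first problem, lets a user see every mismatch at once.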

Thanks for the help =) By the way, it's appreciated.

lhe17 · 2023-06-23T19:55:00Z

Hi AngCamp, Thank you for your suggestions. They will be considered in updated versions. Best regards, Liang


AngCamp changed the title ~~Nebula is failing to recognize that the subject ids match the cells.~~ Nebula is failing to recognize that the subject ids match the cell names Jun 15, 2023

AngCamp changed the title ~~Nebula is failing to recognize that the subject ids match the cell names~~ Nebula is failing to recognize that there are the same number of subject ids as count columns Jun 15, 2023

AngCamp closed this as completed Jun 23, 2023
