Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

msqrob error #33

Open
cdlawless opened this issue Mar 10, 2022 · 10 comments
Open

msqrob error #33

cdlawless opened this issue Mar 10, 2022 · 10 comments

Comments

@cdlawless
Copy link

Hi,

I'm getting the following error when running

se<-msqrob(object=pe, i="proteinRobust", formula= ~pH*matrix + matrix_pH_group, ridge=TRUE)

I get:

Error in if (m == 0) { : missing value where TRUE/FALSE needed

I can get around it somewhat by add the matrix_pH_group as a random effect, as in:

se<-msqrob(object=pe, i="proteinRobust", formula= ~pH*matrix + (1|matrix_pH_group), ridge=TRUE)

But there are n<5 repeats in that grouping factor.

Let me know if you need more information from me to help with this.

@lgatto
Copy link
Collaborator

lgatto commented Sep 13, 2022

This is indeed a critical bug related to this discussion. The developer must use getWithCols(x, i) instead of x[[i]].

OK, it's not related to this, but I would still suggest to be careful, this will lead to an error for more complex datasets.

@lgatto
Copy link
Collaborator

lgatto commented Sep 13, 2022

The issue might actually not be related to this (or not only to this) and requires more investigation.

@lgatto
Copy link
Collaborator

lgatto commented Sep 13, 2022

@cdlawless - could it be that some of the factors you are using in your formula have more levels than you intend to?

@Aokht17
Copy link

Aokht17 commented Nov 20, 2023

Hi,

I have a rather complex experimental design (please find the table with colData attached). I am trying to run

pe <- msqrob(object = pe, i = "proteinRobust", formula = ~batch*infection*virus*time, overwrite=TRUE)

and get this error:

Error in if (m == 0) { : missing value where TRUE/FALSE needed

When I exclude "virus" from the formula, everything works ok (but I cannot do this, obviously). I also set up the levels in the beginning:

colData(pe)$batch <- a$batch %>% factor(levels = c("B12", "B16"))
colData(pe)$infection <- a$infection %>% factor(levels = c("non-inf", "infected"))
colData(pe)$virus <- a$virus %>% factor(levels = c("mock", "Flu_PR8_wt", "Flu_PR8_delNS1", "Flu_SC35M_wt", "Flu_SC35M_delNS1", "THOV_wt", "THOV_delNS1"))
colData(pe)$time <- a$timepoint %>% factor(levels = c("t1", "t2", "t3", "t4", "t5", "t6"))
colData(pe)$rep <- a$rep %>% factor()

but this should work fine.

Would really appreciate your opinion on this

coldata_pe.txt

@ococrook
Copy link
Collaborator

Hi!

This maybe a statistics problem. Your variable infection, virus and batch are not identifiable. You may need to drop batch and/or infection from your model. Let me know if that doesn't work.

Best wishes,
Olly

@Aokht17
Copy link

Aokht17 commented Nov 20, 2023

I don't quite understand what you mean by "not identifiable". I do need them for the contrasts. It will run (somehow) if I remove both "batch" and "infection", but then in the next step
getCoef(rowData(pe[["proteinRobust"]])$msqrobModels[[2]])
getCoef(rowData(pe[["proteinRobust"]])$msqrobModels[[1]])
I get

[1] NA

"Virus" works only if it's the only parameter (~virus)

@ococrook
Copy link
Collaborator

The design you have doesn't allow you to infer certain groups. You cant differentiate between infected and non-infected at the same time as virus because you don't observe non-infected and virus. You may need to fit seperate regressions if you are interested in those contrasts.

@Aokht17
Copy link

Aokht17 commented Nov 27, 2023

I cannot agree completely, but let's imagine I remove levels of infection and batch. The thing is that even simple ~virus*time doesn't work (with ridge and without it). With ridge=FALSE it runs and returns NA, and with ridge=TRUE it takes forever to run (3 days and then I just interrupted the process).

@ococrook
Copy link
Collaborator

Given that some version of model work and not other, I dont think its a problem with the package exactly though it may be too slow for you data. I suggest you try a few tests to see where the problems are (as it could be in many places:)

  • Try your model with just the lm function in R
  • Use the rlm function from the MASS package
  • Use msqrob with only 1 timepoint

From here we can assess whether is one of the following 1) a modelling issues 2) a stats issue 3) a coding issue - otherwise we dont know right now.

Thanks!

also looping in @StijnVandenbulcke

@Aokht17
Copy link

Aokht17 commented Dec 1, 2023

Hello,

lmfit function works well with model.matrix(~virus*time + batch).
lm for 1 selected protein also worked without errors
rlm gives an error

'x' is singular: singular fits are not implemented in 'rlm'.

Imputation solves this issue

Running msqrob with 1 timepoint still returns NA in getCoef

Do you think this is because of a larger number of missing values for some conditions, or the model is "overwhelmed" with too many comparisons?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants