How to use the aggregateFeatures and msqrob functions #38

Open
mbonhomme opened this issue Sep 17, 2022 · 8 comments

@mbonhomme

mbonhomme commented Sep 17, 2022

Dear statOmics Team,

Thank you for the great msqrob2 package. msqrob2 is built for proteomics experiments, but it seems to be a powerful tool for analyzing my metabolomics data. I am currently using your code on my metabolomics data and have some questions about how best to adapt it.

I am starting directly from a matrix of raw feature intensities, with features in rows and samples in columns. I log-transformed and normalized the data, and I provided information about my experimental conditions through colData.
My question is about the "Summarization to protein level" step. Unlike in proteomics, I do not have an assay of protein expression values, and I would like to keep working with the same pe, using my feature intensities to build the model that fits my design. How should I use the aggregateFeatures function in order to continue the analysis?

  • Is the model built at this step needed for the rest of the process, or could I skip that step and start the data analysis with the msqrob function?

  • In one of my simpler experiments, where I use a simple model with one condition (formula = ~condition), it seems to work pretty well.
    But with a more complex design, the msqrob function (formula = ~ group*time + patient) gives me the error message “Error in if (m == 0) { : missing value where TRUE/FALSE needed”. I cannot find any missing values in the matrix.
    I think it is related to the way my condition variables are encoded in colData, but I cannot find the solution. Group has 2 levels, time has 2 levels and patient has 11 levels. The patients in group 1 are different from the ones in group 2.

I thank you in advance; any help or explanation will be appreciated. I can send more information if needed.
I will be very grateful if you could help me with this.

Regards,

@lgatto
Collaborator

lgatto commented Sep 17, 2022

  • Regarding the aggregation, simply ignore it if it's not relevant and proceed with the estimation and inference.
  • The error doesn't suggest that you have missing values in your data. Check the factors in your colData: the error can be the result of excess (unused) levels.

@mbonhomme
Author

Thank you very much for that very quick help!
Could you explain a bit more about the levels? What counts as excessive? Group has 2 levels, time has 2 levels and patient has 11 levels (one for each patient; the measures are repeated over time, not across groups; 22 samples in total).

thank you for helping, really appreciated

Regards,

@lgatto
Collaborator

lgatto commented Sep 17, 2022

x2 below still has 3 levels (because it is a subset of x), but only two of the three actually occur in it. If you referred to x2 in the formula, it would lead to that very same error.

> x <- factor(LETTERS[1:3])
> x
[1] A B C
Levels: A B C
> x2 <- x[1:2]
> x2
[1] A B
Levels: A B C
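
As a follow-up to the example above, here is a minimal base R sketch of how unused levels might be detected and removed. The data frame `annot` is a hypothetical stand-in for the sample annotations; whether this resolves the error in a given msqrob call is an assumption that depends on the actual colData.

```r
# Factor with an unused level, as in the example above
x  <- factor(LETTERS[1:3])
x2 <- x[1:2]

nlevels(x2)   # number of declared levels, including the unused "C"
table(x2)     # per-level counts; an unused level shows a count of 0

# droplevels() removes levels that no longer occur in the data
x2_clean <- droplevels(x2)
levels(x2_clean)

# The same clean-up applied to every factor column of a data frame
# (`annot` is a hypothetical stand-in for the sample annotations)
annot <- data.frame(group = x2)
annot[] <- lapply(annot, function(col) {
  if (is.factor(col)) droplevels(col) else col
})
```

Running `table()` on each factor in the sample annotations is a quick way to spot levels with a count of 0 before fitting.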

@ococrook
Collaborator

If your formula is

formula = ~ group*time + patient

You are actually fitting

formula = ~ group + time + group:time + patient

I don't think you have enough samples to fit this model. Are patients nested within groups? You may want one of the following, but I don't know enough about your question, model or data to be more helpful:

formula = ~ group + time + group:time + (group|patient)
formula = ~ group + time + group:time + (time|patient)
formula = ~ group + time + group:time + (1|patient)
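
The nesting question can be checked directly in base R. The sketch below rebuilds a design like the one described in this thread (11 patients, 2 time points, 2 groups with no shared patients; the 6/5 split and the variable names are assumptions for illustration) and compares the number of fixed-effect coefficients with the rank of the design matrix. It does not claim to reproduce the msqrob error itself.

```r
# Assumed layout: 6 patients in Gp1, 5 in Gp2, each measured at T0 and T2
patient <- factor(rep(c("P1", "P2", "P4", "P7", "P8", "P9",
                        "P10", "P11", "P3", "P5", "P6"), each = 2))
time    <- factor(rep(c("T0", "T2"), times = 11))
group   <- factor(rep(c("Gp1", "Gp2"), times = c(12, 10)))
design  <- data.frame(patient, time, group)

# group*time expands to group + time + group:time
term_labels <- attr(terms(~ group * time + patient), "term.labels")

# Nesting check: in how many groups does each patient occur?
groups_per_patient <- rowSums(table(design$patient, design$group) > 0)

# Design matrix with patient as a fixed effect
X <- model.matrix(~ group * time + patient, data = design)
n_coef      <- ncol(X)      # coefficients the model tries to estimate
design_rank <- qr(X)$rank   # coefficients that are actually estimable
```

When every patient occurs in exactly one group, the group effect is a linear combination of the patient effects, so `design_rank` is smaller than `n_coef` and at least one fixed-effect coefficient cannot be estimated. Treating patient as a random effect, as in the formulas above, avoids this particular confounding.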

@mbonhomme
Author

Thank you very much for your answers, they are really helpful.

I see the issue now.
I have 11 patients divided into 2 unbalanced groups (n=6 and n=5), each measured at 2 time points.
I am investigating the differential expression between the 2 time points for the patients in group 1 and in group 2, and I would like to take into account that the measures are paired over time (but not across groups: the patients in Gp1 and Gp2 are different).
The appropriate formula seems to be: formula = ~ group + time + group:time + (1|patient). Correct?
However, msqrob does not seem to accept this formula.

This is how I created my colData

colData(pe)$patient <- rep(c("P1","P1","P2","P2","P4","P4","P7","P7","P8","P8","P9","P9","P10","P10","P11","P11","P3","P3","P5","P5","P6","P6")) %>% as.factor
=> 11 levels

colData(pe)$time <- rep(c("T0","T2","T0","T2","T0","T2","T0","T2","T0","T2","T0","T2","T0","T2","T0","T2","T0","T2","T0","T2","T0","T2"))%>% as.factor
=>2 levels

colData(pe)$group <- rep(c("Gp1","Gp1","Gp1","Gp1","Gp1","Gp1","Gp1","Gp1","Gp1","Gp1","Gp1","Gp1","Gp2","Gp2","Gp2","Gp2","Gp2","Gp2","Gp2","Gp2","Gp2","Gp2"))%>% as.factor
=>2 levels

Thank you very much,
Have a nice day,

Regards,

@mbonhomme
Author

Hello,
I am sorry to come back to this topic, but I am still unable to fit the model I need. Maybe someone here can spot my mistake (details are in my previous reply)? It is now probably more of a statistical question than a question about your package, but my formula "~ group + time + group:time + (1|patient)" does not work with msqrob.

Thanks in advance,

@ococrook
Collaborator

ococrook commented Oct 5, 2022

@StijnVandenbulcke Do you have time to have a look at this?

@StijnVandenbulcke
Collaborator

StijnVandenbulcke commented Oct 16, 2022

That formula specifies a mixed model. To use it with msqrob2 you have to set ridge = TRUE. Currently you cannot use mixed models without ridge regression; even though this is recommended, we will update the package so that you can use mixed models without ridge regression.

If you plan to use the ridge regression, you should use this branch as this includes an important fix.
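
A minimal sketch of what such a call might look like, assuming `pe` is the QFeatures object from this thread and that the assay holding the normalized intensities is called "metabolites" (a hypothetical name; substitute your own). The call is guarded so that the snippet only attempts the fit when msqrob2 is installed and `pe` exists.

```r
# Mixed-model formula from the thread: random intercept per patient
mixed_formula <- ~ group + time + group:time + (1 | patient)

# Hypothetical: `pe` is a QFeatures object, "metabolites" is the assay name
if (requireNamespace("msqrob2", quietly = TRUE) && exists("pe")) {
  pe <- msqrob2::msqrob(
    object  = pe,
    i       = "metabolites",   # assumed assay name
    formula = mixed_formula,
    ridge   = TRUE             # required for mixed models, per the comment above
  )
}
```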
