Inquiry: on model specification #156

Open
nangosyah opened this issue May 21, 2024 · 3 comments

@nangosyah

Hi, I'd like to ask how models should be specified in this package. I'm trying to fit the Bayesian model from the mixAK extension, "lcMethodMixAK_GLMM".

My data:
HDX (deuterium exchange values): continuous
Peptide (peptide ID): continuous
condition: factor variable
SampleID: factor variable
Time: factor variable

When I fit the model this way:

mixAKMethod <- lcMethodMixAK_GLMM(
  fixed = HDX_transformed ~ Time,
  random = ~SampleID,
  id = "Peptide",
  time = "Time",
  nClusters = 2
)

mixAK <- latrend(mixAKMethod, data = sc9)

Why do I get this error? Is the specification wrong?

Error in str2lang(x) : <text>:1:63: unexpected numeric constant
1: y ~ 1 + Time0 + Time10 + Time60 + Time3600 + SampleIDSample 1
^
11. str2lang(x)
10. formula.character(paste("y ~ 1 +", paste(colnames(x[[s]]), collapse = " + "),
      " + ", paste(colnames(z[[s]]), collapse = " + "), " + (1 +",
      paste(colnames(z[[s]]), collapse = " + "), " | id)"))
9. formula(paste("y ~ 1 +", paste(colnames(x[[s]]), collapse = " + "),
     " + ", paste(colnames(z[[s]]), collapse = " + "), " + (1 +",
     paste(colnames(z[[s]]), collapse = " + "), " | id)"))
8. GLMM_MCMCifit(do.init = TRUE, na.complete = FALSE, y = dd$y,
     dist = dd$dist, id = dd$id, time = dd$time, x = dd$x, z = dd$z,
     random.intercept = dd$random.intercept, xempty = dd$xempty,
     zempty = dd$zempty, Rc = dd$Rc, Rd = dd$Rd, p = dd$p, p_fi = dd$p_fi, ...
7. (function (y, dist = "gaussian", id, x, z, random.intercept,
     prior.alpha, init.alpha, init2.alpha, scale.b, prior.b, init.b,
     init2.b, prior.eps, init.eps, init2.eps, nMCMC = c(burn = 10,
     keep = 10, thin = 1, info = 10), tuneMCMC = list(alpha = 1, ...
6. do.call(mixAK::GLMM_MCMC, args)
5. fit(method = method, data = data, envir = modelEnv, verbose = verbose)
4. fit(method = method, data = data, envir = modelEnv, verbose = verbose)
3. suppressFun({
     modelEnv = preFit(method = method, data = data, envir = envir,
     verbose = verbose)
     model = fit(method = method, data = data, envir = modelEnv, ...
2. .fitLatrendMethod(cmethod, modelData, envir = modelEnv, mc = mc,
     verbose = verbose)
1. latrend(mixAKMethod, data = sc9)

@niekdt niekdt self-assigned this May 23, 2024
@niekdt
Collaborator

niekdt commented May 23, 2024

Hi, I think the problem is that your Time column is a factor variable; it should be numeric.
If that is indeed the case, I need to add automatic checks to latrend to catch this.

You have two options, depending on what makes sense for your analysis:

  • Convert the factor levels to their actual values (0, 10, 60, 3600), by creating a new Time column defined as as.numeric(as.character(sc9$Time))
  • Use the time ordering only (1, 2, 3, 4), using as.numeric(sc9$Time)
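The difference between the two conversions can be seen in a small base R sketch. The vector `time_f` below is made-up illustration data standing in for `sc9$Time`, not the actual dataset.

```r
# Hypothetical factor standing in for sc9$Time (values are illustrative only)
time_f <- factor(c("0", "10", "60", "3600", "10"),
                 levels = c("0", "10", "60", "3600"))

# Option 1: recover the actual time values stored in the level labels
actual <- as.numeric(as.character(time_f))
print(actual)   # 0 10 60 3600 10

# Option 2: keep only the ordering, i.e. the integer level codes
ordinal <- as.numeric(time_f)
print(ordinal)  # 1 2 3 4 2
```

Option 1 preserves the spacing between measurement times, while option 2 treats the time points as equally spaced.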

@niekdt niekdt added the bug Something isn't working label May 23, 2024
@nangosyah
Author

Hi, thank you for the prompt feedback. I have tried changing the data types in my dataset to either numeric or factor, as suggested.

I intended to fit the model below:

HDX = Time + condition + Time*condition with random effect for the sample.
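For reference, the fixed-effects part of this model can be written as an R formula. The sketch below uses only base R to show how such a formula expands; it makes no claim about how the random effect should be passed to lcMethodMixAK_GLMM.

```r
# In R formula syntax, Time * condition already includes both main effects
# and their interaction, so the explicit main effects are redundant.
f_long  <- HDX ~ Time + condition + Time * condition
f_short <- HDX ~ Time * condition

labels_long  <- attr(terms(f_long), "term.labels")
labels_short <- attr(terms(f_short), "term.labels")

print(labels_short)  # "Time" "condition" "Time:condition"
```

Both formulas yield the same set of terms, so the shorter form is sufficient.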

I have realised that if the model combines factor variables and numeric variables, the package returns the error below:

 ----------------------------------------------------------------------
 - Longitudinal clustering using: generalized linear mixed model with normal random effects mixture
 ----------------------------------------------------------------------
 Method arguments:
  time:           "Time"
  id:             "Peptide"
  nClusters:      3
  dist:           "gaussian"
  nMCMC:          c(burn = 10, keep = 10, thin = 1, info =
  tuneMCMC:       list(alpha = 1, b = 1)
  store:          c(b = FALSE)
  PED:            TRUE
  keep.chains:    TRUE
  dens.zero:      9.99999999999999e-301
  parallel:       FALSE
  fixed:          HDX_transformed ~ Time
  random:         ~SampleID
 ----------------------------------------------------------------------
 Checking and transforming the training data format.
 Preparing the training data for fitting...
 Fitting the method...

Error in data.frame(Est = lme4::fixef(ifit)[iRAND], SE = sqrt(diag(as.matrix(vcov(ifit)))[iRAND])) :
row names contain missing values

If I then change all the variables in the model to numeric variables, it works and performs the clustering. Is this how the package is meant to operate, and if so, why?

Thanks.

@niekdt
Collaborator

niekdt commented May 24, 2024

Combining numeric and factor covariates should be possible, since mixAK::GLMM_MCMC uses numeric model matrices. It's latrend that does the automatic factor conversion. Unfortunately, this functionality is not yet well tested, as you have experienced.

I'll look into it in the coming days.
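To illustrate the point about numeric model matrices, here is a minimal base R sketch. It is not latrend's or mixAK's actual code. It shows how a factor is expanded into dummy columns, and why a column name containing a space (such as the "SampleIDSample 1" in the first traceback) cannot be pasted into a formula string and parsed. The toy data frame and its values are made up, and using `make.names()` is only one possible workaround, assumed here for illustration.

```r
# Toy data; column names mirror the issue, values are invented
d <- data.frame(
  Time = c(0, 10, 60, 3600),
  SampleID = factor(c("Sample 1", "Sample 2", "Sample 1", "Sample 2"))
)

# A factor covariate is expanded into numeric dummy columns
X <- model.matrix(~ Time + SampleID, data = d)
print(colnames(X))  # "(Intercept)" "Time" "SampleIDSample 2"

# Pasting a non-syntactic column name into a formula string fails to parse;
# tryCatch returns the error message instead of stopping
bad <- paste("y ~ 1 +", "SampleIDSample 2")
parsed <- tryCatch(str2lang(bad), error = function(e) conditionMessage(e))
print(parsed)

# Syntactic names (spaces replaced by dots) parse without error
safe <- make.names(colnames(X)[-1])
ok <- str2lang(paste("y ~ 1 +", paste(safe, collapse = " + ")))
print(ok)
```

Renaming factor levels so they contain no spaces (for example "Sample1") before fitting would avoid the parse failure in this sketch; whether that is sufficient for latrend itself is not confirmed in this thread.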

@niekdt niekdt added this to the 1.6.2 milestone May 24, 2024