[Feature Request] Differing number of replicates in datasets #3

Open
nlgittens opened this issue Apr 29, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@nlgittens

Issue: ReX can only handle datasets with the same number of replicates at every timepoint (and perhaps across states as well?).

This may be quite a common problem: datasets can have missed timepoints, a different number of non-deuterated experiments, or a different number of replicates between states, and none of these cases can currently be handled.

This might be because the matrix is sized by the number of timepoints times the number of replicates, rather than by the number of distinct experiments. The error seems to come from the error_prediction function, but I can imagine there are knock-on effects in other functions too, since the number of timepoints is also defined elsewhere.
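To illustrate the suspected mismatch, here is a minimal base-R sketch on a synthetic design (the `hdx` data frame is made up for illustration and is not BRD4_apo). A numTimepoints × numRep layout expects 9 experiments, while only 8 distinct (Exposure, replicate) pairs were actually run.

```r
# Synthetic long-format design (illustration only): 3 timepoints,
# with the 0 s timepoint missing its third replicate
hdx <- data.frame(
  Exposure  = c(0, 0, 30, 30, 30, 300, 300, 300),
  replicate = c(1, 2, 1, 2, 3, 1, 2, 3)
)

# Replicates actually observed at each timepoint
rep_per_tp <- table(hdx$Exposure)
print(rep_per_tp)

# A numTimepoints x numRep layout assumes a full grid of experiments...
numTimepoints <- length(unique(hdx$Exposure))
numRep <- max(hdx$replicate)
grid_size <- numTimepoints * numRep

# ...whereas counting distinct (Exposure, replicate) pairs gives
# only the experiments that exist in the data
n_experiments <- nrow(unique(hdx[, c("Exposure", "replicate")]))

cat("grid:", grid_size, "observed:", n_experiments, "\n")
```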

Reproducible example:

```r
library(RexMS)        # BRD4_apo, cleanHDX() and rex()
library(dplyr)        # %>% and filter()
library(S4Vectors)    # DataFrame()
library(BiocParallel) # SerialParam()

data("BRD4_apo")

# Filter data so 0 s only contains 2 replicates; other timepoints contain 3 replicates
BRD4_apo <- BRD4_apo %>%
  filter(!(Exposure == 0 & replicate == 3))

BRD4_apo <- DataFrame(BRD4_apo)
BRD4_apo <- cleanHDX(res = BRD4_apo, clean = TRUE)
# Restrict to peptides ending before residue 100
BRD4_apo <- data.frame(BRD4_apo) %>% filter(End < 100)
BRD4_apo <- DataFrame(BRD4_apo)

numTimepoints <- length(unique(BRD4_apo$Exposure))
Timepoints <- unique(BRD4_apo$Exposure)
numPeptides <- length(unique(BRD4_apo$Sequence))

set.seed(1)
rex_test <- rex(HdxData = BRD4_apo,
                numIter = 100,
                R = max(BRD4_apo$End),
                density = "laplace",
                numtimepoints = numTimepoints,
                timepoints = Timepoints,
                seed = 1L,
                tCoef = c(0, rep(1, numTimepoints - 1)),
                phi = 1,
                BPPARAM = SerialParam())
```

```
Warning: 'package:stats' may not be available when loading
Warning: 'package:stats' may not be available when loading
Fold 1 ... Fold 2 ... Fold 3 ... Fold 4 ... Fold 5 ...
Warning in res$Uptake[res$Sequence == unique(res$Sequence)[j]] - rep(mu, :
  longer object length is not a multiple of shorter object length

Fold 1 ... Fold 2 ... Fold 3 ... Fold 4 ... Fold 5 ...
Warning in res$Uptake[res$Sequence == unique(res$Sequence)[j]] - rep(mu, :
  longer object length is not a multiple of shorter object length

Error: BiocParallel errors
  2 remote errors, element index: 1, 2
  0 unevaluated and other errors
  first remote error:
Error in .sd[j, ] <- rep(tCoef * numExch[[j]] * sqrt(sigmasq), each = numRep): number of items to replace is not a multiple of replacement length
```
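The final error message can be reproduced in base R without ReX. This is only a sketch, assuming (not confirmed from the ReX source) that the target matrix row has one column per observed experiment. Building the replacement with `rep(..., each = 3)` for a fixed replicate count gives the wrong length when one timepoint has fewer replicates, whereas `rep(..., times = reps)` with per-timepoint counts matches. `tp_sd`, `reps` and `sd_mat` are made-up names and values.

```r
tp_sd <- c(0.1, 0.2, 0.3)  # one value per timepoint (illustrative)
reps  <- c(2, 3, 3)        # replicates observed per timepoint

# One row per peptide, one column per observed experiment (8 here)
sd_mat <- matrix(0, nrow = 1, ncol = sum(reps))

# A fixed replicate count of 3 builds 9 values for 8 slots, which errors
msg <- tryCatch({
  sd_mat[1, ] <- rep(tp_sd, each = 3)
  NA_character_
}, error = function(e) conditionMessage(e))
print(msg)

# Repeating each timepoint by its own replicate count fills the row exactly
sd_mat[1, ] <- rep(tp_sd, times = reps)
print(sd_mat)
```

The recycling warning earlier in the log (`longer object length is not a multiple of shorter object length`) is the vector-arithmetic counterpart of the same length mismatch: R only warns there, while matrix subassignment stops with an error.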

@nlgittens nlgittens changed the title [BUG] A short description of the bug [BUG] Differing number of replicates in datasets Apr 29, 2024
@ococrook ococrook changed the title [BUG] Differing number of replicates in datasets [Feature Request] Differing number of replicates in datasets Apr 29, 2024
@ococrook ococrook added the enhancement New feature or request label Apr 29, 2024
@ococrook
Owner

Thanks Nathan, I had this one on my list. It's more of an enhancement than a bug. There are two ways to deal with this:

  1. Impute them
  2. Model them

Modelling them is quite computationally intensive, but lots of imputation can cause bias. How about I write a simple imputation script that warns if there are many missing values?
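Option 1 could look roughly like the following. This is a minimal base-R sketch, not the planned ReX script: `impute_replicates`, `max_frac` and the 10% default threshold are hypothetical, and the sketch uses simple mean imputation within each Sequence × Exposure group. It assumes long-format data containing only Sequence, Exposure, replicate and Uptake columns, with replicates numbered from 1.

```r
# Hypothetical helper (not part of ReX): fill missing replicates with the
# mean uptake of the observed replicates in the same Sequence x Exposure
# group, warning when the imputed fraction exceeds max_frac.
impute_replicates <- function(df, max_frac = 0.1) {
  n_rep <- max(df$replicate)
  # Full grid of every Sequence x Exposure x replicate combination
  full <- expand.grid(
    Sequence  = unique(df$Sequence),
    Exposure  = unique(df$Exposure),
    replicate = seq_len(n_rep),
    stringsAsFactors = FALSE
  )
  out <- merge(full, df, by = c("Sequence", "Exposure", "replicate"),
               all.x = TRUE)
  missing <- is.na(out$Uptake)
  frac <- mean(missing)
  if (frac > max_frac) {
    warning(sprintf("%.1f%% of replicate measurements were imputed",
                    100 * frac))
  }
  # Group-wise mean of observed uptake, aligned with the rows of `out`
  grp_mean <- ave(out$Uptake, out$Sequence, out$Exposure,
                  FUN = function(x) mean(x, na.rm = TRUE))
  out$Uptake[missing] <- grp_mean[missing]
  out$imputed <- missing
  out[order(out$Sequence, out$Exposure, out$replicate), ]
}

# Synthetic example: 0 s is missing its third replicate
toy <- data.frame(
  Sequence  = "PEPTIDE",
  Exposure  = c(0, 0, 30, 30, 30),
  replicate = c(1, 2, 1, 2, 3),
  Uptake    = c(0.1, 0.3, 2.0, 2.2, 2.4)
)
filled <- impute_replicates(toy)  # warns: 1 of 6 rows (16.7%) imputed
print(filled)
```

A group with no observed replicates at all would be filled with `NaN`, which is one reason a warning on heavy imputation, as suggested above, is worth keeping.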

@ococrook
Owner

An example dataset might be useful!
