initialization failed for metabarcoding diet data #5

mfisher5 · 2023-09-27T21:40:50Z

I'd like to use zoid to model diet data from DNA metabarcoding (data_matrix) across predator sizes (measured as crab carapace width, in mm; CW_mm in design_matrix). But I am having trouble getting past this initialization error:

fit_3_prey <- fit_zoid(formula = y ~ CW_mm,
                       design_matrix = design_matrix,
                       data_matrix = as.matrix(data_matrix)/1000,
                       overdispersion = TRUE,
                       chains = 1,   # just for testing
                       iter = 500)   # just for testing

Chain 1: Rejecting initial value:
Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
Chain 1:   Stan can't start sampling from this initial value.
Chain 1: 
Chain 1: Initialization between (-2, 2) failed after 100 attempts. 
Chain 1:  Try specifying initial values, reducing ranges of constrained values, or re parameterizing the model.
[1] "Error : Initialization failed."
[1] "error occurred during calling the sampler; sampling not done"

In my starting data set, I have 58 individuals and 61 different prey species; I removed rare taxa that occurred in fewer than three individuals.

I tried aggregating species by family (58 individuals x 57 classes), class (58 individuals x 26 classes), and phylum (58 individuals x 26 phyla) to reduce the number of zeros in the matrix and the spread between the minimum and maximum read values, but I still end up with the same error.

The last time this happened I was able to divide the data matrix by 100 to make the error go away, but that doesn't seem to be working this time.

This is the distribution of size, in case it's helpful!


mfisher5 · 2023-09-27T21:49:06Z

zoid_data.zip

Each data matrix in the zip folder was used to construct the zoid data and design matrices with the following code:

design_matrix = zoidIN[,which(names(zoidIN)=="CW_mm"), drop=FALSE]
data_matrix = zoidIN[,which(names(zoidIN)!="CW_mm")]
design_matrix$y = 1 # dummy variable

fit_1_prey <- fit_zoid(formula = y ~ CW_mm,
                       design_matrix = design_matrix,
                       data_matrix = as.matrix(data_matrix),
                       overdispersion = TRUE,
                       chains = 1,   # for testing; change to 4
                       iter = 20)    # for testing; change to 5000

ericward-noaa · 2023-09-28T03:13:52Z

I think the general sparsity of the data is still a problem -- you have a few taxa that show up in only 3 sites or fewer. I was able to get it to work fine for taxa showing up in 10 or more sites. This threshold is arbitrary and you probably want to play with it. But you could either drop those taxa or aggregate them to some higher group.


z <- ceiling(data_matrix/1.0e100) # presence/absence: any positive read count becomes 1, zeros stay 0
indx <- which(apply(z,2,sum) >= 10) # taxa present in 10 or more sites
# fit to only taxa with 10+ sites
fit_1_prey <- fit_zoid(formula = y ~ CW_mm,
                       design_matrix = design_matrix,
                       data_matrix = as.matrix(data_matrix[,indx]) / 1000,
                       overdispersion = FALSE,
                       overdispersion_sd = 0.1,
                       chains = 1,   # for testing; change to 4
                       iter = 20)    # for testing; change to 5000
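
For anyone who wants to try different cut-offs, the occurrence filter can also be written with a logical presence/absence matrix. This is only a minimal base-R sketch on an invented toy matrix; `reads`, `min_occurrence`, and the taxon names are hypothetical and not part of the data in this issue.

```r
# Toy read-count matrix: 6 samples (rows) x 4 taxa (columns); all values invented.
reads <- matrix(
  c(120,  0, 0, 40,
     80,  0, 5, 10,
      0,  0, 0, 25,
    300, 12, 0, 60,
     45,  0, 0,  0,
     10,  0, 0, 75),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("crab", 1:6),
                  c("taxonA", "taxonB", "taxonC", "taxonD"))
)

min_occurrence <- 3  # hypothetical cut-off; tune for your own data

presence  <- reads > 0            # TRUE where a taxon was detected in a sample
n_samples <- colSums(presence)    # number of samples in which each taxon occurs
keep      <- n_samples >= min_occurrence

# Keep only taxa that meet the cut-off; drop = FALSE preserves the matrix shape.
filtered <- reads[, keep, drop = FALSE]

print(n_samples)
print(colnames(filtered))
```

For non-negative read counts, `reads > 0` gives the same presence/absence information as the `ceiling(data_matrix/1.0e100)` trick above.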

OleShelton · 2023-09-28T03:30:19Z

Hmmm. If you wanted to futz directly with the inits, that’s an easy thing to try: init_r = 1, for example.


mfisher5 · 2023-10-03T23:42:18Z

Got it, it's running again! And it looks like as I get into phyla as opposed to species, when the matrix isn't as sparse, I don't have to divide the data matrix by as large a number, or I'm able to drop the minimum occurrence cut-off to 5.

I was also able to get zoid running when I aggregated the crabs themselves into groups -- since my carapace widths are only measured to the nearest 0.5mm, I summed up the reads for each prey species across all crabs with the same carapace width. That gave me 19 rows / individual measurements. And then I removed any species that only showed up in a single crab (to end up with 87 columns), and only had to divide the entire data matrix by 10. But I think for my question it might be better to have size-based estimates derived from multiple observations, so I'll probably drop or aggregate rare species rather than aggregate crabs.
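
In case it helps anyone reading later, the pooling step described above can be sketched in base R with `rowsum()`. This is a rough illustration on invented numbers, not the exact code used for this data set; `cw_mm`, `reads`, and the species names are hypothetical, and counting occurrences on the individual crabs (before pooling) is an assumption.

```r
# Toy data: 5 crabs with carapace width to the nearest 0.5 mm and
# read counts for 3 prey species. All values are invented.
cw_mm <- c(10.5, 10.5, 12.0, 12.0, 14.5)
reads <- matrix(
  c(100,  0, 20,
     50,  0,  0,
      0, 30, 10,
     25,  0,  5,
     60,  0,  0),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("crab", 1:5), c("sp1", "sp2", "sp3"))
)

# Drop species detected in only a single crab
# (assumption: occurrences are counted on individual crabs).
n_crabs    <- colSums(reads > 0)
reads_kept <- reads[, n_crabs > 1, drop = FALSE]

# Sum reads across all crabs that share the same carapace width.
# rowsum() returns one row per unique group value, sorted by group.
pooled <- rowsum(reads_kept, group = cw_mm)

# Matching design matrix: one row per carapace width, plus the dummy response.
design_matrix <- data.frame(CW_mm = as.numeric(rownames(pooled)), y = 1)

print(pooled)
print(design_matrix)
```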

Thank you!

mfisher5 · 2023-10-04T18:19:13Z

Briefly re-opening this for my own documentation -- because I haven't calibrated the reads for all species/taxa for their different PCR amplification efficiencies, I can't aggregate individual taxa by phylum (or another higher taxonomic level). Otherwise the reads I'm using as input for each category would reflect not only true abundance but also the species make-up of that category in each crab, and would not be truly comparable across crabs. @invertdna is that correct?

invertdna · 2023-10-04T18:25:14Z

Yep, correct!
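
A toy calculation may make the point above concrete. It assumes a deliberately simplified model in which expected reads are proportional to true abundance times a per-species efficiency factor; the efficiencies, species, and phylum assignments are all invented for illustration.

```r
# Hypothetical relative amplification efficiencies (invented values).
efficiency <- c(spA = 1, spB = 4, spC = 1)
phylum     <- c(spA = "Arthropoda", spB = "Arthropoda", spC = "Mollusca")

# True diet proportions: both crabs ate 50% Arthropoda and 50% Mollusca,
# but the arthropod species differs between them.
true_prop <- rbind(
  crab1 = c(spA = 0.5, spB = 0.0, spC = 0.5),
  crab2 = c(spA = 0.0, spB = 0.5, spC = 0.5)
)

# Simplified model (assumption): reads scale with abundance x efficiency,
# then are normalised to proportions within each crab.
amplified <- sweep(true_prop, 2, efficiency, "*")
read_prop <- amplified / rowSums(amplified)

# Aggregate uncalibrated read proportions to phylum.
phylum_prop <- t(rowsum(t(read_prop), group = phylum))

print(round(phylum_prop, 2))
```

Under these assumptions the two crabs have identical true phylum-level diets, yet the aggregated read proportions for Arthropoda come out as 0.5 for crab1 and 0.8 for crab2, purely because of which species made up the category.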


mfisher5 closed this as completed Oct 3, 2023
