Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

initialization failed for metabarcoding diet data #5

Closed
mfisher5 opened this issue Sep 27, 2023 · 6 comments
Closed

initialization failed for metabarcoding diet data #5

mfisher5 opened this issue Sep 27, 2023 · 6 comments

Comments

@mfisher5
Copy link

I'd like to use zoid to model diet data from DNA metabarcoding (data_matrix) across predator sizes (measured as crab carapace width, in mm; CW_mm in design_matrix). But I am having trouble getting past this initialization error:

fit_3_prey <- fit_zoid(formula = y ~ CW_mm, 
                      design_matrix = design_matrix, 
                      data_matrix = as.matrix(data_matrix)/1000,
                       overdispersion = TRUE,
                       chains=1,   # just for testing
                       iter=500)   # just for testing
Chain 1: Rejecting initial value:
Chain 1:   Log probability evaluates to log(0), i.e. negative infinity.
Chain 1:   Stan can't start sampling from this initial value.
Chain 1: 
Chain 1: Initialization between (-2, 2) failed after 100 attempts. 
Chain 1:  Try specifying initial values, reducing ranges of constrained values, or re parameterizing the model.
[1] "Error : Initialization failed."
[1] "error occurred during calling the sampler; sampling not done"

In my starting data set, I have 58 individuals and 61 different prey species; I removed rare taxa that occurred in fewer than three individuals.

I tried aggregating species by family (58 individuals x 57 classes), class (58 individuals x 26 classes), and phylum (58 individuals x 26 phyla), to reduce the amount of "0" values in the matrix, and the variation between the minimum / maximum read values; but I still end up with the same error.

The last time this happened I was able to divide the data matrix by 100 to make the error go away, but that doesn't seem to be working this time.

This is the distribution of size, in case it's helpful!
image

@mfisher5
Copy link
Author

mfisher5 commented Sep 27, 2023

zoid_data.zip

each data matrix in the zip folder was used to construct the zoid data / design matrices using the following code:

design_matrix = zoidIN[,which(names(zoidIN)=="CW_mm"), drop=FALSE]
data_matrix = zoidIN[,which(names(zoidIN)!="CW_mm")]
design_matrix$y = 1 # dummy variable

fit_1_prey <- fit_zoid(formula = y ~ CW_mm, 
                      design_matrix = design_matrix, 
                      data_matrix = as.matrix(data_matrix),
                       overdispersion = TRUE,
                       chains=1,   # for testing; change to 4
                       iter=20)   # for testing; change to 5000

@ericward-noaa
Copy link
Member

I think the general sparsity of the data is still a problem -- you have a few taxa that show up 3 times or less in a number of sites. I was able to get it to work fine for taxa showing up in 10 or more sites. This threshold is arbitrary and you probably want to play with it. But you could either drop those sites or aggregate them to some higher group.


z <- ceiling(data_matrix/1.0e100) # turn to matrix of 0s and 1s for counts
indx <- which(apply(z,2,sum) > 10) 
# fit to only taxa with 10+ sites
fit_1_prey <- fit_zoid(formula = y ~ CW_mm, 
                       design_matrix = design_matrix, 
                       data_matrix = as.matrix(data_matrix[,indx]) / 1000,
                       overdispersion = FALSE,
                       chains=1,   # for testing; change to 4
                       iter=20, overdispersion_sd=0.1)   # for testing; change to 5000

@OleShelton
Copy link
Collaborator

OleShelton commented Sep 28, 2023 via email

@mfisher5
Copy link
Author

mfisher5 commented Oct 3, 2023

Got it, it's running again! And it looks like as I get into phyla as opposed to species, when the matrix isn't as sparse, I don't have to divide the data matrix by as large a number, or I'm able to drop the minimum occurrence cut-off to 5.

I was also able to get zoid running when I aggregated the crabs themselves into groups -- since my carapace widths are only measured to the nearest 0.5mm, I summed up the reads for each prey species across all crabs with the same carapace width. That gave me 19 rows / individual measurements. And then I removed any species that only showed up in a single crab (to end up with 87 columns), and only had to divide the entire data matrix by 10. But I think for my question it might be better to have size-based estimates derived from multiple observations, so I'll probably drop or aggregate rare species rather than aggregate crabs.

Thank you!

@mfisher5 mfisher5 closed this as completed Oct 3, 2023
@mfisher5
Copy link
Author

mfisher5 commented Oct 4, 2023

briefly re-opening this for my own documentation -- because I haven't calibrated the reads for all species/taxa for their different PCR amplification efficiencies, I can't aggregate individual taxa by phyla (or other higher taxonomic level); otherwise the reads I'm using as input for each category will reflect not only true abundance but also species make-up of the category in each crab, and not be truly comparable across crab. @invertdna is that correct?

@invertdna
Copy link

invertdna commented Oct 4, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants