-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
High Rhat and ESS issues for K>1 with spatial analysis #31
Comments
Hello, So it seems like you have a few different questions/problems: (1) you're getting mixing warnings; (2) the analyses are taking a long time; (3) the traceplots for K=2 "don't agree". I'll respond to each below:
hope that helps, |
Gideon, When I look at the trace plots they dont all seem to agree. It seems like you mean they should all be in the same ballpark on the Y axis then right? For example the supplied k2 plots all tend to float between 45950 and 45980. I guess I saw the chain 6 in that run with a slight trend upwards and got worried. I will run this with smaller iterations, and monitor the chains again and let you know how things go, ie: I appreciate your guidance. |
I guess I’m still not sure I understand what you mean by “agree”. The MCMC algorithm stochastically samples from the posterior probability surface, so no two independent runs should be identical. Instead, if they’re performing well, you should see that they’re converging on the same stationary distribution - ie, if you were to compare the marginal distributions for any given parameter across multiple independent runs, they should be on top of each other. But the sampled value of the parameter in a given MCMC iteration isn't expected to be the same between any two runs. I agree that the chain 6 for K=2 is maybe trending somewhere, but if 5/6 chains largely agree, that's pretty good. Let me know if you have further troubles on this issue. |
HI Gideon, I have run the analysis, and I still get the ESS warnings, but the distributions seem to be OK. But this is one K4 run performed particularly poorly. I would like to confirm before I close this, If one run of K4 spatial analysis performed poorly is it OK to include it when comparing layer contributions? It seems that additional layers above K2 add little information. However, I have a new problem. I understand I should also run the But this might require a different issues post, because its not relevant to the high Rhat issues. Let me know if you would like me to open a new issue. In order to expedite this, I will be running much smaller iterations (1,000 instead of 10,000, and 5 iterations), and see if I can get the cross validation method to work. Thanks again for all your help, |
Hi Cody, Yeah that one run looks wonky, but if the others don't then I wouldn't worry too much about it. W/r/t the xvalidation - it's a hard rule of the likelihood function that there have to be more samples than loci in each partition. And as you've found, missing data can contribute to positive-definite woes. If you'd like to open a new issue you're welcome to, but I'm not sure how much help I can provide, as these problems seem to be features of your data. You could try dropping samples with lots of missing data (that'll probably help with the positive-definite stuff, and will definitely help decrease the samples:loci ratio). Alternatively, you could try lumping samples from the same location. hope that helps, |
Hello,
I am currently running conStruct with a fairly small dataset of 74 individuals and 525 SNPs, I cannot use my full data set of 1113 neutral SNPs because there seems to be too much missing data, I could only get the analysis to run after filtering it down this much.
I have run conStruct at
K=1
,n.chains=3
andn.iter=1e4
iterations without any Tail and Bulk ESS issues or Rhat problems. When ever I run K>1 I get warnings. I tried finding a solution in other issues with the same problem (for example here and here), but as of yet nothing seems to have worked.My latest run took all weekend with k=1:2, but K=2 but still gives me these warnings:
The largest R-hat....
Bulk Effective Sampling Sizes (ESS)...
Tail Effective Sampling Sizes (ESS)...
Here is the code I used and associated trace save files.
k01sp_trace.plots.chain_1.pdf
k01sp_trace.plots.chain_2.pdf
k01sp_trace.plots.chain_3.pdf
k01sp_trace.plots.chain_4.pdf
k01sp_trace.plots.chain_5.pdf
k01sp_trace.plots.chain_6.pdf
k2sp_trace.plots.chain_1.pdf
k2sp_trace.plots.chain_2.pdf
k2sp_trace.plots.chain_3.pdf
k2sp_trace.plots.chain_4.pdf
k2sp_trace.plots.chain_5.pdf
k2sp_trace.plots.chain_6.pdf
The trace plots for K=2 don't seem to agree for any iteration, but they look pretty good for K=1.
I would like to test higher values of K (up to 8), but at
... n.chains= 6, n.iter = 3e4, ...
it would take a long time. I have run a non-spatial analysis in the past with this data, but that gave me similar warnings. I am mainly interested in the spatial analysis. This is because so far with this data there seems to be an effect of IBD. I have run a hierarchical AMOVA that indicates a panmictic population, with a small percentage of the variation coming from one of my stratifications. My STRUCTURE analysis fails to find a solution (multiple "peaks" of deltaK , and all with relatively low magnitude).I do wonder, could the poor mixing and ESS warnings at K>1 be because there is no ancestral K value greater than 1 in my data? Is this an unreasonable assumption or can I move forward ignoring those warnings and see what the cross-validation and contribution layers say?
Let me know what other information or data you may need. I appreciate any assistance you can provide.
The text was updated successfully, but these errors were encountered: