Always specify subject level in BIDS Model #641

Need to check if this is true, but currently, if there is only one run per subject, the BIDS model doesn't specify a subject level (even if it's just a copy), and the first level gets passed to the group level.

Let's make it so the subject level is always written. This is more explicit, so you can always find the subject-level maps.

Also need to check whether this is how I construct models with pyns.

Comments
There is a potential conflict. In HBN, there is only one run, but importantly the single run has no run label, so the output "looks like" a subject-level map, because a subject-level map is typically denoted by the lack of a run entity. This causes two problems.
I don't think we can safely assign

On second thought, in the case where there's only one run, and no run ID is provided, I think it might actually be fine to only have one set of maps. Anyone trying to programmatically read either a run-level map or a subject-level one will still get the right image based on the entities they have available. It might be a little confusing if someone's browsing a directory like this, but I can live with that.
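As a hedged illustration of that entity-based lookup with pybids (the dataset path, task name, and suffix below are assumptions for the sake of example, not values from this thread):

```python
from bids import BIDSLayout

# Hypothetical fitlins derivatives directory.
layout = BIDSLayout("/data/hbn/derivatives/fitlins", validate=False)

# With a single, unlabeled run per subject, a query that includes
# run-level entities and one that omits them resolve to the same image.
run_maps = layout.get(subject="01", task="rest", suffix="statmap",
                      extension=".nii.gz")
subject_maps = layout.get(subject="01", suffix="statmap",
                          extension=".nii.gz")
```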
I just confirmed that it is currently necessary for fitlins to omit the subject level if there aren't multiple inputs (i.e., runs).

Hey, sorry, swamped with non-FitLins and family stuff at the moment. Please let me know if this is urgent; otherwise I'm unlikely to respond until next week. Edit: Also, ping me next week to remind me, if I don't get to this quickly.
Nope, not urgent.

@effigies you have the bandwidth for this now?

Yes. Let me look...
So my interpretation of this is that you want to always write a subject-level step in your model, not necessarily that FitLins has to implicitly build one? I think that's fine. Possibly the easiest approach will be to make the first level

I agree that we definitely don't want to generate

Tal's summary was my recollection of the outstanding issue with fitlins right now. The idea is also that by always including a
No, we don't currently adjust for special situations, such as a single input, by simply passing through. However, I think if we handled fixed effects at the subject level, the degenerate case of that would be a pass-through. So maybe the correct approach is to decide how we're going to say "combine runs as fixed effects at the subject level" in the spec, implement that in pybids/FitLins, and then you add that flag (or whatever) in NeuroScout.

Agreed, but I'm not sure what you mean by "add that flag"? Shouldn't the inclusion of a subject level just mean fixed effects?

Is that a universal desire, to never treat runs or sessions as random effects? Edit: I thought from previous discussions that any given level should be able to be marked as a fixed- or random-effects combination.
I think that's right. There are two separate issues here: one for fitlins and one for the spec.

Note though that for all of the models we (and, I suspect, anyone else) have been working with so far, the distinction between fixed and random is immaterial, because we never have repeated measures that need to be modeled as random effects. I.e., the conventional approach is to model runs within a subject-level analysis as fixed. At the group level, we also model subjects as fixed, because there's only one estimate per subject (the within-subject averaging in the previous step means that, implicitly, the subject-level variability is being modeled as a random variable). So at the moment, as long as the fitlins issue is addressed, we should be able to fit pretty much any standard model that one might normally want to fit using a summary-statistics approach. Down the line, when we have legit mixed-effects estimators we can use to handle multiple random effects (e.g., stimuli and subjects) in one giant dataset-level analysis, the spec-level issue will need to be addressed.
Okay, this may be me being shit at keeping stats straight in my head, but my impression was that the nistats SecondLevelModel treats its inputs as random effects, and that what we should generally be doing at the subject level is combining runs as fixed effects using a more algebraic approach, e.g. a weighted average using the variance estimates.
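For concreteness, here is a minimal sketch of that kind of inverse variance-weighted fixed-effects combination, written in plain numpy under the assumption that each run's effect and variance maps have already been flattened into arrays; this is illustrative, not code from fitlins or nistats:

```python
import numpy as np

def fixed_effects_combine(effects, variances):
    """Inverse variance-weighted fixed-effects combination.

    effects, variances : arrays of shape (n_runs, n_voxels) holding each
    run's parameter estimates and their estimated variances.
    """
    weights = 1.0 / variances                   # per-run precisions
    fe_variance = 1.0 / weights.sum(axis=0)     # variance of the combined estimate
    fe_effect = fe_variance * (weights * effects).sum(axis=0)
    fe_stat = fe_effect / np.sqrt(fe_variance)  # large-sample z-like statistic
    return fe_effect, fe_variance, fe_stat
```

Note that with a single run this reduces to passing the input effect and variance through unchanged, which is the degenerate pass-through case mentioned above.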
Ah, yeah, the subject-level stat maps present an issue. I wasn't considering those. Let me think on it a bit.
Okay, I thought about it some more, and also went back and looked at the spec (I was previously looking at an out-of-date draft, and had forgotten that we do in fact have draft language describing variance component specification). Here are my thoughts. Also tagging @nicholst and @cmaumet for any additional insights. (To save folks coming to the thread late the trouble of reading the whole thing, the key question here is how users should specify that test statistics should be computed using a fixed-effects rather than a random-effects analysis.)

The main issue as I see it is that we need a different approach to handle the fixed vs. random distinction in the context of generating summary statistics-based stat maps than in standard mixed-effects models, where variables are assumed to be fixed unless they're specified as part of a variance component.

Consider the very common use case where we want to produce single-subject z-stat maps, given n runs as input. If we wanted to model runs as random, this would be trivially handled with an intercept-only model (i.e., y ~ 1, where the y's are the parameter estimates for each voxel). But the fixed-effects analysis is quite different: we probably want to do an inverse variance-weighted meta-analysis that takes as inputs the parameter estimates and the variances for all the runs (@cmaumet, I believe you brought this up in the spec document a long time ago, and I punted on it at the time; well, we can no longer punt :)).

In principle, we could introduce a new "type" of analysis one can specify in

What we could do instead, without having to blow things up, is deal with this in the

Setting aside whatever convention we end up with in the spec, there's also a fairly serious UX issue here, IMO. Thanks to the summary stats tradition, I'm guessing that most people will tend to simply assume that when they see a

Thoughts?
To be clear, a If

But to be clear, implementationally, if there is a

Is there another name we can come up with that's more intuitive (i.e., not an acronym that makes me think of hurricanes)?

Also, what's the implication for the way we've been doing things (random effects)? Since we are only taking the effects + variances to the next level (not the t-stats), how much of a difference does it make?

Also, in terms of passthrough being allowed if a model is identifiable, doesn't it now depend on the contrasts requested? (I.e., it wouldn't be allowed when there's only a single run or subject and a
Yes, you could specify them alongside one another. I don't think it's completely crazy to want to see the results of treating runs as both fixed and random effects; they answer different questions.

@adelavega there's no real implication for the way we do things, because we don't care about single-subject inference. The propagated estimates remain the same (in principle one could maybe forward the inverse variance-weighted estimate, but in practice that's something we probably want to leave to the estimator to decide).

Ideally it wouldn't be our job to worry about identifiability; if an estimator can't handle a particular model, it just fails, and the pipeline breaks. In principle, an estimator that worries about edge cases should recognize a one-run case and just return the same stat maps when doing fixed-effects aggregation; when doing a random-effects analysis, breaking seems fine. The concern here is just that, since we only support a single estimator right now, we should at least be able to gracefully handle the most common scenario we're likely to encounter (i.e., fixed-effects analysis at the run level, where there's a non-trivial chance of getting just one run as input).
Makes sense. Any example code for how to implement

To move forward with NeuroScout (i.e., generating valid models), we do need to settle the spec terminology, so I guess if there's no major opposition we go with the plan above. Otherwise, the lazy short-term solution would be the passthrough in the one-run case even with random effects.
I don't think we need to implement any statistical routines in fitlins; this is already handled by nistats in a first-level analysis. The issue from our perspective is being able to map the desired analysis onto the right nistats routine, i.e., instead of automatically routing run-level analysis to

I'd rather not rush finalization of the language in the spec, and am hoping @nicholst and/or @cmaumet will weigh in at some point. The expedient thing to do is probably to go with my suggestion above internally in fitlins, and we can always tweak things later. One way or another there needs to be logic internally that maps analyses onto nistats as appropriate; I'm guessing that changing the way the analysis type is detected in the spec will be fairly easy.
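To make the routing idea concrete, here is a hedged sketch of the sort of internal dispatch being described; the step object and its level/aggregation attributes are hypothetical inventions for illustration, while the two estimator classes are nistats' actual interfaces:

```python
from nistats.first_level_model import FirstLevelModel
from nistats.second_level_model import SecondLevelModel

def pick_estimator(step):
    """Hypothetical dispatch from a model step to a nistats routine.

    `step` is assumed to expose `.level` ("run", "subject", ...) and a
    hypothetical `.aggregation` attribute ("fixed" or "random").
    """
    if step.level == "run":
        return FirstLevelModel()   # fit the GLM against BOLD time series
    if step.aggregation == "fixed":
        # nistats has no first-class fixed-effects interface yet (see
        # nistats/nistats#281), so the combination would be done by hand.
        return None
    return SecondLevelModel()      # random-effects aggregation
```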
I was under the impression that nistats didn't have explicit support for fixed effects yet, although @effigies says it may be in there somewhere. Agreed that the code shouldn't live in fitlins. I'm fine with going with the suggestion internally for now. It wouldn't change how the previously defined

The nistats

The tricky part, from our perspective, is that taking advantage of nistats this way may require fitlins to look ahead a level to see what the model wants done with the run-level outputs, which is not so ideal.

Ah, yeah, that would be a bit of a pain. There's an open issue to add explicit fixed effects to nistats: https://github.com/nistats/nistats/issues/281
Yeah, it's not that big a deal to do the fixed effects ourselves, as long as we know it at some point. It's like one or two lines of code to actually perform the combination. If we can convince Bertrand to expose it as a first-class interface, then cool, but my concern at this point is being able to deterministically derive what should be done from a model spec.

I think from Tal's suggestion it would be clear what should be done. I don't think using
Alright, I looked into nistats, and I think it's possible to use the

The other thing that nistats does, which needs to be added, is using a learned masker for loading and writing out images. I'll give this a go, and maybe, if done well enough, it could just be merged into nistats and we could import it.
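For the masker part, a minimal sketch of the load/compute/write round-trip using nilearn's NiftiMasker (the file names here are hypothetical, and the mean stands in for whatever statistic is actually computed):

```python
from nilearn.input_data import NiftiMasker

masker = NiftiMasker(mask_img="mask.nii.gz")  # hypothetical mask image

# Fit the masker and load per-run images as an (n_images, n_voxels) array.
data = masker.fit_transform(["run-1_effect.nii.gz", "run-2_effect.nii.gz"])

stat = data.mean(axis=0)  # placeholder computation on the voxel array

# Project the result back into image space and write it out.
masker.inverse_transform(stat).to_filename("sub-01_effect.nii.gz")
```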
Coming late to the party here. +1 for

At the risk of forking the thread, can I inquire about the variance? If "varcopes" are present, but I want to do an "OLS"-style analysis, i.e. ignore them, is it clear how the user makes this happen?
@nicholst good point: we don't currently have a way to specify that. Do you have suggestions? If we wanted to build this into the spec, it could be an option in the

The alternative is to leave it to the estimator, and let the user pass whatever they like via the
I fear I feel too removed to make good suggestions. To be clear, wanting to do OLS ignoring the variance is a special case that only applies when (at a given level) we have a single observation per subject, but it's a very high-frequency "special" case in some large circles. Anyway,

While we're talking variance, there's also the issue that both SPM and FSL have facilities for (and advocate) group-specific variance in multi-group (e.g., two-sample) analyses. Have we thought about how that will be articulated?

And for more fun, SPM's repeated-measures model in fact assumes unstructured and heteroscedastic covariance by default. (Though I think we agreed that repeated-measures variance modelling like this is a pain, better handled via random intercepts, slopes, etc.)
@nicholst I think unequal variances can in principle be accommodated by the

This seems pretty clunky/confusing, so I'm going to think about it some more and see if maybe there's some way to restructure things that makes sense.

EDIT: Specifying the covariance structure seems more difficult... My inclination is to push that into the software-specific specification.
The other way to do things would be to separate "steps/levels" and "analyses". For example, if you're asking for both a

One solution would be to allow multiple "models/analyses" that specify the same "level". These analyses would be given the same inputs but run independently, and then the results from both would be joined and passed to the next level. This may cause some problems for the next level if not done carefully, though (i.e., two similar inputs, such as the same contrast produced by two different analyses).
How should
If we want to avoid conditioning major decisions on the level of analysis (which I think we do), then we have to assume that

We should also be clear (I can't find this in the spec anywhere right now) that if the user doesn't explicitly say anything about aggregation at a given level, then all the outputs are simply collected and passed forward, without any aggregation. E.g., if there are multiple runs per subject, and the subject-level model doesn't specify contrasts (or auto contrasts), then the level after subject (typically dataset) will get a concatenation of all run-level estimates for all subjects passed in.

EDIT: Ignore the second paragraph; the spec does make the chaining behavior explicit.
I like the idea of mandating strings to specify which kind of contrast. The issue is that this doesn't let you specify which variables to create contrasts for. Here's what I see as the common use cases for

For 1) we can allow either a string (

Alternatively, since most of these models will be machine-generated, we could keep it simple and only allow 1).