Advice on comparator selection for educational outcomes #48

lucyv28 · 2026-03-30T15:30:28Z

lucyv28
Mar 30, 2026

I am currently undertaking a research project using ECHILD, exploring educational outcomes in a specific disease cohort and would appreciate some advice.

I am uncertain about the most appropriate approach for selecting a control group. Would it be reasonable to use nationally reported statistics (e.g. exam performance and absence rates) as a comparator? If using national data, I assume it would need to be stratified by academic year and possibly other demographic factors.

I have noticed that many studies using the ECHILD database identify control groups from within the dataset itself. Could anyone advise on best practices for doing this, particularly in terms of matching or adjustment methods?

Any guidance or examples from previous work would be greatly appreciated.

zcabh35 · 2026-03-30T17:21:46Z

zcabh35
Mar 30, 2026
Maintainer

Howdy

Please see one of our latest publications (as of March 2026) for inspiration: https://onlinelibrary.wiley.com/doi/10.1111/ppe.70122

The following advice is not meant to act as the gold standard, but it explains how previous work has approached this and why. Others are very much welcome to debate it, especially those who are more knowledgeable.

Firstly, it is hard to establish causality when looking at the effect of a disease, because doing so requires understanding the aetiology of that disease. Some might say it could be possible with genetic data and methods such as instrumental variables or Mendelian randomization, but such data are not in ECHILD. The work described here is therefore descriptive.

Secondly, given how large the ECHILD dataset is (i.e. it contains all state-funded students and linkable state-funded hospital activity), there is a chance that the nationally reported statistics contain the children you are investigating. Unless the national statistics are also broken down by the specific disease, your cases are likely nested within them, so the aggregated national figures would be biased towards this group of children, although how much this affects the results depends on the rarity of the disease. If your goal is to use the nationally reported statistics as a comparator, this is fine, but please use the term "comparator" and not "control", as you don't know the disease status in the national comparator data. If you are using the national comparator data, you should replicate the method used to produce the national statistics, e.g. if they were stratified by academic year, and possibly other demographic factors, yours should be too. This way you can draw a chart or graph that shows the national comparator vs ECHILD.
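To make the point about rarity concrete, here is a minimal sketch of the arithmetic, assuming the national figure is a simple weighted average of cases and non-cases. The function name and all the numbers are hypothetical and purely illustrative; none of them come from ECHILD or from published statistics.

```python
def implied_non_case_mean(national_mean, case_mean, case_share):
    """Back out the mean among non-cases from a national mean that includes cases.

    Assumes: national_mean = case_share * case_mean + (1 - case_share) * non_case_mean
    """
    if not 0 <= case_share < 1:
        raise ValueError("case_share must be in [0, 1)")
    return (national_mean - case_share * case_mean) / (1 - case_share)


# Hypothetical absence rates (%) for one academic year; not real figures.
national_absence = 5.0
case_absence = 9.0

# The rarer the disease, the closer the national figure is to the non-case figure.
for share in (0.001, 0.01, 0.10):
    non_case = implied_non_case_mean(national_absence, case_absence, share)
    print(f"case share {share:.3f}: implied non-case absence {non_case:.3f}%")
```

For a rare disease the national figure and the implied non-case figure are almost the same, so the overlap matters little; for a common one the gap grows. In practice you would repeat this within each stratum (e.g. academic year) that the national statistics report.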

Thirdly, if you want to show the association between having a phenotypical case definition for a disease in HES versus the absence of such a case definition (i.e. controls), you should use controls from within ECHILD; this way your comparator group doesn't include your own cases. Remember, matching and adjustment (in a regression) are both methods for controlling for differences in the distribution of some variables (which variables is disease specific and requires domain knowledge), and each comes with different assumptions. I think matching reduces variation by selecting individuals who are similar to your cases, but at the cost of excluding individuals who weren't matched, which makes the matched group less representative of the control population. Matching is often used in causal work, but not always. Adjustment, on the other hand, allows you to include the whole control group, but depending on the distribution of your variables it might move your estimate, e.g. if your control group is far more likely to be in one demographic than the cases.
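As a rough illustration of the difference, here is a small sketch on toy records. Everything in it is an assumption for illustration: the field names (`year`, `sex`, `absence`), the data, and both helper functions are hypothetical, and the second function uses stratification weighted to the case distribution as a simple stand-in for regression adjustment, not a regression itself.

```python
import random
from collections import defaultdict
from statistics import mean


def exact_match(cases, controls, keys, ratio=1, seed=0):
    """Draw up to `ratio` controls per case with identical values on `keys`.

    Sampling is without replacement. Returns (matched_controls, n_unmatched_cases).
    """
    rng = random.Random(seed)
    pool = defaultdict(list)
    for ctrl in controls:
        pool[tuple(ctrl[k] for k in keys)].append(ctrl)
    for stratum in pool.values():
        rng.shuffle(stratum)
    matched, unmatched = [], 0
    for case in cases:
        stratum = pool[tuple(case[k] for k in keys)]
        if not stratum:
            unmatched += 1  # no control shares this case's covariate values
            continue
        for _ in range(min(ratio, len(stratum))):
            matched.append(stratum.pop())
    return matched, unmatched


def stratified_difference(cases, controls, keys, outcome):
    """Case-minus-control difference in `outcome`, computed within strata and
    weighted by the number of cases per stratum. Strata without controls are skipped."""
    by_case, by_ctrl = defaultdict(list), defaultdict(list)
    for r in cases:
        by_case[tuple(r[k] for k in keys)].append(r[outcome])
    for r in controls:
        by_ctrl[tuple(r[k] for k in keys)].append(r[outcome])
    total, weight = 0.0, 0
    for stratum, vals in by_case.items():
        if stratum in by_ctrl:
            total += len(vals) * (mean(vals) - mean(by_ctrl[stratum]))
            weight += len(vals)
    if weight == 0:
        raise ValueError("no stratum contains both cases and controls")
    return total / weight


# Toy data only; not real ECHILD records.
cases = [
    {"year": 2019, "sex": "F", "absence": 9.0},
    {"year": 2019, "sex": "M", "absence": 8.0},
    {"year": 2020, "sex": "F", "absence": 10.0},
]
controls = [
    {"year": 2019, "sex": "F", "absence": 5.0},
    {"year": 2019, "sex": "F", "absence": 6.0},
    {"year": 2019, "sex": "M", "absence": 4.0},
    {"year": 2020, "sex": "F", "absence": 7.0},
    {"year": 2021, "sex": "M", "absence": 3.0},
]

matched, n_unmatched = exact_match(cases, controls, ["year", "sex"])
print(len(matched), "matched controls;", n_unmatched, "unmatched cases")
print("stratified difference:", stratified_difference(cases, controls, ["year", "sex"], "absence"))
```

With 1:1 matching, two of the five toy controls are never used (the spare 2019 female and the 2021 male), which is the representativeness cost mentioned above. The stratified version uses every control that shares a stratum with a case; a regression would go further and use controls in strata without cases as well, relying on its model assumptions to do so.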

The method you choose really depends on the goal of your analysis. Is it to compare against national statistics? Is it to compare cases versus non-cases (i.e. controls)? Are you trying to isolate the association of the disease? Are you trying to see how the disease is associated with outcomes across the whole population?

I believe the paper posted has the goal of comparing cases versus non-cases, and it looks at the association of the disease amongst the whole population.
