This information fits a retrospective observational cohort comparison. Details follow.

---

**Time Frame**
<br>
Data Collected over 2-Year Span

**Unit of Analysis**
<br>
Individual Child

**Group Variable**
<br>
Parent Self-Reported SLD and/or ADHD (`bool`)

**Primary/Dependent Variable**
<br>
WPS, Words Per Sentence (`float`)

**Secondary/Independent Variables**
<br>
- Total Communication Score, PLS5/Bayley 4 (`float`) 
- Age in Months at Time of Testing (`int`) 
- Gender (`string` / `enum`) 
- Parent Demographics (`string[]` and/or `bool[]`) 
    - *Can also be used as covariates or for descriptive balance checks*

---

**Research Question**:
<br>
Do DHH children whose parents self-report a history of SLD and/or ADHD have significantly lower WPS scores than DHH children whose parents do not report such a history?

**Null**:
<br>
There is no difference in mean WPS scores between children of parents who self-report SLD/ADHD and those who do not.

**Alternate**:
<br>
There is a difference in WPS scores between the two groups.

**Analysis**:
<br>
The sample is small, so non-normality is likely; nonparametric methods are the default choice.

---

**Descriptive**
- Mean, median, SD, min-max WPS by group
- Breakdowns of Age and Gender

**Inferential**
- Nonparametric Test: Mann-Whitney U test for WPS between the two groups
- T-Test if assumptions are met
- Effect Size: r or Cliff's delta to get interpretable magnitude, even with small n
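
The effect size mentioned above can be computed directly from pairwise comparisons between the two groups. The following is a minimal sketch, not the study's code: `cliffs_delta` is a hypothetical helper, and both lists hold placeholder values rather than study data. Cliff's delta is the proportion of cross-group pairs where the first group is higher minus the proportion where it is lower. It relates to the Mann-Whitney U statistic of the first group by delta = 2U / (n1 * n2) - 1, so it pairs naturally with that test.

```python
from itertools import product


def cliffs_delta(x, y):
    """Cliff's delta: P(x > y) - P(x < y) over all cross-group pairs."""
    if not x or not y:
        raise ValueError("both groups need at least one observation")
    greater = sum(1 for a, b in product(x, y) if a > b)
    less = sum(1 for a, b in product(x, y) if a < b)
    return (greater - less) / (len(x) * len(y))


# Placeholder WPS values for illustration only (not study data)
wps_no_history = [1.0, 1.7, 2.0, 4.2]
wps_history = [1.0, 1.2, 1.5]

delta = cliffs_delta(wps_no_history, wps_history)
print(f"Cliff's delta = {delta:.3f}")
```

Values near 0 indicate heavy overlap between the groups, while values near +1 or -1 indicate that one group is almost always higher. Ties contribute to neither count, which matters here because many WPS scores sit at the floor value of 1.0.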

**Covariate**
- ANCOVA with Age as a covariate if assumptions hold; otherwise control for Age through stratification or regression

**Supplementary**
- Correlation between WPS and PLS5/Bayley total scores
- Visualization via boxplot or scatterplot with group overlay

**Even with two small groups**
- Can visualize each child's data point, not just the group mean
- Can take a case series approach, with group trends supplemented by individual-level patterns; this is potentially powerful in developmental speech-language research with a low n (e.g. plotting age on the x-axis against WPS on the y-axis and marking group by color)
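
The per-child plot described above could look roughly like the sketch below. It assumes `matplotlib` is available (the notebook already imports it) and uses placeholder records, not study data; the tuple layout, labels, and colors are illustrative choices only.

```python
import matplotlib.pyplot as plt

# Placeholder records for illustration only (not study data):
# (age, WPS, parent self-reported SLD/ADHD)
children = [
    (2.6, 1.0, False),
    (2.7, 1.7, False),
    (2.8, 2.2, False),
    (2.8, 1.0, True),
    (2.9, 2.4, False),
    (2.9, 1.3, True),
]

groups = [
    (False, "No parent SLD/ADHD", "tab:blue"),
    (True, "Parent SLD/ADHD", "tab:orange"),
]

fig, ax = plt.subplots(figsize=(6, 4))
for flag, label, color in groups:
    # One scatter call per group so each gets its own color and legend entry
    ages = [age for age, _, has_history in children if has_history == flag]
    wps = [score for _, score, has_history in children if has_history == flag]
    ax.scatter(ages, wps, label=label, color=color)

ax.set_xlabel("Age")
ax.set_ylabel("WPS (words per sentence)")
ax.legend()
```

Plotting every child as its own point keeps outliers and floor effects visible, which a group mean or boxplot alone would hide at this sample size.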

---

**Reporting Structure**

**Participants**
- n = Total children (xx parent self-report, xx no self-report)
- Age Range = xx (mean; SD)
- Similar gender distribution across groups

**Measures**
- WPS derived from attempted language sample
- PLS5/Bayley total score
- Parent self-report of SLD/ADHD

**Analysis**
- Nonparametric group comparison of WPS
- Effect size
- Descriptive statistics of age, gender, and language score
- Exploratory scatterplots and individual-level data presentation

**Regression (optional)**
- $\text{WPS} = b_0 + b_1 \cdot \text{Parent SLD/ADHD} + b_2 \cdot \text{Age} + e$
- Gives the age-adjusted effect of parent SLD/ADHD on child WPS
- Provides a quantitative estimate of effect size
- Label options: "retrospective cohort comparison with small n" or "retrospective case series with group comparison"
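
The optional regression above can be sketched with an ordinary least squares fit. This is an illustration under stated assumptions rather than the study's analysis: the arrays hold placeholder values (not study data), the variable names are hypothetical, and `numpy.linalg.lstsq` is used only to show how the three coefficients are estimated. A fuller analysis would likely use a statistics package that also reports standard errors and confidence intervals.

```python
import numpy as np

# Placeholder values for illustration only (not study data)
age = np.array([2.6, 2.7, 2.7, 2.8, 2.8, 2.9, 2.9, 2.8])
parent_sld_adhd = np.array([0, 0, 0, 0, 1, 0, 1, 1])  # 1 = parent self-report
wps = np.array([1.0, 1.6, 1.7, 2.2, 1.0, 2.5, 1.4, 1.1])

# Design matrix columns: intercept (b0), group indicator (b1), age (b2)
X = np.column_stack([np.ones_like(age), parent_sld_adhd, age])

# Least-squares solution of X @ coef ~= wps
coef, _, rank, _ = np.linalg.lstsq(X, wps, rcond=None)
b0, b1, b2 = coef

print(f"b0 (intercept)       = {b0:.3f}")
print(f"b1 (parent SLD/ADHD) = {b1:.3f}")
print(f"b2 (age)             = {b2:.3f}")
```

Here `b1` is the estimated difference in WPS between the two groups at a fixed age. With very few children in one group the estimate will be unstable, so it is best read as descriptive.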


In [1]:
import pandas as pd
import statsmodels.stats.descriptivestats as ds
import scipy.stats as stats
import matplotlib.pyplot as plt
import researchpy as rp
import warnings

# Suppress unwanted warnings in script output
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=RuntimeWarning)
# Manually define a tab variable because \t isn't working right
# And I don't want to take the time to figure out why, when I can brute force it :)
tab = "        "

# Import data after exclusion from csv
df = pd.read_csv("/workspaces/codespaces-blank/data/dataset.csv")

In [7]:
# Descriptive Statistics:
# Count, Mean, Median, SD, min-max WPS and Language Scores by group
# Breakdowns of Age and Gender
# ==============================
grouped_stats = ['count', 'mean', 'median', 'std', 'min', 'max']
df_test_filter = df[df['Language_Instrument'].isin({'PLS5', 'Bayley_language'})]

sld_count = df[df['SLD_or_ADHD'] == 'Yes']
print(sld_count)


     ID  Gender  Age Language_In_Home     Household Guardian_Marital_Status  \
7    16    Male  2.9          English  Both parents                 Married   
13   34  Female  2.8         Multiple  Both parents                  Single   
14   35    Male  2.8          English  Both parents               Any Other   
17   47  Female  2.8         Multiple  Both parents                 Married   
23   69    Male  2.9         Multiple     Any Other                  Single   
24   77  Female  2.8          English     Any Other                  Single   
27   97    Male  2.9         Multiple  Both parents                 Married   
46  189    Male  2.7         Multiple  Both parents                 Married   
47  192  Female  2.6         Multiple  Both parents                 Married   

     Maternal_Education Maternal_Learning_Differences Maternal_ADHD  \
7   High School or Less                           Yes           NaN   
13  High School or Less                            No           Yes

In [3]:
age_grouped_df = df_test_filter.groupby(['Age', 'SLD_or_ADHD'])
age_grouped_df[['WPS_Score', 'Total_Score_Standard_Score']].agg(grouped_stats).round(4)

| Age | SLD_or_ADHD | WPS_Score count | WPS_Score mean | WPS_Score median | WPS_Score std | WPS_Score min | WPS_Score max | Total_Score_Standard_Score count | Total_Score_Standard_Score mean | Total_Score_Standard_Score median | Total_Score_Standard_Score std | Total_Score_Standard_Score min | Total_Score_Standard_Score max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2.6 | No | 1 | 1.0 | 1.0 | NaN | 1.0 | 1.0 | 1 | 57.0 | 57.0 | NaN | 57.0 | 57.0 |
| 2.7 | No | 3 | 1.58 | 1.73 | 0.5214 | 1.0 | 2.01 | 5 | 80.2 | 76.0 | 22.4321 | 59.0 | 110.0 |
| 2.8 | No | 6 | 2.1717 | 1.75 | 1.3879 | 1.0 | 4.17 | 14 | 86.4286 | 90.0 | 22.1141 | 50.0 | 114.0 |
| 2.8 | Yes | 1 | 1.0 | 1.0 | NaN | 1.0 | 1.0 | 4 | 74.5 | 76.0 | 12.5565 | 59.0 | 87.0 |
| 2.9 | No | 3 | 2.5533 | 2.42 | 1.6241 | 1.0 | 4.24 | 5 | 88.2 | 91.0 | 25.044 | 56.0 | 112.0 |
| 2.9 | Yes | 0 | NaN | NaN | NaN | NaN | NaN | 2 | 73.5 | 73.5 | 33.234 | 50.0 | 97.0 |


In [4]:
gender_grouped_df = df_test_filter.groupby(['Gender', 'SLD_or_ADHD'])
gender_grouped_df[['WPS_Score', 'Total_Score_Standard_Score']].agg(grouped_stats).round(4)

| Gender | SLD_or_ADHD | WPS_Score count | WPS_Score mean | WPS_Score median | WPS_Score std | WPS_Score min | WPS_Score max | Total_Score_Standard_Score count | Total_Score_Standard_Score mean | Total_Score_Standard_Score median | Total_Score_Standard_Score std | Total_Score_Standard_Score min | Total_Score_Standard_Score max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Female | No | 5 | 2.296 | 2.01 | 1.2148 | 1.0 | 4.24 | 6 | 90.0 | 95.5 | 20.1296 | 57.0 | 112.0 |
| Female | Yes | 1 | 1.0 | 1.0 | NaN | 1.0 | 1.0 | 3 | 76.0 | 82.0 | 14.9332 | 59.0 | 87.0 |
| Male | No | 8 | 1.8688 | 1.0 | 1.2871 | 1.0 | 4.17 | 19 | 82.5789 | 86.0 | 23.0587 | 50.0 | 114.0 |
| Male | Yes | 0 | NaN | NaN | NaN | NaN | NaN | 3 | 72.3333 | 70.0 | 23.5867 | 50.0 | 97.0 |


In [None]:
test_female = [112, 95, 96, 76, 57, 59, 87, 82, 104, 45]
test_male = [114, 97, 91, 63, 50, 86, 60, 56, 51, 92, 59, 50, 50, 70, 109, 88, 62, 86, 97, 112, 70, 110, 114]


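# Two-sided Mann-Whitney U test comparing the male and female samples
stat, p_value = stats.mannwhitneyu(test_male, test_female, alternative='two-sided')
print(f'Statistic={stat:.2f}, p={p_value:.3f}')
alpha = 0.05
if p_value < alpha:
    print('Reject null hypothesis (significant difference between the two samples)')
else:
    # Failing to reject does not show the samples are equal, only that no difference was detected
    print('Fail to reject null hypothesis (no significant difference detected)')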