# Exercise 9: Cue Salience & Context Memory

Learning Goals:
- Map fMRI regions-of-interest (ROIs) to clear, atlas-backed brain measures (no guessing in noisy EPI data).
- Quantify cue vs neutral effects in the amygdala and hippocampus, and relate these brain “bias” metrics to behavioral craving reports.
- Test group differences with appropriate statistical rigor (BH FDR correction + effect sizes) and communicate results responsibly (cautious interpretation, not overstatement). 

What is a “cue-reactivity” task? In cue-reactivity paradigms, participants view images related to substances (e.g. marijuana, e-cigarette, alcohol) and neutral content (e.g. food or outdoor scenes). The idea is to trigger craving and brain activation similar to real-life cues. For example, the Adolescent Health and Development in Context (AHDC) study’s fMRI task showed pictures of marijuana, e-cig, alcohol, food, and outdoor scenes. Brain activity is modeled per condition; a beta value represents how strongly a participant’s BOLD signal responded to that cue category (relative to baseline). Higher beta = greater activation to that condition. In this notebook, we’ll analyze semi-synthetic beta data inspired by that task.

About the data you’ll use: The CSV files are semi-synthetic data generated for teaching, inspired by OpenNeuro dataset ds005901 (AHDC study). They mimic first-level GLM beta coefficients (percent signal change units) for two ROIs – the amygdala and hippocampus – under each cue condition. A separate small file provides a self-reported craving score (0–10) for each participant, and a median-split grouping into “High” vs “Low” craving. These are not real study data, but they reflect a plausible pattern for learning purposes. No actual fMRI downloads or heavy computation are needed.

Baldwin M. Way, Christopher R. Browning, Dylan D. Wagner, Jodi L. Ford, Bethany Boettner, and Ping Bai (2025). Structural and functional MRI dataset from the Adolescent Health and Development in Context (AHDC) study in Columbus, Ohio. OpenNeuro. [Dataset] doi:10.18112/openneuro.ds005901.v1.0.0

### Before you begin: complete Part A in the Guide. This notebook starts at Table 2 tasks.

In [None]:
# import required libraries.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy
from scipy import stats
from statsmodels.stats.multitest import multipletests

# Set up plotting defaults for clarity
plt.rcParams['figure.dpi'] = 120
plt.rcParams['figure.figsize'] = (7, 4)

# Print versions for reproducibility
print("Loaded libraries:",
      "pandas", pd.__version__,
      "| numpy", np.__version__,
      "| matplotlib", plt.matplotlib.__version__,
      "| scipy", scipy.__version__)


In [None]:
# Load the semi-synthetic datasets
df_betas = pd.read_csv('data/e9_task_qreact_roi_betas.csv')
df_behav = pd.read_csv('data/e9_qreact_behavior_min.csv')

# Quick checks on data shape and content
print("Betas shape:", df_betas.shape)
print("Behavior shape:", df_behav.shape)
display(df_betas.head(6))
display(df_behav.head(6))

# List unique conditions and ROIs
print("Unique conditions:", df_betas['condition'].unique())
print("Unique ROIs:", df_betas['ROI'].unique())

# Summary of the craving ratings
mean_crave = df_behav['cue_craving_mean'].mean().round(2)
sd_crave = df_behav['cue_craving_mean'].std().round(2)
min_crave = df_behav['cue_craving_mean'].min()
max_crave = df_behav['cue_craving_mean'].max()
print(f"Craving mean = {mean_crave}, SD = {sd_crave}, range = {min_crave}–{max_crave}")


*Note:* In this task, “outdoor” pictures serve as the **neutral** condition (baseline scenes with no drug-related content). The “Drug” cues are the three substance-related conditions: marijuana, ecig, and alcohol. (The food condition is a non-drug cue; our primary interest is the drug-vs-neutral contrast.)

### Prepping for Part B: Compute Drug vs Neutral Bias for Key Regions (Amygdala & Hippocampus)

In this step, we calculate each participant’s **DrugBias** — how much their brain’s response to **drug-related cues** differs from a neutral comparison condition (**outdoor scenes**).  

For each participant:
- We take the **average activation (beta)** across the *drug cue* conditions (marijuana, e-cig, alcohol).  
- We subtract the beta for the *neutral outdoor* condition.  
- We do this separately for two regions of interest (ROIs): the **amygdala** and the **hippocampus**.

The result is a simple index:

$$
\text{DrugBias}_{\text{ROI}} = \bar{\beta}_{\text{drug cues}} - \beta_{\text{outdoor}}
$$

Positive values mean stronger activation to drug cues than neutral cues (greater cue reactivity);  
values near 0 mean little difference; negative values mean less activation to drug cues.
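
To make the formula concrete, here is a minimal sketch on made-up numbers for a single participant. The betas and the names `toy`, `wide`, and `toy_bias` are hypothetical and are not taken from the course CSVs; only the column names mirror the ones used in this notebook.

```python
import pandas as pd

# Hypothetical betas for ONE made-up participant (not from the course CSVs)
toy = pd.DataFrame({
    "ROI":       ["amygdala"] * 4 + ["hippocampus"] * 4,
    "condition": ["marijuana", "ecig", "alcohol", "outdoor"] * 2,
    "beta":      [0.30, 0.20, 0.40, 0.10, 0.05, 0.10, 0.00, 0.15],
})

drug_conditions = ["marijuana", "ecig", "alcohol"]

# Rows = condition, columns = ROI
wide = toy.pivot(index="condition", columns="ROI", values="beta")

# DrugBias = mean(drug-cue betas) - outdoor beta, computed per ROI
toy_bias = wide.loc[drug_conditions].mean() - wide.loc["outdoor"]
print(toy_bias.round(3))
```

With these numbers the amygdala bias is 0.30 − 0.10 = 0.20 (more activation to drug cues) and the hippocampus bias is 0.05 − 0.15 = −0.10 (less activation to drug cues).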

This table of bias scores will feed directly into **Table 2 of your deliverable slides**, where you’ll:
1. Plot your assigned participant’s amygdala and hippocampus DrugBias as a two-bar chart.  
2. Combine it with your T1 crosshair screenshot from the viewer.  

We’ll now compute those bias values for everyone in the sample so you can extract your assigned participant’s data.


In [None]:
# --- Compute DrugBias values for every participant, then attach the behavioral data ---
def compute_drug_bias(df_betas, df_behav):
    drug_conditions = ['marijuana', 'ecig', 'alcohol']
    df_subset = df_betas[df_betas['ROI'].isin(['amygdala', 'hippocampus'])]
    bias_list = []
    for pid, sub_df in df_subset.groupby('participant_id'):
        amyg_drug = sub_df.query("ROI == 'amygdala' and condition in @drug_conditions")['beta'].mean()
        amyg_out = sub_df.query("ROI == 'amygdala' and condition == 'outdoor'")['beta'].values[0]
        hipp_drug = sub_df.query("ROI == 'hippocampus' and condition in @drug_conditions")['beta'].mean()
        hipp_out = sub_df.query("ROI == 'hippocampus' and condition == 'outdoor'")['beta'].values[0]
        bias_list.append({
            'participant_id': pid,
            'Amyg_DrugBias': amyg_drug - amyg_out,
            'Hipp_DrugBias': hipp_drug - hipp_out
        })
    df_bias = pd.DataFrame(bias_list)
    return pd.merge(df_bias, df_behav, on='participant_id')

df_wide = compute_drug_bias(df_betas, df_behav)

# Preview participant-level DrugBias values
df_wide.head()
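
The function above reads the outdoor beta with `.values[0]`, which raises an `IndexError` if a participant has no outdoor row for an ROI. The course files are expected to be complete, but a quick completeness check can be sketched as follows. The toy table and the names `needed`, `counts`, and `incomplete` are hypothetical; this is one possible check, not part of the required workflow.

```python
import pandas as pd

# Hypothetical long-format betas: P02 is missing its amygdala 'outdoor' row
toy = pd.DataFrame({
    "participant_id": ["P01"] * 4 + ["P02"] * 3,
    "ROI": ["amygdala"] * 7,
    "condition": ["marijuana", "ecig", "alcohol", "outdoor",
                  "marijuana", "ecig", "alcohol"],
    "beta": [0.3, 0.2, 0.4, 0.1, 0.2, 0.1, 0.3],
})

needed = ["marijuana", "ecig", "alcohol", "outdoor"]

# Count available betas per participant x ROI x condition (0 where absent)
counts = (
    toy[toy["condition"].isin(needed)]
    .groupby(["participant_id", "ROI", "condition"])
    .size()
    .unstack("condition", fill_value=0)
    .reindex(columns=needed, fill_value=0)
)

# Participant/ROI pairs with at least one missing condition
incomplete = counts[(counts == 0).any(axis=1)]
print(incomplete)
```

An empty `incomplete` table would suggest every participant has the four betas the DrugBias formula needs.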


### Part B — Single-Subject Bias Snapshot (Feeds Table 2)

Pick **your assigned subject ID** and run the cell below to compute and plot their two DrugBias values:
- **DrugBias** = average beta for drug cues *(marijuana, e-cig, alcohol)* minus the beta for the neutral **outdoor** condition.
- We compute this separately for **amygdala** and **hippocampus**.

This cell will:
1) Compute `Amyg_DrugBias` and `Hipp_DrugBias` for your subject,  
2) Save a two-bar figure to `figs/<SUB>_drugbias.png`, and  
3) Print a small summary you can copy into your slide caption.

**What to submit on Table 2:**  
- Your **T1 crosshair** screenshot (from the viewer), and  
- The saved **two-bar DrugBias** figure for your subject.


In [None]:
# --- Part B: Single-subject DrugBias snapshot (feeds Table 2) ---
# Uses: df_betas (condition × ROI betas), df_behav (cue_craving_mean, craving_group)

# 1) Pick your assigned participant ID (do NOT change after you start)
SUB = "___"  # ← replace with YOUR assigned subject ID

# 2) Constants
drug_conditions  = ["marijuana", "ecig", "alcohol"]
condition_order  = ["food", "marijuana", "ecig", "alcohol", "outdoor"]
rois_needed      = ["amygdala", "hippocampus"]

# 3) Basic checks
if SUB not in df_betas["participant_id"].unique():
    raise ValueError(f"Participant {SUB} not found in df_betas.")
if SUB not in df_behav["participant_id"].unique():
    raise ValueError(f"Participant {SUB} not found in df_behav.")

# 4) Extract subject rows and pivot to (condition × ROI) → beta
sub_df = df_betas[df_betas["participant_id"] == SUB]
pivot = (
    sub_df.pivot_table(index="condition", columns="ROI", values="beta", aggfunc="mean")
         .reindex(condition_order)
)

# Warn if any ROI/condition is missing
missing_conditions = pivot.index[pivot.isna().any(axis=1)].tolist()
missing_rois = [r for r in rois_needed if r not in pivot.columns]
if missing_conditions or missing_rois:
    print("⚠️ Warning: Missing data detected.")
    if missing_rois:
        print("   Missing ROIs:", missing_rois)
    if missing_conditions:
        print("   Condition rows with NaN values:", missing_conditions)

# 5) Compute DrugBias = mean(drug) - outdoor for each ROI
def drug_bias_for_roi(roi: str) -> float:
    return float(pivot.loc[drug_conditions, roi].mean() - pivot.loc["outdoor", roi])

Amyg_DrugBias = drug_bias_for_roi("amygdala")
Hipp_DrugBias = drug_bias_for_roi("hippocampus")

# 6) Pull craving info
row     = df_behav[df_behav["participant_id"] == SUB].iloc[0]
craving = float(row["cue_craving_mean"])
cgroup  = str(row["craving_group"])

# 7) Save minimal 2-bar figure (symmetric y-limits, zero line, value labels)
import os  # matplotlib.pyplot is already imported as plt in the setup cell

os.makedirs("figs", exist_ok=True)
vals   = [Amyg_DrugBias, Hipp_DrugBias]
labels = ["Amyg_DrugBias", "Hipp_DrugBias"]

plt.figure()
bars = plt.bar(labels, vals)
plt.axhline(0, linestyle="--", linewidth=1)

# symmetric y-limits around zero with light padding
ylim = max(0.15, max(abs(v) for v in vals) * 1.25)
plt.ylim(-ylim, ylim)

for b, v in zip(bars, vals):
    y  = v + (0.01 if v >= 0 else -0.01)
    va = "bottom" if v >= 0 else "top"
    plt.text(b.get_x() + b.get_width()/2, y, f"{v:.3f}", ha="center", va=va)

plt.ylabel("DrugBias (DrugMean − Neutral)")
plt.title(f"{SUB} — Cue > Neutral bias  |  Craving {craving:.1f}/10 ({cgroup})")
plt.tight_layout()

outpath = f"figs/{SUB}_drugbias.png"
plt.savefig(outpath, dpi=300)
plt.show()
print(f"Saved figure → {outpath}")

# 8) Print copy-paste blocks for the slide (ID block + suggested tokens)
def _sign(v): 
    return "+" if v > 0.01 else "-" if v < -0.01 else "≈0"
def _mag(v):
    a = abs(v)
    # tweak thresholds if you prefer different cut points
    return "small" if a < 0.05 else "moderate" if a < 0.15 else "large"

print(
    f"""--- COPY INTO TABLE 2 (ID block) ---
SUB: {SUB}
DrugBias_fig: {outpath}
cue_craving_mean: {craving:.1f}
craving_group: {cgroup}
Amyg_DrugBias: {Amyg_DrugBias:.3f}
Hipp_DrugBias: {Hipp_DrugBias:.3f}
--- END ---
"""
)

print(
    f"""--- SUGGESTED TOKENS (for right-panel fields) ---
Amyg_DrugBias_sign: {_sign(Amyg_DrugBias)}
Amyg_DrugBias_mag:  {_mag(Amyg_DrugBias)}
Hipp_DrugBias_sign: {_sign(Hipp_DrugBias)}
Hipp_DrugBias_mag:  {_mag(Hipp_DrugBias)}
--- END ---
"""
)


## Part C — Your Hypothesis Test (H₁–H₄)

Each group member is responsible for one hypothesis. Run your assigned test, create a small plot, and interpret results in cautious, non-deterministic language.

| **Member** | **Hypothesis Code** | **Quick Description of Hypothesis** |
|-------------|--------------------|-------------------------------------|
| Member 1 | **H1** | One-sample (one-tailed > 0): mean(Amyg_DrugBias) > 0 — tests if the **amygdala** shows positive cue > neutral bias at the group level. |
| Member 2 | **H2** | One-sample (one-tailed > 0): mean(Hipp_DrugBias) > 0 — tests if the **hippocampus** shows positive cue > neutral bias at the group level. |
| Member 3 | **H3** | Two-group (Welch, one-tailed High > Low): Amyg_DrugBias(High) > Amyg_DrugBias(Low) — tests if **high-craving participants** show stronger amygdala bias. |
| Member 4 | **H4** | Two-group (Welch, one-tailed High > Low): Hipp_DrugBias(High) > Hipp_DrugBias(Low) — tests if **high-craving participants** show stronger hippocampal bias. |

### Your job
(a) Compute *t*, *df*, *p* (one-tailed, consistent with your alternative) and the effect size (*Cohen’s d* for H₁/H₂; *Hedges’ g* for H₃/H₄).  
(b) Make the small plot for your test.  
(c) Write a 1–2 line interpretation in careful, non-deterministic language.

**Note:** We’re not applying multiple-testing correction in code.  
In your **Table 3 slide**, explain the false-positive risk across four tests and name a reasonable correction (e.g., Bonferroni or FDR).  
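
Although this notebook does not apply a correction, a toy calculation may help when you explain the idea on your slide. The sketch below adjusts four made-up p-values with Bonferroni and Benjamini-Hochberg (BH) using plain NumPy. The p-values are hypothetical and are not results from this dataset. The `multipletests` function imported in the setup cell should give the same BH values when called with `method='fdr_bh'`.

```python
import numpy as np

# Hypothetical one-tailed p-values for H1-H4 (illustrative only, not real results)
p_raw = np.array([0.004, 0.030, 0.020, 0.200])
m = p_raw.size

# Bonferroni: multiply each p by the number of tests, cap at 1
p_bonf = np.minimum(p_raw * m, 1.0)

# Benjamini-Hochberg: scale the k-th smallest p by m/k,
# then take a running minimum from the largest p downward
order = np.argsort(p_raw)
scaled = p_raw[order] * m / np.arange(1, m + 1)
adj_sorted = np.minimum.accumulate(scaled[::-1])[::-1]

p_bh = np.empty(m)
p_bh[order] = np.minimum(adj_sorted, 1.0)  # back to the original H1-H4 order

for name, raw, bonf, bh in zip(["H1", "H2", "H3", "H4"], p_raw, p_bonf, p_bh):
    print(f"{name}: raw={raw:.3f}  Bonferroni={bonf:.3f}  BH={bh:.3f}")
```

In this toy case the second test has a raw p of 0.030, which becomes 0.120 under Bonferroni but 0.040 under BH. That illustrates why FDR control is usually the less conservative choice.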


In [None]:
# Run this function to compute one-tailed p-values from t-statistics

from scipy.stats import t as student_t

def p_one_tailed(t_stat, df, alt='greater'):
    """
    Compute a one-tailed p-value from a t-statistic and degrees of freedom.

    Parameters
    ----------
    t_stat : float
        The observed t value.
    df : int or float
        Degrees of freedom for the test.
    alt : str, optional
        Direction of the alternative hypothesis:
        'greater' for H_A: mean > 0,
        'less' for H_A: mean < 0.

    Returns
    -------
    p_one : float
        One-tailed p-value.
    """
    if alt == 'greater':
        p_one = student_t.sf(t_stat, df)  # survival function = 1 - CDF; more accurate for very small p
    elif alt == 'less':
        p_one = student_t.cdf(t_stat, df)
    else:
        raise ValueError("alt must be 'greater' or 'less'")
    return p_one
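
As a quick sanity check on what the helper returns, the sketch below shows how the two one-tailed p-values and the two-tailed p-value relate for a single t statistic. The values `t_obs = 2.0` and `dof = 24` are hypothetical and chosen only for illustration.

```python
from scipy.stats import t as student_t

t_obs, dof = 2.0, 24  # hypothetical values, for illustration only

p_greater = student_t.sf(t_obs, dof)         # upper tail: P(T >= t_obs)
p_less = student_t.cdf(t_obs, dof)           # lower tail: P(T <= t_obs)
p_two = 2 * student_t.sf(abs(t_obs), dof)    # both tails

print(f"greater: {p_greater:.4f} | less: {p_less:.4f} | two-tailed: {p_two:.4f}")
```

The two one-tailed values sum to 1, and when the observed t lies in the predicted direction the one-tailed p is half the two-tailed p. If the effect goes in the opposite direction, the one-tailed p for your alternative will be larger than 0.5.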


### H1 — One-sample: Amyg_DrugBias > 0

**Context:**  
We’re asking if, on average, the **amygdala** responds more to **drug cues** than to **neutral cues**.

Formally, we test:

$$
H_0: \mu = 0 \quad \text{vs.} \quad H_A: \mu > 0
$$

for the variable `Amyg_DrugBias`.

**Effect size:**  
Cohen’s *d* (mean difference in standard deviation units).  

**Visualization:**  
Plot a histogram of `Amyg_DrugBias` values with a vertical line at 0.

**Report in Table 3:**  
Include the *t*-statistic, degrees of freedom (*df*), one-tailed *p*-value, effect size (*d*), and a 1–2 sentence interpretation (direction, size, and uncertainty).
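
Once you have filled in the TODOs, you may want to cross-check your numbers. The sketch below runs the same kind of test on simulated scores, once by hand and once with `scipy.stats.ttest_1samp`. The data are randomly generated and hypothetical, not the course data, and the names `x_toy`, `t_manual`, `p_manual`, and `d_toy` are illustrative.

```python
import numpy as np
from scipy import stats

# Simulated bias scores (hypothetical; NOT the course data)
rng = np.random.default_rng(0)
x_toy = rng.normal(loc=0.05, scale=0.10, size=30)

n = x_toy.size
mean_x, sd_x = x_toy.mean(), x_toy.std(ddof=1)

# Manual route: t = mean / (SD / sqrt(n)), with df = n - 1
t_manual = mean_x / (sd_x / np.sqrt(n))
p_manual = stats.t.sf(t_manual, n - 1)   # upper-tail p for H_A: mu > 0
d_toy = mean_x / sd_x                    # Cohen's d against mu0 = 0

# Library route (the `alternative` argument requires SciPy >= 1.6)
res = stats.ttest_1samp(x_toy, popmean=0, alternative="greater")

print(f"t = {t_manual:.3f} (scipy: {res.statistic:.3f}), df = {n - 1}")
print(f"one-tailed p = {p_manual:.4f} (scipy: {res.pvalue:.4f}), d = {d_toy:.3f}")
```

The manual and library values should agree to several decimal places. Note that for a one-sample test, Cohen's *d* equals *t* divided by the square root of *n*.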


In [None]:
# --- H1: Amygdala DrugBias > 0 ---

x = df_wide['Amyg_DrugBias'].dropna().to_numpy()
n  = x.size
mean_x = x.mean()
sd_x   = x.std(ddof=1)

# TODO: compute one-sample t statistic for mu0=0
t_stat = 

# TODO: degrees of freedom
df = 

# TODO: one-tailed p for alternative mean>0
p_one = 

# TODO: Cohen's d
d = 

# Figure
plt.figure(figsize=(7,4))
plt.hist(x, bins=20, alpha=0.9)
plt.axvline(0, ls='--', lw=1)
plt.xlabel('Amyg_DrugBias'); plt.ylabel('Count')
plt.title('H1: Amygdala DrugBias (Cue > Neutral)')
plt.tight_layout(); plt.show()

# Row to paste into Table 3
h1_row = {
    'test_id':'H1',
    'test':'Amyg_DrugBias mean > 0',
    't_df':f"t={t_stat:.3f}, df={df}",
    'p':round(p_one, 3),
    'effect_size':f"d={d:.3f}"
}
h1_row


### H2 — One-sample: Hipp_DrugBias > 0

**Context:**  
We’re asking whether, on average, the **hippocampus** responds more to **drug cues** than to **neutral cues**.

Formally, we test:

$$
H_0: \mu = 0 \quad \text{vs.} \quad H_A: \mu > 0
$$

for the variable `Hipp_DrugBias`.

**Effect size:**  
Cohen’s *d* (mean difference in standard deviation units).  

**Visualization:**  
Plot a histogram of `Hipp_DrugBias` values with a vertical line at 0.

**Report in Table 3:**  
Include the *t*-statistic, degrees of freedom (*df*), one-tailed *p*-value, effect size (*d*), and a 1–2 sentence interpretation (direction, size, and uncertainty).


In [None]:
# --- H2: Hippocampus DrugBias > 0 ---

x = df_wide['Hipp_DrugBias'].dropna().to_numpy()
n  = x.size
mean_x = x.mean()
sd_x   = x.std(ddof=1)

# TODO: compute one-sample t statistic for mu0=0
t_stat = 

# TODO: degrees of freedom
df = 

# TODO: one-tailed p for alternative mean>0
p_one = 

# TODO: Cohen's d
d = 


plt.figure(figsize=(7,4))
plt.hist(x, bins=20, alpha=0.9)
plt.axvline(0, ls='--', lw=1)
plt.xlabel('Hipp_DrugBias'); plt.ylabel('Count')
plt.title('H2: Hippocampus DrugBias (Cue > Neutral)')
plt.tight_layout(); plt.show()

h2_row = {
    'test_id':'H2',
    'test':'Hipp_DrugBias mean > 0',
    't_df':f"t={t_stat:.3f}, df={df}",
    'p':round(p_one, 3),
    'effect_size':f"d={d:.3f}"
}
h2_row


### H3 — Group Difference (Welch): High > Low on Amyg_DrugBias

**Context:**  
We’re asking whether **High-craving** participants show greater **amygdala DrugBias** than **Low-craving** participants.  
This uses **Welch’s t-test** (unequal variances, unequal sample sizes) with Satterthwaite-adjusted degrees of freedom.

Formally, we test:

$$
H_0: \mu_{\text{High}} = \mu_{\text{Low}} \quad \text{vs.} \quad H_A: \mu_{\text{High}} > \mu_{\text{Low}}
$$

for the variable `Amyg_DrugBias`.

**Effect size:**  
Hedges’ *g* (a small-sample correction of Cohen’s *d*).  

**Visualization:**  
Create a boxplot (or violin plot) comparing High vs Low groups for `Amyg_DrugBias`.

**Report in Table 3:**  
Include the *t*-statistic, degrees of freedom (*df*), one-tailed *p*-value, effect size (*g*), and a 1–2 sentence interpretation (direction, size, and uncertainty).
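
For a cross-check after completing the TODOs, the sketch below computes Welch's *t*, the Satterthwaite degrees of freedom, and Hedges' *g* on two simulated groups, then compares the result with `scipy.stats.ttest_ind`. The groups are randomly generated and hypothetical, not the course data, and names such as `hi_toy`, `lo_toy`, `t_w`, and `g_toy` are illustrative.

```python
import numpy as np
from scipy import stats

# Simulated groups (hypothetical; NOT the course data)
rng = np.random.default_rng(1)
hi_toy = rng.normal(loc=0.08, scale=0.12, size=18)
lo_toy = rng.normal(loc=0.02, scale=0.08, size=22)

n1, n2 = hi_toy.size, lo_toy.size
v1, v2 = hi_toy.var(ddof=1), lo_toy.var(ddof=1)
diff = hi_toy.mean() - lo_toy.mean()

# Welch t: mean difference over the unpooled standard error
se = np.sqrt(v1 / n1 + v2 / n2)
t_w = diff / se

# Welch-Satterthwaite degrees of freedom
dof = se**4 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
p_w = stats.t.sf(t_w, dof)               # upper tail for H_A: High > Low

# Hedges' g: pooled-SD Cohen's d times the small-sample correction
pooled_sd = np.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
d_pooled = diff / pooled_sd
g_toy = d_pooled * (1 - 3 / (4 * (n1 + n2) - 9))

# Library cross-check (the `alternative` argument requires SciPy >= 1.6)
res = stats.ttest_ind(hi_toy, lo_toy, equal_var=False, alternative="greater")
print(f"t = {t_w:.3f} (scipy: {res.statistic:.3f}), df = {dof:.1f}")
print(f"one-tailed p = {p_w:.4f} (scipy: {res.pvalue:.4f}), g = {g_toy:.3f}")
```

The Satterthwaite *df* always falls between the smaller group's *n* − 1 and *n*₁ + *n*₂ − 2, so a value outside that range is a sign of a formula slip. Hedges' *g* is always slightly smaller in magnitude than the uncorrected *d*.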




In [None]:
# --- H3: High vs Low (Amygdala DrugBias) ---

hi = df_wide.query("craving_group == 'High'")['Amyg_DrugBias'].dropna().to_numpy()
lo = df_wide.query("craving_group == 'Low'")['Amyg_DrugBias'].dropna().to_numpy()

n1, n2 = len(hi), len(lo)
mean1, mean2 = hi.mean(), lo.mean()
var1, var2 = hi.var(ddof=1), lo.var(ddof=1)

# TODO: compute Welch's t statistic
t_stat = 

# Satterthwaite degrees of freedom (formula provided; check that you can derive it)
df = ((var1/n1 + var2/n2)**2) / ((var1**2 / ((n1**2)*(n1-1))) + (var2**2 / ((n2**2)*(n2-1))))

# TODO: one-tailed p for High > Low
p_one = 

# TODO: Cohen's d from the pooled SD (pooled SD and the Hedges' g correction are provided)
pooled_sd = np.sqrt(((n1-1)*var1 + (n2-1)*var2) / (n1+n2-2))
d = 
correction = 1 - (3 / (4*(n1+n2) - 9))
g = d * correction

# Plot
plt.figure(figsize=(7,4))
plt.boxplot([lo, hi], patch_artist=True,
            boxprops=dict(facecolor='lightgray'), medianprops=dict(color='black'))
plt.xticks([1, 2], ['Low', 'High'])  # the `labels` kwarg of boxplot is deprecated in Matplotlib >= 3.9
plt.ylabel('Amyg_DrugBias')
plt.title('H3: High vs Low (Amygdala DrugBias)')
plt.tight_layout(); plt.show()

h3_row = {
    'test_id':'H3',
    'test':'High > Low (Amyg_DrugBias)',
    't_df':f"t={t_stat:.3f}, df={df:.1f}",
    'p':round(p_one, 3),
    'effect_size':f"g={g:.3f}"
}
h3_row

### H4 — Group Difference (Welch): High > Low on Hipp_DrugBias

**Context:**  
We’re asking whether **High-craving** participants show greater **hippocampus DrugBias** than **Low-craving** participants.  
This uses **Welch’s t-test** (unequal variances, unequal sample sizes) with Satterthwaite-adjusted degrees of freedom.

Formally, we test:

$$
H_0: \mu_{\text{High}} = \mu_{\text{Low}} \quad \text{vs.} \quad H_A: \mu_{\text{High}} > \mu_{\text{Low}}
$$

for the variable `Hipp_DrugBias`.

**Effect size:**  
Hedges’ *g* (a small-sample correction of Cohen’s *d*).  

**Visualization:**  
Create a boxplot comparing High vs Low groups for `Hipp_DrugBias`.

**Report in Table 3:**  
Include the *t*-statistic, degrees of freedom (*df*), one-tailed *p*-value, effect size (*g*), and a 1–2 sentence interpretation (direction, size, and uncertainty).



In [None]:
# --- H4: High vs Low (Hippocampus DrugBias) ---

hi = df_wide.query("craving_group == 'High'")['Hipp_DrugBias'].dropna().to_numpy()
lo = df_wide.query("craving_group == 'Low'")['Hipp_DrugBias'].dropna().to_numpy()

n1, n2 = len(hi), len(lo)
mean1, mean2 = hi.mean(), lo.mean()
var1, var2 = hi.var(ddof=1), lo.var(ddof=1)

# TODO: compute Welch's t statistic
t_stat = 

# TODO: compute Satterthwaite degrees of freedom
df = ((var1/n1 + var2/n2)**2) / ((var1**2 / ((n1**2)*(n1-1))) + (var2**2 / ((n2**2)*(n2-1))))

# TODO: one-tailed p for High > Low
p_one = 

# TODO: pooled SD and Hedges' g
pooled_sd = np.sqrt(((n1-1)*var1 + (n2-1)*var2) / (n1+n2-2))
d = 
correction = 1 - (3 / (4*(n1+n2) - 9))
g = d * correction

# Plot
plt.figure(figsize=(7,4))
plt.boxplot([lo, hi], patch_artist=True,
            boxprops=dict(facecolor='lightgray'), medianprops=dict(color='black'))
plt.xticks([1, 2], ['Low', 'High'])  # the `labels` kwarg of boxplot is deprecated in Matplotlib >= 3.9
plt.ylabel('Hipp_DrugBias')
plt.title('H4: High vs Low (Hippocampus DrugBias)')
plt.tight_layout(); plt.show()

h4_row = {
    'test_id':'H4',
    'test':'High > Low (Hipp_DrugBias)',
    't_df':f"t={t_stat:.3f}, df={df:.1f}",
    'p':round(p_one, 3),
    'effect_size':f"g={g:.3f}"
}
h4_row
