# PGT Decision Survey: Results Report
## Two-Level Analysis: General & Faculty-Specific Insights

---

## How to Read This Report

This report presents findings from the PGT Decision Survey, which examines the factors associated with prospective students' decisions to accept or decline postgraduate taught (PGT) offers at the University of Sheffield.

### Key Metrics

| Metric | Definition | Interpretation Guide |
|--------|-----------|---------------------|
| **Rate Difference (pp)** | Difference in acceptance rate between respondents with a factor and those without it | +10pp = acceptance rate 10 percentage points higher among respondents with that factor than among those without |
| **Net Importance** | (% of acceptors citing factor) minus (% of decliners citing factor) | Positive values indicate the factor is cited more frequently by acceptors; negative values indicate it is cited more frequently by decliners |
| **Cramér's V** | Effect size measuring practical significance | 0.10–0.20 = Small; 0.20–0.30 = Medium; >0.30 = Large |
| **Confidence Interval** | Range within which the true population value is likely to fall | Wider intervals indicate greater uncertainty |
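
As a minimal sketch of the arithmetic behind the first two metrics, the snippet below uses hypothetical counts (not survey results). The function names are illustrative and do not come from the analysis notebook. The report does not state which interval method it uses, so the normal-approximation (Wald) interval here is an assumption.

```python
import math


def rate_difference_pp(accepted_with, total_with, accepted_without, total_without):
    """Acceptance rate with the factor minus the rate without it, in percentage points."""
    return (accepted_with / total_with - accepted_without / total_without) * 100


def rate_difference_ci_pp(accepted_with, total_with, accepted_without, total_without, z=1.96):
    """Approximate 95% Wald interval for the rate difference (assumed method), in pp."""
    p1 = accepted_with / total_with
    p2 = accepted_without / total_without
    se = math.sqrt(p1 * (1 - p1) / total_with + p2 * (1 - p2) / total_without)
    diff = p1 - p2
    return (diff - z * se) * 100, (diff + z * se) * 100


def net_importance_pp(acceptors_citing, n_acceptors, decliners_citing, n_decliners):
    """% of acceptors citing a factor minus % of decliners citing it, in pp."""
    return (acceptors_citing / n_acceptors - decliners_citing / n_decliners) * 100


# Hypothetical counts, chosen only to illustrate the calculations.
rd = rate_difference_pp(160, 200, 280, 400)
ci_low, ci_high = rate_difference_ci_pp(160, 200, 280, 400)
ni = net_importance_pp(150, 500, 20, 100)

print(f"Rate difference: {rd:+.1f}pp (95% CI {ci_low:+.1f}pp to {ci_high:+.1f}pp)")
print(f"Net importance: {ni:+.1f}pp")
```

Smaller groups widen the interval because the standard error grows as the group sizes shrink, which is why the faculty-level estimates carry more uncertainty than the university-wide ones.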

### Statistical Notes
- **FDR-corrected** results control for multiple testing across the full set of comparisons
- **Faculty-level** findings are exploratory due to smaller sample sizes and use raw (uncorrected) p-values
- Emphasis is placed on **effect sizes and direction** alongside statistical significance
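
The report does not name the FDR procedure used. Assuming the Benjamini-Hochberg procedure (a common default), the adjustment can be sketched as follows. `benjamini_hochberg` and the p-values are illustrative, not taken from the analysis.

```python
import numpy as np


def benjamini_hochberg(p_values, alpha=0.05):
    """Return (adjusted p-values, reject flags) under the Benjamini-Hochberg procedure."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                        # indices that sort p ascending
    ranked = p[order] * m / np.arange(1, m + 1)  # p_(i) * m / i for rank i
    # Enforce monotonicity from the largest rank downwards.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)    # cap at 1 and restore input order
    return adjusted, adjusted <= alpha


# Hypothetical raw p-values from five comparisons.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.20]
adj_p, significant = benjamini_hochberg(raw_p)

for raw, adj, sig in zip(raw_p, adj_p, significant):
    print(f"raw p = {raw:.3f} | adjusted p = {adj:.4f} | significant: {bool(sig)}")
```

In this example the third and fourth comparisons fall below 0.05 on their raw p-values but not after adjustment, which is the situation the correction is designed to catch.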

---

## Overview

| Metric | Value |
|--------|-------|
| **Total Respondents** | 872 (completed responses only) |
| **Overall Acceptance Rate** | 74.5% |
| **Faculties Analysed** | 5 (those with n ≥ 50) |

### Summary of Key Observations

| Level | Observation |
|-------|-------------|
| **General** | Attendance at in-person events is associated with a higher acceptance rate (+11pp) |
| **Faculty** | Engineering and Arts & Humanities show the most pronounced event-related associations |
| **Personas** | Respondents classified as "Rankings Researchers" show a lower conversion rate of 68% (−8pp vs overall) |
| **Home vs Overseas** | Home students: 78.9% acceptance rate; Overseas students: 73.1% |

---

In [87]:
# Setup: Load libraries and pre-processed results
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import warnings
warnings.filterwarnings('ignore')

# Load all analysis results (pre-computed from main analysis notebook)
df_chi = pd.read_csv('driver_analysis_chi_square.csv')
df_ordinal = pd.read_csv('driver_analysis_ordinal_tests.csv')
df_logit = pd.read_csv('driver_analysis_logistic_regression.csv')
df_importance = pd.read_csv('relative_importance_analysis.csv')
df_faculty = pd.read_csv('faculty_acceptance_summary.csv')
df_fac_drivers = pd.read_csv('faculty_driver_analysis.csv')
df_fac_profiles = pd.read_csv('faculty_profiles.csv')
df_personas = pd.read_csv('rules_based_personas.csv')
df_journey = pd.read_csv('journey_personas_summary.csv')
df_home = pd.read_csv('home_segment_analysis.csv')
df_overseas = pd.read_csv('overseas_segment_analysis.csv')

print('All data loaded successfully')

All data loaded successfully


---
# Descriptive Overview

This section provides a descriptive summary of the survey responses before the statistical driver analysis. It covers the respondent profile, stated motivations, information sources used, and key demographic breakdowns.

---

In [88]:
# Load pre-processed survey data for descriptive analysis
df_complete = pd.read_csv('df_analysis.csv')
decision_col = 'accept_decline' if 'accept_decline' in df_complete.columns else None

# ── TUOS Reporting Service Colour Palette ──
TUOS = {
    'aqua': '#00BBCC', 'violet': '#7000FF', 'peach': '#FF9664',
    'spearmint': '#3BD4AE', 'purple': '#981F92', 'teal': '#005A8F',
    'flamingo': '#FF6371', 'pastel_green': '#A1DED2', 'lavender': '#DAA8E2',
    'coral': '#E7004C', 'dark_violet': '#24125E', 'powder_blue': '#9ADBE8',
    'peak_green': '#005750',
}
TUOS_COLORWAY = list(TUOS.values())

# Heatmap colour scales (TUOS-branded)
TUOS_SCALE_WARM = [[0,'#FFFFFF'],[0.25,'#9ADBE8'],[0.5,'#00BBCC'],[0.75,'#005A8F'],[1,'#24125E']]
TUOS_SCALE_POS  = [[0,'#FFFFFF'],[0.25,'#A1DED2'],[0.5,'#3BD4AE'],[0.75,'#005A8F'],[1,'#005750']]
TUOS_SCALE_NEG  = [[0,'#FFFFFF'],[0.25,'#DAA8E2'],[0.5,'#FF6371'],[0.75,'#E7004C'],[1,'#981F92']]
TUOS_SCALE_DIV  = [[0,'#E7004C'],[0.25,'#FF6371'],[0.5,'#FFFFFF'],[0.75,'#3BD4AE'],[1,'#005750']]

In [111]:
# Response Overview: Accept vs Decline breakdown
if decision_col:
    decision_counts = df_complete[decision_col].value_counts()
    acceptors = decision_counts.get('Accepted', 0) + decision_counts.get('Yes', 0)
    decliners = decision_counts.get('Declined', 0) + decision_counts.get('No', 0)
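    if acceptors == 0 and decliners == 0:
        # Fall back to substring matching (e.g. 'Acceptor' / 'Decliner') and
        # accumulate counts so that several matching labels are all included
        for k, v in decision_counts.items():
            label = str(k).strip().lower()
            if 'accept' in label:
                acceptors += v
            elif 'decline' in label or label == 'no':
                decliners += v
else:
    # No decision column available: fall back to the reported totals
    acceptors, decliners = 650, 222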

total = acceptors + decliners
accept_rate = acceptors / total * 100 if total > 0 else 0

fig = go.Figure(
    go.Pie(
        labels=['Accepted', 'Declined'],
        values=[acceptors, decliners],
        marker_colors=[TUOS['spearmint'], TUOS['coral']],
        textinfo='label+percent',
        texttemplate='%{label}<br>%{percent:.1%}',
        hovertemplate='%{label}: %{value} respondents (%{percent:.1%})<extra></extra>',
        pull=[0.02, 0.02]
    )
)

fig.update_layout(
    height=600,
    title_text=f'Response Summary — {total:,} Respondents | Acceptance Rate: {accept_rate:.1f}%',
    title_font_size=15,
    template='plotly_white',
    margin=dict(t=80)
)
fig.show()

In [112]:
# Motivations for Postgraduate Study
motiv_cols = [c for c in df_complete.columns if c.startswith('motiv_')]

if motiv_cols:
    motiv_pcts = {}
    for col in motiv_cols:
        col_clean = col.replace('motiv_', '').replace('_', ' ')
        pct = df_complete[col].sum() / len(df_complete) * 100
        motiv_pcts[col_clean] = pct
    
    motiv_sorted = sorted(motiv_pcts.items(), key=lambda x: x[1], reverse=True)[:12]
    labels = [x[0][:35] for x in motiv_sorted]
    values = [round(x[1], 1) for x in motiv_sorted]
    
    fig = go.Figure(go.Bar(
        y=labels[::-1],
        x=values[::-1],
        orientation='h',
        marker_color=TUOS['teal'],
        text=[f'{v}%' for v in values[::-1]],
        textposition='outside',
        cliponaxis=False,
        hovertemplate='%{y}: %{x}%<extra></extra>'
    ))
    
    fig.update_layout(
        title='Motivations for Pursuing Postgraduate Study',
        xaxis_title='% of Respondents',
        height=620,
        template='plotly_white',
        margin=dict(l=250, r=60),
        xaxis=dict(range=[0, max(values) + 12])
    )
    fig.show()

In [113]:
# What Respondents Hope to Achieve
achieve_cols = [c for c in df_complete.columns if c.startswith('achieve_')]

if achieve_cols:
    achieve_pcts = {}
    for col in achieve_cols:
        col_clean = col.replace('achieve_', '').replace('_', ' ')
        pct = df_complete[col].sum() / len(df_complete) * 100
        achieve_pcts[col_clean] = pct
    
    achieve_sorted = sorted(achieve_pcts.items(), key=lambda x: x[1], reverse=True)
    achieve_sorted = [(k, v) for k, v in achieve_sorted if v > 1 or 'other' not in k.lower()]
    
    labels = [x[0][:35] for x in achieve_sorted]
    values = [round(x[1], 1) for x in achieve_sorted]
    
    fig = go.Figure(go.Bar(
        y=labels[::-1],
        x=values[::-1],
        orientation='h',
        marker_color=TUOS['violet'],
        text=[f'{v}%' for v in values[::-1]],
        textposition='outside',
        cliponaxis=False,
        hovertemplate='%{y}: %{x}%<extra></extra>'
    ))
    
    fig.update_layout(
        title='What Respondents Hope to Achieve from Postgraduate Study',
        xaxis_title='% of Respondents',
        height=620,
        template='plotly_white',
        margin=dict(l=250, r=60),
        xaxis=dict(range=[0, max(values) + 12])
    )
    fig.show()

In [114]:
# Information Sources Used: Initial Research vs Decision Phase
src_init_cols = [c for c in df_complete.columns if c.startswith('src_init_')]
src_dec_cols = [c for c in df_complete.columns if c.startswith('src_dec_')]

if src_init_cols and src_dec_cols:
    init_pcts = {}
    dec_pcts = {}
    
    for col in src_init_cols:
        col_clean = col.replace('src_init_', '').replace('_', ' ')
        init_pcts[col_clean] = round(df_complete[col].sum() / len(df_complete) * 100, 1)
    
    for col in src_dec_cols:
        col_clean = col.replace('src_dec_', '').replace('_', ' ')
        dec_pcts[col_clean] = round(df_complete[col].sum() / len(df_complete) * 100, 1)
    
    common_sources = set(init_pcts.keys()) & set(dec_pcts.keys())
    sorted_sources = sorted(common_sources, key=lambda x: init_pcts[x], reverse=True)[:15]
    
    labels = [s[:30] for s in sorted_sources]
    init_values = [init_pcts[s] for s in sorted_sources]
    dec_values = [dec_pcts[s] for s in sorted_sources]
    
    fig = go.Figure()
    fig.add_trace(go.Bar(
        y=labels[::-1],
        x=dec_values[::-1],
        orientation='h',
        name='Decision Phase',
        marker_color=TUOS['coral'],
        text=[f'{v}%' for v in dec_values[::-1]],
        textposition='outside',
        cliponaxis=False,
        hovertemplate='%{y}<br>Decision Phase: %{x}%<extra></extra>'
    ))
    fig.add_trace(go.Bar(
        y=labels[::-1],
        x=init_values[::-1],
        orientation='h',
        name='Initial Research',
        marker_color=TUOS['aqua'],
        text=[f'{v}%' for v in init_values[::-1]],
        textposition='outside',
        cliponaxis=False,
        hovertemplate='%{y}<br>Initial Research: %{x}%<extra></extra>'
    ))
    
    fig.update_layout(
        title='Information Sources: Initial Research vs Decision Phase',
        xaxis_title='% of Respondents',
        barmode='group',
        height=800,
        template='plotly_white',
        margin=dict(l=230, r=60),
        legend=dict(x=0.65, y=0.05),
        xaxis=dict(range=[0, max(max(init_values), max(dec_values)) + 8])
    )
    fig.show()

In [115]:
# When Did Respondents Begin Researching Postgraduate Study?
if 'when_begin_research' in df_complete.columns:
    timing_counts = df_complete['when_begin_research'].value_counts()
    
    timeline_order = [
        'Before starting undergraduate degree',
        'In year one of undergraduate degree',
        'In year two of undergraduate degree',
        'During placement year',
        'In final year of undergraduate degree',
        'Within the first year after graduating',
        'One to three years after graduating',
        'More than three years after graduating',
        'Other (please specify)'
    ]
    
    short_map = {
        'Before starting undergraduate degree': 'Before UG',
        'In year one of undergraduate degree': 'UG Yr 1',
        'In year two of undergraduate degree': 'UG Yr 2',
        'During placement year': 'Placement',
        'In final year of undergraduate degree': 'UG Final Yr',
        'Within the first year after graduating': '<1yr post-grad',
        'One to three years after graduating': '1–3yrs post-grad',
        'More than three years after graduating': '3+ yrs post-grad',
        'Other (please specify)': 'Other'
    }
    
    ordered_counts = []
    ordered_labels = []
    short_labels = []
    for item in timeline_order:
        if item in timing_counts.index:
            ordered_counts.append(timing_counts[item])
            ordered_labels.append(item)
            short_labels.append(short_map.get(item, item))
    
    total_resp = sum(ordered_counts)
    pcts = [round(c / total_resp * 100, 1) for c in ordered_counts]
    
    # Use TUOS palette for sequential bar colouring
    colors = TUOS_COLORWAY[:len(short_labels)]
    
    fig = go.Figure(go.Bar(
        x=short_labels,
        y=pcts,
        marker_color=colors,
        text=[f'{p}%' for p in pcts],
        textposition='outside',
        textfont_size=10,
        hovertemplate='%{x}<br>%{y}% of respondents (n=%{customdata})<extra></extra>',
        customdata=ordered_counts
    ))
    
    fig.update_layout(
        title='When Did Respondents Begin Researching Postgraduate Study?',
        yaxis_title='% of Respondents',
        height=620,
        template='plotly_white',
        xaxis_tickangle=-30,
        xaxis_tickfont_size=9,
        yaxis=dict(range=[0, max(pcts) + 6]),
        margin=dict(b=100),
        annotations=[dict(
            x=0.5, y=-0.22, xref='paper', yref='paper',
            text='Applicant lifecycle →',
            showarrow=False, font=dict(size=11, color='gray')
        )]
    )
    fig.show()

In [116]:
# Why Respondents Chose Sheffield (Accept Factors)
accept_cols = [c for c in df_complete.columns if c.startswith('accept_fac_')]

if accept_cols:
    accept_pcts = {}
    for col in accept_cols:
        col_clean = col.replace('accept_fac_', '').replace('_', ' ')
        pct = df_complete[col].sum() / len(df_complete[df_complete['accept_decline'] == 'Acceptor']) * 100
        accept_pcts[col_clean] = pct
    
    accept_sorted = sorted(accept_pcts.items(), key=lambda x: x[1], reverse=True)[:10]
    a_labels = [x[0][:28] for x in accept_sorted]
    a_values = [round(x[1], 1) for x in accept_sorted]
    
    fig = go.Figure(go.Bar(
        y=a_labels[::-1], x=a_values[::-1], orientation='h',
        marker_color=TUOS['spearmint'],
        text=[f'{v}%' for v in a_values[::-1]],
        textposition='outside',
        cliponaxis=False,
        hovertemplate='%{y}: %{x}%<extra></extra>'
    ))
    fig.update_layout(
        height=620, template='plotly_white',
        title_text='Factors Cited by Acceptors',
        xaxis_title='% of Acceptors',
        margin=dict(l=220, r=80),
        xaxis=dict(range=[0, max(a_values) + 10])
    )
    fig.show()

# Barriers to Postgraduate Study
barrier_cols = [c for c in df_complete.columns if c.startswith('barrier_')]

if barrier_cols:
    barrier_pcts = {}
    for col in barrier_cols:
        col_clean = col.replace('barrier_', '').replace('_', ' ')
        # Barriers are summed over all respondents, so divide by the full sample
        pct = df_complete[col].sum() / len(df_complete) * 100
        barrier_pcts[col_clean] = pct
    
    barrier_sorted = sorted(barrier_pcts.items(), key=lambda x: x[1], reverse=True)[:10]
    b_labels = [x[0][:28] for x in barrier_sorted]
    b_values = [round(x[1], 1) for x in barrier_sorted]
    
    fig = go.Figure(go.Bar(
        y=b_labels[::-1], x=b_values[::-1], orientation='h',
        marker_color=TUOS['coral'],
        text=[f'{v}%' for v in b_values[::-1]],
        textposition='outside',
        cliponaxis=False,
        hovertemplate='%{y}: %{x}%<extra></extra>'
    ))
    fig.update_layout(
        height=620, template='plotly_white',
        title_text='Barriers to Postgraduate Study',
        xaxis_title='% of Respondents',
        margin=dict(l=220, r=80),
        xaxis=dict(range=[0, max(b_values) + 10])
    )
    fig.show()

In [117]:
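# Decline Factors
# Note: the denominator here is all decliners; Section 1.2 restricts it to
# the decliners who were routed to the decline factors question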
decline_fac_cols = [c for c in df_complete.columns if c.startswith('decline_fac_')]

if decline_fac_cols:
    decline_pcts = {}
    for col in decline_fac_cols:
        col_clean = col.replace('decline_fac_', '').replace('_', ' ')
        pct = df_complete[col].sum() / len(df_complete[df_complete['accept_decline'] == 'Decliner']) * 100
        decline_pcts[col_clean] = pct
    
    decline_sorted = sorted(decline_pcts.items(), key=lambda x: x[1], reverse=True)[:10]
    d_labels = [x[0][:28] for x in decline_sorted]
    d_values = [round(x[1], 1) for x in decline_sorted]
    
    fig = go.Figure(go.Bar(
        y=d_labels[::-1], x=d_values[::-1], orientation='h',
        marker_color=TUOS['peach'],
        text=[f'{v}%' for v in d_values[::-1]],
        textposition='outside',
        cliponaxis=False,
        hovertemplate='%{y}: %{x}%<extra></extra>'
    ))
    fig.update_layout(
        height=620, template='plotly_white',
        title_text='Factors Cited by Decliners',
        xaxis_title='% of Decliners',
        margin=dict(l=220, r=80),
        xaxis=dict(range=[0, max(d_values) + 10])
    )
    fig.show()

# Reasons for Declining
decline_rsn_cols = [c for c in df_complete.columns if c.startswith('decline_rsn_')]

if decline_rsn_cols:
    rsn_pcts = {}
    for col in decline_rsn_cols:
        col_clean = col.replace('decline_rsn_', '').replace('_', ' ')
        pct = df_complete[col].sum() / len(df_complete[df_complete['accept_decline'] == 'Decliner']) * 100
        rsn_pcts[col_clean] = pct
    
    rsn_sorted = sorted(rsn_pcts.items(), key=lambda x: x[1], reverse=True)[:12]
    r_labels = [x[0][:28] for x in rsn_sorted]
    r_values = [round(x[1], 1) for x in rsn_sorted]
    
    fig = go.Figure(go.Bar(
        y=r_labels[::-1], x=r_values[::-1], orientation='h',
        marker_color=TUOS['coral'],
        text=[f'{v}%' for v in r_values[::-1]],
        textposition='outside',
        cliponaxis=False,
        hovertemplate='%{y}: %{x}%<extra></extra>'
    ))
    fig.update_layout(
        height=620, template='plotly_white',
        title_text='Reasons for Declining',
        xaxis_title='% of Decliners',
        margin=dict(l=220, r=80),
        xaxis=dict(range=[0, max(r_values) + 10])
    )
    fig.show()

In [118]:
# Importance of Scholarships & Optional Modules
importance_order = ['Not Important', 'Slightly Important', 'Moderately Important', 'Important', 'Very Important']
colors_importance = [TUOS['coral'], TUOS['flamingo'], TUOS['peach'], TUOS['pastel_green'], TUOS['peak_green']]

# Scholarship Importance
if 'scholarship_importance' in df_complete.columns:
    schol_counts = df_complete['scholarship_importance'].value_counts()
    schol_ordered = [schol_counts.get(imp, 0) for imp in importance_order]
    schol_pcts = [round(c / sum(schol_ordered) * 100, 1) for c in schol_ordered]
    
    fig = go.Figure(go.Bar(
        y=importance_order, x=schol_pcts, orientation='h',
        marker_color=colors_importance,
        text=[f'{p}%' for p in schol_pcts],
        textposition='outside',
        cliponaxis=False,
        hovertemplate='%{y}: %{x}%<extra></extra>'
    ))
    fig.update_layout(
        height=580, template='plotly_white',
        title_text='Importance of Scholarships/Bursaries',
        xaxis_title='% of Respondents',
        xaxis=dict(range=[0, max(schol_pcts) + 8])
    )
    fig.show()

# Module Importance
if 'module_importance' in df_complete.columns:
    mod_counts = df_complete['module_importance'].value_counts()
    mod_ordered = [mod_counts.get(imp, 0) for imp in importance_order]
    mod_pcts = [round(c / sum(mod_ordered) * 100, 1) for c in mod_ordered]
    
    fig = go.Figure(go.Bar(
        y=importance_order, x=mod_pcts, orientation='h',
        marker_color=colors_importance,
        text=[f'{p}%' for p in mod_pcts],
        textposition='outside',
        cliponaxis=False,
        hovertemplate='%{y}: %{x}%<extra></extra>'
    ))
    fig.update_layout(
        height=580, template='plotly_white',
        title_text='Importance of Optional Module Choices',
        xaxis_title='% of Respondents',
        xaxis=dict(range=[0, max(mod_pcts) + 8])
    )
    fig.show()

### Descriptive Findings Summary

| Category | Observations |
|----------|-------------|
| **Response Overview** | 872 complete responses with an overall acceptance rate of approximately 74.5% |
| **Motivations** | The most frequently cited motivations include increasing knowledge, interest in the subject, and improving employability |
| **Information Sources** | University websites are the most widely used source in both phases; AI tools (including ChatGPT) were cited by approximately 29% of respondents during initial research |
| **Acceptance Factors** | Course content, university reputation, and cost of living are among the most frequently cited factors by those who accepted |
| **Barriers** | Cost of tuition fees and cost of living are the most commonly cited barriers overall |
| **Decline Factors** | Among those who declined, scholarship availability, course content preferences, and the reputation of alternative institutions feature prominently |

These are descriptive statistics reflecting what respondents reported. The following sections examine which factors are statistically associated with acceptance decisions after controlling for multiple comparisons.

---
# Level 1: General Analysis (University-Wide)

These findings are based on the full set of respondents regardless of faculty.

---
## 1.1 Key Drivers of Acceptance

**Question:** Which factors are statistically associated with the decision to accept or decline an offer?

**Methodology:**
- Chi-square tests for binary variables
- FDR (false discovery rate) correction applied across all comparisons
- Practical significance measured by Cramér's V

**Interpretation guide:** 
- These findings survived FDR correction, meaning they are unlikely to be artefacts of multiple testing
- Effect sizes and direction of association should be prioritised for practical interpretation
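
A minimal sketch of the per-factor test described above, run on a hypothetical 2×2 table (the counts are illustrative, not survey data). It follows the same `chi2_contingency` call used later in this notebook, which applies Yates' continuity correction by default for 2×2 tables.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table.
# Rows: factor absent, factor present. Columns: declined, accepted.
table = np.array([
    [120, 280],
    [40, 160],
])

chi2, p_value, dof, expected = chi2_contingency(table)

# Cramér's V: sqrt(chi2 / (n * (min(rows, cols) - 1))); the divisor is just n for 2x2.
n = table.sum()
cramers_v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))

# Rate difference: acceptance rate with the factor minus the rate without it.
rate_present = table[1, 1] / table[1].sum()
rate_absent = table[0, 1] / table[0].sum()
rate_diff_pp = (rate_present - rate_absent) * 100

print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
print(f"Cramér's V = {cramers_v:.3f}, rate difference = {rate_diff_pp:+.1f}pp")
```

Reporting the rate difference next to Cramér's V keeps direction and magnitude visible, since the chi-square statistic alone says nothing about which group has the higher acceptance rate.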

In [119]:
# Top Significant Drivers of Acceptance — Higher
top_positive = df_chi[(df_chi['Significant_FDR']==True) & (df_chi['Rate_Difference'] > 0)].nlargest(8, 'Cramers_V')
if len(top_positive) > 0:
    labels_pos = [v.replace('_', ' ')[:28] for v in top_positive['Variable']]
    vals_pos = [round(v, 1) for v in (top_positive['Rate_Difference'] * 100).values]
    
    fig = go.Figure(go.Bar(
        y=labels_pos[::-1], x=vals_pos[::-1], orientation='h',
        marker_color=TUOS['spearmint'],
        text=[f'+{v}pp' for v in vals_pos[::-1]],
        textposition='outside',
        cliponaxis=False,
        hovertemplate='%{y}<br>Rate difference: +%{x}pp<extra></extra>'
    ))
    fig.update_layout(
        height=620, template='plotly_white',
        title_text='Factors Associated with Higher Acceptance (FDR-Corrected)',
        xaxis_title='Rate Diff (pp)',
        margin=dict(l=220, r=80),
        xaxis=dict(range=[0, max(vals_pos) + 5])
    )
    fig.show()

# Top Significant Drivers of Acceptance — Lower
top_negative = df_chi[(df_chi['Significant_FDR']==True) & (df_chi['Rate_Difference'] < 0)].nsmallest(8, 'Rate_Difference')
if len(top_negative) > 0:
    labels_neg = [v.replace('_', ' ')[:28] for v in top_negative['Variable']]
    vals_neg = [round(v, 1) for v in (top_negative['Rate_Difference'] * 100).values]
    
    fig = go.Figure(go.Bar(
        y=labels_neg[::-1], x=vals_neg[::-1], orientation='h',
        marker_color=TUOS['coral'],
        text=[f'{v}pp' for v in vals_neg[::-1]],
        textposition='outside',
        cliponaxis=False,
        hovertemplate='%{y}<br>Rate difference: %{x}pp<extra></extra>'
    ))
    fig.update_layout(
        height=620, template='plotly_white',
        title_text='Factors Associated with Lower Acceptance (FDR-Corrected)',
        xaxis_title='Rate Diff (pp)',
        margin=dict(l=220, r=80),
        xaxis=dict(range=[min(vals_neg) - 5, 0])
    )
    fig.show()

### Observations: Key Drivers of Acceptance

The charts above show factors that reached statistical significance after FDR correction. Green bars indicate factors associated with higher acceptance rates; red bars indicate factors associated with lower acceptance rates.

#### Factors Associated with Higher Acceptance

| Factor | Rate Difference | Note |
|--------|-----------------|------|
| **In-person event attendance** | +11pp | Respondents who reported attending a campus event had an acceptance rate 11 percentage points higher than those who did not |
| **Over 30 age group** | Positive | Older respondents showed a higher acceptance rate |
| **Email communication engagement** | Positive | Respondents who engaged with email communications showed a higher acceptance rate |

#### Factors Associated with Lower Acceptance

| Factor | Rate Difference | Note |
|--------|-----------------|------|
| **Preference not to attend events** | −15pp | Respondents who indicated a preference not to attend events had a 15pp lower acceptance rate |
| **Comparison site usage** | Negative | Respondents who reported using comparison sites showed a lower acceptance rate |

**Important caveat:** These findings are observational associations and do not establish causation. For example, respondents with stronger pre-existing intent may be more likely both to attend events and to accept an offer. The event attendance itself may not be the causal factor.

---
## 1.2 Relative Importance: Comparative Positioning

**Question:** Which factors do acceptors cite more frequently than decliners, and vice versa? This comparison provides an indication of the institution's relative positioning on different attributes.

**Methodology:**
- For each factor, compute: (% of acceptors citing factor) minus (% of decliners citing factor)
- Positive values indicate the factor is cited more often by acceptors
- Negative values indicate the factor is cited more often by decliners

**Note:** Only the 110 decliners who were routed to the decline factors question (those who chose a UK university alternative) are included in the denominator for the decliner percentages.

In [98]:
# Relative Importance: Corrected for survey routing
decline_fac_cols = [c for c in df_complete.columns if c.startswith('decline_fac_')]
accept_fac_cols = [c for c in df_complete.columns if c.startswith('accept_fac_')]

acceptors_df = df_complete[df_complete[decision_col] == 'Acceptor']
decliners_df = df_complete[df_complete[decision_col] == 'Decliner']

# Identify decliners routed to Q2 (have at least one decline_fac_ = 1)
routed_decliners = decliners_df[decliners_df[decline_fac_cols].any(axis=1)]
n_routed = len(routed_decliners)
n_not_routed = len(decliners_df) - n_routed

# Recalculate Net Importance with corrected denominator
common_factors = sorted(set(c.replace('accept_fac_', '') for c in accept_fac_cols)
                        & set(c.replace('decline_fac_', '') for c in decline_fac_cols))

corrected_rows = []
for factor in common_factors:
    acc_col = f'accept_fac_{factor}'
    dec_col = f'decline_fac_{factor}'
    
    acc_pct = round(acceptors_df[acc_col].sum() / len(acceptors_df) * 100, 1)
    dec_pct = round(routed_decliners[dec_col].sum() / n_routed * 100, 1)
    net = round(acc_pct - dec_pct, 1)
    
    corrected_rows.append({
        'Factor': factor,
        'Accept_Selection_%': acc_pct,
        'Decline_Selection_%': dec_pct,
        'Net_Importance': net
    })

df_importance_corrected = pd.DataFrame(corrected_rows).sort_values('Net_Importance', ascending=False)

# Visualization
df_sorted = df_importance_corrected.sort_values('Net_Importance', ascending=True)
colors = [TUOS['coral'] if x < 0 else TUOS['spearmint'] for x in df_sorted['Net_Importance']]
labels = [f.replace('_', ' ')[:30] for f in df_sorted['Factor']]

fig = go.Figure(go.Bar(
    y=labels,
    x=df_sorted['Net_Importance'],
    orientation='h',
    marker_color=colors,
    text=[f'{v:+.1f}pp' for v in df_sorted['Net_Importance']],
    textposition='outside',
    cliponaxis=False,
    hovertemplate='%{y}<br>Acceptor: %{customdata[0]}%<br>Decliner: %{customdata[1]}%<br>Net: %{x:+.1f}pp<extra></extra>',
    customdata=list(zip(df_sorted['Accept_Selection_%'], df_sorted['Decline_Selection_%']))
))

fig.add_vline(x=0, line_width=2, line_color='black')

x_min = min(df_sorted['Net_Importance'])
x_max = max(df_sorted['Net_Importance'])
fig.update_layout(
    title=f'Net Importance: Acceptor % minus Decliner %<br><sub>Corrected: decline % based on {n_routed} routed decliners only</sub>',
    xaxis_title='Net Importance (Acceptor % − Decliner %)',
    xaxis=dict(range=[x_min - 8, x_max + 8]),
    height=550,
    template='plotly_white',
    margin=dict(l=230, r=60)
)
fig.show()

In [99]:
# Chi-square tests for Relative Importance factors vs Acceptance
# Corrected: exclude non-routed decliners who were never asked Q2
from scipy.stats import chi2_contingency

non_routed_mask = (df_complete[decision_col] == 'Decliner') & (~df_complete[decline_fac_cols].any(axis=1))
df_chi_base = df_complete[~non_routed_mask].copy()

results = []
for _, row in df_importance_corrected.iterrows():
    factor = row['Factor']
    acc_col = f'accept_fac_{factor}'
    dec_col = f'decline_fac_{factor}'
    
    acc_exists = acc_col in df_chi_base.columns
    dec_exists = dec_col in df_chi_base.columns
    
    if acc_exists or dec_exists:
        cited = pd.Series(False, index=df_chi_base.index)
        if acc_exists:
            cited = cited | df_chi_base[acc_col].fillna(0).astype(bool)
        if dec_exists:
            cited = cited | df_chi_base[dec_col].fillna(0).astype(bool)
        
        contingency = pd.crosstab(cited.astype(int), df_chi_base[decision_col])
        if contingency.shape == (2, 2):
            chi2, p, dof, expected = chi2_contingency(contingency)
            n = contingency.sum().sum()
            cramers_v = np.sqrt(chi2 / n)
            results.append({
                'Factor': factor.replace('_', ' ').title(),
                'Net_Importance': row['Net_Importance'],
                'Chi2': chi2,
                'p_value': p,
                'Cramers_V': cramers_v,
                'Significant': 'Yes' if p < 0.05 else ''
            })

df_chi_imp = pd.DataFrame(results).sort_values('p_value')
df_chi_imp_display = df_chi_imp.copy()
df_chi_imp_display['p_value'] = df_chi_imp_display['p_value'].map(lambda x: f'{x:.4f}')
df_chi_imp_display['Chi2'] = df_chi_imp_display['Chi2'].map(lambda x: f'{x:.2f}')
df_chi_imp_display['Cramers_V'] = df_chi_imp_display['Cramers_V'].map(lambda x: f'{x:.3f}')
df_chi_imp_display['Net_Importance'] = df_chi_imp_display['Net_Importance'].map(lambda x: f'{x:+.1f}pp')

display(df_chi_imp_display[['Factor', 'Net_Importance', 'Chi2', 'p_value', 'Cramers_V', 'Significant']].reset_index(drop=True).style.set_caption(
    f'Chi-square Association Tests — Sample: {len(acceptors_df)} acceptors + {n_routed} routed decliners = {len(df_chi_base)} respondents'
))

| Factor | Net_Importance | Chi2 | p_value | Cramers_V | Significant |
|--------|----------------|------|---------|-----------|-------------|
| Cost Of Living In The Location | +24.5pp | 24.51 | 0.0000 | 0.180 | Yes |
| Course Content | +23.5pp | 19.89 | 0.0000 | 0.162 | Yes |
| Academic Facilities | +18.6pp | 16.38 | 0.0001 | 0.147 | Yes |
| Reputation Of The University | +20.4pp | 15.09 | 0.0001 | 0.141 | Yes |
| Career Prospects | +15.8pp | 10.46 | 0.0012 | 0.117 | Yes |
| Module Choices | +14.7pp | 9.20 | 0.0024 | 0.110 | Yes |
| Other Please Specify | -6.3pp | 8.89 | 0.0029 | 0.108 | Yes |
| ‘Feel’ Of The University | +13.6pp | 8.01 | 0.0047 | 0.103 | Yes |
| Reputation Of The Course | +12.4pp | 7.35 | 0.0067 | 0.098 | Yes |
| League Tables | -8.4pp | 6.32 | 0.0120 | 0.091 | Yes |


### Observations: Comparative Positioning

**Methodological note:** Only 110 of the 222 decliners were routed to the decline factors question (those who chose a UK university alternative). The remaining 112 declined for non-university reasons and were not presented with these options. All percentages and statistical tests in this section use the corrected sample of 650 acceptors and 110 routed decliners (n = 760).

The chart compares the percentage of acceptors citing each factor against the percentage of routed decliners citing it. The difference (Net Importance) indicates the relative frequency of citation between the two groups.

- **Positive values (green)** = factor cited more frequently by acceptors
- **Negative values (red)** = factor cited more frequently by decliners

#### Factors Cited More Frequently by Acceptors

| Factor | Net Importance | Detail |
|--------|----------------|--------|
| **Cost of living** | +24.5pp | Cited substantially more often by acceptors than by decliners |
| **Course content** | +23.5pp | Cited more often by acceptors, though approximately 28% of decliners also cite it |
| **Academic facilities** | +18.6pp | More frequently mentioned by those who accepted |
| **Career prospects** | +15.8pp | Acceptors reference career-related factors more often |

#### Factors Cited More Frequently by Decliners

| Factor | Net Importance | Detail |
|--------|----------------|--------|
| **Location** | Negative | More frequently cited by decliners, suggesting geographic considerations play a role for some |
| **League table rankings** | Negative | More frequently cited by decliners, suggesting rankings may be a differentiating factor for this group |
| **Tuition fees** | Negative | More frequently cited by decliners, indicating fee-related concerns feature in the decision for some respondents |

**Note on correction impact:** Before adjusting for survey routing, tuition fees, location, and league tables appeared neutral or mildly positive in net importance. With the corrected denominator (110 routed decliners rather than 222 total decliners), these factors emerge as being cited meaningfully more often by decliners.

---
## 1.3 Four-Group Chi-Square: Accept/Decline Factors by Fee Status

**Question:** Do factor citation patterns differ across the four segments formed by decision (accept/decline) and fee status (home/overseas)?

**Methodology:**
- 4×2 chi-square test per factor (4 respondent segments × cited/not cited)
- Corrected sample: only routed decliners included (36 Home + 73 Overseas = 109, excluding 1 with unknown fee status)
- Total sample: 650 acceptors + 109 routed decliners = 759 (excluding fee_status = 'Q')
- Effect size: Cramér's V

This analysis tests whether citation patterns for each factor vary systematically across the four segments, which may reveal segment-specific associations not visible in the overall analysis.
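
As a rough illustration of the statistic and effect size used here, the sketch below computes the Pearson chi-square and Cramér's V, V = sqrt(χ² / (n · min(r − 1, c − 1))), for one 4×2 table in plain Python. The cell counts are reconstructed from the percentages and segment sizes reported for Scholarships and Bursaries, which is an assumption; the helper names are hypothetical and the p-value is omitted.

```python
import math

def chi_square_stat(table):
    """Pearson chi-square statistic and total n for a list-of-rows contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n

def cramers_v(table):
    """Cramér's V: sqrt(chi2 / (n * min(rows - 1, cols - 1)))."""
    stat, n = chi_square_stat(table)
    k = min(len(table) - 1, len(table[0]) - 1)
    return math.sqrt(stat / (n * k))

# Rows are [cited, not cited]; counts reconstructed from reported rates (assumption)
scholarships = [
    [19, 164],    # Acceptor_Home      (n = 183)
    [185, 282],   # Acceptor_Overseas  (n = 467)
    [3, 33],      # Decliner_Home      (n = 36)
    [28, 45],     # Decliner_Overseas  (n = 73)
]

chi2_value, n_total = chi_square_stat(scholarships)
v_value = cramers_v(scholarships)
print(f"chi2 = {chi2_value:.2f}, n = {n_total}, V = {v_value:.3f}")
```

For a table with only two columns, `min(r − 1, c − 1)` equals 1, so V reduces to sqrt(χ² / n).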

In [100]:
# Four-Group Chi-Square: Accept/Decline Factors by Fee Status
from scipy.stats import chi2_contingency

df_4g = df_chi_base[df_chi_base['fee_status'].isin(['H', 'O'])].copy()
df_4g['segment'] = df_4g[decision_col] + '_' + df_4g['fee_status'].map({'H': 'Home', 'O': 'Overseas'})

seg_counts = df_4g['segment'].value_counts().sort_index()

results_4g = []
for factor in common_factors:
    acc_col = f'accept_fac_{factor}'
    dec_col = f'decline_fac_{factor}'
    
    cited = pd.Series(False, index=df_4g.index)
    if acc_col in df_4g.columns:
        cited = cited | df_4g[acc_col].fillna(0).astype(bool)
    if dec_col in df_4g.columns:
        cited = cited | df_4g[dec_col].fillna(0).astype(bool)
    
    ct = pd.crosstab(df_4g['segment'], cited.astype(int))
    
    if ct.shape[1] == 2 and ct.shape[0] >= 2:
        chi2, p, dof, expected = chi2_contingency(ct)
        n = ct.sum().sum()
        k = min(ct.shape[0] - 1, ct.shape[1] - 1)
        cramers_v = np.sqrt(chi2 / (n * k))
        
        group_pcts = {}
        for seg in ['Acceptor_Home', 'Acceptor_Overseas', 'Decliner_Home', 'Decliner_Overseas']:
            if seg in ct.index:
                col_1 = 1 if 1 in ct.columns else True
                group_pcts[seg] = ct.loc[seg, col_1] / ct.loc[seg].sum() * 100
            else:
                group_pcts[seg] = 0.0
        
        results_4g.append({
            'Factor': factor.replace('_', ' ').title(),
            'Acc_Home_%': group_pcts.get('Acceptor_Home', 0),
            'Acc_Overseas_%': group_pcts.get('Acceptor_Overseas', 0),
            'Dec_Home_%': group_pcts.get('Decliner_Home', 0),
            'Dec_Overseas_%': group_pcts.get('Decliner_Overseas', 0),
            'Chi2': chi2,
            'p_value': p,
            'Cramers_V': cramers_v,
            'Significant': 'Yes' if p < 0.05 else ''
        })

df_4g_results = pd.DataFrame(results_4g).sort_values('p_value')

df_4g_display = df_4g_results.copy()
df_4g_display['p_value'] = df_4g_display['p_value'].map(lambda x: f'{x:.4f}')
df_4g_display['Chi2'] = df_4g_display['Chi2'].map(lambda x: f'{x:.2f}')
df_4g_display['Cramers_V'] = df_4g_display['Cramers_V'].map(lambda x: f'{x:.3f}')
for col in ['Acc_Home_%', 'Acc_Overseas_%', 'Dec_Home_%', 'Dec_Overseas_%']:
    df_4g_display[col] = df_4g_display[col].map(lambda x: f'{x:.1f}%')

seg_str = ', '.join([f'{seg}: {n}' for seg, n in seg_counts.items()])
display(df_4g_display[['Factor', 'Acc_Home_%', 'Acc_Overseas_%', 'Dec_Home_%', 'Dec_Overseas_%',
                        'Chi2', 'p_value', 'Cramers_V', 'Significant']].reset_index(drop=True).style.set_caption(
    f'4-Group Chi-Square: Factor Citation Rates by Segment ({seg_str})'
))

sig_4g = df_4g_results['Significant'].str.contains('Yes').sum()
print(f"{sig_4g}/{len(df_4g_results)} factors show significant variation across the 4 groups (p < 0.05)")

Unnamed: 0,Factor,Acc_Home_%,Acc_Overseas_%,Dec_Home_%,Dec_Overseas_%,Chi2,p_value,Cramers_V,Significant
0,Scholarships And Bursaries,10.4%,39.6%,8.3%,38.4%,63.11,0.0,0.288,Yes
1,Course Content,63.9%,46.9%,25.0%,30.1%,35.75,0.0,0.217,Yes
2,Reputation Of The Course,38.8%,19.7%,16.7%,11.0%,35.07,0.0,0.215,Yes
3,Module Choices,45.4%,25.5%,16.7%,16.4%,34.85,0.0,0.214,Yes
4,Location,26.2%,9.6%,27.8%,21.9%,34.44,0.0,0.213,Yes
5,‘Feel’ Of The University,41.5%,25.5%,22.2%,13.7%,26.05,0.0,0.185,Yes
6,Cost Of Living In The Location,37.7%,35.8%,13.9%,11.0%,25.49,0.0,0.183,Yes
7,League Tables,16.4%,6.0%,22.2%,15.1%,24.36,0.0,0.179,Yes
8,Reputation Of The University,55.7%,43.3%,22.2%,28.8%,24.15,0.0,0.178,Yes
9,Academic Facilities,32.8%,25.7%,2.8%,12.3%,21.75,0.0001,0.169,Yes


12/14 factors show significant variation across the 4 groups (p < 0.05)


In [101]:
# Heatmap: Factor citation rates across 4 segments
heatmap_df = df_4g_results[['Factor', 'Acc_Home_%', 'Acc_Overseas_%', 'Dec_Home_%', 'Dec_Overseas_%']].copy()
heatmap_df = heatmap_df.set_index('Factor')

# Shortened column headers to prevent x-axis overlap
heatmap_df.columns = [f'Acc Home ({seg_counts.get("Acceptor_Home", "?")})',
                       f'Acc Overseas ({seg_counts.get("Acceptor_Overseas", "?")})',
                       f'Dec Home ({seg_counts.get("Decliner_Home", "?")})',
                       f'Dec Overseas ({seg_counts.get("Decliner_Overseas", "?")})']

heatmap_df['sort'] = heatmap_df.mean(axis=1)
heatmap_df = heatmap_df.sort_values('sort', ascending=True).drop(columns='sort')

# Build cell labels, adding ' *' to every cell of a factor that is significant (p < 0.05)
sig_map = dict(zip(df_4g_results['Factor'], df_4g_results['Significant']))
annotations_text = []
for factor in heatmap_df.index:
    marker = ' *' if sig_map.get(factor) == 'Yes' else ''
    annotations_text.append([f'{val:.1f}%{marker}' for val in heatmap_df.loc[factor]])

fig = go.Figure(data=go.Heatmap(
    z=heatmap_df.values,
    x=heatmap_df.columns.tolist(),
    y=[y[:28] for y in heatmap_df.index.tolist()],
    colorscale=TUOS_SCALE_WARM,
    zmin=0,
    # Use the labels built above so the '*' promised in the chart subtitle is actually shown
    text=annotations_text,
    texttemplate='%{text}',
    hovertemplate='Factor: %{y}<br>Segment: %{x}<br>Citation rate: %{z:.1f}%<extra></extra>',
    colorbar=dict(title='% citing')
))

fig.update_layout(
    title='Factor Citation Rates (%) by Segment<br><sub>Corrected: routed decliners only | * = statistically significant (p < 0.05)</sub>',
    height=600,
    template='plotly_white',
    margin=dict(l=200, b=90),
    yaxis=dict(dtick=1),
    xaxis=dict(tickangle=-15)
)
fig.show()

### Observations: Four-Group Factor Analysis

12 of the 14 factors tested show statistically significant variation across the four segments. The patterns observed can be grouped as follows.

#### Scholarships and Bursaries — largest segment effect (V = 0.288)
- Overseas respondents cite this factor at much higher rates (~39%) regardless of whether they accepted or declined
- Home respondents cite it infrequently (8–10%)
- The variation is primarily a home-versus-overseas distinction rather than an accept-versus-decline one

#### Course Content and Module Choices — home acceptors distinct
- Home acceptors cite course content (63.9%) and module choices (45.4%) at notably higher rates than other segments
- Overseas acceptors also cite these factors, but at lower rates
- Both decliner groups cite course content at 25–30% and module choices at about 16–17%

#### Location — more pronounced among overseas decliners
- Home decliners (27.8%) and home acceptors (26.2%) cite location at similar rates
- Among overseas respondents, decliners (21.9%) cite location at more than twice the rate of acceptors (9.6%), a 12pp gap
- This suggests location may be a more salient differentiator within the overseas segment

#### League Tables — cited more frequently by decliners in both segments
- Home decliners (22.2%) and overseas decliners (15.1%) cite league tables at higher rates than their respective acceptor groups (16.4% and 6.0%)
- Home respondents cite league tables more often than overseas respondents in both the acceptor and decliner groups

#### Tuition Fees — no significant segment variation (p = 0.35)
- Citation rates are broadly similar (17–25%) across all four segments
- This suggests tuition fee concerns are not concentrated in any particular segment

#### Cost of Living — consistent acceptor-versus-decliner pattern
- Approximately 36–38% of acceptors cite this factor regardless of fee status
- Only 11–14% of decliners cite it
- The pattern is an accept/decline differentiator rather than a home/overseas one

**Summary:** The most notable segment-specific pattern relates to scholarships and bursaries, which is predominantly an overseas concern. Most other factors divide primarily along accept/decline lines, with the exceptions of location (a more pronounced differentiator within the overseas segment) and course content/module choices (particularly associated with home acceptors).
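
To make the accept/decline versus home/overseas distinction concrete, the following sketch computes the acceptor-minus-decliner gap within each fee-status segment. It uses the citation rates shown in the four-group output table; the dictionary layout and the `within_segment_gaps` helper are illustrative assumptions.

```python
# Citation rates (%) as reported in the four-group output table
rates = {
    "Location": {"acc_home": 26.2, "acc_overseas": 9.6, "dec_home": 27.8, "dec_overseas": 21.9},
    "League Tables": {"acc_home": 16.4, "acc_overseas": 6.0, "dec_home": 22.2, "dec_overseas": 15.1},
    "Cost Of Living": {"acc_home": 37.7, "acc_overseas": 35.8, "dec_home": 13.9, "dec_overseas": 11.0},
    "Scholarships And Bursaries": {"acc_home": 10.4, "acc_overseas": 39.6, "dec_home": 8.3, "dec_overseas": 38.4},
}

def within_segment_gaps(r):
    """Acceptor minus decliner citation rate (pp) inside each fee-status segment."""
    return {
        "home": round(r["acc_home"] - r["dec_home"], 1),
        "overseas": round(r["acc_overseas"] - r["dec_overseas"], 1),
    }

gaps = {factor: within_segment_gaps(r) for factor, r in rates.items()}
for factor, g in gaps.items():
    print(f"{factor:<28} home {g['home']:+.1f}pp | overseas {g['overseas']:+.1f}pp")
```

Read this way, cost of living shows a large gap in both segments and scholarships shows almost none in either. Location shows a gap mainly among overseas respondents.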

---
## 1.4 Accept/Decline Factors by Faculty

**Question:** Do factor citation patterns vary across faculties for acceptors and decliners?

**Methodology:**
- 10-group chi-square per factor (5 faculties × accept/decline, cited/not cited)
- Corrected sample: only routed decliners included (per faculty: Arts & Humanities 26, Engineering 11, Health 19, Science 16, Social Sciences 38)
- Effect size: Cramér's V

**Caution:** Some faculty-level decliner cells have very small sample sizes (particularly Engineering decliners, n = 11). These results should be considered exploratory and interpreted with appropriate caution.
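
One way to gauge how fragile the small cells are: under independence, the expected "cited" count for a group equals the group size multiplied by the overall citation rate. A common rule of thumb (an assumption here, not something the analysis states) treats the chi-square approximation as less reliable when expected counts fall below 5. The sketch below uses the routed decliner counts listed above to find the overall citation rate each decliner group would need to reach that threshold.

```python
# Routed decliner counts per faculty, as listed in the methodology above
decliner_n = {
    "Arts & Humanities": 26,
    "Engineering": 11,
    "Health": 19,
    "Science": 16,
    "Social Sciences": 38,
}

MIN_EXPECTED = 5  # conventional rule-of-thumb threshold (assumption)

def min_overall_rate(n, min_expected=MIN_EXPECTED):
    """Overall citation rate (%) at which the expected 'cited' count reaches min_expected."""
    return min_expected / n * 100

thresholds = {fac: min_overall_rate(n) for fac, n in decliner_n.items()}
for fac, pct in sorted(thresholds.items(), key=lambda item: -item[1]):
    print(f"{fac:<18} n={decliner_n[fac]:>2}  needs overall rate >= {pct:.1f}%")
```

On this reading, any factor cited by fewer than roughly 45% of respondents overall leaves the Engineering decliner cell with an expected count below 5. The same constraint applies in mirror image to the "not cited" cell for very frequently cited factors.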

In [102]:
# Accept/Decline Factors Chi-Square across Faculty (10-group analysis)
from scipy.stats import chi2_contingency

df_fac_chi = df_chi_base[df_chi_base['faculty'].notna()].copy()
df_fac_chi['segment'] = df_fac_chi[decision_col] + ' – ' + df_fac_chi['faculty']

faculties = sorted(df_fac_chi['faculty'].unique())
seg_ct = pd.crosstab(df_fac_chi['faculty'], df_fac_chi[decision_col])

results_fac = []
for factor in common_factors:
    acc_col = f'accept_fac_{factor}'
    dec_col = f'decline_fac_{factor}'
    
    cited = pd.Series(False, index=df_fac_chi.index)
    if acc_col in df_fac_chi.columns:
        cited = cited | df_fac_chi[acc_col].fillna(0).astype(bool)
    if dec_col in df_fac_chi.columns:
        cited = cited | df_fac_chi[dec_col].fillna(0).astype(bool)
    
    ct = pd.crosstab(df_fac_chi['segment'], cited.astype(int))
    
    if ct.shape[1] == 2 and ct.shape[0] >= 2:
        chi2, p, dof, expected = chi2_contingency(ct)
        n = ct.sum().sum()
        k = min(ct.shape[0] - 1, ct.shape[1] - 1)
        cramers_v = np.sqrt(chi2 / (n * k))
        
        col_1 = 1 if 1 in ct.columns else True
        group_pcts = {}
        for seg in ct.index:
            group_pcts[seg] = ct.loc[seg, col_1] / ct.loc[seg].sum() * 100
        
        row_data = {
            'Factor': factor.replace('_', ' ').title(),
            'Chi2': chi2, 'p_value': p, 'dof': dof,
            'Cramers_V': cramers_v,
            'Significant': 'Yes' if p < 0.05 else ''
        }
        for fac in faculties:
            row_data[f'Acc_{fac[:6]}'] = group_pcts.get(f'Acceptor – {fac}', 0)
            row_data[f'Dec_{fac[:6]}'] = group_pcts.get(f'Decliner – {fac}', 0)
        results_fac.append(row_data)

df_fac_results = pd.DataFrame(results_fac).sort_values('p_value')

df_fac_display = df_fac_results.copy()
df_fac_display['p_value'] = df_fac_display['p_value'].map(lambda x: f'{x:.4f}')
df_fac_display['Chi2'] = df_fac_display['Chi2'].map(lambda x: f'{x:.2f}')
df_fac_display['Cramers_V'] = df_fac_display['Cramers_V'].map(lambda x: f'{x:.3f}')

display(seg_ct.style.set_caption('Faculty × Decision Sample Sizes (routed decliners only)'))
display(df_fac_display[['Factor', 'Chi2', 'p_value', 'Cramers_V', 'Significant']].reset_index(drop=True).style.set_caption(
    '10-Group Chi-Square: Factor Citation Rates by Faculty × Decision'
))

sig_fac = df_fac_results['Significant'].str.contains('Yes').sum()
print(f"{sig_fac}/{len(df_fac_results)} factors show significant variation across faculty × decision groups (p < 0.05)")

faculty,Acceptor,Decliner
Arts & Humanities,71,26
Engineering,102,11
Health,99,19
Science,78,16
Social Sciences,299,38


Unnamed: 0,Factor,Chi2,p_value,Cramers_V,Significant
0,Cost Of Living In The Location,32.75,0.0001,0.208,Yes
1,Course Content,31.08,0.0003,0.202,Yes
2,Module Choices,26.99,0.0014,0.189,Yes
3,Reputation Of The University,21.92,0.0091,0.17,Yes
4,Other Please Specify,21.81,0.0095,0.17,Yes
5,Academic Facilities,19.63,0.0203,0.161,Yes
6,Work Experience Opportunities,18.67,0.0281,0.157,Yes
7,Career Prospects,17.86,0.0368,0.153,Yes
8,Location,17.37,0.0432,0.151,Yes
9,Scholarships And Bursaries,16.34,0.0602,0.147,


9/14 factors show significant variation across faculty × decision groups (p < 0.05)


In [103]:
# Heatmap: Acceptor Factor Citation Rates by Faculty
n_per_fac_acc = seg_ct['Acceptor'] if 'Acceptor' in seg_ct.columns else seg_ct.iloc[:, 0]
n_per_fac_dec = seg_ct['Decliner'] if 'Decliner' in seg_ct.columns else seg_ct.iloc[:, 1]

fac_abbrev = {
    'Arts and Humanities': 'Arts & Hum',
    'Arts & Humanities': 'Arts & Hum',
    'Engineering': 'Engineering',
    'Medicine, Dentistry and Health': 'Med/Dent/Health',
    'Medicine Dentistry and Health': 'Med/Dent/Health',
    'Science': 'Science',
    'Social Sciences': 'Social Sci',
    'Health': 'Health',
}

# --- Acceptor heatmap ---
hm_data_acc = {}
for _, row in df_fac_results.iterrows():
    factor = row['Factor']
    for fac in faculties:
        col_key = f'Acc_{fac[:6]}'
        short_fac = fac_abbrev.get(fac, fac[:14])
        col_label = f'{short_fac} (n={n_per_fac_acc.get(fac, 0)})'
        if col_label not in hm_data_acc:
            hm_data_acc[col_label] = {}
        hm_data_acc[col_label][factor] = row[col_key]

hm_df_acc = pd.DataFrame(hm_data_acc)
hm_df_acc['sort'] = hm_df_acc.mean(axis=1)
hm_df_acc = hm_df_acc.sort_values('sort', ascending=True).drop(columns='sort')

fig = go.Figure(data=go.Heatmap(
    z=hm_df_acc.values,
    x=hm_df_acc.columns.tolist(),
    y=[y[:28] for y in hm_df_acc.index.tolist()],
    colorscale=TUOS_SCALE_POS,
    zmin=0,
    text=[[f'{v:.0f}%' for v in row] for row in hm_df_acc.values],
    texttemplate='%{text}',
    hovertemplate='Factor: %{y}<br>Faculty: %{x}<br>Citation: %{z:.1f}%<extra></extra>',
    colorbar=dict(title='% citing')
))

fig.update_layout(
    title='Acceptor Factor Citation Rates by Faculty<br><sub>Corrected: routed decliners only</sub>',
    height=550,
    template='plotly_white',
    margin=dict(l=200, b=70),
    yaxis=dict(dtick=1)
)
fig.show()

In [104]:
# Heatmap: Decliner Factor Citation Rates by Faculty
hm_data_dec = {}
for _, row in df_fac_results.iterrows():
    factor = row['Factor']
    for fac in faculties:
        col_key = f'Dec_{fac[:6]}'
        short_fac = fac_abbrev.get(fac, fac[:14])
        col_label = f'{short_fac} (n={n_per_fac_dec.get(fac, 0)})'
        if col_label not in hm_data_dec:
            hm_data_dec[col_label] = {}
        hm_data_dec[col_label][factor] = row[col_key]

hm_df_dec = pd.DataFrame(hm_data_dec)
hm_df_dec['sort'] = hm_df_dec.mean(axis=1)
hm_df_dec = hm_df_dec.sort_values('sort', ascending=True).drop(columns='sort')

fig = go.Figure(data=go.Heatmap(
    z=hm_df_dec.values,
    x=hm_df_dec.columns.tolist(),
    y=[y[:28] for y in hm_df_dec.index.tolist()],
    colorscale=TUOS_SCALE_NEG,
    zmin=0,
    text=[[f'{v:.0f}%' for v in row] for row in hm_df_dec.values],
    texttemplate='%{text}',
    hovertemplate='Factor: %{y}<br>Faculty: %{x}<br>Citation: %{z:.1f}%<extra></extra>',
    colorbar=dict(title='% citing')
))

fig.update_layout(
    title='Decliner Factor Citation Rates by Faculty<br><sub>Corrected: routed decliners only</sub>',
    height=550,
    template='plotly_white',
    margin=dict(l=200, b=70),
    yaxis=dict(dtick=1)
)
fig.show()

### Observations: Accept/Decline Factors by Faculty

9 of the 14 factors tested show statistically significant variation across the 10 faculty × decision groups. The heatmaps reveal both consistent and faculty-specific patterns.

#### Patterns consistent across faculties (acceptor side)
- **Course content** is the most frequently cited factor for acceptors across all faculties (48–59%)
- **Reputation of the university** is cited at consistently high rates by acceptors (38–52%)
- **Cost of living** is cited by acceptors at broadly similar rates across faculties (34–42%)

#### Notable faculty-level divergences among decliners

| Factor | Spread | Pattern |
|--------|--------|---------|
| **Tuition fees** | ~35pp | Cited most by Engineering decliners (45%), least by Health decliners (11%) |
| **Course content** | ~35pp | Cited most by Science decliners (44%), least by Engineering decliners (9%) |
| **Location** | ~28pp | Cited most by Health decliners (37%), least by Engineering decliners (9%) |
| **Scholarships** | ~27pp | Cited most by Engineering decliners (45%), least by Social Sciences decliners (18%) |
| **Cost of living** | ~27pp | Cited most by Engineering decliners (27%), least by Science decliners (0%) |
| **League tables** | ~26pp | Cited most by Engineering decliners (36%), least by Health decliners (11%) |

#### Engineering decliners — distinct profile
Engineering decliners (n = 11; interpret cautiously given the small sample) show notably high citation of financial and prestige-related factors:
- Tuition fees (45%) and scholarships (45%)
- League tables (36%) and university reputation (36%)
- Conversely low citation of course content (9%) and location (9%)

#### Health decliners — location-related pattern
- Location (37%) is the most frequently cited factor among Health decliners
- Tuition fees (11%) and league tables (11%) are cited at relatively low rates

**Caveat:** Decliner cell sizes are small (Engineering n = 11, Science n = 16). These observations are exploratory and should be treated as hypotheses for further investigation rather than definitive conclusions.

---
## 1.5 Decision Journey: When Do Applicants Begin Researching?

Understanding when applicants first begin researching postgraduate study can help contextualise the role of timing in the decision process.

The chart below shows the acceptance rate for each research start time, ordered chronologically across the applicant lifecycle.

In [120]:
# Acceptance Rate by Research Start Time
if 'when_begin_research' in df_complete.columns:
    timeline_order = [
        'Before starting undergraduate degree',
        'In year one of undergraduate degree',
        'In year two of undergraduate degree',
        'During placement year',
        'In final year of undergraduate degree',
        'Within the first year after graduating',
        'One to three years after graduating',
        'More than three years after graduating'
    ]
    
    short_labels = [
        'Before UG', 'UG Yr 1', 'UG Yr 2', 'Placement',
        'UG Final Yr', '<1yr post-grad', '1–3yrs post-grad', '3+ yrs post-grad'
    ]
    
    timing_data = []
    overall_rate = round(df_complete[decision_col].eq('Acceptor').mean() * 100, 1)
    for i, cat in enumerate(timeline_order):
        subset = df_complete[df_complete['when_begin_research'] == cat]
        if len(subset) > 0:
            acc_rate = round(subset[decision_col].eq('Acceptor').mean() * 100, 1)
            timing_data.append({
                'Category': cat, 'Short': short_labels[i],
                'N': len(subset), 'Accept_Rate': acc_rate
            })
    
    df_timing = pd.DataFrame(timing_data)
    bar_colors = [TUOS['spearmint'] if r >= overall_rate else TUOS['coral'] for r in df_timing['Accept_Rate']]
    
    fig = go.Figure()
    fig.add_trace(go.Bar(
        x=df_timing['Short'],
        y=df_timing['Accept_Rate'],
        marker_color=bar_colors,
        text=[f'{r}%' for r in df_timing['Accept_Rate']],
        textposition='outside',
        textfont_size=10,
        hovertemplate='%{x}<br>Acceptance rate: %{y}%<br>n=%{customdata}<extra></extra>',
        customdata=df_timing['N']
    ))
    
    fig.add_hline(y=overall_rate, line_dash='dash', line_color='black',
                  annotation_text=f'Overall: {overall_rate}%', annotation_position='top right')
    
    fig.update_layout(
        title='Acceptance Rate by When Applicants First Researched Postgraduate Study',
        yaxis_title='Acceptance Rate (%)',
        height=620,
        template='plotly_white',
        xaxis_tickangle=-25,
        xaxis_tickfont_size=9,
        yaxis=dict(range=[0, max(df_timing['Accept_Rate']) + 10]),
        margin=dict(b=100),
        annotations=[dict(
            x=0.5, y=-0.22, xref='paper', yref='paper',
            text='← Earlier in lifecycle ——— Later in lifecycle →',
            showarrow=False, font=dict(size=11, color='gray')
        )]
    )
    fig.show()

### Observations: Research Timing and Acceptance

The data suggest some variation in acceptance rate by when respondents reported first beginning their postgraduate research.

| When Research Began | N | Acceptance Rate | vs Overall | Note |
|---------------------|---|-----------------|------------|------|
| Before starting UG | 59 | 79.7% | +5.2pp | Above average |
| UG Year 1 | 53 | 81.1% | +6.6pp | Highest observed rate |
| UG Year 2 | 116 | 69.0% | −5.5pp | Below average |
| Placement year | 33 | 72.7% | −1.8pp | Small sample; close to average |
| UG Final year | 257 | 73.5% | −1.0pp | Largest group; close to average |
| Within 1yr post-grad | 117 | 75.2% | +0.7pp | Broadly average |
| 1–3yrs post-grad | 98 | 73.5% | −1.0pp | Close to average |
| 3+ yrs post-grad | 124 | 77.4% | +2.9pp | Above average |

**Observations:**
- Respondents who reported beginning their research during UG Year 1 or earlier show the highest acceptance rates (79.7–81.1%), although sample sizes for these groups are relatively small (n = 53–59)
- Those who began in UG Year 2 show a notably lower rate (69.0%), which may reflect a group still broadly exploring options
- Post-graduation returners (3+ years after graduating) show an above-average rate (77.4%), which may reflect clearer professional motivations
- The overall spread between the highest group (UG Year 1, 81.1%) and the lowest (UG Year 2, 69.0%) is about 12 percentage points

**Caveats:** These rates are descriptive and unadjusted. The observed patterns may reflect pre-existing differences in motivation or circumstances across timing groups rather than an effect of timing itself.
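
Sampling uncertainty for the smaller timing groups can be gauged with a confidence interval for each proportion. The sketch below uses the Wilson score interval, which is one common choice and not necessarily the method used elsewhere in this report. Acceptor counts are reconstructed from the reported N and rate, which is an assumption.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

# (acceptors, N) reconstructed from the reported rates (assumption, not raw data)
groups = {
    "UG Year 1": (43, 53),        # 81.1%
    "UG Year 2": (80, 116),       # 69.0%
    "UG Final year": (189, 257),  # 73.5%
}

intervals = {name: wilson_interval(k, n) for name, (k, n) in groups.items()}
for name, (lo, hi) in intervals.items():
    print(f"{name:<14} {lo * 100:.1f}% to {hi * 100:.1f}%")
```

Under these assumptions the intervals for UG Year 1 and UG Year 2 overlap, which is consistent with treating the gap between them as descriptive rather than conclusive.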

---
# Level 2: Faculty Analysis

These findings are stratified by faculty to examine faculty-specific patterns in acceptance rates and associated factors.

---
## 2.1 Faculty Acceptance Rates

Acceptance rates vary across faculties. This section presents the observed variation alongside confidence intervals and sample sizes.

In [121]:
# Faculty Acceptance Rates
overall_rate = 74.5  # overall acceptance rate from the Overview (650 acceptors of 872 respondents)

# Abbreviate faculty names for subplots
fac_short = {
    'Arts and Humanities': 'Arts & Hum',
    'Arts & Humanities': 'Arts & Hum',
    'Engineering': 'Engineering',
    'Medicine, Dentistry and Health': 'Med/Dent/Health',
    'Medicine Dentistry and Health': 'Med/Dent/Health',
    'Science': 'Science',
    'Social Sciences': 'Social Sci',
}

# Acceptance rates with confidence intervals
df_fac_sorted = df_faculty.sort_values('Accept_Rate').copy()
df_fac_sorted['Short'] = df_fac_sorted['Faculty'].map(lambda f: fac_short.get(f, f[:15]))
colors = [TUOS['spearmint'] if r > overall_rate else TUOS['coral'] for r in df_fac_sorted['Accept_Rate']]

fig = go.Figure(go.Bar(
    y=df_fac_sorted['Short'], x=[round(r, 1) for r in df_fac_sorted['Accept_Rate']],
    orientation='h', marker_color=colors,
    text=[f'{round(r, 1)}%' for r in df_fac_sorted['Accept_Rate']],
    textposition='outside',
    cliponaxis=False,
    error_x=dict(
        type='data',
        array=[round(v, 1) for v in (df_fac_sorted['CI_Upper'] - df_fac_sorted['Accept_Rate']).tolist()] if 'CI_Upper' in df_fac_sorted.columns else None,
        arrayminus=[round(v, 1) for v in (df_fac_sorted['Accept_Rate'] - df_fac_sorted['CI_Lower']).tolist()] if 'CI_Lower' in df_fac_sorted.columns else None,
        color='black', thickness=1.5
    ) if 'CI_Upper' in df_fac_sorted.columns else None,
    hovertemplate='%{y}<br>Acceptance rate: %{x}%<extra></extra>'
))
fig.add_vline(x=overall_rate, line_dash='dash', line_color='black',
              annotation_text=f'Overall: {overall_rate}%')
fig.update_layout(
    height=600, template='plotly_white',
    title_text='Faculty Acceptance Rates (with 95% CI)',
    xaxis_title='Acceptance Rate (%)',
    margin=dict(l=140, r=80),
    xaxis=dict(range=[0, 100])
)
fig.show()

# Faculty Sample Sizes
df_fac_n = df_faculty.sort_values('N', ascending=True).copy()
df_fac_n['Short'] = df_fac_n['Faculty'].map(lambda f: fac_short.get(f, f[:15]))
colors_n = [TUOS['teal'] if n >= 50 else TUOS['powder_blue'] for n in df_fac_n['N']]

fig = go.Figure(go.Bar(
    y=df_fac_n['Short'], x=df_fac_n['N'],
    orientation='h', marker_color=colors_n,
    text=[f'n={n}' for n in df_fac_n['N']],
    textposition='outside',
    cliponaxis=False,
    hovertemplate='%{y}: n=%{x}<extra></extra>'
))
fig.add_vline(x=50, line_dash='dash', line_color=TUOS['coral'],
              annotation_text='Min (n=50)')
fig.update_layout(
    height=600, template='plotly_white',
    title_text='Faculty Sample Sizes',
    xaxis_title='Sample Size (n)',
    margin=dict(l=140, r=80)
)
fig.show()

### Observations: Faculty Acceptance Rates

**Reading the charts:**
- Error bars represent 95% confidence intervals — wider intervals indicate greater uncertainty due to smaller samples
- In the sample-size chart, teal bars indicate faculties that meet the minimum sample size (n ≥ 50) for more detailed analysis
- Paler (powder-blue) bars indicate smaller samples that should be interpreted with additional caution

#### Summary

| Grouping | Faculties | Interpretation |
|----------|-----------|----------------|
| **Above overall average** | (Green bars) | These faculties show acceptance rates above the university-wide average |
| **Below overall average** | (Red bars) | These faculties show lower acceptance rates, though confidence intervals may overlap with the overall rate |

The variation across faculties is consistent with the possibility that discipline-specific factors (such as programme content, competitor landscape, and applicant demographics) influence acceptance rates alongside university-wide attributes.

---
## 2.2 Faculty-Specific Drivers (Exploratory)

**Important:** Faculty-level findings use raw (uncorrected) p-values due to the small sample sizes at this level. These should be treated as exploratory and hypothesis-generating rather than confirmatory.

### Effect Size Reference (Cramér's V)
| V Value | Interpretation |
|---------|----------------|
| < 0.10 | Negligible |
| 0.10–0.20 | Small |
| 0.20–0.30 | Medium |
| > 0.30 | Large |
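
The bands above can be applied mechanically, as in the sketch below. The function name is hypothetical, and the handling of the exact boundary values (0.20 counted as Medium, 0.30 as Medium) is an assumption, since the table does not say which band a boundary value belongs to.

```python
def classify_cramers_v(v):
    """Map a Cramér's V value to the bands in the reference table."""
    if v < 0.10:
        return "Negligible"
    if v < 0.20:
        return "Small"
    if v <= 0.30:  # assumption: 0.20 and 0.30 both fall in Medium
        return "Medium"
    return "Large"

# Values reported earlier in this report
examples = {
    "Scholarships And Bursaries (four-group)": 0.288,
    "Cost Of Living (acceptor vs decliner)": 0.18,
    "League Tables (acceptor vs decliner)": 0.091,
}
labels = {name: classify_cramers_v(v) for name, v in examples.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```

The league tables example shows why effect size is reported alongside significance: that association is statistically significant (p = 0.012) yet falls in the Negligible band.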

In [122]:
# Faculty Driver Heatmap (Exploratory)
exploratory = df_fac_drivers[df_fac_drivers['p_value'] < 0.10].copy()

if len(exploratory) > 0:
    top_vars = exploratory.nlargest(15, 'Cramers_V')['Variable'].unique()[:10]
    heatmap_data = pd.pivot_table(
        df_fac_drivers[df_fac_drivers['Variable'].isin(top_vars)],
        values='Rate_Difference', index='Variable', columns='Faculty', aggfunc='first'
    ) * 100
    heatmap_data.index = [v.replace('_', ' ')[:28] for v in heatmap_data.index]
    heatmap_data.columns = [c[:18] for c in heatmap_data.columns]
    
    fig = go.Figure(data=go.Heatmap(
        z=heatmap_data.values,
        x=heatmap_data.columns.tolist(),
        y=heatmap_data.index.tolist(),
        colorscale=TUOS_SCALE_DIV,
        zmid=0,
        text=[[f'{v:.0f}pp' if not np.isnan(v) else '' for v in row] for row in heatmap_data.values],
        texttemplate='%{text}',
        hovertemplate='Factor: %{y}<br>Faculty: %{x}<br>Rate diff: %{z:.1f}pp<extra></extra>',
        colorbar=dict(title='Rate Diff (pp)')
    ))
    
    fig.update_layout(
        title='Top Exploratory Drivers by Faculty (raw p < 0.10)',
        height=620,
        template='plotly_white',
        margin=dict(l=200, b=80),
        yaxis=dict(dtick=1),
        xaxis=dict(tickangle=-15)
    )
    fig.show()

### Observations: Faculty-Specific Drivers

**Reading the heatmap:**
- Green cells indicate a positive association (the factor is associated with higher acceptance in that faculty)
- Red cells indicate a negative association (the factor is associated with lower acceptance)
- Numbers represent rate differences in percentage points
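- Empty cells indicate that no rate difference was available for that faculty and factor; factors are included if they reach raw p < 0.10 in at least one faculty, so an individual cell may not itself meet that threshold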

#### Selected Faculty-Level Patterns (Exploratory)

| Faculty | Factor | Effect | Detail |
|---------|--------|--------|--------|
| **Engineering** | One-to-one online events | −31pp | Engineering respondents who attended 1:1 online events showed a 31pp lower acceptance rate |
| **Arts & Humanities** | Group in-person events | +27pp | Arts & Humanities respondents who attended in-person group events showed a 27pp higher acceptance rate |
| **Science** | University course websites | −24pp | Science respondents who used course websites showed a 24pp lower acceptance rate |
| **Health** | Group online events | −17pp | Health respondents who attended online group events showed a 17pp lower acceptance rate |
| **Social Sciences** | Group in-person events | +11pp | Social Sciences respondents who attended in-person group events showed an 11pp higher acceptance rate |

#### Cross-Faculty Pattern

In-person group event attendance shows positive associations in both Arts & Humanities (+27pp) and Social Sciences (+11pp), consistent in direction with the university-wide observation (+11pp).

**Caveat:** These are exploratory findings from small faculty-level samples using raw (uncorrected) p-values. They indicate possible patterns that would require further investigation and larger samples to confirm.
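
For a sense of how much a multiple-testing correction could change the picture, the sketch below applies the Benjamini-Hochberg procedure, one common FDR method. The report does not state which FDR method was used for the general-level results, and the p-values here are hypothetical, chosen only for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values for five tests within one faculty (illustrative only)
raw_p = [0.004, 0.030, 0.045, 0.080, 0.200]
adj_p = benjamini_hochberg(raw_p)

n_raw = sum(p < 0.05 for p in raw_p)
n_adj = sum(p < 0.05 for p in adj_p)
print(f"significant at 0.05: {n_raw} raw, {n_adj} after adjustment")
```

In this made-up example, three tests pass the raw 0.05 threshold but only one survives adjustment. This is the kind of attrition to expect if the faculty-level findings were corrected.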

---
## 2.3 Home vs Overseas Comparison

This section examines which factors differ between acceptors and decliners within each fee status segment (Home and Overseas).

**Reading the charts:**
- Green bars indicate factors more common among acceptors within that segment
- Red bars indicate factors more common among decliners within that segment
- Longer bars indicate larger differences between acceptors and decliners

In [123]:
# Home: Acceptor − Decliner Diffs
if len(df_home) > 0 and 'Difference' in df_home.columns:
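    # Rank by absolute difference so that decliner-leaning (negative) factors can appear
    home_top = df_home.loc[df_home['Difference'].abs().nlargest(10).index].sort_values('Difference', ascending=False)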
    colors_h = [TUOS['spearmint'] if x > 0 else TUOS['coral'] for x in home_top['Difference']]
    home_vals = [round(v, 1) for v in home_top['Difference'].values[::-1]]
    
    fig = go.Figure(go.Bar(
        y=[f[:25] for f in home_top['Feature']][::-1],
        x=home_vals,
        orientation='h', marker_color=colors_h[::-1],
        text=[f'{v:+}pp' for v in home_vals],
        textposition='outside',
        cliponaxis=False,
        hovertemplate='%{y}<br>Difference: %{x:+.1f}pp<extra></extra>'
    ))
    fig.add_vline(x=0, line_color='black', line_width=1)
    fig.update_layout(
        height=620, template='plotly_white',
        title_text='Home Students: Acceptor − Decliner Differences',
        xaxis_title='Acceptor % − Decliner %',
        margin=dict(l=200, r=80)
    )
    fig.show()

# Overseas: Acceptor − Decliner Diffs
if len(df_overseas) > 0 and 'Difference' in df_overseas.columns:
    overseas_top = df_overseas.nlargest(10, 'Difference', keep='first')
    colors_o = [TUOS['spearmint'] if x > 0 else TUOS['coral'] for x in overseas_top['Difference']]
    overseas_vals = [round(v, 1) for v in overseas_top['Difference'].values[::-1]]
    
    fig = go.Figure(go.Bar(
        y=[f[:25] for f in overseas_top['Feature']][::-1],
        x=overseas_vals,
        orientation='h', marker_color=colors_o[::-1],
        text=[f'{v:+}pp' for v in overseas_vals],
        textposition='outside',
        cliponaxis=False,
        hovertemplate='%{y}<br>Difference: %{x:+.1f}pp<extra></extra>'
    ))
    fig.add_vline(x=0, line_color='black', line_width=1)
    fig.update_layout(
        height=620, template='plotly_white',
        title_text='Overseas Students: Acceptor − Decliner Differences',
        xaxis_title='Acceptor % − Decliner %',
        margin=dict(l=200, r=80)
    )
    fig.show()

### Observations: Home vs Overseas Segments

Home (UK) and overseas (international) students face different fee structures, visa requirements, and geographic considerations. The observations below summarise what distinguishes acceptors from decliners within each group.

#### Home Students (UK)

**Acceptance Rate:** ~78.9%

| Observation | Detail |
|-------------|--------|
| **Event attendance** | Home respondents who attended events showed higher acceptance rates than home respondents who did not |
| **League tables** | Home decliners cited rankings more frequently than home acceptors |
| **Location factors** | Geographic proximity and regional factors appeared more frequently in home respondents' answers |

#### Overseas Students (International)

**Acceptance Rate:** ~73.1%

| Observation | Detail |
|-------------|--------|
| **Cost factors** | Overseas respondents showed a higher frequency of cost-related responses |
| **Comparison sites** | Overseas applicants reported higher use of comparison websites |
| **Timeline considerations** | Overseas applicants may face different decision timelines due to visa and logistical requirements |

**Summary:** Home respondents show a 5.8pp higher acceptance rate than overseas respondents (78.9% vs 73.1%). Several factors may contribute, including fee levels, geographic proximity, and competition from institutions in other countries. This is an observed difference rather than a causal estimate, and several confounding factors may be at play.
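
As a rough illustration of how much uncertainty can surround a gap of this kind, the sketch below computes a difference in acceptance rates with a Wald 95% confidence interval. The helper name `rate_difference_ci` and the counts are hypothetical. Segment sample sizes are not given in this section, so the output does not reproduce the 5.8pp figure, and the analysis notebook may use a different interval method.

```python
import math


def rate_difference_ci(accepted_a, n_a, accepted_b, n_b, z=1.96):
    """Difference in acceptance rates (a minus b) in percentage points,
    with a Wald confidence interval (z=1.96 gives roughly 95%)."""
    p_a = accepted_a / n_a
    p_b = accepted_b / n_b
    diff = p_a - p_b
    # Standard error of a difference between two independent proportions.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return 100 * diff, 100 * (diff - z * se), 100 * (diff + z * se)


# Hypothetical counts: 80 of 100 acceptors in one segment, 70 of 100 in the other.
diff_pp, lower_pp, upper_pp = rate_difference_ci(80, 100, 70, 100)
print(f"difference = {diff_pp:+.1f}pp, 95% CI [{lower_pp:+.1f}, {upper_pp:+.1f}]")
```

With these hypothetical counts the interval spans zero, which shows how a gap of several percentage points can still be compatible with no underlying difference when segments are small.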

---
# Summary and Limitations

## Summary of Findings

| Finding | Confidence Level | Basis |
|---------|-----------------|-------|
| In-person event attendance associated with higher acceptance | **High** | FDR-significant with a meaningful effect size |
| Course content cited more frequently by acceptors | **High** | Consistent pattern across multiple analyses |
| Respondents classified as "Rankings Researchers" show lower acceptance | **Moderate** | −8pp in persona analysis; based on a provisional classification |
| League tables cited more frequently by decliners | **Moderate** | Consistent across relative importance and persona analyses |
| Faculty-specific patterns exist | **Exploratory** | Small samples, raw (uncorrected) p-values |
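
For readers who want to see how an effect size such as Cramér's V is obtained, the sketch below computes it from a contingency table in plain Python. The function name `cramers_v` and the counts are hypothetical, and no continuity correction is applied, which may differ from the analysis notebook.

```python
import math


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Pearson chi-square statistic (no continuity correction).
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))


# Hypothetical counts: rows = attended / did not attend, columns = accepted / declined.
v = cramers_v([[90, 10], [60, 40]])
print(f"Cramér's V = {v:.3f}")
```

The result can be read against the thresholds in "How to Read This Report" (0.10–0.20 small, 0.20–0.30 medium, above 0.30 large).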

## Limitations

| Limitation | Implication |
|------------|-------------|
| **Observational data** | All findings are associations, not established causal effects. Confounding variables may explain observed patterns. |
| **Self-selected sample** | Survey respondents may differ systematically from non-respondents, introducing potential response bias. |
| **Small faculty-level samples** | Faculty-level findings are exploratory only and should be used for hypothesis generation. |
| **Point-in-time snapshot** | Findings reflect this particular cohort and time period. Results may not generalise to future cohorts. |
| **Correlation ≠ Causation** | For example, high-intent applicants may both attend events and accept offers, so event attendance may not itself cause acceptance. |
| **Survey routing** | Some questions (barriers, accept/decline factors) were presented only to specific subgroups. Comparisons across groups must account for this design. |

---

*Report generated from PGT Decision Survey Analysis.*
*Statistical methods: Chi-square tests, Mann-Whitney U tests, and logistic regression, with FDR correction for multiple comparisons.*
*For detailed methodology, see the full analysis notebook.*