## **IRF5-TNPO3 Regional Association Analysis**
Stepwise conditional analysis and LD estimation for the IRF5-TNPO3 locus (chr7:128,930,429-129,058,173, GRCh38) using PLINK2 Firth logistic regression on UK Biobank WGS data.

In [None]:
# Install PLINK2
wget https://s3.amazonaws.com/plink2-assets/alpha6/plink2_linux_avx2_20251019.zip
unzip -o plink2_linux_avx2_20251019.zip
chmod a+x plink2 # Make PLINK2 executable
./plink2 --version

### **Step 1: Set Variables**
- Window: chr7:128,930,429-129,058,173 (GRCh38)
- Defined by IRF5-TNPO3 gene boundaries ± 3 kb (H3K27Ac-informed)
- Phenotype: 565 SLE cases, 404,883 White British controls
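
As a quick sanity check on the window above, its span can be computed with shell arithmetic. This is a minimal sketch that treats both endpoints as included; `WINDOW_BP` is an illustrative name and not part of the original notebook.

```shell
# Window coordinates from Step 1 (GRCh38)
WINDOW_START=128930429
WINDOW_END=129058173

# Span in base pairs, counting both endpoints (WINDOW_BP is an illustrative name)
WINDOW_BP=$(( WINDOW_END - WINDOW_START + 1 ))
echo "Window span: ${WINDOW_BP} bp"
```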

In [None]:
# Set variables
CHR=7
WINDOW_START=128930429
WINDOW_END=129058173
BGEN_DIR="/mnt/project/Bulk/DRAGEN WGS/DRAGEN population level WGS variants, BGEN format [500k release]"
PHENO_FILE="/mnt/project/02.Phenotype_SampleQC/sle_pqc.txt"
EXTRACT_FILE="/mnt/project/03.Variant_QC/WGS_QC/ukb24309_c7_b0_v1_qc_pass.snplist"

### **Step 2: Extract WGS Variants**

In [None]:
./plink2 \
  --bgen "${BGEN_DIR}/ukb24309_c${CHR}_b0_v1.bgen" ref-last \
  --sample "${BGEN_DIR}/ukb24309_c${CHR}_b0_v1.sample" \
  --chr ${CHR} \
  --from-bp ${WINDOW_START} \
  --to-bp ${WINDOW_END} \
  --extract "${EXTRACT_FILE}" \
  --keep "${PHENO_FILE}" \
  --force-intersect \
  --make-pgen \
  --out irf5_tnpo3_window

In [None]:
# Verify variant count (skip every header line that starts with '#')
echo "Variants in window:"
grep -vc '^#' irf5_tnpo3_window.pvar

### **Step 3: Round 0 - Unconditional Association Analysis**

In [None]:
./plink2 \
  --pfile irf5_tnpo3_window \
  --glm firth firth-residualize hide-covar \
  --pheno ${PHENO_FILE} --pheno-name has_sle_icd10 \
  --1 \
  --covar ${PHENO_FILE} --covar-name sex age ethnic_group pc1-pc10 \
  --covar-variance-standardize \
  --mac 20 \
  --out irf5_tnpo3_round0

In [None]:
# Sort by p-value
(head -n 1 irf5_tnpo3_round0.has_sle_icd10.glm.firth && \
 tail -n +2 irf5_tnpo3_round0.has_sle_icd10.glm.firth | \
 awk '$15 != "NA"' | \
 sort -g -k15,15) > irf5_tnpo3_round0_sorted.txt

# View top 10
echo "=== Round 0 top 10 ==="
head -11 irf5_tnpo3_round0_sorted.txt
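
The sorting cells in this notebook assume the p-value sits in column 15. Because the column set of a `.glm.firth` file depends on the `--glm` column options in use, one hedged alternative is to locate the `P` column by its header name. The helper below is a sketch under that assumption; `sort_by_p` is an illustrative name, not part of the original notebook.

```shell
# sort_by_p is an illustrative helper, not part of the original notebook.
# Usage: sort_by_p <glm_results_file>
sort_by_p() {
  infile="$1"
  # 1-based index of the header field that is exactly "P"
  pcol=$(awk 'NR==1 { for (i = 1; i <= NF; i++) if ($i == "P") print i; exit }' "${infile}")
  if [ -z "${pcol}" ]; then
    echo "No P column found in ${infile}" >&2
    return 1
  fi
  # Keep the header, drop rows with missing p-values, sort ascending by P
  head -n 1 "${infile}"
  tail -n +2 "${infile}" | awk -v c="${pcol}" '$c != "NA"' | sort -g -k "${pcol},${pcol}"
}
```

It could be called as `sort_by_p irf5_tnpo3_round0.has_sle_icd10.glm.firth > irf5_tnpo3_round0_sorted.txt`, and likewise for the later rounds.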

In [None]:
# Count significant variants
echo "P < 0.001: $(awk 'NR>1 && $15 < 0.001' irf5_tnpo3_round0_sorted.txt | wc -l)"
echo "P < 0.01:  $(awk 'NR>1 && $15 < 0.01'  irf5_tnpo3_round0_sorted.txt | wc -l)"

In [None]:
# Bonferroni-significant variants (0.05 / 444 variants)
BONF=$(echo "scale=10; 0.05/444" | bc)
awk -v threshold="${BONF}" 'NR==1 || (NR>1 && $15 != "NA" && $15 < threshold)' \
  irf5_tnpo3_round0_sorted.txt > irf5_tnpo3_round0_bonferroni.txt
echo "Bonferroni-significant variants: $(tail -n +2 irf5_tnpo3_round0_bonferroni.txt | wc -l)"
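
The divisor of 444 above is hard-coded. As a sketch, the threshold could instead be derived from the number of data rows in the sorted Round 0 table, assuming each row is one tested variant (rows with `NA` p-values were already dropped when sorting). `bonferroni_threshold` is an illustrative name, not part of the original notebook.

```shell
# bonferroni_threshold is an illustrative helper, not part of the original notebook.
# Usage: bonferroni_threshold <sorted_results_file> [alpha]
bonferroni_threshold() {
  results="$1"
  alpha="${2:-0.05}"
  # Number of tests = data rows (everything after the header line)
  n_tests=$(tail -n +2 "${results}" | wc -l | tr -d ' ')
  if [ "${n_tests}" -eq 0 ]; then
    echo "No variants found in ${results}" >&2
    return 1
  fi
  # Scientific notation keeps precision for very small thresholds
  awk -v a="${alpha}" -v n="${n_tests}" 'BEGIN { printf "%.6e\n", a / n }'
}
```

For example, `BONF=$(bonferroni_threshold irf5_tnpo3_round0_sorted.txt)` would replace the `bc` call, and the result can be passed to `awk -v threshold="${BONF}"` unchanged.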

In [None]:
# Extract top variant and initialise condition list
TOP=$(awk 'NR==2 {print $3}' irf5_tnpo3_round0_sorted.txt)
echo "Round 0 top variant: ${TOP}"
echo "${TOP}" > top_variants.txt

### **Step 4: Round 1 - Conditional Analysis**

In [None]:
./plink2 \
  --pfile irf5_tnpo3_window \
  --glm firth firth-residualize hide-covar \
  --condition-list top_variants.txt \
  --pheno ${PHENO_FILE} --pheno-name has_sle_icd10 \
  --1 \
  --covar ${PHENO_FILE} --covar-name sex age ethnic_group pc1-pc10 \
  --covar-variance-standardize \
  --mac 20 \
  --out irf5_tnpo3_round1

In [None]:
# Sort by p-value
(head -n 1 irf5_tnpo3_round1.has_sle_icd10.glm.firth && \
 tail -n +2 irf5_tnpo3_round1.has_sle_icd10.glm.firth | \
 awk '$15 != "NA"' | \
 sort -g -k15,15) > irf5_tnpo3_round1_sorted.txt

# View top 10
echo "=== Round 1 top 10 ==="
head -11 irf5_tnpo3_round1_sorted.txt

In [None]:
# Count significant variants
echo "P < 0.001: $(awk 'NR>1 && $15 < 0.001' irf5_tnpo3_round1_sorted.txt | wc -l)"
echo "P < 0.01:  $(awk 'NR>1 && $15 < 0.01'  irf5_tnpo3_round1_sorted.txt | wc -l)"

In [None]:
# Extract top variant and update condition list
TOP_1=$(awk 'NR==2 {print $3}' irf5_tnpo3_round1_sorted.txt)
echo "Round 1 top variant: ${TOP_1}"
echo "${TOP_1}" >> top_variants.txt

echo "Top variants list:"
cat top_variants.txt

### **Step 5: Round 2 - Conditional Analysis**

In [None]:
./plink2 \
  --pfile irf5_tnpo3_window \
  --glm firth firth-residualize hide-covar \
  --condition-list top_variants.txt \
  --pheno ${PHENO_FILE} --pheno-name has_sle_icd10 \
  --1 \
  --covar ${PHENO_FILE} --covar-name sex age ethnic_group pc1-pc10 \
  --covar-variance-standardize \
  --mac 20 \
  --out irf5_tnpo3_round2

In [None]:
# Sort by p-value
(head -n 1 irf5_tnpo3_round2.has_sle_icd10.glm.firth && \
 tail -n +2 irf5_tnpo3_round2.has_sle_icd10.glm.firth | \
 awk '$15 != "NA"' | \
 sort -g -k15,15) > irf5_tnpo3_round2_sorted.txt

# View top 10
echo "=== Round 2 top 10 ==="
head -11 irf5_tnpo3_round2_sorted.txt

In [None]:
# Count significant variants
echo "P < 0.001: $(awk 'NR>1 && $15 < 0.001' irf5_tnpo3_round2_sorted.txt | wc -l)"
echo "P < 0.01:  $(awk 'NR>1 && $15 < 0.01'  irf5_tnpo3_round2_sorted.txt | wc -l)"

In [None]:
# Extract top variant
TOP_2=$(awk 'NR==2 {print $3}' irf5_tnpo3_round2_sorted.txt)
echo "Round 2 top variant: ${TOP_2}"

In [None]:
# Check LD with previous top signals
./plink2 \
  --pfile irf5_tnpo3_window \
  --ld ${TOP_2} ${TOP} \
  --out edge_ld_signal1

./plink2 \
  --pfile irf5_tnpo3_window \
  --ld ${TOP_2} ${TOP_1} \
  --out edge_ld_signal2

In [None]:
# Update condition list
echo "${TOP_2}" >> top_variants.txt

echo "Top variants list:"
cat top_variants.txt

### **Step 6: Round 3 - Conditional Analysis**

In [None]:
./plink2 \
  --pfile irf5_tnpo3_window \
  --glm firth firth-residualize hide-covar \
  --condition-list top_variants.txt \
  --pheno ${PHENO_FILE} --pheno-name has_sle_icd10 \
  --1 \
  --covar ${PHENO_FILE} --covar-name sex age ethnic_group pc1-pc10 \
  --covar-variance-standardize \
  --mac 20 \
  --out irf5_tnpo3_round3

In [None]:
# Sort by p-value
(head -n 1 irf5_tnpo3_round3.has_sle_icd10.glm.firth && \
 tail -n +2 irf5_tnpo3_round3.has_sle_icd10.glm.firth | \
 awk '$15 != "NA"' | \
 sort -g -k15,15) > irf5_tnpo3_round3_sorted.txt

# View top 10
echo "=== Round 3 top 10 ==="
head -11 irf5_tnpo3_round3_sorted.txt

In [None]:
# Count significant variants
echo "P < 0.001: $(awk 'NR>1 && $15 < 0.001' irf5_tnpo3_round3_sorted.txt | wc -l)"
echo "P < 0.01:  $(awk 'NR>1 && $15 < 0.01'  irf5_tnpo3_round3_sorted.txt | wc -l)"

### **Step 7: Calculate LD Matrices**
- Signed r matrix for SuSiE
- r² matrix for LD analysis

In [None]:
./plink2 \
  --pfile irf5_tnpo3_window \
  --r-unphased square \
  --out irf5_tnpo3_ld_r

./plink2 \
  --pfile irf5_tnpo3_window \
  --r2-unphased square \
  --out irf5_tnpo3_ld_r2

In [None]:
# Check variant counts
wc -l < irf5_tnpo3_ld_r.unphased.vcor1.vars
wc -l < irf5_tnpo3_ld_r2.unphased.vcor2.vars

# Check matrix dimensions (number of columns in first data row)
awk 'NR==1 {print NF}' irf5_tnpo3_ld_r.unphased.vcor1
awk 'NR==1 {print NF}' irf5_tnpo3_ld_r2.unphased.vcor2
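
The checks above inspect only the first row of each matrix. A slightly stricter sketch, assuming the square matrix file has no header and one whitespace-separated row per variant, confirms that every row has as many fields as the `.vars` file has lines. `check_square_matrix` is an illustrative name, not part of the original notebook.

```shell
# check_square_matrix is an illustrative helper, not part of the original notebook.
# Usage: check_square_matrix <matrix_file> <vars_file>
check_square_matrix() {
  matrix="$1"
  vars="$2"
  n_vars=$(wc -l < "${vars}" | tr -d ' ')
  # Count rows whose field count differs from the variant count,
  # then compare the total row count against the variant count
  awk -v n="${n_vars}" '
    NF != n { bad++ }
    END {
      if (NR == n && bad == 0) { print "OK: " n " x " n; exit 0 }
      print "MISMATCH: rows=" NR ", vars=" n ", bad_rows=" (bad + 0)
      exit 1
    }' "${matrix}"
}
```

It could be run as `check_square_matrix irf5_tnpo3_ld_r.unphased.vcor1 irf5_tnpo3_ld_r.unphased.vcor1.vars`, and the non-zero exit status on a mismatch makes it usable as a guard before passing the matrix to downstream fine-mapping.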