# AS3MT IP-MS Pipeline Test Notebook

Testing the `ipms` package from [github.com/rncassidy10/IP-MS_Pipeline](https://github.com/rncassidy10/IP-MS_Pipeline) with local data.

**Experiment:** HA-tagged AS3MT (WT) vs AS3MT-d2d3 IP-MS  
**Pipeline:** `prep_ip → qc_ip → drop_samples → qc_ip → norm_ip → stat_ip → viz_ip → venn_ip → boxplot_ip → summary_ip`

---
## 1. Workflow Overview

| Step | Function | What it does |
|------|----------|-------------|
| 1 | `prep_ip()` | Load Excel, parse conditions, filter peptides/missingness, map gene symbols (mygene), remove contaminants (manual + CRAPome) |
| 2 | `qc_ip()` | Missing value heatmap, correlation heatmap, PCA (all + treatments only) |
| 3 | `drop_samples()` | Remove outlier samples identified during QC |
| 4 | `qc_ip()` | Re-run QC after dropping |
| 5 | `norm_ip()` | Log2 transform + imputation (mindet) |
| 6 | `stat_ip()` | Welch t-test per comparison, BH correction, hit calling |
| 7 | `viz_ip()` | Volcano plots (labeled + clean), heatmaps |
| 8 | `venn_ip()` | Venn diagram of enriched protein overlap |
| 9 | `boxplot_ip()` | Boxplots of enriched protein intensities |
| 10 | `summary_ip()` | Analysis report |


---
## 2. Data Paths and Libraries

In [1]:
# Make the parent directory importable, then load the pipeline functions
import sys
sys.path.append('..')

from ipms import *


In [None]:
data = prep_ip('/Users/richard.cassidy/ipms_pipeline/config/example_config.yaml')

In [None]:
qc_ip(data)

**Samples to drop after QC:**

- EV- 4 (#5)
- d2d3- 2 (#12)
- d2d3- 3 (#13)
- WT- 5 (#10)

In [None]:
data2 = drop_samples(data)

In [None]:
save_data(data2, '/Users/richard.cassidy/ipms_pipeline/results/data_after_qc.pkl')

In [None]:
data = qc_ip(data2)

In [None]:
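data = load_data('/Users/richard.cassidy/ipms_pipeline/results/data_after_qc.pkl')

data = norm_ip(data, method='log2', imputation='mindet')

# Persist the normalized data so the next cell can reload it
save_data(data, '/Users/richard.cassidy/ipms_pipeline/results/data_after_norm.pkl')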
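
The `norm_ip` call above requests a log2 transform with `mindet` imputation. As a rough illustration of what that combination commonly means (this is not the `ipms` implementation, whose internals are not shown in this notebook), the sketch below assumes a proteins × samples intensity matrix, treats zeros and NaN as missing, and fills each sample's gaps with a low quantile of that sample's observed log2 values. The function name `log2_mindet` and the default `q=0.01` are assumptions made for this example.

```python
import numpy as np

def log2_mindet(intensities, q=0.01):
    """Log2-transform a proteins x samples matrix, then fill missing values.

    Hypothetical helper: each missing entry is replaced by the q-th quantile
    of the observed log2 values in the same sample (column).
    """
    x = np.asarray(intensities, dtype=float)
    x = np.where(x > 0, x, np.nan)             # zeros and NaN both count as missing
    logged = np.log2(x)
    floor = np.nanquantile(logged, q, axis=0)  # one deterministic fill value per sample
    return np.where(np.isnan(logged), floor, logged)

# Toy matrix: 3 proteins x 2 samples, with one zero and one NaN
toy = [[1024.0, 0.0],
       [4.0, 16.0],
       [np.nan, 64.0]]
print(log2_mindet(toy, q=0.0))  # q=0 fills with each sample's observed minimum
```

Because the fill value is deterministic and sits at the low end of each sample, this style of imputation treats missingness as "below detection", which is a common assumption for IP-MS bait-versus-control comparisons.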


In [None]:
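data = load_data('/Users/richard.cassidy/ipms_pipeline/results/data_after_norm.pkl')

data = stat_ip(data)

# Persist the statistics so the downstream cells can reload them
save_data(data, '/Users/richard.cassidy/ipms_pipeline/results/data_after_stat.pkl')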
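
The workflow table describes `stat_ip` as a Welch t-test per comparison followed by BH correction and hit calling. The sketch below shows that general recipe on two already log2-transformed, imputed matrices (proteins × replicates). It is an illustration under stated assumptions, not the package's code: the names `bh_adjust` and `welch_bh`, and the hit thresholds used in the example (`padj < 0.05`, log2 fold change > 1), are hypothetical.

```python
import numpy as np
from scipy import stats

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (assumes no NaN in pvals)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted

def welch_bh(treatment, control):
    """Row-wise Welch t-test on log2 intensities, then BH correction."""
    a = np.asarray(treatment, dtype=float)
    b = np.asarray(control, dtype=float)
    _, pvals = stats.ttest_ind(a, b, axis=1, equal_var=False)  # Welch: unequal variances
    log2fc = a.mean(axis=1) - b.mean(axis=1)  # difference of log2 means
    return log2fc, pvals, bh_adjust(pvals)

# Toy data: protein 0 is enriched in the treatment, protein 1 is not
wt = [[10.0, 11.0, 12.0], [5.0, 5.1, 4.9]]
ev = [[1.0, 2.0, 3.0], [5.0, 5.2, 4.8]]
log2fc, pvals, padj = welch_bh(wt, ev)
hits = (padj < 0.05) & (log2fc > 1.0)  # hypothetical hit-calling thresholds
print(hits)
```

Welch's test is the usual choice here because replicate variance often differs between a bait pulldown and an empty-vector control, and the BH step controls the false discovery rate across all proteins tested in one comparison.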


In [None]:
data = load_data('/Users/richard.cassidy/ipms_pipeline/results/data_after_stat.pkl')


viz_ip(data)

In [None]:
data = load_data('/Users/richard.cassidy/ipms_pipeline/results/data_after_stat.pkl')


boxplot_ip(data, top_n=40, group_by='protein')

In [None]:
data = load_data('/Users/richard.cassidy/ipms_pipeline/results/data_after_stat.pkl')


summary_ip(data)

In [None]:
data = load_data('/Users/richard.cassidy/ipms_pipeline/results/data_after_stat.pkl')

venn_ip(data, show_names=True, top_n=100, labels_in_diagram=True, max_labels_per_region=20)

In [None]:
# data = load_data('/Users/richard.cassidy/ipms_pipeline/results/data_after_stat.pkl')

# string_ip(data)

In [None]:
data = load_data('/Users/richard.cassidy/ipms_pipeline/results/data_after_stat.pkl')

# Exclude likely epithelial contaminants (e.g., in BEAS-2B cells)
possible_epithelial_contaminants = ['DSG1', 'DSC1', 'PKP1', 'SPRR1B', 'SPRR2D', 'FLG2', 'ECM1', 'S100A8', 'LYZ']
string_results = string_ip(data, exclude_genes=possible_epithelial_contaminants)

# High confidence, custom exclusions
# string_results = string_ip(data, score_threshold=700, exclude_genes=['DSG1', 'DSC1'])
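
A note on score scales: the STRING REST API takes its confidence cutoff (`required_score`) on a 0 to 1000 scale, while the interaction scores it returns are on a 0 to 1 scale, so a cutoff of 400 corresponds to edges with score ≥ 0.400 and 700 to ≥ 0.700. The sketch below only builds a request URL for STRING's `network` method so the parameters are visible. It assumes that `score_threshold` and `add_nodes` in this notebook map onto the API's `required_score` and `add_nodes`, which is not confirmed by the `ipms` source here. The helper name `string_network_url` is hypothetical.

```python
from urllib.parse import urlencode

STRING_API = "https://string-db.org/api"

def string_network_url(genes, species=9606, required_score=400, add_nodes=0, fmt="tsv"):
    """Build a STRING 'network' request URL (no request is sent).

    species=9606 is the NCBI taxon ID for human; required_score is on the
    0-1000 scale; add_nodes asks STRING to add that many extra interactors.
    """
    params = {
        "identifiers": "\r".join(genes),  # STRING expects carriage-return separated IDs
        "species": species,
        "required_score": required_score,
        "add_nodes": add_nodes,
    }
    return f"{STRING_API}/{fmt}/network?{urlencode(params)}"

print(string_network_url(["PCM1", "USP7"], add_nodes=5))
```

Keeping URL construction separate from the HTTP call makes it easy to check exactly which cutoff and how many added nodes a given query uses before comparing networks across runs.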




In [None]:
# from ipms.string_ip import string_query


# output_dir = '/Users/richard.cassidy/ipms_pipeline/results/figures/viz'

# # ── Query 1: p53/DNA damage hypothesis ──
# # Your hits that touch the p53/checkpoint axis + known pathway anchors
# # p53_result = string_query(
# #     genes=['USP7', 'SMC5'],
# #     add_genes=['TP53', 'MDM2', 'CHEK1', 'ATR'],
# #     save_to=output_dir,
# #     label='p53_hypothesis'
# # )


# # ── Query 2: Let STRING auto-discover bridge proteins ──
# # Same hits, but let STRING add its best connectors (add_nodes sets how many; 5 here)
# # If p53 is a real hub, it should get auto-added
# bridge_result = string_query(
#     genes=['PCM1', 'USP7'],
#     add_nodes=5,
#     save_to=output_dir,
#     label='auto_bridge_5_d2d3'
# )




# # ── Query 3: RNA metabolism hypothesis ──
# rna_result = string_query(
#     genes=['ZFP36L1','RBPMS'],
#     add_genes=['DIS3', 'XRN1', 'CNOT1'],
#     save_to=output_dir,
#     label='rna_metabolism'
# )


STRING TARGETED QUERY: auto_bridge_10_d2d3

  Your IP-MS hits (6): USP7, ZFP36L1, TOPBP1, CDC45, SMC5, PCM1
  STRING auto-add: 10 best connectors
  Total query: 6 genes

[1/4] Mapping to STRING IDs...
  ✓ Mapped 6/6 genes to STRING IDs

[2/4] Testing PPI enrichment...
    (tested on YOUR 6 genes only, not added genes)
    Observed interactions: 4
    Expected (random):     0
    PPI enrichment p-value: 0.00077
    → ✓ Significant!

[3/4] Retrieving network...
    Interactions found: 49

    --- Edges connecting YOUR hits to hypothesis genes ---
      (none found at score ≥ 400)

    --- Edges among YOUR hits ---
      TOPBP1 ↔ SMC5 (score: 0.481)
      TOPBP1 ↔ CDC45 (score: 0.999)
      PCM1 ↔ SMC5 (score: 0.440)
      SMC5 ↔ CDC45 (score: 0.476)

    --- Edges among hypothesis genes ---
      (none)

    --- STRING auto-added bridge proteins ---
    ★ CLSPN connects to: TOPBP1(0.99), RAD18(0.59), RAD9A(0.98), MDC1(0.62), USP7(0.65), RAD17(0.95), GINS3(0.97), CDC45(0.99)
    ★ GINS3 


STRING TARGETED QUERY: auto_bridge_5_d2d3_shared

  Your IP-MS hits (15): AS3MT, FAM91A1, USP7, ZFP36L1, APPL2, NDRG2, DDX49, ARHGAP32, BTN1A1, RBPMS, SMC5, DBN1, PCM1, NBAS, TOPBP1
  STRING auto-add: 5 best connectors
  Total query: 15 genes

[1/4] Mapping to STRING IDs...
  ✓ Mapped 15/15 genes to STRING IDs

[2/4] Testing PPI enrichment...
    (tested on YOUR 15 genes only, not added genes)
    Observed interactions: 3
    Expected (random):     1
    PPI enrichment p-value: 0.054

[3/4] Retrieving network...
    Interactions found: 13

    --- Edges connecting YOUR hits to hypothesis genes ---
      (none found at score ≥ 400)

    --- Edges among YOUR hits ---
      TOPBP1 ↔ SMC5 (score: 0.481)
      PCM1 ↔ SMC5 (score: 0.440)
      FAM91A1 ↔ DBN1 (score: 0.400)

    --- Edges among hypothesis genes ---
      (none)

    --- STRING auto-added bridge proteins ---
    ★ EID3 connects to: SMC5(1.00), MAGEL2(0.44)
    ★ MAGEL2 connects to: USP7(0.96), SMC5(0.58), EID3(0.44)
    ★ MDC


STRING TARGETED QUERY: auto_bridge_0added_d2d3_shared

  Your IP-MS hits (15): AS3MT, FAM91A1, USP7, ZFP36L1, APPL2, NDRG2, DDX49, ARHGAP32, BTN1A1, RBPMS, SMC5, DBN1, PCM1, NBAS, TOPBP1
  Total query: 15 genes

[1/4] Mapping to STRING IDs...
  ✓ Mapped 15/15 genes to STRING IDs

[2/4] Testing PPI enrichment...
    (tested on YOUR 15 genes only, not added genes)
    Observed interactions: 3
    Expected (random):     1
    PPI enrichment p-value: 0.054

[3/4] Retrieving network...
    Interactions found: 3

    Top interactions:
      TOPBP1 ↔ SMC5 (score: 0.481)
      PCM1 ↔ SMC5 (score: 0.440)
      FAM91A1 ↔ DBN1 (score: 0.400)

[4/4] Running functional enrichment...
    No enrichment results returned

  Downloading network image...
  ✓ Saved: string_auto_bridge_0added_d2d3_shared.pdf
  ✓ Saved: string_auto_bridge_0added_d2d3_shared_network.csv

QUERY COMPLETE: auto_bridge_0added_d2d3_shared
  Genes queried:  15 (15 yours + 0 hypothesis + 0 auto)
  Interactions:   3
  PPI enrichmen


STRING TARGETED QUERY: auto_bridge_10added_d2d3_shared

  Your IP-MS hits (15): AS3MT, FAM91A1, USP7, ZFP36L1, APPL2, NDRG2, DDX49, ARHGAP32, BTN1A1, RBPMS, SMC5, DBN1, PCM1, NBAS, TOPBP1
  STRING auto-add: 10 best connectors
  Total query: 15 genes

[1/4] Mapping to STRING IDs...
  ✓ Mapped 15/15 genes to STRING IDs

[2/4] Testing PPI enrichment...
    (tested on YOUR 15 genes only, not added genes)
    Observed interactions: 3
    Expected (random):     1
    PPI enrichment p-value: 0.054

[3/4] Retrieving network...
    Interactions found: 32

    --- Edges connecting YOUR hits to hypothesis genes ---
      (none found at score ≥ 400)

    --- Edges among YOUR hits ---
      TOPBP1 ↔ SMC5 (score: 0.481)
      PCM1 ↔ SMC5 (score: 0.440)
      FAM91A1 ↔ DBN1 (score: 0.400)

    --- Edges among hypothesis genes ---
      (none)

    --- STRING auto-added bridge proteins ---
    ★ BORCS7 connects to: AS3MT(0.85)
    ★ EID3 connects to: SMC5(1.00), MAGEL2(0.44)
    ★ MAGEL2 connects to:


STRING TARGETED QUERY: auto_bridge_20added_d2d3_shared

  Your IP-MS hits (15): AS3MT, FAM91A1, USP7, ZFP36L1, APPL2, NDRG2, DDX49, ARHGAP32, BTN1A1, RBPMS, SMC5, DBN1, PCM1, NBAS, TOPBP1
  STRING auto-add: 20 best connectors
  Total query: 15 genes

[1/4] Mapping to STRING IDs...
  ✓ Mapped 15/15 genes to STRING IDs

[2/4] Testing PPI enrichment...
    (tested on YOUR 15 genes only, not added genes)
    Observed interactions: 3
    Expected (random):     1
    PPI enrichment p-value: 0.054

[3/4] Retrieving network...
    Interactions found: 65

    --- Edges connecting YOUR hits to hypothesis genes ---
      (none found at score ≥ 400)

    --- Edges among YOUR hits ---
      TOPBP1 ↔ SMC5 (score: 0.481)
      PCM1 ↔ SMC5 (score: 0.440)
      FAM91A1 ↔ DBN1 (score: 0.400)

    --- Edges among hypothesis genes ---
      (none)

    --- STRING auto-added bridge proteins ---
    ★ BBIP1 connects to: PCM1(0.81)
    ★ BORCS7 connects to: AS3MT(0.85)
    ★ EID2 connects to: SMC5(0.71), NS


STRING TARGETED QUERY: auto_bridge_1added_d2d3_shared

  Your IP-MS hits (15): AS3MT, FAM91A1, USP7, ZFP36L1, APPL2, NDRG2, DDX49, ARHGAP32, BTN1A1, RBPMS, SMC5, DBN1, PCM1, NBAS, TOPBP1
  STRING auto-add: 1 best connectors
  Total query: 15 genes

[1/4] Mapping to STRING IDs...
  ✓ Mapped 15/15 genes to STRING IDs

[2/4] Testing PPI enrichment...
    (tested on YOUR 15 genes only, not added genes)
    Observed interactions: 3
    Expected (random):     1
    PPI enrichment p-value: 0.054

[3/4] Retrieving network...
    Interactions found: 4

    --- Edges connecting YOUR hits to hypothesis genes ---
      (none found at score ≥ 400)

    --- Edges among YOUR hits ---
      TOPBP1 ↔ SMC5 (score: 0.481)
      PCM1 ↔ SMC5 (score: 0.440)
      FAM91A1 ↔ DBN1 (score: 0.400)

    --- Edges among hypothesis genes ---
      (none)

    --- STRING auto-added bridge proteins ---
    ★ SMIM34A connects to: RBPMS(0.48)

[4/4] Running functional enrichment...
    No enrichment results returned

