<a href="https://colab.research.google.com/github/neetushibu/IontheFold-Team6/blob/main/IonTheFold_SurfaceCharge.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Trail_1: Fixed Red/Orange Filter & Batch Generator

This repaired notebook provides a clean, Colab-friendly workflow plus a companion Python script:
- Load a FullSequence Analysis CSV.
- Filter to Red/Orange `charge_distribution` (optional).
- Optionally restrict to specific PDB IDs.
- Run in **full** mode or **batch** mode.
- Emit an updated CSV and a **manifest** for design configs.

> The robust implementation lives in `/mnt/data/generate_ro_charges_batches.py` and is used below.
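The steps listed above can be sketched in a few lines. This is a minimal illustration, not the code in `generate_ro_charges_batches.py`, which is not shown in this notebook. The `charge_distribution` column comes from the description above; the `pdb_id` column name, the function names, and the manifest layout are all assumptions made for the example. It uses only the standard library so it runs without extra installs.

```python
import csv
import io

def load_rows(csv_text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def filter_rows(rows, colors=("red", "orange"), pdb_ids=None):
    """Keep rows whose charge_distribution is in `colors`.

    Pass colors=None to skip the color filter, and pdb_ids to restrict
    the result to specific entries. Matching is case-insensitive.
    """
    wanted = {c.lower() for c in colors} if colors else None
    ids = {p.upper() for p in pdb_ids} if pdb_ids else None
    kept = []
    for row in rows:
        color = row["charge_distribution"].strip().lower()
        pdb = row["pdb_id"].strip().upper()  # "pdb_id" is an assumed column name
        if wanted is not None and color not in wanted:
            continue
        if ids is not None and pdb not in ids:
            continue
        kept.append(row)
    return kept

def make_batches(rows, batch_size=None):
    """Full mode (batch_size=None) gives one batch; batch mode gives chunks."""
    if not batch_size:
        return [rows] if rows else []
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]

def build_manifest(batches):
    """One entry per batch: its index, size, and PDB IDs (hypothetical layout)."""
    return [
        {"batch": i, "n": len(b), "pdb_ids": [r["pdb_id"] for r in b]}
        for i, b in enumerate(batches, start=1)
    ]

sample = """pdb_id,charge_distribution
8Y2K,Red
8Y31,Green
8Y3G,Orange
8Y3Z,red
"""
rows = filter_rows(load_rows(sample))
manifest = build_manifest(make_batches(rows, batch_size=2))
print(manifest)
```

Calling `make_batches(rows)` without a batch size corresponds to full mode, and `filter_rows(..., pdb_ids=[...])` covers the optional PDB ID restriction.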

In [1]:
# If running in Colab, you may mount Drive; safe to skip locally
try:
    from google.colab import drive
    drive.mount('/content/drive')
except Exception:
    pass
print("✅ Environment ready")

Mounted at /content/drive
✅ Environment ready


## Imports

In [2]:
import sys, os
import pandas as pd
from pathlib import Path

print(pd.__version__)
print("CWD:", os.getcwd())

2.2.2
CWD: /content


## Install All Dependencies

In [3]:
# Run this cell first to install all required dependencies

# Install BioPython and other required packages
!pip install biopython pandas numpy requests

# Verify installations
!python -c "import Bio; print('✅ BioPython version:', Bio.__version__)"
!python -c "import pandas; print('✅ Pandas version:', pandas.__version__)"
!python -c "import numpy; print('✅ NumPy version:', numpy.__version__)"
!python -c "import requests; print('✅ Requests version:', requests.__version__)"

print("🎉 All dependencies installed successfully!")
print("You can now run the PDB analysis script.")

Collecting biopython
  Downloading biopython-1.85-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (13 kB)
Downloading biopython-1.85-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.3 MB)
Installing collected packages: biopython
Successfully installed biopython-1.85
✅ BioPython version: 1.85
✅ Pandas version: 2.2.2
✅ NumPy version: 2.0.2
✅ Requests version: 2.32.4
🎉 All dependencies installed successfully!
You can now run the PDB analysis script.
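The version checks above start a separate Python process for each package. As an alternative sketch, the same information can be read in-process with the standard library's `importlib.metadata`; the helper name `installed_versions` is made up for this example.

```python
from importlib import metadata

def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

report = installed_versions(["biopython", "pandas", "numpy", "requests"])
for name, version in report.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

A missing package shows up as `NOT INSTALLED` instead of raising, so the cell can be run before or after the `pip install` step.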


## Working Individual Charge Analyzer

In [10]:
!python /content/pdb_charge_analyzer002.py

Streaming output truncated to the last 5000 lines.

[2106/3687] Processing 8Y2K
✅ 8Y2K: 0 interfaces, charge: +8.60

[2107/3687] Processing 8Y31
✅ 8Y31: 0 interfaces, charge: +5.00

[2108/3687] Processing 8Y3G
✅ 8Y3G: 0 interfaces, charge: +13.50

[2109/3687] Processing 8Y3Z
✅ 8Y3Z: 0 interfaces, charge: -10.80

[2110/3687] Processing 8Y41
✅ 8Y41: 0 interfaces, charge: -5.80

[2111/3687] Processing 8Y4V
✅ 8Y4V: 0 interfaces, charge: -19.80

[2112/3687] Processing 8Y54
✅ 8Y54: 0 interfaces, charge: -29.40

[2113/3687] Processing 8Y64
✅ 8Y64: 0 interfaces, charge: -22.00

[2114/3687] Processing 8Y6T
✅ 8Y6T: 0 interfaces, charge: +2.90

[2115/3687] Processing 8Y7P
✅ 8Y7P: 0 interfaces, charge: -7.80

[2116/3687] Processing 8Y7Z
✅ 8Y7Z: 0 interfaces, charge: +4.50

[2117/3687] Processing 8Y8M
✅ 8Y8M: 0 interfaces, charge: -24.80

[2118/3687] Processing 8Y96
✅ 8Y96: 0 interfaces, charge: -4.80

[2119/3687] Processing 8Y97
✅ 8Y97: 0 interfaces, charg
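
The log above reports a signed net charge per entry, and the values are fractional (for example `+8.60`). The analyzer's own weights are not shown in this notebook, so the following is only a sketch of one way such a number could arise: lysine and arginine count as +1, aspartate and glutamate as -1, and histidine gets a small partial weight when it is counted as positive. The 0.1 histidine weight and both function names are assumptions chosen for illustration.

```python
def net_charge(sequence, his_positive=True, his_weight=0.1):
    """Sum formal side-chain charges over a one-letter amino acid sequence.

    K and R count +1, D and E count -1. Histidine adds `his_weight` only
    when `his_positive` is True. The 0.1 default is an assumed value, not
    taken from the analyzer script.
    """
    charge = 0.0
    for aa in sequence.upper():
        if aa in "KR":
            charge += 1.0
        elif aa in "DE":
            charge -= 1.0
        elif aa == "H" and his_positive:
            charge += his_weight
    return round(charge, 2)

def format_charge(pdb_id, n_interfaces, charge):
    """Render a result line in the same shape as the log output."""
    return f"{pdb_id}: {n_interfaces} interfaces, charge: {charge:+.2f}"

seq = "MKKRHDEH"  # toy sequence: 3 basic, 2 acidic, 2 histidine
print(format_charge("DEMO", 0, net_charge(seq)))
# prints: DEMO: 0 interfaces, charge: +1.20
```

With `his_positive=False` the same toy sequence gives `+1.00`, which shows how much the histidine setting alone can shift the total.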

## Ignore the Scripts Below

In [76]:
!python /content/enhanced_pdb_analyzer_script.py /content/FullSequence_Analysis_250825_revalidated.csv 50

🧬 Enhanced PDB Interface Charge Analysis
✅ All modules imported successfully
Usage:
  python script.py                    # Interactive mode
  python script.py demo               # Demo mode
  python script.py quick file.csv [batch_size]  # Quick processing


In [82]:
!python /content/enhanced_pdb_analyzer_script.py

🧬 Enhanced PDB Interface Charge Analysis
✅ All modules imported successfully
🧬 Enhanced PDB Interface Charge Analysis
Features:
✅ Single PDB analysis with detailed output
✅ Batch processing with custom sizes
✅ Full CSV processing with resume capability
✅ Incremental saves to prevent data loss
✅ Detailed interface-level reporting
✅ Error handling and progress tracking

Work directory [default: ./pdb_cache]: 2

🚀 Select Analysis Mode
1. Single PDB Analysis
2. Batch Processing (N proteins)
3. Full CSV Processing (with resume)
4. Exit

Select mode (1-4): 2

⚙️ Analysis Parameters Configuration
Interface cutoff distance (Å) [default: 5.0]: 
Require surface residues? (y/N): y
Use DSSP for surface detection? (y/N): y
Count histidine as positive? (y/N): y

📋 Analysis Configuration:
  🎯 Interface cutoff: 5.0 Å
  🌊 Require surface: True
  🔬 Use DSSP: True
  ⚗️  HIS as positive: True
Batch size [default: 10]: 5

Enter PDB IDs (multiple options):
1. Ent

In [83]:
!python /content/enhanced_pdb_analyzer_script001.py

🧬 Enhanced PDB Interface Charge Analysis
✅ All modules imported successfully
🧬 Enhanced PDB Interface Charge Analysis
Features:
✅ Single PDB analysis with detailed output
✅ Batch processing with custom sizes
✅ Full CSV processing with resume capability
✅ Incremental saves to prevent data loss
✅ Detailed interface-level reporting
✅ Error handling and progress tracking

Work directory [default: ./pdb_cache]: 2

🚀 Select Analysis Mode
1. Single PDB Analysis
2. Batch Processing (N proteins)
3. Full CSV Processing (with resume)
4. Exit

Select mode (1-4): 2

⚙️ Analysis Parameters Configuration
Interface cutoff distance (Å) [default: 5.0]: 
Require surface residues? (y/N): y
Use DSSP for surface detection? (y/N): y
Count histidine as positive? (y/N): y

📋 Analysis Configuration:
  🎯 Interface cutoff: 5.0 Å
  🌊 Require surface: True
  🔬 Use DSSP: True
  ⚗️  HIS as positive: True
Batch size [default: 10]: 10

Enter PDB IDs (multiple options):
1. En

In [85]:
!python /content/fixed_pdb_analyzer.py

🧬 Enhanced PDB Interface Charge Analysis
✅ All modules imported successfully
🧬 Enhanced PDB Interface Charge Analysis
Features:
✅ Single PDB analysis with detailed output
✅ Batch processing with red/orange filtering
✅ Full CSV processing with resume capability
✅ Complete amino acid composition analysis
✅ Actual charge calculations (not just counts)

Work directory [default: ./pdb_cache]: 2

🚀 Select Analysis Mode
1. Single PDB Analysis
2. Batch Processing (N proteins)
3. Full CSV Processing (with resume)
4. Exit

Select mode (1-4): 2

⚙️ Analysis Parameters Configuration
Interface cutoff distance (Å) [default: 5.0]: 5.0
Require surface residues? (y/N): y
Use DSSP for surface detection? (y/N): y
Count histidine as positive? (y/N): y

📋 Analysis Configuration:
  🎯 Interface cutoff: 5.0 Å
  🌊 Require surface: True
  🔬 Use DSSP: True
  ⚗️  HIS as positive: True
Batch size [default: 10]: 5

Enter PDB IDs (multiple options):
1. Enter comma-separated

In [86]:
!python /content/fixed_pdb_analyzer001.py

🧬 Enhanced PDB Interface Charge Analysis
✅ All modules imported successfully
🧬 Enhanced PDB Interface Charge Analysis
Features:
✅ Single PDB analysis with detailed output
✅ Batch processing with red/orange filtering
✅ Full CSV processing with resume capability
✅ Complete amino acid composition analysis
✅ Actual charge calculations (not just counts)

Work directory [default: ./pdb_cache]: 2

🚀 Select Analysis Mode
1. Single PDB Analysis
2. Batch Processing (N proteins)
3. Full CSV Processing (with resume)
4. Exit

Select mode (1-4): 2

⚙️ Analysis Parameters Configuration
Interface cutoff distance (Å) [default: 5.0]: 5
Require surface residues? (y/N): y
Use DSSP for surface detection? (y/N): y
Count histidine as positive? (y/N): y

📋 Analysis Configuration:
  🎯 Interface cutoff: 5.0 Å
  🌊 Require surface: True
  🔬 Use DSSP: True
  ⚗️  HIS as positive: True
Batch size [default: 10]: 5

Enter PDB IDs (multiple options):
1. Enter comma-separated l

## Detailed Charge Analysis

In [87]:
!python /content/fixed_pdb_analyzer002.py

Streaming output truncated to the last 5000 lines.

[67/1576] Processing 5NH9
✅ 5NH9: 0 interfaces, charge: -51.60

[68/1576] Processing 5NHA
✅ 5NHA: 0 interfaces, charge: -51.60

[69/1576] Processing 5NHB
✅ 5NHB: 0 interfaces, charge: -51.60

[70/1576] Processing 5NHC
✅ 5NHC: 0 interfaces, charge: -51.60
💾 Saved 70 results to full_analysis_filtered_20250825_202223.csv
💾 Saved 1 detailed results to full_analysis_filtered_20250825_202223_detailed.csv
💾 Incremental save completed (70/1576)

[71/1576] Processing 5NHD
✅ 5NHD: 0 interfaces, charge: -51.60

[72/1576] Processing 5NHE
✅ 5NHE: 0 interfaces, charge: -51.60

[73/1576] Processing 5NHG
✅ 5NHG: 0 interfaces, charge: -6.40

[74/1576] Processing 5NHM
✅ 5NHM: 0 interfaces, charge: -51.60

[75/1576] Processing 5NI4
✅ 5NI4: 0 interfaces, charge: -24.20

[76/1576] Processing 5NIE
✅ 5NIE: 0 interfaces, charge: -28.20

[77/1576] Processing 5NIJ
✅ 5NIJ: 0 interfaces, charge: -39.40

[78/1576] Proc
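
The log above shows a checkpoint written partway through the run ("Incremental save completed (70/1576)"). A minimal sketch of that pattern, with resume on restart, might look like the following. It appends to the results file instead of rewriting it, which may differ from what the analyzer script does, and the file layout, the `pdb_id` and `charge` column names, and the `fake_analyze` stand-in are all assumptions for this example.

```python
import csv
import os
import tempfile

def load_done(path):
    """Return the set of PDB IDs already written to the results file."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="") as fh:
        return {row["pdb_id"] for row in csv.DictReader(fh)}

def append_rows(path, rows, fieldnames=("pdb_id", "charge")):
    """Append rows, writing the header only when the file is new."""
    is_new = not os.path.exists(path)
    with open(path, "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        if is_new:
            writer.writeheader()
        writer.writerows(rows)

def run(pdb_ids, analyze, path, save_every=10):
    """Analyze IDs not yet in `path`, flushing to disk every `save_every` results."""
    done = load_done(path)
    pending = []
    processed = 0
    for pdb_id in pdb_ids:
        if pdb_id in done:
            continue  # resume: skip work finished in an earlier run
        pending.append({"pdb_id": pdb_id, "charge": analyze(pdb_id)})
        processed += 1
        if len(pending) >= save_every:
            append_rows(path, pending)
            pending = []
    if pending:
        append_rows(path, pending)  # flush the final partial chunk
    return processed

ids = ["5NH9", "5NHA", "5NHB", "5NHC", "5NHD"]
fake_analyze = lambda pdb_id: -51.6  # placeholder for the real analysis
path = os.path.join(tempfile.mkdtemp(), "results.csv")
first = run(ids[:3], fake_analyze, path, save_every=2)   # simulated interrupted run
second = run(ids, fake_analyze, path, save_every=2)      # restart over the full list
print(first, second, len(load_done(path)))
# prints: 3 2 5
```

The second call only processes the two IDs missing from the file, so an interrupted Colab session loses at most `save_every - 1` results.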