<a href="https://colab.research.google.com/github/eoinleen/Protein-design-random/blob/main/Copy_of_full_str_param_nearly_working.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [3]:
"""
Super Simple Protein Structure Analysis Tool for Google Colab
===========================================================

Hey! 👋 This code helps you analyze how proteins interact with each other.
Don't worry - we'll walk you through everything!

What Does This Code Do? 🤔
-------------------------
It looks at your protein structures and measures 5 important things:

1. Buried Surface Area (BSA)
   - Like two puzzle pieces fitting together
   - Measures how much of the protein surface is hidden when chains connect
   - Bigger number = more contact, which usually (but not always) means a stronger connection
   - Measured in Å² (square Angstroms)

2. Hydrogen Bonds
   - Like tiny magnets between protein chains
   - More bonds = stronger connection
   - We count how many there are
   - Must be within 3.5 Å distance to count

3. Hydrophobic Contacts
   - Like oil droplets sticking together
   - Happens between "water-hating" parts of proteins
   - We count contacts within 5.0 Å distance
   - More contacts = stronger connection

4. Salt Bridges
   - Like positive and negative magnets connecting
   - Forms between + and - charged amino acids
   - We count bridges within 4.0 Å distance
   - More bridges = stronger connection

5. Van der Waals (VDW) Energy
   - Like velcro at the atomic level
   - Measures how well atoms "stick" together
   - Negative number = good sticking
   - Measured in kcal/mol
   - We look at interactions within 10 Å distance

Parameters Explained! 📏
----------------------
Buried Surface Area (BSA):
- What: Surface area that gets hidden when proteins connect
- Units: Square Angstroms (Å²)
- Good values: Usually 1000-4000 Å² for strong interactions
- Bad values: Less than 500 Å² might indicate a weak interaction

Hydrogen Bonds:
- What: Attractions between a hydrogen on an N or O atom (the donor) and another N or O atom (the acceptor)
- Distance cutoff: 3.5 Å (we ignore bonds longer than this)
- Good values: 5-20 bonds is common
- Bad values: 0-1 bonds might mean a weak interaction

Hydrophobic Contacts:
- What: Connections between water-hating amino acids
- Which amino acids: ALA, VAL, LEU, ILE, MET, PHE, TRP, PRO
- Distance cutoff: 5.0 Å
- Good values: 10+ contacts is common
- Bad values: 0-2 contacts might mean a weak interaction

Salt Bridges:
- What: Bonds between + and - charged amino acids
- Positively charged (+): LYS, ARG, HIS (HIS only when it is protonated)
- Negatively charged (-): ASP, GLU
- Distance cutoff: 4.0 Å
- Good values: 1-5 bridges is common
- Bad values: 0 bridges isn't necessarily bad

Van der Waals Energy:
- What: Atomic-level attraction energy
- Distance cutoff: 10 Å (ignores atoms further apart)
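- Epsilon (ε): 0.1 kcal/mol (depth of the energy well: the strongest attraction one atom pair can have)
- Sigma (σ): 3.4 Å (distance where the pair energy is exactly zero; the optimal distance is 2^(1/6) × σ ≈ 3.8 Å)
- Good values: Negative numbers (more negative = better)
- Bad values: Positive numbers mean some atoms are too close (steric clashes)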

How To Use This Code 🚀
----------------------
1. Put your PDB files in a Google Drive folder
2. Change the 'pdb_directory' path at the bottom to your folder
3. Run the code
4. Get a nice table with all the measurements!

Need Help? 🆘
-----------
- If you see "N/A" in results: That's okay! It means we couldn't calculate that
  particular measurement (usually happens with single-chain proteins)
- If you get errors: Check that your PDB files are valid and have multiple chains
- Want different parameters? Edit the values inside the corresponding calculate_* function
  (e.g. epsilon, sigma and the 10 Å cutoff in calculate_vdw_energy)

That's it! You're ready to analyze some proteins! 🎉
"""

# Install required packages
!pip install biopython freesasa numpy

# Import required libraries
import os
import sys
from pathlib import Path
from typing import Dict, List, Optional, Tuple, Any
from Bio import PDB
from Bio.PDB.PDBIO import PDBIO
from Bio.PDB.Polypeptide import is_aa
from Bio.PDB.Structure import Structure
from Bio.PDB.Model import Model
from Bio.PDB.Chain import Chain
from Bio.PDB.Residue import Residue
import freesasa
import numpy as np
from google.colab import drive

class StructureValidationError(Exception):
    """Custom exception for structure validation failures."""
    pass

def validate_pdb_file(file_path: str) -> bool:
    """
    Validate that a file exists and has basic PDB format.
    """
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"PDB file not found: {file_path}")

    try:
        with open(file_path, 'r') as f:
            first_line = f.readline()
            if not any(marker in first_line for marker in ['HEADER', 'ATOM', 'MODEL']):
                raise StructureValidationError(f"File does not appear to be a valid PDB: {file_path}")
    except UnicodeDecodeError:
        raise StructureValidationError(f"File is not a valid text file: {file_path}")

    return True
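
# --- Hypothetical sketch: the arithmetic behind Buried Surface Area ---
# A common definition of the BSA described in the docstring is
#     BSA = (SASA of each chain on its own, summed) - (SASA of the whole complex)
# This helper only does that arithmetic. It assumes the solvent-accessible
# surface areas (Å²) were already computed, e.g. with freesasa; the function
# name and the per_side option are illustrative, not part of the original code.
from typing import Dict

def buried_surface_area_from_sasa(chain_sasa: Dict[str, float],
                                  complex_sasa: float,
                                  per_side: bool = False) -> float:
    """Return the surface area (Å²) hidden from solvent when the chains associate.

    chain_sasa:   SASA of each chain computed in isolation, keyed by chain ID
    complex_sasa: SASA of all chains computed together
    per_side:     halve the total to describe one side of the interface,
                  a convention some tools use
    """
    if len(chain_sasa) < 2:
        raise ValueError("BSA needs at least two chains")
    buried = sum(chain_sasa.values()) - complex_sasa
    return buried / 2 if per_side else buried

# Example with made-up numbers: 9000 + 7000 - 14000 = 2000 Å² buried in total
# buried_surface_area_from_sasa({'A': 9000.0, 'B': 7000.0}, 14000.0)  -> 2000.0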

def calculate_vdw_energy(structure: Structure) -> Optional[float]:
    """
    Calculate the Lennard-Jones (van der Waals) energy between chains.

    Every pair of atoms on different chains that lies within the cutoff
    contributes 4*epsilon*((sigma/r)**12 - (sigma/r)**6). Only the first
    model is used, so chains from different NMR models are never paired.

    Args:
        structure: BioPython Structure object

    Returns:
        float: Total VDW energy in kcal/mol, or None if calculation fails
    """
    try:
        # One generic parameter pair (close to AMBER's sp3 carbon) is applied
        # to every atom; per-atom-type parameters could be substituted here
        epsilon = 0.1  # well depth, kcal/mol
        sigma = 3.4    # Angstroms; distance at which the pair energy is zero
        cutoff = 10.0  # Angstroms; pairs further apart are ignored

        vdw_energy = 0.0
        model = next(iter(structure))  # first model only
        chains = list(model.get_chains())

        # Calculate VDW energy between all chain pairs (each pair once)
        for i, chain1 in enumerate(chains):
            coords1 = np.array([atom.coord for atom in chain1.get_atoms()], dtype=float)
            for chain2 in chains[i+1:]:
                coords2 = np.array([atom.coord for atom in chain2.get_atoms()], dtype=float)
                if len(coords1) == 0 or len(coords2) == 0:
                    continue

                # Vectorised over chain2: distances from one chain1 atom to all chain2 atoms
                for xyz in coords1:
                    distances = np.linalg.norm(coords2 - xyz, axis=1)
                    distances = distances[(distances > 0) & (distances < cutoff)]
                    sr6 = (sigma / distances) ** 6
                    vdw_energy += float(np.sum(4 * epsilon * (sr6 ** 2 - sr6)))

        return vdw_energy

    except Exception as e:
        print(f"Error calculating VDW energy: {str(e)}")
        return None
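
# --- Illustrative check of the Lennard-Jones form used in calculate_vdw_energy ---
# E(r) = 4*epsilon*((sigma/r)**12 - (sigma/r)**6), so:
#   * E(sigma) = 0: sigma is where the pair energy crosses zero, not the optimum;
#   * the minimum, E = -epsilon, lies at r = 2**(1/6) * sigma (about 3.82 Å for sigma = 3.4 Å);
#   * for r < sigma the energy is positive, which is why clashes push the total up.
# lennard_jones() is a hypothetical helper for exploring these points; nothing
# else in the notebook calls it.

def lennard_jones(r: float, epsilon: float = 0.1, sigma: float = 3.4) -> float:
    """12-6 Lennard-Jones pair energy (kcal/mol) at separation r (Å)."""
    sr6 = (sigma / r) ** 6
    return 4 * epsilon * (sr6 ** 2 - sr6)

lj_r_min = 2 ** (1 / 6) * 3.4  # position of the energy minimum, about 3.82 Å

# lennard_jones(3.4)      -> 0.0   (zero crossing)
# lennard_jones(lj_r_min) -> about -0.1, i.e. -epsilon (deepest point)
# lennard_jones(3.0)      -> positive (atoms closer than sigma repel)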

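# NOTE: safe_structure_load, calculate_buried_surface_area, calculate_hydrogen_bonds,
# calculate_hydrophobic_contacts and calculate_salt_bridges are called below but were
# not included in this copy of the notebook; define them here before running.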
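
# --- Hypothetical sketches: distance-based interface contact counts ---
# These illustrate, under stated assumptions, how the counts described in the
# docstring could be computed. They are not the notebook's calculate_* functions;
# the names, atom selections and counting rules below are illustrative choices:
#   * only the first model is used and only atoms on different chains are compared;
#   * hydrophobic contact: carbon atoms of two hydrophobic residues (ALA, VAL, LEU,
#     ILE, MET, PHE, TRP, PRO) within 5.0 Å, counted once per residue pair;
#   * salt bridge: a side-chain N of LYS/ARG/HIS and a carboxylate O of ASP/GLU
#     within 4.0 Å, counted once per residue pair (HIS is treated as charged,
#     which holds only when it is protonated);
#   * hydrogen bonds (rough proxy): N/O atom pairs within 3.5 Å. With no hydrogen
#     positions or angles checked, this overestimates the true count.
# Only Biopython's iteration protocol (structure -> models -> chains -> residues ->
# atoms), Residue.get_resname(), Atom.get_name() and Atom.coord are relied on.
# The plain Python loops are unoptimised and can be slow for large interfaces.
import math
from itertools import combinations
from typing import Callable, Iterator, Tuple

HYDROPHOBIC_RESIDUES = {'ALA', 'VAL', 'LEU', 'ILE', 'MET', 'PHE', 'TRP', 'PRO'}
CATION_ATOMS = {'LYS': {'NZ'}, 'ARG': {'NE', 'NH1', 'NH2'}, 'HIS': {'ND1', 'NE2'}}
ANION_ATOMS = {'ASP': {'OD1', 'OD2'}, 'GLU': {'OE1', 'OE2'}}

AtomFilter = Callable[[str, str], bool]  # (residue name, atom name) -> keep?


def _interchain_pairs(structure, keep1: AtomFilter, keep2: AtomFilter,
                      cutoff: float) -> Iterator[Tuple]:
    """Yield (res1, atom1, res2, atom2) for selected atoms on different chains within cutoff (Å)."""
    model = next(iter(structure))  # first model only
    chains = [[(res, atom) for res in chain for atom in res] for chain in model]
    for chain_a, chain_b in combinations(chains, 2):
        # Check both orientations so, e.g., a cation on either chain is found
        for side1, side2 in ((chain_a, chain_b), (chain_b, chain_a)):
            sel1 = [(r, a) for r, a in side1 if keep1(r.get_resname(), a.get_name())]
            sel2 = [(r, a) for r, a in side2 if keep2(r.get_resname(), a.get_name())]
            for res1, atom1 in sel1:
                for res2, atom2 in sel2:
                    if math.dist(atom1.coord, atom2.coord) <= cutoff:
                        yield res1, atom1, res2, atom2


def count_hydrophobic_contacts(structure, cutoff: float = 5.0) -> int:
    def hydrophobic_carbon(resname: str, atomname: str) -> bool:
        return resname in HYDROPHOBIC_RESIDUES and atomname.startswith('C')
    pairs = _interchain_pairs(structure, hydrophobic_carbon, hydrophobic_carbon, cutoff)
    # frozenset of ids: the same residue pair is found in both orientations
    return len({frozenset((id(r1), id(r2))) for r1, _, r2, _ in pairs})


def count_salt_bridges(structure, cutoff: float = 4.0) -> int:
    def cation(resname: str, atomname: str) -> bool:
        return atomname in CATION_ATOMS.get(resname, ())
    def anion(resname: str, atomname: str) -> bool:
        return atomname in ANION_ATOMS.get(resname, ())
    pairs = _interchain_pairs(structure, cation, anion, cutoff)
    return len({frozenset((id(r1), id(r2))) for r1, _, r2, _ in pairs})


def count_polar_contacts(structure, cutoff: float = 3.5) -> int:
    """Rough hydrogen-bond proxy: inter-chain N/O atom pairs within cutoff (Å)."""
    def polar(resname: str, atomname: str) -> bool:
        return atomname[:1] in ('N', 'O')
    pairs = _interchain_pairs(structure, polar, polar, cutoff)
    return len({frozenset((id(a1), id(a2))) for _, a1, _, a2 in pairs})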

def print_summary_report(results: List[Dict[str, Any]]) -> None:
    """
    Print a formatted summary report of all results.
    """
    if not results:
        print("No results to display")
        return

    print("\nSummary Report:")
    print(f"{'PDB File':<30} {'Buried Surface Area (Å²)':<25} {'H-Bonds':<12} {'Hydrophobic':<12} "
          f"{'Salt Bridges':<12} {'VDW (kcal/mol)':<15}")
    print("=" * 111)

    for result in results:
        try:
            bsa = f"{result['buried_surface_area']:.2f}" if result['buried_surface_area'] else "N/A"
            vdw = f"{result['vdw_energy']:.2f}" if result['vdw_energy'] is not None else "N/A"
            # Counts can be None when a metric fails; format them as text so the row still prints
            h_bonds, hydrophobic, salt_bridges = (
                "N/A" if result[key] is None else str(result[key])
                for key in ('hydrogen_bonds', 'hydrophobic_contacts', 'salt_bridges')
            )
            print(f"{result['file_name']:<30} {bsa:<25} {h_bonds:<12} "
                  f"{hydrophobic:<12} {salt_bridges:<12} {vdw:<15}")
        except KeyError as e:
            print(f"Error displaying result for {result.get('file_name', 'unknown')}: Missing data {str(e)}")

def process_multiple_pdb_files(pdb_directory: str) -> List[Dict[str, Any]]:
    """
    Process multiple PDB files and analyze their structures.
    """
    if not os.path.exists(pdb_directory):
        raise FileNotFoundError(f"Directory not found: {pdb_directory}")

    results = []
    parser = PDB.PDBParser(QUIET=True)

    # Get list of PDB files (case-insensitive extension, sorted for a stable order)
    pdb_files = sorted(f for f in os.listdir(pdb_directory) if f.lower().endswith('.pdb'))
    if not pdb_files:
        print(f"Warning: No PDB files found in {pdb_directory}")
        return results

    for file_name in pdb_files:
        pdb_file = os.path.join(pdb_directory, file_name)
        print(f"\nProcessing {file_name}")

        try:
            # Load structure
            structure = safe_structure_load(parser, pdb_file)
            if not structure:
                continue

            # Calculate metrics with error handling
            buried_surface_area, chain_areas = calculate_buried_surface_area(pdb_file)
            h_bonds = calculate_hydrogen_bonds(structure)
            hydrophobic = calculate_hydrophobic_contacts(structure)
            salt_bridges = calculate_salt_bridges(structure)
            vdw_energy = calculate_vdw_energy(structure)

            results.append({
                'file_name': file_name,
                'buried_surface_area': buried_surface_area,
                'hydrogen_bonds': h_bonds,
                'hydrophobic_contacts': hydrophobic,
                'salt_bridges': salt_bridges,
                'chain_areas': chain_areas,
                'vdw_energy': vdw_energy
            })

            # Print detailed results
            print(f"Buried Surface Area: {buried_surface_area:.2f} Å²" if buried_surface_area else "Buried Surface Area: Not applicable")
            print(f"Hydrogen Bonds: {h_bonds}")
            print(f"Hydrophobic Contacts: {hydrophobic}")
            print(f"Salt Bridges: {salt_bridges}")
            print(f"VDW Energy: {vdw_energy:.2f} kcal/mol" if vdw_energy is not None else "VDW Energy: Not calculated")

        except Exception as e:
            print(f"Error processing {file_name}: {str(e)}")
            continue

    return results

# Main execution with error handling
try:
    # Mount Google Drive with error handling
    try:
        drive.mount('/content/drive')
    except Exception as e:
        print(f"Warning: Drive mounting failed: {str(e)}")
        print("Proceeding with local filesystem access only")

    # Update this path to your PDB files directory in Google Drive
    pdb_directory = '/content/drive/MyDrive/PDB-files/all_pdb-2MBO-no-hot'

    print("Starting analysis...")
    results = process_multiple_pdb_files(pdb_directory)

    if results:
        print_summary_report(results)
    else:
        print("No results to display")

except Exception as e:
    print(f"Fatal error: {str(e)}")
    sys.exit(1)


Collecting biopython
  Downloading biopython-1.85-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (13 kB)
Collecting freesasa
  Downloading freesasa-2.2.1.tar.gz (270 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m270.1/270.1 kB[0m [31m11.9 MB/s[0m eta [36m0:00:00[0m
[?25h  Preparing metadata (setup.py) ... [?25l[?25hdone
Downloading biopython-1.85-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.3 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m3.3/3.3 MB[0m [31m81.1 MB/s[0m eta [36m0:00:00[0m
[?25hBuilding wheels for collected packages: freesasa
  Building wheel for freesasa (setup.py) ... [?25l[?25hdone
  Created wheel for freesasa: filename=freesasa-2.2.1-cp311-cp311-linux_x86_64.whl size=888779 sha256=5a6cc190aa75d1bf6210620961ec50d60890ae38e60be5b4ce5a05fd3dc42a96
  Stored in directory: /root/.cache/pip/wheels/12/7e/68/f3f59a0c5946b122ecbf6098c87de4c8a1f73ec145c077815b
Successfully built frees