<a href="https://colab.research.google.com/github/reneegreen816/PholdAPhage/blob/main/PholdAPhage_CapsiCodeCracker.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1>PholdAPhage - CapsiCode Cracker</h1>

**Your genome to T-number and ( *h, k* ) code cracker**

Phage capsid structures are highly conserved: 96% are icosahedral, applying mathematical principles of symmetry to shape protein capsids in proportion to genome length.

PholdAPhage - CapsiCode Cracker is the first step of the PholdAPhage predictive modelling system, which helps decode this evolutionary design principle by symmetry type and genome length.

PholdAPhage - CapsiCode Cracker helps predict an unknown phage's triangulation number (T-number or T#) and symmetry ( *h, k* ) parameters, in support of the computational simulation carried out in PholdAPhage step 2.
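
For reference, the two relationships that the cells in this notebook rely on (in `find_hk` and in the protein copy number cell) can be written out as below. The symbol $N_{\text{copies}}$ is notation introduced here for illustration only.

$$
T = h^2 + hk + k^2, \qquad h, k \in \{0, 1, 2, \dots\}
$$

$$
N_{\text{copies}} = 60\,T
$$

As a worked example, $(h, k) = (2, 1)$ gives $T = 4 + 2 + 1 = 7$, and therefore $N_{\text{copies}} = 60 \times 7 = 420$.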

<h3>Notes:</h3>

 - To run cells, press the play button on the left side.

 - Steps are best run sequentially, because T# is required to find the ( *h, k* ) parameters and protein copy #.

 - If your genome length matches multiple T#s, PholdAPhage - CapsiCode Cracker will calculate the numbers and parameters for each T#, to support all required predictive computational simulations.<br><br>

<h2>Instructions</h2>

<h2>1. Set up and installs</h2>
Cells 1 and 2 install and import the Python libraries required for analysis and calculation. They also mount the notebook in your Google Drive for a single use, after which it is removed.

In [None]:
#Installs
!pip install biopython

In [None]:
#Imports
from Bio import SeqIO
from google.colab import drive
from google.colab import files
import pandas as pd

<h2>2. Upload genome fasta (.fna) file</h2>
Cell 3 uploads your phage genome .fna file. Once uploaded, the file can be found in the content folder to the left.

To run, click the play button and then choose your file for upload.

In [None]:
#Open file picker to upload your genome file (use .fna file)
uploaded = files.upload()

<h2>3. Read and calculate genome length in bps</h2>
Cell 4 reads the uploaded file and calculates the genome length in base pairs (bps).
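
Note that `SeqIO.read` expects exactly one record in the file and raises a `ValueError` otherwise. As a hedged, dependency-free illustration of what the length calculation does, the sketch below counts bases per record using only the standard library, which can also help confirm how many records a file holds. The function `fasta_lengths` and the demo record are hypothetical names introduced here; they are not part of PholdAPhage or Biopython.

```python
def fasta_lengths(text):
    """Return {record_id: sequence_length} for FASTA-formatted text."""
    lengths = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: use the first whitespace-delimited token as the id
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            lengths[current_id] = 0
        elif current_id is not None:
            # Sequence line: add its characters to the current record
            lengths[current_id] += len(line)
    return lengths


# Hypothetical two-line record, for illustration only
example = ">phage_demo hypothetical record\nACGTAC\nGTTA\n"
print(fasta_lengths(example))
```

If the returned dictionary has more than one key, the file holds several records and would need splitting before it is passed to Cell 4.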

In [None]:
#Automatically get the uploaded filename
#If multiple files are uploaded, this picks the first one
file_name = list(uploaded.keys())[0]

print(f"Using uploaded file: {file_name}")

#Read the sequence
record = SeqIO.read(file_name, "fasta")  # use "fasta" format for .fna files

#Get the sequence
sequence = record.seq

#Calculate genome length
genome_length = len(sequence)

#Print the result
print(f"Genome length: {genome_length} bps")

<h2>4. Calculate T# from genome length</h2>
Cell 5 takes the genome length and determines the matching T#(s).
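
Because neighbouring ranges in `T_ranges` can overlap or leave gaps, a genome length may match more than one T#, or none. The sketch below illustrates both cases with a subset of the ranges used in this notebook. The names `T_RANGES_DEMO` and `match_t_numbers` are hypothetical and exist only for this example.

```python
# Subset of the notebook's T_ranges, copied here so the example runs on its own
T_RANGES_DEMO = {
    1: (1240, 4412),
    3: (8112, 17095),
    4: (13207, 24480),
    7: (33507, 50072),
}


def match_t_numbers(genome_length, t_ranges):
    """Return every T-number whose (min, max) range contains genome_length."""
    return [
        t
        for t, (min_len, max_len) in t_ranges.items()
        if min_len <= genome_length <= max_len
    ]


print(match_t_numbers(15000, T_RANGES_DEMO))  # inside both the T=3 and T=4 ranges
print(match_t_numbers(5000, T_RANGES_DEMO))   # in the gap between T=1 and T=3
```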

In [None]:
#Example: genome_length already calculated
#genome_length = len(sequence)

#Define T-number based on predicted ranges
#Format: T_number: (min_length, max_length bps)
T_ranges = {
    1: (1240, 4412),
    3: (8112, 17095),
    4: (13207, 24480),
    7: (33507, 50072),
    9: (50052, 70221),
    12: (77248, 106077),
    13: (86659, 119668),
    16: (115527, 165366),
    19: (145222, 218122),
    21: (165464, 256993),
    25: (206993, 343138),
    27: (228263, 390244),
    28: (239019, 414775),
    # Add or modify ranges as needed
}

#Determine all matching T-numbers
matching_T_numbers = []

for T, (min_len, max_len) in T_ranges.items():
    if min_len <= genome_length <= max_len:
        matching_T_numbers.append(T)

#Print result
if matching_T_numbers:
    print(f"Genome length: {genome_length} bp → Matching T-number(s): {matching_T_numbers}")
else:
    print(f"Genome length: {genome_length} bp → No matching T-number range found")

<h2>5. Calculate ( h, k ) parameters from T#</h2>
Cell 6 then takes your T# and calculates your ( *h, k* ) parameters.
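
To show what kind of output to expect, here is a minimal sketch of the same search with a tighter bound: since neither *h* nor *k* can exceed the square root of T, the loops only need to run that far. `find_hk_bounded` is a hypothetical name used for illustration; it is intended to return the same pairs as the `find_hk` function in Cell 6.

```python
from math import isqrt


def find_hk_bounded(T):
    """Return all non-negative integer (h, k) with h*h + h*k + k*k == T."""
    limit = isqrt(T)  # neither h nor k can be larger than sqrt(T)
    return [
        (h, k)
        for h in range(limit + 1)
        for k in range(limit + 1)
        if h * h + h * k + k * k == T
    ]


for T in (1, 3, 4, 7):
    print(T, find_hk_bounded(T))
```

Pairs such as (1, 2) and (2, 1) both appear for T = 7. Swapped pairs are commonly interpreted as mirror-image (left- and right-handed) lattices, so both are listed rather than merged.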

Note: If no T# is found for your genome length, consider:
* the non-integer T# options outlined in the supplementary data for different lattice types
* the capsid sizes closest to your genome size.
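
One possible way to identify the closest capsid sizes is sketched below. It assumes that "closest" means the smallest distance, in base pairs, between the genome length and the nearest boundary of each range. This interpretation, along with the names `T_RANGES_DEMO` and `nearest_t_numbers`, is an assumption made for illustration and not part of PholdAPhage.

```python
# Subset of the notebook's T_ranges, copied here so the example runs on its own
T_RANGES_DEMO = {
    1: (1240, 4412),
    3: (8112, 17095),
    4: (13207, 24480),
}


def nearest_t_numbers(genome_length, t_ranges):
    """Return (distance_bp, T) tuples sorted by distance to each range.

    The distance is 0 when genome_length lies inside the range.
    """
    def gap(bounds):
        min_len, max_len = bounds
        if genome_length < min_len:
            return min_len - genome_length
        if genome_length > max_len:
            return genome_length - max_len
        return 0

    return sorted((gap(bounds), t) for t, bounds in t_ranges.items())


# 5000 bp falls between the T=1 and T=3 ranges; the T=1 range is nearer
for distance, t in nearest_t_numbers(5000, T_RANGES_DEMO):
    print(f"T={t}: {distance} bp outside the range")
```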

In [None]:
# Example: matching_T_numbers is a list of T-numbers
# matching_T_numbers = [1, 3]  # from the previous cell

def find_hk(T):
    """Find all integer (h, k) pairs such that T = h^2 + h*k + k^2"""
    pairs = []
    for h in range(T + 1):
        for k in range(T + 1):
            if h**2 + h*k + k**2 == T:
                pairs.append((h, k))
    return pairs

# Dictionary to store results
hk_results = {}

# Check if there are any T-numbers to process
if not matching_T_numbers:
    print("No matching T-numbers to check for (h, k) pairs.")
else:
    # Compute (h, k) pairs for each T-number
    for T in matching_T_numbers:
        hk_results[T] = find_hk(T)

    # Print results
    for T, pairs in hk_results.items():
        if pairs:
            print(f"T={T} → possible (h, k) pairs: {pairs}")
        else:
            print(f"T={T} → no integer (h, k) pairs found")

<h2>6. Calculate phage capsid Protein Copy #</h2>
Cell 7 takes your T# to calculate your protein copy number.

In [None]:
# Example: matching_T_numbers from previous steps
# matching_T_numbers = [1, 3]

protein_copies = {}

# Check if there are any T-numbers to process
if not matching_T_numbers:
    print("No matching T-numbers to calculate protein copy numbers.")
else:
    # Compute protein copy numbers
    for T in matching_T_numbers:
        protein_copies[T] = 60 * T

    # Print results
    for T, copies in protein_copies.items():
        print(f"T={T} → Protein copy number: {copies}")

<h2>7. Print summary</h2>
Cell 8 prints a summary of your file name, genome length, T#, ( *h, k* ) parameters, and protein copy #.

In [None]:
#Print concise summary
print("===== Genome Analysis Summary =====")
print(f"File analysed: {file_name}")
print(f"Genome length: {genome_length} bps\n")

if matching_T_numbers:
    for T in matching_T_numbers:
        print(f"T-number: {T}")
        print(f"  - (h, k) pairs: {find_hk(T)}")
        print(f"  - Protein copy number: {protein_copies[T]}\n")
else:
    print("No matching T-number ranges found for this genome length.")

<h2>8. Next step - return to GitHub to simulate your phage</h2>
Take your T# and ( *h, k* ) parameters back to the PholdAPhage GitHub page and complete step 2.