<a href="https://colab.research.google.com/github/sreeramavasarala0-tech/bioinformatics-rosalind/blob/main/notebooks/Bioinformatics_Module03.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Bioinformatics Module 03



In [1]:
import sys, platform
print("Python:", sys.version.split()[0])
print("Platform:", platform.platform())

Python: 3.12.11
Platform: Linux-6.1.123+-x86_64-with-glibc2.35


## P1. `print()` and variables (assignment operator `=`)
- `print()` sends output to the screen.
- `=` assigns values to variables.
- Use brief, descriptive, lower_snake_case names.

In [2]:
print("hello world")

message = "DNA"
length_bp = 1000  # integer
gc_fraction = 0.42  # float
print(message, length_bp, gc_fraction)

hello world
DNA 1000 0.42


## P2. Data types and `type()`
The same operator can behave differently depending on the types of its operands. Use `type()` to check what type a value has.

In [3]:
x, y, z = 5, 5.0, "5"
print(type(x), type(y), type(z))

<class 'int'> <class 'float'> <class 'str'>
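The sketch below applies `+` and `*` to the values from the cell above. Numbers are added arithmetically, while strings are concatenated or repeated. Mixing a `str` with an `int` raises `TypeError` until one of them is converted explicitly.

```python
# Same operators, different operand types, different behavior
int_sum = 5 + 5          # int + int   -> arithmetic
float_sum = 5.0 + 5      # float + int -> float arithmetic
str_concat = "5" + "5"   # str + str   -> concatenation
str_repeat = "5" * 3     # str * int   -> repetition
print(int_sum, float_sum, str_concat, str_repeat)  # 10 10.0 55 555

# str + int is not defined; convert explicitly first
try:
    "5" + 5
except TypeError as err:
    print("TypeError:", err)

converted = int("5") + 5
print(converted)  # 10
```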


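## P3. `input()`
When running in Colab/Jupyter, `input()` displays its prompt in the output area and always returns a string. Wrap the call in try/except so the cell does not crash in environments where no interactive input is available.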

In [4]:
try:
    user_name = input("Enter your name: ")
    print(f"Hello {user_name}")
except Exception as e:
    print("Input not available in this environment:", e)

Enter your name: Avasarala Sreeram
Hello Avasarala Sreeram
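`input()` always returns a string, so numeric answers must be converted before you do arithmetic with them. To keep this sketch runnable without a prompt, a literal string (a hypothetical value) stands in for what a user might type.

```python
# input() always returns a str; a literal stands in for user input (hypothetical value)
raw = "1000"
print(type(raw))          # <class 'str'>

seq_length = int(raw)     # convert explicitly before doing arithmetic
print(seq_length * 2)     # 2000

# int() raises ValueError for text that is not a whole number
try:
    int("12.5")
except ValueError as err:
    print("Not an integer:", err)
```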


### Mini-ROSALIND style (hypotenuse)
Ask for two legs $a$ and $b$ and compute $c = \sqrt{a^2 + b^2}$.

In [5]:
import math
def hypotenuse_via_input():
    try:
        a = float(input("a = "))
        b = float(input("b = "))
        c = math.sqrt(a*a + b*b)
        print("c =", c)
    except Exception as e:
        print("Skipping interactive example:", e)

# hypotenuse_via_input()  # uncomment to try interactively
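If you move the arithmetic into a function that takes arguments, you can check the formula without typing anything. The helper name `hypotenuse` is illustrative. `math.hypot` is the standard-library equivalent.

```python
import math

def hypotenuse(a: float, b: float) -> float:
    """Return c = sqrt(a^2 + b^2) for legs a and b."""
    return math.sqrt(a * a + b * b)

print(hypotenuse(3, 4))   # 5.0
print(math.hypot(3, 4))   # 5.0, the built-in equivalent
```

The interactive version above can then call `hypotenuse(a, b)` after reading `a` and `b`.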

## P4. Lists, indexing, and slicing
- Lists store multiple items.
- Zero-based indexing.
- Slices exclude the end index: `[start:stop]` returns the items at indices `start` through `stop - 1`.

In [6]:
names = ["Hello", "World", "Alfred R. Wallace", "Charles Darwin"]
print(names[1])        # 'World'
names[3] = "Alfred R. Wallace"  # overwrite index 3 ("Charles Darwin"); Wallace now appears twice
names.append("Charles Darwin")  # append Darwin back at the end
print(names)
print(names[1:3])      # slice returns ['World', 'Alfred R. Wallace']

World
['Hello', 'World', 'Alfred R. Wallace', 'Alfred R. Wallace', 'Charles Darwin']
['World', 'Alfred R. Wallace']
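Slices have a few more forms worth knowing: negative indices, omitted bounds, and a step. The codon list below is made-up illustration data.

```python
codons = ["ATG", "GCC", "ATT", "GTA", "TAG"]

print(codons[-1])         # 'TAG'  negative index counts from the end
print(codons[:2])         # ['ATG', 'GCC']  omitted start means 0
print(codons[2:])         # ['ATT', 'GTA', 'TAG']  omitted stop means len(codons)
print(codons[::2])        # ['ATG', 'ATT', 'TAG']  step of 2
print(len(codons[1:3]))   # 2, i.e. stop - start
```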


## P5. Strings and slicing
Strings slice like lists.

In [7]:
my_string = "HelloWorldCharlesDarwinAlfred R. Wallace"
print(my_string[:5], my_string[5:10])  # 'Hello World'
print(my_string[-7:], my_string[5:10])  # last 7 characters 'Wallace', then 'World'

Hello World
Wallace World
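Slicing works the same way on a DNA string. Unlike lists, though, strings are immutable, so "changing" a base means building a new string. The sequence below is illustrative.

```python
dna = "ATGGCCATTGTA"

print(dna[0:3])    # 'ATG'  first codon
print(dna[-3:])    # 'GTA'  last codon
print(dna[::-1])   # 'ATGTTACCGGTA'  reversed (a step of -1)

# Item assignment fails on str; create a new string instead
try:
    dna[0] = "C"
except TypeError:
    print("str is immutable")
mutated = "C" + dna[1:]
print(mutated)     # 'CTGGCCATTGTA'
```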


## P6. Conditionals (`if`/`else`)

In [8]:
a, b = 3, 3
if a == b:
    print(a + b)
else:
    print(a - b)

6
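Adding `elif` lets you test more than two cases in order: the first true condition wins. The GC cut-offs below are hypothetical and chosen only for illustration.

```python
def classify_gc(gc: float) -> str:
    """Label a GC fraction (0 to 1) using illustrative, hypothetical cut-offs."""
    if gc > 0.6:
        return "high"
    elif gc >= 0.4:
        return "medium"
    else:
        return "low"

for value in (0.42, 0.65, 0.30):
    print(value, classify_gc(value))
```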


## P7. Loops (`while`, `for`, `range`)

In [9]:
# while loop that runs exactly 3 times
count = 0
while count < 3:
    print("Counting!")
    count += 1

# for over list
for nm in ["Ada", "Linus", "Rosalind"]:
    print("Hello", nm)

# for over range (10 iterations, numbers 0..9)
iterations = 10
for number in range(iterations):
    pass  # replace with work as needed
print("Looped", iterations, "times")

# every 3rd number from 9..27 inclusive
print(list(range(9, 28, 3)))

Counting!
Counting!
Counting!
Hello Ada
Hello Linus
Hello Rosalind
Looped 10 times
[9, 12, 15, 18, 21, 24, 27]
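A `range` with a step of 3 is a natural way to walk a DNA string codon by codon. Here is a minimal sketch with an illustrative sequence. Stopping at `len(dna) - 2` keeps only complete codons, and `enumerate` supplies a running counter.

```python
dna = "ATGGCCATTGTAATG"

codons = []
for start in range(0, len(dna) - 2, 3):   # 0, 3, 6, ... while a full codon remains
    codons.append(dna[start:start + 3])
print(codons)  # ['ATG', 'GCC', 'ATT', 'GTA', 'ATG']

for i, codon in enumerate(codons, start=1):
    print(i, codon)
```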


## P8. File I/O (`open`, `read`, `readline`, `readlines`, `join`)

In [10]:
# Create a small practice file
practice_path = "practice.txt"
with open(practice_path, "w") as f:
    f.write("Bravely\n")
    f.write("bold\n")
    f.write("Sir Robin...\n")

# read()
with open(practice_path, "r") as data:
    content = data.read()
print("[read()]\n", content)

# readline() reads one line; readlines() on the same handle then returns the remaining lines
with open(practice_path, "r") as data:
    first = data.readline()
    rest = data.readlines()
print("[readline()] ->", first.strip())
print("[readlines()] ->", rest)

# join() a list of lines
with open(practice_path, "r") as data:
    list_of_lines = data.readlines()
print("[join]\n", "".join(list_of_lines))

[read()]
 Bravely
bold
Sir Robin...

[readline()] -> Bravely
[readlines()] -> ['bold\n', 'Sir Robin...\n']
[join]
 Bravely
bold
Sir Robin...
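You can also iterate over the file object itself, which yields one line at a time. This avoids loading a large file (such as a big FASTA) into memory all at once. The temporary file name used here is hypothetical.

```python
import os
import tempfile

lines_out = ["Bravely\n", "bold\n", "Sir Robin...\n"]
path = os.path.join(tempfile.gettempdir(), "practice_iter.txt")  # hypothetical file name

with open(path, "w") as fh:
    fh.writelines(lines_out)          # writelines() does not add newlines itself

stripped = []
with open(path) as fh:
    for line in fh:                   # one line per iteration, newline included
        stripped.append(line.rstrip("\n"))

print(stripped)            # ['Bravely', 'bold', 'Sir Robin...']
print("\n".join(stripped))
```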



## P9. Splitting strings and iterating words
Use `.split()` to split on whitespace and iterate over words.

In [11]:
line = "To be or not to be, that is the question"
for word in line.split():
    print(word)

To
be
or
not
to
be,
that
is
the
question
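As the output shows, `.split()` keeps punctuation attached (`be,`). The sketch below strips it off, splits on an explicit separator, and then uses `" ".join()` to reverse the split. The comma-separated row is made-up data.

```python
line = "To be or not to be, that is the question"
words = line.split()
print(len(words))            # 10

clean = [w.strip(",.") for w in words]   # remove commas/periods at either end
print(clean[5])              # 'be'
print(" ".join(clean))       # 'To be or not to be that is the question'

row = "seq1,12,50.0"         # hypothetical comma-separated record
fields = row.split(",")
print(fields)                # ['seq1', '12', '50.0']
```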


## P10. Dictionaries `{}` (key→value)
Create, update, iterate, and print.

In [12]:
sequences = {
    "H.sapiens": "ATGCGT",
    "P.troglodytes": "ATGAGT",
    "G.gorilla": "ATGCGC",
}
print(sequences["H.sapiens"])  # specific value
sequences["M.mulatta"] = "ATGTTT"  # add
sequences.update({"P.paniscus": "ATGCGA"})
for sp, seq in sequences.items():
    print(sp, seq)

ATGCGT
H.sapiens ATGCGT
P.troglodytes ATGAGT
G.gorilla ATGCGC
M.mulatta ATGTTT
P.paniscus ATGCGA
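A common dictionary pattern is counting: `dict.get(key, 0)` returns a default instead of raising `KeyError` for a missing key. A minimal word-count sketch follows. It is case-sensitive, so `We` and `we` are counted separately.

```python
text = "We tried list and we tried dicts also we tried Zen"

counts = {}
for word in text.split():
    counts[word] = counts.get(word, 0) + 1   # start at 0 the first time a word is seen

for word, n in counts.items():               # dicts keep insertion order (Python 3.7+)
    print(word, n)
```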


## P11. Counting with `count()`
Count nucleotides in a DNA string.

In [13]:
dna = "ACGTACGTAAAACCCCGGGGTTTT"
print("A C G T counts:", dna.count("A"), dna.count("C"), dna.count("G"), dna.count("T"))

A C G T counts: 6 6 6 6
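Each `str.count()` call scans the whole string, so counting four bases means four passes. `collections.Counter` tallies every character in a single pass, and missing keys read as 0. The last line shows one way to print space-separated counts.

```python
from collections import Counter

dna = "ACGTACGTAAAACCCCGGGGTTTT"
counts = Counter(dna)
print(counts["A"], counts["C"], counts["G"], counts["T"])  # 6 6 6 6
print(counts["N"])                                         # 0, absent bases count as zero

print(" ".join(str(counts[base]) for base in "ACGT"))      # 6 6 6 6
```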


---
# Biopython Essentials
The next sections use Biopython. In Colab, run the install cell first.

In [14]:
# If running in Colab or a fresh environment:
try:
    import Bio  # quick check
    print("Biopython already available")
except ImportError:  # Biopython is not installed yet
    import sys
    !{sys.executable} -m pip install --quiet biopython
    import Bio
    print("Installed Biopython", Bio.__version__)

[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/3.3 MB[0m [31m?[0m eta [36m-:--:--[0m[2K   [91m━━━━[0m[90m╺[0m[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.3/3.3 MB[0m [31m10.2 MB/s[0m eta [36m0:00:01[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[90m╺[0m[90m━━━━━[0m [32m2.8/3.3 MB[0m [31m40.6 MB/s[0m eta [36m0:00:01[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m3.3/3.3 MB[0m [31m33.5 MB/s[0m eta [36m0:00:00[0m
Installed Biopython 1.85


## BP1. `Seq` object, complement/transcribe/translate

In [15]:
from Bio.Seq import Seq

my_sequence = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
print("Seq:", my_sequence)
print("Complement:", my_sequence.complement())
print("RevComp:", my_sequence.reverse_complement())
rna = my_sequence.transcribe()
print("RNA:", rna)
print("Protein:", rna.translate(to_stop=False))

Seq: ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG
Complement: TACCGGTAACATTACCCGGCGACTTTCCCACGGGCTATC
RevComp: CTATCGGGCACCCTTTCAGCGGCCCATTACAATGGCCAT
RNA: AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG
Protein: MAIVMGR*KGAR*
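To see what `complement`, `reverse_complement`, and `transcribe` compute, here is a pure-Python sketch for plain A/C/G/T strings. It is not Biopython's implementation, and unlike `Seq` it ignores IUPAC ambiguity codes. It should reproduce the RevComp and RNA lines shown above.

```python
# Translation table mapping each base to its Watson-Crick partner (A/C/G/T only)
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse the result."""
    return dna.translate(COMPLEMENT)[::-1]

def transcribe(dna: str) -> str:
    """Coding-strand DNA -> mRNA: replace T with U."""
    return dna.replace("T", "U")

seq = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
print(reverse_complement(seq))
print(transcribe(seq))
```

For translation, `rna.translate(to_stop=True)` stops at the first stop codon, so for this sequence it should give `MAIVMGR` rather than `MAIVMGR*KGAR*`.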


## BP2. `SeqIO.parse` FASTA + `SeqUtils.gc_fraction`
Reads sequences from a FASTA file and prints ID, length, and GC%.

In [16]:
# Create a small FASTA for demonstration
fasta_path = "practice_9.fasta"
with open(fasta_path, "w") as f:
    f.write(">seq1\nATGCATGCATGC\n>seq2\nGGGCCCATAAAT\n")

from Bio.SeqIO import parse
from Bio.SeqUtils import gc_fraction

for rec in parse(fasta_path, "fasta"):
    seq = str(rec.seq)
    print(rec.id, len(seq), round(gc_fraction(seq)*100, 2))

seq1 12 50.0
seq2 12 50.0
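For comparison, here is a minimal pure-Python FASTA reader and GC% function. The helper names `parse_fasta` and `gc_percent` are illustrative, not Biopython APIs. The reader joins multi-line sequences and takes the first word of each header as the ID. For sequences containing only A/C/G/T, the result should match `gc_fraction(seq) * 100`. Biopython handles ambiguity codes differently.

```python
from typing import Dict, List

def parse_fasta(text: str) -> Dict[str, str]:
    """Map each record ID (first word after '>') to its joined sequence."""
    chunks: Dict[str, List[str]] = {}
    current_id = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            current_id = header.split()[0] if header else ""
            chunks[current_id] = []
        elif current_id is not None:      # ignore sequence lines before any header
            chunks[current_id].append(line)
    return {rec_id: "".join(parts) for rec_id, parts in chunks.items()}

def gc_percent(seq: str) -> float:
    """Percentage of G and C characters; 0.0 for an empty sequence."""
    seq = seq.upper()
    return 100 * (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

fasta_text = ">seq1\nATGCATGCATGC\n>seq2\nGGGCCCATAAAT\n"
for rec_id, seq in parse_fasta(fasta_text).items():
    print(rec_id, len(seq), round(gc_percent(seq), 2))
```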


## BP3. Mini tasks
- Transcribe DNA → RNA from a file.
- Compute GC% and return the highest-GC sequence ID.

These are simplified versions you can adapt for ROSALIND 7–9.

In [17]:
from typing import Optional, Tuple

from Bio import SeqIO
from Bio.SeqRecord import SeqRecord
from Bio.SeqUtils import gc_fraction

def transcribe_fasta_to_rna(in_fasta: str, out_fasta: str) -> None:
    """Transcribe every DNA record in in_fasta and write the RNA records to out_fasta."""
    records = []
    for rec in SeqIO.parse(in_fasta, "fasta"):
        rna_seq = rec.seq.transcribe()  # T -> U
        records.append(SeqRecord(rna_seq, id=rec.id + "_RNA", description=""))
    SeqIO.write(records, out_fasta, "fasta")

def max_gc_record(in_fasta: str) -> Tuple[Optional[str], float]:
    """Return (id, GC%) of the record with the highest GC content.

    The comparison is strict (>), so among tied records the first one wins.
    best_id stays None if the file contains no records.
    """
    best_id, best_gc = None, -1.0
    for rec in SeqIO.parse(in_fasta, "fasta"):
        gc = gc_fraction(str(rec.seq))  # fraction between 0 and 1
        if gc > best_gc:
            best_id, best_gc = rec.id, gc
    return best_id, best_gc * 100

# Demo with our small FASTA
out = "practice_9_RNA.fasta"
transcribe_fasta_to_rna(fasta_path, out)
print("Wrote:", out)
bid, bgc = max_gc_record(fasta_path)
print("Max GC%:", bid, round(bgc, 2))

Wrote: practice_9_RNA.fasta
Max GC%: seq1 50.0
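In the demo, `seq1` and `seq2` both have 50% GC, but `max_gc_record` reports only `seq1` because of its strict `>` comparison. If you need every tied record, the following pure-Python sketch returns all IDs that share the maximum. It works on an in-memory dict, and the name `max_gc_ids` and the `seq3` entry are hypothetical.

```python
import math
from typing import Dict, List, Tuple

def gc_percent(seq: str) -> float:
    seq = seq.upper()
    return 100 * (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

def max_gc_ids(seqs: Dict[str, str]) -> Tuple[List[str], float]:
    """Return every ID whose GC% equals the maximum, plus that maximum."""
    if not seqs:
        return [], 0.0
    scores = {rec_id: gc_percent(s) for rec_id, s in seqs.items()}
    best = max(scores.values())
    # isclose guards against tiny floating-point differences between equal ratios
    tied = [rec_id for rec_id, gc in scores.items() if math.isclose(gc, best)]
    return tied, best

demo = {"seq1": "ATGCATGCATGC", "seq2": "GGGCCCATAAAT", "seq3": "ATATATAT"}
print(max_gc_ids(demo))  # (['seq1', 'seq2'], 50.0)
```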
