<a href="https://colab.research.google.com/github/sairam028/beginning-bioinformatics/blob/main/Bioinformatics_Module03S.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Bioinformatics — Module 03 (Colab-ready)

**What this notebook is:** a ready-to-run Colab notebook that walks through the Module 03 exercises (P6–P23) and the Rosalind problems described in your assignment.

**How to use:**
1. Upload this `.ipynb` to Google Colab (File → Upload notebook) or open it from GitHub.
2. Run cells top to bottom (Runtime → Run all). If a cell prompts for input, type a response or press Enter to accept the default.
3. Edit/replace any sample data with your own input files if desired.
4. Save a copy to your GitHub repo: File → Save a copy in GitHub, and set the path to `notebooks/Bioinformatics_Module03.ipynb`.

**Files created by the demo cells:** `practice.txt`, `sample.fasta`, `sample_dna.txt` — these are small example files the notebook uses so you can run everything without extra uploads.

---
**NOTE:** After you run and verify everything, push the final notebook to your GitHub repo and create a tag/branch named `week3-submission` (see the separate step-by-step git instructions provided alongside this notebook).

In [1]:
# Install Biopython (only needed once in Colab)
!pip install biopython --quiet
print('biopython installed (or already available)')

biopython installed (or already available)


In [2]:
# Create small sample files used by later exercises
practice = '''Line one
Line two
Line three
This is a blank line next:

End of practice file.
'''
with open('practice.txt','w') as f:
    f.write(practice)

sample_fasta = '>seq1 sample sequence\nATGCGTATCGATCGATCGATCGATGCTAGCTAGCTAG\n>seq2 another sequence\nATGCGTATTTTTCCCCGGGGAAAACCCCTTTTGGGG\n'
with open('sample.fasta','w') as f:
    f.write(sample_fasta)

# small DNA file for tests
with open('sample_dna.txt','w') as f:
    f.write('ATGTTTCTT')

print('Created practice.txt, sample.fasta, sample_dna.txt')

Created practice.txt, sample.fasta, sample_dna.txt


## P6 — `input()` demo
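This cell shows a fallback pattern for `input()` in notebooks: if you press Enter without typing anything, `input()` returns an empty string and the `or` expression substitutes the default value. Note that a run with no stdin at all can raise an error instead of returning an empty string, so the default is not guaranteed in a fully non-interactive run.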

In [3]:
name = input('Enter your name (or press Enter to use default): ') or 'Your Name'
print(f'Hello, {name}!')

# Example: a small Rosalind-style function that greets
def rosalind_greet():
    who = input('Who to greet? (default: Rosalind) ') or 'Rosalind'
    return f'Hello, {who}!'

print(rosalind_greet())

Enter your name (or press Enter to use default): Sairam Reddy Mothe
Hello, Sairam Reddy Mothe!
Who to greet? (default: Rosalind) 
Hello, Rosalind!
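
The `or` fallback only covers empty input. As a minimal sketch, assuming that a run without stdin raises `EOFError` (as a plain Python script does; a notebook kernel may raise something else), a hypothetical helper `ask_with_default` could cover that case as well. The `reader` parameter is an assumption added here so the function can be exercised without typing anything.

```python
def ask_with_default(prompt, default, reader=input):
    """Return the user's answer, or `default` if it is empty or unavailable.

    `ask_with_default` and `reader` are hypothetical names used for illustration.
    """
    try:
        answer = reader(prompt).strip()
    except EOFError:
        # No stdin available (assumed behaviour of a non-interactive run).
        return default
    return answer or default


def no_stdin(prompt):
    # Stand-in reader that simulates a run with no stdin.
    raise EOFError


print(ask_with_default('Who to greet? ', 'Rosalind', reader=lambda p: ''))  # Rosalind
print(ask_with_default('Who to greet? ', 'Rosalind', reader=no_stdin))      # Rosalind
```

Passing a stand-in `reader` keeps the example runnable under "Run all"; in normal use the default `reader=input` prompts as usual.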


## P7 — Lists (create, append, insert, replace)

In [4]:
lst = ['apple', 'banana', 'cherry']
print('original:', lst)

# append
lst.append('date')
print('after append:', lst)

# insert at index 1
lst.insert(1, 'blueberry')
print('after insert at index 1:', lst)

# replace index 2
lst[2] = 'blackberry'
print('after replace index 2:', lst)

# slice example
print('slice [1:4]:', lst[1:4])

original: ['apple', 'banana', 'cherry']
after append: ['apple', 'banana', 'cherry', 'date']
after insert at index 1: ['apple', 'blueberry', 'banana', 'cherry', 'date']
after replace index 2: ['apple', 'blueberry', 'blackberry', 'cherry', 'date']
slice [1:4]: ['blueberry', 'blackberry', 'cherry']


## P8 — Slicing strings & lists (Rosalind #2 examples)

In [5]:
s = 'GATGGAACTTGACTACGTAAATT'
print('original string:', s)
print('first 3 chars:', s[:3])
print('last 5 chars:', s[-5:])
print('every second char:', s[::2])

# simple Rosalind string problem: reverse complement by slicing + translate mapping
comp_map = str.maketrans('ACGT','TGCA')
revcomp = s.translate(comp_map)[::-1]
print('reverse complement:', revcomp)

# Rosalind #2-like: return substring example
print('substring 3..7 (1-based indices 3-7):', s[2:7])

original string: GATGGAACTTGACTACGTAAATT
first 3 chars: GAT
last 5 chars: AAATT
every second char: GTGATGCAGAAT
reverse complement: AATTTACGTAGTCAAGTTCCATC
substring 3..7 (1-based indices 3-7): TGGAA
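
The last line of the cell converts a 1-based, inclusive range (positions 3 to 7) into the Python slice `s[2:7]`. A small sketch of that conversion, using a hypothetical helper name `substring_1based`:

```python
def substring_1based(text, start, end):
    """Return characters start..end of `text`, counting from 1, both ends included."""
    # Python slices are 0-based and exclude the stop index,
    # so only the start position needs to be shifted down by one.
    return text[start - 1:end]


dna_example = 'GATGGAACTTGACTACGTAAATT'
print(substring_1based(dna_example, 3, 7))  # same as dna_example[2:7] -> TGGAA
```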


## P9 — If / Else

In [6]:
a = 10
b = 5
if a == b:
    print('equal — add them:', a+b)
else:
    print('not equal — subtract them:', a-b)

# small exercise: compare string lengths
x = 'AGCT'
y = 'AGCTG'
print('x longer?', len(x) > len(y))

not equal — subtract them: 5
x longer? False


## P10 — While loop (run exactly 3 times)

In [7]:
i = 0
while i < 3:
    print('loop iteration', i+1)
    i += 1

# Note: removing `i += 1` or loosening the condition would make this loop run forever.


loop iteration 1
loop iteration 2
loop iteration 3
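
One common way to guard against an accidental infinite loop is to cap the number of iterations. The sketch below is illustrative only; `count_up` and `max_steps` are hypothetical names, not part of the assignment.

```python
def count_up(limit, max_steps=1000):
    """Collect 1..limit with a while loop, stopping early if max_steps is reached."""
    seen = []
    i = 0
    steps = 0
    while i < limit:
        if steps >= max_steps:
            # Safety cap: leave the loop instead of running forever.
            break
        seen.append(i + 1)
        i += 1      # forgetting this update is the usual cause of an infinite loop
        steps += 1
    return seen


print(count_up(3))               # [1, 2, 3]
print(count_up(5, max_steps=2))  # [1, 2]
```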


## P11 & P12 — For loop and `range()`

In [8]:
fruits = ['apple','banana','cherry']
for f in fruits:
    print('fruit:', f)

print('\nrange 0..9:')
for i in range(10):
    print(i, end=' ')

print('\n\nevery 3rd number from 9 to 27 inclusive:')
for i in range(9, 28, 3):
    print(i, end=' ')


fruit: apple
fruit: banana
fruit: cherry

range 0..9:
0 1 2 3 4 5 6 7 8 9 

every 3rd number from 9 to 27 inclusive:
9 12 15 18 21 24 27 
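
The three-argument form `range(start, stop, step)` is also handy for walking a sequence in fixed-size chunks. As a sketch (the helper name `split_codons` is made up for this example), stepping by 3 splits a DNA string into codons:

```python
def split_codons(dna):
    """Split `dna` into consecutive 3-letter chunks; a shorter tail is kept as-is."""
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


print(split_codons('ATGTTTCTT'))  # ['ATG', 'TTT', 'CTT']
print(list(range(9, 28, 3)))      # [9, 12, 15, 18, 21, 24, 27]
```

The stop value is exclusive, which is why the cell uses 28 to include 27.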

## P13–P16 — File I/O, join(), split(), and line processing

In [9]:
# Read the practice.txt created earlier
with open('practice.txt','r') as f:
    data = f.read()
print('Full file content:\n', data)

# Readlines into list
with open('practice.txt','r') as f:
    lines = f.readlines()
print('\nreadlines() ->', lines)

# join example (put lines back together):
joined = ''.join(lines)
print('\njoined back:\n', joined)

# iterate lines and strip newline
print('\niterating lines and rstrip:')
with open('practice.txt','r') as f:
    for i, line in enumerate(f, start=1):
        print(i, line.rstrip())

# split words
print('\nwords in first line:')
for w in lines[0].split():
    print(w)


Full file content:
 Line one
Line two
Line three
This is a blank line next:

End of practice file.


readlines() -> ['Line one\n', 'Line two\n', 'Line three\n', 'This is a blank line next:\n', '\n', 'End of practice file.\n']

joined back:
 Line one
Line two
Line three
This is a blank line next:

End of practice file.


iterating lines and rstrip:
1 Line one
2 Line two
3 Line three
4 This is a blank line next:
5 
6 End of practice file.

words in first line:
Line
one
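
`split()` breaks a string into a list and `join()` glues a list back into a string. The sketch below works on an in-memory copy of the practice text so it runs without the file; `practice_text`, `nonblank_lines`, `kept`, and `first_words` are hypothetical names chosen for this example.

```python
practice_text = (
    'Line one\nLine two\nLine three\n'
    'This is a blank line next:\n\nEnd of practice file.\n'
)


def nonblank_lines(text):
    """Return the lines of `text` with surrounding whitespace removed, skipping blank ones."""
    return [line.strip() for line in text.splitlines() if line.strip()]


kept = nonblank_lines(practice_text)
print(len(kept))               # 5 (the empty line is dropped)

first_words = kept[0].split()  # ['Line', 'one']
print('-'.join(first_words))   # Line-one
```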


## P17–P19 — Dictionaries & counting (Rosalind #6 counting DNA nucleotides, word counting)

In [10]:
# Word-count example
text = 'this is a test this test is simple'
counts = {}
for w in text.split():
    counts[w] = counts.get(w, 0) + 1
print('word counts:', counts)

# Counting DNA nucleotides (Rosalind #6)
from collections import Counter
with open('sample_dna.txt') as f:
    seq = f.read().strip()
print('sample seq:', seq)
counts = Counter(seq)
print('A C G T counts:', counts.get('A',0), counts.get('C',0), counts.get('G',0), counts.get('T',0))


word counts: {'this': 2, 'is': 2, 'a': 1, 'test': 2, 'simple': 1}
sample seq: ATGTTTCTT
A C G T counts: 1 1 1 6
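
Rosalind expects the four counts on one line, separated by spaces, in the order A C G T. A minimal sketch that does this with a plain dictionary instead of `Counter` (the function name `format_counts` is an assumption for illustration):

```python
def format_counts(dna):
    """Count A, C, G, T in `dna` and return them as one space-separated string."""
    tally = {'A': 0, 'C': 0, 'G': 0, 'T': 0}
    for base in dna.upper():
        if base in tally:  # ignore anything that is not A/C/G/T, e.g. N or a newline
            tally[base] += 1
    return ' '.join(str(tally[b]) for b in 'ACGT')


print(format_counts('ATGTTTCTT'))  # 1 1 1 6
```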


## P20–P23 — Biopython: Seq, SeqIO, GC content, transcribe/translate

In [11]:
from Bio.Seq import Seq
from Bio import SeqIO
from Bio.SeqUtils import gc_fraction

# Seq examples
s = Seq('ATGTACT')
print('Seq:', s)
print('complement:', s.complement())
print('reverse_complement:', s.reverse_complement())
print('transcribe:', s.transcribe())
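# 'ATGTACT' has 7 bases, so only the two complete codons (ATG, TAC) are translated;
# Biopython may warn about the leftover partial codon.
print('translate:', s.translate())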

# Functions for Rosalind problems

def count_nucleotides(sequence):
    from collections import Counter
    c = Counter(sequence)
    return c['A'], c['C'], c['G'], c['T']

print('\nCount nucleotides (from sample_dna.txt):', count_nucleotides(seq))

# Transcribe DNA -> RNA
def transcribe_dna_to_rna(dna):
    return str(Seq(dna).transcribe())

print('Transcribed RNA:', transcribe_dna_to_rna(seq))

# Translate DNA -> protein (DNA -> RNA -> protein)
def translate_dna(dna):
    return str(Seq(dna).transcribe().translate())

print('Translated protein:', translate_dna(seq))

# Parse sample FASTA and compute GC content per record
records = list(SeqIO.parse('sample.fasta','fasta'))
for rec in records:
    seqstr = str(rec.seq)
    print(f">{rec.id} len={len(seqstr)} GC={gc_fraction(seqstr):.4f}")

# Rosalind #9: find ID with max GC
best = None
best_gc = -1.0
for rec in records:
    g = gc_fraction(str(rec.seq))
    if g > best_gc:
        best_gc = g
        best = rec.id
print('\nMax GC:', best, best_gc)


Seq: ATGTACT
complement: TACATGA
reverse_complement: AGTACAT
transcribe: AUGUACU
translate: MY

Count nucleotides (from sample_dna.txt): (1, 1, 1, 6)
Transcribed RNA: AUGUUUCUU
Translated protein: MFL
>seq1 len=37 GC=0.4865
>seq2 len=36 GC=0.5278

Max GC: seq2 0.5277777777777778
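
As a cross-check on `gc_fraction`, GC content can be computed by hand as (G + C) / length. The sketch below is plain Python with hypothetical names (`gc_fraction_manual`, `sample_records`, `best_id`); it reuses the two sequences written to `sample.fasta`. Rosalind's GC problem reports the value as a percentage, so the fraction is multiplied by 100 when printed.

```python
def gc_fraction_manual(dna):
    """Return the fraction of G and C bases in `dna` (0.0 for an empty string)."""
    if not dna:
        return 0.0
    dna = dna.upper()
    return (dna.count('G') + dna.count('C')) / len(dna)


sample_records = {
    'seq1': 'ATGCGTATCGATCGATCGATCGATGCTAGCTAGCTAG',
    'seq2': 'ATGCGTATTTTTCCCCGGGGAAAACCCCTTTTGGGG',
}

# max() with a key function picks the ID whose sequence has the highest GC fraction.
best_id = max(sample_records, key=lambda rid: gc_fraction_manual(sample_records[rid]))
print(best_id, f'{gc_fraction_manual(sample_records[best_id]) * 100:.4f}')  # seq2 52.7778
```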




---

## Final steps (what you should do after running this notebook)

1. **Edit and complete** any cells with your own solutions or different input files as required by the assignment.
2. **Save** the notebook to GitHub: Colab → File → Save a copy in GitHub, and set the path to `notebooks/Bioinformatics_Module03.ipynb`.
3. In your GitHub repo add `docs/ai_usage.md` (brief AI usage note) and update `README.md` to include `Your Name + UTA ID + Course-Section`.
4. Create a branch or tag named `week3-submission` (either via Git locally or GitHub web UI). The fixed submission URL will be `https://github.com/<your_username>/bioinformatics-rosalind/tree/week3-submission`.
5. Submit the **two URLs** (one per line) in the Canvas Text Entry box: first the fixed version link, then the repo URL.
