
# Week 1: Python Recap, Git/GitHub, and Introduction to OOP

## Learning Objectives
By the end of this week, you should be able to:
- Write Python functions and use common data structures
- Use basic Git commands for version control
- Understand the fundamentals of Object-Oriented Programming
- Create simple classes with attributes and methods

---

## Part 1: Python Recap

### 1.1 Functions Review

Let's start with a quick refresher on functions.

In [None]:
# Example: A simple function
def calculate_bmi(weight_kg, height_m):
    """
    Calculate Body Mass Index.

    Parameters:
    weight_kg (float): Weight in kilograms
    height_m (float): Height in meters

    Returns:
    float: BMI value
    """
    bmi = weight_kg / (height_m ** 2)
    return bmi

# Test it
print(calculate_bmi(70, 1.75))

**Exercise 1.1:** Write a function `calculate_gc_content(dna_sequence)` that calculates the GC content (percentage of G and C nucleotides) in a DNA sequence string.

In [1]:
def calculate_gc_content(dna_sequence):
    """
    Calculate the GC content of a DNA sequence.

    Parameters:
    dna_sequence (str): DNA sequence containing A, T, G, C

    Returns:
    float: GC content as a percentage
    """
    dna_sequence = dna_sequence.upper()
    total = len(dna_sequence)
    gc_count = dna_sequence.count('G') + dna_sequence.count('C')

    if total == 0:
        return 0.0

    return (gc_count / total) * 100

# Test your function
test_seq = "ATGCGATCGATCG"
print(f"GC content: {calculate_gc_content(test_seq):.2f}%")
# Output: GC content: 53.85%

GC content: 53.85%


### 1.2 Data Structures Review

Quick review of lists, dictionaries, and sets.
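
As a minimal refresher sketch (the sample IDs and organisms below are made up for illustration, not course data), the three structures differ mainly in ordering, uniqueness, and how values are looked up:

```python
# Hypothetical sample data, used only for illustration.
samples = ["S001", "S002", "S003", "S002"]   # list: ordered, allows duplicates
unique_samples = set(samples)                 # set: unordered, no duplicates
organism_by_sample = {                        # dict: maps keys to values
    "S001": "E. coli",
    "S002": "S. cerevisiae",
    "S003": "E. coli",
}

print(len(samples))                  # 4
print(len(unique_samples))           # 3
print(organism_by_sample["S002"])    # S. cerevisiae

# Sets support membership tests and set algebra.
batch_a = {"S001", "S002"}
batch_b = {"S002", "S003"}
print("S001" in batch_a)             # True
print(sorted(batch_a & batch_b))     # ['S002']
print(sorted(batch_a | batch_b))     # ['S001', 'S002', 'S003']
```

Sets are printed through `sorted()` here because a set has no guaranteed display order.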

**Exercise 1.2:** Given a list of gene expression values, write code to:
1. Calculate the mean expression
2. Find genes with expression above the mean
3. Return a dictionary with gene names as keys and their expression values

In [2]:
# Sample data
genes = ['BRCA1', 'TP53', 'EGFR', 'MYC', 'KRAS']
expression_values = [120.5, 89.3, 156.7, 45.2, 203.1]

# 1. Calculate mean expression
mean_expression = sum(expression_values) / len(expression_values)

# 2. Find genes above mean
high_expression_genes = [gene for gene, value in zip(genes, expression_values) if value > mean_expression]

# 3. Create gene expression dictionary
gene_expression_dict = dict(zip(genes, expression_values))

print(f"Mean expression: {mean_expression:.2f}")
print(f"Highly expressed genes: {high_expression_genes}")
print(f"Gene expression dictionary: {gene_expression_dict}")

# Additional statistics
print(f"Max expression: {max(expression_values):.2f}")
print(f"Number of genes above mean: {len(high_expression_genes)}")

Mean expression: 122.96
Highly expressed genes: ['EGFR', 'KRAS']
Gene expression dictionary: {'BRCA1': 120.5, 'TP53': 89.3, 'EGFR': 156.7, 'MYC': 45.2, 'KRAS': 203.1}
Max expression: 203.10
Number of genes above mean: 2


### 1.3 Control Flow Review

**Exercise 1.3:** Write a function `classify_temperature(temp_celsius)` that returns:
- "Freezing" if temp < 0
- "Cold" if 0 <= temp < 10
- "Mild" if 10 <= temp < 20
- "Warm" if 20 <= temp < 30
- "Hot" if temp >= 30

In [3]:
def classify_temperature(temp_celsius):
    if temp_celsius < 0:
        return "Freezing"
    elif 0 <= temp_celsius < 10:
        return "Cold"
    elif 10 <= temp_celsius < 20:
        return "Mild"
    elif 20 <= temp_celsius < 30:
        return "Warm"
    else:  # temp >= 30
        return "Hot"

# Test cases
test_temps = [-5, 5, 15, 25, 35]
for temp in test_temps:
    print(f"{temp}°C: {classify_temperature(temp)}")

-5°C: Freezing
5°C: Cold
15°C: Mild
25°C: Warm
35°C: Hot
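
The test temperatures above all fall in the middle of a range. A minimal sketch of boundary checks follows. It redefines `classify_temperature` so the block runs on its own, and it drops the lower bound from each `elif` on the assumption that the earlier branches have already excluded smaller values. The `boundary_cases` dictionary is a hypothetical helper, not part of the exercise.

```python
def classify_temperature(temp_celsius):
    """Same thresholds as the exercise, without the redundant lower bounds."""
    if temp_celsius < 0:
        return "Freezing"
    elif temp_celsius < 10:
        return "Cold"
    elif temp_celsius < 20:
        return "Mild"
    elif temp_celsius < 30:
        return "Warm"
    return "Hot"

# Values at and just below each threshold, with the category we expect.
boundary_cases = {
    -0.1: "Freezing",
    0: "Cold",
    9.9: "Cold",
    10: "Mild",
    19.9: "Mild",
    20: "Warm",
    29.9: "Warm",
    30: "Hot",
}

for temp, expected in boundary_cases.items():
    assert classify_temperature(temp) == expected, (temp, expected)

print("All boundary checks passed")
```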


**Exercise 1.4:** Write a function `count_nucleotides(dna_sequence)` that returns a dictionary with counts of each nucleotide (A, T, G, C).

In [4]:
def count_nucleotides(dna_sequence):
    """
    Count occurrences of each nucleotide in a DNA sequence.

    Parameters:
    dna_sequence (str): DNA sequence

    Returns:
    dict: Dictionary with nucleotide counts
    """
    dna_sequence = dna_sequence.upper()
    counts = {'A': 0, 'T': 0, 'G': 0, 'C': 0}

    for nucleotide in dna_sequence:
        if nucleotide in counts:
            counts[nucleotide] += 1

    return counts

# Alternative using count() method:
def count_nucleotides_v2(dna_sequence):
    dna_sequence = dna_sequence.upper()
    return {
        'A': dna_sequence.count('A'),
        'T': dna_sequence.count('T'),
        'G': dna_sequence.count('G'),
        'C': dna_sequence.count('C')
    }

# Test
sequence = "ATGCGATCGATCGTAGCTA"
print(count_nucleotides(sequence))
# Expected output: {'A': 5, 'T': 5, 'G': 5, 'C': 4}

{'A': 5, 'T': 5, 'G': 5, 'C': 4}


---

## Part 2: Git/GitHub Introduction

### 2.1 Version Control Concepts

**Discussion Questions** (to be discussed in small groups):
1. Why is version control important in scientific computing?
2. What problems does Git solve?
3. What's the difference between Git and GitHub?

### 2.2 Essential Git Commands

Below are the commands you'll practice during the hands-on session. **Do not run these in the notebook** - use your terminal instead!

```bash
# Configure Git (do this once)
git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"

# Clone a repository
git clone <repository-url>

# Check status of your repository
git status

# Add files to staging area
git add <filename>
git add .  # adds all changed files

# Commit changes
git commit -m "Descriptive message about your changes"

# Push changes to remote repository
git push

# Pull changes from remote repository
git pull

# Create a new branch
git branch <branch-name>
git checkout <branch-name>
# Or do both at once:
git checkout -b <branch-name>

# View commit history
git log
```

### 2.3 Git Workflow Exercise

**Hands-on Task** (to be done in terminal):

1. Create a new directory for testing Git
2. Initialize a Git repository with `git init`
3. Create a simple Python file (e.g., `hello.py`)
4. Add and commit the file
5. Make a change to the file
6. Check the status, add, and commit again
7. View your commit history

### 2.4 Reflection Questions

Answer these in the markdown cell below after completing the Git exercises:

1. What does `git add` do, and why is it a separate step from `git commit`?
2. What makes a good commit message?
3. When would you use branches in your group project?

**Your answers here:**

1.
2.
3.

---

## Part 3: Introduction to Object-Oriented Programming

### 3.1 What is OOP and Why Use It?

**Object-Oriented Programming (OOP)** is a programming paradigm that organizes code into **objects** - bundles of data (attributes) and functions (methods) that work together.

**Why use OOP?**
- **Organization**: Group related data and functions together
- **Reusability**: Create templates (classes) to make multiple similar objects
- **Maintainability**: A fix made in the class definition applies to every object created from that class
- **Real-world modeling**: Objects can represent real entities (genes, experiments, organisms)

### 3.2 Classes and Objects/Instances: A Simple Example

Think of a **class** as a blueprint, and an **object** as something built from that blueprint.

In [9]:
# Example: A simple Gene class
class Gene:
    """A class to represent a gene."""

    def __init__(self, name, sequence):
        """Initialize a Gene object."""
        self.name = name
        self.sequence = sequence

    def get_length(self):
        """Return the length of the gene sequence."""
        return len(self.sequence)

    def describe(self):
        """Print information about the gene."""
        print(f"Gene: {self.name}")
        print(f"Length: {self.get_length()} bp")

# Creating objects (instances) of the Gene class
gene1 = Gene("BRCA1", "ATGCGATCGATCG")
gene2 = Gene("TP53", "GCTAGCTAGCTA")

# Using the objects
print("First Gene:")
gene1.describe()

print("\nSecond Gene:")
gene2.describe()

# Let's explore more about these objects
print("\n" + "="*50)
print("ADDITIONAL EXPLORATION OF THE GENE OBJECTS:")
print("="*50)

# Access attributes directly
print(f"\nGene1 name: {gene1.name}")
print(f"Gene1 sequence: {gene1.sequence}")
print(f"Gene1 sequence length (using method): {gene1.get_length()}")
print(f"Gene1 sequence length (directly): {len(gene1.sequence)}")

# Create more gene objects
gene3 = Gene("EGFR", "ATGCTAGCTAGCTAGC")
gene4 = Gene("MYC", "GCATCGATCGATCGAT")

# Store genes in a list
all_genes = [gene1, gene2, gene3, gene4]

print("\n" + "="*50)
print("ANALYZING MULTIPLE GENES:")
print("="*50)

# Loop through all genes
for gene in all_genes:
    print(f"- {gene.name}: {gene.get_length()} bp")

# Find the longest gene
longest_gene = max(all_genes, key=lambda g: g.get_length())
print(f"\nLongest gene: {longest_gene.name} ({longest_gene.get_length()} bp)")

# Find genes with specific patterns
print("\n" + "="*50)
print("SEARCHING FOR SPECIFIC SEQUENCE PATTERNS:")
print("="*50)

for gene in all_genes:
    if "ATGC" in gene.sequence:
        print(f"{gene.name} contains 'ATGC' pattern")
    if gene.get_length() > 10:
        print(f"{gene.name} is longer than 10 bp")

First Gene:
Gene: BRCA1
Length: 13 bp

Second Gene:
Gene: TP53
Length: 12 bp

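==================================================
ADDITIONAL EXPLORATION OF THE GENE OBJECTS:
==================================================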

Gene1 name: BRCA1
Gene1 sequence: ATGCGATCGATCG
Gene1 sequence length (using method): 13
Gene1 sequence length (directly): 13

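==================================================
ANALYZING MULTIPLE GENES:
==================================================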
- BRCA1: 13 bp
- TP53: 12 bp
- EGFR: 16 bp
- MYC: 16 bp

Longest gene: EGFR (16 bp)

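==================================================
SEARCHING FOR SPECIFIC SEQUENCE PATTERNS:
==================================================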
BRCA1 contains 'ATGC' pattern
BRCA1 is longer than 10 bp
TP53 is longer than 10 bp
EGFR contains 'ATGC' pattern
EGFR is longer than 10 bp
MYC is longer than 10 bp


### 3.3 Understanding `__init__` and `self`

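- **`__init__`**: The initializer (often called the constructor), which Python calls automatically when you create a new object, so it is the place to set up the object's attributes
- **`self`**: Refers to the specific object instance a method was called on. It gives the method access to that object's own attributes and other methods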

Think of `self` as "this specific object" - it's how each object keeps track of its own data.
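
A minimal sketch of this idea, using a hypothetical `Counter` class that is not part of the exercises: two instances keep separate data, and calling a method on an instance is the same as passing that instance as `self`.

```python
class Counter:
    """Hypothetical class used only to illustrate `self`."""

    def __init__(self, label):
        self.label = label   # stored on this particular instance
        self.count = 0

    def increment(self):
        self.count += 1      # changes only the instance the method was called on
        return self.count

colonies = Counter("colonies")
plates = Counter("plates")

colonies.increment()
colonies.increment()
plates.increment()

print(colonies.count)  # 2
print(plates.count)    # 1

# These two calls are equivalent: Python fills in `self` for the first one.
colonies.increment()           # count becomes 3
Counter.increment(colonies)    # count becomes 4
print(colonies.count)  # 4
```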

**Exercise 3.1:** Create a `Sample` class to represent a biological sample with the following:
- Attributes: `sample_id`, `organism`, `collection_date`
- Method: `display_info()` that prints all the sample information

In [10]:
class Sample:
    """A class to represent a biological sample."""

    def __init__(self, sample_id, organism, collection_date):
        self.sample_id = sample_id
        self.organism = organism
        self.collection_date = collection_date

    def display_info(self):
        print(f"Sample ID: {self.sample_id}")
        print(f"Organism: {self.organism}")
        print(f"Collection Date: {self.collection_date}")

# Test the class
sample1 = Sample("S001", "Arabidopsis thaliana", "2024-01-15")
sample1.display_info()

Sample ID: S001
Organism: Arabidopsis thaliana
Collection Date: 2024-01-15


### 3.4 Instance Methods

Methods are functions that belong to a class. They can access and modify the object's attributes using `self`.

**Exercise 3.2:** Create an `Experiment` class with:
- Attributes: `name`, `temperature`, `ph`, `measurements` (initialize as empty list)
- Methods:
  - `add_measurement(value)`: adds a value to the measurements list
  - `get_average()`: returns the average of all measurements
  - `condition_string()`: returns a formatted string with temperature and pH

In [11]:
class Experiment:
    """A class to represent a scientific experiment."""

    def __init__(self, name, temperature, ph):
        self.name = name
        self.temperature = temperature
        self.ph = ph
        self.measurements = []

    def add_measurement(self, value):
        self.measurements.append(value)

    def get_average(self):
        if len(self.measurements) == 0:
            return 0
        return sum(self.measurements) / len(self.measurements)

    def condition_string(self):
        return f"Temperature: {self.temperature}°C, pH: {self.ph}"

# Test the class
exp = Experiment("Growth rate study", 25, 7.0)
exp.add_measurement(1.2)
exp.add_measurement(1.5)
exp.add_measurement(1.3)

print(exp.condition_string())
print(f"Average measurement: {exp.get_average():.2f}")
print(f"All measurements: {exp.measurements}")

Temperature: 25°C, pH: 7.0
Average measurement: 1.33
All measurements: [1.2, 1.5, 1.3]
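
The `Experiment` class creates `self.measurements = []` inside `__init__`, so every experiment gets its own list. The sketch below shows why that matters, assuming you were tempted to use a default argument instead. Both classes are hypothetical and exist only for this comparison.

```python
class SharedListExperiment:
    """Pitfall: the default list is created once and shared by all instances."""

    def __init__(self, name, measurements=[]):
        self.name = name
        self.measurements = measurements

class SafeExperiment:
    """Common pattern: use None as the default and build a new list each time."""

    def __init__(self, name, measurements=None):
        self.name = name
        self.measurements = [] if measurements is None else list(measurements)

a = SharedListExperiment("A")
b = SharedListExperiment("B")
a.measurements.append(1.2)
print(b.measurements)   # [1.2]  (b sees a's data)

c = SafeExperiment("C")
d = SafeExperiment("D")
c.measurements.append(1.2)
print(d.measurements)   # []
```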


### 3.5 Multiple Objects

The power of OOP is creating multiple objects from the same class, each with its own data.

**Exercise 3.3:** Create a `Protein` class with:
- Attributes: `name`, `sequence` (string of amino acids), `molecular_weight`
- Methods:
  - `get_length()`: returns the number of amino acids
  - `has_motif(motif)`: returns True if the motif string is found in the sequence
  - `get_info()`: returns a formatted string with all protein information

Then create at least 3 different protein objects and test all methods.

In [13]:
class Protein:
    """A class to represent a protein."""

    def __init__(self, name, sequence, molecular_weight):
        self.name = name
        self.sequence = sequence
        self.molecular_weight = molecular_weight

    def get_length(self):
        return len(self.sequence)

    def has_motif(self, motif):
        return motif in self.sequence

    def get_info(self):
        info = f"Protein Name: {self.name}\n"
        info += f"Amino Acid Sequence: {self.sequence}\n"
        info += f"Length: {self.get_length()} amino acids\n"
        info += f"Molecular Weight: {self.molecular_weight} Da\n"
        return info

# Create and test protein objects
insulin = Protein("Insulin", "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKT", 5807.57)
collagen = Protein("Collagen", "GPPGPPGPPGPPGPP", 14000)
actin = Protein("Actin", "MDDDIAALVVDNGSGMCKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEAQSKRGILTLKYPIEHGIVTNWDDMEKIWHHTFYNELRVAPEEHPTLLTEAPLNPKANREKMTQIMFETFNTPAMYVAIQAVLSLYASGRTTGIVLDSGDGVTHNVPIYEGYALPHAIMRLDLAGRDLTDYLMKILTERGYSFVTTAEREIVRDIKEKLCYVALDFENEMATAASSSSLEKSYELPDGQVITIGNERFRCPETLFQPSFIGMESAGIHETTYNSIMKCDIDIRKDLYANTVLSGGTTMYPGIADRMQKEITALAPSTMKIKIIAPPERKYSVWIGGSILASLSTFQQMWISKQEYDESGPSIVHRKCF", 42000)

# Test all methods
print("Testing Insulin:")
print(insulin.get_info())
print(f"Has motif 'CGSH': {insulin.has_motif('CGSH')}")
print(f"Has motif 'XYZ': {insulin.has_motif('XYZ')}")
print(f"Length: {insulin.get_length()} amino acids")
print("\n" + "="*50 + "\n")

print("Testing Collagen:")
print(collagen.get_info())
print(f"Has motif 'GPP': {collagen.has_motif('GPP')}")
print("\n" + "="*50 + "\n")

print("Testing Actin:")
print(f"Actin length: {actin.get_length()} amino acids")
print(f"Has motif 'ATP': {actin.has_motif('ATP')}")

Testing Insulin:
Protein Name: Insulin
Amino Acid Sequence: MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKT
Length: 54 amino acids
Molecular Weight: 5807.57 Da

Has motif 'CGSH': True
Has motif 'XYZ': False
Length: 54 amino acids

==================================================

Testing Collagen:
Protein Name: Collagen
Amino Acid Sequence: GPPGPPGPPGPPGPP
Length: 15 amino acids
Molecular Weight: 14000 Da

Has motif 'GPP': True

==================================================

Testing Actin:
Actin length: 375 amino acids
Has motif 'ATP': False


### 3.6 Challenge Exercise

**Exercise 3.4:** Create a `GeneExpression` class that:
- Stores gene name and a dictionary of expression values for different conditions
- Has methods to:
  - Add a new condition with its expression value
  - Get the condition with highest expression
  - Get the condition with lowest expression
  - Calculate fold change between two conditions
  - Display all expression data

In [14]:
class GeneExpression:
    """A class to track gene expression across different conditions."""

    def __init__(self, gene_name):
        self.gene_name = gene_name
        self.expression_data = {}

    def add_condition(self, condition_name, expression_value):
        self.expression_data[condition_name] = expression_value

    def get_max_condition(self):
        if not self.expression_data:
            return None
        max_value = max(self.expression_data.values())
        for condition, value in self.expression_data.items():
            if value == max_value:
                return condition, value
        return None

    def get_min_condition(self):
        if not self.expression_data:
            return None
        min_value = min(self.expression_data.values())
        for condition, value in self.expression_data.items():
            if value == min_value:
                return condition, value
        return None

    def calculate_fold_change(self, condition1, condition2):
        if condition1 in self.expression_data and condition2 in self.expression_data:
            if self.expression_data[condition2] == 0:
                return None  # Avoid division by zero
            return self.expression_data[condition1] / self.expression_data[condition2]
        return None

    def display_expression(self):
        print(f"Expression Data for {self.gene_name}:")
        print("=" * 40)
        for condition, value in sorted(self.expression_data.items()):
            print(f"{condition:20} : {value:8.2f}")
        print("=" * 40)

# Test the class
gene = GeneExpression("BRCA1")

# Add different conditions
gene.add_condition("control", 100)
gene.add_condition("heat_stress", 250)
gene.add_condition("cold_stress", 80)
gene.add_condition("drought", 180)
gene.add_condition("nutrient_deficiency", 60)

# Display all data
gene.display_expression()

# Get highest and lowest expression
max_condition = gene.get_max_condition()
min_condition = gene.get_min_condition()

print(f"\nHighest Expression:")
print(f"  Condition: {max_condition[0]}, Value: {max_condition[1]:.2f}")

print(f"\nLowest Expression:")
print(f"  Condition: {min_condition[0]}, Value: {min_condition[1]:.2f}")

# Calculate fold changes
fold_change1 = gene.calculate_fold_change('heat_stress', 'control')
fold_change2 = gene.calculate_fold_change('cold_stress', 'control')

print(f"\nFold Changes:")
print(f"  Heat stress vs Control: {fold_change1:.2f}x")
print(f"  Cold stress vs Control: {fold_change2:.2f}x")

# Add some analysis
print(f"\nExpression Analysis:")
print(f"  Number of conditions tested: {len(gene.expression_data)}")
print(f"  Average expression: {sum(gene.expression_data.values())/len(gene.expression_data):.2f}")

# Check if gene is up-regulated in heat stress
if fold_change1 > 2:
    print(f"  {gene.gene_name} is strongly up-regulated in heat stress (>2x)")
elif fold_change1 > 1.5:
    print(f"  {gene.gene_name} is moderately up-regulated in heat stress")
else:
    print(f"  {gene.gene_name} shows minimal response to heat stress")

Expression Data for BRCA1:
========================================
cold_stress          :    80.00
control              :   100.00
drought              :   180.00
heat_stress          :   250.00
nutrient_deficiency  :    60.00
========================================

Highest Expression:
  Condition: heat_stress, Value: 250.00

Lowest Expression:
  Condition: nutrient_deficiency, Value: 60.00

Fold Changes:
  Heat stress vs Control: 2.50x
  Cold stress vs Control: 0.80x

Expression Analysis:
  Number of conditions tested: 5
  Average expression: 134.00
  BRCA1 is strongly up-regulated in heat stress (>2x)
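
The highest and lowest conditions can also be found without an explicit loop. This is a minimal sketch of one alternative, not a replacement for the solution above; the `expression_data` dictionary here is a hypothetical stand-in with the same shape as the attribute in `GeneExpression`.

```python
# Hypothetical expression values, shaped like GeneExpression.expression_data.
expression_data = {"control": 100, "heat_stress": 250, "cold_stress": 80}

# max() and min() iterate over the keys; key= makes them compare by value.
max_condition = max(expression_data, key=expression_data.get)
min_condition = min(expression_data, key=expression_data.get)

print(max_condition, expression_data[max_condition])  # heat_stress 250
print(min_condition, expression_data[min_condition])  # cold_stress 80
```

Like the loop version, this returns the first matching condition when several share the same value. Both `max()` and `min()` raise `ValueError` on an empty dictionary, so a method built this way would still need the empty check.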


---

## Part 4: Integration Exercise

**Exercise 4.1:** Combine everything you've learned today.

Create a `SequenceAnalyzer` class that:
1. Takes a DNA sequence as input
2. Has methods to:
   - Calculate GC content (use your function from Part 1)
   - Count nucleotides (use your function from Part 1)
   - Transcribe DNA to RNA (replace T with U)
   - Get the reverse complement
   - Generate a summary report as a formatted string

After creating the class, save it as a `.py` file and practice using Git to:
- Add it to your repository
- Commit with a good message
- (If working in groups) Push to your shared repository

In [16]:
class SequenceAnalyzer:
    """A comprehensive DNA sequence analysis tool."""

    def __init__(self, dna_sequence):
        """Initialize with a DNA sequence."""
        self.original_sequence = dna_sequence
        self.dna_sequence = dna_sequence.upper()
        self.validate_sequence()

    def validate_sequence(self):
        """Check if sequence contains only valid nucleotides."""
        valid_nucleotides = {'A', 'T', 'G', 'C'}
        for char in self.dna_sequence:
            if char not in valid_nucleotides:
                raise ValueError(f"Invalid nucleotide '{char}' found. Only A, T, G, C allowed.")

    def calculate_gc_content(self):
        """Calculate GC content percentage."""
        total = len(self.dna_sequence)
        if total == 0:
            return 0
        gc_count = self.dna_sequence.count('G') + self.dna_sequence.count('C')
        return (gc_count / total) * 100

    def count_nucleotides(self):
        """Count occurrences of each nucleotide."""
        counts = {
            'A': self.dna_sequence.count('A'),
            'T': self.dna_sequence.count('T'),
            'G': self.dna_sequence.count('G'),
            'C': self.dna_sequence.count('C')
        }
        return counts

    def transcribe_to_rna(self):
        """Transcribe DNA to RNA (replace T with U)."""
        return self.dna_sequence.replace('T', 'U')

    def get_reverse_complement(self):
        """Get the reverse complement of the DNA sequence."""
        complement_dict = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
        complement = ''
        for nucleotide in self.dna_sequence:
            complement += complement_dict[nucleotide]
        return complement[::-1]

    def generate_report(self):
        """Generate a comprehensive analysis report."""
        report = "=" * 50 + "\n"
        report += "DNA SEQUENCE ANALYSIS REPORT\n"
        report += "=" * 50 + "\n\n"

        # Basic information
        report += "1. SEQUENCE INFORMATION:\n"
        report += "-" * 40 + "\n"
        report += f"Original sequence: {self.original_sequence}\n"
        report += f"Length: {len(self.dna_sequence)} nucleotides\n\n"

        # Nucleotide composition
        counts = self.count_nucleotides()
        report += "2. NUCLEOTIDE COMPOSITION:\n"
        report += "-" * 40 + "\n"
        total = len(self.dna_sequence)
        for nucleotide in ['A', 'T', 'G', 'C']:
            count = counts[nucleotide]
            percentage = (count / total * 100) if total > 0 else 0
            report += f"{nucleotide}: {count:3d} ({percentage:6.2f}%)\n"

        # GC content
        gc_content = self.calculate_gc_content()
        report += f"\nGC Content: {gc_content:.2f}%\n\n"

        # Transcription
        report += "3. TRANSCRIPTION:\n"
        report += "-" * 40 + "\n"
        rna_sequence = self.transcribe_to_rna()
        report += f"RNA sequence: {rna_sequence}\n\n"

        # Reverse complement
        report += "4. REVERSE COMPLEMENT:\n"
        report += "-" * 40 + "\n"
        reverse_comp = self.get_reverse_complement()
        report += f"Reverse complement: {reverse_comp}\n\n"

        # Additional calculations
        report += "5. ADDITIONAL STATISTICS:\n"
        report += "-" * 40 + "\n"
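        gc_total = counts['G'] + counts['C']
        if gc_total > 0:
            report += f"AT/GC Ratio: {(counts['A'] + counts['T']) / gc_total:.2f}\n"
        else:
            # Avoid division by zero for sequences without G or C
            report += "AT/GC Ratio: undefined (no G or C)\n"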

        # Check for common features
        report += "\n6. SEQUENCE FEATURES:\n"
        report += "-" * 40 + "\n"

        # Check for start codon
        if "ATG" in self.dna_sequence:
            report += "✓ Contains start codon (ATG)\n"
        else:
            report += "✗ No start codon (ATG) found\n"

        # Check for stop codons
        stop_codons = ["TAA", "TAG", "TGA"]
        found_stop = False
        for codon in stop_codons:
            if codon in self.dna_sequence:
                report += f"✓ Contains stop codon ({codon})\n"
                found_stop = True

        if not found_stop:
            report += "✗ No stop codons found\n"

        # Palindrome check: a DNA palindrome equals its own reverse complement
        is_palindrome = self.dna_sequence == self.get_reverse_complement()
        report += f"✓ Palindrome: {is_palindrome}\n"

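        # Simple melting temperature approximation (Wallace rule: 4°C per G/C,
        # 2°C per A/T; only a rough estimate, intended for short oligonucleotides)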
        tm = 4 * (counts['G'] + counts['C']) + 2 * (counts['A'] + counts['T'])
        report += f"✓ Approx. Melting Temp (Tm): {tm:.1f}°C\n"

        report += "\n" + "=" * 50 + "\n"
        report += "END OF REPORT\n"
        report += "=" * 50

        return report

# Test your class
seq = SequenceAnalyzer("ATGCGATCGATCGTAGCTA")
print(seq.generate_report())

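==================================================
DNA SEQUENCE ANALYSIS REPORT
==================================================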

1. SEQUENCE INFORMATION:
----------------------------------------
Original sequence: ATGCGATCGATCGTAGCTA
Length: 19 nucleotides

2. NUCLEOTIDE COMPOSITION:
----------------------------------------
A:   5 ( 26.32%)
T:   5 ( 26.32%)
G:   5 ( 26.32%)
C:   4 ( 21.05%)

GC Content: 47.37%

3. TRANSCRIPTION:
----------------------------------------
RNA sequence: AUGCGAUCGAUCGUAGCUA

4. REVERSE COMPLEMENT:
----------------------------------------
Reverse complement: TAGCTACGATCGATCGCAT

5. ADDITIONAL STATISTICS:
----------------------------------------
AT/GC Ratio: 1.11

6. SEQUENCE FEATURES:
----------------------------------------
✓ Contains start codon (ATG)
✓ Contains stop codon (TAG)
✓ Palindrome: False
✓ Approx. Melting Temp (Tm): 56.0°C

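==================================================
END OF REPORT
==================================================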
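
The reverse complement can also be computed with a translation table instead of a character-by-character loop. The sketch below is one possible alternative; the names `COMPLEMENT_TABLE` and `reverse_complement` are illustrative and are not part of the class above.

```python
# Translation table mapping each nucleotide to its complement.
COMPLEMENT_TABLE = str.maketrans("ATGC", "TACG")

def reverse_complement(dna_sequence):
    """Return the reverse complement of an uppercase A/T/G/C sequence."""
    return dna_sequence.translate(COMPLEMENT_TABLE)[::-1]

print(reverse_complement("ATGCGATCGATCGTAGCTA"))  # TAGCTACGATCGATCGCAT

# A DNA palindrome reads the same as its reverse complement,
# for example the EcoRI recognition site GAATTC.
print(reverse_complement("GAATTC") == "GAATTC")   # True
```

One difference from the dictionary lookup: `str.translate` leaves characters that are not in the table unchanged rather than raising `KeyError`, so it relies on the sequence having been validated first.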


---

## Reflection and Next Steps

### What we covered today:
✓ Python fundamentals: functions, data structures, control flow  
✓ Version control with Git/GitHub  
✓ OOP basics: classes, objects, attributes, methods  

### For your group project this week:
1. Form your groups (3-4 people)
2. Decide on a project topic
3. Set up a shared GitHub repository
4. Each member clones the repository
5. Draft a project outline (README.md file)
6. Decide who will work on what components
7. Practice the Git workflow: branch → add → commit → push → pull request

### Questions to discuss with your group:
- What classes might be useful for your project?
- How will you divide the work?
- What branching strategy will you use?
- How often will you integrate your code?

### Prepare for next week:
- Review OOP concepts
- Make sure everyone in your group can push/pull from the shared repository
- Start thinking about the data structures your project needs

---

## Additional Practice Exercises

### Bonus Exercise 1: Enhanced Experiment Class

Extend the `Experiment` class to include:
- A method to remove outliers (values that are more than 2 standard deviations from the mean)
- A method to get the standard deviation
- A method to export data to a dictionary format
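
As a starting hint, the standard library's `statistics` module already provides the mean and standard deviation. The sketch below works on a plain list of made-up measurements and leaves wrapping the logic into `Experiment` methods to you. It assumes the sample standard deviation is the one wanted.

```python
import statistics

# Hypothetical measurements; the last value is deliberately extreme.
measurements = [1.2, 1.5, 1.3, 1.4, 1.1, 1.3, 9.0]

mean = statistics.mean(measurements)
stdev = statistics.stdev(measurements)   # sample standard deviation (n - 1)

# Keep only values within 2 standard deviations of the mean.
kept = [m for m in measurements if abs(m - mean) <= 2 * stdev]

print(kept)  # [1.2, 1.5, 1.3, 1.4, 1.1, 1.3]
```

`statistics.stdev` needs at least two values and raises `StatisticsError` otherwise, so a method built on it should handle short measurement lists.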

In [None]:
# Your code here



### Bonus Exercise 2: Organism Class

Create an `Organism` class that could be useful for ecological studies:

In [None]:
class Organism:
    """A class to represent an organism for ecological studies."""
    def __init__(self, species_name, common_name, kingdom):
        pass  # Your code here

