### Part I: Simulating DNA sequences

Welcome to the exciting world of biology and computer science! In this module, we will embark on a journey to understand the basics of genetic information and how it shapes the living organisms around us. DNA is at the heart of this exploration. It's a fascinating molecule that serves as the instruction manual for every organism on Earth.

DNA is structured as a long chain made up of smaller units called nucleotides. These are the critical building blocks of DNA, much like how letters form the building blocks of words. In the DNA "alphabet," there are four nucleotides: adenine (A), thymine (T), cytosine (C), and guanine (G). The order in which these nucleotides are arranged constitutes the genetic code of an organism, similar to how the arrangement of letters forms sentences and paragraphs that carry specific information.

The sequence of nucleotides in DNA influences traits ranging from the color of a flower to the height of a human being, often in combination with the environment. These sequences are passed from parents to offspring, carrying the instructions for the development, growth, and function of all living things. Despite this incredible variety, all life on Earth shares the same fundamental DNA structure, highlighting a deep, unifying thread of biological connection.

In today's lesson, we're going to combine principles of biology with computer programming to simulate DNA sequences. By doing this, we'll gain a deeper understanding of how genetic information is structured and how we can analyze and compare genetic material from different organisms using computational methods.

Here's what we'll be doing in this exercise:

1. **Simulating DNA**: We will use programming concepts to create our own DNA sequences. This will involve generating long chains of the letters A, T, C, and G in random sequences, mimicking the way DNA sequences look in nature.

2. **Mutating our DNA**: We will introduce changes or 'mutations' into our DNA sequences and observe how even small changes can have big impacts on the genetic code.

3. **Analysis and Comparison of our DNA**: Once we have our simulated DNA strands, we will compare them to each other to see how they differ. This is similar to what scientists do when they are looking at the DNA from different species or individuals to understand genetic relationships and evolutionary history.

By the end of this lesson, you will have a better understanding of how DNA works and how programming can be a powerful tool in the study of biology. So, grab your virtual lab coats and prepare to dive into the digital world of genetics! Let's embark on this adventure together and unlock the secrets of life’s code.

In [8]:
# load libraries

import random  # helps us generate random numbers

import panel as pn  # displays the Bokeh alignment plot (used as `pn` further down)

from sequence_alignment_viewer import *  # a program that lets us compare DNA sequences

### Coding Exercise: Generating Random DNA Sequences

#### Objective:
Your task is to write a Python function that generates a random DNA sequence of a given length. This exercise will help you understand the randomness and variety present in biological DNA sequences, as well as practice your skills in Python programming.

#### Instructions:

1. **Function Definition**: Begin by defining a function named `random_DNA`. This function should take one parameter, `length`, which represents the length of the DNA sequence to be generated.

2. **Initializing the Sequence**: Inside the function, create a variable named `DNA`. Initialize this variable with an empty string. This variable will be used to store the randomly generated DNA sequence.

3. **Generating the Sequence**:
    - Create a loop that iterates `length` times. Each iteration represents the addition of a new nucleotide to your DNA sequence.
    - Within the loop, add a randomly selected nucleotide to your DNA string. The nucleotides are represented by the letters 'C', 'G', 'T', and 'A'. Use the `random.choice()` function to select one of these letters at random. For this to work, the `random` module must already be imported, as it is in the first code cell of this notebook.

4. **Returning the Sequence**: After the loop finishes, return the `DNA` string. This string should now contain a sequence of nucleotides of the specified length.

#### Task:
Complete the missing parts of the provided function template using Python code. Once you've written your function, test it by generating DNA sequences of different lengths, such as 5, 10, or 20, and print out the results.

#### Tips:
- Make sure you understand how the `for` loop works and how it uses the `range()` function.
- Remember to import the `random` module, which is required for `random.choice()`.
- Check the syntax and ensure there are no typos in your code. Python is case-sensitive and requires proper indentation.
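
As a warm-up that does not give away the exercise, the sketch below shows the two tools from the tips working together on a coin-flip example. The names `n_flips` and `flips` are made up for illustration and are not part of the exercise.

```python
import random

n_flips = 5   # how many times the loop body should run
flips = ""    # start with an empty string, then grow it

# range(n_flips) yields 0, 1, ..., n_flips - 1, so the body runs n_flips times
for _ in range(n_flips):
    # random.choice picks one element of a sequence; a string works too
    flips += random.choice("HT")

print(flips)       # a random string made of the characters H and T
print(len(flips))  # 5
```

The same pattern (an empty string, a `for` loop over `range()`, and one random pick per iteration) is what the exercise asks you to apply to nucleotides.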

Complete this task in your Python environment, and observe the different DNA sequences your function generates. Reflect on how this randomness mirrors the diversity of life and its genetic underpinnings. Good luck!

In [9]:
# Define the function to generate a random DNA sequence of a given length

def random_DNA(length):
    # Your code here

# Test your function
starting_sequence = random_DNA(40)  # This should create a random DNA sequence of length 40
print(starting_sequence)
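
If you get stuck, here is one possible way to fill in the template, following the four steps from the instructions. Treat it as a sketch rather than the only valid answer; the test loop at the end is an addition for illustration.

```python
import random

def random_DNA(length):
    """Return a random string of A, C, G and T with `length` characters."""
    DNA = ""                           # step 2: start with an empty sequence
    for _ in range(length):            # step 3: repeat `length` times
        DNA += random.choice("CGTA")   # append one randomly chosen nucleotide
    return DNA                         # step 4: hand back the finished sequence

# Try a few different lengths
for n in (5, 10, 20):
    print(n, random_DNA(n))
```

Every run prints different sequences, because each nucleotide is drawn independently and with equal probability.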

### Coding Exercise: Mutating a DNA Sequence

#### Objective:
In this exercise, you'll create a Python function to simulate mutations in a DNA sequence. Mutations are changes in the genetic material that can have various causes and effects. By completing this exercise, you'll get a better understanding of genetic variations and strengthen your Python programming skills.

#### Instructions:

1. **Function Definition**: Start by defining a function named `mutate_n`. This function will take two parameters: `dna`, which is a string representing a DNA sequence, and `N`, the number of mutations to introduce into the DNA sequence.

2. **Preparing the Sequence**:
    - Inside the function, convert the string `dna` into a list. This is necessary because strings in Python are immutable, meaning they cannot be changed once created. By converting the string into a list, you can modify individual elements (nucleotides).

3. **Introducing Mutations**:
    - Write a loop that will execute `N` times, each time simulating a single mutation.
    - In each iteration of the loop, select a random position in the DNA sequence to mutate. This position is referred to as the `mutation_site`. Use `random.randint()` to choose a position between 0 and the length of the DNA sequence minus one; `random.randint()` includes both endpoints.
    - Select a new nucleotide randomly from 'A', 'T', 'C', and 'G'. This will be the new base (`new_base`) at the mutation site. Again, use `random.choice()` for this selection. Because the choice is random, the new base can happen to match the old one, in which case that mutation leaves the sequence unchanged.
    - Replace the nucleotide in the selected position (mutation site) with the new nucleotide.

4. **Finalizing the Sequence**:
    - After all mutations have been introduced, convert the list back into a string. Use the `''.join()` method to do this.
    - Return the new mutated DNA sequence from the function.
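
The string-to-list-to-string round trip described in steps 2 and 4 can be tried out on an ordinary word before applying it to mutations. This is a small illustrative sketch; the variable names are arbitrary.

```python
word = "GATTACA"

# Strings are immutable: assigning to a single position raises a TypeError
try:
    word[0] = "C"
except TypeError as err:
    print("Cannot edit a string in place:", err)

letters = list(word)         # ['G', 'A', 'T', 'T', 'A', 'C', 'A']
letters[0] = "C"             # lists can be changed in place
new_word = "".join(letters)  # glue the characters back into one string

print(new_word)  # CATTACA
print(word)      # GATTACA (the original string is untouched)
```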

#### Task:
Implement the missing parts of the provided function template using Python code. After completing the function, test it by applying a few mutations to a sample DNA sequence. Observe how the sequence changes with different numbers of mutations and different initial sequences.

#### Tips:
- Be careful with the indexing in Python; remember that lists are zero-indexed.
- Make sure the `random` module is imported to use `random.randint()` and `random.choice()`.
- Validate your function by printing out the original and mutated DNA sequences to see the changes clearly.
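
The indexing tip is easiest to check by experiment. The sketch below, using a made-up four-letter sequence, shows why the upper bound passed to `random.randint()` should be the length minus one.

```python
import random

dna = "ACGT"
last_index = len(dna) - 1   # 3, because indexing starts at 0

# random.randint(a, b) can return a and b themselves
site = random.randint(0, last_index)
print(site, dna[site])

# Drawing many times shows which index values can come up
seen = {random.randint(0, last_index) for _ in range(1000)}
print(sorted(seen))
```

Using `len(dna)` as the upper bound instead would occasionally produce the index 4, and `dna[4]` raises an `IndexError`.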

This task will help you understand the concept of mutations in genetics and how they can be simulated using programming. Enjoy exploring the possibilities!

In [None]:
# Define the function to mutate a DNA sequence N times

def mutate_n(dna, N):
    # Convert the DNA string into a list to allow modifications
    # Iterate N times to introduce N mutations
        # For each mutation:
        # - Randomly select a position in the DNA list to mutate (mutation_site)
        # - Randomly select a new nucleotide (new_base) and replace the one at the mutation_site
    # Convert the list back into a string and return the mutated DNA sequence
    # YOUR CODE HERE
    
# Test your function
mutations = 10
print("Original DNA: " + starting_sequence)
print("Mutated DNA: " + mutate_n(starting_sequence, mutations))
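
For reference, here is one possible way to complete the template, following the steps in the comments. It is a sketch under the assumptions stated in the instructions (mutations are replacements only, and every base is equally likely); the example sequence is fixed so that the block runs on its own.

```python
import random

def mutate_n(dna, N):
    """Return a copy of `dna` after N random single-base replacements."""
    bases = list(dna)                  # strings are immutable, lists are not
    for _ in range(N):
        # pick a position; randint includes both 0 and len(bases) - 1
        mutation_site = random.randint(0, len(bases) - 1)
        new_base = random.choice("ATCG")
        bases[mutation_site] = new_base   # overwrite the old base
    return "".join(bases)              # turn the list back into a string

original = "AAACCACCACGCGTGCAGAG"
mutated = mutate_n(original, 5)
print("Original DNA: " + original)
print("Mutated DNA:  " + mutated)
```

The two sequences may differ at fewer than 5 positions: a replacement can pick the base that was already there, and the same position can be hit more than once.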


### Introduction to Tracking Genetic Mutations

#### Understanding the Evolution of DNA:

In this section of our module, we will delve into the concept of genetic mutation and its role in evolution. Mutations are changes in the DNA sequence of an organism. While the word "mutation" might sound alarming, mutations are a natural and essential part of life on Earth. They are one of the main sources of genetic variation, which is crucial for the survival and evolution of species.

Imagine a world where every creature is identical, with no variation at all. In such a world, a single change in the environment could wipe out entire species. Genetic variation, provided by mutations, gives populations the flexibility to adapt to changing environments, increasing their chances of survival.

#### Exploring Genetic Changes Through Python:

To help us understand mutations, we will simulate the process of mutation in a DNA sequence that represents our hypothetical ancestor. By doing this, we'll get a glimpse into how genetic diversity arises and how it can be visualized.

Here's what we're going to do:

1. **Start with a Sequence**: We'll begin with a starting DNA sequence representing our "ancestor", which we created earlier. It serves as the starting point for our simulation.

2. **Simulate Mutations**: Using the `mutate_n` function you wrote, we'll introduce mutations into this ancestral sequence. Specifically, we'll create 20 different "mutant" sequences, each produced by applying a set number of random changes (15 in the code below) to the original. This simulates how, over generations, DNA can change and diversify.

3. **Store and View the Results**: We'll keep track of our original sequence and the new mutant sequences using a Python dictionary. This will help us organize and store the different versions of our DNA. Finally, we'll visualize these sequences to compare them directly, giving us a clear view of the mutations and how they differ from the original ancestor sequence.

#### Visualizing Genetic Diversity:

The visualization step is crucial. It will allow us to see the similarities and differences between the original sequence and the mutated versions. This visual comparison is akin to what scientists do when they compare DNA sequences from different organisms or individuals to study their evolutionary relationships.
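
Alongside the graphical viewer, a plain-text comparison can make the same point. The sketch below is an illustration only: `count_differences` and `difference_markers` are hypothetical helper names, not part of `sequence_alignment_viewer`, and the two sequences are made up.

```python
def count_differences(seq_a, seq_b):
    """Count the positions where two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))

def difference_markers(seq_a, seq_b):
    """Return a string with '*' under each mismatch and ' ' elsewhere."""
    return "".join("*" if a != b else " " for a, b in zip(seq_a, seq_b))

ancestor = "ACGTACGTAC"
mutant = "ACGAACGTTC"

print(ancestor)
print(mutant)
print(difference_markers(ancestor, mutant))   # stars at positions 3 and 8
print(count_differences(ancestor, mutant))    # 2
```

Counting mismatches position by position only makes sense here because our simulated mutations replace bases without inserting or deleting any, so all sequences keep the same length.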

By completing this activity, you will gain a deeper understanding of genetic mutations, how they contribute to genetic diversity, and how they can be simulated and analyzed using programming.

Let's get started and see what happens when we introduce mutations into the DNA of our ancestral sequence. Keep an open mind, and observe how even small changes in DNA can lead to significant diversity.

In [11]:
# generate a random DNA sequence of a given length

# choose a length for the DNA sequence
seq_length = 40

# generate a random DNA sequence of the chosen length
starting_sequence = random_DNA(length=seq_length)
print("DNA sequence of our ancestor: " + starting_sequence)

# create a dictionary of sequences
sequences = {'Ancestor': starting_sequence}  
# add 20 mutant sequences to the dictionary
for i in range(20):
    seq_name = 'mutant_' + str(i)
    sequences[seq_name] = mutate_n(starting_sequence, 15)

# View the DNA sequences using a visualization tool
p = view_alignment(sequences, language='DNA', plot_width=1000)
pn.pane.Bokeh(p)

DNA sequence of our ancestor: AAACCACCACGCGTGCAGAGCGAGTGGCATAACGTTCAGG


[Interactive Bokeh plot: alignment view of the ancestor and the 20 mutant sequences]

Try playing with different numbers of mutations to see how it impacts the similarity of the DNA sequences.
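
One way to run that experiment without the viewer is to measure what percentage of positions still match the ancestor. The sketch below is self-contained, so it includes compact stand-ins for `random_DNA` and `mutate_n`; in the notebook you would use your own versions. The mutation counts in the loop are arbitrary choices.

```python
import random

def random_DNA(length):
    # stand-in for the function written earlier in this notebook
    return "".join(random.choice("ACGT") for _ in range(length))

def mutate_n(dna, N):
    # stand-in for the function written earlier in this notebook
    bases = list(dna)
    for _ in range(N):
        bases[random.randint(0, len(bases) - 1)] = random.choice("ACGT")
    return "".join(bases)

ancestor = random_DNA(40)

for n_mutations in (0, 5, 15, 40, 200):
    mutant = mutate_n(ancestor, n_mutations)
    # count the positions where the mutant still agrees with the ancestor
    matches = sum(a == b for a, b in zip(ancestor, mutant))
    identity = 100 * matches / len(ancestor)
    print(f"{n_mutations:>3} mutations: {identity:5.1f}% identical to the ancestor")
```

With very many mutations the similarity does not fall to zero. It tends to level off at around 25%, because two unrelated random sequences over four letters agree at about one position in four by chance.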