### Part I: Simulating DNA sequences

Welcome to the exciting world of biology and computer science! In this module, we will embark on a journey to understand the basics of genetic information and how it shapes the living organisms around us. DNA, or deoxyribonucleic acid, is at the heart of this exploration. It's a fascinating molecule that serves as the instruction manual for every organism on Earth.

DNA is structured as a long chain made up of smaller units called nucleotides. These are the critical building blocks of DNA, much like how letters form the building blocks of words. In the DNA "alphabet," there are four nucleotides: adenine (A), thymine (T), cytosine (C), and guanine (G). The order in which these nucleotides are arranged constitutes the genetic code of an organism, similar to how the arrangement of letters forms sentences and paragraphs that carry specific information.

The sequence of nucleotides in DNA influences traits ranging from the color of a flower to the height of a human being, often in combination with environmental factors. These sequences are passed from parents to offspring, carrying the instructions for the development, growth, and function of living things. Despite this incredible variety, all life on Earth shares the same fundamental DNA structure, highlighting a deep, unifying thread of biological connection.

In today's lesson, we're going to combine principles of biology with computer programming to simulate DNA sequences. By doing this, we'll gain a deeper understanding of how genetic information is structured and how we can analyze and compare genetic material from different organisms using computational methods.

Here's what we'll be doing in this exercise:

1. **Simulating DNA**: We will use programming concepts to create our own DNA sequences. This will involve generating long chains of the letters A, T, C, and G in random order. Real genomes are not uniformly random, but the result mimics how DNA sequences look when written out as text.

2. **Analysis and Comparison**: Once we have our simulated DNA strands, we will compare them to each other to see how they differ. This is similar to what scientists do when they are looking at the DNA from different species or individuals to understand genetic relationships and evolutionary history.

3. **Understanding Mutations**: We will introduce changes or 'mutations' into our DNA sequences and observe how even a few changes make related sequences diverge from their common ancestor.

By the end of this lesson, you will have a better understanding of how DNA works and how programming can be a powerful tool in the study of biology. So, grab your virtual lab coats and prepare to dive into the digital world of genetics! Let's embark on this adventure together and unlock the secrets of life’s code.

In [8]:
# load libraries

import random # helps us generate random numbers

from sequence_alignment_viewer import * # a program that lets us compare DNA sequences

### Coding Exercise: Generating Random DNA Sequences

#### Objective:
Your task is to write a Python function that generates a random DNA sequence of a given length. This exercise will help you understand the randomness and variety present in biological DNA sequences, as well as practice your skills in Python programming.

#### Instructions:

1. **Function Definition**: Begin by defining a function named `random_DNA`. This function should take one parameter, `length`, which represents the length of the DNA sequence to be generated.

2. **Initializing the Sequence**: Inside the function, create a variable named `DNA`. Initialize this variable with an empty string. This variable will be used to store the randomly generated DNA sequence.

3. **Generating the Sequence**:
    - Create a loop that iterates `length` times. Each iteration represents the addition of a new nucleotide to your DNA sequence.
    - Within the loop, add a randomly selected nucleotide to your DNA string. The nucleotides are represented by the letters 'C', 'G', 'T', and 'A'. Use the `random.choice()` function to select one of these letters at random. For this to work, the `random` module must be imported at the beginning of your script.

4. **Returning the Sequence**: After the loop finishes, return the `DNA` string. This string should now contain a sequence of nucleotides of the specified length.

#### Sample Code Structure:
```python
# Import the necessary module
import random

# Define the function to generate a random DNA sequence
def random_DNA(length):
    # Your code here
    pass  # placeholder so the template runs; replace it with your code

# Test your function
print(random_DNA(10))  # This should print a random DNA sequence of length 10
```

#### Task:
Complete the missing parts of the provided function template using Python code. Once you've written your function, test it by generating DNA sequences of different lengths, such as 5, 10, or 20, and print out the results.

#### Tips:
- Make sure you understand how the `for` loop works and how it uses the `range()` function.
- Remember to import the `random` module, which is required for `random.choice()`.
- Check the syntax and ensure there are no typos in your code. Python is case-sensitive and requires proper indentation.
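
If the two building blocks mentioned in the tips are unfamiliar, the short sketch below shows each one in isolation. The variable names (`steps`, `letters`, `pick`) are illustrative only and are not part of the exercise.

```python
import random

# range(3) yields 0, 1, 2, so the loop body runs exactly three times
steps = []
for count in range(3):
    steps.append(count)
print(steps)  # [0, 1, 2]

# random.choice() returns one randomly chosen element of a sequence;
# a string counts as a sequence of characters
letters = "CGTA"
pick = random.choice(letters)
print(pick)  # one of C, G, T or A; varies between runs
```

Combining the two ideas, a loop that runs `length` times and a random pick on each pass, is all the exercise requires.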

Complete this task in your Python environment, and observe the different DNA sequences your function generates. Reflect on how this randomness mirrors the diversity of life and its genetic underpinnings. Good luck!

In [9]:
# generate a random DNA sequence of a given length

def random_DNA(length):
    # create a variable to store the DNA sequence
    DNA=""
    # loop over the length of the sequence
    for count in range(length):
        # add a random base to the sequence
        DNA+=random.choice("CGTA")
    return DNA
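
A quick sanity check can confirm that the function behaves as intended. The sketch below repeats the function so it runs on its own, then checks the length, the alphabet, and the share of each base. The fixed seed and the length of 1000 are assumptions made only so the check is repeatable; they are not part of the exercise.

```python
import random
from collections import Counter

def random_DNA(length):
    # same approach as the cell above: append one random base per iteration
    DNA = ""
    for count in range(length):
        DNA += random.choice("CGTA")
    return DNA

random.seed(0)  # assumption: fixed seed, only so reruns give the same sequence
seq = random_DNA(1000)

print(len(seq))                 # 1000
print(set(seq) <= set("ACGT"))  # True: only the four bases appear

# with uniform random choice, each base should make up roughly a quarter
counts = Counter(seq)
for base in "ACGT":
    print(base, counts[base] / len(seq))
```

The roughly equal base frequencies are a property of this simulation; real genomes often deviate from them.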

In [10]:
# mutate a DNA sequence N times

def mutate_n(dna, N):
    # convert the DNA sequence to a list
    dna = list(dna)
    # loop over the number of mutations
    for i in range(N):
        # choose a random site to mutate
        mutation_site = random.randint(0, len(dna) - 1)
        # choose a random base to mutate to
        new_base = random.choice('ATCG')
        # mutate the DNA sequence
        dna[mutation_site] = new_base
    return ''.join(dna)
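
Note that `mutate_n` can pick the same site more than once and can draw a new base identical to the old one, so fewer than `N` positions may actually change. If every mutation should be visible, one possible variant is sketched below. The names `mutate_n_distinct` and `count_differences` are hypothetical helpers introduced here for illustration; they are not part of the original notebook.

```python
import random

def mutate_n_distinct(dna, N):
    # hypothetical variant: mutate N different sites, each to a different base
    dna = list(dna)
    # random.sample picks N distinct positions (requires N <= len(dna))
    for site in random.sample(range(len(dna)), N):
        # exclude the current base so the site always changes
        other_bases = [b for b in "ATCG" if b != dna[site]]
        dna[site] = random.choice(other_bases)
    return "".join(dna)

def count_differences(seq_a, seq_b):
    # number of positions at which two equal-length sequences differ
    return sum(a != b for a, b in zip(seq_a, seq_b))

original = "ACGT" * 10  # a 40-base example sequence
mutant = mutate_n_distinct(original, 10)
print(count_differences(original, mutant))  # 10
```

Which behavior is preferable depends on the goal: the original version is simpler, while this variant makes the number of observed differences equal to `N`.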

In [11]:
# simulate an ancestor sequence and ten mutated copies of it

# choose a length for the DNA sequence
seq_length = 40

# generate a random DNA sequence of the chosen length
seq1 = random_DNA(length=seq_length)
print("DNA sequence of our ancestor: " + seq1)

# create a dictionary of sequences
sequences = {'Ancestor': seq1}  
# add 10 mutant sequences to the dictionary
for i in range(10):
    seq_name = 'mutant_' + str(i)
    sequences[seq_name] = mutate_n(seq1, 10)

# view the DNA sequences (pn is assumed to be Panel, made available by the star import above)
p = view_alignment(sequences, language='DNA', plot_width=1000)
pn.pane.Bokeh(p)

DNA sequence of our ancestor: AAACCACCACGCGTGCAGAGCGAGTGGCATAACGTTCAGG


[Interactive alignment plot of the ancestor and mutant sequences]

Try changing the number of mutations (the second argument to `mutate_n`) to see how it affects the similarity of the DNA sequences.
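
One way to put a number on similarity is percent identity: the share of positions at which two sequences carry the same base. The sketch below repeats the two functions from the cells above so it runs on its own. The helper `percent_identity` and the list of mutation counts are assumptions introduced for illustration; they are not part of `sequence_alignment_viewer`.

```python
import random

def random_DNA(length):
    # build a random sequence one base at a time
    DNA = ""
    for count in range(length):
        DNA += random.choice("CGTA")
    return DNA

def mutate_n(dna, N):
    # apply N random substitutions (sites and bases may repeat)
    dna = list(dna)
    for i in range(N):
        mutation_site = random.randint(0, len(dna) - 1)
        dna[mutation_site] = random.choice("ATCG")
    return "".join(dna)

def percent_identity(seq_a, seq_b):
    # hypothetical helper: share of positions with the same base, in percent
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100 * matches / len(seq_a)

ancestor = random_DNA(40)
for n_mutations in [0, 5, 10, 20, 40]:
    mutant = mutate_n(ancestor, n_mutations)
    print(n_mutations, percent_identity(ancestor, mutant))
```

Because `mutate_n` only substitutes bases and never inserts or deletes them, the sequences keep the same length and can be compared position by position. The printed values vary between runs, but identity should tend to fall as the number of mutations grows.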