# Python Basics in Computational Genomics

Computational genomics, a field closely tied to bioinformatics, uses computational and statistical techniques to decipher biological processes. Python, with its rich ecosystem of libraries and tools, is a popular language for this work.

In this notebook, we will cover some basic Python concepts and apply them to simple computational genomics tasks.

# Environment Setup

In [None]:
import os
from collections import Counter
import logging
import sys
from pathlib import Path
from dotenv import load_dotenv

In [None]:
def find_comp_gen_dir():
    """Find the computational_genetic_genealogy directory by searching up from current directory."""
    current = Path.cwd()
    
    # Search up through parent directories
    while current != current.parent:
        # Check if target directory exists in current path
        target = current / 'computational_genetic_genealogy'
        if target.is_dir():
            return target
        # Move up one directory
        current = current.parent
    
    raise FileNotFoundError("Could not find computational_genetic_genealogy directory")

def load_env_file():
    """Find and load the .env file from the computational_genetic_genealogy directory."""
    try:
        # Find the computational_genetic_genealogy directory
        comp_gen_dir = find_comp_gen_dir()
        
        # Look for .env file
        env_path = comp_gen_dir / '.env'
        if not env_path.exists():
            print(f"Warning: No .env file found in {comp_gen_dir}")
            return None
        
        # Load the .env file
        load_dotenv(env_path, override=True)
        print(f"Loaded environment variables from: {env_path}")
        return env_path
        
    except FileNotFoundError as e:
        print(f"Error: {e}")
        return None

# Use the function
env_path = load_env_file()

In [None]:
working_directory = os.getenv('PROJECT_WORKING_DIR', default=None)
data_directory = os.getenv('PROJECT_DATA_DIR', default=None)
references_directory = os.getenv('PROJECT_REFERENCES_DIR', default=None)
results_directory = os.getenv('PROJECT_RESULTS_DIR', default=None)
utils_directory = os.getenv('PROJECT_UTILS_DIR', default=None)

print(f"Working Directory: {working_directory}")
print(f"Data Directory: {data_directory}")
print(f"References Directory: {references_directory}")
print(f"Results Directory: {results_directory}")
print(f"Utils Directory: {utils_directory}")

# Practice

## Built-in Python Methods

Python comes with a rich set of built-in functions and methods that are always available for use. These functions provide essential functionality and are an integral part of the Python language. They are designed to simplify common tasks, making it easier for developers to write code without having to reinvent the wheel.

### The `print()` Function

One of the most commonly used built-in functions in Python is the `print()` function. It is used to display information to the console. The `print()` function can take multiple arguments, and these arguments can be of various data types, including strings, numbers, and other objects.

The `print()` function also supports various formatting options, allowing developers to display structured data in a more readable manner. For example, using f-strings (formatted string literals) in Python 3.6 and above, developers can embed expressions inside string literals.

Beyond its basic usage, the `print()` function serves two critical roles in programming:

1. **Monitoring Progress:** Especially in long-running scripts or loops, using `print()` statements can provide feedback about the progress of the code. This can be invaluable in ensuring that a script is running as expected.

2. **Troubleshooting and Debugging:** When encountering unexpected behavior or errors, `print()` can be used to display variable values, flow control checkpoints, or other relevant information to help identify the source of the issue.

Let's see some examples of how to use the `print()` function in different scenarios.
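
As a minimal sketch of the points above, the following shows `print()` with multiple arguments, the `sep` keyword argument, an f-string, and progress messages inside a loop. The names `sample_id`, `n_variants`, and `chromosomes` are hypothetical values chosen only for illustration.

```python
# Hypothetical sample data, used only for illustration
sample_id = 'sample_01'
n_variants = 1250

# Multiple arguments are separated by a space by default
print('Sample:', sample_id, 'Variants:', n_variants)

# The sep keyword argument changes the separator between arguments
print('A', 'C', 'G', 'T', sep='-')

# An f-string embeds expressions directly in the text
progress_message = f'Processed {n_variants} variants for {sample_id}'
print(progress_message)

# Monitoring progress inside a loop
chromosomes = ['chr1', 'chr2', 'chr3']
for number, chromosome in enumerate(chromosomes, start=1):
    print(f'[{number}/{len(chromosomes)}] Processing {chromosome}...')
```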

## Comments in Python

Comments are an essential tool for developers to annotate and explain their code. They provide context and make the code more readable for others (or even for the same developer revisiting the code later). In Python, comments are not executed by the interpreter, so they don't affect the program's output.

There are two primary types of comments in Python:

1. **Single-line comments:** These are used for brief annotations and are prefixed with the `#` symbol. Everything after the `#` on that line is considered a comment.

2. **Multi-line comments:** Python does not have a specific syntax for multi-line comments. However, developers often use triple quotes (`'''` or `"""`) to write comments that span multiple lines. While this is technically a multi-line string, it acts as a comment when not assigned to a variable or used in an expression.

Let's see some examples of how to use single-line and multi-line comments in Python.

In [None]:
print('Hello, World!')

In [None]:
# This is a single-line comment
print('Notice that the commented line in the cell did not execute.')

In [None]:
'''
This is a multi-line comment.
It spans multiple lines.
'''
print('Again, notice that the commented lines in the cell did not execute.')

In [None]:
# Example 2: Using print() with a different message
print("Welcome to Computational Genomics with Python.")

## Using Single and Double Quotes in Python Strings

In Python, strings can be defined using either single (`'`) or double (`"`) quotes. Both are valid, and the choice often comes down to personal preference or specific use cases. However, there are scenarios where one might be more advantageous than the other.

For instance, compare two of the `print()` examples above: `print('Hello, World!')` uses single quotes, while `print("Welcome to Computational Genomics with Python.")` uses double quotes.

### Why choose one over the other?

1. **Presence of Quotes Inside the String:** If your string contains a single quote (e.g., `I'm learning Python`), it's easier to wrap the string in double quotes to avoid having to escape the single quote. Conversely, if your string contains double quotes (e.g., `He said, "Python is amazing!"`), you can use single quotes.

2. **Consistency:** Some developers prefer to stick to one type of quote throughout their code for consistency. This is purely a stylistic choice.

3. **Escape Sequences:** If you need to include both single and double quotes in your string, you can use escape sequences (`\'` or `\"`).

In the following cells, we'll see examples of using both single and double quotes in the `print()` command.

In [None]:
# Example using single quotes for the string
print('This is a string using single quotes.')

In [None]:
# Example using double quotes for the string
print("This is a string using double quotes.")

In [None]:
# Example using both single and double quotes in the string
print('He said, "Python is amazing!"')

In [None]:
# Example using escape sequences to include both single and double quotes in the string
print('They said, "I\'m learning Python."')

## Variables and Data Types in Python

In Python, a variable allows you to store a value by assigning it to a name, which can be used to refer to the value later in the program. The value stored in a variable can be of various types, such as numbers, strings, lists, and more. These types are known as data types.

Understanding data types is crucial because Python treats data differently based on its type. For instance, the operations you can perform on a number (like addition or subtraction) are different from the operations you can perform on a string (like concatenation).

In the context of genetic similarity, relatedness, and pedigrees, variables play a pivotal role in storing and manipulating genetic data. Let's explore some examples:

1. **Genotype Data:** A genotype can be represented as a string, with alleles being represented by letters (e.g., 'AA', 'Aa').

2. **DNA Sequence:** A DNA sequence can also be represented as a string, consisting of the nucleotide bases (e.g., 'ATCGATCGA').

3. **Pedigree Information:** Pedigree data can be stored in more complex data structures, like dictionaries or lists, where each entry represents an individual and their relationships to other individuals.
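
The pedigree case can be sketched with a nested dictionary. This is only one possible layout, and the individual IDs (`I1`, `I2`, `I3`) and field names (`father`, `mother`, `genotype`) are assumptions made for illustration, not a standard format.

```python
# Hypothetical three-person pedigree: each key is an individual ID,
# and each value records that individual's parents and genotype.
pedigree = {
    'I1': {'father': None, 'mother': None, 'genotype': 'AA'},
    'I2': {'father': None, 'mother': None, 'genotype': 'Aa'},
    'I3': {'father': 'I1', 'mother': 'I2', 'genotype': 'Aa'},
}

# Look up the child's record, then follow the link to the father
child = pedigree['I3']
father_id = child['father']
father_genotype = pedigree[father_id]['genotype']

print(f"I3's father is {father_id} with genotype {father_genotype}")
```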

Let's see some examples of variables in the context of genetics:

In [None]:
# Example of using variables with genotype data
individual_1_genotype = 'AA'
individual_2_genotype = 'Aa'
individual_3_genotype = 'aa'

print(individual_1_genotype)
print(individual_2_genotype)
print(individual_3_genotype)

In [None]:
# Defining a DNA sequence
dna_sequence = 'ATGCGTAACGTCGTA'
print(dna_sequence)

In [None]:
# Example of a list
genotypes_list = ['AA', 'Aa', 'aa', 'Aa', 'AA']

print(genotypes_list)

In [None]:
# Example of a dictionary
family_genotypes = {
    'Father': 'AA',
    'Mother': 'Aa',
    'Child1': 'Aa',
    'Child2': 'AA'
}

print(family_genotypes)

## Explanation of f-strings
In Python, f-strings, also known as formatted string literals, are a way to embed expressions inside string literals. They are prefixed with an 'f' character and use curly braces {} to enclose the expressions. The expressions inside the curly braces are evaluated at runtime and then formatted using the specified format string.

In bioinformatics, f-strings are particularly useful because they allow for dynamic generation of strings based on variable values. This is especially handy when generating output messages, file names, or any other string that needs to incorporate variable data. For example, when analyzing genetic data, one might want to generate a message that includes the specific genotype of an individual. Using f-strings makes this process concise and readable.
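
A small sketch of both uses mentioned here, building a file name and formatting a number, is shown below. The values of `sample_id`, `chromosome`, and `gc_value` are hypothetical. The part after the colon inside the braces is a format specifier.

```python
# Hypothetical values, used only for illustration
sample_id = 'sample_01'
chromosome = 7
gc_value = 0.4137

# Build an output file name from variables
output_name = f'{sample_id}_chr{chromosome}.txt'
print(output_name)

# A format specifier after the colon controls how the value is shown
gc_two_decimals = f'{gc_value:.2f}'      # two decimal places
gc_percent = f'{gc_value:.1%}'           # percentage with one decimal place
padded_label = f'chr{chromosome:02d}'    # zero-padded to two digits

print(gc_two_decimals, gc_percent, padded_label)
```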

In the following examples, we will reuse the variables that we assigned earlier to demonstrate f-strings.

In [None]:
print(f'Genotype of Individual 1: {individual_1_genotype}')
print(f'Genotype of Individual 2: {individual_2_genotype}')
print(f'Genotype of Individual 3: {individual_3_genotype}')

In [None]:
dna_sequence = 'ATGCGTA'
print(f'The DNA sequence is: {dna_sequence}')

In [None]:
print(f'The list of genotypes is: {genotypes_list}')

In [None]:
print(f'Family Genotypes: {family_genotypes}')

## Operations on Variables and Importance of Data Types

Variables in Python are not just placeholders for data; we can also perform operations on them. The type of operations we can perform depends on the data type of the variable. For instance, we can add two numbers, concatenate two strings, or even combine lists. However, trying to perform an inappropriate operation for a given data type will result in an error. This is why understanding data types is crucial. In the context of computational genomics, we often deal with large datasets, and using variables efficiently can help streamline our analyses.
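
To make the point about inappropriate operations concrete, here is a minimal sketch. The variable names `allele_count` and `sequence_part` are hypothetical, and the `try`/`except` block is used only so the error message can be displayed without stopping the cell.

```python
# Hypothetical values, used only for illustration
allele_count = 4
sequence_part = 'ATG'

print(allele_count + 2)          # adding two numbers
print(sequence_part + 'CGT')     # concatenating two strings
print(['AA', 'Aa'] + ['aa'])     # combining two lists

# Mixing a string and a number with + raises a TypeError
try:
    result = sequence_part + allele_count
except TypeError as error:
    result = None
    print(f'TypeError: {error}')

# Converting the number to a string first makes the operation valid
label = sequence_part + str(allele_count)
print(label)
```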

In [None]:
# Get the data type by using the built-in type() function
print(f'Data type of sequence: {type(dna_sequence)}')
print(f'Data type of genotypes_list: {type(genotypes_list)}')

In [None]:
# The built-in len() function returns the number of items in a value.
print(f'The DNA sequence: {dna_sequence}')
print(f'Length of dna_sequence: {len(dna_sequence)}')
print(f"Notice that there are {len(dna_sequence)} characters (in this case, nucleotides) in the DNA sequence.")
print('\n')
print(f'The genotype list: {genotypes_list}')
print(f'Length of genotypes_list: {len(genotypes_list)}')
print(f"Notice that there are {len(genotypes_list)} elements (in this case, genotypes) in the list.")

## Accessing Elements in a List

In Python, you can access individual elements in a list using their index.

**Python Counting Starts at 0**

In many programming languages, including Python, indexing starts at 0 rather than 1. This is fundamental to understand, especially when working with data structures like lists, arrays, and strings.

For example, consider a list `['A', 'C', 'G', 'T']`:
- The first element, 'A', is at position 0.
- The second element, 'C', is at position 1.
- The third element, 'G', is at position 2.
- The fourth element, 'T', is at position 3.

It's essential to keep this in mind when accessing elements in a list or when using loops and other programming constructs that rely on indexing.

**Using Brackets to Access Elements**

In Python, the square brackets `[]` are used to access the elements in a list. The index of the element you want to access is placed inside the brackets.

```python
my_list = ['A', 'C', 'G', 'T']
# Accessing the first element (index 0)
print(my_list[0])  # Output: 'A'
```

The brackets signify that you are trying to index into the list, and the number inside the brackets specifies which element you want to retrieve.

In [None]:
# Accessing Various Positions in Strings and Lists
print(f'Full dna_sequence: {dna_sequence}')
print(f'First nucleotide in dna_sequence dna_sequence[0]: {dna_sequence[0]}')
print(f'Second nucleotide in dna_sequence dna_sequence[1]: {dna_sequence[1]}')
print(f'Third nucleotide in dna_sequence dna_sequence[2]: {dna_sequence[2]}')
print(f'Last nucleotide in dna_sequence dna_sequence[-1]: {dna_sequence[-1]}')

print('\n')

print(f'Full genotypes_list: {genotypes_list}')
print(f'First genotype in genotypes_list genotypes_list[0]: {genotypes_list[0]}')
print(f'Second genotype in genotypes_list genotypes_list[1]: {genotypes_list[1]}')
print(f'Third genotype in genotypes_list genotypes_list[2]: {genotypes_list[2]}')
print(f'Last genotype in genotypes_list genotypes_list[-1]: {genotypes_list[-1]}')

print('\n')

print(f'Full individual_2_genotype: {individual_2_genotype}')
print(f'First allele in individual_2_genotype individual_2_genotype[0]: {individual_2_genotype[0]}')
print(f'Second allele in individual_2_genotype individual_2_genotype[1]: {individual_2_genotype[1]}')
print(f'Last allele in individual_2_genotype individual_2_genotype[-1]: {individual_2_genotype[-1]}')

## Understanding Data Length and Subsetting
Earlier, we determined the length of our data structures. The length of a data structure, whether it's a string like `dna_sequence` or a list like `genotypes_list`, gives us an idea of how many elements or characters it contains. This is especially useful in bioinformatics when dealing with long DNA sequences or large datasets.

Subsetting, also known as slicing, allows us to extract specific portions of our data. For instance, if we wanted to analyze only a segment of a DNA sequence or a subset of genotypes from a larger list, slicing would be the method of choice. This is a fundamental operation in Python and is crucial for data manipulation in bioinformatics.
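
As a quick illustration of why this matters for sequence data, the sketch below pulls individual codons (three-base units) out of a sequence. The sequence `example_sequence` is a hypothetical value chosen for illustration.

```python
# Hypothetical sequence, used only for illustration
example_sequence = 'ATGCGTAACGTC'

first_codon = example_sequence[0:3]    # characters at indices 0, 1, 2
second_codon = example_sequence[3:6]   # characters at indices 3, 4, 5
last_codon = example_sequence[-3:]     # the final three characters

print(first_codon, second_codon, last_codon)

# Slicing returns a new string and leaves the original unchanged
print(example_sequence)
```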

In [None]:
# Subsetting dna_sequence
# To get a subset of a string or list, you can use slicing. The syntax is sequence[start:stop].
# This returns a new string (or list) with the elements from index 'start' up to, but not including, 'stop'.
subset_dna_sequence = dna_sequence[0:3]
print(f'Subset of first 3 nucleotides in dna_sequence: {subset_dna_sequence}')

# You can also omit the start or stop index to slice from the beginning or to the end.
subset_dna_sequence = dna_sequence[:3]
print(f'Subset of first 3 nucleotides in dna_sequence using omitted start index: {subset_dna_sequence}')

subset_dna_sequence = dna_sequence[3:]
print(f'Subset of nucleotides from index 3 to end in dna_sequence: {subset_dna_sequence}')

In [None]:
# Subsetting genotypes_list
# Similar to dna_sequence, you can subset genotypes_list using slicing.
# Here, we get the first 3 genotypes from the list.
subset_genotypes_list = genotypes_list[0:3]
print(f'Subset of first 3 genotypes in genotypes_list: {subset_genotypes_list}')

# You can also use negative indices to slice from the end of the list.
# This will get the last 3 genotypes from the list.
subset_genotypes_list = genotypes_list[-3:]
print(f'Subset of last 3 genotypes in genotypes_list: {subset_genotypes_list}')

# To get genotypes from index 2 to 4 (inclusive), you can use the following slice.
subset_genotypes_list = genotypes_list[2:5]
print(f'Subset of genotypes from index 2 to 4 in genotypes_list: {subset_genotypes_list}')

In [None]:
# Advanced Subsetting with Steps
# You can also specify a step value to skip elements while slicing.
# The syntax is list[start:stop:step].

# Here, we get every 2nd nucleotide from the first 6 nucleotides in dna_sequence.
subset_dna_sequence = dna_sequence[0:6:2]
print(f'Subset of every 2nd nucleotide from the first 6 in dna_sequence: {subset_dna_sequence}')

# Similarly, we can get every 2nd genotype from genotypes_list.
# The list has only 5 elements; a stop index past the end (6) is allowed and is treated as the end.
subset_genotypes_list = genotypes_list[0:6:2]
print(f'Subset of every 2nd genotype in genotypes_list: {subset_genotypes_list}')

# You can also use negative step values to reverse the list.
reversed_dna_sequence = dna_sequence[::-1]
print(f'Reversed dna_sequence: {reversed_dna_sequence}')

reversed_genotypes_list = genotypes_list[::-1]
print(f'Reversed genotypes_list: {reversed_genotypes_list}')

## Understanding the Dictionary Data Structure

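In Python, a dictionary is a mutable collection of key-value pairs. Since Python 3.7, dictionaries preserve the order in which keys were inserted, but values are looked up by key rather than by position. Each key must be unique and hashable, which in practice means an immutable type such as a string, a number, or a tuple of immutable values. The values can be of any type, including other dictionaries or lists.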

Dictionaries are defined using curly braces `{}` with key-value pairs separated by colons `:`. Multiple key-value pairs are separated by commas `,`.

Here's an example of a dictionary that stores genotypes for a family:

In [None]:
family_genotypes = {
    'Father': 'AA',
    'Mother': 'Aa',
    'Child1': 'Aa',
    'Child2': 'AA'
}
print(family_genotypes)

### Accessing Values
You can access the value associated with a specific key using square brackets `[]`.
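
One detail worth knowing is what happens when a key is missing. The sketch below uses a separate, hypothetical dictionary named `example_genotypes` to show the `KeyError` raised by square brackets, the `get()` method with a default value, and the `in` operator.

```python
# Hypothetical dictionary, used only for illustration
example_genotypes = {'Father': 'AA', 'Mother': 'Aa'}

# Square brackets raise a KeyError when the key is missing
try:
    example_genotypes['Child5']
except KeyError as error:
    print(f'Missing key: {error}')

# The get() method returns a default value instead of raising an error
child_genotype = example_genotypes.get('Child5', 'unknown')
print(child_genotype)

# The in operator checks whether a key is present
has_father = 'Father' in example_genotypes
print(has_father)
```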

In [None]:
father_genotype = family_genotypes['Father']
mother_genotype = family_genotypes['Mother']
print(f"Father's genotype: {father_genotype}")
print(f"Mother's genotype: {mother_genotype}")

## Modifying a Dictionary
Dictionaries are mutable, meaning you can change their elements. You can add new key-value pairs, modify existing ones, or delete key-value pairs.

In [None]:
family_genotypes['Father'] = 'Aa'
father_genotype = family_genotypes['Father']
print(f"Father's genotype: {father_genotype}")

### Adding and Removing Elements
You can add a new key-value pair by assigning a value to a new key. To remove a key-value pair, you can use the `del` keyword.

In [None]:
print("family_genotypes dictionary")
print(family_genotypes)
family_genotypes['Child3'] = 'aa'
family_genotypes['Father'] = 'Aa'
del family_genotypes['Child1']

print('\n')

print("family_genotypes dictionary after changes")
print(family_genotypes)

## Count Method and Variable.Method() Syntax
In Python, objects like strings, lists, and dictionaries come with a set of built-in methods. These methods are functions that are associated with the object and can be called on it to perform specific operations.
The syntax for calling a method on an object (or variable) is `variable.method()`. The `.` indicates that we are accessing a method (or attribute) of the object.
One such method for strings is the `count()` method. It returns the number of non-overlapping occurrences of a specific substring within the string.
For example, if we want to count the number of occurrences of the letter 'A' in a DNA sequence, we can use the `count()` method on the DNA sequence string.
Let's see it in action:

In [None]:
# Counting nucleotides in the DNA sequence

# Count the number of Adenine nucleotides
adenine_count = dna_sequence.count('A')

# Count the number of Cytosine nucleotides
cytosine_count = dna_sequence.count('C')

# Count the number of Guanine nucleotides
guanine_count = dna_sequence.count('G')

# Count the number of Thymine nucleotides
thymine_count = dna_sequence.count('T')

# Print the counts using f-strings
print(f'Adenine (A) count: {adenine_count}')
print(f'Cytosine (C) count: {cytosine_count}')
print(f'Guanine (G) count: {guanine_count}')
print(f'Thymine (T) count: {thymine_count}')

### Using the count method on a list
The `count` method can also be used on lists to determine the number of times a specific element appears in the list. The syntax is similar to that of strings:
```python
list_name.count(element)
```
Where `list_name` is the name of the list and `element` is the item you want to count in the list.
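
The string and list versions of `count` can be combined. As a rough sketch, the example below estimates an allele frequency by joining the genotypes into one string with `str.join()` (a string method not covered above) and counting each allele. The list `example_genotypes_list` is a hypothetical value chosen for illustration.

```python
# Hypothetical genotype list, used only for illustration
example_genotypes_list = ['AA', 'Aa', 'aa', 'Aa', 'AA']

# Join the genotypes into one string of alleles: 'AAAaaaAaAA'
all_alleles = ''.join(example_genotypes_list)

# str.count() is case-sensitive, so 'A' and 'a' are counted separately
dominant_allele_count = all_alleles.count('A')
recessive_allele_count = all_alleles.count('a')

dominant_frequency = dominant_allele_count / len(all_alleles)
print(f'A alleles: {dominant_allele_count}, a alleles: {recessive_allele_count}')
print(f'Frequency of A: {dominant_frequency}')
```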

In [None]:
# Counting the occurrences of each genotype
homozygous_dominant_count = genotypes_list.count('AA')
heterozygous_count = genotypes_list.count('Aa')
homozygous_recessive_count = genotypes_list.count('aa')

# Printing the counts
print(f'genotypes_list: {genotypes_list}')
print(f'Homozygous Dominant (AA) count: {homozygous_dominant_count}')
print(f'Heterozygous (Aa) count: {heterozygous_count}')
print(f'Homozygous Recessive (aa) count: {homozygous_recessive_count}')

# Understanding Functions in Python

In Python, a function is a reusable block of code that performs a specific task. Functions are essential for code reusability and organization.

## Defining a Function
You define a function using the `def` keyword, followed by the function name, parentheses `()`, and a colon `:`. The code block within the function is indented.

```python
def greet():
    print('Hello, world!')
```

## Calling a Function
To execute the code inside a function, you call the function by its name followed by parentheses.

```python
greet()  # Output: Hello, world!
```

## Function Parameters
Functions can take parameters, which are variables that you pass into the function. You specify parameters inside the parentheses when defining the function.

```python
def greet(name):
    print(f'Hello, {name}!')
```

## Function Return Values
Functions can also return values using the `return` keyword. The function stops executing after the `return` statement.

```python
def add(a, b):
    return a + b
```

## Function Scope
Variables defined inside a function are local to that function and cannot be accessed outside the function. However, you can pass them as return values to access them outside the function.
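
A short sketch of this behavior follows. The function `count_gc` and its local variable `gc_total` are hypothetical names used only for illustration, and the example assumes that no variable called `gc_total` has been defined elsewhere in the notebook.

```python
def count_gc(sequence):
    # gc_total exists only while the function is running
    gc_total = sequence.count('G') + sequence.count('C')
    return gc_total

# The returned value is available outside the function
gc_result = count_gc('ATGCGC')
print(gc_result)

# The local name gc_total is not defined out here
try:
    print(gc_total)
except NameError as error:
    print(f'NameError: {error}')
```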

## Function Documentation
You can add a documentation string (docstring) as the first statement of a function, using triple quotes (`'''` or `"""`), to describe what the function does.

```python
def greet(name):
    '''This function greets the person passed in as a parameter.'''
    print(f'Hello, {name}!')
```
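
The docstring is stored with the function, so it can be read back later. The sketch below uses a hypothetical function named `gc_fraction` and reads its docstring through the `__doc__` attribute; calling the built-in `help()` on the function displays the same text.

```python
def gc_fraction(sequence):
    """Return the fraction of G and C bases in a DNA sequence."""
    gc_total = sequence.count('G') + sequence.count('C')
    return gc_total / len(sequence)

# The docstring is available as the function's __doc__ attribute
print(gc_fraction.__doc__)

# The function itself works as usual
print(gc_fraction('GGCC'))
```

Note that this sketch does not handle an empty sequence, which would cause a division by zero.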

In [None]:
# Defining a simple function to greet
def greet():
    print('Hello, world!')

# Calling the function
greet()

In [None]:
# Function with a parameter
def greet(name):
    print(f'Hello, {name}!')

# Calling the function with a parameter
greet('Alice')

Notice that you must first define a function before you can call it. In the above examples, we created a function called `greet`.

In [None]:
# Function with a return value
def add(a, b):
    return a + b

# Calling the function and storing the return value
result = add(5, 3)
print(f'The sum is: {result}')

In [None]:
# More Code Examples to Explain Functions

# Function with default parameters
def greet(name='World'):
    print(f'Hello, {name}!')

# Calling the function with and without parameters
greet()
greet('Alice')

# Function with multiple return values
def coordinates():
    return 40.7128, -74.0060

# Unpacking multiple return values
lat, lon = coordinates()
print(f'Latitude: {lat}, Longitude: {lon}')

In [None]:
# Function with variable number of arguments
def sum_all(*args):
    return sum(args)

# Calling the function with different number of arguments
print(sum_all(1, 2, 3))  # Output: 6
print(sum_all(1, 2, 3, 4, 5))  # Output: 15

# Function with keyword arguments
def print_info(**kwargs):
    for key, value in kwargs.items():
        print(f'{key}: {value}')

# Calling the function with keyword arguments
print_info(name='Alice', age=30, email='alice@email.com')

In [None]:
# Function with both positional and keyword arguments
def display_info(name, age, **kwargs):
    print(f'Name: {name}')
    print(f'Age: {age}')
    for key, value in kwargs.items():
        print(f'{key}: {value}')

# Calling the function with both positional and keyword arguments
display_info('Bob', 40, email='bob@email.com', country='USA')

# Understanding Classes in Python

In Python, a class is a blueprint for creating objects. Classes encapsulate data and behavior that operate on the data. They are a fundamental concept in object-oriented programming (OOP).

## Key Concepts
- **Class Definition**: A class is defined using the `class` keyword.
- **Object**: An instance of a class.
- **Attributes**: Variables that belong to the class or to its instances.
- **Methods**: Functions that belong to the class.
- **Constructor**: A special method called `__init__` used for initializing objects.
- **Inheritance**: A way to form new classes using classes that have already been defined.
- **Encapsulation**: Hiding the private details of a class from other objects.
- **Polymorphism**: The ability of different objects to be treated as objects of a common superclass.

In [None]:
# Defining a Simple Class for DNA Sequence
class DNASequence:
    def __init__(self, sequence):
        self.sequence = sequence

    def display_sequence(self):
        print(f'DNA Sequence: {self.sequence}')

# Creating an Object of the Class
human_dna = DNASequence('ATCGGCTA')

# Calling a Method of the Object
human_dna.display_sequence()

In [None]:
# Class with Constructor for DNA Sequence and Organism
class DNASequence:
    def __init__(self, sequence, organism):
        self.sequence = sequence
        self.organism = organism

# Creating an Object with Constructor
human_dna = DNASequence(sequence='ATCGGCTA', organism='Homo sapiens')

# Accessing Object Attributes
print(f'Sequence: {human_dna.sequence}, Organism: {human_dna.organism}')

In [None]:
# Class with Multiple Methods for DNA Analysis
class DNASequence:
    def __init__(self, sequence, organism):
        self.sequence = sequence
        self.organism = organism

    def display_sequence(self):
        print(f'DNA Sequence: {self.sequence}')

    def display_organism(self):
        print(f'Organism: {self.organism}')

# Creating an Object and Calling Methods
human_dna = DNASequence(sequence='ATCGGCTA', organism='Homo sapiens')
human_dna.display_sequence()
human_dna.display_organism()

In [None]:
# Class Inheritance in Genomics
class Sequence:
    def __init__(self, sequence):
        self.sequence = sequence

    def display_sequence(self):
        print(f'Sequence: {self.sequence}')

class DNASequence(Sequence):
    def display_sequence(self):
        print(f'DNA Sequence: {self.sequence}')

# Creating Object of Subclass and Calling Overridden Method
human_dna = DNASequence('ATCGGCTA')
human_dna.display_sequence()

In [None]:
# Encapsulation in Genomics: Private and Public Attributes
class DNASequence:
    def __init__(self, sequence, organism):
        self.sequence = sequence  # Public Attribute
        self.__organism = organism  # "Private" attribute (name-mangled to _DNASequence__organism)

    def get_organism(self):
        return self.__organism

# Creating Object and Accessing Attributes
human_dna = DNASequence(sequence='ATCGGCTA', organism='Homo sapiens')
print(f'Sequence: {human_dna.sequence}')
print(f'Organism: {human_dna.get_organism()}')

In [None]:
# Polymorphism in Genomics: Using Methods from Different Classes
class Sequence:
    def display(self):
        print('This is a generic sequence.')

class DNASequence(Sequence):
    def display(self):
        print('This is a DNA sequence.')

class RNASequence(Sequence):
    def display(self):
        print('This is an RNA sequence.')

# Using Polymorphism
def sequence_display(seq):
    seq.display()

sequence_display(DNASequence())
sequence_display(RNASequence())

In [None]:
# Class Attributes vs Instance Attributes in Genomics
class DNASequence:
    sequence_type = 'DNA'  # Class Attribute
    def __init__(self, sequence):
        self.sequence = sequence  # Instance Attribute

# Creating Object and Accessing Attributes
human_dna = DNASequence('ATCGGCTA')
print(f'Sequence Type: {DNASequence.sequence_type}')
print(f'Sequence: {human_dna.sequence}')

## Understanding For Loops in Python
In Python, a `for` loop is used for iterating over an iterable, such as a list, a tuple, a dictionary, a set, or a string. It executes a block of code once for each element. In the context of genomics, `for` loops can be particularly useful for iterating through DNA sequences, comparing genotypes, or calculating relatedness among individuals.

In [None]:
# For Loop to Iterate Through a DNA Sequence
dna_sequence = 'ATCGGCTA'
for base in dna_sequence:
    print(f'Base: {base}')

In [None]:
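# Comparing DNA Sequences for Relatedness (percent of matching positions)
# Note: zip() stops at the end of the shorter sequence, so the sequences should have equal length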
dna_sequence1 = 'ATCGGCTA'
dna_sequence2 = 'ATCGACTA'

match_count = 0

for base1, base2 in zip(dna_sequence1, dna_sequence2):
    if base1 == base2:
        match_count += 1

relatedness = (match_count / len(dna_sequence1)) * 100
print(f'Relatedness between sequences: {relatedness}%')

In [None]:
# For Loop to Count Frequency of Each Base in a DNA Sequence
from collections import Counter

dna_sequence = 'ATCGGCTA'
base_count = Counter(dna_sequence)

for base, count in base_count.items():
    print(f'Base {base} appears {count} times.')

In [None]:
# Iterating Through a Dictionary of Genotypes
family_genotypes = {'Father': 'AA', 'Mother': 'Aa', 'Child1': 'Aa', 'Child2': 'AA'}

for member, genotype in family_genotypes.items():
    print(f'{member} has genotype {genotype}')

In [None]:
# For Loop to Compare Two DNA Sequences for Similarity
dna_sequence1 = 'ATCGGCTA'
dna_sequence2 = 'ATCGACTA'
similar_bases = 0

for base1, base2 in zip(dna_sequence1, dna_sequence2):
    if base1 == base2:
        similar_bases += 1

print(f'Number of similar bases: {similar_bases}')

In [None]:
# Using Enumerate to Get Index and Value
dna_sequences = ['ATCG', 'TGCA', 'CGAT', 'GACT']

for index, seq in enumerate(dna_sequences):
    print(f'Sequence {index+1}: {seq}')

In [None]:
# For Loop to Calculate GC Content in a DNA Sequence
dna_sequence = 'ATCGGCTA'
gc_content = 0

for base in dna_sequence:
    if base in ['G', 'C']:
        gc_content += 1

gc_percentage = (gc_content / len(dna_sequence)) * 100
print(f'GC Content: {gc_percentage}%')

In [None]:
# Using Range Function to Iterate Over Indices
dna_sequences = ['ATCG', 'TGCA', 'CGAT', 'GACT']

for i in range(len(dna_sequences)):
    print(f'Sequence {i+1}: {dna_sequences[i]}')

In [None]:
# For Loop to Find All Occurrences of a Subsequence in a DNA Sequence
dna_sequence = 'ATCGGCTAGCTAGCTA'
subsequence = 'GCTA'
occurrences = []

for i in range(len(dna_sequence) - len(subsequence) + 1):
    if dna_sequence[i:i+len(subsequence)] == subsequence:
        occurrences.append(i)

print(f'Occurrences of {subsequence} are at positions: {occurrences}')

In [None]:
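# Nested For Loop to Find Reverse-Complement Pairs
dna_sequences1 = ['ATCG', 'TGCA']
dna_sequences2 = ['CGAT', 'GACT']

# Translation table that maps each base to its complementary base
complement_table = str.maketrans('ACGT', 'TGCA')

for seq1 in dna_sequences1:
    for seq2 in dna_sequences2:
        # Complement each base, then reverse the result
        reverse_complement = seq2.translate(complement_table)[::-1]
        if seq1 == reverse_complement:
            print(f'{seq1} and {seq2} are reverse complements.')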

## Understanding If-Then-Else Conditions in Python
The `if`, `elif`, and `else` statements in Python allow you to perform conditional operations. These are particularly useful in genomics for tasks like filtering sequences, comparing genetic markers, and making decisions based on genetic data.
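
As a minimal sketch of the three keywords working together, the hypothetical function `classify_genotype` below labels a two-allele genotype. Python checks each condition in order and runs only the first branch that matches.

```python
def classify_genotype(genotype):
    """Classify a two-allele genotype written with 'A' and 'a'."""
    if genotype == 'AA':
        return 'homozygous dominant'
    elif genotype in ('Aa', 'aA'):
        return 'heterozygous'
    elif genotype == 'aa':
        return 'homozygous recessive'
    else:
        # Anything else is outside the simple A/a scheme used here
        return 'unrecognized'

for genotype in ['AA', 'Aa', 'aa', 'AB']:
    print(f'{genotype}: {classify_genotype(genotype)}')
```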

In [None]:
# For Loop to Translate DNA Sequence to Amino Acids
codon_table = {
    'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
    'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
    'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
    'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
    'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
    'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
    'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
    'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
    'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
    'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
    'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
    'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
    'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
    'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
    'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_',
    'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W',
}

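dna_sequence = 'ATCGGCTA'
amino_acids = ''

# Step through complete codons only; a trailing partial codon (here 'TA') is skipped
for i in range(0, len(dna_sequence) - 2, 3):
    codon = dna_sequence[i:i+3]
    # Codons missing from the table (e.g. containing 'N') are translated as 'X'
    amino_acids += codon_table.get(codon, 'X')

print(f'Amino Acids: {amino_acids}')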

In [None]:
# For Loop to Calculate Relatedness Score Between Two Genotype Lists
genotypes_list1 = ['AA', 'Aa', 'aa']
genotypes_list2 = ['AA', 'AA', 'aa']
relatedness_score = 0

for genotype1, genotype2 in zip(genotypes_list1, genotypes_list2):
    if genotype1 == genotype2:
        relatedness_score += 1

print(f'Relatedness Score: {relatedness_score}')

## Understanding While Loops in Python
The `while` loop in Python is used to execute a block of code as long as a condition is true. This type of loop is useful when you don't know the number of iterations in advance. In genomics, `while` loops can be used for tasks like sequence alignment, finding motifs, or even simulating genetic drift until a certain condition is met.
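
A typical case where the number of iterations is not known in advance is searching for every occurrence of a motif with `str.find`, which returns `-1` once no further match exists. The sketch below illustrates the idea; the function name `find_motif_positions` is an assumption made for this example.

```python
def find_motif_positions(sequence, motif):
    """Return the start index of every (possibly overlapping) motif occurrence."""
    positions = []
    start = sequence.find(motif)
    # Keep searching until str.find reports that nothing more was found
    while start != -1:
        positions.append(start)
        # Resume one base further on so overlapping matches are also found
        start = sequence.find(motif, start + 1)
    return positions

print(find_motif_positions('ATCGGCTAGCTAGCTA', 'GCTA'))  # prints [4, 8, 12]
```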

In [None]:
# While Loop to Find a Motif in a DNA Sequence
dna_sequence = 'ATCGGCTAGCTAGCTA'
motif = 'GCTA'
position = 0
occurrences = []

while position < len(dna_sequence) - len(motif) + 1:
    if dna_sequence[position:position+len(motif)] == motif:
        occurrences.append(position)
    position += 1

print(f'Occurrences of {motif} are at positions: {occurrences}')

In [None]:
# While Loop to Simulate Genetic Drift for an Allele
import random

allele_frequency = 0.5  # Initial frequency of allele A
generations = 0

while 0 < allele_frequency < 1:
    allele_frequency += random.uniform(-0.05, 0.05)  # Random drift
    allele_frequency = max(0, min(allele_frequency, 1))  # Keep within [0, 1]
    generations += 1

print(f'Allele A fixed or lost after {generations} generations.')
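
The cell above changes the allele frequency by a uniform random step, which is a simplification. A common textbook alternative is the Wright-Fisher model, in which each new generation is formed by randomly drawing allele copies from the previous one. The sketch below assumes a diploid population of constant size; the function name, its parameters, and the seed value are illustrative assumptions.

```python
import random

def wright_fisher(initial_frequency, population_size, max_generations, seed=None):
    """Simulate drift until allele A is fixed, lost, or max_generations is reached."""
    rng = random.Random(seed)      # a seed makes the run reproducible
    copies = 2 * population_size   # diploid: two allele copies per individual
    frequency = initial_frequency
    generations = 0
    while 0 < frequency < 1 and generations < max_generations:
        # Each copy in the next generation is A with probability `frequency`
        count_a = sum(rng.random() < frequency for _ in range(copies))
        frequency = count_a / copies
        generations += 1
    return frequency, generations

final_frequency, generations = wright_fisher(0.5, population_size=20,
                                             max_generations=10_000, seed=1)
print(f'Frequency {final_frequency} after {generations} generations.')
```

The `max_generations` cap guarantees that the loop terminates even if the allele has neither fixed nor been lost.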

In [None]:
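# While Loop to Scan a DNA Sequence Until the GC Fraction Reaches a Threshold
dna_sequence = 'ATCGGCTAGCTAGCTA'
gc_count = 0      # number of G or C bases seen so far
gc_fraction = 0   # gc_count divided by the number of bases scanned
position = 0
threshold = 0.5

while gc_fraction < threshold and position < len(dna_sequence):
    if dna_sequence[position] in ['G', 'C']:
        gc_count += 1
    position += 1
    gc_fraction = gc_count / position  # running GC fraction of the scanned prefix

if gc_fraction >= threshold:
    print(f'GC fraction reached {threshold} after {position} bases')
else:
    print(f'GC fraction stayed below {threshold} for the whole sequence')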

In [None]:
# While Loop to Find the Longest Consecutive Run of a Base in a DNA Sequence
dna_sequence = 'ATCGGGGCTAGCTAGCTA'
longest_run = 0
current_run = 1
position = 1

while position < len(dna_sequence):
    if dna_sequence[position] == dna_sequence[position - 1]:
        current_run += 1
    else:
        longest_run = max(longest_run, current_run)
        current_run = 1
    position += 1

longest_run = max(longest_run, current_run)
print(f'Longest consecutive run of a base is {longest_run}')

In [None]:
# While Loop to Calculate Relatedness Score Until a Certain Threshold is Reached
genotypes_list1 = ['AA', 'Aa', 'aa', 'AA', 'Aa']
genotypes_list2 = ['AA', 'AA', 'aa', 'Aa', 'aa']
relatedness_score = 0
position = 0
threshold = 3

while relatedness_score < threshold and position < len(genotypes_list1):
    if genotypes_list1[position] == genotypes_list2[position]:
        relatedness_score += 1
    position += 1

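# The loop also ends when the lists are exhausted, so check which case occurred
if relatedness_score >= threshold:
    print(f'Relatedness score reached {threshold} after {position} positions')
else:
    print(f'Relatedness score only reached {relatedness_score}, below the threshold of {threshold}')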

## Understanding If-Then-Else Conditions in Python
Python has no `then` keyword; the if-then-else pattern is written with the `if`, `elif`, and `else` statements, which execute one block of code or another depending on whether a condition is true or false. In genomics, these conditional statements can be used to filter sequences, compare genotypes, or decide which algorithm to use for a particular analysis.
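
As a sketch of using a condition to decide which kind of analysis to run, the function below picks a comparison strategy for two sequences. The function name `choose_comparison` and the strategy labels it returns are hypothetical and serve only as an illustration.

```python
def choose_comparison(seq1, seq2):
    """Pick a comparison strategy based on simple properties of two sequences."""
    if seq1 == seq2:
        return 'identical'
    elif len(seq1) == len(seq2):
        # Equal lengths allow a direct position-by-position comparison
        return 'compare position by position'
    else:
        # Different lengths would need an alignment step first
        return 'align before comparing'

print(choose_comparison('ATCG', 'ATCG'))
print(choose_comparison('ATCG', 'ATGG'))
print(choose_comparison('ATCG', 'ATCGA'))
```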

In [None]:
# If-Then-Else to Determine Dominant or Recessive Genotype
genotype = 'Aa'

if genotype == 'AA' or genotype == 'Aa':
    print('Dominant Genotype')
elif genotype == 'aa':
    print('Recessive Genotype')
else:
    print('Invalid Genotype')

In [None]:
# If-Then-Else to Filter DNA Sequences Based on Length
dna_sequence = 'ATCGGCTA'
min_length = 5

if len(dna_sequence) >= min_length:
    print(f'Sequence is long enough: {dna_sequence}')
else:
    print(f'Sequence is too short: {dna_sequence}')

In [None]:
# If-Then-Else to Classify DNA Sequences Based on GC Content
dna_sequence = 'ATCGGCTA'
gc_content = 0

for base in dna_sequence:
    if base in ['G', 'C']:
        gc_content += 1

gc_percentage = (gc_content / len(dna_sequence)) * 100

if gc_percentage > 50:
    print(f'High GC content: {gc_percentage}%')
elif gc_percentage == 50:
    print(f'Medium GC content: {gc_percentage}%')
else:
    print(f'Low GC content: {gc_percentage}%')

In [None]:
# If-Then-Else to Determine Relatedness Level Based on Score
relatedness_score = 4

if relatedness_score >= 5:
    print('Highly Related')
elif relatedness_score >= 3:  # scores of 5 or more were already handled above
    print('Moderately Related')
else:
    print('Not Related')

In [None]:
# If-Then-Else to Check for Presence of a Motif in a DNA Sequence
dna_sequence = 'ATCGGCTA'
motif = 'GCTA'

if motif in dna_sequence:
    print(f'Motif {motif} is present in the sequence.')
else:
    print(f'Motif {motif} is not present in the sequence.')