# Day 4, Exercise 1 - scope of variables, positional and keyword arguments

### There are 3 parts to this exercise, with answers listed under each part.


1. Redo the exercise Day3_Exercise2_2 (check GC content level for DNA sequences) by using global variables 
2. Rewrite the function `check_gc_content_level` with default values
3. Extra exercise: Write any function you like and experiment with positional and keyword arguments (optional)

<hr style="border: 2px solid #000080;">

## 1.  Redo the exercise Day3_Exercise2_2 (check GC content level for DNA sequences) by using global variables 

If you haven't done it yet, get the code from the answers to part 2 of Day 3, Exercise 2 (Day3_Exercise2_2). Use `threshold_low` and `threshold_high` as global variables. Save the code in a script file and run it. Think about the advantages and disadvantages of using global variables.

___


### The answers

In [2]:
# Define global variables for GC content thresholds
THRESHOLD_LOW = 40.0   # By convention (PEP 8), module-level constants are written in uppercase letters.
THRESHOLD_HIGH = 60.0  # These variables are accessible throughout the module.

def readseq(seqfile):
    with open(seqfile, "r", encoding="utf-8") as fpin:
        seq = ""
        for line in fpin:
            if not line.startswith(">"):
                seq += line.strip()
    return seq.upper()

def get_gc_content(seq):
    return (seq.count('G') + seq.count('C')) / len(seq) * 100

def check_gc_content_level(seqfile): # global variables can be accessed within the function
    seq = readseq(seqfile)
    gc_content = get_gc_content(seq)
    if gc_content >= THRESHOLD_HIGH:
        return f"The GC content of the sequence from {seqfile} is {gc_content:.2f}%, level is high"
    elif gc_content <= THRESHOLD_LOW:
        return f"The GC content of the sequence from {seqfile} is {gc_content:.2f}%, level is low"
    else:
        return f"The GC content of the sequence from {seqfile} is {gc_content:.2f}%, level is moderate"

# Test the function to see if it works as expected
level_message = check_gc_content_level("../../downloads/one_dna_sequence.fa")
print(level_message)

The GC content of the sequence from ../../downloads/one_dna_sequence.fa is 53.53%, level is moderate


___
Using global variables in a Python program has both advantages and disadvantages compared with passing values explicitly as function arguments.

#### Global Variables:

- Work best in smaller scripts, or when a clear and fixed set of configuration values is needed across multiple functions. The code is then easier to read and write because you don't have to pass the same parameters through several functions.
- Lead to simpler function signatures, but add potential for bugs and maintenance issues, because any function can depend on, or change, a value that is defined somewhere else.

#### Explicit Passing:

- Is recommended for larger, more complex programs where maintainability, readability, and testability are key concerns.
- Keeps functions modular, reusable, and easier to test, at the cost of somewhat more verbose code.
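
The trade-off above can be seen in a minimal sketch. It assumes simplified, hypothetical helper functions (`is_high_global`, `is_high_explicit`, `set_threshold_local`, `set_threshold_global`) that take a GC percentage directly instead of reading a file; they are not part of the exercise answers.

```python
THRESHOLD_HIGH = 60.0  # module-level (global) threshold, as in the answer above

def is_high_global(gc_content):
    # Reading a global inside a function needs no declaration.
    return gc_content >= THRESHOLD_HIGH

def is_high_explicit(gc_content, threshold_high):
    # Everything the function depends on is visible in its signature.
    return gc_content >= threshold_high

def set_threshold_local(value):
    # Assignment creates a new LOCAL name; the global is left unchanged.
    THRESHOLD_HIGH = value
    return THRESHOLD_HIGH

def set_threshold_global(value):
    # The `global` statement makes the assignment rebind the module-level name.
    global THRESHOLD_HIGH
    THRESHOLD_HIGH = value

print(is_high_global(53.53))          # False
set_threshold_local(50.0)
print(is_high_global(53.53))          # still False, the global was not touched
set_threshold_global(50.0)
print(is_high_global(53.53))          # True: same call, different result
print(is_high_explicit(53.53, 60.0))  # False, independent of the global state
```

The third call shows the maintenance risk: the result of `is_high_global` changed although its arguments did not, whereas `is_high_explicit` depends only on what is passed in.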

<hr style="border: 2px solid #000080;">

## 2. Rewrite the function `check_gc_content_level` with default values


The function `check_gc_content_level` takes three arguments. Give `threshold_high` and `threshold_low` default values, then try calling the function in as many different ways as possible.

In [None]:
def check_gc_content_level(seqfile, threshold_high, threshold_low):
    seq = readseq(seqfile)
    gc_content = get_gc_content(seq)
    if gc_content >= threshold_high:
        return f"The GC content of the sequence from {seqfile} is {gc_content:.2f}%, level is high"
    elif gc_content <= threshold_low:
        return f"The GC content of the sequence from {seqfile} is {gc_content:.2f}%, level is low"
    else:
        return f"The GC content of the sequence from {seqfile} is {gc_content:.2f}%, level is moderate"

___

### The answers

In [17]:
def readseq(seqfile):
    with open(seqfile, "r", encoding="utf-8") as fpin:
        seq = ""
        for line in fpin:
            if not line.startswith(">"):
                seq += line.strip()
    return seq.upper()

def get_gc_content(seq):
    return (seq.count('G') + seq.count('C')) / len(seq) * 100

def check_gc_content_level(seqfile, threshold_high=60.0, threshold_low=40.0):
    seq = readseq(seqfile)
    gc_content = get_gc_content(seq)
    if gc_content >= threshold_high:
        return f"The GC content of the sequence from {seqfile} is {gc_content:.2f}%, level is high"
    elif gc_content <= threshold_low:
        return f"The GC content of the sequence from {seqfile} is {gc_content:.2f}%, level is low"
    else:
        return f"The GC content of the sequence from {seqfile} is {gc_content:.2f}%, level is moderate"
            
# You can call it without providing the last two arguments
level_message = check_gc_content_level("../../downloads/one_dna_sequence.fa")
print(level_message)

# Providing only threshold_high as a keyword argument, threshold_low uses the default value
level_message = check_gc_content_level("../../downloads/one_dna_sequence.fa", threshold_high=50.0)
print(level_message)

# Providing only threshold_low as a keyword argument, threshold_high uses the default value
level_message = check_gc_content_level("../../downloads/one_dna_sequence.fa", threshold_low=55.0)
print(level_message)

# Providing both threshold_low and threshold_high as keyword arguments, but swap the order
level_message = check_gc_content_level("../../downloads/one_dna_sequence.fa", threshold_low=55, threshold_high=60)
print(level_message)

# Providing both threshold_high and threshold_low as positional arguments
level_message = check_gc_content_level("../../downloads/one_dna_sequence.fa", 53, 40)
print(level_message)

# Providing only threshold_high as a positional argument, threshold_low uses the default value
# Not recommended: a bare number does not show which threshold it sets
level_message = check_gc_content_level("../../downloads/one_dna_sequence.fa", 53)
print(level_message)

# Providing threshold_high as a positional argument and threshold_low as a keyword argument
# This overrides the default value for threshold_low while using a positional argument for threshold_high
# Not recommended: mixing both styles for the two thresholds is harder to read
level_message = check_gc_content_level("../../downloads/one_dna_sequence.fa", 53, threshold_low=40)
print(level_message)

The GC content of the sequence from ../../downloads/one_dna_sequence.fa is 53.53%, level is moderate
The GC content of the sequence from ../../downloads/one_dna_sequence.fa is 53.53%, level is high
The GC content of the sequence from ../../downloads/one_dna_sequence.fa is 53.53%, level is low
The GC content of the sequence from ../../downloads/one_dna_sequence.fa is 53.53%, level is low
The GC content of the sequence from ../../downloads/one_dna_sequence.fa is 53.53%, level is high
The GC content of the sequence from ../../downloads/one_dna_sequence.fa is 53.53%, level is high
The GC content of the sequence from ../../downloads/one_dna_sequence.fa is 53.53%, level is high


In [19]:
# This call will raise a SyntaxError because keyword arguments must come after positional arguments
# Correct order: Positional arguments first, followed by keyword arguments (if any)
level_message = check_gc_content_level("../../downloads/one_dna_sequence.fa", threshold_low=40, 65)
print(level_message)

SyntaxError: positional argument follows keyword argument (3674942016.py, line 1)
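
A `SyntaxError` is reported when Python parses the code, before anything in the cell is executed. The sketch below illustrates this with the built-in `compile()`, which parses source text without running it. The file name `"seq.fa"` is a placeholder, and the function does not need to exist for the check to work.

```python
# The call is kept as a string so that this example itself still parses.
bad_call = 'check_gc_content_level("seq.fa", threshold_low=40, 65)'

try:
    # compile() only parses the source; the function is never looked up or called.
    compile(bad_call, "<example>", "eval")
    parsed = True
except SyntaxError as err:
    parsed = False
    print(f"SyntaxError: {err.msg}")

# Positional arguments first, keyword arguments after: this version parses.
good_call = 'check_gc_content_level("seq.fa", 65, threshold_low=40)'
compile(good_call, "<example>", "eval")
print("good_call parses without error")
```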

In [20]:
# This call will raise a TypeError because the positional value 40 is bound to the second parameter, threshold_high.
# The keyword argument threshold_high=65 then tries to set the same parameter a second time.
# Pass each threshold only once, e.g. (..., 65, 40) or (..., threshold_high=65, threshold_low=40).

level_message = check_gc_content_level("../../downloads/one_dna_sequence.fa", 40, threshold_high=65)
print(level_message)

TypeError: check_gc_content_level() got multiple values for argument 'threshold_high'
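
To see how Python maps arguments to parameters without reading any sequence file, one option is `inspect.signature(...).bind(...)` from the standard library. The sketch below uses a stub with the same signature as the answer; the helper `show_binding` and the file name `"seq.fa"` are hypothetical and only serve the illustration.

```python
import inspect

def check_gc_content_level(seqfile, threshold_high=60.0, threshold_low=40.0):
    """Stub with the same signature as the answer; the body is irrelevant here."""

def show_binding(*args, **kwargs):
    """Return how the arguments map to parameters, or the TypeError message."""
    sig = inspect.signature(check_gc_content_level)
    try:
        bound = sig.bind(*args, **kwargs)
    except TypeError as err:
        return f"TypeError: {err}"
    bound.apply_defaults()  # fill in any parameters that were not passed
    return dict(bound.arguments)

# Positional plus keyword: each parameter receives exactly one value.
print(show_binding("seq.fa", 53, threshold_low=40))
# Keyword arguments can be given in any order.
print(show_binding("seq.fa", threshold_low=55, threshold_high=60))
# 40 is bound to threshold_high positionally, then threshold_high=65 collides with it.
print(show_binding("seq.fa", 40, threshold_high=65))
```

The last call reports a `TypeError` for the same reason as the cell above: `threshold_high` receives a value twice, once by position and once by keyword.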

<hr style="border: 2px solid #000080;">

## 3. Extra exercise: Write any function you like and experiment with positional and keyword arguments (optional)

Free style
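
If you need a starting point, here is one possible sketch. The function `count_bases` and its parameters are made up for this illustration; any function with one required parameter and a couple of defaults would work just as well.

```python
def count_bases(seq, bases="GC", as_percent=False):
    """Count how often any of `bases` occurs in `seq` (hypothetical example function)."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in bases.upper())
    if as_percent:
        # Guard against empty input to avoid dividing by zero.
        return total / len(seq) * 100 if seq else 0.0
    return total

print(count_bases("ATGCGC"))                   # 4, both defaults used
print(count_bases("ATGCGC", "AT"))             # 2, bases given positionally
print(count_bases("ATGCGC", as_percent=True))  # about 66.67, bases skipped via keyword
print(count_bases(bases="A", seq="ATGCGC"))    # 1, all keywords, order swapped
```

Try adding calls that fail on purpose, such as passing `bases` both by position and by keyword, and compare the error messages with the ones from part 2.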

___