Chapter 5: Code Modularizing
--------------------------------

Python offers several tools for code modularization:
- Functions – the basic reusable blocks.
- Modules
- Packages
- Classes – covered later.

## Functions

A function is a named block of code that performs a task.
It may take inputs (parameters) and may return outputs.

In [None]:
def function_name(argument1, argument2):  # a function may take any number of parameters
    """Optional function description (docstring)."""
    result = ...  # FUNCTION CODE: compute the result from the arguments
    return result


In [3]:
def greet(name):
    return f"Hello, {name}!"

In [4]:
greet('parisa')

'Hello, parisa!'

- 🔹 A Biology Example: Protein Net Charge

In [5]:
def protcharge(aa_seq):
    """Returns the net charge of a protein sequence"""
    protseq = aa_seq.upper()
    charge = -0.002
    aa_charge = {'C': -0.045, 'D': -0.999, 'E': -0.998, 'H': 0.091,
                 'K': 1, 'R': 1, 'Y': -0.001}
    for aa in protseq:
        charge += aa_charge.get(aa, 0)
    return charge


In [6]:
protcharge('EEARGPLRGKGDQKSAVSQKPRSRGILH')

4.094

In [7]:
protcharge() #TypeError

TypeError: protcharge() missing 1 required positional argument: 'aa_seq'

- 🔹 Protein Net Charge Returning Multiple Values

In [8]:
def charge_and_prop(aa_seq):
    """Returns net charge and proportion of charged amino acids"""
    protseq = aa_seq.upper()
    charge = -0.002
    cp = 0
    aa_charge = {'C': -0.045, 'D': -0.999, 'E': -0.998, 'H': 0.091,
                 'K': 1, 'R': 1, 'Y': -0.001}
    for aa in protseq:
        charge += aa_charge.get(aa, 0)
        if aa in aa_charge:
            cp += 1
    prop = 100.0 * cp / len(aa_seq)
    return (charge, prop)


In [9]:
charge_and_prop('EEARGPLRGKGDQKSAVSQKPRSRGILH')

(4.094, 39.285714285714285)

In [10]:
charge_and_prop('EEARGPLRGKGDQKSAVSQKPRSRGILH')[1]

39.285714285714285
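Because the function returns a tuple, the caller can also unpack it directly into separate names instead of indexing with [1]. A minimal sketch with a hypothetical helper, seq_stats(), that returns two values:

```python
def seq_stats(seq):
    """Return the length of seq and its number of 'A' characters (hypothetical helper)."""
    seq = seq.upper()
    return len(seq), seq.count('A')   # the parentheses around a returned tuple are optional

# Tuple unpacking: one name per returned value
length, n_a = seq_stats('gattaca')
print(length, n_a)   # 7 3
```

Unpacking requires exactly as many names on the left as values returned; otherwise Python raises a ValueError.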

- 🔹 Functions That Do Something Without Returning a Value: Converting a List into a Text File

In [11]:
def save_list(input_list, file_name):
    """Saves each item of input_list to a new line in file_name"""
    with open(file_name, 'w') as fh:
        print(*input_list, sep='\n', file=fh)
    return None  # Optional

In [12]:
# Define the list
my_genes = ['BRCA1', 'TP53', 'EGFR', 'MYC']
# Save the list to a file named "genes.txt"
save_list(my_genes, 'genes.txt')

In [13]:
#genes.txt :
#BRCA1
#TP53
#EGFR
#MYC

- Appending instead of replacing

With mode 'w', if a file with the same name already exists, its contents are deleted and replaced with the new list.
To append to the file instead, use mode 'a' in open(): each time you run the function, the new items are added to the end of the file.


In [15]:
def append_list(input_list, file_name):
    """Appends each item of input_list to a new line at the end of file_name"""
    with open(file_name, 'a') as fh:  # 'a' means append mode
        print(*input_list, sep='\n', file=fh)
    return None  # Optional


In [16]:
# First list to write
my_first_list = ['GeneA', 'GeneB']
append_list(my_first_list, 'genes.txt')

# Another list to add later
my_second_list = ['GeneC', 'GeneD']
append_list(my_second_list, 'genes.txt')

In [17]:
#genes.txt :
#BRCA1
#TP53
#EGFR
#MYC
#GeneA
#GeneB
#GeneC
#GeneD
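To see the difference between 'w' and 'a' side by side, here is a self-contained sketch. It redefines the two functions above and writes to a throwaway file in a temporary folder (so your real genes.txt is untouched); read_lines() is a hypothetical helper added for checking the result.

```python
import os
import tempfile

def save_list(input_list, file_name):
    with open(file_name, 'w') as fh:   # 'w': create the file, or overwrite it
        print(*input_list, sep='\n', file=fh)

def append_list(input_list, file_name):
    with open(file_name, 'a') as fh:   # 'a': create the file, or add to its end
        print(*input_list, sep='\n', file=fh)

def read_lines(file_name):
    """Hypothetical helper: return the lines of a text file as a list."""
    with open(file_name) as fh:
        return fh.read().splitlines()

path = os.path.join(tempfile.mkdtemp(), 'genes.txt')

save_list(['BRCA1', 'TP53'], path)
append_list(['GeneA'], path)
first_read = read_lines(path)
print(first_read)    # ['BRCA1', 'TP53', 'GeneA']

save_list(['MYC'], path)   # 'w' again: the previous contents are discarded
second_read = read_lines(path)
print(second_read)   # ['MYC']
```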

- Function Scope

Variables declared inside a function exist only within that function.
This concept is called "scope".
If you try to access such a variable from outside, Python will raise a NameError.


In [18]:
def duplicate(x):
    y = 1  # Local variable
    print(f'y = {y}')
    return 2 * x

duplicate(5)
# y = 1
# 10

print(y)
# NameError: y is not defined


y = 1


NameError: name 'y' is not defined

- 🔹 Scope Lookup Order
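- If a variable is not found inside a function, Python looks for it in the enclosing function (if there is one), then in the global (module) scope, and finally among the built-in names. This order is known as the LEGB rule: Local, Enclosing, Global, Built-in.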
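A small sketch of all four LEGB levels, using made-up names (species, gene, position are illustrative only):

```python
species = 'Homo sapiens'        # Global: defined at the top level of the module

def outer():
    gene = 'TP53'               # Enclosing: local to outer(), visible to inner()

    def inner():
        position = 17           # Local: exists only while inner() runs
        # len() is not defined anywhere here, so it is found in the Built-in scope
        return f'{gene} on chr{position} in {species} ({len(gene)} chars)'

    return inner()

print(outer())   # TP53 on chr17 in Homo sapiens (4 chars)
```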

In [20]:
y = 3

def duplicate(x):
    print(f'y = {y}')  # uses global y
    return 2 * x

duplicate(5)
# y = 3
# 10


y = 3


10

But if y is defined both inside and outside, Python uses the one inside the function:


In [21]:
y = 3

def duplicate(x):
    y = 1  # local y takes priority
    print(f'y = {y}')
    return 2 * x

duplicate(5)
# y = 1
# 10


y = 1


10

### 🔹 ⚠️ Global Variables (Use with Caution)


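You can declare a variable as global inside a function using the global keyword; assignments inside the function then change the module-level variable instead of creating a local one.
But use this with caution: any function can modify a global variable, which makes bugs hard to trace, and global objects stay in memory for as long as the program runs.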


In [22]:
def test(x):
    global z
    z = 10
    print(f'z = {z}')
    return x * 2

z = 1
test(4)
# z = 10
# 8

print(z)
# 10


z = 10
10
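A common alternative to global is to pass the current value in and return the new value, so every change is visible where the function is called. A minimal sketch (the names counter and increment are illustrative):

```python
def increment(count, step=1):
    """Return a new count instead of modifying a global variable."""
    return count + step

counter = 1
counter = increment(counter)      # the reassignment happens here, in plain sight
counter = increment(counter, 5)
print(counter)   # 7
```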


### Placement of Arguments
Until now, arguments have been passed in the same order they were defined.

In [21]:
save_list([1, 2, 3], 'list.txt')

In [22]:
save_list('list.txt', [1, 2, 3])
# TypeError: the file name must be a string, not a list (the exact message depends on the Python/Jupyter version)

TypeError: unhashable type: 'list'

This happens because the function expects a list first, and then a string.
✅ To pass arguments in any order, you must use keyword arguments:


In [23]:
save_list(file_name='list.txt', input_list=[1, 2, 3])

- ⚙️ Arguments with Default Values
- Functions can define default values for parameters:

In [None]:
def function_name(arg1, arg2='default value'):
    ...  # arg2 is optional: if it is omitted, it takes 'default value'


In [24]:
def save_list(input_list, file_name='temp.txt'):
    """A list (input_list) is saved in a file (file_name)"""
    with open(file_name, 'w') as fh:
        for item in input_list:
            fh.write(f'{item}\n')  # Python 3.6+ formatted string
    return None


In [25]:
save_list(['MS233','MS772','MS120','MS93','MS912'])
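A short sketch of how defaults behave at the call site, using a hypothetical helper pad_sequence(). It also shows that parameters with default values must come after those without; the exact SyntaxError message varies between Python versions, so it is printed rather than quoted.

```python
def pad_sequence(seq, length=10, fill='N'):
    """Pad seq on the right with `fill` up to `length` characters (hypothetical helper).

    If seq is already long enough, fill * (negative number) is '', so seq is unchanged.
    """
    return seq + fill * (length - len(seq))

print(pad_sequence('ACGT'))              # ACGTNNNNNN  (both defaults used)
print(pad_sequence('ACGT', 6))           # ACGTNN      (length overridden by position)
print(pad_sequence('ACGT', fill='-'))    # ACGT------  (only fill overridden, by keyword)

# A parameter without a default cannot follow one with a default:
try:
    compile("def f(a=1, b): pass", "<example>", "exec")
except SyntaxError as err:
    print('SyntaxError:', err.msg)
```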

- Variable Number of Arguments: you can define functions that accept any number of positional arguments using *. Inside the function, they are collected into a tuple:

In [26]:
def average(*numbers):
    if len(numbers) == 0:
        return None
    else:
        total = sum(numbers)
        return total / len(numbers)


In [27]:
print(average(2, 3, 4, 3, 2))       # Output: 2.8
average(2, 3, 4, 3, 2, 8)    # Output: 3.666...


2.8


3.6666666666666665
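The * works in both directions: in a definition it collects arguments into a tuple, and in a call it unpacks a list into separate arguments. A minimal sketch; show_args() and the expression values are hypothetical:

```python
def average(*numbers):
    """numbers is a tuple holding every positional argument passed in."""
    if not numbers:
        return None
    return sum(numbers) / len(numbers)

def show_args(*args):
    return args   # returns the collected tuple unchanged

expression = [2.0, 3.0, 4.0]        # hypothetical expression values
print(show_args(1, 'two', 3.0))     # (1, 'two', 3.0)
print(average(*expression))         # 3.0  (* unpacks the list into separate arguments)
print(average())                    # None
```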

- 🔹 Using * to Unpack Lists: converting a list into a text file with print()
In Python 3, where print() is a function, * can unpack the elements of a list into separate arguments.
This version is cleaner and avoids a for loop: print() writes each item of the list to the file, each on its own line (sep='\n').

In [30]:
def save_list(input_list, file_name='temp.txt'):
    """A list (input_list) is saved to a file (file_name)"""
    with open(file_name, 'w') as fh:
        print(*input_list, sep='\n', file=fh)
    return None


In [31]:
save_list(['apple', 'banana', 'cherry'], 'fruits.txt')

In [32]:
with open('fruits.txt', 'r') as f:
    print(f.read())

apple
banana
cherry



### 🔹 Undetermined Number of Keyword Arguments
Functions can accept any number of keyword arguments using a double asterisk **. These arguments are collected into a dictionary:

In [5]:
def commandline(name, **parameters):
    line = ''
    for item in parameters:
        line += f' -{item} {parameters[item]}'
    return name + line


In [6]:
commandline('formatdb', t='Caseins', i='indata.fas')

'formatdb -t Caseins -i indata.fas'

In [7]:
commandline('formatdb', t='Caseins', i='indata.fas', p='F')

'formatdb -t Caseins -i indata.fas -p F'
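Since parameters is an ordinary dictionary, the loop can also be written with .items(). The sketch below is an equivalent variant of commandline(); it also shows ** used in a call, where it unpacks a dictionary into keyword arguments (the settings dictionary is illustrative).

```python
def commandline(name, **parameters):
    # parameters is a dict, e.g. {'t': 'Caseins', 'i': 'indata.fas'}
    options = ''.join(f' -{key} {value}' for key, value in parameters.items())
    return name + options

print(commandline('formatdb', t='Caseins', i='indata.fas'))
# formatdb -t Caseins -i indata.fas

# ** in a call unpacks a dict into keyword arguments
settings = {'t': 'Caseins', 'i': 'indata.fas', 'p': 'F'}
print(commandline('formatdb', **settings))
# formatdb -t Caseins -i indata.fas -p F
```

Keyword arguments keep the order in which they were passed (guaranteed since Python 3.6), so the options appear in the same order as written.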

### Docstrings
A docstring is a string literal placed as the first statement of a function, class, or module body. It is used for:
- Online help (via help())
- Automatic documentation
- Making the source code more understandable

The conventions are described in PEP 257: https://peps.python.org/pep-0257/

In [36]:
def is_prime(n):
    """Returns True if n is a prime number, otherwise False."""


### Generators
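Generators are special functions that retain their internal state. Each time a generator produces a value with yield, it pauses and keeps its local variables; the next request resumes execution right after that yield. A normal function, by contrast, loses its local variables when it returns.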
- They use the yield keyword instead of return.
- Useful for working with large data, like huge files, where storing everything in memory is inefficient.
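A minimal sketch of that retained state, using a hypothetical running_total() generator: the local variable total survives between calls to next().

```python
def running_total(values):
    """Yield the cumulative sum after each value (hypothetical example)."""
    total = 0                 # local state that survives between yields
    for v in values:
        total += v
        yield total           # pause here; the next request resumes from this line

gen = running_total([3, 1, 4])
print(next(gen))   # 3
print(next(gen))   # 4  (total was remembered from the previous step)
print(list(gen))   # [8]  (the remaining value)
```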


In [14]:
def is_prime(n):
    """Returns True if n is a prime number, otherwise False."""
    if n < 2:
        return False
    for i in range(2, int(n ** 0.5) + 1):  # checking divisors up to √n is enough
        if n % i == 0:
            return False
    return True

def all_primes(n):
    primes = []
    for number in range(1, n):
        if is_prime(number):
            primes.append(number)
    return primes


In [15]:
primes = all_primes(100)
print(primes)

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]


🔸 The generator doesn’t build a list, so it saves memory and can be used in loops directly.

In [17]:
def g_all_primes(n):
    """Yields prime numbers up to n one by one."""
    for number in range(1, n):
        if is_prime(number):
            yield number


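By contrast, all_primes() builds the complete list of prime numbers and stores it in memory.
✅ Because it returns all the output as a complete list, you can view, read, or save it all at once. Calling the generator function, on the other hand, returns a generator object, not the values themselves: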

In [18]:
g_primes = g_all_primes(20)
print(g_primes)

<generator object g_all_primes at 0x00000266147E6CF0>


In [19]:
for p in g_all_primes(20):
    print(p)

2
3
5
7
11
13
17
19


In [20]:
g_primes = g_all_primes(100)
print(next(g_primes))

2


In [21]:
print(next(g_primes))

3


In [39]:
primes = all_primes(10)
print(primes)

[2, 3, 5, 7]


In [22]:
import sys

print(sys.getsizeof(all_primes(100000)))     # Memory used by the full list
print(sys.getsizeof(g_all_primes(100000)))   # Memory used by generator

77840
112


- To get values out of a generator, loop over it with for or call the next() function on it.

📌 Method 1: Using a for loop

✅ Outputs are produced one by one and at the moment of need, not all at once!

In [40]:
for p in g_all_primes(10):
    print(p)

2
3
5
7


📌 Method 2: Using next() (for finer control)

If you call next() more times than there are values, you will get a StopIteration error because the generator is exhausted.

In [42]:
gen = g_all_primes(10)
print(next(gen))  # 2
print(next(gen))  # 3
print(next(gen))  # 5
print(next(gen))  # 7
print(next(gen))  # error: StopIteration


2
3
5
7


StopIteration: 
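next() also accepts an optional second argument: a default value that is returned instead of raising StopIteration once the generator is exhausted. A sketch with a small hypothetical stand-in generator:

```python
def g_small(n):
    """Yield 0, 1, ..., n-1 (hypothetical stand-in for g_all_primes)."""
    for i in range(n):
        yield i

gen = g_small(2)
print(next(gen, None))   # 0
print(next(gen, None))   # 1
print(next(gen, None))   # None: exhausted, but no StopIteration is raised
```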

In [43]:
import sys

print(sys.getsizeof(all_primes(10000)))     # Memory used by the full list
print(sys.getsizeof(g_all_primes(10000)))   # Memory used by generator

10192
112


✅ The generator object stores only its current state (where it paused and its local variables), not the data itself, so its size does not grow with n!
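This is what makes generators useful for huge files. A sketch assuming a FASTA file (a hypothetical helper, fasta_headers(), and a tiny made-up file written to a temporary folder): only the current line is held in memory, however large the file is.

```python
import os
import tempfile

def fasta_headers(file_name):
    """Yield FASTA header lines one at a time, without the leading '>' (hypothetical helper)."""
    with open(file_name) as fh:
        for line in fh:                  # file objects are also read lazily, line by line
            if line.startswith('>'):
                yield line[1:].strip()

# A tiny made-up FASTA file for demonstration
path = os.path.join(tempfile.mkdtemp(), 'example.fasta')
with open(path, 'w') as fh:
    fh.write('>seq1 BRCA1\nACGTTGCA\n>seq2 TP53\nGGCCAATT\n')

for header in fasta_headers(path):
    print(header)
# seq1 BRCA1
# seq2 TP53
```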

### Fun analogy:
- Function: "Make all the tea at once"

In [6]:
def make_tea():
    return ["Boil water", "Add tea leaves", "Steep", "Serve", "Drink"]

In [7]:
tea = make_tea()
print(tea)

['Boil water', 'Add tea leaves', 'Steep', 'Serve', 'Drink']


In [2]:
for step in make_tea():
    print(step)

Boil water
Add tea leaves
Steep
Serve
Drink


- Generator: "One sip at a time..."
- 📌 You don't do everything at once. It’s like being lazy... but in an organized way!

In [3]:
def make_tea_step_by_step():
    yield "Boil water"
    yield "Add tea leaves"
    yield "Steep"
    yield "Serve"
    yield "Drink"

In [5]:
tea = make_tea_step_by_step()
print(tea)

<generator object make_tea_step_by_step at 0x000001690726BF90>


In [4]:
tea = make_tea_step_by_step()
print(next(tea))  # Boil water
print(next(tea))  # Add tea leaves

Boil water
Add tea leaves


### Modules and Packages
A module is a Python file that contains functions, constants, or objects that can be reused.
Modules provide namespaces, allowing functions with the same name to exist in different modules.
For example, if the file name is my_module.py, the module name will be my_module.

- ✅ import module_name
- ✅ from module_name import function_name
- ❌ from module_name import * – Avoid unless necessary
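One concrete reason to avoid import *: it can silently replace names you already use. The os module exports its own low-level open(), so after from os import * the built-in open() is shadowed, a pitfall the official Python tutorial also warns about. A sketch (run it at the top level of a script or notebook cell, since import * is not allowed inside a function):

```python
import builtins
import os

from os import *   # NOT recommended: imports every public name of os

# `open` now refers to os.open(), a low-level function that returns an integer
# file descriptor, instead of the built-in open() that returns a file object
shadowed_by_os = open is os.open
print(shadowed_by_os)          # True

open = builtins.open           # undo the damage (better: just use `import os`)
print(open is builtins.open)   # True
```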

In [8]:
import os  # Importing the built-in 'os' module
print(os.getcwd())  # Returns current working directory
print(os.sep)  # Shows path separator used by OS


C:\Users\paris\bio4py
\


In [9]:
from os import getcwd  # Importing only getcwd function
print(getcwd())  # Now we can use it without the 'os.' prefix

C:\Users\paris\bio4py


In [10]:
from os import *  # Importing everything from os (NOT RECOMMENDED)
print(getcwd())
print(sep)


C:\Users\paris\bio4py
\


- 🔹 Aliasing a Module

In [12]:
#import xml.etree.ElementTree as ET  # Aliasing a module
#tree = ET.parse('example.xml')  # Using the alias
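The commented lines above need an example.xml file on disk. A self-contained variant, assuming a small made-up XML string and using ET.fromstring() instead of ET.parse():

```python
import xml.etree.ElementTree as ET   # ET is a shorter alias for the long module name

xml_text = "<genes><gene name='BRCA1'/><gene name='TP53'/></genes>"
root = ET.fromstring(xml_text)        # root is the <genes> element

# findall('gene') returns the direct <gene> children; get() reads an attribute
names = [gene.get('name') for gene in root.findall('gene')]
print(names)   # ['BRCA1', 'TP53']
```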


### Packages
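📦 A package is a directory that contains a special file __init__.py and one or more modules or sub-packages. (Since Python 3.3, namespace packages may omit __init__.py, but regular packages include it.) The layout below is a simplified excerpt of the Biopython package (Bio) from an older release.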

In [None]:
Bio/                     # Top-level package
    __init__.py
    Align/               # Subpackage
        __init__.py
        AlignInfo.py
    Alphabet/
        __init__.py
        IUPAC.py
    Blast/
        __init__.py
        Applications.py
        NCBIStandalone.py


### 🧩 Installing Third-Party Modules
Python includes many built-in modules, but third-party modules enhance its capabilities.
✅ Use pip to install:

In [None]:
#pip install numpy  # Example

In [None]:
# in terminal:
#sudo apt install python3-pip  # Install pip for Python 3 on Ubuntu

In [None]:
# in jupyter notebook:
# !pip --version  # Check if pip is installed
# !pip install --upgrade pip setuptools

In [None]:
#in Linux/macOS:
#pip3 install -U pip setuptools  # Upgrade pip and setuptools

In [None]:
#in windows:
#python -m pip install -U pip setuptools

In [None]:
#pip install xlrd  # Install Excel reading library

#### Copying to PYTHONPATH (Simple but Rarely Used)
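Python searches for modules in the directories listed in sys.path, which includes:
- The directory of the script being run (or the current directory in an interactive session)
- Directories listed in the PYTHONPATH environment variable
- The standard library and site-packages directories of the Python installation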

In [None]:
import sys
print(sys.path)  # Show current search paths
sys.path.append('/home/user/MyModules')  # Add a custom path (for the current session only)

### 🔹 Virtual Environments: Isolated Python Workspaces
virtualenv (or the built-in venv module, available since Python 3.3) lets you create isolated environments for Python projects.
Each environment has its own dependencies, avoiding version conflicts.
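The standard-library venv module can also be used from Python itself. A sketch that checks whether the current interpreter runs inside a virtual environment and then creates a throwaway environment in a temporary folder; with_pip=False is chosen here only to keep it fast, so pip is not installed into it.

```python
import os
import sys
import tempfile
import venv

# Inside a virtual environment, sys.prefix points to the environment,
# while sys.base_prefix still points to the base Python installation
in_venv = sys.prefix != sys.base_prefix
print('Running inside a virtual environment:', in_venv)

env_dir = os.path.join(tempfile.mkdtemp(), 'myenv')
venv.create(env_dir, with_pip=False)   # similar to: python -m venv --without-pip myenv

# Contents differ by OS: Scripts\ on Windows, bin/ on Linux/macOS, plus pyvenv.cfg
print(sorted(os.listdir(env_dir)))
```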

In [None]:
#!pip install virtualenv  # Install virtualenv globally

🔧 Create a Virtual Environment: this creates a folder named myenv/ containing a clean Python environment.


In [8]:
#python3 -m venv myenv

Activate the Environment: once it is active, your terminal prompt starts with the environment name:

(myenv) $


In [None]:
#in linux / Macos:
#source myenv/bin/activate

In [None]:
#in windows:
#myenv\Scripts\activate.bat  # Command Prompt  
#myenv\Scripts\Activate.ps1  # PowerShell

📦 Installing Packages Inside venv (for example pandas)

In [None]:
#pip install pandas

❌ Deactivate the Environment


In [None]:
#deactivate

🧹 Remove the Environment: just delete the folder:

In [None]:
#rm -r myenv

### 🐍 What is Conda?
Conda is a package and environment management tool that comes with the Anaconda (and Miniconda) Python distributions.
It allows you to create isolated environments with specific packages.

In [None]:
# Run these commands in a terminal (Anaconda Prompt on Windows), not in a Python cell
#📦 Creating a New Environment
conda create -n bioinfo

#To create and install a package at the same time:
conda create -n excelprocessing xlrd

# 🔁 Activating / Deactivating Environments
# Activate
conda activate bioinfo
# Deactivate
conda deactivate

# 📦 Installing Packages in a Conda Environment
# Preferred way:
conda install pillow
# If not available in Conda, fallback to pip:
pip install beautifulsoup4

# 📜 Checking All Environments
conda info --envs
#The active environment will be marked with *.

### Creating Modules
To create a Python module, save your function(s) in a .py file. Make sure the file is in the same directory as the script or notebook that imports it, or in a directory on sys.path (for example, one listed in PYTHONPATH).
Example: Saving the save_list function into a module named utils.py:

In [14]:
# utils.py
def save_list(input_list, file_name='temp.txt'):
    """A list (input_list) is saved to a file (file_name)"""
    with open(file_name, 'w') as fh:
        print(*input_list, sep='\n', file=fh)
    return None


Usage from another script:

In [None]:
>>> import utils
>>> utils.save_list([1, 2, 3])
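To try this without leaving the notebook, the sketch below writes a module file into a temporary folder, adds that folder to sys.path, and imports it. The module name demo_utils is hypothetical (chosen to avoid clashing with any existing utils module); its content mirrors the save_list function above.

```python
import importlib
import os
import sys
import tempfile

project_dir = tempfile.mkdtemp()   # stand-in for your project folder
module_code = '''\
def save_list(input_list, file_name='temp.txt'):
    """A list (input_list) is saved to a file (file_name)"""
    with open(file_name, 'w') as fh:
        print(*input_list, sep='\\n', file=fh)
'''
with open(os.path.join(project_dir, 'demo_utils.py'), 'w') as fh:
    fh.write(module_code)

sys.path.insert(0, project_dir)    # make the folder searchable for imports
importlib.invalidate_caches()      # recommended when a module file is created after startup

import demo_utils

out_file = os.path.join(project_dir, 'numbers.txt')
demo_utils.save_list([1, 2, 3], out_file)
with open(out_file) as fh:
    print(fh.read().split())       # ['1', '2', '3']
```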

# How to create a virtual environment and use it in Jupyter Notebook

In [None]:
#1- Install virtualenv if you don't have it (in CMD or a terminal):
pip install virtualenv

In [None]:
#2- Create a virtual environment: (terminal)
virtualenv myenv

In [None]:
#3- Activate the environment: (terminal)
#On Windows:
myenv\Scripts\activate
# ❌ If you see errors when using `myenv\Scripts\activate` on Windows,
# see the execution-policy fix in the VS Code section further below.


#On Mac/Linux:
source myenv/bin/activate


In [None]:
#4- Install Jupyter and add this environment as a kernel: (terminal)
pip install ipykernel
python -m ipykernel install --user --name=myenv --display-name "Python (myenv)"

In [None]:
#5- Open Jupyter Notebook, go to Kernel > Change Kernel, and select “Python (myenv)”

### ✅ Steps to completely remove the virtual environment (deactivate it first):

In [None]:
#in Windows (PowerShell):
Remove-Item -Recurse -Force .\myenv
#in Linux / macOS:
rm -rf myenv


### ✅ How to install a package inside the virtual environment:

In [None]:
#in terminal (activate the environment first, then install)
#Windows (Command Prompt):
.\testenv\Scripts\activate.bat
#Linux / macOS:
source testenv/bin/activate
pip install numpy



In [None]:
#or in Jupyter Notebook (%pip installs into the environment of the running kernel):
%pip install seaborn

# How to create a module and package in Jupyter Notebook or virtual environment

### Module:

In [None]:
#1- Create a file, e.g., mymodule.py:
# mymodule.py
def greet(name):
    """Greet a person by name."""
    return f"Hello, {name}!"


In [None]:
#2- Use it in a notebook:
import mymodule

mymodule.greet("Ali")


#If mymodule.py is in a different directory, add that directory to sys.path
#before importing it:
import sys
sys.path.append('/path/to/your/module')
import mymodule



### Package:

In [None]:
#1- Create a folder named mypackage/

#2- Add a file __init__.py (can be empty)

#3- Add your modules like greetings.py:

In [None]:
# greetings.py
def say_hello(name):
    return f"Hello, {name}!"


In [None]:
#Folder structure:
mypackage/
│
├── __init__.py
└── greetings.py


In [None]:
#Then import in notebook:

from mypackage import greetings

greetings.say_hello("Sara")



# 🎯 1. Create a virtual environment in VS Code

In [None]:
#1- Open a project folder in VS Code

#2- Open the terminal (Ctrl + `)

#3- Create virtual environment:
python -m venv myenv

#4- Activate it:
#Windows: 
myenv\Scripts\activate
#❌ If myenv\Scripts\activate fails on Windows OS, read the next section
#macOS/Linux: 
source myenv/bin/activate

#5- Select interpreter in VS Code (bottom-left) to match the new venv


❌ If myenv\Scripts\activate fails on Windows OS with an error like:
- ...\Activate.ps1 cannot be loaded because running scripts is disabled on this system. For more information, see about_Execution_Policies

✅ Fix it:

Step 1: Run PowerShell as Administrator
Close VS Code if it's open. Then:
- Open the Start menu
- Search for PowerShell
- Right-click on it and choose "Run as Administrator"

Step 2: Change the Execution Policy
- In the PowerShell window that opens, type the following command:

In [None]:
Set-ExecutionPolicy RemoteSigned

When it asks for confirmation, press Y and then Enter.

🔒 What this does: It allows you to run scripts created on your local machine (like Activate.ps1). Scripts downloaded from the internet still require a digital signature from a trusted publisher.

Step 3: Close PowerShell and Open VSCode
- Now open VSCode again. In the terminal inside VSCode, activate your virtual environment by running:

In [None]:
#4- Activate it:
#Windows: 
myenv\Scripts\activate

#5- Select interpreter in VS Code (bottom-left) to match the new venv


# 🎯 2. Creating modules and packages in VS Code

### 🔹 Creating a module

In [None]:
#1- Create a Python file inside the project folder, for example: utils.py
#2- Write your functions there:
# utils.py
def greet(name):
    """Prints a greeting message."""
    print(f"Hello, {name}!")

#3- Use it in another file in the same folder (e.g., main.py):
import utils
utils.greet("Sara")


### 🔹 Package creation

In [None]:
#1- Create a folder, for example:
mypackage/
├── __init__.py
└── tools.py

#2- Write a function in tools.py:
# mypackage/tools.py
def add(a, b):
    return a + b

#3- Import into the main file:
from mypackage import tools
print(tools.add(2, 3))



### Testing Modules
Good programming practice includes writing tests to ensure your code works correctly.
To prevent tests from running when your module is imported (but still run them when the file is executed directly), use this block:

In [None]:
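if __name__ == '__main__':
    # Runs only when the file is executed directly, not when it is imported
    print('Running tests...')  # e.g., call your test function here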
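To see both cases, the sketch below writes a tiny hypothetical module, name_demo.py, into a temporary folder. Importing it leaves the guarded block unexecuted; running it as a script (runpy.run_path with run_name='__main__', which mimics `python name_demo.py`) executes the block.

```python
import importlib
import os
import runpy
import sys
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, 'name_demo.py')
with open(path, 'w') as fh:
    fh.write(
        'ran_main_block = False\n'
        'if __name__ == "__main__":\n'
        '    ran_main_block = True   # runs only when executed directly\n'
    )

sys.path.insert(0, folder)
importlib.invalidate_caches()

import name_demo                  # imported: __name__ is 'name_demo'
print(name_demo.__name__, name_demo.ran_main_block)   # name_demo False

# Executed as a script: __name__ is '__main__'
result = runpy.run_path(path, run_name='__main__')
print(result['ran_main_block'])   # True
```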


- ✅ Using doctest for Automatic Testing
- Python's doctest module can extract test cases from a function's docstring and execute them as if entered in the interpreter.

Here is an updated version of is_prime() with embedded doctests:

In [11]:
# prime5.py
def is_prime(n):
    """Check if n is a prime number.

    >>> is_prime(0)
    False
    >>> is_prime(1)
    False
    >>> is_prime(2)
    True
    >>> is_prime(3)
    True
    >>> is_prime(4)
    False
    >>> is_prime(5)
    True
    """
    if n <= 1:
        return False
    for x in range(2, int(n ** 0.5) + 1):
        if n % x == 0:
            return False
    return True

def _test():
    import doctest
    doctest.testmod()

if __name__ == '__main__':
    _test()


In [None]:
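#Run the test in a terminal (no output means all tests passed):

$ python prime5.py

#Add -v for a detailed report of every example:
$ python prime5.py -v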
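Doctests can also be run for a single function directly, for example from a notebook cell. A sketch using doctest's finder and runner classes on a hypothetical reverse_complement() function; globs tells doctest which names the examples may use.

```python
import doctest

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (hypothetical example).

    >>> reverse_complement('ATGC')
    'GCAT'
    >>> reverse_complement('aaac')
    'GTTT'
    """
    return seq.upper().translate(str.maketrans('ACGT', 'TGCA'))[::-1]

finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(reverse_complement, globs={'reverse_complement': reverse_complement}):
    runner.run(test)          # failures, if any, are printed here

failed, attempted = runner.summarize(verbose=False)
print(f'{attempted} examples run, {failed} failed')   # 2 examples run, 0 failed
```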

### Fun & Practical Example

Imagine you're throwing a party.
- You’re the main program, but you can’t do everything alone!
- You call your friend DJ Reza to play music → that’s a music module 🎵
- You ask your mom to bake a cake → that’s a baking module 🍰
- You get your little sister to keep an eye on the guests → that's your security module 😂
- Now, if you organize all these helpers into a folder called PartyHelpers, that’s your package! 🎁
So:
- A module is like one helper (a .py file that does one job)
- A package is a group of modules working together toward one goal (like having an awesome party)

### Theoretical Questions:

1. What is a function?

2. How many values can a function return?

3. Can a function be called without any parameters?

4. What is a docstring and why is it related to functions and modules?

5. Does every function need to know in advance how many parameters it will receive?

6. Why must all optional arguments (parameters with default values) be placed after the required ones in the function definition?

7. What is a module?

8. Why are modules usually imported at the beginning of the program?

9. How do you import all contents of a module? Is this procedure advisable?

10. How can you test if your code is being executed as a standalone program or called as a module?

11. What is virtualenv and when would you use it?

12. Why is modularization important in analyzing biological data?

13. How can using modules improve the reproducibility of a bioinformatics workflow?

### Code-Related Questions:

1. Write a generator function.

2. Write a function that accepts a DNA sequence and returns its GC content (as a percentage).

3. Create a module named bio_utils.py that includes a function to transcribe a DNA sequence to RNA.

4. Use the __name__ == "__main__" statement to run a test only if the module is executed directly.

5. Write a function that can accept any number of gene expression values and returns their mean.

6. Write a script that imports your bio_utils.py module and uses one of its functions.

7. Create a function with optional parameters, such as:

In [None]:
def find_motif(seq, motif="ATG"):
    ...


Then call it both with and without the motif argument.

8. Define a module named sequence_tools that contains the following:
- A function to calculate GC content
- A function to count the number of codons in a DNA sequence
- Then analyze a DNA sequence using this module.

9. Write a generator function that yields codons (triplets of nucleotides) from a DNA sequence:

In [None]:
def codon_generator(sequence):
    ...
