<a href="https://colab.research.google.com/github/Ash100/Python_for_Lifescience/blob/main/Chapter_4_Working_with_files.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Learn Python for Biological Data Analysis
## **Chapter 4:** Working with Files

This course is designed and taught by **Dr. Ashfaq Ahmad**. All examples used in the course are drawn from the biological and life sciences.

## 📅 Course Outline

---

## 🏗️ Foundation (Weeks 1–2)

### 📘 Chapter 1: Getting Started with Python and Colab
- Introduction to Google Colab interface
- Basic Python syntax and data types
- Variables, strings, and basic operations
- Print statements and comments

### 📘 Chapter 2: Control Structures
- Conditional statements (`if`/`else`)
- Loops (`for` and `while`)
- Basic functions and scope

---

## 🧬 Data Handling (Weeks 3–4)

### 📘 Chapter 3: Data Structures for Biology
- Lists and tuples (storing sequences, experimental data)
- Dictionaries (gene annotations, species data)
- Sets (unique identifiers, sample collections)

### 📘 Chapter 4: Working with Files
- Reading and writing text files
- Handling CSV files (experimental data)
- Basic file operations for biological datasets

---

## 📊 Scientific Computing (Weeks 5–7)

### 📘 Chapter 5: NumPy for Numerical Data
- Arrays for storing experimental measurements
- Mathematical operations on datasets
- Statistical calculations (mean, median, standard deviation)

### 📘 Chapter 6: Pandas for Data Analysis
- DataFrames for structured biological data
- Data cleaning and manipulation
- Filtering and grouping experimental results
- Handling missing data

### 📘 Chapter 7: Data Visualization
- Matplotlib basics for scientific plots
- Creating publication-quality figures
- Specialized plots for biological data (histograms, scatter plots, box plots)

---

## 🔬 Biological Applications (Weeks 8–10)

### 📘 Chapter 8: Sequence Analysis
- String manipulation for DNA/RNA sequences
- Basic sequence operations (reverse complement, transcription)
- Reading FASTA files
- Simple sequence statistics

### 📘 Chapter 9: Statistical Analysis for Biology
- Hypothesis testing basics
- t-tests and chi-square tests
- Correlation analysis
- Introduction to `scipy.stats`

### 📘 Chapter 10: Practical Projects
- Analyzing gene expression data
- Population genetics calculations
- Ecological data analysis
- Creating reproducible research workflows

---

## 🚀 Advanced Topics *(Optional – Weeks 11–12)*

### 📘 Chapter 11: Bioinformatics Libraries
- Introduction to Biopython
- Working with biological databases
- Phylogenetic analysis basics

### 📘 Chapter 12: Best Practices
- Code organization and documentation
- Error handling
- Reproducible research practices
- Sharing code and results

---

## 🧠 Key Teaching Strategies

1. Start each chapter with biological context – explain why the programming concept matters for their field.
2. Use biological datasets throughout – gene sequences, experimental measurements, species data.
3. Include hands-on exercises after each concept.
4. Emphasize reproducibility – show how code documents their analysis process.
5. Build complexity gradually – start with simple examples, then real research scenarios.

---

✅ This progression moves from basic programming concepts to practical biological applications, ensuring students can immediately apply what they learn to their research and coursework.


## Learning Objectives
By the end of this chapter, you will be able to:

1. Read and write text files containing biological data
2. Handle CSV files with experimental data
3. Perform basic file operations for biological datasets
4. Apply file handling techniques to real biological scenarios

### **Section 1:** Reading and Writing Text Files

Text files are fundamental for storing and retrieving data. In this section, we'll learn how to read content from existing text files and write new content into them using Python.

### 1.1 Reading Text Files

Reading a file involves opening it, processing its content, and then closing it. Python's `open()` function is used to open files. When reading, we typically open a file in 'read mode' (`'r'`).

### Opening and Closing a File

It's crucial to close a file after you're done with it to free up system resources. The `with` statement is the recommended way to handle file operations because it automatically ensures the file is closed, even if errors occur.
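As a small illustration of the automatic closing described above, the sketch below checks the file object's `closed` attribute inside and after a `with` block. The file name and the temporary directory are assumptions made for this demonstration only.

```python
import os
import tempfile

# Work in a temporary directory so no real files are touched
demo_path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")

with open(demo_path, "w") as handle:
    handle.write("ATGC\n")
    closed_inside = handle.closed  # the file is still open here

closed_after = handle.closed  # the file was closed when the block ended

print(f"Closed inside the block: {closed_inside}")
print(f"Closed after the block:  {closed_after}")
```

Without `with`, you would have to call `handle.close()` yourself and make sure it also runs when an error occurs.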

**Example 1: Reading an entire file**

Let's create a dummy file first to demonstrate reading.




In [1]:
# Create a dummy file for demonstration
with open("sample.txt", "w") as file:
    file.write("Hello, this is line 1.\n")
    file.write("This is line 2.\n")
    file.write("And this is line 3.")

In [2]:
# Now, let's read the entire content of 'sample.txt'
with open("sample.txt", "r") as file:
    content = file.read()
    print("Content of sample.txt:")
    print(content)

Content of sample.txt:
Hello, this is line 1.
This is line 2.
And this is line 3.


### File Modes for Different Purposes

When working with files in Python, the `open()` function takes a `mode` argument that specifies how the file will be accessed. Understanding these modes is crucial for performing the desired operations without unintended side effects.

| Mode | Description                                                                                             | Behavior if File Exists                               | Behavior if File Does Not Exist                     |
| :--- | :------------------------------------------------------------------------------------------------------ | :---------------------------------------------------- | :-------------------------------------------------- |
| `'r'`  | **Read Only**: Opens a file for reading. The file pointer is placed at the beginning of the file.     | File content is preserved; reading starts from beginning. | Raises a `FileNotFoundError`.                       |
| `'w'`  | **Write**: Opens a file for writing. If the file exists, its content is truncated (erased).           | **Content is overwritten.** | A new, empty file is created.                       |
| `'a'`  | **Append**: Opens a file for appending. New data is written to the end of the file.                   | New content is added to the end of the existing content. | A new, empty file is created.                       |
| `'r+'` | **Read and Write**: Opens a file for both reading and writing. The file pointer is at the beginning.  | File content is preserved; can read and write from beginning. | Raises a `FileNotFoundError`.                       |
| `'x'`  | **Exclusive Creation**: Opens a file for exclusive creation. If the file already exists, the operation fails. | Raises a `FileExistsError`.                           | A new, empty file is created for writing.           |
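The behaviours in the table can be tried out on a throwaway file. This is a minimal sketch; the file name `modes_demo.txt` and the temporary directory are assumptions chosen so the example cannot overwrite real data.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

# 'x' creates the file and fails if it is already there
with open(path, "x") as handle:
    handle.write("first line\n")

# 'a' keeps the existing content and adds to the end
with open(path, "a") as handle:
    handle.write("second line\n")

with open(path, "r") as handle:
    after_append = handle.read()

# A second 'x' on the same path raises FileExistsError
try:
    open(path, "x")
    exclusive_failed = False
except FileExistsError:
    exclusive_failed = True

# 'w' truncates the file: the two earlier lines are gone
with open(path, "w") as handle:
    handle.write("replaced\n")

with open(path, "r") as handle:
    after_write = handle.read()

print(repr(after_append))
print(exclusive_failed)
print(repr(after_write))
```

For result files you do not want to lose, `'x'` is a safer default than `'w'` because it refuses to overwrite.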

In [3]:
# Create a sample DNA sequence file
sample_dna = """ATCGATCGATCGATCGGCTAGCTAGCT
AGCTATTAAGGCCTTAAGGCCCGATCGATCGATCGAT"""

# Write to a file
with open("sample_dna.txt", "w") as file:
    file.write(sample_dna)

print("Sample DNA file created!")

Sample DNA file created!


**Note.**<br>Triple quotes allow a string to span multiple lines without explicit `\n` characters or string concatenation. The line breaks you type become part of the string.

In [4]:
# Read the entire file
with open("sample_dna.txt", "r") as file:
    content = file.read()
    print("File contents:")
    print(content)

File contents:
ATCGATCGATCGATCGGCTAGCTAGCT
AGCTATTAAGGCCTTAAGGCCCGATCGATCGATCGAT


In [5]:
# Read line by line
with open("sample_dna.txt", "r") as file:
    print("Reading line by line:")
    for line_number, line in enumerate(file, 1):
        print(f"Line {line_number}: {line.strip()}")

Reading line by line:
Line 1: ATCGATCGATCGATCGGCTAGCTAGCT
Line 2: AGCTATTAAGGCCTTAAGGCCCGATCGATCGATCGAT


**1.2 Processing FASTA Files**

In [6]:
# Create a sample FASTA file
fasta_content = """>seq1 Human hemoglobin alpha chain
MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG
KKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTP
AVHASLDKFLASVSTVLTSKYR

>seq2 Human hemoglobin beta chain
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPK
VKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFG
KEFTPPVQAAYQKVVAGVANALAHKYH"""

with open("sample_sequences.fasta", "w") as file:
    file.write(fasta_content)

print("FASTA file created!")

FASTA file created!


Next, we will parse the **FASTA** file.
### **Why Parse FASTA?**

Often, we need to extract specific information from FASTA files, like:
* Just the sequence headers.
* Just the sequences themselves.
* Mapping headers to their corresponding sequences.
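The first item in the list, extracting only the headers, can be sketched in a few lines. The two-record file and the function name `read_fasta_headers` are hypothetical and exist only for this illustration.

```python
import os
import tempfile

# Hypothetical two-record FASTA file, written to a temporary directory
fasta_text = ">geneA example record\nATGCATGC\nGGCC\n>geneB another record\nTTAACC\n"
fasta_path = os.path.join(tempfile.mkdtemp(), "headers_demo.fasta")
with open(fasta_path, "w") as handle:
    handle.write(fasta_text)


def read_fasta_headers(filename):
    """Return the header lines of a FASTA file without the leading '>'."""
    headers = []
    with open(filename, "r") as handle:
        for line in handle:
            if line.startswith(">"):
                headers.append(line[1:].strip())
    return headers


headers = read_fasta_headers(fasta_path)
print(headers)
```

Because the file is read one line at a time, this approach also works for files that are too large to load into memory at once.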

In [7]:
# Function to parse FASTA file
def parse_fasta(filename):
    """Parse a FASTA file and return sequences as a dictionary"""
    sequences = {}
    current_header = None
    current_sequence = []

    with open(filename, "r") as file:
        for line in file:
            line = line.strip()
            if not line:
                continue  # Skip blank lines between records
            if line.startswith(">"):
                # Save previous sequence if exists
                if current_header is not None:
                    sequences[current_header] = "".join(current_sequence)
                # Start new sequence
                current_header = line[1:]  # Remove '>'
                current_sequence = []
            else:
                current_sequence.append(line)

        # Save last sequence
        if current_header is not None:
            sequences[current_header] = "".join(current_sequence)

    return sequences


# Parse the FASTA file
sequences = parse_fasta("sample_sequences.fasta")

# Display results
for header, sequence in sequences.items():
    print(f"Header: {header}")
    print(f"Sequence length: {len(sequence)}")
    print(f"First 50 characters: {sequence[:50]}...")
    print("-" * 50)

Header: seq1 Human hemoglobin alpha chain
Sequence length: 142
First 50 characters: MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLS...
--------------------------------------------------
Header: seq2 Human hemoglobin beta chain
Sequence length: 147
First 50 characters: MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLS...
--------------------------------------------------


**1.3 Writing Analysis Results**

In [8]:
# Analyze sequences and write results
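def analyze_sequence(sequence):
    """Count A, T, G and C in a sequence and compute GC content.

    GC content is only meaningful for DNA. In a protein, A, T, G and C are
    the one-letter codes for alanine, threonine, glycine and cysteine.
    """
    length = len(sequence)
    gc_count = sequence.count("G") + sequence.count("C")
    analysis = {
        "length": length,
        "A_count": sequence.count("A"),
        "T_count": sequence.count("T"),
        "G_count": sequence.count("G"),
        "C_count": sequence.count("C"),
        # Avoid dividing by zero for an empty sequence
        "gc_content": gc_count / length * 100 if length else 0.0,
    }
    return analysis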


# Analyze all sequences and write results
with open("sequence_analysis.txt", "w") as output_file:
    output_file.write("Sequence Analysis Results\n")
    output_file.write("=" * 50 + "\n\n")

    for header, sequence in sequences.items():
        analysis = analyze_sequence(sequence)
        output_file.write(f"Sequence: {header}\n")
        output_file.write(f"Length: {analysis['length']} amino acids\n")
        output_file.write("Amino acid composition:\n")
        output_file.write(f"  A: {analysis['A_count']}\n")
        output_file.write(f"  T: {analysis['T_count']}\n")
        output_file.write(f"  G: {analysis['G_count']}\n")
        output_file.write(f"  C: {analysis['C_count']}\n")
        output_file.write(f"GC Content: {analysis['gc_content']:.2f}%\n")
        output_file.write("-" * 30 + "\n")

print("Analysis complete! Results written to sequence_analysis.txt")

# Display the results
with open("sequence_analysis.txt", "r") as file:
    print(file.read())

Analysis complete! Results written to sequence_analysis.txt
Sequence Analysis Results
==================================================

Sequence: seq1 Human hemoglobin alpha chain
Length: 142 amino acids
Amino acid composition:
  A: 21
  T: 9
  G: 7
  C: 1
GC Content: 5.63%
------------------------------
Sequence: seq2 Human hemoglobin beta chain
Length: 147 amino acids
Amino acid composition:
  A: 15
  T: 7
  G: 13
  C: 2
GC Content: 10.20%
------------------------------
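One caveat about the output above: the hemoglobin sequences are proteins, so the letters A, T, G and C stand for alanine, threonine, glycine and cysteine, and the "GC Content" figure has no biological meaning for them. A more suitable summary for proteins is the full residue composition. The sketch below uses `collections.Counter`; the function name `amino_acid_composition` is an assumption for this example, and the input is the first 15 residues of seq1.

```python
from collections import Counter


def amino_acid_composition(protein):
    """Count every residue in a protein sequence (one-letter codes)."""
    protein = protein.upper()
    counts = Counter(protein)
    total = len(protein)
    # Guard against empty input so we never divide by zero
    percentages = {aa: 100 * n / total for aa, n in counts.items()} if total else {}
    return counts, percentages


# First 15 residues of the hemoglobin alpha chain used above
counts, percentages = amino_acid_composition("MVLSPADKTNVKAAW")
for residue, n in counts.most_common(3):
    print(f"{residue}: {n} ({percentages[residue]:.1f}%)")
```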



### **Section 2.** Handling CSV Files (Experimental Data)
**2.1 Basic CSV Operations**

CSV stands for **Comma Separated Values**. It's a simple, plain text file format used to store tabular data (numbers and text) in a structured way. Each line in the file represents a data record, and each record consists of one or more fields, separated by commas.

In [9]:
# Create a dummy CSV file
csv_data = """Name,Age,City
Alice,30,New York
Bob,24,London
Charlie,35,Paris
David,28,"San Francisco"
"""
with open("people.csv", "w") as f:
    f.write(csv_data.strip())

print("Created 'people.csv' for demonstration.")

Created 'people.csv' for demonstration.


In [10]:
# Reading the above CSV file
import csv

print("\nReading 'people.csv' with csv.reader:")
with open("people.csv", "r", newline="") as file:
    csv_reader = csv.reader(file)
    header = next(csv_reader)  # Read the header row
    print(f"Header: {header}")

    for row in csv_reader:
        print(f"Row: {row}")
        # You can access elements by index, e.g., row[0] for Name, row[1] for Age
        # print(f"Name: {row[0]}, Age: {row[1]}, City: {row[2]}")


Reading 'people.csv' with csv.reader:
Header: ['Name', 'Age', 'City']
Row: ['Alice', '30', 'New York']
Row: ['Bob', '24', 'London']
Row: ['Charlie', '35', 'Paris']
Row: ['David', '28', 'San Francisco']
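The quoted `"San Francisco"` field hints at why the `csv` module is preferable to splitting lines on commas yourself: a quoted field may contain a comma. The record below is hypothetical and only meant to show the difference.

```python
import csv
import io

# Hypothetical record whose last field contains a comma inside quotes
line = 'S010,Drug_C,"E. coli, strain K-12"\n'

naive_fields = line.strip().split(",")
csv_fields = next(csv.reader(io.StringIO(line)))

print(naive_fields)  # the quoted field is wrongly split in two
print(csv_fields)    # the quoted field stays together
```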


In [11]:
import csv

# Create sample experimental data
experimental_data = [
    ["Sample_ID", "Treatment", "Concentration", "Growth_Rate", "Viability"],
    ["S001", "Control", 0, 2.3, 98.5],
    ["S002", "Drug_A", 10, 1.8, 85.2],
    ["S003", "Drug_A", 50, 1.2, 72.1],
    ["S004", "Drug_A", 100, 0.8, 45.3],
    ["S005", "Drug_B", 10, 2.1, 92.1],
    ["S006", "Drug_B", 50, 1.5, 78.9],
    ["S007", "Drug_B", 100, 1.0, 55.7],
    ["S008", "Control", 0, 2.4, 97.8],
]

# Write to CSV file
with open("experimental_data.csv", "w", newline="") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(experimental_data)

print("Experimental data CSV created!")

Experimental data CSV created!


In [12]:
# Read and display CSV data
with open("experimental_data.csv", "r", newline="") as csvfile:
    reader = csv.reader(csvfile)
    print("Experimental Data:")
    for row in reader:
        print("\t".join(row))

Experimental Data:
Sample_ID	Treatment	Concentration	Growth_Rate	Viability
S001	Control	0	2.3	98.5
S002	Drug_A	10	1.8	85.2
S003	Drug_A	50	1.2	72.1
S004	Drug_A	100	0.8	45.3
S005	Drug_B	10	2.1	92.1
S006	Drug_B	50	1.5	78.9
S007	Drug_B	100	1.0	55.7
S008	Control	0	2.4	97.8


**2.2 Working with CSV Using DictReader**

The `csv.DictReader` class is a powerful and highly recommended tool when you're dealing with CSV files, especially when your data has a clear header row. Instead of giving you lists of strings (where you have to remember index numbers like `row[0]`, `row[1]`), `DictReader` provides each row as a dictionary. This makes your code much more readable and less prone to errors if the column order changes.

### Why Use `DictReader`?

* **Readability:** Access data by column name (e.g., `row['Name']`) instead of by index (e.g., `row[0]`).
* **Maintainability:** Your code is more robust. If a column is reordered in the CSV file, your code won't break as long as the column name remains the same.
* **Convenience:** Directly provides data in a key-value pair format, which is often easier to work with.
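The maintainability point can be checked directly. In this sketch the same hypothetical record is stored with two different column orders, and a lookup by column name returns the same value for both. `io.StringIO` stands in for a real file.

```python
import csv
import io

# The same hypothetical record with the columns in two different orders
original = "Sample_ID,Treatment,Viability\nS001,Control,98.5\n"
reordered = "Viability,Sample_ID,Treatment\n98.5,S001,Control\n"


def viabilities(csv_text):
    """Return the Viability column as floats, looked up by name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [float(row["Viability"]) for row in reader]


print(viabilities(original))
print(viabilities(reordered))
```

With `csv.reader`, the second layout would need `row[0]` instead of `row[2]`.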

### 1. Basic Usage of `csv.DictReader`

To use `DictReader`, you simply pass your opened file object to it. It automatically reads the first row as the header and uses those values as dictionary keys for subsequent rows.

Let's pass the `experimental_data.csv` file to it:

In [13]:
# Read CSV as dictionaries
with open("experimental_data.csv", "r", newline="") as csvfile:
    reader = csv.DictReader(csvfile)

    print("Data as dictionaries:")
    for row in reader:
        print(f"Sample {row['Sample_ID']}: {row['Treatment']} treatment")
        print(f"  Concentration: {row['Concentration']} μM")
        print(f"  Growth Rate: {row['Growth_Rate']} /hr")
        print(f"  Viability: {row['Viability']}%")
        print("-" * 30)

Data as dictionaries:
Sample S001: Control treatment
  Concentration: 0 μM
  Growth Rate: 2.3 /hr
  Viability: 98.5%
------------------------------
Sample S002: Drug_A treatment
  Concentration: 10 μM
  Growth Rate: 1.8 /hr
  Viability: 85.2%
------------------------------
Sample S003: Drug_A treatment
  Concentration: 50 μM
  Growth Rate: 1.2 /hr
  Viability: 72.1%
------------------------------
Sample S004: Drug_A treatment
  Concentration: 100 μM
  Growth Rate: 0.8 /hr
  Viability: 45.3%
------------------------------
Sample S005: Drug_B treatment
  Concentration: 10 μM
  Growth Rate: 2.1 /hr
  Viability: 92.1%
------------------------------
Sample S006: Drug_B treatment
  Concentration: 50 μM
  Growth Rate: 1.5 /hr
  Viability: 78.9%
------------------------------
Sample S007: Drug_B treatment
  Concentration: 100 μM
  Growth Rate: 1.0 /hr
  Viability: 55.7%
------------------------------
Sample S008: Control treatment
  Concentration: 0 μM
  Growth Rate: 2.4 /hr
  Viability: 97.8%
------------------------------

**2.3 Data Analysis and Filtering** <br>
Data analysis involves inspecting, cleaning, transforming, and modeling data in order to discover useful information, inform conclusions, and support decision-making. Filtering is a key part of this: it selects the subset of records that meet given criteria.

### 1. Basic Filtering and Analysis (Using Python Built-ins)

Before diving into specialized libraries, it's good to understand how to perform basic operations using standard Python lists and dictionaries, especially when you've parsed CSVs using `csv.DictReader`.

In [14]:
# Recreate people.csv with extra columns (this overwrites the earlier version)
csv_data = """Name,Age,City,Occupation,Salary
Alice,30,New York,Engineer,75000
Bob,24,London,Designer,60000
Charlie,35,Paris,Manager,90000
David,28,"San Francisco",Engineer,80000
Eve,22,Berlin,Analyst,55000
Frank,40,Tokyo,Manager,100000
Grace,29,Sydney,Designer,62000
"""
with open("people.csv", "w") as f:
    f.write(csv_data.strip())

print("Ensured 'people.csv' for demonstration.")

import csv

# Load the data into a list of dictionaries
people_data = []
with open("people.csv", "r", newline="") as file:
    csv_dict_reader = csv.DictReader(file)
    for row in csv_dict_reader:
        # Convert numeric values from string to appropriate types
        try:
            row["Age"] = int(row["Age"])
            row["Salary"] = int(row["Salary"])
        except ValueError:
            print(f"Warning: Could not convert numeric values in row: {row}")
            continue  # Skip row if conversion fails
        people_data.append(row)

print("\nLoaded data (first 3 records):")
for person in people_data[:3]:
    print(person)

Ensured 'people.csv' for demonstration.

Loaded data (first 3 records):
{'Name': 'Alice', 'Age': 30, 'City': 'New York', 'Occupation': 'Engineer', 'Salary': 75000}
{'Name': 'Bob', 'Age': 24, 'City': 'London', 'Occupation': 'Designer', 'Salary': 60000}
{'Name': 'Charlie', 'Age': 35, 'City': 'Paris', 'Occupation': 'Manager', 'Salary': 90000}
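Once the rows are dictionaries with numeric values, filtering takes a single list comprehension. This is a minimal sketch; the `records` list repeats four rows from the data above so that the example runs on its own.

```python
# A few records in the same shape as people_data above (values copied from it)
records = [
    {"Name": "Alice", "Age": 30, "Occupation": "Engineer", "Salary": 75000},
    {"Name": "Bob", "Age": 24, "Occupation": "Designer", "Salary": 60000},
    {"Name": "Charlie", "Age": 35, "Occupation": "Manager", "Salary": 90000},
    {"Name": "David", "Age": 28, "Occupation": "Engineer", "Salary": 80000},
]

# Keep only the rows that satisfy a condition
engineers = [r for r in records if r["Occupation"] == "Engineer"]
under_30 = [r["Name"] for r in records if r["Age"] < 30]

# Summarise the filtered subset
avg_engineer_salary = sum(r["Salary"] for r in engineers) / len(engineers)

print([r["Name"] for r in engineers])
print(under_30)
print(avg_engineer_salary)
```

The same pattern applies to experimental data, for example keeping only samples with a viability above a chosen threshold.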


In [15]:
# Analyze experimental data
def analyze_experimental_data(filename):
    """Analyze experimental data from CSV file"""
    treatments = {}

    with open(filename, "r", newline="") as csvfile:
        reader = csv.DictReader(csvfile)

        for row in reader:
            treatment = row["Treatment"]
            concentration = float(row["Concentration"])
            growth_rate = float(row["Growth_Rate"])
            viability = float(row["Viability"])

            if treatment not in treatments:
                treatments[treatment] = {
                    "concentrations": [],
                    "growth_rates": [],
                    "viabilities": [],
                }

            treatments[treatment]["concentrations"].append(concentration)
            treatments[treatment]["growth_rates"].append(growth_rate)
            treatments[treatment]["viabilities"].append(viability)

    return treatments


# Analyze the data
results = analyze_experimental_data("experimental_data.csv")

# Calculate statistics
for treatment, data in results.items():
    print(f"\nTreatment: {treatment}")
    print(f"Number of samples: {len(data['growth_rates'])}")
    print(
        f"Average growth rate: {sum(data['growth_rates']) / len(data['growth_rates']):.2f} /hr"
    )
    print(
        f"Average viability: {sum(data['viabilities']) / len(data['viabilities']):.2f}%"
    )
    print(
        f"Concentration range: {min(data['concentrations'])}-{max(data['concentrations'])} μM"
    )


Treatment: Control
Number of samples: 2
Average growth rate: 2.35 /hr
Average viability: 98.15%
Concentration range: 0.0-0.0 μM

Treatment: Drug_A
Number of samples: 3
Average growth rate: 1.27 /hr
Average viability: 67.53%
Concentration range: 10.0-100.0 μM

Treatment: Drug_B
Number of samples: 3
Average growth rate: 1.53 /hr
Average viability: 75.57%
Concentration range: 10.0-100.0 μM
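The averages above are computed by hand with `sum()` and `len()`. Python's standard `statistics` module offers the same calculation plus a measure of spread. The sketch below uses the three Drug_A growth rates from the table; whether a sample standard deviation is the right summary for your own experiment is a judgment this example does not make.

```python
import statistics

# Drug_A growth rates from the experimental data above
growth_rates = [1.8, 1.2, 0.8]

mean_rate = statistics.mean(growth_rates)
# Sample standard deviation (n - 1 in the denominator); needs at least 2 values
sd_rate = statistics.stdev(growth_rates)

print(f"Mean growth rate: {mean_rate:.2f} /hr")
print(f"Standard deviation: {sd_rate:.2f} /hr")
```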


**2.4 Writing Processed Data**

In [16]:
# Process and write summary data
summary_data = []
summary_data.append(
    [
        "Treatment",
        "Sample_Count",
        "Avg_Growth_Rate",
        "Avg_Viability",
        "Max_Concentration",
    ]
)

for treatment, data in results.items():
    avg_growth = sum(data["growth_rates"]) / len(data["growth_rates"])
    avg_viability = sum(data["viabilities"]) / len(data["viabilities"])
    max_concentration = max(data["concentrations"])
    sample_count = len(data["growth_rates"])

    summary_data.append(
        [
            treatment,
            sample_count,
            f"{avg_growth:.2f}",
            f"{avg_viability:.2f}",
            max_concentration,
        ]
    )

# Write summary to new CSV
with open("treatment_summary.csv", "w", newline="") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(summary_data)

print("Summary data written to treatment_summary.csv")

# Display summary
with open("treatment_summary.csv", "r", newline="") as csvfile:
    reader = csv.reader(csvfile)
    print("\nTreatment Summary:")
    for row in reader:
        print("\t".join(row))

Summary data written to treatment_summary.csv

Treatment Summary:
Treatment	Sample_Count	Avg_Growth_Rate	Avg_Viability	Max_Concentration
Control	2	2.35	98.15	0.0
Drug_A	3	1.27	67.53	100.0
Drug_B	3	1.53	75.57	100.0
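As an alternative to building rows as lists, `csv.DictWriter` writes dictionaries and takes the column order from `fieldnames`. The sketch below is a hedged variant of the cell above, not a replacement for it: it uses two summary rows and a reduced set of columns, and writes to a temporary directory.

```python
import csv
import os
import tempfile

# Two summary rows copied from the table above, as dictionaries
rows = [
    {"Treatment": "Control", "Sample_Count": 2, "Avg_Growth_Rate": "2.35"},
    {"Treatment": "Drug_A", "Sample_Count": 3, "Avg_Growth_Rate": "1.27"},
]
fieldnames = ["Treatment", "Sample_Count", "Avg_Growth_Rate"]

out_path = os.path.join(tempfile.mkdtemp(), "summary_demo.csv")
with open(out_path, "w", newline="") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()  # the header row comes from fieldnames
    writer.writerows(rows)

# Read the file back to confirm what was written
with open(out_path, "r", newline="") as csvfile:
    read_back = list(csv.DictReader(csvfile))

print(read_back[0])
```

Note that every value comes back as a string, so numeric columns need converting again after reading.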


### **Section 3.** Basic File Operations for Biological Datasets
Biological datasets often come in various file formats (e.g., FASTA, FASTQ, BAM, VCF, CSV, TSV) and can be quite large. Efficiently handling these files is crucial for bioinformatics workflows. This section covers fundamental file operations in Python, focusing on common tasks encountered with biological data.<br>
**3.1 File System Operations**

The `os`, `glob`, and `shutil` modules provide functions for interacting with the file system, such as checking whether a file exists and **moving**, **copying**, or **deleting** files and directories.
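Copying, checking, and deleting can be sketched safely inside a temporary directory. The names `reads.txt` and `backup` are hypothetical and used only for this demonstration.

```python
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "reads.txt")  # hypothetical file name
backup_dir = os.path.join(workdir, "backup")

with open(source, "w") as handle:
    handle.write("ATGC\n")

os.makedirs(backup_dir, exist_ok=True)    # no error if it already exists
backup = shutil.copy(source, backup_dir)  # returns the path of the new copy

copy_exists = os.path.exists(backup)

os.remove(source)  # delete the original file
source_exists = os.path.exists(source)

shutil.rmtree(workdir)  # delete the directory and everything in it
workdir_exists = os.path.exists(workdir)

print(copy_exists, source_exists, workdir_exists)
```

`os.remove` and `shutil.rmtree` delete permanently, so it is worth testing such code on copies before pointing it at real data.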

In [17]:
import os
import glob
import shutil

# Create a directory structure for biological data
directories = ["data", "data/sequences", "data/experiments", "results"]

for directory in directories:
    if not os.path.exists(directory):
        os.makedirs(directory)
        print(f"Created directory: {directory}")

# Each entry: (file to move, destination directory, message to print)
files_to_move = [
    ("sample_sequences.fasta", "data/sequences", "Moved FASTA file to sequences directory"),
    ("sample_dna.txt", "data/sequences", "Moved DNA file to sequences directory"),
    ("experimental_data.csv", "data/experiments", "Moved experimental data to experiments directory"),
    ("sequence_analysis.txt", "results", "Moved analysis results to results directory"),
    ("treatment_summary.csv", "results", "Moved summary to results directory"),
]

for filename, destination, message in files_to_move:
    if os.path.exists(filename):
        # Give shutil.move the full destination path, not just the directory.
        # With a directory as the target, re-running this cell fails with
        # "Destination path '...' already exists" if a copy from an earlier
        # run is still there; with a full path the old copy is replaced.
        shutil.move(filename, os.path.join(destination, filename))
        print(message)

**3.2 Batch Processing Files**

In [None]:
# Create multiple sequence files for batch processing
sample_sequences = {
    "gene1.fasta": ">gene1\nATCGATCGATCGATCG\nGCTAGCTAGCTAGCTA",
    "gene2.fasta": ">gene2\nTTAAGGCCTTAAGGCC\nCGATCGATCGATCGAT",
    "gene3.fasta": ">gene3\nGGCCTTAAGGCCTTAA\nATCGATCGATCGATCG",
}

# Write sample files
for filename, content in sample_sequences.items():
    with open(f"data/sequences/{filename}", "w") as file:
        file.write(content)

print("Created sample sequence files for batch processing")

In [None]:
# Batch process all FASTA files
def batch_process_fasta_files(directory):
    """Process all FASTA files in a directory"""
    # Sort so the files are always processed in the same order
    fasta_files = sorted(glob.glob(os.path.join(directory, "*.fasta")))

    results = {}

    for filepath in fasta_files:
        filename = os.path.basename(filepath)
        print(f"Processing: {filename}")

        sequences = parse_fasta(filepath)

        # Keep one entry per sequence so files with several records
        # do not overwrite their own results
        results[filename] = [
            {"header": header, "analysis": analyze_sequence(sequence)}
            for header, sequence in sequences.items()
        ]

    return results


# Process all FASTA files
batch_results = batch_process_fasta_files("data/sequences/")

# Write batch results
# Note: sample_sequences.fasta holds proteins, so "bp" and GC content
# are only meaningful for the gene*.fasta files
with open("results/batch_analysis.txt", "w") as output_file:
    output_file.write("Batch Analysis Results\n")
    output_file.write("=" * 50 + "\n\n")

    for filename, records in batch_results.items():
        output_file.write(f"File: {filename}\n")
        for record in records:
            analysis = record["analysis"]
            output_file.write(f"Sequence: {record['header']}\n")
            output_file.write(f"Length: {analysis['length']} bp\n")
            output_file.write(f"GC Content: {analysis['gc_content']:.2f}%\n")
        output_file.write("-" * 30 + "\n")

print("Batch analysis complete!")

**3.3 Error Handling and File Validation**

In [None]:
def safe_file_reader(filename):
    """Safely read a file with error handling"""
    try:
        with open(filename, "r") as file:
            content = file.read()
            return content
    except FileNotFoundError:
        print(f"Error: File '{filename}' not found!")
        return None
    except PermissionError:
        print(f"Error: Permission denied for file '{filename}'!")
        return None
    except Exception as e:
        print(f"Error reading file '{filename}': {e}")
        return None


# Test error handling
print("Testing error handling:")
content = safe_file_reader("nonexistent_file.txt")
print(f"Content: {content}")

# Test with existing file
content = safe_file_reader("results/batch_analysis.txt")
if content is not None:
    print(f"Successfully read file. Length: {len(content)} characters")
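Error handling covers files that cannot be read. Validation asks whether a readable file has the expected structure. The sketch below performs a few basic FASTA checks; the function name `validate_fasta` and the chosen checks are assumptions for illustration, not a complete validator. It loads the whole file into memory, which is acceptable for small files only.

```python
import os
import tempfile


def validate_fasta(filename):
    """Return a list of problems found in a FASTA file (empty list if none)."""
    if not os.path.isfile(filename):
        return ["file not found"]
    problems = []
    with open(filename, "r") as handle:
        lines = [line.strip() for line in handle if line.strip()]
    if not lines:
        return ["file is empty"]
    if not lines[0].startswith(">"):
        problems.append("first line is not a header")
    # A header followed directly by another header (or end of file) has no sequence
    for i, line in enumerate(lines):
        if line.startswith(">"):
            is_last = i == len(lines) - 1
            if is_last or lines[i + 1].startswith(">"):
                problems.append(f"no sequence for header '{line[1:]}'")
    return problems


# Two small hypothetical files: one valid, one with two problems
tmp = tempfile.mkdtemp()
good = os.path.join(tmp, "good.fasta")
bad = os.path.join(tmp, "bad.fasta")
with open(good, "w") as handle:
    handle.write(">seq1\nATGC\n")
with open(bad, "w") as handle:
    handle.write("ATGC\n>seq2\n")

print(validate_fasta(good))
print(validate_fasta(bad))
print(validate_fasta(os.path.join(tmp, "missing.fasta")))
```

Returning a list of problems instead of printing them lets the calling code decide whether to skip the file, log a warning, or stop.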

**3.4 File Information and Statistics**

In [None]:
from datetime import datetime


def get_file_info(filepath):
    """Get information about a file"""
    try:
        stats = os.stat(filepath)
        return {"size": stats.st_size, "modified": stats.st_mtime, "exists": True}
    except OSError:
        # os.stat raises OSError (e.g. FileNotFoundError) if the file is not accessible
        return {"exists": False}


# Get information about all files in results directory
results_files = glob.glob("results/*")

print("File Information:")
print("=" * 50)
for filepath in results_files:
    info = get_file_info(filepath)
    if info["exists"]:
        filename = os.path.basename(filepath)
        # st_mtime is seconds since the epoch; convert it to a readable date
        modified = datetime.fromtimestamp(info["modified"]).strftime("%Y-%m-%d %H:%M:%S")
        print(f"File: {filename}")
        print(f"  Size: {info['size']} bytes")
        print(f"  Modified: {modified}")
        print("-" * 30)

### 4. Practical Exercises
**Exercise 1: Gene Expression Data Processing**

In [None]:
# Create sample gene expression data
gene_expression_data = [
    ["Gene_ID", "Gene_Name", "Control_1", "Control_2", "Treatment_1", "Treatment_2"],
    ["G001", "GAPDH", 1000, 1050, 980, 1020],
    ["G002", "ACTB", 800, 820, 750, 780],
    ["G003", "TP53", 200, 180, 450, 420],
    ["G004", "BRCA1", 150, 160, 300, 280],
    ["G005", "MYC", 300, 280, 150, 170],
]

with open("data/experiments/gene_expression.csv", "w", newline="") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(gene_expression_data)

print("Gene expression data created!")

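**Task:** Write a function that calculates the fold change for each gene (mean treatment expression divided by mean control expression) and identifies genes whose expression changed substantially, for example by at least 2-fold up or down.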

**Exercise 2: Quality Control for Sequence Data**

In [None]:
# Create sample sequence with quality issues
problematic_sequences = """>seq_with_N
ATCGATCGATCNNNGATCGATCG
>short_seq
ATCG
>seq_with_gaps
ATCGATCG---ATCGATCG
>normal_seq
ATCGATCGATCGATCGATCG"""

with open("data/sequences/quality_check.fasta", "w") as file:
    file.write(problematic_sequences)

print("Quality check sequences created!")

**Task:** Write a quality control function that identifies and reports sequences with issues such as ambiguous bases (`N`), gap characters (`-`), or very short length.

**Summary**<br>
In this chapter, you learned:

**1. Text File Operations:** How to read and write text files, including biological data formats like FASTA<br>
**2. CSV File Handling:** Working with experimental data in CSV format using Python's csv module<br>
**3. File System Operations:** Organizing biological datasets with directories and batch processing<br>
**4. Error Handling:** Implementing robust file operations with proper error handling<br>
**5. Best Practices:** Following Python conventions for file handling in biological data analysis<br>

These skills form the foundation for working with biological datasets and will be essential for more advanced data analysis tasks.