(6)=
# Chapter 6: File Input/Output

In real-world applications, you rarely type all your data directly into code. Instead, you:

- **Read data** from files (sensor logs, experimental results, configuration files)
- **Write results** to files (analysis outputs, reports, processed data)
- **Save and load** data between sessions

**File I/O** (Input/Output) is essential for:
- Loading experimental data
- Saving calculation results
- Reading configuration parameters
- Creating data analysis pipelines
- Sharing data with other programs

In this chapter, we'll cover:
1. Reading text files
2. Writing text files
3. Working with CSV files
4. File paths and directories
5. Error handling with files
6. Practical applications

(6.1)=
## 6.1 Reading Text Files

Python provides built-in functions to read text files. The basic process is:

1. **Open** the file
2. **Read** the contents
3. **Close** the file

**Basic syntax:**
```python
file = open('filename.txt', 'r')  # 'r' means read mode
content = file.read()
file.close()
```

**File modes:**
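- `'r'` - Read (default); fails if the file does not exist
- `'w'` - Write (creates the file, or overwrites an existing one)
- `'a'` - Append (adds to the end of the file, creating it if needed)
- `'r+'` - Read and write (the file must already exist)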
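
The differences between these modes are easiest to see side by side. The following is a minimal sketch, assuming a scratch file called `modes_demo.txt` (a made-up name) inside a temporary directory, so nothing in your working folder is touched:

```python
import tempfile
from pathlib import Path

# Work in a throwaway directory; 'modes_demo.txt' is a hypothetical file name
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / 'modes_demo.txt'

    # 'w' creates the file (or truncates it if it already exists)
    with open(demo, 'w') as f:
        f.write("first\n")

    # 'a' keeps the existing contents and writes at the end
    with open(demo, 'a') as f:
        f.write("second\n")

    # 'r+' opens an existing file for reading and writing without truncating it
    with open(demo, 'r+') as f:
        before = f.read()       # reading moves the position to the end
        f.write("third\n")      # so this write lands after the old contents

    after_append = demo.read_text()

    # 'w' again: the old contents are discarded
    with open(demo, 'w') as f:
        f.write("fresh start\n")
    after_overwrite = demo.read_text()

print(repr(before))
print(repr(after_append))
print(repr(after_overwrite))
```

The last `print` shows only `'fresh start\n'`, which is why `'w'` deserves care when the file already holds data you want to keep.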

In [1]:
# First, let's create a sample data file to work with

# Create sample temperature data
sample_data = """Time,Temperature
0,25.0
1,27.5
2,30.2
3,32.8
4,35.1
"""

# Write this to a file
with open('temperature_data.txt', 'w') as f:
    f.write(sample_data)

print("Sample data file created: temperature_data.txt")

Sample data file created: temperature_data.txt


In [2]:
# Basic File Reading - Method 1: Read entire file

print("=== Reading Entire File ===")

# Open file in read mode
file = open('temperature_data.txt', 'r')

# Read entire contents
content = file.read()

# Close the file (important!)
file.close()

print(content)
print(f"\nType: {type(content)}")
print(f"Length: {len(content)} characters")

=== Reading Entire File ===
Time,Temperature
0,25.0
1,27.5
2,30.2
3,32.8
4,35.1


Type: <class 'str'>
Length: 52 characters


In [3]:
# Basic File Reading - Method 2: Read line by line

print("=== Reading Line by Line ===")

file = open('temperature_data.txt', 'r')

# Read lines into a list
lines = file.readlines()

file.close()

print(f"results: {lines}\n")

print(f"Number of lines: {len(lines)}\n")

for i, line in enumerate(lines):
    print(f"Line {i}: {repr(line)}")

=== Reading Line by Line ===
results: ['Time,Temperature\n', '0,25.0\n', '1,27.5\n', '2,30.2\n', '3,32.8\n', '4,35.1\n']

Number of lines: 6

Line 0: 'Time,Temperature\n'
Line 1: '0,25.0\n'
Line 2: '1,27.5\n'
Line 3: '2,30.2\n'
Line 4: '3,32.8\n'
Line 5: '4,35.1\n'


(6.1.1)=
### 6.1.1 Using Context Managers (Best Practice)

**Context managers** automatically close files, even if errors occur. This is the recommended way to work with files!

**Syntax:**
```python
with open('filename.txt', 'r') as file:
    content = file.read()
    # File automatically closes here
```

**Advantages:**
- Automatically closes the file
- Prevents resource leaks
- Cleaner code
- Handles errors gracefully
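
To see the "even if errors occur" part in action, the sketch below deliberately triggers an error inside a `with` block and then checks the file's `closed` attribute. The file name and its contents are illustrative assumptions, written to a temporary directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'demo.txt'   # hypothetical scratch file
    path.write_text("25.0\nnot-a-number\n")

    try:
        with open(path, 'r') as file:
            for line in file:
                value = float(line)   # raises ValueError on the second line
    except ValueError as err:
        error_message = str(err)

    # The with block was exited by the exception, yet the file is closed
    closed_after_error = file.closed

print(f"Error caught: {error_message}")
print(f"File closed after error: {closed_after_error}")
```

With a manual `open()` / `close()` pair, the `close()` call would have been skipped by the exception unless you wrapped it in `try` / `finally` yourself.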

In [4]:
# Using Context Manager (Recommended!)

print("=== Reading with Context Manager ===")

# File opens and automatically closes
with open('temperature_data.txt', 'r') as file:
    content = file.read()
    print(content)

# File is already closed here
print("\nFile has been automatically closed")

=== Reading with Context Manager ===
Time,Temperature
0,25.0
1,27.5
2,30.2
3,32.8
4,35.1


File has been automatically closed


In [5]:
# Processing File Line by Line

print("=== Processing Data Line by Line ===")

with open('temperature_data.txt', 'r') as file:
    # Read the header line first so the loop below sees only data rows
    header = file.readline()
    # .strip() removes surrounding whitespace, including the trailing newline
    print(f"Header: {header.strip()}\n")
    
    print("Data:")
    for line in file:
        # Remove whitespace and split by comma
        line = line.strip()
        if line:  # Skip empty lines
            time, temp = line.split(',')
            print(f"  Time: {time} min, Temperature: {temp}°C")

=== Processing Data Line by Line ===
Header: Time,Temperature

Data:
  Time: 0 min, Temperature: 25.0°C
  Time: 1 min, Temperature: 27.5°C
  Time: 2 min, Temperature: 30.2°C
  Time: 3 min, Temperature: 32.8°C
  Time: 4 min, Temperature: 35.1°C
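
Everything read from a text file arrives as a string, so the values must be converted before you can calculate with them. The sketch below shows one way to do that. It assumes the same contents as `temperature_data.txt`, held in an in-memory `io.StringIO` object so the snippet runs on its own; with a real file you would iterate over `open(...)` in exactly the same way.

```python
import io

# Stand-in for open('temperature_data.txt'): same contents, kept in memory
sample = io.StringIO("Time,Temperature\n0,25.0\n1,27.5\n2,30.2\n3,32.8\n4,35.1\n")

times = []
temps = []

header = sample.readline()          # consume the header line
for line in sample:
    line = line.strip()
    if not line:                    # skip blank lines
        continue
    time_str, temp_str = line.split(',')
    times.append(int(time_str))     # "0" -> 0
    temps.append(float(temp_str))   # "25.0" -> 25.0

avg_temp = sum(temps) / len(temps)
print(f"Readings: {len(temps)}")
print(f"Average temperature: {avg_temp:.2f}°C")
print(f"Temperature rise: {temps[-1] - temps[0]:.1f}°C")
```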


(6.2)=
## 6.2 Writing Text Files

Writing files is just as important as reading them. You can:

- Save calculation results
- Create data logs
- Generate reports
- Export data for other programs

**Write modes:**
- `'w'` - Write (creates new file or **overwrites** existing)
- `'a'` - Append (adds to end of existing file)
- `'x'` - Exclusive creation (fails if file exists)
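
Mode `'x'` is a useful safeguard when overwriting earlier results would be costly. A minimal sketch, assuming a hypothetical `results.txt` in a temporary directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    results = Path(tmp) / 'results.txt'   # hypothetical file name

    # First attempt: the file does not exist yet, so 'x' creates it
    with open(results, 'x') as f:
        f.write("run 1\n")

    # Second attempt: the file now exists, so 'x' refuses to open it
    try:
        with open(results, 'x') as f:
            f.write("run 2\n")
        overwritten = True
    except FileExistsError:
        overwritten = False

    contents = results.read_text()

print(f"Overwritten: {overwritten}")
print(f"Contents: {contents!r}")
```

The first run's data survives because the second `open()` raises `FileExistsError` before anything is written.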

In [6]:
# Writing to a File

print("=== Writing Data to File ===")

# Data to write
reactor_data = [
    ("R-101", 85.0, 2.5),
    ("R-102", 92.0, 2.8),
    ("R-103", 78.5, 2.3),
]

# Write to file
with open('reactor_report.txt', 'w') as file:
    # Write header
    file.write("Reactor Status Report\n")
    file.write("=" * 40 + "\n\n")
    
    # Write data
    for reactor_id, temp, pressure in reactor_data:
        line = f"{reactor_id}: Temp={temp}°C, Pressure={pressure} bar\n"
        file.write(line)

print("Report written to reactor_report.txt")

# Read it back to verify
print("\nFile contents:")
with open('reactor_report.txt', 'r') as file:
    print(file.read())

=== Writing Data to File ===
Report written to reactor_report.txt

File contents:
Reactor Status Report
========================================

R-101: Temp=85.0°C, Pressure=2.5 bar
R-102: Temp=92.0°C, Pressure=2.8 bar
R-103: Temp=78.5°C, Pressure=2.3 bar



In [7]:
# Appending to a File

print("=== Appending to Existing File ===")

# Add more data
new_data = ("R-104", 88.5, 2.6)

with open('reactor_report.txt', 'a') as file:
    reactor_id, temp, pressure = new_data
    line = f"{reactor_id}: Temp={temp}°C, Pressure={pressure} bar\n"
    file.write(line)

print("Data appended to reactor_report.txt")

# Read updated file
print("\nUpdated file contents:")
with open('reactor_report.txt', 'r') as file:
    print(file.read())

=== Appending to Existing File ===
Data appended to reactor_report.txt

Updated file contents:
Reactor Status Report
========================================

R-101: Temp=85.0°C, Pressure=2.5 bar
R-102: Temp=92.0°C, Pressure=2.8 bar
R-103: Temp=78.5°C, Pressure=2.3 bar
R-104: Temp=88.5°C, Pressure=2.6 bar
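
When the lines are already collected in a list, `.writelines()` can write them in one call. Unlike `print()`, it adds no newline characters, so each string must carry its own `"\n"`. The sketch below uses illustrative reactor readings and a hypothetical file name:

```python
import tempfile
from pathlib import Path

readings = [("R-101", 85.0), ("R-102", 92.0), ("R-103", 78.5)]  # illustrative values

with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / 'readings.txt'   # hypothetical file name

    # Build every line first; each one needs its own trailing "\n"
    lines = [f"{reactor_id},{temp}\n" for reactor_id, temp in readings]

    with open(log_path, 'w') as f:
        f.writelines(lines)     # writes the strings back to back

    written = log_path.read_text()

print(written)
```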



(6.3)=
## 6.3 Working with CSV Files

**CSV (Comma-Separated Values)** files are one of the most common data formats in science and engineering.

**Why CSV?**
- Simple, universal format
- Works with Excel, Google Sheets, MATLAB, etc.
- Human-readable
- Easy to parse

Python has a built-in `csv` module for working with CSV files.

In addition, **other Python packages can also read and write CSV files**, depending on your needs:

- **`pandas`** → best for data analysis, tables, and large datasets
- **`numpy`** → useful for purely numerical data stored in CSV format

If your data lives in Excel workbooks (`.xlsx`) rather than CSV, use **`openpyxl`** (reads and writes) or **`xlsxwriter`** (writes only) instead; these packages handle Excel files, not CSV.

For simple tasks, the built-in `csv` module is often sufficient.  
For more complex data processing and analysis, **`pandas` is commonly preferred**.
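
One reason to prefer the `csv` module over a plain `line.split(',')` is that real CSV files may contain commas inside quoted fields. The sketch below compares the two approaches on a single made-up record held in memory:

```python
import csv
import io

# One record whose first field contains a comma, so it is quoted in the file
raw = 'name,temperature\n"Reactor A, north wing",85.0\n'

# Naive approach: split the data line on every comma
naive_fields = raw.splitlines()[1].split(',')

# csv module: understands the quotes and keeps the field together
reader = csv.reader(io.StringIO(raw))
header = next(reader)
csv_fields = next(reader)

print(f"split(','): {naive_fields}")
print(f"csv.reader: {csv_fields}")
```

The naive split produces three fragments, while `csv.reader` returns the two fields the file actually contains.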

In [8]:
import csv

# Writing CSV Files

print("=== Writing CSV File ===")

# Sample experimental data
experimental_data = [
    ["Time (min)", "Temperature (C)", "Pressure (bar)", "Flow Rate (L/min)"],
    [0, 25.0, 1.0, 50.0],
    [5, 45.2, 1.5, 55.3],
    [10, 65.8, 2.0, 60.1],
    [15, 85.3, 2.5, 65.7],
    [20, 95.1, 2.8, 70.2],
]

# Write to CSV
with open('experiment_data.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(experimental_data)

print("Data written to experiment_data.csv")

# Display what we wrote
print("\nCSV contents:")
with open('experiment_data.csv', 'r') as file:
    print(file.read())

=== Writing CSV File ===
Data written to experiment_data.csv

CSV contents:
Time (min),Temperature (C),Pressure (bar),Flow Rate (L/min)
0,25.0,1.0,50.0
5,45.2,1.5,55.3
10,65.8,2.0,60.1
15,85.3,2.5,65.7
20,95.1,2.8,70.2



In [9]:
# Reading CSV Files

print("=== Reading CSV File ===")

with open('experiment_data.csv', 'r', newline='') as file:  # the csv module expects newline=''
    reader = csv.reader(file)
    
    # Read header
    header = next(reader)
    print("Header:", header)
    print()
    
    # Read data rows
    print("Data:")
    for row in reader:
        time, temp, pressure, flow = row
        print(f"  t={time} min: T={temp}°C, P={pressure} bar, F={flow} L/min")

=== Reading CSV File ===
Header: ['Time (min)', 'Temperature (C)', 'Pressure (bar)', 'Flow Rate (L/min)']

Data:
  t=0 min: T=25.0°C, P=1.0 bar, F=50.0 L/min
  t=5 min: T=45.2°C, P=1.5 bar, F=55.3 L/min
  t=10 min: T=65.8°C, P=2.0 bar, F=60.1 L/min
  t=15 min: T=85.3°C, P=2.5 bar, F=65.7 L/min
  t=20 min: T=95.1°C, P=2.8 bar, F=70.2 L/min
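
Note that `csv.reader` returns every value as a string. For calculations, `csv.DictReader` combined with explicit conversion is often more readable, because columns are accessed by name rather than position. A minimal sketch, assuming the first rows of `experiment_data.csv` held in memory (the shortened keys `time`, `temp`, and `pressure` are illustrative choices):

```python
import csv
import io

# Stand-in for open('experiment_data.csv', newline=''): same rows, kept in memory
data = io.StringIO(
    "Time (min),Temperature (C),Pressure (bar),Flow Rate (L/min)\n"
    "0,25.0,1.0,50.0\n"
    "5,45.2,1.5,55.3\n"
    "10,65.8,2.0,60.1\n"
)

rows = []
for row in csv.DictReader(data):
    # Every value arrives as a string; convert before doing arithmetic
    rows.append({
        'time': int(row['Time (min)']),
        'temp': float(row['Temperature (C)']),
        'pressure': float(row['Pressure (bar)']),
    })

max_temp = max(r['temp'] for r in rows)
print(f"Rows read: {len(rows)}")
print(f"Max temperature: {max_temp}°C")
```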


In [10]:
# Read the same CSV with pandas (requires the pandas package)
import pandas as pd

df = pd.read_csv('experiment_data.csv')
print("\nData read with pandas:")
print(df)


Data read with pandas:
   Time (min)  Temperature (C)  Pressure (bar)  Flow Rate (L/min)
0           0             25.0             1.0               50.0
1           5             45.2             1.5               55.3
2          10             65.8             2.0               60.1
3          15             85.3             2.5               65.7
4          20             95.1             2.8               70.2


(6.4)=
## 6.4 File Paths and Directories

Understanding file paths is crucial for working with files in different locations.

**Path types:**
- **Absolute path**: Full path from root directory
  - `/Users/username/data/experiment.csv` (Mac/Linux)
  - `C:\Users\username\data\experiment.csv` (Windows)
- **Relative path**: Path relative to current directory
  - `data/experiment.csv`
  - `../data/experiment.csv` (.. means parent directory)


Assume your current working directory is `project/`, which has this layout:
```
project/
├── main.py
├── data/
│   ├── experiment.csv
│   └── results.csv
└── output/
```
- `data/experiment.csv`  
  → accesses the file inside the `data` directory

- `output/`  
  → refers to the `output` folder in the current directory

- `../project/data/experiment.csv`  
  → moves up one directory, then navigates into `project/data`

- `/Users/username/project/data/experiment.csv`  
  → absolute path that works regardless of the current directory

Python's `pathlib` module provides a modern, cross-platform way to work with paths.
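
As a quick orientation, here is a small sketch of the operations used most often: joining path pieces with `/`, inspecting the parts, and converting a relative path to an absolute one. The file names are the illustrative ones from the layout above, and none of the files need to exist for this to run.

```python
from pathlib import Path, PurePosixPath

# Build a relative path with the / operator; no string concatenation needed
data_file = Path('data') / 'experiment.csv'

print(data_file.name)           # experiment.csv
print(data_file.parent)         # data
print(data_file.is_absolute())  # False

# absolute() prepends the current working directory
absolute = data_file.absolute()
print(absolute.is_absolute())   # True

# '..' components are stored as written; PurePosixPath never touches the disk
up_and_down = PurePosixPath('../project/data/experiment.csv')
print(up_and_down.parts)
```

Because `Path` picks the right separator for the operating system, the same code works on Windows, macOS, and Linux.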

In [11]:
from pathlib import Path

print("=== Working with Paths ===")

# Get current working directory
current_dir = Path.cwd()
print(f"Current directory: {current_dir}")

=== Working with Paths ===
Current directory: /Users/hoon/CHME212/cheme_comp_book/docs/chapter06


In [12]:
# Create a Path object
data_file = Path('experiment_data.csv')
print(f"\nFile path: {data_file}")
print(f"Absolute path: {data_file.absolute()}")
print(f"File exists: {data_file.exists()}")
print(f"Is file: {data_file.is_file()}")


File path: experiment_data.csv
Absolute path: /Users/hoon/CHME212/cheme_comp_book/docs/chapter06/experiment_data.csv
File exists: True
Is file: True


In [13]:
# Get file information
if data_file.exists():
    print(f"File name: {data_file.name}")
    print(f"File stem (no extension): {data_file.stem}")
    print(f"File extension: {data_file.suffix}")
    print(f"File size: {data_file.stat().st_size} bytes")

File name: experiment_data.csv
File stem (no extension): experiment_data
File extension: .csv
File size: 149 bytes


In [14]:
# Creating Directories and Organizing Files

print("=== Creating Directory Structure ===")

# Create a data directory
data_dir = Path('project_data')
data_dir.mkdir(exist_ok=True)  # exist_ok=True: no error if the directory already exists

print(f"Created directory: {data_dir}")

=== Creating Directory Structure ===
Created directory: project_data


In [15]:
# Create subdirectories
raw_dir = data_dir / 'raw'
processed_dir = data_dir / 'processed'

raw_dir.mkdir(exist_ok=True)
processed_dir.mkdir(exist_ok=True)

print(f"Created: {raw_dir}")
print(f"Created: {processed_dir}")

Created: project_data/raw
Created: project_data/processed


In [16]:
# Write a file in the subdirectory
output_file = processed_dir / 'results.txt'
output_file.write_text("Analysis complete: All tests passed\n")

print(f"\nWrote file: {output_file}")
print(f"Contents: {output_file.read_text()}")


Wrote file: project_data/processed/results.txt
Contents: Analysis complete: All tests passed
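
Two related conveniences are worth knowing: `mkdir(parents=True)` creates any missing parent directories in one call, and `glob()` / `iterdir()` list what a directory contains. The sketch below works in a temporary directory, and the file names are illustrative:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)

    # parents=True creates 'project_data' and 'processed' in one call
    processed = base / 'project_data' / 'processed'
    processed.mkdir(parents=True, exist_ok=True)

    # Write a few small files (names are illustrative)
    for run in (1, 2, 3):
        (processed / f'run_{run}.txt').write_text(f"result of run {run}\n")
    (processed / 'notes.md').write_text("not a result file\n")

    # glob() yields matching paths in arbitrary order, so sort them
    result_files = sorted(p.name for p in processed.glob('run_*.txt'))
    all_files = sorted(p.name for p in processed.iterdir())

print(result_files)
print(all_files)
```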



(6.5)=
## 6.5 Error Handling with Files

File operations can fail for many reasons:
- File doesn't exist
- No permission to read/write
- Disk full
- File is locked by another program

**Good practice:** Always handle potential errors!

In [17]:
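# Example 1: File doesn't exist
# Note: the except branch creates the file, so its messages appear only on the
# first run. On later runs the file exists and is read without any error.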
print("\n1. Trying to read non-existent file:")
try:
    with open('nonexistent_file.txt', 'r') as file:
        content = file.read()
except FileNotFoundError:
    print("   Error: File not found!")
    print("   Creating a default file instead...")
    with open('nonexistent_file.txt', 'w') as file:
        file.write("Default content\n")
    print("   Default file created.")


1. Trying to read non-existent file:


In [18]:
# Example 2: Check if file exists before reading
print("\n2. Checking file existence first:")
filename = 'data_file.txt'
if Path(filename).exists():
    with open(filename, 'r') as file:
        content = file.read()
    print(f"   File read successfully: {len(content)} characters")
else:
    print(f"   File '{filename}' does not exist")


2. Checking file existence first:
   File 'data_file.txt' does not exist


In [19]:
print("\n3. Comprehensive error handling:")

def safe_read_file(filename):
    """Safely read a file, returning None if it cannot be read."""
    try:
        with open(filename, 'r') as file:
            return file.read()
    except FileNotFoundError:
        print(f"   Error: '{filename}' not found")
        return None
    except PermissionError:
        print(f"   Error: No permission to read '{filename}'")
        return None
    except Exception as e:
        print(f"   Unexpected error: {e}")
        return None

content = safe_read_file('experiment_data.csv')
if content is not None:  # an empty file returns "", which is still a successful read
    print(f"   Successfully read {len(content)} characters")
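
Opening the file is only half the story: the data inside it can be malformed too. The sketch below shows one possible way to skip bad lines while recording where they occurred. It assumes a small made-up data set with one corrupted reading, held in memory so the snippet runs on its own.

```python
import io

# In-memory stand-in for a data file with one corrupted reading
raw = io.StringIO("Time,Temperature\n0,25.0\n1,ERR\n2,30.2\n")

header = raw.readline()
good_values = []
bad_lines = []

# Line numbers start at 2 because line 1 is the header
for line_number, line in enumerate(raw, start=2):
    try:
        time_str, temp_str = line.strip().split(',')
        good_values.append(float(temp_str))
    except ValueError:
        # Raised by float() on bad text, or by unpacking a wrong field count
        bad_lines.append(line_number)

print(f"Parsed {len(good_values)} readings, skipped lines: {bad_lines}")
```

Whether to skip, replace, or stop on a bad line depends on the application; recording the line numbers at least makes the problem visible.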

(6.6)=
## 6.6 Practical Applications

Let's put everything together with realistic examples.

In [20]:
# Comprehensive Example 1: Processing Sensor Log Files

print("=" * 60)
print("SENSOR DATA ANALYSIS PIPELINE")
print("=" * 60)

# Step 1: Create sample sensor log
print("\n[1/4] Creating sample sensor log...")

sensor_log = """timestamp,sensor_id,temperature,pressure,status
2024-01-15 08:00:00,S001,25.3,1.01,OK
2024-01-15 08:05:00,S001,28.7,1.05,OK
2024-01-15 08:10:00,S001,95.2,2.85,WARNING
2024-01-15 08:15:00,S001,105.3,3.15,CRITICAL
2024-01-15 08:20:00,S001,98.1,2.95,WARNING
2024-01-15 08:25:00,S001,85.4,2.50,OK
"""

with open('sensor_log.csv', 'w') as f:
    f.write(sensor_log)
print("   sensor_log.csv created")

# Step 2: Read and analyze data
print("\n[2/4] Analyzing sensor data...")

warnings = []
critical = []
temperatures = []

with open('sensor_log.csv', 'r', newline='') as file:
    reader = csv.DictReader(file)
    
    for row in reader:
        temp = float(row['temperature'])
        temperatures.append(temp)
        
        if row['status'] == 'WARNING':
            warnings.append(row)
        elif row['status'] == 'CRITICAL':
            critical.append(row)

print(f"   Total readings: {len(temperatures)}")
print(f"   Warnings: {len(warnings)}")
print(f"   Critical alerts: {len(critical)}")
print(f"   Avg temperature: {sum(temperatures)/len(temperatures):.1f}°C")
print(f"   Max temperature: {max(temperatures):.1f}°C")

# Step 3: Generate report
print("\n[3/4] Generating analysis report...")

with open('sensor_analysis_report.txt', 'w') as file:
    file.write("SENSOR ANALYSIS REPORT\n")
    file.write("=" * 50 + "\n\n")
    
    file.write(f"Total readings: {len(temperatures)}\n")
    file.write(f"Average temperature: {sum(temperatures)/len(temperatures):.1f}°C\n")
    file.write(f"Maximum temperature: {max(temperatures):.1f}°C\n")
    file.write(f"Minimum temperature: {min(temperatures):.1f}°C\n\n")
    
    file.write(f"Warnings: {len(warnings)}\n")
    file.write(f"Critical alerts: {len(critical)}\n\n")
    
    if critical:
        file.write("CRITICAL EVENTS:\n")
        for event in critical:
            file.write(f"  {event['timestamp']}: "
                      f"T={event['temperature']}°C, "
                      f"P={event['pressure']} bar\n")

print("   Report saved to sensor_analysis_report.txt")

# Step 4: Display the report
print("\n[4/4] Report contents:")
print("=" * 60)
with open('sensor_analysis_report.txt', 'r') as file:
    print(file.read())

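============================================================
SENSOR DATA ANALYSIS PIPELINE
============================================================

[1/4] Creating sample sensor log...
   sensor_log.csv created

[2/4] Analyzing sensor data...
   Total readings: 6
   Warnings: 2
   Critical alerts: 1
   Avg temperature: 73.0°C
   Max temperature: 105.3°C

[3/4] Generating analysis report...
   Report saved to sensor_analysis_report.txt

[4/4] Report contents:
============================================================
SENSOR ANALYSIS REPORT
==================================================

Total readings: 6
Average temperature: 73.0°C
Maximum temperature: 105.3°C
Minimum temperature: 25.3°C

Warnings: 2
Critical alerts: 1

CRITICAL EVENTS:
  2024-01-15 08:15:00: T=105.3°C, P=3.15 bar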



In [21]:
# Comprehensive Example 2: Batch Processing Multiple Files

print("=" * 60)
print("BATCH DATA QUALITY CONTROL SYSTEM")
print("=" * 60)

# Step 1: Create sample batch data files
print("\n[1/3] Creating sample batch data...")

batch_dir = Path('batch_data')
batch_dir.mkdir(exist_ok=True)

# Create multiple batch files
batches = {
    'batch_001.csv': [["purity", "yield"], [0.965, 0.88], [0.962, 0.89]],
    'batch_002.csv': [["purity", "yield"], [0.948, 0.91], [0.945, 0.90]],
    'batch_003.csv': [["purity", "yield"], [0.978, 0.85], [0.975, 0.86]],
}

for filename, data in batches.items():
    filepath = batch_dir / filename
    with open(filepath, 'w', newline='') as file:
        writer = csv.writer(file)
        writer.writerows(data)
    print(f"   Created: {filename}")

# Step 2: Process all batch files
print("\n[2/3] Processing all batches...")

min_purity = 0.95
min_yield = 0.85

results = []

for batch_file in sorted(batch_dir.glob('batch_*.csv')):
    print(f"\n   Processing {batch_file.name}...")
    
    with open(batch_file, 'r', newline='') as file:
        reader = csv.DictReader(file)
        
        batch_purities = []
        batch_yields = []
        
        for row in reader:
            batch_purities.append(float(row['purity']))
            batch_yields.append(float(row['yield']))
        
        avg_purity = sum(batch_purities) / len(batch_purities)
        avg_yield = sum(batch_yields) / len(batch_yields)
        
        # Quality control check
        if avg_purity >= min_purity and avg_yield >= min_yield:
            status = "PASS"
        else:
            status = "FAIL"
        
        print(f"     Avg purity: {avg_purity:.1%}")
        print(f"     Avg yield: {avg_yield:.1%}")
        print(f"     Status: {status}")
        
        results.append({
            'batch': batch_file.stem,
            'purity': avg_purity,
            'yield': avg_yield,
            'status': status
        })

# Step 3: Write summary report
print("\n[3/3] Generating summary report...")

summary_file = batch_dir / 'quality_control_summary.csv'
fieldnames = ['batch', 'purity', 'yield', 'status']

with open(summary_file, 'w', newline='') as file:
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(results)

print(f"   Summary saved to {summary_file}")

# Display summary
print("\nQUALITY CONTROL SUMMARY:")
print("=" * 60)
passed = sum(1 for r in results if r['status'] == 'PASS')
failed = len(results) - passed
print(f"Total batches: {len(results)}")
print(f"Passed: {passed}")
print(f"Failed: {failed}")
print(f"\nPass rate: {passed/len(results):.1%}")

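============================================================
BATCH DATA QUALITY CONTROL SYSTEM
============================================================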

[1/3] Creating sample batch data...
   Created: batch_001.csv
   Created: batch_002.csv
   Created: batch_003.csv

[2/3] Processing all batches...

   Processing batch_001.csv...
     Avg purity: 96.4%
     Avg yield: 88.5%
     Status: PASS

   Processing batch_002.csv...
     Avg purity: 94.6%
     Avg yield: 90.5%
     Status: FAIL

   Processing batch_003.csv...
     Avg purity: 97.6%
     Avg yield: 85.5%
     Status: PASS

[3/3] Generating summary report...
   Summary saved to batch_data/quality_control_summary.csv

QUALITY CONTROL SUMMARY:
============================================================
Total batches: 3
Passed: 2
Failed: 1

Pass rate: 66.7%


## Summary

In this chapter, you learned how to work with files in Python:

### Reading Files
- **`open()` function**: Basic file operations
- **Context managers**: `with` statement (best practice)
- **Methods**: `.read()`, `.readline()`, `.readlines()`
- **Iterate line by line**: Memory-efficient for large files

### Writing Files
- **Write mode**: `'w'` (overwrites)
- **Append mode**: `'a'` (adds to end)
- **Methods**: `.write()`, `.writelines()`

### CSV Files
- **`csv.reader()`** and **`csv.writer()`**: Basic CSV operations
- **`csv.DictReader()`** and **`csv.DictWriter()`**: Dictionary-based (more readable)
- Common format for data exchange

### File Paths
- **`pathlib.Path`**: Modern path handling
- **Absolute vs relative paths**
- **Directory operations**: Create, check existence

### Best Practices
1. **Always use context managers** (`with` statement)
2. **Handle errors** (file not found, permissions)
3. **Close files** (automatic with context managers)
4. **Check file existence** before reading
5. **Use meaningful file names**
6. **Organize data** in directories

### Quick Reference

```python
# Reading a file
with open('data.txt', 'r') as file:
    content = file.read()

# Writing a file
with open('output.txt', 'w') as file:
    file.write("Results\n")

# Reading CSV
import csv
with open('data.csv', 'r') as file:
    reader = csv.DictReader(file)
    for row in reader:
        print(row['column_name'])

# Writing CSV
with open('output.csv', 'w', newline='') as file:
    writer = csv.DictWriter(file, fieldnames=['name', 'value'])
    writer.writeheader()
    writer.writerow({'name': 'test', 'value': 123})

# Working with paths
from pathlib import Path
data_file = Path('data') / 'experiment.csv'
if data_file.exists():
    content = data_file.read_text()
```

File I/O is fundamental for real-world data analysis and scientific computing!