# Python File Handling Tutorial

This notebook covers essential file operations in Python, including text files, CSV files, and JSON files.

## What is File Handling?

File handling allows your Python programs to:
- **Read data** from external files
- **Write data** to files for storage
- **Process different file formats** (text, CSV, JSON)
- **Persist data** between program runs

## The `with` Statement

Python's `with` statement is the recommended way to handle files because it:
- **Automatically closes files** when the block ends
- **Closes files even when an exception is raised** inside the block (the exception itself still propagates)
- **Prevents resource leaks**
- **Makes code cleaner and safer**

**Basic Syntax:**
```python
with open('filename.txt', 'mode') as file_variable:
    # Work with the file
    pass
# File automatically closes here
```
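
To make the closing behavior concrete, here is a minimal sketch that checks the file object's `closed` attribute after a normal exit and after an exception. The file name `with_demo.txt` and the temporary directory are assumptions made only for this illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical scratch file inside a temporary directory, so nothing is left behind
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / 'with_demo.txt'

    with open(demo_path, 'w', encoding='utf-8') as demo_file:
        demo_file.write('hello')
    closed_after_block = demo_file.closed  # the block has ended, so the file is closed

    try:
        with open(demo_path, 'r', encoding='utf-8') as demo_file:
            raise ValueError('simulated failure while reading')
    except ValueError:
        pass  # the exception still reaches us; `with` only guarantees the close
    closed_after_error = demo_file.closed

print(f"Closed after normal exit: {closed_after_block}")
print(f"Closed after exception:   {closed_after_error}")
```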

---

## File Modes Reference

| Mode | Description | Creates New File? | Overwrites? |
|------|-------------|-------------------|-------------|
| `'r'` | Read only (default) | No | N/A |
| `'w'` | Write only | Yes | Yes - completely |
| `'a'` | Append only | Yes | No - adds to end |
| `'r+'` | Read and write | No | Not truncated - writes replace existing characters in place |
| `'w+'` | Write and read | Yes | Yes - completely |
| `'a+'` | Append and read | Yes | No - adds to end |

**Important:** Always specify the mode explicitly for clarity, even though `'r'` is the default.
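
The differences between the modes in the table are easiest to see side by side. The sketch below is only an illustration: the file name `modes_demo.txt` and the temporary directory are assumptions, and the file is read back after each step with `Path.read_text()`.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    mode_path = Path(tmp_dir) / 'modes_demo.txt'  # hypothetical demo file

    with open(mode_path, 'w', encoding='utf-8') as f:
        f.write('abcdef')  # 'w' creates the file

    with open(mode_path, 'a', encoding='utf-8') as f:
        f.write('XYZ')  # 'a' adds to the end
    after_append = mode_path.read_text(encoding='utf-8')

    with open(mode_path, 'r+', encoding='utf-8') as f:
        f.write('12')  # 'r+' starts at position 0 and overwrites in place
    after_r_plus = mode_path.read_text(encoding='utf-8')

    with open(mode_path, 'w', encoding='utf-8') as f:
        pass  # opening with 'w' empties the file even if nothing is written
    after_w = mode_path.read_text(encoding='utf-8')

print(repr(after_append))  # 'abcdefXYZ'
print(repr(after_r_plus))  # '12cdefXYZ'
print(repr(after_w))       # ''
```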

---

## 1. Reading Text Files - Basic Method

The `.read()` method loads the entire file content into memory as a single string. This is perfect for small files where you need all the content at once.

**When to use:**
- Small files that fit comfortably in memory
- When you need the entire file content as one piece
- Processing the whole file at once

**Caution:** Can use a lot of memory with large files.

In [None]:
# Reading entire file content at once
# Note: This assumes 'welcome.txt' exists in your working directory
try:
    with open('welcome.txt', 'r') as text_file:
        text_data = text_file.read()
        print("File contents:")
        print(text_data)
        print(f"\nFile size: {len(text_data)} characters")
except FileNotFoundError:
    print("File 'welcome.txt' not found. Creating a sample file...")
    # Create a sample file for demonstration
    with open('welcome.txt', 'w') as text_file:
        text_file.write("Welcome to Python File Handling!\n")
        text_file.write("This is a sample text file.\n")
        text_file.write("You can read, write, and modify files easily.")
    
    # Now read the file
    with open('welcome.txt', 'r') as text_file:
        text_data = text_file.read()
        print("File contents:")
        print(text_data)

## 2. Reading Text Files Line by Line

The `.readlines()` method returns a list where each element is a line from the file. Like `.read()`, it loads the whole file into memory, so it is not more memory-efficient for large files. What it offers is convenient per-line access and more control over processing.

**Benefits:**
- Process files line by line
- Index, slice, or count lines, because the result is a regular list
- Easier to handle structured text data

**Note:** Each line includes the newline character (`\n`) at the end, except possibly the last line of the file.

In [None]:
# Reading file line by line
try:
    with open('how_many_lines.txt', 'r') as lines_doc:
        lines = lines_doc.readlines()
        print(f"File has {len(lines)} lines:\n")
        
        for i, line in enumerate(lines, 1):
            print(f"Line {i}: {line.rstrip()}")
            
except FileNotFoundError:
    print("File 'how_many_lines.txt' not found. Creating a sample file...")
    # Create a sample file with multiple lines
    with open('how_many_lines.txt', 'w') as lines_doc:
        lines_doc.write("First line of the file\n")
        lines_doc.write("Second line with some data\n")
        lines_doc.write("Third line for demonstration\n")
        lines_doc.write("Fourth and final line")
    
    # Now read the file line by line
    with open('how_many_lines.txt', 'r') as lines_doc:
        lines = lines_doc.readlines()
        print(f"File has {len(lines)} lines:\n")
        
        for i, line in enumerate(lines, 1):
            print(f"Line {i}: {line.rstrip()}")

## Alternative: Iterating Directly Over File Object

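You can iterate directly over a file object without calling `.readlines()`. This is the most memory-efficient approach for large files because only one line is held in memory at a time, and you can `break` out of the loop as soon as you have what you need.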

In [None]:
# More memory-efficient line-by-line reading
print("Reading file using direct iteration:")
with open('how_many_lines.txt', 'r') as lines_doc:
    for line_num, line in enumerate(lines_doc, 1):
        print(f"Line {line_num}: {line.strip()}")
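
Direct iteration also means you can read only part of a file. The sketch below reads just the first three lines of a 1000-line file using `itertools.islice`. The file name `many_lines.txt` and its contents are made up for the example.

```python
import tempfile
from itertools import islice
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    big_path = Path(tmp_dir) / 'many_lines.txt'  # hypothetical sample file
    big_path.write_text(''.join(f'line {n}\n' for n in range(1, 1001)), encoding='utf-8')

    with open(big_path, 'r', encoding='utf-8') as lines_doc:
        # islice stops after three lines, so the rest of the file is never read into a list
        first_three = [line.rstrip('\n') for line in islice(lines_doc, 3)]

print(first_three)  # ['line 1', 'line 2', 'line 3']
```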

## 3. Writing Text Files - Write Mode

Write mode (`'w'`) creates a new file or completely overwrites an existing file. Use this when you want to start fresh with new content.

**Key Points:**
- **Creates new file** if it doesn't exist
- **Overwrites completely** if file exists
- **No way to recover** overwritten data
- File is created even if you don't write anything

**Use Cases:**
- Creating new files
- Replacing entire file contents
- Generating reports or logs from scratch

In [None]:
# Writing to a file (overwrites existing content)
with open('bad_bands.txt', 'w') as bad_bands_doc:
    bad_bands_doc.write('The Beatles')  # Controversial opinion!
    # Note: This completely replaces any existing content

print("File 'bad_bands.txt' has been created/overwritten.")

# Let's read it back to confirm
with open('bad_bands.txt', 'r') as bad_bands_doc:
    content = bad_bands_doc.read()
    print(f"File contents: '{content}'")

# Writing multiple lines
with open('good_bands.txt', 'w') as good_bands_doc:
    good_bands_doc.write('Led Zeppelin\n')
    good_bands_doc.write('Pink Floyd\n')
    good_bands_doc.write('Queen\n')
    good_bands_doc.write('The Rolling Stones')

print("\nCreated 'good_bands.txt' with multiple lines:")
with open('good_bands.txt', 'r') as good_bands_doc:
    print(good_bands_doc.read())
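
When the lines already live in a list, repeated `.write()` calls can be replaced with a single `.writelines()` call. This is a small sketch, assuming a scratch file named `good_bands_demo.txt`. Note that `.writelines()` does not add newline characters for you.

```python
import tempfile
from pathlib import Path

band_names = ['Led Zeppelin', 'Pink Floyd', 'Queen', 'The Rolling Stones']

with tempfile.TemporaryDirectory() as tmp_dir:
    bands_path = Path(tmp_dir) / 'good_bands_demo.txt'  # hypothetical demo file

    with open(bands_path, 'w', encoding='utf-8') as bands_doc:
        # Each string must carry its own '\n'; writelines() writes them as-is
        bands_doc.writelines(f'{name}\n' for name in band_names)

    with open(bands_path, 'r', encoding='utf-8') as bands_doc:
        lines_read_back = bands_doc.read().splitlines()

print(lines_read_back)
```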

## 4. Appending to Text Files - Append Mode

Append mode (`'a'`) adds new content to the end of an existing file without removing what's already there. If the file doesn't exist, it creates a new one.

**Benefits:**
- **Preserves existing content**
- **Adds to the end** of the file
- **Safe for logs** and cumulative data
- **Creates file** if it doesn't exist

**Common Uses:**
- Log files
- Adding entries to lists
- Incremental data collection

In [None]:
# Appending to a file (preserves existing content)
# First, let's create a file with some initial content
with open('cool_dogs.txt', 'w') as cool_dogs_file:
    cool_dogs_file.write('Lassie\n')
    cool_dogs_file.write('Beethoven\n')

print("Initial file contents:")
with open('cool_dogs.txt', 'r') as cool_dogs_file:
    print(cool_dogs_file.read())

# Now append new content
with open('cool_dogs.txt', 'a') as cool_dogs_file:
    cool_dogs_file.write('Air Buddy\n')
    cool_dogs_file.write('Scooby-Doo\n')

print("After appending:")
with open('cool_dogs.txt', 'r') as cool_dogs_file:
    content = cool_dogs_file.read()
    print(content)
    # Count lines, not whitespace-separated words ("Air Buddy" is one dog)
    print(f"Total dogs listed: {len(content.splitlines())}")

## 5. Reading CSV Files - Basic Method

CSV (Comma-Separated Values) files are a common format for structured data. You can read them as plain text, but Python's `csv` module provides much better tools.

**CSV Format Basics:**
- Values separated by commas (or other delimiters)
- Usually has headers in the first row
- Can contain text, numbers, dates, etc.
- Widely supported by spreadsheet applications

In [None]:
# Reading CSV as plain text (basic method)
try:
    with open('logger.csv', 'r') as log_csv_file:
        csv_content = log_csv_file.read()
        print("Raw CSV content:")
        print(csv_content)
except FileNotFoundError:
    print("File 'logger.csv' not found (it is written in section 8). Using a sample file instead.")
    
    # Create a sample CSV file
    sample_csv = "time,address,limit\n08:39:37,1.227.124.181,844404\n13:13:35,198.51.139.193,543871"
    with open('sample_log.csv', 'w') as sample_file:
        sample_file.write(sample_csv)
    
    print("Created sample CSV file:")
    with open('sample_log.csv', 'r') as sample_file:
        print(sample_file.read())
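
One reason to prefer the `csv` module over plain-text handling is quoted fields. The sketch below compares a naive `str.split(',')` with `csv.reader` on a row whose last field contains a comma. The `note` column is a hypothetical addition, not part of the log format used elsewhere in this notebook.

```python
import csv
import io

# Hypothetical row with a quoted field that contains a comma
raw_text = 'time,address,note\n08:39:37,1.227.124.181,"timeout, retried"\n'

# Naive approach: split the data line on every comma
naive_fields = raw_text.splitlines()[1].split(',')

# csv.reader understands the quoting rules and keeps the field together
parsed_rows = list(csv.reader(io.StringIO(raw_text)))

print(f"Naive split gives {len(naive_fields)} fields: {naive_fields}")
print(f"csv.reader gives {len(parsed_rows[1])} fields: {parsed_rows[1]}")
```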

## 6. Reading CSV Files with csv.DictReader

`csv.DictReader` is a powerful tool that treats each row as a dictionary, using the first row as column headers. This makes working with CSV data much more intuitive.

**Advantages:**
- **Column names as keys** - access data by header name
- **Automatic parsing** - handles quotes, commas in data
- **Easy data extraction** - perfect for processing specific columns
- **Readable code** - `row['Email']` vs `row[2]`

In [None]:
import csv

# Create a sample users.csv file for demonstration
sample_users = """Name,Email,Age,City
John Doe,john@example.com,25,New York
Jane Smith,jane@example.com,30,Los Angeles
Bob Johnson,bob@example.com,35,Chicago
Alice Brown,alice@example.com,28,Houston"""

with open('users.csv', 'w') as users_file:
    users_file.write(sample_users)

# Now read CSV using DictReader
list_of_email_addresses = []

with open('users.csv', 'r', newline='') as users_csv:
    user_reader = csv.DictReader(users_csv)
    
    print("CSV Headers:", user_reader.fieldnames)
    print("\nProcessing rows:")
    
    for row_num, row in enumerate(user_reader, 1):
        print(f"Row {row_num}: {row['Name']} ({row['Email']}) - Age {row['Age']}")
        list_of_email_addresses.append(row['Email'])

print(f"\nExtracted {len(list_of_email_addresses)} email addresses:")
for email in list_of_email_addresses:
    print(f"  - {email}")
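
Keep in mind that `csv.DictReader` returns every value as a string, so numeric columns have to be converted before doing arithmetic. A minimal sketch using the same sample users, read from an in-memory `io.StringIO` buffer instead of a file:

```python
import csv
import io

users_text = """Name,Email,Age,City
John Doe,john@example.com,25,New York
Jane Smith,jane@example.com,30,Los Angeles
Bob Johnson,bob@example.com,35,Chicago
Alice Brown,alice@example.com,28,Houston"""

reader = csv.DictReader(io.StringIO(users_text))
rows = list(reader)

raw_age_type = type(rows[0]['Age']).__name__  # 'str', not 'int'
ages = [int(row['Age']) for row in rows]      # convert explicitly
average_age = sum(ages) / len(ages)

print(f"Type of row['Age'] as read: {raw_age_type}")
print(f"Average age: {average_age}")  # 29.5
```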

## 7. CSV Files with Custom Delimiters

Not all "CSV" files use commas as separators. `csv.DictReader` can handle different delimiters like semicolons, tabs, or custom characters like `@`.

**Common Delimiters:**
- `,` (comma) - standard CSV
- `;` (semicolon) - European CSV format
- `\t` (tab) - TSV (Tab-Separated Values)
- `|` (pipe) - database exports
- Custom characters as needed

In [None]:
import csv

# Create a sample books.csv with @ delimiter
sample_books = """Title@Author@ISBN@Year
1984@George Orwell@978-0451524935@1949
To Kill a Mockingbird@Harper Lee@978-0061120084@1960
The Great Gatsby@F. Scott Fitzgerald@978-0743273565@1925
Pride and Prejudice@Jane Austen@978-0486284736@1813"""

with open('books.csv', 'w') as books_file:
    books_file.write(sample_books)

# Read CSV with custom delimiter
with open('books.csv', 'r', newline='') as books_csv:
    books_reader = csv.DictReader(books_csv, delimiter='@')
    
    print("Books database:")
    print("-" * 50)
    
    # Extract ISBN numbers using list comprehension
    isbn_list = [book['ISBN'] for book in books_reader]

# Read again to display full information (reader is exhausted after first use)
with open('books.csv', 'r', newline='') as books_csv:
    books_reader = csv.DictReader(books_csv, delimiter='@')
    
    for book in books_reader:
        print(f"'{book['Title']}' by {book['Author']} ({book['Year']})")
        print(f"  ISBN: {book['ISBN']}")
        print()

print(f"Extracted {len(isbn_list)} ISBN numbers:")
for isbn in isbn_list:
    print(f"  - {isbn}")
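
Because a reader can only be consumed once, an alternative to opening the file twice is to load all rows into a list and reuse that list. This is a sketch of that option, using an in-memory buffer with two of the sample books. It trades memory for convenience, so it suits small files.

```python
import csv
import io

books_text = """Title@Author@ISBN@Year
1984@George Orwell@978-0451524935@1949
The Great Gatsby@F. Scott Fitzgerald@978-0743273565@1925"""

# list() consumes the reader once; the resulting list can be looped over repeatedly
books = list(csv.DictReader(io.StringIO(books_text), delimiter='@'))

isbn_list = [book['ISBN'] for book in books]
titles = [book['Title'] for book in books]

print(isbn_list)
print(titles)
```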

## 8. Writing CSV Files with csv.DictWriter

`csv.DictWriter` is the counterpart to `DictReader` for writing CSV files. It allows you to write dictionaries as CSV rows, making it perfect for structured data output.

**Key Methods:**
- `writeheader()` - writes the column headers
- `writerow(dict)` - writes a single dictionary as a row
- `writerows(list_of_dicts)` - writes multiple dictionaries

**Benefits:**
- **Structured output** - ensures consistent column order
- **Automatic formatting** - handles commas, quotes in data
- **Header management** - easy to add column names

In [None]:
import csv

# Sample access log data (like server logs)
access_log = [
    {'time': '08:39:37', 'limit': 844404, 'address': '1.227.124.181'},
    {'time': '13:13:35', 'limit': 543871, 'address': '198.51.139.193'},
    {'time': '19:40:45', 'limit': 3021, 'address': '172.1.254.208'},
    {'time': '18:57:16', 'limit': 67031769, 'address': '172.58.247.219'},
    {'time': '21:17:13', 'limit': 9083, 'address': '124.144.20.113'},
    {'time': '23:34:17', 'limit': 65913, 'address': '203.236.149.220'},
    {'time': '13:58:05', 'limit': 1541474, 'address': '192.52.206.76'},
    {'time': '10:52:00', 'limit': 11465607, 'address': '104.47.149.93'},
    {'time': '14:56:12', 'limit': 109, 'address': '192.31.185.7'},
    {'time': '18:56:35', 'limit': 6207, 'address': '2.228.164.197'}
]

# Define the field order for CSV columns
fields = ['time', 'address', 'limit']

# Write data to CSV file
with open('logger.csv', 'w', newline='') as logger_csv:
    log_writer = csv.DictWriter(logger_csv, fieldnames=fields)
    
    # Write the header row
    log_writer.writeheader()
    
    # Write each log entry as a row
    for line in access_log:
        log_writer.writerow(line)

print(f"Successfully wrote {len(access_log)} log entries to 'logger.csv'")

# Verify by reading the file back
print("\nVerifying written data:")
with open('logger.csv', 'r') as logger_csv:
    content = logger_csv.read()
    print(content)

# Statistical analysis of the log data
total_requests = len(access_log)
avg_limit = sum(entry['limit'] for entry in access_log) / total_requests
max_limit = max(entry['limit'] for entry in access_log)
min_limit = min(entry['limit'] for entry in access_log)

print(f"\nLog Analysis:")
print(f"Total requests: {total_requests}")
print(f"Average limit: {avg_limit:,.0f}")
print(f"Maximum limit: {max_limit:,}")
print(f"Minimum limit: {min_limit:,}")
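
The loop over `writerow()` can be replaced by a single `writerows()` call, and later entries can be added by reopening the file in append mode without writing the header again. The sketch below assumes a scratch file named `logger_demo.csv` and reuses two of the sample log entries.

```python
import csv
import tempfile
from pathlib import Path

fields = ['time', 'address', 'limit']
first_batch = [{'time': '08:39:37', 'limit': 844404, 'address': '1.227.124.181'}]
second_batch = [{'time': '13:13:35', 'limit': 543871, 'address': '198.51.139.193'}]

with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = Path(tmp_dir) / 'logger_demo.csv'  # hypothetical demo file

    # Initial write: header plus all rows in one call
    with open(log_path, 'w', newline='', encoding='utf-8') as logger_csv:
        log_writer = csv.DictWriter(logger_csv, fieldnames=fields)
        log_writer.writeheader()
        log_writer.writerows(first_batch)

    # Later: append more rows, skipping the header because it already exists
    with open(log_path, 'a', newline='', encoding='utf-8') as logger_csv:
        log_writer = csv.DictWriter(logger_csv, fieldnames=fields)
        log_writer.writerows(second_batch)

    with open(log_path, 'r', newline='', encoding='utf-8') as logger_csv:
        rows = list(csv.DictReader(logger_csv))

print(f"Rows in file: {len(rows)}")
print(rows[1])
```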

## 9. Reading JSON Files

JSON (JavaScript Object Notation) is a popular format for structured data, especially in web applications and APIs. Python's `json` module makes it easy to work with JSON data.

**JSON Basics:**
- **Lightweight** text format
- **Human-readable** structure
- **Maps to Python types** - objects→dict, arrays→list, etc.
- **Web standard** for APIs and configuration

**Python JSON Mapping:**
- JSON object → Python dict
- JSON array → Python list
- JSON string → Python str
- JSON number → Python int/float
- JSON boolean → Python bool
- JSON null → Python None

In [None]:
import json

# Create a sample JSON file
sample_message = {
    "text": "Hello, World! This is a JSON message.",
    "author": "Python Tutorial",
    "timestamp": "2024-01-15T10:30:00Z",
    "priority": "high",
    "read": False,
    "tags": ["tutorial", "json", "python"],
    "metadata": {
        "version": "1.0",
        "encoding": "utf-8"
    }
}

# Write JSON to file first
with open('message.json', 'w') as message_file:
    json.dump(sample_message, message_file, indent=2)

print("Created 'message.json' file\n")

# Now read JSON from file
with open('message.json', 'r') as message_json:
    message = json.load(message_json)
    
    # Access the main text
    print(f"Message text: {message['text']}")
    
    # Access nested data
    print(f"Author: {message['author']}")
    print(f"Priority: {message['priority']}")
    print(f"Read status: {message['read']}")
    
    # Access arrays and nested objects
    print(f"Tags: {', '.join(message['tags'])}")
    print(f"Version: {message['metadata']['version']}")
    
    # Show the data type
    print(f"\nPython type: {type(message)}")
    print(f"Available keys: {list(message.keys())}")
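
The type mapping listed above can be checked directly. This sketch parses a small JSON document from a string with `json.loads()` (the string-based counterpart of `json.load()`) and records the Python type of each value. The field names are invented for the example.

```python
import json

# Hypothetical JSON text covering each JSON value type
json_text = (
    '{"name": "Alice", "age": 30, "score": 9.5, '
    '"active": true, "nickname": null, "tags": ["a", "b"], "address": {}}'
)

parsed = json.loads(json_text)
type_names = {key: type(value).__name__ for key, value in parsed.items()}

for key, type_name in type_names.items():
    print(f"{key!r:12} -> {type_name}")
```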

## 10. Writing JSON Files

Writing JSON files is straightforward with `json.dump()`. You can write any Python data structure that maps to JSON (dicts, lists, strings, numbers, booleans, None).

**Common Parameters:**
- `indent` - adds pretty formatting with indentation
- `sort_keys` - sorts dictionary keys alphabetically
- `ensure_ascii` - when `True` (the default), escapes non-ASCII characters as `\uXXXX`; set it to `False` to write them as-is

**Use Cases:**
- Configuration files
- API responses
- Data exchange between applications
- Storing complex data structures

In [None]:
import json

# Complex data structure to save as JSON
data_payload = [
    {
        'interesting message': 'What is JSON? A web application\'s little pile of secrets.',
        'follow up': 'But enough talk!',
        'timestamp': '2024-01-15T14:30:00Z',
        'importance': 9,
        'categories': ['json', 'standards', 'interoperability']
    },
    {
        'interesting message': 'Python makes JSON handling incredibly easy with the json module.',
        'follow up': 'Perfect for modern applications!',
        'timestamp': '2024-01-15T14:40:00Z',
        'importance': 7,
        'categories': ['python', 'json', 'tutorial']
    }
]

# Write to JSON file with pretty formatting
with open('data.json', 'w') as data_json:
    json.dump(data_payload, data_json, indent=2, sort_keys=True)

print(f"Successfully wrote {len(data_payload)} records to 'data.json'")

# Read back and display
print("\nFile contents:")
with open('data.json', 'r') as data_json:
    content = data_json.read()
    print(content[:300] + "..." if len(content) > 300 else content)

# Load and analyze the data
with open('data.json', 'r') as data_json:
    loaded_data = json.load(data_json)
    
print(f"\nData Analysis:")
print(f"Number of messages: {len(loaded_data)}")
avg_importance = sum(msg['importance'] for msg in loaded_data) / len(loaded_data)
print(f"Average importance: {avg_importance:.1f}")

# Show all unique categories
all_categories = set()
for msg in loaded_data:
    all_categories.update(msg['categories'])
print(f"All categories: {sorted(all_categories)}")
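
Unlike a text file, a JSON file cannot simply be opened in append mode, because the result would no longer be a single valid JSON document. A common pattern is to load the data, modify it in Python, and write the whole structure back. Here is a minimal sketch, assuming a scratch file named `data_demo.json` that holds a list of records:

```python
import json
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    data_path = Path(tmp_dir) / 'data_demo.json'  # hypothetical demo file

    # Starting point: a file containing a one-element list
    with open(data_path, 'w', encoding='utf-8') as data_json:
        json.dump([{'importance': 9}], data_json, indent=2)

    # 1. Load the existing records
    with open(data_path, 'r', encoding='utf-8') as data_json:
        records = json.load(data_json)

    # 2. Modify the data in memory
    records.append({'importance': 7})

    # 3. Save the whole structure back, replacing the old contents
    with open(data_path, 'w', encoding='utf-8') as data_json:
        json.dump(records, data_json, indent=2)

    with open(data_path, 'r', encoding='utf-8') as data_json:
        final_records = json.load(data_json)

print(final_records)
```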

## Working with JSON Strings

Sometimes you need to work with JSON data as strings rather than files. The `json` module provides `loads()` and `dumps()` for this purpose.

In [None]:
import json

# Convert Python object to JSON string
python_data = {
    "name": "Alice",
    "age": 30,
    "hobbies": ["reading", "hiking", "coding"],
    "active": True
}

# dumps() = "dump string" - converts to JSON string
json_string = json.dumps(python_data, indent=2)
print("Python object as JSON string:")
print(json_string)
print(f"Type: {type(json_string)}")

# loads() = "load string" - converts from JSON string
parsed_data = json.loads(json_string)
print(f"\nParsed back to Python: {parsed_data}")
print(f"Type: {type(parsed_data)}")
print(f"Name: {parsed_data['name']}, Age: {parsed_data['age']}")
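
The `sort_keys` and `ensure_ascii` parameters described for `json.dump()` work the same way with `json.dumps()`, which makes their effect easy to inspect. A short sketch with an invented record containing non-ASCII text:

```python
import json

# Hypothetical record with non-ASCII characters in a key and a value
record = {'city': 'Zürich', 'café': True, 'alpha': 1}

escaped = json.dumps(record)  # default: ensure_ascii=True
readable = json.dumps(record, ensure_ascii=False, sort_keys=True)

print(escaped)   # non-ASCII characters appear as \uXXXX escapes
print(readable)  # characters kept as-is, keys in sorted order

# Both strings decode back to the same Python data
same_data = json.loads(escaped) == json.loads(readable) == record
print(f"Same data after parsing: {same_data}")
```

If you write `ensure_ascii=False` output to a file, opening that file with an explicit `encoding='utf-8'` avoids depending on the platform default.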

## File Handling Best Practices

### 1. Error Handling
Always handle potential file errors gracefully:

In [None]:
import json

def safe_read_json(filename):
    """Safely read a JSON file with comprehensive error handling."""
    try:
        with open(filename, 'r') as file:
            data = json.load(file)
            return data, None  # data, error
    except FileNotFoundError:
        return None, f"File '{filename}' not found"
    except json.JSONDecodeError as e:
        return None, f"Invalid JSON in '{filename}': {e}"
    except PermissionError:
        return None, f"Permission denied accessing '{filename}'"
    except Exception as e:
        return None, f"Unexpected error reading '{filename}': {e}"

# Test the function
data, error = safe_read_json('data.json')
if error:
    print(f"Error: {error}")
else:
    print(f"Successfully read {len(data)} items from JSON file")

# Test with non-existent file
data, error = safe_read_json('nonexistent.json')
if error:
    print(f"Expected error: {error}")

### 2. File Path Handling
Use the `pathlib` module for robust file path operations:

In [None]:
from pathlib import Path
import json

# Modern file path handling
data_dir = Path('data_files')
data_dir.mkdir(exist_ok=True)  # Create directory if it doesn't exist

# Create file paths
config_file = data_dir / 'config.json'
log_file = data_dir / 'application.log'
csv_file = data_dir / 'users.csv'

print(f"Config file path: {config_file}")
print(f"Log file path: {log_file}")
print(f"CSV file path: {csv_file}")

# Check if files exist
print(f"\nFile existence:")
print(f"Config exists: {config_file.exists()}")
print(f"Log exists: {log_file.exists()}")
print(f"CSV exists: {csv_file.exists()}")

# Create a sample config file
config_data = {
    "app_name": "File Handler Demo",
    "version": "1.0",
    "debug": True,
    "max_file_size": 1024000
}

with open(config_file, 'w') as f:
    json.dump(config_data, f, indent=2)

print(f"\nCreated config file: {config_file}")
print(f"File size: {config_file.stat().st_size} bytes")
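
`Path` objects can also read and write small files directly and expose parts of the path as attributes. The sketch below uses a temporary directory and a made-up `notes.txt` file to show `write_text()`, `read_text()`, `name`, `suffix`, and `glob()`.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    data_dir = Path(tmp_dir) / 'data_files'
    data_dir.mkdir(exist_ok=True)

    notes_file = data_dir / 'notes.txt'  # hypothetical demo file

    # write_text() and read_text() open and close the file for you
    notes_file.write_text('first line\nsecond line\n', encoding='utf-8')
    notes = notes_file.read_text(encoding='utf-8')

    file_name = notes_file.name      # 'notes.txt'
    file_suffix = notes_file.suffix  # '.txt'
    txt_names = sorted(p.name for p in data_dir.glob('*.txt'))

print(f"Name: {file_name}, suffix: {file_suffix}")
print(f"Lines: {notes.splitlines()}")
print(f"Text files in directory: {txt_names}")
```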

### 3. Working with Large Files
For large files, process them chunk by chunk to avoid memory issues:

In [None]:
def process_large_file(filename, chunk_size=1024):
    """Process a large file in chunks to save memory."""
    try:
        with open(filename, 'r') as file:
            chunk_count = 0
            total_chars = 0
            
            while True:
                chunk = file.read(chunk_size)
                if not chunk:  # End of file
                    break
                    
                chunk_count += 1
                total_chars += len(chunk)
                
                # Process the chunk (example: count characters)
                if chunk_count % 10 == 0:  # Progress update
                    print(f"Processed {chunk_count} chunks, {total_chars} characters")
            
            return total_chars, chunk_count
            
    except FileNotFoundError:
        print(f"File '{filename}' not found")
        return 0, 0

# Create a sample "large" file for demonstration
large_content = "This is a sample line.\n" * 1000  # 1000 lines
with open('large_sample.txt', 'w') as f:
    f.write(large_content)

# Process the file
total_chars, chunks = process_large_file('large_sample.txt', chunk_size=100)
print(f"\nFile processing complete:")
print(f"Total characters: {total_chars:,}")
print(f"Total chunks: {chunks}")
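
The chunked read loop can also be wrapped in a generator, so the code that consumes the chunks stays separate from the code that reads them. This is one possible sketch, not the only way to structure it. The helper name `iter_chunks` is hypothetical, and an in-memory `io.StringIO` buffer stands in for a large file.

```python
import io

def iter_chunks(file_obj, chunk_size=1024):
    """Yield successive chunks of at most chunk_size characters from an open text file."""
    while True:
        chunk = file_obj.read(chunk_size)
        if not chunk:  # empty string means end of file
            return
        yield chunk

# Stand-in for a large file: 1000 lines of 23 characters each
sample_text = "This is a sample line.\n" * 1000
sample_file = io.StringIO(sample_text)

total_chars = 0
chunk_count = 0
for chunk in iter_chunks(sample_file, chunk_size=100):
    total_chars += len(chunk)
    chunk_count += 1

print(f"Total characters: {total_chars:,}")
print(f"Total chunks: {chunk_count}")
```

The same generator works with a real file object returned by `open()`, since it only relies on the `.read(size)` method.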

### 4. Data Validation and Cleaning
Always validate and clean data when reading from files:

In [None]:
import csv

def validate_user_data(row):
    """Validate and clean user data from CSV."""
    errors = []
    
    # Clean and validate name
    name = row.get('Name', '').strip()
    if not name:
        errors.append('Name is required')
    
    # Validate email
    email = row.get('Email', '').strip().lower()
    if not email or '@' not in email:
        errors.append('Valid email is required')
    
    # Validate age
    try:
        age = int(row.get('Age', 0))
        if age < 0 or age > 150:
            errors.append('Age must be between 0 and 150')
    except (TypeError, ValueError):  # int(None) raises TypeError when a short row has no Age value
        errors.append('Age must be a number')
        age = None
    
    return {
        'name': name,
        'email': email,
        'age': age,
        'errors': errors,
        'valid': len(errors) == 0
    }

# Create sample data with some invalid entries
sample_data = """Name,Email,Age,City
John Doe,john@example.com,25,New York
,jane@example.com,30,Los Angeles
Bob Johnson,invalid-email,35,Chicago
Alice Brown,alice@example.com,999,Houston
Charlie Wilson,charlie@example.com,28,Seattle"""

with open('users_validation.csv', 'w') as f:
    f.write(sample_data)

# Process and validate the data
valid_users = []
invalid_users = []

with open('users_validation.csv', 'r') as csvfile:
    reader = csv.DictReader(csvfile)
    
    for row_num, row in enumerate(reader, 1):
        validated = validate_user_data(row)
        
        if validated['valid']:
            valid_users.append(validated)
        else:
            invalid_users.append((row_num, validated))

print(f"Data validation results:")
print(f"Valid users: {len(valid_users)}")
print(f"Invalid users: {len(invalid_users)}")

if invalid_users:
    print("\nValidation errors:")
    for row_num, user in invalid_users:
        print(f"Row {row_num}: {', '.join(user['errors'])}")

if valid_users:
    print("\nValid users:")
    for user in valid_users:
        print(f"  {user['name']} ({user['email']}) - Age {user['age']}")

## Summary and Comparison

### File Formats Comparison

| Format | Best For | Pros | Cons |
|--------|----------|------|------|
| **Text (.txt)** | Simple data, logs, notes | Simple, universal, human-readable | No structure, manual parsing |
| **CSV (.csv)** | Tabular data, spreadsheets | Structured, widely supported | Limited data types, no nesting |
| **JSON (.json)** | APIs, configs, complex data | Structured, supports nesting, web standard | Larger file size, no comments |

### Key Methods Summary

| Operation | Text Files | CSV Files | JSON Files |
|-----------|------------|-----------|------------|
| **Read All** | `file.read()` | `csv.DictReader()` | `json.load()` |
| **Read Lines** | `file.readlines()` | Loop through reader | N/A |
| **Write** | `file.write()` | `csv.DictWriter()` | `json.dump()` |
| **Append** | Mode `'a'` | Mode `'a'` + CSV writer | Load, modify, save |

### Best Practices Checklist

✅ **Always use `with` statements** for automatic file closing  
✅ **Handle exceptions** gracefully with try/except  
✅ **Validate data** when reading from external sources  
✅ **Use appropriate file modes** (r, w, a)  
✅ **Choose the right format** for your data type  
✅ **Process large files** in chunks when needed  
✅ **Use pathlib** for file path operations  
✅ **Close files properly** (automatically with `with`)  

## Next Steps

1. **Practice with real data** - Try reading your own CSV or JSON files
2. **Build a data processor** - Create a script that reads, processes, and writes data
3. **Explore advanced topics** - Learn about binary files, file compression, and databases
4. **Error handling** - Practice robust error handling for production code
5. **Performance** - Learn about memory-efficient processing for large files

File handling is fundamental to most real-world Python applications. Whether you're processing data, building web applications, or creating utilities, these skills will serve you well!