# Files, Folders, and Filesystems in Medical Data Integration

Understanding how to work with files and folders is fundamental for medical data integration. Medical data often comes in various formats (DICOM images, CSV lab results, HL7 messages, etc.) stored across different locations. This notebook covers essential concepts and practical skills for managing medical data files programmatically.

## Working with the Current Directory

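Let's start by finding out where we are in the filesystem. The current working directory is the folder against which Python resolves relative paths, so it determines where files are read and written when you don't give an absolute path.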

In [None]:
import os

current_directory = os.getcwd()
print(f"Current working directory: {current_directory}")
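
To make the role of the working directory concrete, here is a minimal sketch of how a relative path is resolved against it. The path `medical_data/lab_results` is only an illustration and does not need to exist for this to run.

```python
import os

# Hypothetical relative path; it does not need to exist for this demo
relative_path = os.path.join("medical_data", "lab_results")

# abspath() resolves a relative path against the current working directory
absolute_path = os.path.abspath(relative_path)

print(f"Relative: {relative_path}")
print(f"Absolute: {absolute_path}")
print(f"Is absolute: {os.path.isabs(absolute_path)}")
```

If a script is started from a different folder, the same relative path points somewhere else, which is a common cause of "file not found" surprises.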

## Creating a Medical Data Directory Structure

Medical data often requires organized folder structures. Let's create a typical hierarchy for storing patient data, lab results, and imaging files.

In [None]:
# Define the directory structure
base_dir = "medical_data"
subdirectories = [
    "patient_records",
    "lab_results",
    "imaging/dicom",
    "imaging/processed",
    "reports"
]

Now let's create these directories. `os.makedirs()` creates any missing intermediate directories (so `imaging/dicom` also creates `imaging`), and `exist_ok=True` suppresses the error that would otherwise be raised if the directory already exists.

In [None]:
# Create the base directory
os.makedirs(base_dir, exist_ok=True)

# Create subdirectories
for subdir in subdirectories:
    path = os.path.join(base_dir, subdir)
    os.makedirs(path, exist_ok=True)
    print(f"Created: {path}")
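
The effect of `exist_ok` is easiest to see by calling `os.makedirs()` more than once on the same path. The sketch below works in a throwaway temporary directory (an assumption made so that it does not touch `medical_data`).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical nested path inside a throwaway temporary directory
    nested = os.path.join(tmp, "imaging", "dicom")

    os.makedirs(nested)                 # creates both "imaging" and "dicom"
    os.makedirs(nested, exist_ok=True)  # already exists: silently does nothing

    try:
        os.makedirs(nested)             # exist_ok defaults to False
        raised = False
    except FileExistsError:
        raised = True

    created = os.path.isdir(nested)

print(f"Directory created: {created}")
print(f"Repeat call without exist_ok raised FileExistsError: {raised}")
```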

## Listing Directory Contents

Let's explore what's in our medical data directory. This is useful when you need to process multiple files or check what data is available.

In [None]:
# List all items in the medical_data directory
items = os.listdir(base_dir)
print("Contents of medical_data directory:")
for item in items:
    print(f"  - {item}")
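
`os.listdir()` returns bare names in arbitrary order and does not say whether each one is a file or a folder. One possible alternative is `os.scandir()`, sketched below on a temporary directory with made-up contents.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical contents: one subdirectory and one file
    os.makedirs(os.path.join(tmp, "lab_results"))
    with open(os.path.join(tmp, "README.txt"), "w", encoding="utf-8") as f:
        f.write("sample")

    dir_names, file_names = [], []
    # scandir() yields DirEntry objects that know their own type
    with os.scandir(tmp) as entries:
        for entry in entries:
            if entry.is_dir():
                dir_names.append(entry.name)
            elif entry.is_file():
                file_names.append(entry.name)

print(f"Directories: {sorted(dir_names)}")
print(f"Files: {sorted(file_names)}")
```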

## Creating Sample Medical Data Files

Let's create some sample files to simulate a real medical data environment. We'll create patient records in CSV format and a simple text report.

In [None]:
# Create a sample patient record CSV
import csv

patient_data = [
    ["PatientID", "Name", "Age", "BloodType", "AdmissionDate"],
    ["P001", "John Doe", "45", "A+", "2024-01-15"],
    ["P002", "Jane Smith", "32", "O-", "2024-01-16"],
    ["P003", "Bob Johnson", "67", "B+", "2024-01-17"]
]

Now we'll write this data to a CSV file in the patient_records directory.

In [None]:
csv_path = os.path.join(base_dir, "patient_records", "patients_2024.csv")

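# newline='' lets the csv module control line endings; an explicit encoding avoids platform defaults
with open(csv_path, 'w', newline='', encoding='utf-8') as file: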
    writer = csv.writer(file)
    writer.writerows(patient_data)

print(f"Created file: {csv_path}")

Let's create a sample lab result file. Lab results are often delivered as plain-text reports with a fixed layout, like the one below.

In [None]:
lab_result = """LAB RESULT REPORT
==================
Patient ID: P001
Date: 2024-01-15
Test: Complete Blood Count (CBC)

Results:
- WBC: 7.2 K/uL (Normal: 4.5-11.0)
- RBC: 4.8 M/uL (Normal: 4.5-5.5)
- Hemoglobin: 14.5 g/dL (Normal: 13.5-17.5)
- Hematocrit: 42% (Normal: 39-49)
"""

lab_path = os.path.join(base_dir, "lab_results", "P001_CBC_20240115.txt")

In [None]:
with open(lab_path, 'w', encoding='utf-8') as file:
    file.write(lab_result)

print(f"Created file: {lab_path}")

## Reading Files

Now let's read the files we created. First, we'll read the CSV file containing patient records.

In [None]:
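# The csv module expects files opened with newline='' for reading as well as writing
with open(csv_path, 'r', newline='', encoding='utf-8') as file: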
    reader = csv.DictReader(file)
    print("Patient Records:")
    for row in reader:
        print(f"  {row['PatientID']}: {row['Name']}, Age {row['Age']}, Blood Type {row['BloodType']}")
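
Note that `csv.DictReader` returns every value as a string, so numeric columns such as `Age` need converting before they can be compared or summed. A minimal sketch, using a small in-memory CSV (hypothetical data that mirrors the columns above) instead of the file on disk:

```python
import csv
import io

# Hypothetical in-memory CSV mirroring the columns used above
sample_csv = "PatientID,Name,Age\nP001,John Doe,45\nP003,Bob Johnson,67\n"

patients = []
for row in csv.DictReader(io.StringIO(sample_csv)):
    row["Age"] = int(row["Age"])  # DictReader yields every value as str
    patients.append(row)

over_60 = [p["PatientID"] for p in patients if p["Age"] > 60]
print(f"Patients over 60: {over_60}")
```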

Let's read the lab result text file and display its contents.

In [None]:
with open(lab_path, 'r', encoding='utf-8') as file:
    content = file.read()
    print("Lab Result Content:")
    print(content)

## File Properties and Metadata

Understanding file properties is important for data validation and audit trails in medical systems. Let's use `os.stat()` to examine a file's size and last-modification time.

In [None]:
import datetime

# Get file statistics
file_stats = os.stat(csv_path)

print(f"File: {csv_path}")
print(f"Size: {file_stats.st_size} bytes")
print(f"Last modified: {datetime.datetime.fromtimestamp(file_stats.st_mtime)}")
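
Size and modification time alone cannot show that a file's contents are unchanged. One common complement for validation is a content checksum. The sketch below is one possible approach, not part of the workflow above: the helper name `sha256_of_file` and the sample file are assumptions for illustration.

```python
import hashlib
import os
import tempfile

def sha256_of_file(filepath, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(filepath, "rb") as f:
        # Read fixed-size chunks so large files are never fully loaded into memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.csv")  # hypothetical sample file
    with open(sample_path, "wb") as f:
        f.write(b"PatientID,Age\nP001,45\n")
    checksum = sha256_of_file(sample_path)

print(f"SHA-256: {checksum}")
```

Storing the digest next to a file lets a later check detect any change to its bytes, even when the size stays the same.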

## Walking Through Directory Trees

When dealing with large medical data repositories, you often need to find all files of a certain type. `os.walk()` visits every directory under a starting point and yields a `(root, dirs, files)` tuple for each one.

In [None]:
print("Complete directory structure:")
for root, dirs, files in os.walk(base_dir):
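    # Depth relative to base_dir; relpath avoids stripping "medical_data" from deeper path parts
    rel_root = os.path.relpath(root, base_dir)
    level = 0 if rel_root == os.curdir else rel_root.count(os.sep) + 1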
    indent = ' ' * 2 * level
    print(f"{indent}{os.path.basename(root)}/")
    subindent = ' ' * 2 * (level + 1)
    for file in files:
        print(f"{subindent}{file}")
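
The same traversal can collect files of one type rather than print the whole tree. The helper below, `find_files_by_extension`, is a hypothetical name, and the sketch builds its own small temporary tree so it can run independently.

```python
import os
import tempfile

def find_files_by_extension(root_dir, extension):
    """Return sorted paths under root_dir whose names end with extension."""
    matches = []
    for root, _dirs, files in os.walk(root_dir):
        for name in files:
            # Compare case-insensitively so ".CSV" and ".csv" both match
            if name.lower().endswith(extension.lower()):
                matches.append(os.path.join(root, name))
    return sorted(matches)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample tree with two CSV files and one text file
    for rel in ("patient_records/a.csv", "lab_results/b.txt", "lab_results/c.CSV"):
        target = os.path.join(tmp, *rel.split("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        open(target, "w").close()

    found = [os.path.relpath(p, tmp) for p in find_files_by_extension(tmp, ".csv")]

for path in found:
    print(path)
```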

## Using pathlib for Modern File Operations

Python's `pathlib` module provides an object-oriented alternative to `os.path`. Because `Path` objects handle path separators for you, they are particularly useful for medical applications that must run on more than one operating system.

In [None]:
from pathlib import Path

# Create a Path object
medical_path = Path(base_dir)
print(f"Medical data path: {medical_path}")
print(f"Is directory: {medical_path.is_dir()}")
print(f"Absolute path: {medical_path.absolute()}")

Let's use pathlib to find all CSV files in our medical data directory.

In [None]:
# Find all CSV files
csv_files = list(medical_path.rglob("*.csv"))
print("CSV files found:")
for csv_file in csv_files:
    print(f"  - {csv_file.relative_to(medical_path)}")
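
`Path` objects also expose the parts of a filename as attributes, which helps when metadata is encoded in the name. The sketch below assumes the lab file naming pattern `<PatientID>_<Test>_<YYYYMMDD>.txt`, inferred from the single example file created earlier. Real naming schemes may differ.

```python
from pathlib import Path

# Assumed naming pattern: <PatientID>_<Test>_<YYYYMMDD>.txt
lab_file = Path("medical_data") / "lab_results" / "P001_CBC_20240115.txt"

print(f"Name:   {lab_file.name}")    # file name with extension
print(f"Stem:   {lab_file.stem}")    # file name without extension
print(f"Suffix: {lab_file.suffix}")  # extension, including the dot
print(f"Parent: {lab_file.parent}")  # containing directory

# Split the stem into its three underscore-separated fields
patient_id, test_code, date_str = lab_file.stem.split("_")
print(f"Patient {patient_id}, test {test_code}, date {date_str}")
```

None of these attributes touch the disk, so the file does not need to exist for the parsing to work.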

## File Operations: Copying and Moving

Medical data often needs to be backed up or moved between systems. The `shutil` module provides high-level copy and move operations; let's use it to back up the patient CSV file.

In [None]:
import shutil

# Create a backup directory
backup_dir = os.path.join(base_dir, "backups")
os.makedirs(backup_dir, exist_ok=True)

# Copy the patient CSV file to backup
src = csv_path
dst = os.path.join(backup_dir, "patients_2024_backup.csv")
shutil.copy2(src, dst)  # copy2 also tries to preserve metadata such as timestamps

print(f"Backed up {os.path.basename(src)} to {dst}")
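
Moving works much like copying, except that the source no longer exists afterwards. The sketch below uses `shutil.move()` in a temporary directory. The `incoming` and `archive` folder names are hypothetical, and the explicit existence check is one cautious way to avoid silently overwriting a file.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical "incoming" and "archive" folders
    incoming = os.path.join(tmp, "incoming")
    archive = os.path.join(tmp, "archive")
    os.makedirs(incoming)
    os.makedirs(archive)

    move_src = os.path.join(incoming, "report.txt")
    with open(move_src, "w", encoding="utf-8") as f:
        f.write("sample report")

    move_dst = os.path.join(archive, "report.txt")
    # Refuse to overwrite an existing file at the destination
    if os.path.exists(move_dst):
        raise FileExistsError(f"Refusing to overwrite {move_dst}")
    moved_to = shutil.move(move_src, move_dst)  # returns the destination path

    source_gone = not os.path.exists(move_src)
    dest_exists = os.path.exists(moved_to)

print(f"Source removed: {source_gone}")
print(f"Destination exists: {dest_exists}")
```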

## Handling File Errors

In medical systems, robust error handling is crucial. Let's demonstrate how to handle common file-related errors gracefully.

In [None]:
def safe_read_file(filepath):
    """Safely read a file with error handling"""
    try:
        with open(filepath, 'r') as file:
            return file.read()
    except FileNotFoundError:
        print(f"Error: File '{filepath}' not found")
        return None
    except PermissionError:
        print(f"Error: No permission to read '{filepath}'")
        return None
    except Exception as e:
        print(f"Unexpected error reading '{filepath}': {e}")
        return None

In [None]:
# Test with existing file
content = safe_read_file(lab_path)
if content is not None:  # an empty file returns "", which is still a successful read
    print("File read successfully")

# Test with non-existing file
content = safe_read_file("non_existing_file.txt")
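
Two details are worth keeping in mind with this pattern. `FileNotFoundError` and `PermissionError` are both subclasses of `OSError`, and an empty file returns an empty string, which is falsy but not `None`. The variant below is a sketch under those observations. The function name `read_text_or_none` and the logger name are assumptions, and it logs errors where the function above prints them.

```python
import logging
import os
import tempfile

logger = logging.getLogger("file_reader")  # hypothetical logger name

def read_text_or_none(filepath, encoding="utf-8"):
    """Return the file's text, or None if it cannot be read or decoded."""
    try:
        with open(filepath, "r", encoding=encoding) as f:
            return f.read()
    except OSError as e:
        # Covers FileNotFoundError, PermissionError and other I/O failures
        logger.error("Could not read %s: %s", filepath, e)
    except UnicodeDecodeError as e:
        # Raised when the bytes are not valid in the requested encoding
        logger.error("Could not decode %s as %s: %s", filepath, encoding, e)
    return None

with tempfile.TemporaryDirectory() as tmp:
    empty_path = os.path.join(tmp, "empty.txt")
    open(empty_path, "w").close()  # create an empty file
    empty_result = read_text_or_none(empty_path)
    missing_result = read_text_or_none(os.path.join(tmp, "missing.txt"))

print(f"Empty file   -> {empty_result!r}")
print(f"Missing file -> {missing_result!r}")
```

Callers that need to tell "empty" apart from "unreadable" should therefore test the result with `is not None`, not by truthiness.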

## Cleaning Up Empty Directories

Finally, let's create a utility function to clean up empty directories, which is useful for maintaining organized medical data repositories.

In [None]:
def remove_empty_dirs(path):
    """Remove empty directories recursively"""
    for root, dirs, files in os.walk(path, topdown=False):
        for dir_name in dirs:
            dir_path = os.path.join(root, dir_name)
            try:
                if not os.listdir(dir_path):  # Check if directory is empty
                    os.rmdir(dir_path)
                    print(f"Removed empty directory: {dir_path}")
            except OSError:
                pass  # Directory not empty or other error

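# Test on our medical data directory.
# Note: this removes the folders that are still empty (imaging/ and reports/);
# base_dir itself is never removed.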
remove_empty_dirs(base_dir)

## Exercise

Create a medical data file organizer that:

1. Creates a new directory structure with folders for different years (2022, 2023, 2024)
2. Creates subdirectories within each year for each month (01-January, 02-February, etc.)
3. Generates sample patient admission files (at least 3) with random dates in 2024
4. Provides a function that reads all files and organizes them into the correct year/month folders based on their admission date
5. Generates a summary report that counts the number of files in each month
6. Handles errors for invalid dates or missing files

Bonus: Add functionality to archive (zip) data older than a specified date.

This exercise will help you practice file operations, directory management, and data organization – all critical skills for medical data integration.