# Path Handling and Portable I/O with pathlib

In medical data integration, we often work with files stored in complex directory structures across different operating systems. The `pathlib` module provides an object-oriented, platform-independent way to handle file paths and perform file operations.

Let's start by importing the `Path` class from the `pathlib` module and creating a `Path` object for the current working directory.

In [None]:
from pathlib import Path

current_dir = Path.cwd()
print(f"Current directory: {current_dir}")

Path objects can be created from strings and combined using the `/` operator, which works across all operating systems.

In [None]:
data_folder = Path("medical_data")
patient_file = data_folder / "patients" / "patient_001.csv"
print(f"Patient file path: {patient_file}")
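
To see what "works across all operating systems" means in practice, the sketch below builds the same path with the pure path classes `PurePosixPath` and `PureWindowsPath`. These classes only manipulate path strings and never touch the file system, so the example behaves the same on any machine. The path segments are the ones used above.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The same segments, joined under two different path flavours
parts = ("medical_data", "patients", "patient_001.csv")

posix_path = PurePosixPath(*parts)
windows_path = PureWindowsPath(*parts)

print(posix_path)               # joined with forward slashes
print(windows_path)             # joined with backslashes
print(windows_path.as_posix())  # forward slashes, handy for logs and config files
```

`Path` itself picks the flavour that matches the machine the code runs on, which is why the same `/` expression is portable.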

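We can check whether a path exists and what kind of entry it is, using methods that behave the same on every operating system. Assuming no such file exists in the working directory yet, all three checks below return `False`.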

In [None]:
print(f"Does the path exist? {patient_file.exists()}")
print(f"Is it a file? {patient_file.is_file()}")
print(f"Is it a directory? {patient_file.is_dir()}")
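
As a small sketch of how the three checks differ, the example below compares an existing file, a directory, and a missing path. It works inside a temporary directory so nothing is left behind; the file names are made up for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    existing = folder / "patient_001.csv"  # hypothetical file name
    existing.touch()                       # create an empty file
    missing = folder / "patient_999.csv"   # never created

    # Each tuple is (exists, is_file, is_dir)
    checks = {
        "existing file": (existing.exists(), existing.is_file(), existing.is_dir()),
        "folder": (folder.exists(), folder.is_file(), folder.is_dir()),
        "missing path": (missing.exists(), missing.is_file(), missing.is_dir()),
    }

for label, result in checks.items():
    print(f"{label}: {result}")
```

Note that a missing path does not raise an error here; all three methods simply return `False`.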

Let's create a directory structure typical for medical data projects. Calling `mkdir()` with `parents=True` creates any missing parent directories, and `exist_ok=True` prevents an error if the directory already exists.

In [None]:
project_root = Path("medical_project")
raw_data = project_root / "data" / "raw"
processed_data = project_root / "data" / "processed"

raw_data.mkdir(parents=True, exist_ok=True)
processed_data.mkdir(parents=True, exist_ok=True)
print("Directory structure created successfully")
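
To make the role of the two keyword arguments concrete, here is a minimal sketch that calls `mkdir()` without them and records which exceptions appear. It runs in a temporary directory, and the `outcomes` list is only there for illustration.

```python
import tempfile
from pathlib import Path

outcomes = []

with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "data" / "raw"

    try:
        nested.mkdir()  # fails: the parent "data" does not exist yet
    except FileNotFoundError:
        outcomes.append("FileNotFoundError")

    nested.mkdir(parents=True)  # creates "data" and "raw"

    try:
        nested.mkdir()  # fails: "raw" already exists
    except FileExistsError:
        outcomes.append("FileExistsError")

    nested.mkdir(exist_ok=True)  # no error for an existing directory
    outcomes.append("ok")

print(outcomes)
```

Passing both arguments, as in the cell above, makes the setup step safe to re-run.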

Path objects provide convenient properties to access different parts of a file path.

In [None]:
sample_file = Path("medical_project/data/raw/blood_tests_2023.csv")
print(f"Name: {sample_file.name}")
print(f"Stem: {sample_file.stem}")
print(f"Suffix: {sample_file.suffix}")
print(f"Parent: {sample_file.parent}")
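
Beyond these four properties, `parts`, `parents`, and `suffixes` are often useful, especially for files with compound extensions. The sketch below uses a hypothetical compressed NIfTI file name and `PurePosixPath` so that the output does not depend on the operating system.

```python
from pathlib import PurePosixPath

# Hypothetical file with a compound extension
scan = PurePosixPath("medical_project/data/raw/brain_scan.nii.gz")

print(scan.suffix)      # only the last extension
print(scan.suffixes)    # every extension, in order
print(scan.stem)        # the name minus the last extension only
print(scan.parts)       # each path component as a tuple
print(scan.parents[1])  # two levels up from the file
```

Because `stem` strips only the final suffix, compound extensions such as `.nii.gz` need `suffixes` if you want to handle them as a unit.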

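The `with_suffix()` method returns a new path with a different extension, which is useful for naming output files when converting medical data formats. It only builds the new path; it does not rename or convert anything on disk.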

In [None]:
csv_file = Path("patient_data.csv")
json_file = csv_file.with_suffix(".json")
print(f"Original: {csv_file}")
print(f"Converted: {json_file}")
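
One way this might be used is to derive an output location from an input file. The helper below, `processed_path_for`, is a hypothetical function written for this sketch, and it assumes the `data/raw` and `data/processed` layout from earlier. It also shows `with_name()`, which replaces the whole file name.

```python
from pathlib import PurePosixPath


def processed_path_for(raw_file, new_suffix=".json"):
    """Hypothetical helper: map data/raw/<name>.csv to data/processed/<name>.json."""
    processed_dir = raw_file.parent.parent / "processed"
    return processed_dir / raw_file.with_suffix(new_suffix).name


raw_file = PurePosixPath("medical_project/data/raw/blood_tests_2023.csv")

target = processed_path_for(raw_file)
renamed = raw_file.with_name("blood_tests_2024.csv")  # same folder, new file name

print(target)
print(renamed)
```

Both methods return new path objects and leave `raw_file` unchanged, since paths are immutable.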

The `glob()` method allows us to find files matching a pattern, useful for batch processing medical data files.

In [None]:
# Create some sample files first
for i in range(3):
    file_path = raw_data / f"patient_{i:03d}.txt"
    file_path.touch()

# Now find all patient files (glob() gives no guaranteed order, so sort the results)
patient_files = sorted(raw_data.glob("patient_*.txt"))
print(f"Found {len(patient_files)} patient files:")
for f in patient_files:
    print(f"  - {f.name}")
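
`glob()` only looks in the directory it is called on, while `rglob()` also descends into subdirectories. The following sketch contrasts the two in a temporary directory; the `archive` subfolder and the file names are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw"
    archive = raw / "archive"          # hypothetical subfolder
    archive.mkdir(parents=True)

    for name in ("patient_002.txt", "patient_000.txt", "patient_001.txt"):
        (raw / name).touch()
    (archive / "patient_100.txt").touch()

    # Sort the names because neither method guarantees an order
    top_level = sorted(p.name for p in raw.glob("patient_*.txt"))
    recursive = sorted(p.name for p in raw.rglob("patient_*.txt"))

print(f"glob:  {top_level}")
print(f"rglob: {recursive}")
```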

Pathlib integrates seamlessly with file I/O operations. Here's how to write and read medical data.

In [None]:
# Write patient data
patient_info = "Patient ID: 12345\nAge: 45\nDiagnosis: Hypertension"
patient_file = processed_data / "patient_summary.txt"
# Pass the encoding explicitly; the default depends on the platform
patient_file.write_text(patient_info, encoding="utf-8")
print("Patient data written successfully")

In [None]:
# Read the data back, stating the encoding explicitly for portability
content = patient_file.read_text(encoding="utf-8")
print("File contents:")
print(content)
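
Encoding matters as soon as the text contains non-ASCII characters, which is common in clinical notes. The sketch below round-trips such a note with an explicit UTF-8 encoding and also shows `Path.open()`, which is convenient for reading a file piece by piece. The note content is made up for illustration.

```python
import tempfile
from pathlib import Path

# Made-up note containing non-ASCII characters
note = "Patient ID: 12345\nDiagnosis: Sjögren's syndrome\nTemperature: 37.2 °C"

with tempfile.TemporaryDirectory() as tmp:
    note_path = Path(tmp) / "patient_note.txt"

    # Same encoding on both sides of the round trip
    note_path.write_text(note, encoding="utf-8")
    restored = note_path.read_text(encoding="utf-8")

    # Path.open() works like the built-in open(), returning a file object
    with note_path.open(encoding="utf-8") as handle:
        first_line = handle.readline().rstrip("\n")

print(restored == note)
print(first_line)
```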

For binary files such as medical images, use the `read_bytes()` and `write_bytes()` methods.

In [None]:
# Simulate saving binary medical data
image_data = b"\x89PNG\r\n\x1a\n"  # 8-byte PNG signature only, not a complete image
image_path = raw_data / "xray_001.png"
image_path.write_bytes(image_data)
print(f"Binary file size: {image_path.stat().st_size} bytes")
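
Reading bytes back is a simple way to sanity-check a file before handing it to an imaging library. The sketch below compares the first bytes of a file against the PNG signature; `looks_like_png` is a hypothetical helper, and a matching signature does not prove that the rest of the file is a valid image.

```python
import tempfile
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(path):
    """Hypothetical helper: check only the leading signature bytes."""
    with path.open("rb") as handle:
        return handle.read(len(PNG_SIGNATURE)) == PNG_SIGNATURE


with tempfile.TemporaryDirectory() as tmp:
    png_path = Path(tmp) / "xray_001.png"
    png_path.write_bytes(PNG_SIGNATURE)

    fake_path = Path(tmp) / "notes.png"  # wrong content despite the extension
    fake_path.write_bytes(b"not an image")

    png_ok = looks_like_png(png_path)
    fake_ok = looks_like_png(fake_path)
    size = png_path.stat().st_size

print(png_ok, fake_ok, size)
```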

Many libraries, including pandas, accept `Path` objects directly. For older APIs that only take strings, convert the path explicitly with `str()`, as shown here.

In [None]:
import pandas as pd

# Create a sample CSV file
csv_path = processed_data / "lab_results.csv"
df = pd.DataFrame({
    'patient_id': [1, 2, 3],
    'glucose': [95, 110, 88]
})
df.to_csv(str(csv_path), index=False)
print(f"CSV saved to: {csv_path}")
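
Besides `str()`, the standard library offers `os.fspath()`, which is the conversion that path-aware functions apply internally. The sketch below uses only the standard library and `PurePosixPath`, so the result is the same on every platform.

```python
import os
from pathlib import PurePosixPath

csv_path = PurePosixPath("medical_project/data/processed/lab_results.csv")

as_str = str(csv_path)           # plain string conversion
as_fspath = os.fspath(csv_path)  # the os.PathLike protocol conversion

# Functions in os.path accept path objects without manual conversion
base_name = os.path.basename(csv_path)

print(as_str)
print(as_fspath)
print(base_name)
```

Unlike `str()`, `os.fspath()` raises a `TypeError` for objects that are not paths or strings, which can catch mistakes such as passing `None`.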

Finally, let's clean up the directory structure we created for this demonstration.

In [None]:
import shutil

if project_root.exists():
    shutil.rmtree(project_root)
    print("Cleanup completed")
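
For throwaway demonstrations and tests, an alternative to manual cleanup is to build the structure inside `tempfile.TemporaryDirectory()`, which removes everything when the `with` block ends. This is a sketch of that approach; the names `demo_root` and `demo_raw` are chosen here to avoid clashing with the variables above.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp) / "medical_project"
    demo_raw = demo_root / "data" / "raw"
    demo_raw.mkdir(parents=True)
    (demo_raw / "patient_000.txt").touch()

    existed_inside = demo_raw.exists()

# The temporary directory and its contents are gone at this point
exists_after = demo_root.exists()

print(existed_inside, exists_after)
```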

## Exercise

Create a medical data organization system using pathlib:

1. Create a directory structure with the following hierarchy:
   - `hospital_data/`
     - `2023/`
       - `radiology/`
       - `laboratory/`
       - `pharmacy/`

2. In each department folder, create 5 dummy files following the pattern `{department}_report_{001-005}.txt`, for example `radiology_report_001.txt` through `radiology_report_005.txt`.

3. Write a function that:
   - Takes a department name as input
   - Finds all files for that department
   - Creates a summary file listing all found files with their sizes
   - Saves the summary in a new `summaries/` directory

4. Use the `rglob()` method to find all `.txt` files across all departments and print their total count and combined size in bytes.