# 📘 P1.2.3.4 – Python File Operations
## Topic: Working with Directories (pathlib, os)

## 🎯 Learning Objectives
By the end of this notebook, you will:
- Navigate and manipulate file paths with `pathlib`
- Create and delete directories
- List directory contents
- Check if files/directories exist
- Use `os` module for directory operations
- Walk through directory trees

## 📁 Why Directory Operations Matter
Real programs need to:
- Organize files in folders
- Check if files exist before reading
- Create output directories
- Process multiple files in a folder
- Clean up temporary files

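## 🆕 Modern Approach: pathlib (Recommended)
`pathlib` (in the standard library since Python 3.4) is the modern, object-oriented way to work with paths: a `Path` object bundles the path itself with the methods for inspecting and manipulating it.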

In [None]:
from pathlib import Path

# Current directory
current_dir = Path.cwd()
print(f"Current directory: {current_dir}")

# Create a path object
data_dir = Path("data")
print(f"Data directory: {data_dir}")
print(f"Absolute path: {data_dir.absolute()}")
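
The cell above uses `absolute()`, which only prepends the current directory. As a small sketch of the difference, `resolve()` additionally collapses `..` steps and follows symlinks. The `messy` path below is a made-up example and does not need to exist on disk.

```python
from pathlib import Path

# A made-up path that is already absolute but still contains a ".." step
messy = Path.cwd() / "data" / ".." / "data" / "config.txt"

absolute = messy.absolute()   # absolute() never simplifies the path
resolved = messy.resolve()    # ".." collapsed, symlinks followed

print(f"Is 'data' absolute? {Path('data').is_absolute()}")
print(f"absolute(): {absolute}")
print(f"resolve():  {resolved}")
```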

## 📂 Creating Directories

In [None]:
from pathlib import Path

# Create a single directory
Path("data").mkdir(exist_ok=True)

# Create nested directories
Path("data/output/results").mkdir(parents=True, exist_ok=True)

print("✅ Directories created")
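
To see what `parents=True` and `exist_ok=True` actually protect against, the sketch below calls `mkdir()` without them and records the exceptions. It works inside a temporary directory (an assumption made here so that your real `data` folder is left alone).

```python
import tempfile
from pathlib import Path

# Work in a throwaway directory so nothing in your project is touched
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)

    # The first call creates the directory
    (base / "data").mkdir()

    # A second plain mkdir() on the same path raises FileExistsError
    try:
        (base / "data").mkdir()
        second_call_failed = False
    except FileExistsError:
        second_call_failed = True

    # Without parents=True, a missing parent raises FileNotFoundError
    nested = base / "a" / "b" / "c"
    try:
        nested.mkdir()
        nested_failed = False
    except FileNotFoundError:
        nested_failed = True

    # With both flags the call is safe to repeat
    nested.mkdir(parents=True, exist_ok=True)
    nested.mkdir(parents=True, exist_ok=True)
    nested_created = nested.is_dir()

print(f"Second plain mkdir() failed: {second_call_failed}")
print(f"Nested mkdir() without parents failed: {nested_failed}")
print(f"Nested directory created with both flags: {nested_created}")
```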

## ✅ Checking if Files/Directories Exist

In [None]:
from pathlib import Path

data_path = Path("data")

print(f"Does 'data' exist? {data_path.exists()}")
print(f"Is it a directory? {data_path.is_dir()}")
print(f"Is it a file? {data_path.is_file()}")

# Check file existence
config_file = Path("data/config.txt")
if not config_file.exists():
    print(f"File {config_file} does not exist")
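
Checking with `exists()` and then reading leaves a small gap in which the file could disappear. A common alternative is to attempt the read and handle `FileNotFoundError`. The helper name `read_config` and its `default` parameter are hypothetical, chosen only for this sketch.

```python
import tempfile
from pathlib import Path

def read_config(path, default=""):
    """Return the file's text, or `default` if the file is missing."""
    try:
        return Path(path).read_text(encoding="utf-8")
    except FileNotFoundError:
        return default

# Demonstrate both outcomes in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    config_file = Path(tmp) / "config.txt"

    missing = read_config(config_file, default="(no config)")

    config_file.write_text("debug=true", encoding="utf-8")
    present = read_config(config_file)

print(f"Before the file exists: {missing}")
print(f"After the file exists:  {present}")
```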

## 📋 Listing Directory Contents

In [None]:
from pathlib import Path

# Ensure data directory exists
Path("data").mkdir(exist_ok=True)

# Create some sample files
Path("data/file1.txt").write_text("Sample 1")
Path("data/file2.txt").write_text("Sample 2")
Path("data/report.csv").write_text("Name,Age\nAlice,25")

# List all items
data_dir = Path("data")
print("All items in 'data':")
for item in data_dir.iterdir():
    print(f"  {item.name} ({'dir' if item.is_dir() else 'file'})")
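
`iterdir()` yields entries in arbitrary order, so output can differ between runs and machines. The sketch below sorts the entries (directories first, then by name) and adds file sizes via `stat()`. The sample files and the `archive` folder are made up for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    (data_dir / "file1.txt").write_text("Sample 1")
    (data_dir / "report.csv").write_text("Name,Age\nAlice,25")
    (data_dir / "archive").mkdir()

    # Sort key: directories first (False sorts before True), then by name
    entries = sorted(data_dir.iterdir(), key=lambda p: (p.is_file(), p.name))

    listing = []
    for item in entries:
        if item.is_dir():
            listing.append(f"{item.name}/")
        else:
            listing.append(f"{item.name} ({item.stat().st_size} bytes)")

for line in listing:
    print(f"  {line}")
```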

## 🔍 Finding Specific Files

In [None]:
from pathlib import Path

data_dir = Path("data")

# Find all .txt files
print("All .txt files:")
for txt_file in data_dir.glob("*.txt"):
    print(f"  {txt_file.name}")

# Recursive search (data_dir and all of its subdirectories)
print("\nAll .txt files (recursive):")
for txt_file in data_dir.rglob("*.txt"):
    print(f"  {txt_file}")
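
The difference between `glob()` and `rglob()` only shows up once there is a subdirectory. The sketch below builds a small made-up tree (the `sub` folder and file names are assumptions) and also shows the `?` wildcard and one way to match several extensions at once.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    (data_dir / "sub").mkdir()
    for name in ["file1.txt", "file2.txt", "report.csv",
                 "sub/notes.txt", "sub/table.csv"]:
        (data_dir / name).write_text("x")

    # glob() looks only in data_dir itself
    top_level_txt = sorted(p.name for p in data_dir.glob("*.txt"))

    # rglob() also descends into subdirectories
    all_txt = sorted(p.relative_to(data_dir).as_posix()
                     for p in data_dir.rglob("*.txt"))

    # "?" matches exactly one character
    numbered = sorted(p.name for p in data_dir.glob("file?.txt"))

    # Several extensions: match everything, then filter on the suffix
    wanted = {".txt", ".csv"}
    txt_or_csv = sorted(p.relative_to(data_dir).as_posix()
                        for p in data_dir.rglob("*")
                        if p.is_file() and p.suffix in wanted)

print(top_level_txt)  # ['file1.txt', 'file2.txt']
print(all_txt)        # ['file1.txt', 'file2.txt', 'sub/notes.txt']
print(numbered)       # ['file1.txt', 'file2.txt']
print(txt_or_csv)
```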

## 🔧 Path Manipulation

In [None]:
from pathlib import Path

file_path = Path("data/output/results/report.csv")

print(f"Full path: {file_path}")
print(f"File name: {file_path.name}")
print(f"File stem (no extension): {file_path.stem}")
print(f"Extension: {file_path.suffix}")
print(f"Parent directory: {file_path.parent}")

# Join paths safely (cross-platform)
new_path = Path("data") / "logs" / "app.log"
print(f"\nJoined path: {new_path}")
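
Beyond reading the pieces of a path, `Path` can also build modified copies. The sketch below shows a few such methods on the same example path; nothing is written to disk, and `backup.tar.gz` is a made-up file name.

```python
from pathlib import Path

file_path = Path("data/output/results/report.csv")

# Path objects are immutable: each call returns a new path
backup = file_path.with_suffix(".bak")         # swap the extension
renamed = file_path.with_name("summary.csv")   # swap the whole file name
grandparent = file_path.parent.parent          # two levels up
parts = file_path.parts                        # tuple of path components

# Files with a double extension: suffix is the last one only
archive = Path("backup.tar.gz")

print(backup.as_posix())       # data/output/results/report.bak
print(renamed.as_posix())      # data/output/results/summary.csv
print(grandparent.as_posix())  # data/output
print(parts)                   # ('data', 'output', 'results', 'report.csv')
print(archive.suffix)          # .gz
print(archive.suffixes)        # ['.tar', '.gz']
```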

## 🧪 Practical Example: Organize Files by Extension

In [None]:
from pathlib import Path

# Create mixed files
Path("data/doc1.txt").write_text("Document 1")
Path("data/doc2.txt").write_text("Document 2")
Path("data/data1.csv").write_text("A,B\n1,2")
Path("data/data2.csv").write_text("X,Y\n3,4")

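def organize_files_by_extension(source_dir):
    """Move each file in source_dir into a subfolder named after its extension."""
    source = Path(source_dir)

    # Snapshot the listing first: the loop creates folders and moves files,
    # and changing a directory while iterating over it is unreliable
    for file in list(source.iterdir()):
        if file.is_file():
            # Get extension without dot
            ext = file.suffix.lstrip('.')
            if ext:  # Only if there's an extension
                # Create folder for this extension
                ext_dir = source / ext
                ext_dir.mkdir(exist_ok=True)

                # Move file (rename with new path)
                new_path = ext_dir / file.name
                if not new_path.exists():
                    file.rename(new_path)
                    print(f"Moved: {file.name} → {ext}/")
                else:
                    print(f"Skipped: {file.name} (already in {ext}/)")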

organize_files_by_extension("data")

# Show organized structure
print("\nOrganized structure:")
for item in Path("data").rglob("*"):
    if item.is_file():
        print(f"  {item.relative_to('data')}")
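
Because moving files is hard to undo, it can help to preview the moves first. The sketch below is a dry-run variant under the same rules as the function above; the helper name `plan_moves` is hypothetical, and it only returns `(source, destination)` pairs without touching the disk.

```python
import tempfile
from pathlib import Path

def plan_moves(source_dir):
    """Return (source, destination) pairs without moving anything."""
    source = Path(source_dir)
    moves = []
    for file in sorted(source.iterdir()):
        ext = file.suffix.lstrip(".")
        # Same rule as above: only files that have an extension are moved
        if file.is_file() and ext:
            moves.append((file, source / ext / file.name))
    return moves

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "doc1.txt").write_text("Document 1")
    (base / "data1.csv").write_text("A,B\n1,2")
    (base / "README").write_text("no extension")

    planned = [(src.name, dst.relative_to(base).as_posix())
               for src, dst in plan_moves(base)]
    still_there = (base / "doc1.txt").exists()  # nothing was moved

for name, target in planned:
    print(f"Would move: {name} -> {target}")
```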

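## 🗂️ Traditional Approach: os Module
The `os` and `os.path` functions predate `pathlib` and operate on plain strings. They are still common in existing code, and `os.walk()` remains a convenient way to traverse a directory tree.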

In [None]:
import os

# Current directory
print(f"Current dir: {os.getcwd()}")

# Check existence
print(f"'data' exists? {os.path.exists('data')}")
print(f"Is directory? {os.path.isdir('data')}")

# List directory
print("\nContents of 'data':")
for item in os.listdir("data"):
    print(f"  {item}")
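
For completeness, here is a sketch of the string-based counterparts to the `pathlib` operations shown earlier: `os.makedirs`, `os.path.join`, `os.path.splitext`, and `os.scandir`. It runs in a temporary directory, and the folder and file names are illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Join path pieces with the correct separator for this platform
    nested = os.path.join(tmp, "data", "output", "results")

    # makedirs creates missing parents, like mkdir(parents=True)
    os.makedirs(nested, exist_ok=True)

    report = os.path.join(nested, "report.csv")
    with open(report, "w", encoding="utf-8") as f:
        f.write("Name,Age\n")

    # Split a path into its pieces
    name = os.path.basename(report)
    stem, ext = os.path.splitext(name)

    # scandir yields entries that already know whether they are directories
    with os.scandir(os.path.join(tmp, "data")) as entries:
        kinds = sorted((e.name, "dir" if e.is_dir() else "file")
                       for e in entries)

print(name)       # report.csv
print(stem, ext)  # report .csv
print(kinds)      # [('output', 'dir')]
```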

## 🚶 Walking Through Directory Trees

In [None]:
import os

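# os.walk() visits the starting directory and every subdirectory, top-down
start = "data"
print("Directory tree:")
for root, dirs, files in os.walk(start):
    # Depth of root relative to start (the start directory itself is level 0)
    rel = os.path.relpath(root, start)
    level = 0 if rel == os.curdir else rel.count(os.sep) + 1
    indent = " " * 2 * level
    print(f"{indent}{os.path.basename(root)}/")
    sub_indent = " " * 2 * (level + 1)
    for file in files:
        print(f"{sub_indent}{file}")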
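
A similar tree can be produced with `pathlib` alone. This is one possible sketch using `rglob("*")`, sorted by path components so each folder is followed by its contents; the helper name `tree_lines` and the sample tree are assumptions made for the example.

```python
import tempfile
from pathlib import Path

def tree_lines(start):
    """Return an indented listing of everything below `start`."""
    start = Path(start)
    lines = [f"{start.name}/"]
    # Sorting by path components keeps each folder next to its contents
    paths = sorted(start.rglob("*"), key=lambda p: p.relative_to(start).parts)
    for path in paths:
        depth = len(path.relative_to(start).parts)
        marker = "/" if path.is_dir() else ""
        lines.append(f"{'  ' * depth}{path.name}{marker}")
    return lines

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "data"
    (root / "csv").mkdir(parents=True)
    (root / "txt").mkdir()
    (root / "csv" / "data1.csv").write_text("A,B\n1,2")
    (root / "txt" / "doc1.txt").write_text("Document 1")
    (root / "notes.md").write_text("notes")

    lines = tree_lines(root)

print("\n".join(lines))
```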

## 🆚 pathlib vs os: Quick Comparison

| Task | pathlib | os |
|---|---|---|
| Create directory | `Path('dir').mkdir()` | `os.mkdir('dir')` |
| Create nested directories | `Path('a/b').mkdir(parents=True)` | `os.makedirs('a/b')` |
| Check exists | `Path('f').exists()` | `os.path.exists('f')` |
| List directory | `Path('d').iterdir()` | `os.listdir('d')` |
| Join paths | `Path('a') / 'b'` | `os.path.join('a', 'b')` |

**Recommendation:** Use `pathlib` for new code (cleaner, object-oriented)

## 🛠️ Directory Operations Best Practices
- Use `pathlib` for modern, readable code
- Pass `exist_ok=True` to `mkdir()` when an existing directory is acceptable, so reruns don't fail
- Use `parents=True` for nested directory creation
- Check existence before deleting
- Use `/` operator (pathlib) for cross-platform paths
- Avoid hardcoding absolute paths; build them from a base such as `Path.cwd()` or `Path.home()`
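
The deletion advice above can be sketched as follows. `Path.unlink()` removes a file, `Path.rmdir()` removes an empty directory, and `shutil.rmtree()` removes a directory with everything in it. The `scratch` folder is a made-up example inside a temporary directory; `missing_ok=True` requires Python 3.8 or newer.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    scratch = Path(tmp) / "scratch"
    (scratch / "nested").mkdir(parents=True)
    temp_file = scratch / "nested" / "temp.txt"
    temp_file.write_text("temporary")

    # Delete a single file; missing_ok=True ignores a file that is already gone
    temp_file.unlink(missing_ok=True)
    temp_file.unlink(missing_ok=True)  # second call is a no-op, not an error
    file_gone = not temp_file.exists()

    # rmdir() refuses to remove a directory that still has contents
    try:
        scratch.rmdir()  # still contains "nested"
        rmdir_failed = False
    except OSError:
        rmdir_failed = True

    # rmtree() deletes the whole tree and cannot be undone, so check first
    if scratch.is_dir():
        shutil.rmtree(scratch)
    tree_gone = not scratch.exists()

print(f"File removed: {file_gone}")
print(f"rmdir() on non-empty directory failed: {rmdir_failed}")
print(f"Tree removed: {tree_gone}")
```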

### ✅ Key Takeaways
- `pathlib.Path` is the modern way to work with paths
- Use `.mkdir(parents=True, exist_ok=True)` to create directories
- Use `.exists()`, `.is_file()`, `.is_dir()` to check paths
- Use `.glob()` to find files by pattern
- `os` module works but `pathlib` is cleaner
- **In AI/ML:** Organize datasets, create output folders, find training files, manage model checkpoints
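
As one illustration of the AI/ML use case, the sketch below creates an output folder for a training run and picks the most recent checkpoint by name. The layout (`runs/experiment_01/checkpoints`) and the `epoch_NNN.pt` naming are assumptions for this example, not a convention of any particular framework, and the checkpoint files are empty placeholders.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout for one training run
    run_dir = Path(tmp) / "runs" / "experiment_01"
    checkpoint_dir = run_dir / "checkpoints"
    checkpoint_dir.mkdir(parents=True, exist_ok=True)

    # Placeholder checkpoint files; zero-padded names sort in epoch order
    for epoch in (1, 2, 10):
        (checkpoint_dir / f"epoch_{epoch:03d}.pt").write_bytes(b"")

    checkpoints = sorted(checkpoint_dir.glob("epoch_*.pt"))
    latest = checkpoints[-1].name if checkpoints else None
    found = [p.name for p in checkpoints]

print(f"Checkpoints: {found}")
print(f"Latest: {latest}")  # epoch_010.pt
```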