# Advanced Usage

This notebook covers advanced use cases and tips for getting the most out of iops-profiler.

## Setup

In [None]:
%load_ext iops_profiler

import tempfile
import os
import shutil

test_dir = tempfile.mkdtemp()
print(f"Test directory: {test_dir}")

## Profiling Data Science Workflows

Let's see how iops-profiler can help optimize common data science operations.

### JSON File I/O

In [None]:
import json

# Create test data
test_data = {
    'users': [{'id': i, 'name': f'user_{i}', 'value': i * 1.5} for i in range(1000)]
}

In [None]:
%%iops
# Profile JSON writing
json_file = os.path.join(test_dir, 'data.json')
with open(json_file, 'w') as f:
    json.dump(test_data, f)

In [None]:
%%iops
# Profile JSON reading
with open(json_file, 'r') as f:
    loaded_data = json.load(f)

### Comparing Text vs Binary Formats

In [None]:
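import pickle

# Helpers that close their files, so all buffered data is written
# before the profiled statement returns
def write_json(path, obj):
    with open(path, 'w') as f:
        json.dump(obj, f)

def write_pickle(path, obj):
    with open(path, 'wb') as f:
        pickle.dump(obj, f)

# Text format (JSON)
print("JSON (text format):")
json_file = os.path.join(test_dir, 'data.json')
%iops write_json(json_file, test_data)

print("\nPickle (binary format):")
pickle_file = os.path.join(test_dir, 'data.pkl')
%iops write_pickle(pickle_file, test_data)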
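Part of the difference between the two profiles comes simply from how many bytes each format produces. The minimal sketch below, which does not use iops-profiler at all, serializes the same structure in memory and compares the sizes; `sample` rebuilds `test_data` so the block runs on its own. The pickle size depends on the default pickle protocol of your Python version.

```python
import json
import pickle

# Same structure as test_data in the JSON section above
sample = {'users': [{'id': i, 'name': f'user_{i}', 'value': i * 1.5} for i in range(1000)]}

# Serialize in memory only, so no file I/O is involved
json_bytes = len(json.dumps(sample).encode('utf-8'))
pickle_bytes = len(pickle.dumps(sample))

print(f"JSON:   {json_bytes:,} bytes")
print(f"Pickle: {pickle_bytes:,} bytes")
```

If the byte counts in the two profiles track these sizes, the difference is mostly about output size rather than how each library issues its writes.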

## Dealing with Caching

The operating system keeps recently read and written file data in memory (the page cache), which can significantly affect I/O measurements. Here's how to work with it.

In [None]:
# Create a test file
cache_test_file = os.path.join(test_dir, 'cache_test.txt')
with open(cache_test_file, 'w') as f:
    f.write('test data' * 10000)

In [None]:
%%iops
# First read - may hit disk
with open(cache_test_file, 'r') as f:
    data1 = f.read()
print("First read (cold cache)")

In [None]:
%%iops
# Second read - likely cached
with open(cache_test_file, 'r') as f:
    data2 = f.read()
print("Second read (warm cache)")

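Notice that the second read might show fewer or zero I/O operations, depending on what your platform's counters measure, because the data is already cached in memory. Since `cache_test_file` was written just before these cells, even the first read may be served from the cache, so treat the "cold" and "warm" labels as approximate.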

**Tip:** To get consistent measurements, consider:
1. Using a "warm-up" run before profiling
2. Using different files for each test
3. Flushing caches between tests (requires system permissions)
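For option 3, the system-wide approach on Linux is to run `sync` and then write `3` to `/proc/sys/vm/drop_caches` as root. A narrower alternative is to ask the kernel to drop the cached pages of a single file with `os.posix_fadvise` and `os.POSIX_FADV_DONTNEED`. This sketch assumes a Unix-like system that provides `posix_fadvise` (Linux does; macOS and Windows do not, in which case the hypothetical `evict_from_page_cache` helper just returns `False`). The hint is advisory, so the kernel may still keep some pages.

```python
import os

def evict_from_page_cache(path):
    """Ask the kernel to drop cached pages for one file (hypothetical helper).

    Returns True if the hint was issued, False if the platform lacks posix_fadvise.
    """
    if not hasattr(os, 'posix_fadvise'):
        return False
    fd = os.open(path, os.O_RDONLY)
    try:
        # Dirty (unwritten) pages cannot be dropped, so write them back first
        os.fsync(fd)
        # offset=0, len=0 means "the whole file"
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return True
```

For example, calling `evict_from_page_cache(cache_test_file)` before the first-read cell should make that read more likely to come from disk.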

## Forcing Synchronous I/O

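By default, a write only reaches Python's internal buffer and the operating system's cache; the actual disk write happens later. To make sure the data reaches the disk before the cell finishes, call `flush()` and then `os.fsync()`.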

In [None]:
%%iops
# Regular write (may be buffered)
regular_file = os.path.join(test_dir, 'regular.txt')
with open(regular_file, 'w') as f:
    f.write('data' * 10000)
print("Regular write")

In [None]:
%%iops
# Synchronous write (forces disk write)
sync_file = os.path.join(test_dir, 'sync.txt')
with open(sync_file, 'w') as f:
    f.write('data' * 10000)
    f.flush()              # push Python's buffer to the operating system
    os.fsync(f.fileno())   # block until the OS has written the data to the device
print("Synchronous write with fsync")

The synchronous write may show more I/O operations and take longer, because `os.fsync` does not return until the operating system reports that the data has reached the storage device.
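To see the cost of `fsync` in wall-clock terms as well, a small timing sketch can complement the profiles. It uses only the standard library; the `timed_write` helper is hypothetical, and it works in its own scratch directory so it does not touch `test_dir`. Timings vary a lot between machines and file systems, so treat a single run as indicative only.

```python
import os
import shutil
import tempfile
import time

def timed_write(path, payload, use_fsync):
    """Write payload to path and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    with open(path, 'w') as f:
        f.write(payload)
        if use_fsync:
            f.flush()             # hand Python's buffer to the OS
            os.fsync(f.fileno())  # wait until the OS reports the data is on the device
    return time.perf_counter() - start

demo_dir = tempfile.mkdtemp()  # scratch directory for this sketch only
try:
    payload = 'data' * 10000
    buffered = timed_write(os.path.join(demo_dir, 'buffered.txt'), payload, use_fsync=False)
    synced = timed_write(os.path.join(demo_dir, 'synced.txt'), payload, use_fsync=True)
    print(f"without fsync: {buffered * 1e3:.3f} ms")
    print(f"with fsync:    {synced * 1e3:.3f} ms")
finally:
    shutil.rmtree(demo_dir)
```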

## Profiling Multiple Files

When working with multiple files, iops-profiler tracks all I/O operations in the profiled cell.

In [None]:
%%iops
# Work with multiple files simultaneously
for i in range(10):
    filename = os.path.join(test_dir, f'multi_{i}.txt')
    with open(filename, 'w') as f:
        f.write(f'File {i}: ' + 'content' * 100)
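To sanity-check the byte totals reported for this cell, it helps to know how much payload the loop writes. The hypothetical helpers below compute the expected size from the same string expression and read the actual sizes back from disk. Each file holds `'File i: '` (8 ASCII characters for a single-digit `i`) plus 700 characters of `'content'`, so the ten files should total 7,080 bytes. Reported totals may differ from this, depending on whether the profiler also counts metadata or other background I/O.

```python
import os

def expected_multi_bytes(count=10):
    # Same payload expression as the cell above; ASCII, so 1 character = 1 byte
    return sum(len(f'File {i}: ' + 'content' * 100) for i in range(count))

def actual_multi_bytes(directory, count=10):
    # Sizes of the files the cell actually wrote
    return sum(os.path.getsize(os.path.join(directory, f'multi_{i}.txt'))
               for i in range(count))
```

Usage: `print(expected_multi_bytes(), actual_multi_bytes(test_dir))`.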

## Optimization Example: Chunked Writing

Let's compare different approaches to writing large amounts of data.

In [None]:
# Prepare test data
large_data = 'x' * 1000000  # 1 MB of data

In [None]:
%%iops
# Approach 1: Write all at once
file1 = os.path.join(test_dir, 'approach1.txt')
with open(file1, 'w') as f:
    f.write(large_data)
print("Approach 1: Single write")

In [None]:
%%iops
# Approach 2: Write in chunks
file2 = os.path.join(test_dir, 'approach2.txt')
chunk_size = 1000
with open(file2, 'w') as f:
    for i in range(0, len(large_data), chunk_size):
        f.write(large_data[i:i+chunk_size])
print("Approach 2: Chunked writes (1000 bytes)")

In [None]:
%%iops
# Approach 3: Write in larger chunks
file3 = os.path.join(test_dir, 'approach3.txt')
chunk_size = 10000
with open(file3, 'w') as f:
    for i in range(0, len(large_data), chunk_size):
        f.write(large_data[i:i+chunk_size])
print("Approach 3: Chunked writes (10000 bytes)")

Compare:
- Which approach has the best throughput?
- How do operation counts differ?
- Is there a sweet spot for chunk size?
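One factor worth knowing when answering these questions: files opened in text mode are buffered by Python (typically a few kilobytes, based on the file system's block size or `io.DEFAULT_BUFFER_SIZE`), so 1,000-byte `f.write` calls are usually combined before they reach the operating system, and approaches 2 and 3 may issue a similar number of write calls. To see the effect of chunk size directly, you can disable Python's buffering by opening the file in binary mode with `buffering=0`. The `chunked_write_unbuffered` helper below is a hypothetical sketch of that idea.

```python
def chunked_write_unbuffered(path, data: bytes, chunk_size: int) -> int:
    """Write data in fixed-size chunks with Python-level buffering disabled.

    With buffering=0, open() returns a raw FileIO object, so each write()
    call goes straight to the operating system. Returns the number of
    write() calls made.
    """
    calls = 0
    view = memoryview(data)  # slicing a memoryview avoids copying the data
    with open(path, 'wb', buffering=0) as f:
        for start in range(0, len(view), chunk_size):
            chunk = view[start:start + chunk_size]
            # A raw write may write fewer bytes than requested, so loop until done
            while chunk:
                written = f.write(chunk)
                chunk = chunk[written:]
                calls += 1
    return calls
```

Usage: `%iops chunked_write_unbuffered(os.path.join(test_dir, 'approach2_raw.bin'), large_data.encode(), 1000)`. If your platform's counters track individual system calls, the operation count should now grow as the chunk size shrinks.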

## Memory-Mapped Files

Memory-mapped files can provide very different I/O patterns.

In [None]:
import mmap

# Create a file for memory mapping
mmap_file = os.path.join(test_dir, 'mmap_test.dat')
with open(mmap_file, 'wb') as f:
    f.write(b'\0' * 1000000)  # 1 MB of zeros

In [None]:
%%iops
# Regular file writing
with open(mmap_file, 'r+b') as f:
    f.seek(500000)
    f.write(b'x' * 1000)
print("Regular file write")

In [None]:
%%iops
# Memory-mapped file writing
with open(mmap_file, 'r+b') as f:
    mm = mmap.mmap(f.fileno(), 0)
    mm[500000:501000] = b'y' * 1000
    mm.close()
print("Memory-mapped write")

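Memory-mapped I/O may look quite different in the profile: writes through the mapping modify cached pages directly, without `write()` calls, and the operating system writes the modified pages back to the file later, possibly after the profiled cell has finished.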
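If you want the write-back to happen inside the profiled cell, you can call `mm.flush()`, which asks the operating system to write the mapping's modified pages to the file (`msync` on Unix). A minimal sketch with a hypothetical `mmap_write` helper follows. Note that flushing makes the mmap cell do more work than the regular-write cell, which does not call `fsync`, so compare like with like.

```python
import mmap

def mmap_write(path, offset, payload, flush=True):
    """Overwrite payload at offset through a memory mapping of the whole file."""
    with open(path, 'r+b') as f:
        # Length 0 maps the entire file; the file must not be empty
        with mmap.mmap(f.fileno(), 0) as mm:
            mm[offset:offset + len(payload)] = payload
            if flush:
                mm.flush()  # write modified pages back now, not later
```

Usage: `%iops mmap_write(mmap_file, 500000, b'y' * 1000)`.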

## Tips for Accurate Profiling

### 1. Warm Up Before Measuring

In [None]:
# Warm-up: create the file, then read it once without profiling
warmup_file = os.path.join(test_dir, 'warmup.txt')
with open(warmup_file, 'w') as f:
    f.write('warmup' * 1000)
with open(warmup_file, 'r') as f:
    f.read()

# Now profile the same read
%iops open(warmup_file, 'r').read()

### 2. Use Fresh Files for Each Test

In [None]:
# Test 1
%iops open(os.path.join(test_dir, 'test1.txt'), 'w').write('data' * 1000)

# Test 2 - use a different file
%iops open(os.path.join(test_dir, 'test2.txt'), 'w').write('data' * 1000)

### 3. Run Multiple Iterations

I/O performance varies from run to run, so repeat each measurement several times and compare the averages rather than relying on a single result.

In [None]:
print("Run 1:")
%iops open(os.path.join(test_dir, 'run1.txt'), 'w').write('data' * 1000)

print("\nRun 2:")
%iops open(os.path.join(test_dir, 'run2.txt'), 'w').write('data' * 1000)

print("\nRun 3:")
%iops open(os.path.join(test_dir, 'run3.txt'), 'w').write('data' * 1000)
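If you also want wall-clock numbers summarized across runs, a small timing loop can complement `%iops`. This sketch measures elapsed time only, not operation counts, and the `time_repeated` helper is hypothetical. It applies tip 1 by running one untimed warm-up call, then reports the mean, median, and standard deviation; the median is less sensitive to an occasional slow run. Unlike the cells above, it rewrites the same file on every run.

```python
import os
import shutil
import statistics
import tempfile
import time

def time_repeated(func, repeats=5, warmup=1):
    """Run func `warmup` times untimed, then `repeats` times timed.

    Returns the list of per-run wall-clock durations in seconds.
    """
    for _ in range(warmup):
        func()
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations

demo_dir = tempfile.mkdtemp()  # scratch directory for this sketch only

def write_once():
    with open(os.path.join(demo_dir, 'run.txt'), 'w') as f:
        f.write('data' * 1000)

try:
    times = time_repeated(write_once, repeats=5)
finally:
    shutil.rmtree(demo_dir)

print(f"mean   {statistics.mean(times) * 1e3:.3f} ms")
print(f"median {statistics.median(times) * 1e3:.3f} ms")
print(f"stdev  {statistics.stdev(times) * 1e3:.3f} ms")
```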

## Cleanup

In [None]:
shutil.rmtree(test_dir)
print("Cleanup complete!")

## Summary

In this notebook, we covered:

1. **Data Science Workflows**: Profiling JSON and pickle serialization
2. **Caching Effects**: Understanding and working with OS caching
3. **Synchronous I/O**: Forcing data to disk with fsync
4. **Multi-File Operations**: Tracking I/O across multiple files
5. **Optimization**: Comparing different writing strategies
6. **Memory-Mapped Files**: Alternative I/O mechanisms
7. **Best Practices**: Tips for accurate and reproducible profiling

Key takeaways:
- Always warm up before measuring for consistency
- Be aware of caching effects
- Use multiple runs to account for variability
- Compare relative performance rather than absolute numbers
- Use appropriate file sizes for your use case

Next steps:
- Apply these techniques to your own I/O-intensive code
- Experiment with different strategies for your specific use case
- Use histogram mode to understand operation distributions