# Histogram Visualization

This notebook demonstrates how to use the histogram feature to visualize the distribution of I/O operation sizes.

**Note:** Histogram mode is available on Linux (with strace) and macOS (with fs_usage), but not on Windows.
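Histogram mode depends on a platform tracer, so you may want to check for one before running the cells below. The helper below is a hypothetical sketch and is not part of `iops_profiler`. It assumes the extension needs the `strace` binary on `PATH` on Linux and `fs_usage` on macOS, and it only checks that those tools exist, not that you have permission to run them.

```python
import platform
import shutil


def histogram_supported():
    """Hypothetical helper: guess whether %%iops --histogram can run here.

    Assumption: Linux needs `strace` on PATH, macOS needs `fs_usage`,
    and Windows is never supported.
    """
    system = platform.system()
    if system == "Linux":
        return shutil.which("strace") is not None
    if system == "Darwin":
        return shutil.which("fs_usage") is not None
    return False  # Windows and anything else


print(f"{platform.system()}: histogram mode likely supported = {histogram_supported()}")
```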

## Setup

Load the extension and prepare our test environment.

In [None]:
%load_ext iops_profiler

In [None]:
import tempfile
import os
import shutil

# Create a temporary directory
test_dir = tempfile.mkdtemp()
print(f"Working directory: {test_dir}")

## Basic Histogram Example

Let's start with a simple example that creates files of different sizes. The `--histogram` flag enables visualization.

In [None]:
%%iops --histogram
# Create files with varying sizes
for i in range(5):
    filename = os.path.join(test_dir, f'file_{i}.txt')
    # Size grows tenfold per file: roughly 1 KB, 10 KB, 100 KB, 1 MB, 10 MB
    size = 1024 * (10 ** i)
    with open(filename, 'w') as f:
        f.write('x' * size)

The histogram shows two charts:

1. **Operation Count Distribution**: How many operations fall into each size bucket
2. **Total Bytes Distribution**: Total bytes transferred in each size bucket

Both charts use a logarithmic x-axis so that operations spanning several orders of magnitude in size fit on the same chart.
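To make the two charts concrete, here is a minimal sketch of how operation sizes could be grouped into logarithmic buckets, tracking both an operation count and a byte total per bucket. It assumes power-of-two buckets; the bucketing scheme `iops_profiler` actually uses is not described in this notebook, and `log2_histogram` is a hypothetical helper. The example input also assumes each file in the previous cell was written with a single system call, which a real trace may not show.

```python
from collections import defaultdict


def log2_histogram(sizes):
    """Group operation sizes into power-of-two buckets (hypothetical scheme).

    Returns {bucket_lower_bound: (operation_count, total_bytes)}, sorted by
    bucket. Zero-byte operations go into bucket 0.
    """
    buckets = defaultdict(lambda: [0, 0])
    for size in sizes:
        # For size > 0, 1 << (bit_length - 1) is the largest power of two <= size.
        lower = 1 << (size.bit_length() - 1) if size > 0 else 0
        buckets[lower][0] += 1      # feeds the "operation count" chart
        buckets[lower][1] += size   # feeds the "total bytes" chart
    return {k: tuple(v) for k, v in sorted(buckets.items())}


# Illustrative input: one write per file from the cell above.
sizes = [1024 * 10 ** i for i in range(5)]
for lower, (count, total) in log2_histogram(sizes).items():
    print(f">= {lower:>10,} B: {count} op(s), {total:>12,} bytes")
```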

## Read Operations Histogram

Now let's read the files back and see the read operation distribution.

In [None]:
%%iops --histogram
# Read files of different sizes
for i in range(5):
    filename = os.path.join(test_dir, f'file_{i}.txt')
    with open(filename, 'r') as f:
        content = f.read()

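The read distribution may differ from the write distribution for several reasons:
- The operating system may serve recently written data from its page cache. Such reads are still system calls, so they still appear in the histogram; caching mainly changes how quickly they complete.
- Python's read buffering differs from its write buffering, so the same file may be transferred in differently sized chunks.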

## Mixed Read/Write Operations

Let's see what happens when we mix read and write operations.

In [None]:
%%iops --histogram
# Write small files
for i in range(10):
    small_file = os.path.join(test_dir, f'small_{i}.txt')
    with open(small_file, 'w') as f:
        f.write('data' * 256)  # ~1KB each

# Write medium files
for i in range(5):
    medium_file = os.path.join(test_dir, f'medium_{i}.txt')
    with open(medium_file, 'w') as f:
        f.write('data' * 2560)  # ~10KB each

# Write large file
large_file = os.path.join(test_dir, 'large.txt')
with open(large_file, 'w') as f:
    f.write('data' * 256000)  # ~1MB

# Now read some files back
for i in range(5):
    with open(os.path.join(test_dir, f'small_{i}.txt'), 'r') as f:
        _ = f.read()

The histogram now shows separate lines for:
- **Reads** (one color)
- **Writes** (another color)  
- **All operations** combined (third color)

This makes it easy to see how read and write patterns differ.
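Splitting a histogram by operation type amounts to keeping one counter per kind plus a combined counter. The sketch below assumes power-of-two buckets; `histograms_by_type` is a hypothetical helper, and the `ops` list is made-up input that mirrors the payload sizes in the cell above, not a real trace.

```python
from collections import Counter


def histograms_by_type(ops):
    """Count operations per power-of-two size bucket for reads, writes, and all.

    `ops` is a list of (kind, size) pairs where kind is "read" or "write".
    """
    def bucket(size):
        return 1 << (size.bit_length() - 1) if size > 0 else 0

    result = {"read": Counter(), "write": Counter(), "all": Counter()}
    for kind, size in ops:
        b = bucket(size)
        result[kind][b] += 1    # per-type line
        result["all"][b] += 1   # combined line
    return result


# Hypothetical trace: 10 small writes, 5 medium writes, 1 large write, 5 small reads.
ops = ([("write", 1024)] * 10 + [("write", 10240)] * 5
       + [("write", 1024000)] + [("read", 1024)] * 5)
hists = histograms_by_type(ops)
for kind in ("read", "write", "all"):
    print(kind, dict(sorted(hists[kind].items())))
```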

## Analyzing Buffer Size Impact

One practical use of histograms is to analyze how buffer sizes affect I/O patterns.

In [None]:
%%iops --histogram
# Default buffering (Python chooses the buffer size)
test_file = os.path.join(test_dir, 'buffer_test.txt')
with open(test_file, 'w') as f:
    for i in range(1000):
        f.write('x' * 100)

In [None]:
%%iops --histogram
# Larger buffer size (1 MiB, well above typical defaults)
test_file_buffered = os.path.join(test_dir, 'buffer_test_large.txt')
with open(test_file_buffered, 'w', buffering=1024 * 1024) as f:
    for i in range(1000):
        f.write('x' * 100)

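Compare the two histograms:
- The larger buffer may produce fewer, larger write operations
- Fewer system calls usually means less per-call overhead, but data stays in the process's memory longer before it is handed to the operating system
- The histogram makes the difference visually clear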
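You can also observe the buffering effect without a tracer by counting the calls that reach the lowest layer of Python's I/O stack. This sketch stacks `io.TextIOWrapper` over `io.BufferedWriter`, the same layering `open(..., 'w')` builds, on top of a hypothetical in-memory `RecordingRaw` stream. It assumes that each call into the raw stream roughly corresponds to one `write()` system call on a real file; `raw_write_sizes` is likewise a hypothetical helper.

```python
import io


class RecordingRaw(io.RawIOBase):
    """Hypothetical in-memory raw stream that records each write's size.

    Stands in for a real file: each call here roughly matches one write()
    system call that a tracer would report.
    """

    def __init__(self):
        super().__init__()
        self.sizes = []

    def writable(self):
        return True

    def write(self, b):
        n = memoryview(b).nbytes
        self.sizes.append(n)
        return n  # report everything as written


def raw_write_sizes(buffer_size, n_writes=1000, chunk="x" * 100):
    """Replay the notebook's loop through text and buffered layers."""
    raw = RecordingRaw()
    text = io.TextIOWrapper(io.BufferedWriter(raw, buffer_size=buffer_size),
                            encoding="utf-8")
    for _ in range(n_writes):
        text.write(chunk)
    text.close()  # flushes whatever is still buffered
    return raw.sizes


for size in (io.DEFAULT_BUFFER_SIZE, 1024 * 1024):
    sizes = raw_write_sizes(size)
    print(f"buffer={size:>8}: {len(sizes)} raw writes, largest {max(sizes)} bytes")
```

When the buffer is larger than the whole payload (100,000 bytes here), the data can reach the raw stream in a single call at close; with a small buffer it arrives in many pieces. Exact counts depend on the Python version.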

## Real-World Example: CSV Writing

Let's look at a more realistic scenario: writing CSV data.

In [None]:
%%iops --histogram
import csv

csv_file = os.path.join(test_dir, 'data.csv')
with open(csv_file, 'w', newline='') as f:
    writer = csv.writer(f)
    # Write header
    writer.writerow(['id', 'name', 'value', 'description'])
    # Write data rows
    for i in range(1000):
        writer.writerow([i, f'item_{i}', i * 1.5, f'Description for item {i}'])

The histogram can reveal:
- How the CSV writer batches operations
- Whether writes are uniform or variable in size
- Opportunities for optimization (e.g., adjusting buffer size)
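One way to act on what the CSV histogram shows is to give the file a larger buffer and profile again. The sketch below uses a hypothetical `write_csv` helper and an arbitrarily chosen 1 MiB buffer. It writes the same rows twice and checks that only the I/O pattern, not the file content, changes. In the notebook you could run each `write_csv` call in its own `%%iops --histogram` cell to compare the two histograms.

```python
import csv
import os
import shutil
import tempfile


def write_csv(path, rows, buffering=-1):
    """Write rows (header first); buffering=-1 keeps Python's default."""
    with open(path, "w", newline="", buffering=buffering) as f:
        csv.writer(f).writerows(rows)


rows = [["id", "name", "value", "description"]]
rows += [[i, f"item_{i}", i * 1.5, f"Description for item {i}"] for i in range(1000)]

out_dir = tempfile.mkdtemp()
default_path = os.path.join(out_dir, "data_default.csv")
big_path = os.path.join(out_dir, "data_1mib.csv")
write_csv(default_path, rows)                      # default buffering
write_csv(big_path, rows, buffering=1024 * 1024)   # 1 MiB buffer

# Buffering should change how bytes reach the OS, not which bytes are written.
with open(default_path, newline="") as a, open(big_path, newline="") as b:
    same = a.read() == b.read()
shutil.rmtree(out_dir)
print("identical output:", same)
```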

## Understanding the Histogram

### X-axis: Bytes per Operation (log scale)
Shows the size of individual I/O operations. The logarithmic scale allows you to see both tiny (< 1KB) and large (> 1MB) operations on the same chart.

### Y-axis (Top chart): Operation Count
How many operations fall into each size bucket. Helps identify the most common operation sizes.

### Y-axis (Bottom chart): Total Bytes
Total bytes transferred in each size bucket. Shows which operation sizes contribute most to overall data transfer.

### Interpretation Tips
- **Many small operations**: May indicate inefficient buffering
- **Few large operations**: Usually more efficient for throughput
- **Bimodal distribution**: Suggests different types of operations (e.g., metadata vs. data)
- **Read vs. Write differences**: May reveal caching or buffering strategies
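The "many small operations" tip can be turned into a quick numeric check: compare the share of operations below a size threshold with the share of bytes those operations carry. `small_op_share` is a hypothetical helper, the 4 KiB threshold is an arbitrary assumption, and the sizes in the example are made up for illustration.

```python
def small_op_share(sizes, threshold=4096):
    """Return (fraction of operations, fraction of bytes) below `threshold` bytes.

    A high operation share paired with a low byte share is the
    "many small operations" pattern: many calls moving little data.
    """
    if not sizes:
        return 0.0, 0.0
    small = [s for s in sizes if s < threshold]
    total_bytes = sum(sizes)
    op_share = len(small) / len(sizes)
    byte_share = sum(small) / total_bytes if total_bytes else 0.0
    return op_share, byte_share


# Made-up example: 990 writes of 100 bytes plus 10 writes of 64 KiB.
sizes = [100] * 990 + [65536] * 10
ops, data = small_op_share(sizes)
print(f"{ops:.0%} of operations carry {data:.1%} of the bytes")
```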

## Cleanup

In [None]:
shutil.rmtree(test_dir)
print("Cleanup complete!")

## Summary

In this notebook, we learned:

1. How to enable histogram visualization with `--histogram`
2. Interpreting operation count and bytes distribution charts
3. Analyzing read vs. write patterns
4. Using histograms to optimize buffer sizes
5. Applying histogram analysis to real-world scenarios

Histograms are particularly useful for:
- Understanding I/O patterns in complex code
- Identifying inefficiencies (many small operations)
- Optimizing buffer and chunk sizes
- Comparing different implementation strategies

**Remember:** Histogram mode is only available on Linux and macOS, not on Windows.