# Performance Test for JSONL Operations

This notebook measures the performance of common JSONL operations provided by the `jsonldb.jsonlfile` module. We'll test:

1. Saving large datasets
2. Loading datasets
3. Selecting records within a range
4. Updating records
5. Deleting records

We'll use a dataset of 1 million records and measure the wall-clock time of each operation.

In [23]:
import os
import random
import time
from datetime import datetime

from jsonldb.jsonlfile import save_jsonl, load_jsonl, select_jsonl, update_jsonl, delete_jsonl

## Helper Functions

First, let's define our helper functions for generating test data and timing operations.

In [24]:
def generate_random_record():
    """Generate a random record with consistent structure."""
    return {
        "timestamp": datetime.now().isoformat(),
        "value": random.randint(1, 1000000),
        "temperature": round(random.uniform(-10.0, 40.0), 2),
        "pressure": round(random.uniform(980, 1020), 2),
        "humidity": random.randint(30, 90),
        "status": random.choice(["normal", "warning", "critical", "unknown"]),
        "location": random.choice(["north", "south", "east", "west", "center"]),
        "tags": random.sample(["sensor", "validated", "raw", "filtered", "anomaly", 
                             "peak", "valley", "trend", "stable"], k=3)
    }

def time_operation(name, func, *args, **kwargs):
    """Time an operation and print results."""
    start = time.perf_counter()  # monotonic, high-resolution clock for intervals
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{name}: {elapsed:.3f} seconds")
    return result
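
A single run can be skewed by disk caching or background load. As a minimal sketch (the `time_best_of` helper and its `repeats` parameter are hypothetical and not part of `jsonldb`), the timing could be repeated and the fastest run reported:

```python
import time

def time_best_of(name, func, *args, repeats=3, **kwargs):
    """Run func several times and report the fastest wall-clock time.

    Hypothetical helper for illustration; only suitable for operations
    that are safe to repeat (e.g. load or select, not delete).
    """
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    print(f"{name}: best of {repeats} runs = {best:.3f} seconds")
    return result, best

# Usage with a stand-in workload
total, seconds = time_best_of("Sum", sum, range(100_000), repeats=5)
```

Reporting the minimum rather than the mean reduces the influence of one-off slowdowns, at the cost of running the operation several times.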

## Test Parameters

Let's set up our test parameters. We'll use:
- 1,000,000 total records
- 10,000 records for selection
- 10,000 records for updates
- 10,000 records for deletion

In [25]:
# Test parameters
num_records = 1_000_000  # 1M records
select_range = 10_000    # Select 10K records
update_count = 10_000    # Update 10K records
delete_count = 10_000    # Delete 10K records

## Generate Test Data

Let's generate our test dataset with random records.

In [26]:
print(f"Generating {num_records:,} records...")
data = {
    f"key_{i:08d}": generate_random_record()
    for i in range(num_records)
}
print(f"Generated {len(data):,} records")

Generating 1,000,000 records...
Generated 1,000,000 records


## Performance Tests

Now let's run our performance tests for each operation.

### 1. Save Operation

Test saving the entire dataset to a JSONL file.

In [27]:
print("Testing save operation...")
time_operation("Save", save_jsonl, "perf_test.jsonl", data)

Testing save operation...
Save: 3.557 seconds


### 2. Load Operation

Test loading the entire dataset from the JSONL file.

In [28]:
print("Testing load operation...")
loaded_data = time_operation("Load", load_jsonl, "perf_test.jsonl")
print(f"Loaded {len(loaded_data):,} records")

Testing load operation...
Load: 7.071 seconds
Loaded 1,000,000 records


### 3. Select Operation

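Test selecting a range of records from the dataset. The range spans `select_range` consecutive keys, so roughly 10,000 records are expected. Note that the recorded run below returned all 1,000,000 records, which suggests the range bounds were not applied as intended.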

In [29]:
print("Testing select operation...")
start_key = f"key_{random.randint(0, num_records-select_range):08d}"
end_key = f"key_{int(start_key[4:]) + select_range:08d}"
selected = time_operation("Select", select_jsonl, "perf_test.jsonl", (start_key, end_key))
print(f"Selected {len(selected):,} records")



Testing select operation...
Select: 13.457 seconds
Selected 1,000,000 records
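
To sanity-check the count a range select should return, the same range can be applied to an in-memory dictionary. This sketch assumes inclusive bounds on both ends, which is an assumption about `select_jsonl` rather than something confirmed here. It relies only on the fact that zero-padded keys sort lexicographically in numeric order, and `select_in_memory` is a hypothetical helper.

```python
def select_in_memory(records, lower_key, upper_key):
    """Return records whose keys fall in [lower_key, upper_key] (inclusive)."""
    return {k: v for k, v in records.items() if lower_key <= k <= upper_key}

# Small stand-in dataset using the same key format as the notebook
sample = {f"key_{i:08d}": {"value": i} for i in range(1_000)}

lower = "key_00000100"
upper = f"key_{int(lower[4:]) + 50:08d}"  # 50 keys past the lower bound
expected = select_in_memory(sample, lower, upper)
print(f"Expected {len(expected):,} records")  # inclusive bounds give 51
```

With inclusive bounds, a span of `select_range` keys yields `select_range + 1` records, far fewer than the full dataset.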


### 4. Update Operation

Test updating a subset of records in the dataset.

In [30]:
print("Testing update operation...")
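# randrange excludes num_records, so every generated key exists in the dataset.
# Keys are drawn independently, so duplicates can leave fewer than update_count entries.
updates = {
    f"key_{random.randrange(num_records):08d}": generate_random_record()
    for _ in range(update_count)
}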
time_operation("Update", update_jsonl, "perf_test.jsonl", updates)

Testing update operation...
Update: 2.458 seconds


### 5. Delete Operation

Test deleting a subset of records from the dataset.

In [31]:
print("Testing delete operation...")
# randrange excludes num_records, so every generated key exists; duplicates are possible.
delete_keys = [f"key_{random.randrange(num_records):08d}"
               for _ in range(delete_count)]
time_operation("Delete", delete_jsonl, "perf_test.jsonl", delete_keys)

Testing delete operation...
Delete: 23.444 seconds
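
Because the keys are drawn independently, some may repeat, so fewer than `delete_count` distinct records may actually be removed. If exactly `delete_count` distinct keys are wanted, one option is `random.sample` over the index range, sketched here on a smaller stand-in range:

```python
import random

num_records = 100_000   # stand-in size for illustration
delete_count = 1_000

# Independent draws: duplicates are possible
drawn = [f"key_{random.randrange(num_records):08d}" for _ in range(delete_count)]

# Sampling without replacement: every key is distinct
unique = [f"key_{i:08d}" for i in random.sample(range(num_records), delete_count)]

print(f"Distinct keys from independent draws: {len(set(drawn)):,}")
print(f"Distinct keys from random.sample:     {len(set(unique)):,}")
```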


## Performance Statistics

Let's reload the file to check the final record count and recap how many records each operation targeted.

In [32]:
print("Performance Statistics:")
print("-" * 50)
final_data = load_jsonl("perf_test.jsonl")
print(f"Final record count: {len(final_data):,}")
print("Records per operation:")
print(f"- Save: {num_records/1000:.1f}K records")
print(f"- Select: {select_range/1000:.1f}K records")
print(f"- Update: {update_count/1000:.1f}K records")
print(f"- Delete: {delete_count/1000:.1f}K records")

Performance Statistics:
--------------------------------------------------
Final record count: 990,062
Records per operation:
- Save: 1000.0K records
- Select: 10.0K records
- Update: 10.0K records
- Delete: 10.0K records
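
The timings recorded above can be turned into throughput figures. The sketch below copies the elapsed times from this run's output; the `runs` table is just an illustrative structure, and the select entry uses the 1,000,000 records the run reported rather than the 10K target.

```python
# (elapsed seconds, records processed), copied from the outputs above.
# Update and Delete use the requested counts; the number of distinct
# keys actually touched may be slightly lower.
runs = {
    "Save":   (3.557, 1_000_000),
    "Load":   (7.071, 1_000_000),
    "Select": (13.457, 1_000_000),
    "Update": (2.458, 10_000),
    "Delete": (23.444, 10_000),
}

throughput = {name: count / seconds for name, (seconds, count) in runs.items()}

for name, rate in throughput.items():
    print(f"- {name}: {rate:,.0f} records/second")
```

Expressed this way, the per-record cost of updates and deletes stands out as far higher than that of bulk saves and loads in this run.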


## Cleanup

Finally, let's clean up our test files.

In [33]:
print("Cleaning up...")
os.remove("perf_test.jsonl")
os.remove("perf_test.jsonl.idx")
print("Done!")

Cleaning up...
Done!
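
If a run is interrupted, one of the files may be missing, and `os.remove` then raises `FileNotFoundError`. A more forgiving variant can be sketched with `pathlib` (`missing_ok` requires Python 3.8+). The `cleanup` helper is hypothetical and is demonstrated on throwaway files in a temporary directory:

```python
import tempfile
from pathlib import Path

def cleanup(paths):
    """Delete each path, ignoring files that do not exist."""
    for path in paths:
        Path(path).unlink(missing_ok=True)

with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "perf_test.jsonl"
    index_file = Path(tmp) / "perf_test.jsonl.idx"
    data_file.write_text("{}\n")          # only the data file exists
    cleanup([data_file, index_file])      # missing index file is not an error
    remaining = list(Path(tmp).iterdir())
    print(f"Files left: {len(remaining)}")
```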
