# Basic JSONL DataFrame Operations Example

This notebook demonstrates the basic operations of the `jsonldf` module from the `jsonldb` package on pandas DataFrames.
We'll create a small dataset, save it to a JSONL file, and then load, update, select, delete, and lint its records.

In [23]:
import pandas as pd
import sys
import os
from datetime import datetime

# Optional: if jsonldb is not installed, uncomment the next line to add the
# directory two levels up to the import path.
# sys.path.append(os.path.abspath(os.path.join(os.getcwd(), "../../")))

In [24]:
from jsonldb.jsonldf import (
    save_jsonldf, load_jsonldf, update_jsonldf,
    select_jsonldf, delete_jsonldf, lint_jsonldf
)

## Create Sample DataFrame

Let's create a simple DataFrame with user information.

In [25]:
# Create sample DataFrame
data = {
    'name': ['John', 'Alice', 'Bob', 'Carol'],
    'age': [30, 25, 35, 28],
    'city': ['New York', 'London', 'Paris', 'Tokyo']
}

df = pd.DataFrame(data)
df.index = [f'user_{i:03d}' for i in range(1, 5)]

print("Original DataFrame:")
print(df)

Original DataFrame:
           name  age      city
user_001   John   30  New York
user_002  Alice   25    London
user_003    Bob   35     Paris
user_004  Carol   28     Tokyo


## Save DataFrame to JSONL

Now we'll save our DataFrame to a JSONL file.

In [26]:
print("Saving DataFrame to JSONL...")
save_jsonldf('test_basic.jsonl', df)

Saving DataFrame to JSONL...
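
To make the result of the save step more concrete, here is a minimal sketch of what a keyed JSONL file could look like, written with the standard library only. The layout (one JSON object per line, mapping an index label to its row) is an assumption about the file format, and `write_keyed_jsonl` is a hypothetical helper, not part of `jsonldb`.

```python
import json
import os
import tempfile

# Rows mirroring the first two entries of the notebook's DataFrame.
rows = {
    "user_001": {"name": "John", "age": 30, "city": "New York"},
    "user_002": {"name": "Alice", "age": 25, "city": "London"},
}

def write_keyed_jsonl(path, records):
    """Write one JSON object per line, each mapping a key to its row.

    Assumed layout for illustration; the real file format may differ.
    """
    with open(path, "w", encoding="utf-8") as f:
        for key, row in records.items():
            f.write(json.dumps({key: row}) + "\n")

# Work in a temporary directory so the sketch leaves the notebook's files alone.
path = os.path.join(tempfile.mkdtemp(), "sketch.jsonl")
write_keyed_jsonl(path, rows)

with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()

print(lines[0])
```

Because every record occupies exactly one line, the file can be appended to or scanned line by line without parsing it as a whole.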


## Load and Verify Data

Let's load the data back from the JSONL file to verify it was saved correctly.

In [27]:
print("Loading DataFrame from JSONL:")
loaded_df = load_jsonldf('test_basic.jsonl')
print(loaded_df)

Loading DataFrame from JSONL:
           name  age      city
user_001   John   30  New York
user_002  Alice   25    London
user_003    Bob   35     Paris
user_004  Carol   28     Tokyo
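
Comparing two printed tables by eye works for four rows, but a programmatic check scales better. The sketch below round-trips plain dictionaries through an in-memory JSONL stream and asserts that nothing changed. The keyed one-object-per-line layout and the helpers `dump_keyed_jsonl` and `load_keyed_jsonl` are assumptions made for illustration, not the `jsonldb` API. With real DataFrames, `pandas.testing.assert_frame_equal(df, loaded_df)` would be an analogous check.

```python
import io
import json

original = {
    "user_001": {"name": "John", "age": 30, "city": "New York"},
    "user_002": {"name": "Alice", "age": 25, "city": "London"},
}

def dump_keyed_jsonl(records, stream):
    """Write each record as one {key: row} JSON object per line (assumed layout)."""
    for key, row in records.items():
        stream.write(json.dumps({key: row}) + "\n")

def load_keyed_jsonl(stream):
    """Read {key: row} lines back into a single dictionary, skipping blank lines."""
    loaded = {}
    for line in stream:
        if line.strip():
            loaded.update(json.loads(line))
    return loaded

buffer = io.StringIO()
dump_keyed_jsonl(original, buffer)
buffer.seek(0)  # rewind before reading back
restored = load_keyed_jsonl(buffer)

# Values and types survive the round trip: ages come back as int, not str.
assert restored == original
print(type(restored["user_001"]["age"]).__name__)
```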


## Update Records

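Now we'll update two existing records, `user_001` and `user_002`, by passing a DataFrame indexed by the same keys. Note that in the output the updated rows appear after the untouched ones.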

In [28]:
print("Updating records...")

# Create updates DataFrame
updates = pd.DataFrame({
    'name': ['John Updated', 'Alice Updated'],
    'age': [31, 26],
    'city': ['Boston', 'Manchester']
}, index=['user_001', 'user_002'])

update_jsonldf('test_basic.jsonl', updates)

# Verify updates
print("\nVerifying updates:")
updated_df = load_jsonldf('test_basic.jsonl')
print(updated_df)

Updating records...

Verifying updates:
                   name  age        city
user_003            Bob   35       Paris
user_004          Carol   28       Tokyo
user_001   John Updated   31      Boston
user_002  Alice Updated   26  Manchester
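
The changed row order above is consistent with updates being appended to the end of the file rather than rewritten in place. That is an inference from the output, not something confirmed from the library's source. The sketch below shows one way such behaviour can arise: an append-only log plus an index that points at the newest version of each key. The names `append_version` and `read_in_file_order` are hypothetical.

```python
import json

def append_version(log, index, key, row):
    """Append a new version of `key` to the log and point the index at it."""
    index[key] = len(log)  # position of the newest version
    log.append(json.dumps({key: row}))

def read_in_file_order(log, index):
    """Return the newest version of each key, ordered by position in the log."""
    ordered = sorted(index.items(), key=lambda item: item[1])
    return {key: json.loads(log[pos])[key] for key, pos in ordered}

log, index = [], {}
initial = {
    "user_001": {"name": "John"},
    "user_002": {"name": "Alice"},
    "user_003": {"name": "Bob"},
    "user_004": {"name": "Carol"},
}
for key, row in initial.items():
    append_version(log, index, key, row)

# Updating appends new lines; the old versions stay in the log but are unreferenced.
append_version(log, index, "user_001", {"name": "John Updated"})
append_version(log, index, "user_002", {"name": "Alice Updated"})

current = read_in_file_order(log, index)
print(list(current))
```

In this model the stale lines for `user_001` and `user_002` remain in the log until something compacts it, which is one reason a separate clean-up pass can be useful.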


## Select Range of Records

Let's select the records whose index falls between `user_001` and `user_002`. As the output shows, both bounds are included.

In [29]:
print("Selecting records in range (user_001 to user_002):")
selected_df = select_jsonldf('test_basic.jsonl', ('user_001', 'user_002'))
print(selected_df)

Selecting records in range (user_001 to user_002):
                   name  age        city
user_001   John Updated   31      Boston
user_002  Alice Updated   26  Manchester
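
An inclusive range query over string keys can be sketched with binary search on the sorted key list. This is an illustration of the idea under the assumption that keys compare lexicographically; `select_inclusive` is a hypothetical helper, not the implementation behind `select_jsonldf`.

```python
from bisect import bisect_left, bisect_right

def select_inclusive(records, lower, upper):
    """Return records whose key k satisfies lower <= k <= upper, in key order."""
    keys = sorted(records)
    start = bisect_left(keys, lower)   # first key >= lower
    stop = bisect_right(keys, upper)   # one past the last key <= upper
    return {key: records[key] for key in keys[start:stop]}

records = {
    f"user_{i:03d}": {"age": age}
    for i, age in enumerate([31, 26, 35, 28], start=1)
}

selected = select_inclusive(records, "user_001", "user_002")
print(list(selected))
```

String comparison is the reason the notebook zero-pads its index with `:03d`: without padding, `"user_10"` would sort before `"user_2"` and range bounds would behave unexpectedly.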


## Delete Records

Now we'll delete `user_001` and `user_003` by passing a list of index keys.

In [30]:
print("Deleting records (user_001, user_003)...")
delete_jsonldf('test_basic.jsonl', ['user_001', 'user_003'])

# Verify deletions
print("\nVerifying deletions:")
deleted_df = load_jsonldf('test_basic.jsonl')
print(deleted_df)

Deleting records (user_001, user_003)...

Verifying deletions:
                   name  age        city
user_004          Carol   28       Tokyo
user_002  Alice Updated   26  Manchester
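
For intuition, deletion from a keyed JSONL file can be sketched as a filter-and-rewrite pass that preserves the order of the surviving lines. The sketch assumes each line is a JSON object with a single top-level key; `delete_keys` is a hypothetical helper, and the real library may delete records differently, for example by updating an index instead of rewriting the file.

```python
import json
import os
import tempfile

def delete_keys(path, keys_to_delete):
    """Rewrite `path` without the lines whose key is in `keys_to_delete`.

    Returns the keys that remain, in file order.
    """
    doomed = set(keys_to_delete)
    kept = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # ignore blank lines
            key = next(iter(json.loads(line)))  # the single top-level key
            if key not in doomed:
                kept.append(line)
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(kept)
    return [next(iter(json.loads(line))) for line in kept]

# Build a small file in the same key order as the notebook's updated file.
path = os.path.join(tempfile.mkdtemp(), "delete_sketch.jsonl")
with open(path, "w", encoding="utf-8") as f:
    for key in ["user_003", "user_004", "user_001", "user_002"]:
        f.write(json.dumps({key: {"seen": True}}) + "\n")

remaining = delete_keys(path, ["user_001", "user_003"])
print(remaining)
```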


## Lint the File

Finally, let's lint the file to sort and clean it. As the output shows, the records come back ordered by index afterwards.

In [31]:
print("Sorting and cleaning the file...")
lint_jsonldf('test_basic.jsonl')

# Show final state
print("\nFinal state after lint:")
final_df = load_jsonldf('test_basic.jsonl')
print(final_df)

Sorting and cleaning the file...

Final state after lint:
                   name  age        city
user_002  Alice Updated   26  Manchester
user_004          Carol   28       Tokyo
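
A lint pass of this kind can be thought of as compaction: drop blank lines, keep only the newest version of each key, and write the survivors back in key order. The sketch below works on a list of lines and is an assumption about what "sort and clean" involves, not a description of `lint_jsonldf` itself; `lint_lines` is a hypothetical name.

```python
import json

def lint_lines(lines):
    """Return cleaned lines: no blanks, one line per key, sorted by key."""
    latest = {}
    for line in lines:
        if line.strip():
            latest.update(json.loads(line))  # later lines override earlier ones
    return [json.dumps({key: latest[key]}) for key in sorted(latest)]

messy = [
    '{"user_004": {"name": "Carol"}}',
    '',
    '{"user_002": {"name": "Alice"}}',
    '{"user_002": {"name": "Alice Updated"}}',
]

clean = lint_lines(messy)
for line in clean:
    print(line)
```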


## Cleanup

Let's clean up by removing the JSONL file and its companion index file, `test_basic.jsonl.idx`.

In [32]:
print("Cleaning up...")
os.remove('test_basic.jsonl')
os.remove('test_basic.jsonl.idx')
print("Done!")

Cleaning up...
Done!
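
`os.remove` raises `FileNotFoundError` when a file is missing, so re-running the cleanup cell fails the second time. A tolerant variant might look like the sketch below; `remove_if_present` is a hypothetical helper, and the demonstration runs in a temporary directory rather than on the notebook's files.

```python
import tempfile
from pathlib import Path

def remove_if_present(paths):
    """Delete each existing path and return the names of the files removed."""
    removed = []
    for path in map(Path, paths):
        if path.exists():
            path.unlink()
            removed.append(path.name)
    return removed

work_dir = Path(tempfile.mkdtemp())
data_file = work_dir / "test_basic.jsonl"
data_file.write_text("{}\n", encoding="utf-8")
index_file = work_dir / "test_basic.jsonl.idx"  # deliberately never created

removed = remove_if_present([data_file, index_file])
print(removed)
```

On Python 3.8 and later, `Path(name).unlink(missing_ok=True)` achieves the same tolerance in a single call.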
