# Get fast feedback on transformations

Test transformation logic on sample rows before processing your entire dataset.

**What's in this recipe:**
- Query transformations without adding columns
- Test on sample rows with `.head()`
- Speed up your iteration cycle


## Problem

You want to test transformation logic on your data and preview the results before committing to a run over every row.

**The challenge:** How do you verify your logic works correctly without either:
- Writing throwaway test code that you'll delete later
- Waiting for expensive operations (API calls, model inference) to run on your full dataset

```python
# Goal: try out a transformation (`expensive_transform` and `t.col` are placeholders)
t.add_computed_column(result=expensive_transform(t.col))
# Runs on all 10,000 rows... and only then do you discover the logic is wrong
```

You need a pattern for fast iteration that doesn't require temporary code or processing the full dataset.


## Solution

**Without Pixeltable:** Write temporary test code on a subset, verify it works, then rewrite it for the full dataset and hope you copied the logic correctly.

**With Pixeltable:** Use the Query-then-Commit pattern:

1. **Query**: Preview with `.select(expr).head()` 
2. **Commit**: Apply with `.add_computed_column(col=expr)`

Both steps use the same expression, so there is nothing to retype and no room for transcription errors. Expensive operations run on the full table only after you've confirmed the logic on a sample.

### Setup


In [None]:
%pip install -qU pixeltable

In [8]:
import pixeltable as pxt

In [9]:
# Create a fresh directory (drop existing if present)
pxt.drop_dir('demo_project', force=True)
pxt.create_dir('demo_project')

Created directory 'demo_project'.


<pixeltable.catalog.dir.Dir at 0x16f3c3350>

### Create sample data


In [10]:
t = pxt.create_table('demo_project.lyrics', {'text': pxt.String})
t.insert([
    {'text': 'Tumble out of bed and I stumble to the kitchen'},
    {'text': 'Pour myself a cup of ambition'},
    {'text': 'And yawn and stretch and try to come to life'},
    {'text': "Jump in the shower and the blood starts pumpin'"},
    {'text': "Out on the street, the traffic starts jumpin'"},
    {'text': 'With folks like me on the job from nine to five'}
])

print(f"Total rows: {t.count()}")

Created table 'lyrics'.
Inserting rows into `lyrics`: 6 rows [00:00, 1747.87 rows/s]
Inserted 6 rows with 0 errors.
Total rows: 6


### Example 1: Built-in string methods

Apply Query-then-Commit with a built-in string method, `upper()`.


In [11]:
# Query: Test uppercase transformation on subset
t.select(
    t.text,
    uppercase=t.text.upper()
).head(2)

text,uppercase
Tumble out of bed and I stumble to the kitchen,TUMBLE OUT OF BED AND I STUMBLE TO THE KITCHEN
Pour myself a cup of ambition,POUR MYSELF A CUP OF AMBITION


In [12]:
# Commit: Apply to all rows (same expression)
t.add_computed_column(uppercase=t.text.upper())

t.select(t.text, t.uppercase).show()

Added 6 column values with 0 errors.


text,uppercase
Tumble out of bed and I stumble to the kitchen,TUMBLE OUT OF BED AND I STUMBLE TO THE KITCHEN
Pour myself a cup of ambition,POUR MYSELF A CUP OF AMBITION
And yawn and stretch and try to come to life,AND YAWN AND STRETCH AND TRY TO COME TO LIFE
Jump in the shower and the blood starts pumpin',JUMP IN THE SHOWER AND THE BLOOD STARTS PUMPIN'
"Out on the street, the traffic starts jumpin'","OUT ON THE STREET, THE TRAFFIC STARTS JUMPIN'"
With folks like me on the job from nine to five,WITH FOLKS LIKE ME ON THE JOB FROM NINE TO FIVE


### Example 2: Custom UDF

Apply the same pattern with a user-defined function (UDF).


In [13]:
# Define a custom transformation
@pxt.udf
def word_count(text: str) -> int:
    return len(text.split())
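
The Python body of a UDF can also be sanity-checked on its own before it touches a table. The following is a minimal sketch, not part of the original recipe: `count_words` is a hypothetical plain-Python twin of `word_count`, and the sample strings are made up to probe edge cases such as extra whitespace and an empty string.

```python
# Hypothetical plain-Python twin of the UDF body, for quick local checks
def count_words(text: str) -> int:
    # str.split() with no arguments splits on any run of whitespace
    return len(text.split())

# Illustrative inputs: a normal line, irregular spacing, and an empty string
samples = [
    'Pour myself a cup of ambition',
    '  extra   spaces  between words ',
    '',
]
counts = [count_words(s) for s in samples]
print(counts)
```

Once the plain function behaves as expected, the Query step below confirms that the decorated UDF gives the same answers on real rows.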


In [14]:
# Query: Test UDF on subset
t.select(
    t.text,
    word_count=word_count(t.text)
).head(2)


text,word_count
Tumble out of bed and I stumble to the kitchen,10
Pour myself a cup of ambition,6


In [15]:
# Commit: Apply to all rows (same expression)
t.add_computed_column(word_count=word_count(t.text))

t.select(t.text, t.word_count).show()


Added 6 column values with 0 errors.


text,word_count
Tumble out of bed and I stumble to the kitchen,10
Pour myself a cup of ambition,6
And yawn and stretch and try to come to life,10
Jump in the shower and the blood starts pumpin',9
"Out on the street, the traffic starts jumpin'",8
With folks like me on the job from nine to five,11


## Explanation

**How Query-then-Commit works:**

1. **Query** - Use `.select()` with `.head()` to test logic on a subset
2. **Commit** - Use `.add_computed_column()` with the exact same expression

**Why it matters:**
- Test on 2-3 rows instead of your full dataset
- No throwaway code to delete later
- No transcription errors (same expression in both steps)
- Iterate faster, especially with expensive operations (API calls, model inference)


## See also

- [Transform images with PIL operations](./img-pil-transforms.ipynb)
- [Convert RGB images to grayscale](./img-rgb-to-grayscale.ipynb)
- [Apply filters to images](./img-apply-filters.ipynb)
