Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
71 changes: 71 additions & 0 deletions AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -319,6 +319,22 @@ class MyPermission(SentryPermission):
}
```

### Options System

Sentry uses a centralized options system where all options are registered in `src/sentry/options/defaults.py` with required default values.

```python
# CORRECT: options.get() without default - registered default is used
from sentry import options

batch_size = options.get("deletions.group-hash-metadata.batch-size")

# WRONG: Redundant default value
batch_size = options.get("deletions.group-hash-metadata.batch-size", 1000)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I keep getting suggested to do it this way when coding.

```

**Important**: Never suggest adding a default value to `options.get()` calls. All options are registered via `register()` in `defaults.py` which requires a default value. The options system will always return the registered default if no value is set, making a second default parameter redundant and potentially inconsistent.

### Logging Pattern

```python
Expand Down Expand Up @@ -394,6 +410,61 @@ class MultiProducer:
4. Add indexes for queries on 1M+ row tables
5. Use `db_index=True` or `db_index_together`

#### Composite Index Strategy: Match Your Query Patterns
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the most repeating issues I have seen in the deletions code yet I introduced the same problem my self! (using id and seer_matched_hash_id in the same query):
#102960 (comment)


**Critical Rule**: When writing a query that filters on multiple columns simultaneously, you MUST verify that a composite index exists covering those columns in the filter order.

**How to Identify When You Need a Composite Index:**

1. **Look for Multi-Column Filters**: Any query using multiple columns in `.filter()` or `WHERE` clause
2. **Check Index Coverage**: Verify the model's `Meta.indexes` includes those columns
3. **Consider Query Order**: Index column order should match the most selective filters first

**Common Patterns Requiring Composite Indexes:**

```python
# NEEDS COMPOSITE INDEX: Filtering on foreign_key_id AND id
Model.objects.filter(
foreign_key_id__in=ids, # First column
id__gt=last_id # Second column
)[:batch_size]
# Required: Index(fields=["foreign_key", "id"])

# NEEDS COMPOSITE INDEX: Status + timestamp range queries
Model.objects.filter(
status="open", # First column
created_at__gte=start # Second column
)
# Required: Index(fields=["status", "created_at"])

# NEEDS COMPOSITE INDEX: Org + project + type lookups
Model.objects.filter(
organization_id=org_id, # First column
project_id=proj_id, # Second column
type=event_type # Third column
)
# Required: Index(fields=["organization", "project", "type"])
```

**How to Check if Index Exists:**

1. Read the model file: Check the `Meta` class for `indexes = [...]`
2. Single foreign key gets auto-index, but **NOT** when combined with other filters
3. If you filter on FK + another column, you need explicit composite index

**Red Flags to Watch For:**

- Query uses `column1__in=[...]` AND `column2__gt/lt/gte/lte`
- Query filters on FK relationship PLUS primary key or timestamp
- Pagination queries combining filters with cursor-based `id__gt`
- Large IN clauses combined with range filters

**When in Doubt:**

1. Check production query performance in Sentry issues (slow query alerts)
2. Run `EXPLAIN ANALYZE` on similar queries against production-sized data
3. Add the composite index if table has 1M+ rows and query runs in loops/batches

## Anti-Patterns (NEVER DO)

### Backend
Expand Down