repair_metadata OOMs on large repositories (1000+ packages) #1188

@decko

Description

Problem

When repair_metadata processes a repository with 1000+ packages, peak memory reaches ~7.7GB against an 8GB worker limit. The worker becomes unresponsive, misses its heartbeat, and Pulp marks the task as "Worker has gone missing." This consistently fails on the same large repos.

Root Cause

Three factors compound to create the memory spike:

  1. BULK_SIZE = 1000 — batch and metadata_batch lists accumulate up to 1000 items before flushing
  2. Double wheel read — each wheel is read from S3 twice: once in artifact_to_python_content_data and again in artifact_to_metadata_artifact
  3. No file handle cleanup — artifact.file handles are never explicitly closed, keeping buffered data in memory
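
The failing loop's shape, reduced to a hedged sketch (the function below and its parameters are hypothetical stand-ins, not the actual pulp_python code):

```python
def repair_loop_current(artifacts, download):
    """Sketch of the problematic pattern: each wheel is fetched twice,
    and up to BULK_SIZE items are buffered before a flush.
    `artifacts` and `download` are hypothetical stand-ins."""
    BULK_SIZE = 1000
    batch, metadata_batch = [], []
    for artifact in artifacts:
        # Read 1: wheel pulled from storage for the content data.
        content = download(artifact)
        # Read 2: the SAME wheel pulled again for the metadata artifact.
        metadata = download(artifact)
        batch.append(content)
        metadata_batch.append(metadata)
        # The artifact's file handle is never closed here, so its
        # buffered data stays resident until garbage collection.
        if len(batch) >= BULK_SIZE:
            batch.clear()
            metadata_batch.clear()
    return len(batch), len(metadata_batch)
```

A counting stub makes the double read visible: for 3 artifacts, `download` is invoked 6 times.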

Proposed Fix

  • Reduce BULK_SIZE from 1000 to 250
  • Reuse the temp file from the first wheel read for metadata artifact creation
  • Explicitly close artifact file handles after each iteration

Expected peak memory reduction: from ~7.7GB to ~2-3GB for a 1042-package repo.
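
A minimal sketch of the fixed loop, assuming hypothetical stand-ins `fetch_wheel`, `process`, and `flush` for the pulp_python internals:

```python
import tempfile

BULK_SIZE = 250  # reduced from 1000 so at most 250 items are buffered


def repair_metadata_fixed(artifacts, fetch_wheel, process, flush):
    """Read each wheel from storage once, reuse the local temp copy for
    both content data and metadata, and flush in small batches.
    All four parameters are hypothetical stand-ins."""
    batch = []
    for artifact in artifacts:
        # Single storage read: spool the wheel to a temp file and let
        # `process` derive both content data and metadata from it.
        with tempfile.NamedTemporaryFile(suffix=".whl") as tmp:
            tmp.write(fetch_wheel(artifact))
            tmp.flush()
            batch.append(process(tmp.name))
        # The `with` block closes the handle here, releasing its buffer.
        if len(batch) >= BULK_SIZE:
            flush(list(batch))
            batch.clear()  # drop references so memory can be reclaimed
    if batch:
        flush(list(batch))
```

With 600 artifacts this flushes batches of 250, 250, and 100, so at most 250 items are ever held in memory at once.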

Evidence

Task failure from production:

{
  "state": "failed",
  "error": {"reason": "Worker has gone missing."},
  "progress_reports": [{"total": 1042, "done": 833}]
}

Prometheus metrics show memory spiking from 1.5GB to 7.7GB (96.8% of 8GB limit) during the repair task.

Related: PULP-1573
