Simplify task_migrate_institution_history_to_raw_institution by removing duplicate existence checks#1297
Conversation
…ory_to_raw_institution Co-authored-by: robertatakenaka <505143+robertatakenaka@users.noreply.github.com>
c120405
into
copilot/create-data-migration-task
There was a problem hiding this comment.
Pull request overview
This PR simplifies task_migrate_institution_history_to_raw_institution by removing redundant History-model imports and pre-migration .exists() checks, relying on the Journal migration methods instead.
Changes:
- Removed imports of History models previously used only for
.exists()checks. - Replaced conditional
.exists()-gated migration calls with unconditional migration calls and atotal_migratedcounter. - Updated the logic for incrementing
journals_processedbased ontotal_migrated.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Process all history types directly - the methods handle checking if there's data to migrate | ||
| total_migrated = 0 | ||
|
|
||
| # Only process if there are items to migrate | ||
| if has_publisher or has_owner or has_sponsor or has_copyright_holder: | ||
| # Process PublisherHistory | ||
| if has_publisher: | ||
| migrated = journal.migrate_publisher_history_to_raw() | ||
| stats["publisher_history_migrated"] += len(migrated or []) | ||
|
|
||
| # Process OwnerHistory | ||
| if has_owner: | ||
| migrated = journal.migrate_owner_history_to_raw() | ||
| stats["owner_history_migrated"] += len(migrated or []) | ||
|
|
||
| # Process SponsorHistory | ||
| if has_sponsor: | ||
| migrated = journal.migrate_sponsor_history_to_raw() | ||
| stats["sponsor_history_migrated"] += len(migrated or []) | ||
|
|
||
| # Process CopyrightHolderHistory | ||
| if has_copyright_holder: | ||
| migrated = journal.migrate_copyright_holder_history_to_raw() | ||
| stats["copyright_holder_history_migrated"] += len(migrated or []) | ||
|
|
||
| migrated = journal.migrate_publisher_history_to_raw() | ||
| stats["publisher_history_migrated"] += len(migrated or []) | ||
| total_migrated += len(migrated or []) | ||
|
|
||
| migrated = journal.migrate_owner_history_to_raw() | ||
| stats["owner_history_migrated"] += len(migrated or []) | ||
| total_migrated += len(migrated or []) | ||
|
|
||
| migrated = journal.migrate_sponsor_history_to_raw() | ||
| stats["sponsor_history_migrated"] += len(migrated or []) | ||
| total_migrated += len(migrated or []) | ||
|
|
||
| migrated = journal.migrate_copyright_holder_history_to_raw() | ||
| stats["copyright_holder_history_migrated"] += len(migrated or []) | ||
| total_migrated += len(migrated or []) | ||
|
|
||
| # Only count as processed if at least one history item was migrated | ||
| if total_migrated > 0: | ||
| stats["journals_processed"] += 1 |
There was a problem hiding this comment.
total_migrated is computed from len(migrated) where each migrate_*_history_to_raw() currently returns all history rows (it appends every history_item even when institution is None). This changes behavior: journals that have history rows but no institution set will be counted as journals_processed, and the *_history_migrated stats will be inflated (e.g., this will fail TestTaskMigrateInstitutionHistoryToRawInstitution.test_task_migrate_with_journal_without_institution). Consider counting only rows that actually had institution set (e.g., have the migrate_* methods return only migrated items / migrated count, or in the task use journal.<related>.filter(institution__isnull=False).count() for stats and the processed check).
| migrated = journal.migrate_publisher_history_to_raw() | ||
| stats["publisher_history_migrated"] += len(migrated or []) | ||
| total_migrated += len(migrated or []) | ||
|
|
||
| migrated = journal.migrate_owner_history_to_raw() | ||
| stats["owner_history_migrated"] += len(migrated or []) | ||
| total_migrated += len(migrated or []) | ||
|
|
||
| migrated = journal.migrate_sponsor_history_to_raw() | ||
| stats["sponsor_history_migrated"] += len(migrated or []) | ||
| total_migrated += len(migrated or []) | ||
|
|
||
| migrated = journal.migrate_copyright_holder_history_to_raw() | ||
| stats["copyright_holder_history_migrated"] += len(migrated or []) | ||
| total_migrated += len(migrated or []) |
There was a problem hiding this comment.
Calling all four migrate_*_history_to_raw() methods unconditionally will execute four full-table SELECTs over the related history managers and iterate every row, even when nothing needs migrating (e.g., all institution fields already NULL). Previously the task short-circuited those cases with cheap .exists() checks. To avoid a potential performance regression on journals with large history tables, consider filtering the querysets inside the migrate_* methods (only rows with institution__isnull=False) or otherwise skipping migration calls when there are no rows to migrate (without requiring extra model imports).
O que esse PR faz?
Remove verificações de existência duplicadas e imports desnecessários no
task_migrate_institution_history_to_raw_institution. Os métodos de migração já tratam internamente querysets vazios via_migrate_history_to_raw().Mudanças:
.exists()checks.exists()redundantes e lógica condicional aninhadaAntes:
Depois:
Onde a revisão poderia começar?
journal/tasks.pylinhas 15-21 (imports removidos) e 697-719 (lógica simplificada).Como este poderia ser testado manualmente?
Executar os testes existentes:
Os testes validam que apenas journals com histórico contendo instituições são contados como processados.
Algum cenário de contexto que queira dar?
O método
_migrate_history_to_raw()(journal/models.py:1252) já verifica sehistory_item.institutionexiste, retornando early se for None. As verificações.exists()na task eram redundantes e geravam queries extras desnecessárias.Screenshots
N/A - mudança de código backend.
Quais são tickets relevantes?
Issue relacionada ao problema de simplificação da task de migração.
Referências
journal/models.pylinhas 1239-1375: implementação dos métodos de migraçãojournal/tests.pylinhas 680-907: testes da funcionalidadeOriginal prompt
This pull request was created from Copilot chat.
✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.