Determine unset mime types in previous scribes#25
Determine unset mime types in previous scribes#25shivankacker wants to merge 2 commits intomasterfrom
Conversation
WalkthroughA new Django migration adds a data migration that backfills ScribeFile.mime_type for records where it is null or empty. The migration derives a file extension from internal_name and uses mimetypes.guess_type plus type-specific defaults: audio/mpeg for SCRIBE_AUDIO, image/jpeg for SCRIBE_DOCUMENT, and application/octet-stream otherwise; each updated record is saved. The migration also alters the ScribeFile.mime_type field to CharField(max_length=200) (removing Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Pull request overview
This PR modifies the mime_type field in the ScribeFile model to be non-nullable, and includes a migration to populate missing mime_type values in existing records by inferring them from file extensions and types.
Changes:
- Removed
blank=True, null=Truefrom themime_typefield in theScribeFilemodel - Added a data migration to populate unset
mime_typevalues based on file extensions and file types before enforcing the non-null constraint
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| care_scribe/models/scribe_file.py | Updated mime_type field definition to remove nullable attributes |
| care_scribe/migrations/0012_alter_scribefile_mime_type.py | Added migration with data migration function to set missing mime types before schema change |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
care_scribe/migrations/0012_alter_scribefile_mime_type.py (1)
8-30: Considerbulk_updatefor better performance on large datasets.Calling
save()on each record individually results in N database queries. For large tables, this can make the migration very slow.♻️ Proposed refactor using bulk_update
def set_mime_types(apps, schema_editor): ScribeFile = apps.get_model('care_scribe', 'ScribeFile') + files_to_update = [] for scribe_file in ScribeFile.objects.filter(mime_type__isnull=True) | ScribeFile.objects.filter(mime_type=''): extension = os.path.splitext(scribe_file.internal_name)[1].lower() if scribe_file.file_type == 1: # SCRIBE_AUDIO mime_type = mimetypes.guess_type(f"file{extension}")[0] if not mime_type or not mime_type.startswith('audio/'): mime_type = 'audio/mpeg' scribe_file.mime_type = mime_type elif scribe_file.file_type == 2: # SCRIBE_DOCUMENT mime_type = mimetypes.guess_type(f"file{extension}")[0] if not mime_type or not mime_type.startswith('image/'): mime_type = 'image/jpeg' scribe_file.mime_type = mime_type else: mime_type = mimetypes.guess_type(f"file{extension}")[0] scribe_file.mime_type = mime_type or 'application/octet-stream' - scribe_file.save() + files_to_update.append(scribe_file) + + if files_to_update: + ScribeFile.objects.bulk_update(files_to_update, ['mime_type'], batch_size=1000)
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
care_scribe/migrations/0012_alter_scribefile_mime_type.pycare_scribe/models/scribe_file.py
🧰 Additional context used
🧬 Code graph analysis (1)
care_scribe/migrations/0012_alter_scribefile_mime_type.py (1)
care_scribe/models/scribe_file.py (2)
ScribeFile(14-57)save(43-54)
🪛 Ruff (0.14.10)
care_scribe/migrations/0012_alter_scribefile_mime_type.py
8-8: Unused function argument: schema_editor
(ARG001)
35-37: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
39-46: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
🔇 Additional comments (3)
care_scribe/migrations/0012_alter_scribefile_mime_type.py (2)
20-24: Verify defaultimage/jpegis appropriate forSCRIBE_DOCUMENT.The
SCRIBE_DOCUMENTfile type defaults toimage/jpeg, which suggests these are image-based documents. If documents could include PDFs or other non-image formats, this default may result in incorrect mime types being assigned.
33-46: Migration structure is correct.The operation ordering is appropriate: data backfill via
RunPythonexecutes beforeAlterFieldmakes the column non-nullable. Thenoopreverse operation means rollback won't undo the mime_type values, which is acceptable since the previous migration added the nullable field.care_scribe/models/scribe_file.py (1)
24-24: mime_type field now required — serializer enforces this through validation.The migration properly backfills existing records before making
mime_typenon-nullable. TheScribeFileUploadCreateSerializerautomatically enforcesmime_typeas required (inherited from the model field definition) and validates it againstALLOWED_MIME_TYPESin the create method. NewScribeFileinstances created through the API will fail serializer validation ifmime_typeis missing, preventingIntegrityError.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
care_scribe/migrations/0012_alter_scribefile_mime_type.py (1)
11-30: Consider usingbulk_updatefor better performance.Calling
.save()individually for each record issues a separate database query per record. For large datasets, this can be slow. Usingbulk_updatewould be more efficient.♻️ Proposed refactor using bulk_update
def set_mime_types(apps, schema_editor): ScribeFile = apps.get_model('care_scribe', 'ScribeFile') + files_to_update = [] for scribe_file in ScribeFile.objects.filter(models.Q(mime_type__isnull=True) | models.Q(mime_type='')): extension = os.path.splitext(scribe_file.internal_name)[1].lower() if scribe_file.file_type == 1: # SCRIBE_AUDIO mime_type = mimetypes.guess_type(f"file{extension}")[0] if not mime_type or not mime_type.startswith('audio/'): mime_type = 'audio/mpeg' scribe_file.mime_type = mime_type elif scribe_file.file_type == 2: # SCRIBE_DOCUMENT mime_type = mimetypes.guess_type(f"file{extension}")[0] if not mime_type or not mime_type.startswith('image/'): mime_type = 'image/jpeg' scribe_file.mime_type = mime_type else: mime_type = mimetypes.guess_type(f"file{extension}")[0] scribe_file.mime_type = mime_type or 'application/octet-stream' - scribe_file.save() + files_to_update.append(scribe_file) + + if files_to_update: + ScribeFile.objects.bulk_update(files_to_update, ['mime_type'], batch_size=1000)
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
care_scribe/migrations/0012_alter_scribefile_mime_type.py
🧰 Additional context used
🧬 Code graph analysis (1)
care_scribe/migrations/0012_alter_scribefile_mime_type.py (1)
care_scribe/models/scribe_file.py (2)
ScribeFile(14-57)save(43-54)
🪛 Ruff (0.14.10)
care_scribe/migrations/0012_alter_scribefile_mime_type.py
8-8: Unused function argument: schema_editor
(ARG001)
35-37: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
39-46: Mutable class attributes should be annotated with typing.ClassVar
(RUF012)
🔇 Additional comments (3)
care_scribe/migrations/0012_alter_scribefile_mime_type.py (3)
1-5: LGTM!Imports are appropriate for this data migration.
8-12: LGTM!The function signature follows Django's
RunPythonconvention (theschema_editorparameter is required even if unused), and the query correctly filters records needing backfill.
33-46: LGTM!The migration is correctly structured:
- Data backfill runs before the schema change (correct order for making a field non-null)
- Using
noopfor the reverse is acceptable since the original null/empty values were invalid- Dependencies are properly declared
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.