Skip to content

Determine unset mime types in previous scribes#25

Open
shivankacker wants to merge 2 commits intomasterfrom
mime-type-migration
Open

Determine unset mime types in previous scribes#25
shivankacker wants to merge 2 commits intomasterfrom
mime-type-migration

Conversation

@shivankacker
Copy link
Copy Markdown
Member

@shivankacker shivankacker commented Jan 13, 2026

Summary by CodeRabbit

  • Chores
    • Added a data migration to backfill missing MIME types for existing file records.
    • Made the MIME type field required on file records and assigned intelligent defaults based on file type: audio → audio/mpeg, documents/images → image/jpeg, others → application/octet-stream.

✏️ Tip: You can customize this high-level summary in your review settings.

Copilot AI review requested due to automatic review settings January 13, 2026 12:24
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Jan 13, 2026

Walkthrough

A new Django migration adds a data migration that backfills ScribeFile.mime_type for records where it is null or empty. The migration derives a file extension from internal_name and uses mimetypes.guess_type plus type-specific defaults: audio/mpeg for SCRIBE_AUDIO, image/jpeg for SCRIBE_DOCUMENT, and application/octet-stream otherwise; each updated record is saved. The migration also alters the ScribeFile.mime_type field to CharField(max_length=200) (removing blank=True, null=True).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: implementing a data migration to backfill mime_type values for ScribeFile records that previously had unset (null or empty) mime types.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch mime-type-migration

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR modifies the mime_type field in the ScribeFile model to be non-nullable, and includes a migration to populate missing mime_type values in existing records by inferring them from file extensions and types.

Changes:

  • Removed blank=True, null=True from the mime_type field in the ScribeFile model
  • Added a data migration to populate unset mime_type values based on file extensions and file types before enforcing the non-null constraint

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
care_scribe/models/scribe_file.py Updated mime_type field definition to remove nullable attributes
care_scribe/migrations/0012_alter_scribefile_mime_type.py Added migration with data migration function to set missing mime types before schema change

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
care_scribe/migrations/0012_alter_scribefile_mime_type.py (1)

8-30: Consider bulk_update for better performance on large datasets.

Calling save() on each record individually results in N database queries. For large tables, this can make the migration very slow.

♻️ Proposed refactor using bulk_update
 def set_mime_types(apps, schema_editor):
     ScribeFile = apps.get_model('care_scribe', 'ScribeFile')
+    files_to_update = []
 
     for scribe_file in ScribeFile.objects.filter(mime_type__isnull=True) | ScribeFile.objects.filter(mime_type=''):
         extension = os.path.splitext(scribe_file.internal_name)[1].lower()
 
         if scribe_file.file_type == 1:  # SCRIBE_AUDIO
             mime_type = mimetypes.guess_type(f"file{extension}")[0]
             if not mime_type or not mime_type.startswith('audio/'):
                 mime_type = 'audio/mpeg'
             scribe_file.mime_type = mime_type
 
         elif scribe_file.file_type == 2:  # SCRIBE_DOCUMENT
             mime_type = mimetypes.guess_type(f"file{extension}")[0]
             if not mime_type or not mime_type.startswith('image/'):
                 mime_type = 'image/jpeg'
             scribe_file.mime_type = mime_type
 
         else:
             mime_type = mimetypes.guess_type(f"file{extension}")[0]
             scribe_file.mime_type = mime_type or 'application/octet-stream'
 
-        scribe_file.save()
+        files_to_update.append(scribe_file)
+
+    if files_to_update:
+        ScribeFile.objects.bulk_update(files_to_update, ['mime_type'], batch_size=1000)
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2404711 and dfff400.

📒 Files selected for processing (2)
  • care_scribe/migrations/0012_alter_scribefile_mime_type.py
  • care_scribe/models/scribe_file.py
🧰 Additional context used
🧬 Code graph analysis (1)
care_scribe/migrations/0012_alter_scribefile_mime_type.py (1)
care_scribe/models/scribe_file.py (2)
  • ScribeFile (14-57)
  • save (43-54)
🪛 Ruff (0.14.10)
care_scribe/migrations/0012_alter_scribefile_mime_type.py

8-8: Unused function argument: schema_editor

(ARG001)


35-37: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


39-46: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

🔇 Additional comments (3)
care_scribe/migrations/0012_alter_scribefile_mime_type.py (2)

20-24: Verify default image/jpeg is appropriate for SCRIBE_DOCUMENT.

The SCRIBE_DOCUMENT file type defaults to image/jpeg, which suggests these are image-based documents. If documents could include PDFs or other non-image formats, this default may result in incorrect mime types being assigned.


33-46: Migration structure is correct.

The operation ordering is appropriate: data backfill via RunPython executes before AlterField makes the column non-nullable. The noop reverse operation means rollback won't undo the mime_type values, which is acceptable since the previous migration added the nullable field.

care_scribe/models/scribe_file.py (1)

24-24: mime_type field now required — serializer enforces this through validation.

The migration properly backfills existing records before making mime_type non-nullable. The ScribeFileUploadCreateSerializer automatically enforces mime_type as required (inherited from the model field definition) and validates it against ALLOWED_MIME_TYPES in the create method. New ScribeFile instances created through the API will fail serializer validation if mime_type is missing, preventing IntegrityError.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
care_scribe/migrations/0012_alter_scribefile_mime_type.py (1)

11-30: Consider using bulk_update for better performance.

Calling .save() individually for each record issues a separate database query per record. For large datasets, this can be slow. Using bulk_update would be more efficient.

♻️ Proposed refactor using bulk_update
 def set_mime_types(apps, schema_editor):
     ScribeFile = apps.get_model('care_scribe', 'ScribeFile')
+    files_to_update = []
 
     for scribe_file in ScribeFile.objects.filter(models.Q(mime_type__isnull=True) | models.Q(mime_type='')):
         extension = os.path.splitext(scribe_file.internal_name)[1].lower()
 
         if scribe_file.file_type == 1:  # SCRIBE_AUDIO
             mime_type = mimetypes.guess_type(f"file{extension}")[0]
             if not mime_type or not mime_type.startswith('audio/'):
                 mime_type = 'audio/mpeg'
             scribe_file.mime_type = mime_type
 
         elif scribe_file.file_type == 2:  # SCRIBE_DOCUMENT
             mime_type = mimetypes.guess_type(f"file{extension}")[0]
             if not mime_type or not mime_type.startswith('image/'):
                 mime_type = 'image/jpeg'
             scribe_file.mime_type = mime_type
 
         else:
             mime_type = mimetypes.guess_type(f"file{extension}")[0]
             scribe_file.mime_type = mime_type or 'application/octet-stream'
 
-        scribe_file.save()
+        files_to_update.append(scribe_file)
+
+    if files_to_update:
+        ScribeFile.objects.bulk_update(files_to_update, ['mime_type'], batch_size=1000)
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dfff400 and 81778d9.

📒 Files selected for processing (1)
  • care_scribe/migrations/0012_alter_scribefile_mime_type.py
🧰 Additional context used
🧬 Code graph analysis (1)
care_scribe/migrations/0012_alter_scribefile_mime_type.py (1)
care_scribe/models/scribe_file.py (2)
  • ScribeFile (14-57)
  • save (43-54)
🪛 Ruff (0.14.10)
care_scribe/migrations/0012_alter_scribefile_mime_type.py

8-8: Unused function argument: schema_editor

(ARG001)


35-37: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)


39-46: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

🔇 Additional comments (3)
care_scribe/migrations/0012_alter_scribefile_mime_type.py (3)

1-5: LGTM!

Imports are appropriate for this data migration.


8-12: LGTM!

The function signature follows Django's RunPython convention (the schema_editor parameter is required even if unused), and the query correctly filters records needing backfill.


33-46: LGTM!

The migration is correctly structured:

  • Data backfill runs before the schema change (correct order for making a field non-null)
  • Using noop for the reverse is acceptable since the original null/empty values were invalid
  • Dependencies are properly declared

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants