Determine unset mime types in previous scribes by shivankacker · Pull Request #25 · ohcnetwork/care_scribe

shivankacker · 2026-01-13T12:24:58Z

Summary by CodeRabbit

Chores
- Added a data migration to backfill missing MIME types for existing file records.
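- Made the MIME type field required on file records. The backfill guesses each existing record's MIME type from its file extension and, when no suitable type is found, falls back to a default based on file type: audio → audio/mpeg, documents/images → image/jpeg, others → application/octet-stream.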

coderabbitai · 2026-01-13T12:25:10Z

Walkthrough

A new Django migration runs a data step that backfills ScribeFile.mime_type for records where it is null or empty. For each record it derives the file extension from internal_name, calls mimetypes.guess_type, and falls back to a type-specific default: audio/mpeg for SCRIBE_AUDIO (when the guess is missing or not audio/*), image/jpeg for SCRIBE_DOCUMENT (when the guess is missing or not image/*), and application/octet-stream for any other type when no guess is available. Each updated record is saved. The migration then alters ScribeFile.mime_type to CharField(max_length=200), removing blank=True and null=True.
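The per-record rule described above can be sketched as a pure function, which makes it easy to exercise outside Django. This is a minimal sketch, assuming the integer codes 1 (SCRIBE_AUDIO) and 2 (SCRIBE_DOCUMENT) that appear in the migration's comments; the helper name `infer_mime_type` is hypothetical.

```python
import mimetypes
import os

# Assumed file_type codes, taken from the comments in the migration diff.
SCRIBE_AUDIO = 1
SCRIBE_DOCUMENT = 2


def infer_mime_type(internal_name: str, file_type: int) -> str:
    """Guess a MIME type from the extension, falling back to a per-type default.

    Hypothetical helper mirroring the backfill rule in the walkthrough.
    """
    extension = os.path.splitext(internal_name)[1].lower()
    # Only the extension matters to guess_type, so a placeholder stem is used.
    guessed = mimetypes.guess_type(f"file{extension}")[0]

    if file_type == SCRIBE_AUDIO:
        # Keep the guess only if it is an audio/* type.
        return guessed if guessed and guessed.startswith("audio/") else "audio/mpeg"
    if file_type == SCRIBE_DOCUMENT:
        # Keep the guess only if it is an image/* type.
        return guessed if guessed and guessed.startswith("image/") else "image/jpeg"
    # Any other file type keeps whatever was guessed, if anything.
    return guessed or "application/octet-stream"
```

Keeping the rule free of ORM calls means the same function can be unit-tested directly and then called from the RunPython step.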

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: implementing a data migration to backfill mime_type values for ScribeFile records that previously had unset (null or empty) mime types.

Copilot

Pull request overview

This PR modifies the mime_type field in the ScribeFile model to be non-nullable, and includes a migration to populate missing mime_type values in existing records by inferring them from file extensions and types.

Changes:

- Removed blank=True, null=True from the mime_type field in the ScribeFile model
- Added a data migration to populate unset mime_type values based on file extensions and file types before enforcing the non-null constraint

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
care_scribe/models/scribe_file.py	Updated `mime_type` field definition to remove nullable attributes
care_scribe/migrations/0012_alter_scribefile_mime_type.py	Added migration with data migration function to set missing mime types before schema change

care_scribe/migrations/0012_alter_scribefile_mime_type.py

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

care_scribe/migrations/0012_alter_scribefile_mime_type.py (1)

8-30: Consider bulk_update for better performance on large datasets.

Calling save() on each record individually results in N database queries. For large tables, this can make the migration very slow.

♻️ Proposed refactor using bulk_update

```diff
 def set_mime_types(apps, schema_editor):
     ScribeFile = apps.get_model('care_scribe', 'ScribeFile')
+    files_to_update = []
 
     for scribe_file in ScribeFile.objects.filter(mime_type__isnull=True) | ScribeFile.objects.filter(mime_type=''):
         extension = os.path.splitext(scribe_file.internal_name)[1].lower()
 
         if scribe_file.file_type == 1:  # SCRIBE_AUDIO
             mime_type = mimetypes.guess_type(f"file{extension}")[0]
             if not mime_type or not mime_type.startswith('audio/'):
                 mime_type = 'audio/mpeg'
             scribe_file.mime_type = mime_type
 
         elif scribe_file.file_type == 2:  # SCRIBE_DOCUMENT
             mime_type = mimetypes.guess_type(f"file{extension}")[0]
             if not mime_type or not mime_type.startswith('image/'):
                 mime_type = 'image/jpeg'
             scribe_file.mime_type = mime_type
 
         else:
             mime_type = mimetypes.guess_type(f"file{extension}")[0]
             scribe_file.mime_type = mime_type or 'application/octet-stream'
 
-        scribe_file.save()
+        files_to_update.append(scribe_file)
+
+    if files_to_update:
+        ScribeFile.objects.bulk_update(files_to_update, ['mime_type'], batch_size=1000)
```
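To see what batching buys, the sketch below uses the standard-library sqlite3 module as a stand-in for the ORM. It computes every new value in memory first, then issues one `UPDATE ... SET mime_type = CASE id WHEN ... END` statement per batch, which is the same general shape Django's bulk_update generates, instead of one UPDATE per row. This is an analogue, not Django's implementation: the `scribe_file` table layout, the `bulk_backfill` name, and the `infer` callback (any function of `(internal_name, file_type)` returning a MIME type) are all hypothetical.

```python
import sqlite3
from typing import Callable, List, Tuple


def bulk_backfill(
    conn: sqlite3.Connection,
    infer: Callable[[str, int], str],
    batch_size: int = 300,
) -> int:
    """Backfill empty mime_type values with one UPDATE per batch.

    Returns the number of UPDATE statements issued. The default batch size
    keeps 3 bound parameters per row under SQLite's older 999-variable limit.
    """
    rows = conn.execute(
        "SELECT id, internal_name, file_type FROM scribe_file "
        "WHERE mime_type IS NULL OR mime_type = ''"
    ).fetchall()
    # First pass: compute every new value in memory, with no writes yet.
    pending: List[Tuple[int, str]] = [
        (row_id, infer(name, ftype)) for row_id, name, ftype in rows
    ]

    statements = 0
    with conn:  # commit all batches together, roll back on error
        for start in range(0, len(pending), batch_size):
            chunk = pending[start:start + batch_size]
            whens = " ".join("WHEN ? THEN ?" for _ in chunk)
            ids = ", ".join("?" for _ in chunk)
            # Parameters: (id, value) pairs for the CASE, then the ids for IN.
            params = [v for pair in chunk for v in pair] + [pk for pk, _ in chunk]
            conn.execute(
                f"UPDATE scribe_file SET mime_type = CASE id {whens} END "
                f"WHERE id IN ({ids})",
                params,
            )
            statements += 1
    return statements
```

With N rows needing a backfill, this issues ceil(N / batch_size) UPDATE statements rather than N, which is the saving the comment is pointing at.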

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2404711 and dfff400.

📒 Files selected for processing (2)

care_scribe/migrations/0012_alter_scribefile_mime_type.py
care_scribe/models/scribe_file.py

🧰 Additional context used

🧬 Code graph analysis (1)

care_scribe/migrations/0012_alter_scribefile_mime_type.py (1)

care_scribe/models/scribe_file.py (2)

ScribeFile (14-57)

save (43-54)

🪛 Ruff (0.14.10)

care_scribe/migrations/0012_alter_scribefile_mime_type.py

8-8: Unused function argument: schema_editor

(ARG001)

35-37: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

39-46: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)
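One way to address these findings, sketched on a plain class so it runs without Django: annotate the mutable class attributes with typing.ClassVar (RUF012) and prefix the unused parameter with an underscore (ARG001), which Ruff's default dummy-variable pattern should treat as intentionally unused. RunPython passes its two arguments positionally, so renaming the second parameter should not change how the function is called; a `# noqa: ARG001` comment is the alternative if the name must stay. `ExampleMigration` and the dependency name below are placeholders, not the real migration's contents.

```python
from typing import Any, ClassVar, List, Tuple


def set_mime_types(apps: Any, _schema_editor: Any) -> Any:
    """Stub with the RunPython signature; the real body would use apps.get_model()."""
    # The leading underscore marks schema_editor as intentionally unused (ARG001).
    return apps


class ExampleMigration:
    """Plain stand-in for a Django Migration subclass (hypothetical)."""

    # ClassVar marks these mutable lists as shared class-level state (RUF012).
    dependencies: ClassVar[List[Tuple[str, str]]] = [
        ("care_scribe", "0011_previous_migration"),  # placeholder, not the real name
    ]
    operations: ClassVar[List[Any]] = []
```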

🔇 Additional comments (3)

care_scribe/migrations/0012_alter_scribefile_mime_type.py (2)

20-24: Verify default image/jpeg is appropriate for SCRIBE_DOCUMENT.

The SCRIBE_DOCUMENT file type defaults to image/jpeg, which suggests these are image-based documents. If documents could include PDFs or other non-image formats, this default may result in incorrect mime types being assigned.
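A quick standard-library check illustrates the concern: mimetypes maps a .pdf extension to application/pdf, and a rule that only keeps image/* guesses for SCRIBE_DOCUMENT replaces that with image/jpeg. The helper below restates only the SCRIBE_DOCUMENT branch of the diff; its name is hypothetical.

```python
import mimetypes
import os


def document_mime_type(internal_name: str) -> str:
    """SCRIBE_DOCUMENT branch from the diff: keep image/* guesses, else image/jpeg."""
    extension = os.path.splitext(internal_name)[1].lower()
    guessed = mimetypes.guess_type(f"file{extension}")[0]
    if not guessed or not guessed.startswith("image/"):
        guessed = "image/jpeg"
    return guessed


for name in ("scan.png", "report.pdf"):
    print(name, mimetypes.guess_type(name)[0], "->", document_mime_type(name))
# scan.png image/png -> image/png
# report.pdf application/pdf -> image/jpeg
```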

33-46: Migration structure is correct.

The operation ordering is appropriate: the data backfill via RunPython executes before AlterField makes the column non-nullable. The reverse operation is RunPython.noop, so a rollback won't undo the backfilled mime_type values; this is acceptable because the field as defined by the previous migration is nullable and can still hold them.

care_scribe/models/scribe_file.py (1)

24-24: mime_type field now required — serializer enforces this through validation.

The migration properly backfills existing records before making mime_type non-nullable. The ScribeFileUploadCreateSerializer automatically enforces mime_type as required (inherited from the model field definition) and validates it against ALLOWED_MIME_TYPES in the create method. New ScribeFile instances created through the API will fail serializer validation if mime_type is missing, preventing IntegrityError.
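The two checks described here (the value must be present, then it must be in an allowlist) can be sketched in plain Python. This is not the DRF serializer code; the function name, the local `ValidationError` class, and the contents of `ALLOWED_MIME_TYPES` are assumptions for illustration only.

```python
from typing import Optional

# Hypothetical allowlist; the real ALLOWED_MIME_TYPES is defined in the project.
ALLOWED_MIME_TYPES = {"audio/mpeg", "audio/wav", "image/jpeg", "image/png"}


class ValidationError(ValueError):
    """Local stand-in for a serializer validation error."""


def validate_mime_type(value: Optional[str]) -> str:
    """Reject missing values (required field) and values outside the allowlist."""
    if not value:
        raise ValidationError("mime_type is required")
    if value not in ALLOWED_MIME_TYPES:
        raise ValidationError(f"unsupported mime_type: {value}")
    return value
```

Rejecting the request at this layer is what keeps a missing value from reaching the now non-nullable column and surfacing as an IntegrityError.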

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

care_scribe/migrations/0012_alter_scribefile_mime_type.py (1)

11-30: Consider using bulk_update for better performance.

Calling .save() individually for each record issues a separate database query per record. For large datasets, this can be slow. Using bulk_update would be more efficient.

♻️ Proposed refactor using bulk_update

```diff
 def set_mime_types(apps, schema_editor):
     ScribeFile = apps.get_model('care_scribe', 'ScribeFile')
+    files_to_update = []
 
     for scribe_file in ScribeFile.objects.filter(models.Q(mime_type__isnull=True) | models.Q(mime_type='')):
         extension = os.path.splitext(scribe_file.internal_name)[1].lower()
 
         if scribe_file.file_type == 1:  # SCRIBE_AUDIO
             mime_type = mimetypes.guess_type(f"file{extension}")[0]
             if not mime_type or not mime_type.startswith('audio/'):
                 mime_type = 'audio/mpeg'
             scribe_file.mime_type = mime_type
 
         elif scribe_file.file_type == 2:  # SCRIBE_DOCUMENT
             mime_type = mimetypes.guess_type(f"file{extension}")[0]
             if not mime_type or not mime_type.startswith('image/'):
                 mime_type = 'image/jpeg'
             scribe_file.mime_type = mime_type
 
         else:
             mime_type = mimetypes.guess_type(f"file{extension}")[0]
             scribe_file.mime_type = mime_type or 'application/octet-stream'
 
-        scribe_file.save()
+        files_to_update.append(scribe_file)
+
+    if files_to_update:
+        ScribeFile.objects.bulk_update(files_to_update, ['mime_type'], batch_size=1000)
```

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dfff400 and 81778d9.

📒 Files selected for processing (1)

care_scribe/migrations/0012_alter_scribefile_mime_type.py

🧰 Additional context used

🧬 Code graph analysis (1)

care_scribe/migrations/0012_alter_scribefile_mime_type.py (1)

care_scribe/models/scribe_file.py (2)

ScribeFile (14-57)

save (43-54)

🪛 Ruff (0.14.10)

care_scribe/migrations/0012_alter_scribefile_mime_type.py

8-8: Unused function argument: schema_editor

(ARG001)

35-37: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

39-46: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

🔇 Additional comments (3)

care_scribe/migrations/0012_alter_scribefile_mime_type.py (3)

1-5: LGTM!

Imports are appropriate for this data migration.

8-12: LGTM!

The function signature follows Django's RunPython convention (the schema_editor parameter is required even if unused), and the query correctly filters records needing backfill.

33-46: LGTM!

The migration is correctly structured:

- Data backfill runs before the schema change (correct order for making a field non-null)
- Using noop for the reverse is acceptable since the original null/empty values were invalid
- Dependencies are properly declared

Commit dfff400: Add migrations to fill unset mimetypes

Copilot AI review requested due to automatic review settings January 13, 2026 12:24

Copilot AI reviewed Jan 13, 2026

care_scribe/migrations/0012_alter_scribefile_mime_type.py

care_scribe/migrations/0012_alter_scribefile_mime_type.py (outdated)

coderabbitai bot reviewed Jan 13, 2026

shivankacker mentioned this pull request Jan 13, 2026

Fix audio preview ohcnetwork/care_scribe_fe#58 (Open)

Commit 81778d9: fixes

coderabbitai bot reviewed Jan 13, 2026

sainak approved these changes Jan 13, 2026

shivankacker wants to merge 2 commits into master from mime-type-migration

3 participants
