fix(search): parse tika xmpDM:duration as a float #2638
Merged
Tika emits xmpDM:duration as seconds in floating-point form (for example "154.57379150390625"), so strconv.ParseInt rejected every value and the field was silently dropped — every indexed audio item ended up without a duration. Parse the value with strconv.ParseFloat and convert to milliseconds ourselves. Adjust the existing extractor test to cover the fractional case.
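The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the code from `tika.go`: the helper names `parseAsInt` and `parseAsSeconds` are hypothetical and only wrap the two `strconv` calls the commit message mentions.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseAsInt is a hypothetical stand-in for the old behaviour:
// the raw metadata string is parsed as a base-10 integer.
func parseAsInt(raw string) (int64, error) {
	return strconv.ParseInt(raw, 10, 64)
}

// parseAsSeconds is a hypothetical stand-in for the fixed behaviour:
// the raw metadata string is parsed as floating-point seconds.
func parseAsSeconds(raw string) (float64, error) {
	return strconv.ParseFloat(raw, 64)
}

func main() {
	// Example value quoted in the commit message.
	raw := "154.57379150390625"

	// ParseInt returns an error for any input containing a decimal point,
	// which is why the duration was dropped before the fix.
	if _, err := parseAsInt(raw); err != nil {
		fmt.Println("ParseInt failed:", err)
	}

	// ParseFloat accepts the same input.
	if secs, err := parseAsSeconds(raw); err == nil {
		fmt.Println("ParseFloat seconds:", secs)
	}
}
```

Because the old code skipped the field whenever parsing failed, the error never surfaced to the user, which matches the "silently dropped" symptom.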
Pull request overview
This PR fixes audio duration extraction for items indexed via the Tika content extractor by correctly parsing xmpDM:duration as a floating-point seconds value and converting it into milliseconds for the LibreGraph audio facet.
Changes:
- Parse `xmpDM:duration` with `strconv.ParseFloat` instead of `strconv.ParseInt`.
- Convert duration from seconds (possibly fractional) to milliseconds when setting `audio.Duration`.
- Update the existing Tika extractor spec to cover fractional duration values (`"225.5"` → `225500` ms).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `services/search/pkg/content/tika.go` | Switches duration parsing to float seconds and converts to ms before calling `audio.SetDuration`. |
| `services/search/pkg/content/tika_test.go` | Updates the audio metadata test fixture and assertion to validate fractional-second duration parsing. |
Address review feedback: a straight int64 cast truncates toward zero, so Tika values whose millisecond conversion comes out as something like 1234.999... would land at 1234 ms instead of 1235 ms. Round before casting so durations are as accurate as float64 allows.
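The difference between truncating and rounding can be sketched as follows. This is an illustration under assumptions, not the merged code: the function names are hypothetical, and the use of `math.Round` is one plausible way to "round before casting"; the actual change in `tika.go` may be written differently.

```go
package main

import (
	"fmt"
	"math"
)

// secondsToMillisTruncated is a hypothetical helper showing the pre-review
// behaviour: a plain int64 conversion discards the fractional part.
func secondsToMillisTruncated(seconds float64) int64 {
	return int64(seconds * 1000)
}

// secondsToMillis is a hypothetical helper showing the reviewed behaviour:
// round to the nearest millisecond first, then convert (assumes math.Round).
func secondsToMillis(seconds float64) int64 {
	return int64(math.Round(seconds * 1000))
}

func main() {
	// 225.5 is the test fixture value; 1.2349999 is an illustrative value
	// whose millisecond conversion sits just below a whole number.
	for _, s := range []float64{225.5, 1.2349999, 154.57379150390625} {
		fmt.Printf("%v s: truncated=%d ms, rounded=%d ms\n",
			s, secondsToMillisTruncated(s), secondsToMillis(s))
	}
}
```

For values that convert exactly, such as `225.5`, both variants agree; they differ only when the product lands a hair below a whole millisecond.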
micbar approved these changes on Apr 21, 2026
Contributor
Author
Thanks for the review! I'm not allowed to merge, feel free to merge whenever you like :)




Description
Parse the `xmpDM:duration` metadata value Tika returns with `strconv.ParseFloat` instead of `strconv.ParseInt`, and convert the result to milliseconds when calling `audio.SetDuration`.
Related Issue
No tracking issue yet — happy to open one if preferred.
Motivation and Context
Tika emits `xmpDM:duration` as seconds in floating-point form (e.g. `"154.57379150390625"`). `strconv.ParseInt` rejects the decimal separator and returns an error, so the whole block is skipped; every audio item indexed through the Tika extractor silently ended up without a duration.
Symptom in the Graph API: `audio` facets on driveItem search hits never included `duration`, even for MP3 files Tika parses happily.
How Has This Been Tested?
- […] `localhost:9998`, bleve search engine), `opencloud search index --all-spaces --force-rescan` after the patch (separate fix PR upcoming)
- `go test ./services/search/pkg/content/...`: the existing extractor spec now exercises the fractional case (`"225.5"` → `225500` ms); all 12 specs pass
- […] `xmpDM:duration` for them; that's a separate upstream limitation, not this PR's concern)
Types of changes
Checklist: