Filenames containing ? give warning : 'extension mismatch' #129

workflowsguy opened this issue Jun 24, 2019 · 2 comments


workflowsguy commented Jun 24, 2019

When files are processed with sf, those whose filename contains a question mark just before the extension are identified with the correct type, but an "extension mismatch" warning is still output, viz.

sf "/Volumes/Public/bearbeiten/Dateien/ermitteln Dateityp/Salzburger Nachtstudio.2019-06-19 - Kulturkampf im Klassenzimmer?.mp3"
siegfried   : 1.7.12
scandate    : 2019-06-24T16:27:08+02:00
signature   : default.sig
created     : 2019-06-15T12:22:38+02:00
identifiers : 
  - name    : 'pronom'
    details : 'DROID_SignatureFile_V95.xml; container-signature-20180917.xml'
filename : '/Volumes/Public/bearbeiten/Dateien/ermitteln Dateityp/Salzburger Nachtstudio.2019-06-19 - Kulturkampf im Klassenzimmer?.mp3'
filesize : 74564436
modified : 2019-06-21T17:03:54+02:00
errors   : 
matches  :
  - ns      : 'pronom'
    id      : 'fmt/134'
    format  : 'MPEG 1/2 Audio Layer 3'
    version : 
    mime    : 'audio/mpeg'
    basis   : 'byte match at [[0 3] [74560365 1151] [74562035 1151] [74563705 3]] (signature 1/8)'
    warning : 'extension mismatch'

I am running on macOS, where ? is an allowed character for filenames.


@richardlehane richardlehane self-assigned this Jun 25, 2019

richardlehane commented Jun 25, 2019

Thanks for this report workflowsguy, an interesting bug! I'll look into it.


richardlehane commented Jun 25, 2019

I've found the offending code:

The issue is that some filenames are within URLs (because of WARC scanning), and where sf thinks the name is a URL it strips the characters following a "?", because in a URL that is the query string. E.g. it is trying to get the name within a string like ""

But in your case where the ? is legitimately part of a regular file name, this is breaking extension matching.

I'll have a think about how to re-jig this bit of the code to fix this.
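
A minimal sketch of the behaviour described above, for illustration only. This is not siegfried's actual code: the function names `naiveExt` and `guardedExt`, the sample names, and the "starts with http:// or https://" check are all assumptions. `naiveExt` drops everything after the first "?" before reading the extension, which loses the real extension of a regular file name. `guardedExt` shows one possible guard, treating "?" as a query separator only when the name looks like a web URL.

```go
package main

import (
	"fmt"
	"net/url"
	"path/filepath"
	"strings"
)

// naiveExt (hypothetical) treats everything after the first "?" as a URL
// query string and drops it before reading the extension.
func naiveExt(name string) string {
	if i := strings.Index(name, "?"); i >= 0 {
		name = name[:i]
	}
	return strings.TrimPrefix(filepath.Ext(name), ".")
}

// guardedExt (hypothetical) strips the query string only when the name
// looks like an http(s) URL; any other name is left untouched.
// The prefix check is an assumed heuristic, not the project's fix.
func guardedExt(name string) string {
	if strings.HasPrefix(name, "http://") || strings.HasPrefix(name, "https://") {
		if u, err := url.Parse(name); err == nil {
			name = u.Path // path component only, without "?query"
		}
	}
	return strings.TrimPrefix(filepath.Ext(name), ".")
}

func main() {
	// Made-up sample names: a regular file with "?" and a URL with a query.
	names := []string{
		"talk?.mp3",
		"http://example.com/a/page.html?x=1.jpg",
	}
	for _, n := range names {
		fmt.Printf("%q naive=%q guarded=%q\n", n, naiveExt(n), guardedExt(n))
	}
}
```

With the naive version, "talk?.mp3" yields an empty extension, so a comparison against the extensions expected for MP3 would fail even though the byte match is correct. The guarded version returns "mp3" for the file name and still returns "html" for the URL.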

@richardlehane richardlehane added this to the 1.7.13 milestone Jul 2, 2019