Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace non-ASCII characters in recording filenames. #1437

Merged
merged 3 commits into from Apr 4, 2021

Conversation

@softins
Copy link
Member

@softins softins commented Apr 2, 2021

Replaces high-order Latin characters with their nearest ASCII equivalent for readability.
Other non-ASCII chars are replaced with '_'.

Fixes #1424

Replaces high-order Latin characters with their nearest ASCII equivalent for readability.
Other non-ASCII chars are replaced with '_'.

Fixes #1424
@softins softins requested a review from pljones Apr 2, 2021
@hoffie hoffie added this to Triage in Tracking via automation Apr 2, 2021
@hoffie hoffie moved this from Triage to Waiting on Team in Tracking Apr 2, 2021

for (auto &c : r)
{
c = (c >= 128) ? charmap[c - 128] : (c >= 32) ? c : '_';

This comment has been minimized.

@hoffie

hoffie Apr 2, 2021
Member

Getting this with gcc 10.2.0:

src/recorder/jamrecorder.cpp: In member function 'QString recorder::CJamClient::TranslateChars(const QString&) const':
src/recorder/jamrecorder.cpp:125:16: warning: comparison is always false due to limited range of data type [-Wtype-limits]
  125 |         c = (c >= 128) ? charmap[c - 128] : (c >= 32) ? c : '_';
      |              ~~^~~~~~

In fact, German 'Ä' is replaced by '_' but I guess it was supposed to be replaced by 'A' in your logic?

This comment has been minimized.

@softins

softins Apr 2, 2021
Author Member

Ok, this must be a portability thing. On your system auto &c must be getting typed as signed char, whereas on mine it gets typed as unsigned char. That would be why on your system it never uses the array.

I could try it with unsigned char &c instead, but that might end up as a type mismatch. Or maybe just cast c to unsigned within the loop body.

No time on Sat to look at it, will do Sun or Mon. I see the Mac build failed too, so will have to see why.

This comment has been minimized.

@softins

softins Apr 2, 2021
Author Member

I’ve pushed a change that hopefully might be portable. Let’s see.

This comment has been minimized.

@hoffie

hoffie Apr 3, 2021
Member

No more warnings for me and Ä is proplery replaced with A. Looks good.

@pljones
Copy link
Collaborator

@pljones pljones commented Apr 3, 2021

If this is going to be done, it might as well remove all the nasties, rather than adding some -- specifically the ones that then get replaced: -, ., :, /, \, (space). However, it's best to avoid ,, ~, %, |, ^, <, >, ', and ", really, as they're difficult on certain platforms (I should probably have stripped them, too).

@hoffie
hoffie approved these changes Apr 3, 2021
Copy link
Member

@hoffie hoffie left a comment

Should probably be squash-merged?

@softins
Copy link
Member Author

@softins softins commented Apr 3, 2021

If this is going to be done, it might as well remove all the nasties, rather than adding some -- specifically the ones that then get replaced: -, ., :, /, \, (space). However, it's best to avoid ,, ~, %, |, ^, <, >, ', and ", really, as they're difficult on certain platforms (I should probably have stripped them, too).

Well the function of TranslateChars() is to replace high-order Latin1 characters with their nearest ASCII equivalents, if any, not to sanitise for platform filename conventions. That is the job of the Regex replace further down the expression that constructs the filename. Bear in mind TranslateChars() doesn't do anything to characters that are already ASCII.

@softins
Copy link
Member Author

@softins softins commented Apr 3, 2021

Unless you would prefer that TranslateChars() had a complete 256-byte table and also took over the job of sanitising, instead of the .replace(QRegExp? @pljones

@softins softins marked this pull request as draft Apr 3, 2021
@pljones
Copy link
Collaborator

@pljones pljones commented Apr 4, 2021

Unless you would prefer that TranslateChars() had a complete 256-byte table and also took over the job of sanitising, instead of the .replace(QRegExp? @pljones

If it's only going to be used in the one place, it might as well do it -- it'll be cheaper than the QRegExp, too, I expect. Yeah...

@softins softins marked this pull request as ready for review Apr 4, 2021
@pljones
pljones approved these changes Apr 4, 2021
Tracking automation moved this from Waiting on Team to In Progress Apr 4, 2021
@softins softins requested review from pljones and hoffie and removed request for pljones and hoffie Apr 4, 2021
@softins softins merged commit 664fd9e into jamulussoftware:master Apr 4, 2021
7 checks passed
7 checks passed
@github-actions
Prepare Auto-Build/Release
Details
@github-actions
Build assets for AndroidAPK (artifact+codeQL)
Details
@github-actions
Build assets for Linux (artifacts+codeQL)
Details
@github-actions
Build assets for MacOS (codeQL)
Details
@github-actions
Build assets for MacOS (artifacts)
Details
@github-actions
Build assets for Windows (artifact+codeQL)
Details
@github-code-scanning
CodeQL 5 new warnings
Details
Tracking automation moved this from In Progress to Done Apr 4, 2021
@softins softins deleted the softins:ascii-filenames branch Apr 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Linked issues

Successfully merging this pull request may close these issues.

3 participants