Fix encoding detection on PHP 8.1 #182

come-nc · 2022-04-25T13:59:41Z

Use mb_check_encoding to detect encoding as mb_detect_encoding is misbehaving under PHP 8.1.
Also use mb_convert_encoding instead of utf8_encode as it’s getting deprecated in PHP 8.2.
Fixes #181
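
A minimal sketch of the approach described above, assuming the mbstring extension is loaded. The function name `toUtf8Sketch` is hypothetical and not taken from the diff. It only illustrates validating with `mb_check_encoding` instead of guessing with `mb_detect_encoding`, and converting with `mb_convert_encoding` instead of `utf8_encode`.

```php
<?php
// Hypothetical helper, for illustration only (not the code from this PR).
function toUtf8Sketch(string $input): string
{
    // Well-formed UTF-8 is returned untouched, so no detection heuristic
    // is involved for the common case.
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input;
    }

    // Assumption: anything that is not valid UTF-8 is ISO-8859-1.
    // mb_convert_encoding stands in for the deprecated utf8_encode.
    return mb_convert_encoding($input, 'UTF-8', 'ISO-8859-1');
}

// "Dušan" as UTF-8 bytes passes through unchanged.
var_dump(toUtf8Sketch("Du\xC5\xA1an") === "Du\xC5\xA1an");
// A lone 0xE9 ("é" in ISO-8859-1) is not valid UTF-8, so it is converted.
var_dump(bin2hex(toUtf8Sketch("\xE9")));
```

The fallback to ISO-8859-1 is a design choice carried over from the old `utf8_encode` call, which only ever handled that one source encoding.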


staabm · 2022-04-25T14:01:40Z

do we need additional test-coverage for this change?

codecov · 2022-04-25T14:02:12Z

Codecov Report

Merging #182 (42e1411) into master (315f592) will decrease coverage by 0.15%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##             master     #182      +/-   ##
============================================
- Coverage     89.75%   89.60%   -0.16%     
  Complexity      262      262              
============================================
  Files            15       15              
  Lines           898      885      -13     
============================================
- Hits            806      793      -13     
  Misses           92       92

| Impacted Files | Coverage Δ | |
|---|---|---|
| lib/functions.php | `95.65% <100.00%> (-0.07%)` | ⬇️ |
| lib/Client.php | `84.50% <0.00%> (-0.67%)` | ⬇️ |
| lib/Response.php | `96.77% <0.00%> (-0.20%)` | ⬇️ |


come-nc · 2022-04-25T14:18:03Z

> do we need additional test-coverage for this change?

If the current coverage did not catch the bug, yes.

Here are some example strings that could be used in the test: https://3v4l.org/RrjlE
You can see that both 'Dušan' and 'Živko' (stolen from https://en.wikipedia.org/wiki/Slavic_names) are messed up by the function.
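
A rough illustration of what "messed up" can look like, assuming the damage comes from UTF-8 bytes being treated as ISO-8859-1 and re-encoded. This is not a copy of the linked 3v4l snippet. The helper name `reencodeAsLatin1Sketch` is hypothetical, and the names are written as byte escapes so the file's own encoding does not matter.

```php
<?php
// Hypothetical helper: re-encode a string as if it were ISO-8859-1.
function reencodeAsLatin1Sketch(string $s): string
{
    return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

$names = [
    'Dusan with caron' => "Du\xC5\xA1an",   // "Dušan" in UTF-8
    'Zivko with caron' => "\xC5\xBDivko",   // "Živko" in UTF-8
];

foreach ($names as $label => $name) {
    // Both inputs are well-formed UTF-8, so they need no conversion.
    $valid = mb_check_encoding($name, 'UTF-8');

    // Converting anyway turns each two-byte character into two characters.
    $mangled = reencodeAsLatin1Sketch($name);

    printf(
        "%s: valid=%s, %s -> %s\n",
        $label,
        var_export($valid, true),
        bin2hex($name),
        bin2hex($mangled)
    );
}
```

A regression test along these lines would assert that such names come back byte-for-byte identical.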

staabm · 2022-04-25T14:23:32Z

please investigate whether we need such a test and add it, if required.


come-nc · 2022-04-26T09:07:30Z

> please investigate whether we need such a test and add it, if required.

Test added.

staabm · 2022-04-26T09:34:14Z

lib/functions.php


-    switch ($encoding) {
-        case 'ISO-8859-1':
-            $path = utf8_encode($path);


Since utf8_encode is deprecated as of PHP 8.2, this is a step in the right direction.

👍

PVince81 · 2022-06-07T15:30:21Z

@DeepDiver1975 @phil-davis any objections to merging this? Thanks 😄

come-nc · 2022-06-09T09:59:12Z

@DeepDiver1975 There is the same problem in https://github.com/sabre-io/dav/blob/master/lib/DAV/StringUtil.php#L78 and https://github.com/sabre-io/vobject/blob/master/lib/StringUtil.php#L41

Should I open PRs on these repos as well?

DeepDiver1975 · 2022-06-13T15:40:35Z

> Should I open PR on these repos as well?

yes please

phil-davis · 2022-06-24T07:55:13Z

Just a note for information: this kind of automated "educated guess" encoding detection cannot always succeed.
For example, the bytes C2 A3 are the UTF-8 encoding of the UK pound symbol (£).
But in ISO-8859-1 the same bytes are two code points: C2 is Â and A3 happens to be the UK pound symbol £.

So if any software is presented with C2A3 as an "encoded string" and no other meta-data about how to interpret it, then there is no way to know if it is meant to represent Â£ or just £.

There will be plenty of other examples. Have a play at https://dencode.com/en/string/hex and try putting in hex for https://en.wikipedia.org/wiki/UTF-8 code points, and find ones that match sets of ISO-8859-1 code points that represent sequences of characters that could also be a valid, reasonable combination that might occur in a file name, for example.
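
The ambiguity can be reproduced in a few lines. This is only a sketch, assuming the mbstring extension is available; the variable names are made up for the example.

```php
<?php
// The two bytes from the example above.
$bytes = "\xC2\xA3";

// Both checks accept the input, so validity alone cannot pick an encoding.
$validUtf8   = mb_check_encoding($bytes, 'UTF-8');
$validLatin1 = mb_check_encoding($bytes, 'ISO-8859-1');

// Reading 1: the bytes already are UTF-8 and mean a single pound sign.
$asUtf8 = $bytes;

// Reading 2: the bytes are ISO-8859-1 and mean "Â" followed by "£".
$asLatin1 = mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');

var_dump($validUtf8, $validLatin1);
var_dump(mb_strlen($asUtf8, 'UTF-8'));   // one character
var_dump(mb_strlen($asLatin1, 'UTF-8')); // two characters
var_dump(bin2hex($asLatin1));
```

A check-UTF-8-first strategy always resolves such input to the first reading, which is a policy decision rather than a detection result.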

Fix encoding detection on PHP 8.1

540aaad

Use mb_check_encoding to detect encoding as mb_detect_encoding is misbehaving under PHP 8.1. Also use mb_convert_encoding instead of utf8_encode as it’s getting deprecated in PHP 8.2. Fixes sabre-io#181

Add test for slavic words encoding detection

42e1411

Signed-off-by: Côme Chilliet <come.chilliet@nextcloud.com>

staabm reviewed Apr 26, 2022


staabm approved these changes Apr 26, 2022


staabm requested a review from phil-davis April 26, 2022 09:34

DeepDiver1975 merged commit 8e29569 into sabre-io:master Jun 8, 2022

come-nc mentioned this pull request Jun 9, 2022

[Bug]: NC24 + PHP8.1 break UTF-8 compatibility nextcloud/server#31212

Closed

8 tasks

This was referenced Jun 14, 2022

Fix encoding detection on PHP 8.1 sabre-io/dav#1404

Merged

Fix encoding detection on PHP 8.1 sabre-io/vobject#575

Merged

phil-davis mentioned this pull request Jun 27, 2022

Mangles vcards with diacritics sabre-io/dav#1405

Closed
