Encode QR code data to UTF-8 #24350

dkurt · 2023-10-02T12:41:01Z

Pull Request Readiness Checklist

Merge with extra: opencv/opencv_extra#1105

resolves #23728

This is the first PR in a series. Here we just return raw Unicode. Later I will try to expand the QR code decoding methods to use the ECI assignment number and return a string with the proper encoding, not only UTF-8 or raw Unicode.

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
There is a reference to the original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

opencv-alalek · 2023-10-05T12:58:28Z

modules/objdetect/src/qrcode.cpp

@@ -2760,6 +2802,9 @@ bool QRDecode::decodingProcess()
    {
        result_info += qr_code_data.payload[i];
    }
+    if (qr_code_data.data_type == QUIRC_DATA_TYPE_BYTE && !checkUTF8(result_info)) {


The data type check should go before the first loop on line 2801.

Do we really need checkUTF8? Which test cases fail without it?

The QR code from #23728 is created in Byte mode, but the byte sequence is not valid UTF-8 (the generator probably stored just the raw byte array of the Unicode string):

QR code content (qr_code_data.payload):

83, 80, 67, 13, 10, 48, 50, 48, 48, 13, 10, 49, 13, 10, 67, 72, 48, 52, 51, 48, 48, 48, 53, 50, 51, 48, 50, 50, 50, 52, 52, 57, 48, 49, 72, 13, 10, 83, 13, 10, 69, 109, 105, 108, 32, 70, 114, 101, 121, 32, 66, 101, 116, 114, 105, 101, 98, 115, 32, 65, 71, 13, 10, 66, 97, 104, 110, 104, 111, 102, 115, 116, 114, 97, 115, 115, 101, 32, 49, 55, 13, 10, 13, 10, 53, 55, 52, 53, 13, 10, 83, 97, 102, 101, 110, 119, 105, 108, 13, 10, 67, 72, 13, 10, 13, 10, 13, 10, 13, 10, 13, 10, 13, 10, 13, 10, 13, 10, 51, 50, 50, 56, 46, 53, 48, 13, 10, 67, 72, 70, 13, 10, 83, 13, 10, 83, 105, 120, 116, 32, 114, 101, 110, 116, 32, 97, 32, 67, 97, 114, 32, 65, 71, 32, 13, 10, 77, 252, 108, 108, 104, 101, 105, 109, 115, 116, 114, 97, 115, 115, 101, 32, 49, 57, 53, 13, 10, 13, 10, 52, 48, 53, 55, 13, 10, 66, 97, 115, 101, 108, 13, 10, 67, 72, 13, 10, 81, 82, 82, 13, 10, 50, 54, 55, 50, 55, 52, 48, 51, 53, 56, 49, 48, 49, 48, 52, 56, 51, 48, 48, 48, 57, 54, 51, 57, 52, 51, 48, 13, 10, 75, 100, 110, 114, 32, 57, 54, 51, 57, 52, 51, 44, 32, 48, 51, 53, 56, 49, 45, 48, 49, 48, 52, 56, 51, 48, 13, 10, 69, 80, 68,

byte array of the text:

text = u"""SPC
0200
1
CH043000523022244901H
S
Emil Frey Betriebs AG
Bahnhofstrasse 17

5745
Safenwil
CH







3228.50
CHF
S
Sixt rent a Car AG
Müllheimstrasse 195

4057
Basel
CH
QRR
267274035810104830009639430
Kdnr 963943, 03581-0104830
EPD
"""
print([int(v) for v in bytearray(text.encode('ISO-8859-1'))])

[83, 80, 67, 10, 48, 50, 48, 48, 10, 49, 10, 67, 72, 48, 52, 51, 48, 48, 48, 53, 50, 51, 48, 50, 50, 50, 52, 52, 57, 48, 49, 72, 10, 83, 10, 69, 109, 105, 108, 32, 70, 114, 101, 121, 32, 66, 101, 116, 114, 105, 101, 98, 115, 32, 65, 71, 10, 66, 97, 104, 110, 104, 111, 102, 115, 116, 114, 97, 115, 115, 101, 32, 49, 55, 10, 10, 53, 55, 52, 53, 10, 83, 97, 102, 101, 110, 119, 105, 108, 10, 67, 72, 10, 10, 10, 10, 10, 10, 10, 10, 51, 50, 50, 56, 46, 53, 48, 10, 67, 72, 70, 10, 83, 10, 83, 105, 120, 116, 32, 114, 101, 110, 116, 32, 97, 32, 67, 97, 114, 32, 65, 71, 10, 77, 252, 108, 108, 104, 101, 105, 109, 115, 116, 114, 97, 115, 115, 101, 32, 49, 57, 53, 10, 10, 52, 48, 53, 55, 10, 66, 97, 115, 101, 108, 10, 67, 72, 10, 81, 82, 82, 10, 50, 54, 55, 50, 55, 52, 48, 51, 53, 56, 49, 48, 49, 48, 52, 56, 51, 48, 48, 48, 57, 54, 51, 57, 52, 51, 48, 10, 75, 100, 110, 114, 32, 57, 54, 51, 57, 52, 51, 44, 32, 48, 51, 53, 56, 49, 45, 48, 49, 48, 52, 56, 51, 48, 10, 69, 80, 68, 10]

However, the UTF-8 byte array is different:

print([int(v) for v in bytearray(text.encode('UTF-8'))])

[83, 80, 67, 10, 48, 50, 48, 48, 10, 49, 10, 67, 72, 48, 52, 51, 48, 48, 48, 53, 50, 51, 48, 50, 50, 50, 52, 52, 57, 48, 49, 72, 10, 83, 10, 69, 109, 105, 108, 32, 70, 114, 101, 121, 32, 66, 101, 116, 114, 105, 101, 98, 115, 32, 65, 71, 10, 66, 97, 104, 110, 104, 111, 102, 115, 116, 114, 97, 115, 115, 101, 32, 49, 55, 10, 10, 53, 55, 52, 53, 10, 83, 97, 102, 101, 110, 119, 105, 108, 10, 67, 72, 10, 10, 10, 10, 10, 10, 10, 10, 51, 50, 50, 56, 46, 53, 48, 10, 67, 72, 70, 10, 83, 10, 83, 105, 120, 116, 32, 114, 101, 110, 116, 32, 97, 32, 67, 97, 114, 32, 65, 71, 10, 77, 195, 188, 108, 108, 104, 101, 105, 109, 115, 116, 114, 97, 115, 115, 101, 32, 49, 57, 53, 10, 10, 52, 48, 53, 55, 10, 66, 97, 115, 101, 108, 10, 67, 72, 10, 81, 82, 82, 10, 50, 54, 55, 50, 55, 52, 48, 51, 53, 56, 49, 48, 49, 48, 52, 56, 51, 48, 48, 48, 57, 54, 51, 57, 52, 51, 48, 10, 75, 100, 110, 114, 32, 57, 54, 51, 57, 52, 51, 44, 32, 48, 51, 53, 56, 49, 45, 48, 49, 48, 52, 56, 51, 48, 10, 69, 80, 68, 10]
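The two arrays differ only at the single non-ASCII character, "ü" (252 versus 195, 188). A minimal sketch that isolates this difference is shown below; the variable names are illustrative and not taken from the PR.

```python
# Compare how the only non-ASCII character in the payload is encoded.
ch = u"\u00fc"  # "ü", U+00FC

latin1_bytes = list(bytearray(ch.encode("ISO-8859-1")))
utf8_bytes = list(bytearray(ch.encode("UTF-8")))

print(latin1_bytes)  # [252]
print(utf8_bytes)    # [195, 188]

# A lone 0xFC byte is not a valid UTF-8 sequence, so decoding the raw
# payload as UTF-8 raises UnicodeDecodeError.
try:
    bytes(bytearray(latin1_bytes)).decode("UTF-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(valid_utf8)  # False
```

This is the same failure mode as the UnicodeDecodeError reported in #23728.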

The ISO standard states that storing a raw byte array is generally fine, but interpreting its encoding is left to the user. The alternative is to create the QR code in ECI mode, which records the encoding standard, but it seems that not all generators offer it:

In closed-system national or application-specific implementations of QR Code, an alternative 8-bit character set, for example as defined in an appropriate part of ISO/IEC 8859, may be specified for Byte mode. When an alternative character set is specified, however, the parties intending to read the QR Code 2005 symbols require to be notified of the applicable character set in the application specification or by bilateral agreement.

According to our docstring, OpenCV should return the result as a UTF-8 encoded string:

opencv/modules/objdetect/include/opencv2/objdetect/graphical_code_detector.hpp

Lines 30 to 37 in 3dcaf1f

/** @brief Decodes graphical code in image once it's found by the detect() method.

Returns UTF8-encoded output string or empty string if the code cannot be decoded.
@param img grayscale or color (BGR) image containing graphical code.
@param points Quadrangle vertices found by detect() method (or some other algorithm).
@param straight_code The optional output image containing binarized code, will be empty if not found.
*/
CV_WRAP std::string decode(InputArray img, InputArray points, OutputArray straight_code = noArray()) const;

@opencv-alalek, perhaps I misunderstood the question. Do you mean: can we apply the encoding right in the loop, without the checkUTF8 method?
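For illustration, the check-then-convert idea under discussion can be sketched in Python. This is not the code from the patch: `is_valid_utf8` and `bytes_mode_to_text` are hypothetical helpers, and the fallback to ISO-8859-1 is an assumption based on the payload shown above.

```python
def is_valid_utf8(payload):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        payload.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def bytes_mode_to_text(payload):
    """Decode a Byte-mode payload (hypothetical helper).

    Valid UTF-8 is taken as-is. Anything else is treated as ISO-8859-1,
    where each byte maps to the code point with the same value.
    """
    if is_valid_utf8(payload):
        return payload.decode("utf-8")
    return payload.decode("iso-8859-1")


# "Müllheim" as stored in the QR code from the issue (ISO-8859-1 bytes).
raw = bytes(bytearray([77, 252, 108, 108, 104, 101, 105, 109]))
text = bytes_mode_to_text(raw)
utf8_out = list(bytearray(text.encode("utf-8")))
print(utf8_out)  # [77, 195, 188, 108, 108, 104, 101, 105, 109]
```

In this sketch, dropping the validity check would push payloads that are already UTF-8 through the ISO-8859-1 path as well, turning every multi-byte character into several unrelated characters.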

Without checkUTF8, the following tests fail:

[ FAILED ] Objdetect_QRCode.regression/24, where GetParam() = "russian.jpg"
[ FAILED ] Objdetect_QRCode.regression/25, where GetParam() = "kanji.jpg"
[ FAILED ] Objdetect_QRCode_Multi.regression/6, where GetParam() = ("4_qrcodes.png", "contours_based")
[ FAILED ] Objdetect_QRCode_Multi.regression/7, where GetParam() = ("4_qrcodes.png", "aruco_based")
[ FAILED ] Objdetect_QRCode_Multi.regression/8, where GetParam() = ("5_qrcodes.png", "contours_based")
[ FAILED ] Objdetect_QRCode_Multi.regression/12, where GetParam() = ("7_qrcodes.png", "contours_based")
[ FAILED ] Objdetect_QRCode_Multi.regression/13, where GetParam() = ("7_qrcodes.png", "aruco_based")

Code for proper handling of the data type and ECI information is completely missing.
Details: https://en.wikipedia.org/wiki/Extended_Channel_Interpretation

Unfortunately, it sometimes requires code-page maps.

P.S. Kanji is not properly handled (UTF-8 conversion is still required)
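As a rough sketch of what ECI-aware decoding involves, the snippet below maps a few ECI assignment numbers to Python codec names. The table is deliberately partial and `decode_with_eci` is a hypothetical name. Only 25 (UTF-16BE) is referenced in this PR; the other entries are commonly cited assignments and should be checked against the ECI register before use.

```python
# Partial, illustrative mapping from ECI assignment number to a Python codec.
ECI_TO_CODEC = {
    3: "iso-8859-1",
    20: "shift_jis",
    25: "utf-16-be",
    26: "utf-8",
    27: "ascii",
}


def decode_with_eci(payload, eci=None):
    """Decode payload bytes using the codec selected by the ECI number.

    Returns None when the ECI number is not in the table, so the caller
    can decide how to handle an unsupported encoding.
    """
    if eci is None:
        eci = 3  # assumption: treat a missing ECI as ISO-8859-1
    codec = ECI_TO_CODEC.get(eci)
    if codec is None:
        return None
    return payload.decode(codec)


sample = u"M\u00fcll"
for eci in (3, 25, 26):
    encoded = sample.encode(ECI_TO_CODEC[eci])
    # Each encoding yields different bytes but the same decoded text.
    print(eci, list(bytearray(encoded)), decode_with_eci(encoded, eci) == sample)
```

Python ships these codecs, which hides the cost; a C++ implementation would need its own conversion tables for every non-Unicode entry.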

Agreed, I wanted to take a look at the Kanji test later too.

opencv-alalek

API should be extended to return metadata (ECI) for decoded streams.

asmorkalov · 2023-10-12T06:57:24Z

modules/objdetect/src/qrcode.cpp

+                }
+                result_info.assign((const char*)qr_code_data.payload, qr_code_data.payload_len);
+            } else if (qr_code_data.eci == 25/*ECI_UTF_16BE*/) {
+                CV_LOG_INFO(NULL, "QR: UTF-16BE ECI is not supported");


I propose to make it CV_LOG_WARNING. INFO is not printed in regular builds.

We should not spam with that message. The QR detector is usually called for each frame.
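One common way to keep such a message visible without repeating it on every frame is to emit it only the first time it occurs. A minimal Python sketch of that pattern follows; `warn_once` is a hypothetical helper for illustration, not an OpenCV API.

```python
import logging

logger = logging.getLogger("qr")
_already_warned = set()


def warn_once(key, message):
    """Log a warning the first time `key` is seen; return True if logged."""
    if key in _already_warned:
        return False
    _already_warned.add(key)
    logger.warning(message)
    return True


# Simulate the detector being called on three consecutive frames.
results = [warn_once("eci-25", "QR: UTF-16BE ECI is not supported") for _ in range(3)]
print(results)  # [True, False, False]
```

Keying the set by the condition rather than by the message text lets different unsupported encodings each produce one warning.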

asmorkalov

👍


dkurt added the category: python bindings label Oct 2, 2023

dkurt changed the title ~~Decode string as raw unicode if UTF-8 failed~~ Decode Python string as raw unicode if UTF-8 failed Oct 2, 2023

dkurt force-pushed the py_return_non_utf8_string branch from cf6fbaa to 5d46532 Compare October 2, 2023 13:07

dkurt mentioned this pull request Oct 2, 2023 in "wechat qrcode UnicodeDecodeError in python3.6.9" opencv/opencv_contrib#2852 (Open)

dkurt force-pushed the py_return_non_utf8_string branch 3 times, most recently from 1d2404c to 609a3ed Compare October 2, 2023 14:28

asmorkalov requested review from VadimLevin and opencv-alalek October 2, 2023 16:16

dkurt force-pushed the py_return_non_utf8_string branch 2 times, most recently from 99aefb8 to 03e470e Compare October 2, 2023 16:47

opencv-alalek reviewed Oct 3, 2023

modules/python/src2/cv2_convert.cpp (outdated, resolved)

dkurt marked this pull request as draft October 3, 2023 03:48

dkurt changed the title ~~Decode Python string as raw unicode if UTF-8 failed~~ Encode QR code data to UTF-8 Oct 3, 2023

dkurt marked this pull request as ready for review October 3, 2023 11:38

dkurt force-pushed the py_return_non_utf8_string branch from e0a9985 to 2f87aa4 Compare October 3, 2023 11:41

opencv-alalek requested review from SinM9 and removed request for VadimLevin October 3, 2023 13:41

dkurt force-pushed the py_return_non_utf8_string branch from c93b5f5 to 5db0155 Compare October 3, 2023 17:11

opencv-alalek reviewed Oct 3, 2023

modules/objdetect/misc/python/test/test_qrcode_detect.py (outdated, resolved)

dkurt added 4 commits October 4, 2023 09:11

3e41c2a Decode string as raw unicode if UTF-8 failed
20bf6c4 Skip test for python2
ece682f Encode QR code output to UTF-8
3890347 Encode UTF-8 only for Bytes mode

dkurt force-pushed the py_return_non_utf8_string branch from 5db0155 to 008ad44 Compare October 4, 2023 06:23

dkurt mentioned this pull request Oct 4, 2023 in "QR code detecting in python with OpenCV raises UnicodeDecodeError: 'utf-8' codec can't decode byte" #23728 (Closed, 4 tasks)

979c64f Add C++ test and fix warnings

dkurt force-pushed the py_return_non_utf8_string branch from 008ad44 to 979c64f Compare October 4, 2023 07:04

dkurt added category: objdetect and removed category: python bindings labels Oct 5, 2023

opencv-alalek reviewed Oct 5, 2023

dkurt marked this pull request as draft October 5, 2023 14:42

opencv-pushbot force-pushed the py_return_non_utf8_string branch from 5fb98ba to c0aaad8 Compare October 6, 2023 05:39

dkurt marked this pull request as ready for review October 6, 2023 06:17

f79bf88 objdetect(qr): properly handle data type information

opencv-pushbot force-pushed the py_return_non_utf8_string branch from c0aaad8 to f79bf88 Compare October 6, 2023 09:55

opencv-alalek added bug test labels Oct 6, 2023

opencv-alalek added this to the 4.9.0 milestone Oct 6, 2023

opencv-alalek approved these changes Oct 6, 2023

dkurt requested a review from asmorkalov October 10, 2023 09:34

SinM9 approved these changes Oct 10, 2023

asmorkalov reviewed Oct 12, 2023

asmorkalov approved these changes Oct 12, 2023

asmorkalov merged commit 5ddf3de into opencv:4.x Oct 12, 2023
24 checks passed

asmorkalov mentioned this pull request Oct 17, 2023 in "(5.x) Merge 4.x" #24416 (Merged)

dkurt deleted the py_return_non_utf8_string branch October 18, 2023 19:18

dkurt mentioned this pull request Oct 18, 2023 in "Consider QRCode ECI encoding" #24426 (Draft, 6 tasks)
