Encode QR code data to UTF-8 #24350
modules/objdetect/src/qrcode.cpp
@@ -2760,6 +2802,9 @@ bool QRDecode::decodingProcess()
{
result_info += qr_code_data.payload[i];
}
if (qr_code_data.data_type == QUIRC_DATA_TYPE_BYTE && !checkUTF8(result_info)) { |
The data type check should go before the first loop, on line 2801.
Do we really need checkUTF8? Which test cases fail without it?
A QR code from #23728 was created in Byte mode, but its byte sequence is not valid UTF-8 (probably the generator just stored a raw byte array of the Unicode string):
QR code content (qr_code_data.payload):
83, 80, 67, 13, 10, 48, 50, 48, 48, 13, 10, 49, 13, 10, 67, 72, 48, 52, 51, 48, 48, 48, 53, 50, 51, 48, 50, 50, 50, 52, 52, 57, 48, 49, 72, 13, 10, 83, 13, 10, 69, 109, 105, 108, 32, 70, 114, 101, 121, 32, 66, 101, 116, 114, 105, 101, 98, 115, 32, 65, 71, 13, 10, 66, 97, 104, 110, 104, 111, 102, 115, 116, 114, 97, 115, 115, 101, 32, 49, 55, 13, 10, 13, 10, 53, 55, 52, 53, 13, 10, 83, 97, 102, 101, 110, 119, 105, 108, 13, 10, 67, 72, 13, 10, 13, 10, 13, 10, 13, 10, 13, 10, 13, 10, 13, 10, 13, 10, 51, 50, 50, 56, 46, 53, 48, 13, 10, 67, 72, 70, 13, 10, 83, 13, 10, 83, 105, 120, 116, 32, 114, 101, 110, 116, 32, 97, 32, 67, 97, 114, 32, 65, 71, 32, 13, 10, 77, 252, 108, 108, 104, 101, 105, 109, 115, 116, 114, 97, 115, 115, 101, 32, 49, 57, 53, 13, 10, 13, 10, 52, 48, 53, 55, 13, 10, 66, 97, 115, 101, 108, 13, 10, 67, 72, 13, 10, 81, 82, 82, 13, 10, 50, 54, 55, 50, 55, 52, 48, 51, 53, 56, 49, 48, 49, 48, 52, 56, 51, 48, 48, 48, 57, 54, 51, 57, 52, 51, 48, 13, 10, 75, 100, 110, 114, 32, 57, 54, 51, 57, 52, 51, 44, 32, 48, 51, 53, 56, 49, 45, 48, 49, 48, 52, 56, 51, 48, 13, 10, 69, 80, 68,
Byte array of the same text encoded as ISO-8859-1:
text = u"""
SPC
0200
1
CH043000523022244901H
S
Emil Frey Betriebs AG
Bahnhofstrasse 17
5745
Safenwil
CH
3228.50
CHF
S
Sixt rent a Car AG
Müllheimstrasse 195
4057
Basel
CH
QRR
267274035810104830009639430
Kdnr 963943, 03581-0104830
EPD
"""
print([int(v) for v in bytearray(text.encode('ISO-8859-1'))])
[83, 80, 67, 10, 48, 50, 48, 48, 10, 49, 10, 67, 72, 48, 52, 51, 48, 48, 48, 53, 50, 51, 48, 50, 50, 50, 52, 52, 57, 48, 49, 72, 10, 83, 10, 69, 109, 105, 108, 32, 70, 114, 101, 121, 32, 66, 101, 116, 114, 105, 101, 98, 115, 32, 65, 71, 10, 66, 97, 104, 110, 104, 111, 102, 115, 116, 114, 97, 115, 115, 101, 32, 49, 55, 10, 10, 53, 55, 52, 53, 10, 83, 97, 102, 101, 110, 119, 105, 108, 10, 67, 72, 10, 10, 10, 10, 10, 10, 10, 10, 51, 50, 50, 56, 46, 53, 48, 10, 67, 72, 70, 10, 83, 10, 83, 105, 120, 116, 32, 114, 101, 110, 116, 32, 97, 32, 67, 97, 114, 32, 65, 71, 10, 77, 252, 108, 108, 104, 101, 105, 109, 115, 116, 114, 97, 115, 115, 101, 32, 49, 57, 53, 10, 10, 52, 48, 53, 55, 10, 66, 97, 115, 101, 108, 10, 67, 72, 10, 81, 82, 82, 10, 50, 54, 55, 50, 55, 52, 48, 51, 53, 56, 49, 48, 49, 48, 52, 56, 51, 48, 48, 48, 57, 54, 51, 57, 52, 51, 48, 10, 75, 100, 110, 114, 32, 57, 54, 51, 57, 52, 51, 44, 32, 48, 51, 53, 56, 49, 45, 48, 49, 48, 52, 56, 51, 48, 10, 69, 80, 68, 10]
However, the UTF-8 byte array is different:
print([int(v) for v in bytearray(text.encode('UTF-8'))])
[83, 80, 67, 10, 48, 50, 48, 48, 10, 49, 10, 67, 72, 48, 52, 51, 48, 48, 48, 53, 50, 51, 48, 50, 50, 50, 52, 52, 57, 48, 49, 72, 10, 83, 10, 69, 109, 105, 108, 32, 70, 114, 101, 121, 32, 66, 101, 116, 114, 105, 101, 98, 115, 32, 65, 71, 10, 66, 97, 104, 110, 104, 111, 102, 115, 116, 114, 97, 115, 115, 101, 32, 49, 55, 10, 10, 53, 55, 52, 53, 10, 83, 97, 102, 101, 110, 119, 105, 108, 10, 67, 72, 10, 10, 10, 10, 10, 10, 10, 10, 51, 50, 50, 56, 46, 53, 48, 10, 67, 72, 70, 10, 83, 10, 83, 105, 120, 116, 32, 114, 101, 110, 116, 32, 97, 32, 67, 97, 114, 32, 65, 71, 10, 77, 195, 188, 108, 108, 104, 101, 105, 109, 115, 116, 114, 97, 115, 115, 101, 32, 49, 57, 53, 10, 10, 52, 48, 53, 55, 10, 66, 97, 115, 101, 108, 10, 67, 72, 10, 81, 82, 82, 10, 50, 54, 55, 50, 55, 52, 48, 51, 53, 56, 49, 48, 49, 48, 52, 56, 51, 48, 48, 48, 57, 54, 51, 57, 52, 51, 48, 10, 75, 100, 110, 114, 32, 57, 54, 51, 57, 52, 51, 44, 32, 48, 51, 53, 56, 49, 45, 48, 49, 48, 52, 56, 51, 48, 10, 69, 80, 68, 10]
The ISO standard states that storing a raw byte array is generally fine, but the choice of encoding is up to the user. The alternative is to create the QR code in ECI mode, which records the encoding standard, but it seems that not all generators offer it:
In closed-system national or application-specific implementations of QR Code, an alternative 8-bit character set, for example as defined in an appropriate part of ISO/IEC 8859, may be specified for Byte mode. When an alternative character set is specified, however, the parties intending to read the QR Code 2005 symbols require to be notified of the applicable character set in the application specification or by bilateral agreement.
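The difference between the two dumps above comes down to the single character "ü". A minimal sketch, using only the four bytes for "Müll" copied from the payload (the variable names are illustrative and not part of the patch):

```python
# Bytes 77, 252, 108, 108 are copied from the payload dump above ("Müll").
latin1_payload = bytes([77, 252, 108, 108])

# 252 (0xFC) can never appear in well-formed UTF-8, so a strict decode fails.
try:
    latin1_payload.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False

# Interpreting the bytes as ISO-8859-1 and re-encoding yields the UTF-8 form.
text = latin1_payload.decode("iso-8859-1")
utf8_payload = text.encode("utf-8")

print(is_valid_utf8)       # False
print(list(utf8_payload))  # [77, 195, 188, 108, 108]
```

The result matches the `77, 195, 188, 108, 108` run in the UTF-8 dump, which is what a reader expecting UTF-8 output would need.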
According to our docstring, OpenCV should return the result as a UTF-8 string:
opencv/modules/objdetect/include/opencv2/objdetect/graphical_code_detector.hpp
Lines 30 to 37 in 3dcaf1f
/** @brief Decodes graphical code in image once it's found by the detect() method.
Returns UTF8-encoded output string or empty string if the code cannot be decoded.
@param img grayscale or color (BGR) image containing graphical code.
@param points Quadrangle vertices found by detect() method (or some other algorithm).
@param straight_code The optional output image containing binarized code, will be empty if not found.
*/
CV_WRAP std::string decode(InputArray img, InputArray points, OutputArray straight_code = noArray()) const;
@opencv-alalek, perhaps I misunderstood the question. Do you mean: can we apply the encoding right in the loop, without the checkUTF8 method?
Without checkUTF8, the following tests fail:
[ FAILED ] Objdetect_QRCode.regression/24, where GetParam() = "russian.jpg"
[ FAILED ] Objdetect_QRCode.regression/25, where GetParam() = "kanji.jpg"
[ FAILED ] Objdetect_QRCode_Multi.regression/6, where GetParam() = ("4_qrcodes.png", "contours_based")
[ FAILED ] Objdetect_QRCode_Multi.regression/7, where GetParam() = ("4_qrcodes.png", "aruco_based")
[ FAILED ] Objdetect_QRCode_Multi.regression/8, where GetParam() = ("5_qrcodes.png", "contours_based")
[ FAILED ] Objdetect_QRCode_Multi.regression/12, where GetParam() = ("7_qrcodes.png", "contours_based")
[ FAILED ] Objdetect_QRCode_Multi.regression/13, where GetParam() = ("7_qrcodes.png", "aruco_based")
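A plausible reason for these failures, assuming the test images store text that is already UTF-8, is that transcoding every Byte-mode payload unconditionally double-encodes it. The sketch below contrasts a check-then-convert approach with an unconditional one. The function names `to_utf8` and `always_transcode` and the sample strings are hypothetical, and the ISO-8859-1 fallback is an assumption rather than the exact logic of the patch:

```python
def to_utf8(payload: bytes) -> bytes:
    """Return payload unchanged if it is valid UTF-8, else transcode it from ISO-8859-1."""
    try:
        payload.decode("utf-8")
        return payload
    except UnicodeDecodeError:
        return payload.decode("iso-8859-1").encode("utf-8")


def always_transcode(payload: bytes) -> bytes:
    """Treat every payload as ISO-8859-1, with no validity check."""
    return payload.decode("iso-8859-1").encode("utf-8")


already_utf8 = "Привет".encode("utf-8")               # hypothetical Cyrillic payload
latin1 = "Müllheimstrasse".encode("iso-8859-1")       # street name from the issue

print(to_utf8(already_utf8) == already_utf8)           # True: left untouched
print(to_utf8(latin1).decode("utf-8"))                 # Müllheimstrasse
print(always_transcode(already_utf8) == already_utf8)  # False: double-encoded
```

Note that the check is a heuristic: a Latin-1 payload that happens to form valid UTF-8 would be passed through unchanged.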
The code for proper handling of the data type and ECI information is completely missing.
Details: https://en.wikipedia.org/wiki/Extended_Channel_Interpretation
Unfortunately, it sometimes requires code-page maps.
P.S. Kanji is not properly handled (UTF-8 conversion is still required).
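To illustrate what ECI-aware decoding could look like, here is a minimal sketch that maps a few ECI assignment numbers to Python codecs. The value 25 for UTF-16BE appears in the diff further down; the values 3 (ISO-8859-1), 20 (Shift JIS) and 26 (UTF-8) are given to the best of my knowledge and should be checked against the ECI register. The table is deliberately incomplete, and `decode_with_eci` is a hypothetical helper, not an OpenCV function:

```python
# Partial, illustrative mapping from ECI assignment number to a Python codec.
ECI_TO_CODEC = {
    3: "iso-8859-1",
    20: "shift_jis",
    25: "utf-16-be",
    26: "utf-8",
}


def decode_with_eci(payload: bytes, eci: int) -> str:
    """Decode payload using the character set announced by the ECI number."""
    codec = ECI_TO_CODEC.get(eci)
    if codec is None:
        raise ValueError(f"unsupported ECI assignment number: {eci}")
    return payload.decode(codec)


print(decode_with_eci(bytes([77, 252, 108, 108]), 3))      # Müll
print(decode_with_eci("漢字".encode("shift_jis"), 20))      # 漢字
```

Python ships these code-page tables with its standard codecs; a C++ implementation would need to carry its own maps, which is the cost mentioned above.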
Agreed, I wanted to take a look at the Kanji test later too.
API should be extended to return metadata (ECI) for decoded streams.
}
result_info.assign((const char*)qr_code_data.payload, qr_code_data.payload_len);
} else if (qr_code_data.eci == 25/*ECI_UTF_16BE*/) {
CV_LOG_INFO(NULL, "QR: UTF-16BE ECI is not supported");
I propose to make it CV_LOG_WARNING. INFO is not printed in regular builds.
We should not spam the log with that message. The QR detector is usually called for each frame.
👍
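One common way to reconcile visibility with the per-frame concern is to emit a given warning only once. The following is a generic sketch using Python's standard logging module; `warn_once` and the key `"eci-25"` are hypothetical, and this is not what the patch does:

```python
import logging

logger = logging.getLogger("qr_sketch")
_warned = set()


def warn_once(key: str, message: str) -> bool:
    """Log message at WARNING level the first time key is seen; return True if logged."""
    if key in _warned:
        return False
    _warned.add(key)
    logger.warning(message)
    return True


# Simulate a detector that hits the unsupported ECI on three consecutive frames.
logged = [warn_once("eci-25", "QR: UTF-16BE ECI is not supported") for _ in range(3)]
print(logged)  # [True, False, False]
```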
Encode QR code data to UTF-8 opencv#24350

### Pull Request Readiness Checklist

**Merge with extra**: opencv/opencv_extra#1105

resolves opencv#23728

This is the first PR in a series. Here we just return raw Unicode. Later I will try to expand the QR code decoding methods to use the ECI assignment number and return a string with the proper encoding, not only UTF-8 or raw Unicode.

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake