Replace all utf8 4-byte characters in tracked urls with � #8765

Merged
mattab merged 2 commits into master from handle_utf_4bytes_in_urls on Sep 15, 2015

4 participants
@sgiehl
Member

sgiehl commented Sep 11, 2015

As described in #7766, URLs containing 4-byte UTF-8 characters currently fail to be tracked.
As long as we don't switch the table layouts to utf8mb4, this "hack" makes it possible to track those URLs, even though the stored URLs may be wrong afterwards, because some characters get replaced.
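
To illustrate the idea described above, here is a minimal Python sketch. The actual change in this pull request lives in Piwik's PHP code, which is not shown in this thread, so this is not the merged implementation. It assumes that a "4-byte character" means any code point above U+FFFF, and the function name `replace_4byte_chars` is hypothetical.

```python
import re

# Code points above U+FFFF need four bytes when encoded as UTF-8.
_FOUR_BYTE = re.compile("[\U00010000-\U0010FFFF]")

# U+FFFD REPLACEMENT CHARACTER, which needs only three bytes in UTF-8.
REPLACEMENT = "\uFFFD"


def replace_4byte_chars(url: str) -> str:
    """Return url with every 4-byte UTF-8 character replaced by U+FFFD."""
    return _FOUR_BYTE.sub(REPLACEMENT, url)


# Example URL (made up for illustration) containing one 4-byte character.
cleaned = replace_4byte_chars("http://example.org/\U0001F3E1/page")
print(cleaned.encode("utf-8"))
```

The replacement loses information, which matches the caveat in the description: the URL can be stored, but it no longer matches the URL that was visited.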

@sgiehl sgiehl added the Needs Review label Sep 11, 2015

@mattab mattab modified the milestone: 2.15.0 Sep 14, 2015

@mattab

Member

mattab commented Sep 14, 2015

IMHO, this is a "nice hack", if one can write this.

Feedback:

  • Can you add such a URL to one system test (in a new test case or by editing an existing one)? It would be good to have both an integration and a system test covering this use case.
@mattab

Member

mattab commented Sep 15, 2015

@sgiehl I've added a system test; once the build is green I will merge it 👍

mattab pushed a commit that referenced this pull request Sep 15, 2015

Matthieu Aubry
Merge pull request #8765 from piwik/handle_utf_4bytes_in_urls
Replace all utf8 4-byte characters in tracked urls with �

@mattab mattab merged commit fe51956 into master Sep 15, 2015

0 of 3 checks passed

Scrutinizer: Created
continuous-integration/travis-ci/pr: The Travis CI build is in progress
continuous-integration/travis-ci/push: The Travis CI build is in progress

@mattab mattab deleted the handle_utf_4bytes_in_urls branch Sep 15, 2015

@sgiehl

Member

sgiehl commented Sep 15, 2015

👍

@mattab mattab added the c: Usability label Oct 13, 2015

@sgiehl sgiehl referenced this pull request Oct 22, 2015

Closed

signs instead of unicode #9078

@saqib16

saqib16 commented Apr 12, 2017

Hi,

We have updated to Piwik 3.0.2 and PHP 7.0.16 and are getting the following error:

Error in Piwik (tracker): Error query: SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xD0_\xD0\xBB\xD0\xB5...' for column 'name' at row 1 In query: INSERT INTO piwik_log_action (name, hash, type, url_prefix) VALUES (?,CRC32(?),?,?) Parameters: array ( 0 => '/products/Счетчики Ð_лектроÑ_нергии/?cid=91701', 1 => '/products/Счетчики Ð_лектроÑ_нергии/?cid=91701', 2 => 1, 3 => 0, )

Other characters are shown fine in the dashboard, but this error appears in the PHP error log.
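
The byte sequence quoted in this error, `\xD0_\xD0\xBB\xD0\xB5`, deserves a closer look. One hedged reading: `0xD0` opens a two-byte UTF-8 sequence, but the next byte is a plain underscore (`0x5F`) rather than a continuation byte, so the value looks like invalid UTF-8 rather than a 4-byte character. How the URL ended up in that state cannot be determined from this thread. The Python sketch below only inspects the bytes shown in the log.

```python
# Bytes copied from the "Incorrect string value" message above.
raw = b"\xD0_\xD0\xBB\xD0\xB5"

try:
    raw.decode("utf-8")
    is_valid = True
except UnicodeDecodeError as exc:
    is_valid = False
    print(f"invalid UTF-8 at byte offset {exc.start}: {exc.reason}")

# Decoding leniently shows which parts of the sequence survive:
# undecodable bytes become U+FFFD, everything else is kept.
lenient = raw.decode("utf-8", errors="replace")
print(ascii(lenient))
```

If that reading is right, replacing 4-byte characters would not help with this particular request, because the string contains no 4-byte character to replace.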

@gmariani

gmariani commented May 9, 2018

Still having this issue with 3.4.0 on PHP 7.2

[09-May-2018 14:11:33 UTC] Error in Matomo: Your Matomo version 3.4.0 is up to date.
[09-May-2018 14:11:43 UTC] Error in Matomo (tracker): Error query: SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xF0\x9F\x8F\xA1 C...' for column 'name' at row 1 In query: INSERT INTO piwik_log_action (name, hash, type, url_prefix) VALUES (?,CRC32(?),?,?) Parameters: array ( 0 => '� Chandler Arizona Luxury Homes | [John Cunningham 2018]', 1 => '� Chandler Arizona Luxury Homes | [John Cunningham 2018]', 2 => 4, 3 => NULL, )

@sgiehl

Member

sgiehl commented May 9, 2018

@gmariani To be able to save UTF-8 4-byte characters correctly, we need to do a schema change. As this will potentially affect very big tables, we can only do that in a major release. See #8790

@gmariani

gmariani commented May 9, 2018

OK, I get that the collation/schema needs to change to SUPPORT mb4. But I thought the solution was to modify the text so as to avoid the need for mb4 in the first place, i.e. that the emoji would be replaced with the � character? That would mean it's not doing its job properly. Is that incorrect?

Looking at the error message more closely, I do see the � in there. Maybe it's not properly replacing the entirety of the emoji?
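
One way to test that hunch is to decode the bytes quoted in the error message. The sketch below is in Python rather than Matomo's PHP. It suggests that `\xF0\x9F\x8F\xA1` is one complete 4-byte character, so the value that reached the database still contained the original emoji; the `�` in the `Parameters` dump may simply be how the log rendered it. This is an interpretation of the log, not a confirmed diagnosis.

```python
# Bytes copied from the "Incorrect string value" message above.
raw = b"\xF0\x9F\x8F\xA1"

# The four bytes decode cleanly, so they form valid UTF-8.
text = raw.decode("utf-8")
code_points = [f"U+{ord(ch):04X}" for ch in text]

# Number of characters the four bytes represent, and their code points.
print(len(text), code_points)

# Byte length of U+FFFD, the character the replacement is meant to insert.
print(len("\uFFFD".encode("utf-8")))
```

Had the replacement been applied to this value, the leading bytes in the error would be expected to be those of U+FFFD (`\xEF\xBF\xBD`) rather than `\xF0\x9F\x8F\xA1`.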

@mattab

Member

mattab commented May 9, 2018

@gmariani As this is not supposed to trigger an error, could you please open a new issue (this one is already closed) and paste the piwik.php?.... request that creates this error? We will make sure to address it. Thanks
