Exotic encoded keywords should be stored as utf-8 in the DB #435

Closed
anonymous-piwik-user opened this Issue · 3 comments

Anonymous Piwik user

Currently, keywords are stored in their URL-encoded form in the MySQL log table.

Code is around: https://github.com/piwik/piwik/blob/master/core/Tracker/Visit.php#L693

For some search engines, like yandex.ru, keywords are encoded in the URL using a charset other than utf-8. For each search engine that encodes keywords this way, Piwik should know which encoding is used, and it should only store valid utf-8 keywords in the log table.

This would fix two bugs:

  • Russian keywords from the most popular Russian search engine, ‘yandex.ru’, are shown as a series of question marks in the UI.
  • Searching for a keyword using Piwik’s in-table search would also work for exotic keywords. Currently, if you search for "", Piwik looks for that keyword in the list, but it does not look for the encoded value of the keyword. It expects the keyword to be stored in the right format.
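The per-engine decoding proposed above could be sketched roughly as follows. This is a minimal Python illustration, not Piwik's PHP code: the `ENGINE_CHARSETS` table, its single `windows-1251` entry for yandex.ru, and the `keyword_to_utf8` helper are all assumptions made for the example.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical per-engine fallback charsets. These entries are assumptions
# for illustration, not Piwik's actual search engine definitions.
ENGINE_CHARSETS = {
    "yandex.ru": ["windows-1251"],
}


def keyword_to_utf8(raw_keyword: str, engine_host: str) -> str:
    """Decode a percent-encoded keyword into text that can be stored as utf-8."""
    # '+' stands for a space in query strings; undo it before percent-decoding
    # so that a literal '%2B' still decodes to '+'.
    data = unquote_to_bytes(raw_keyword.replace("+", " "))
    # Try strict utf-8 first, then the charsets registered for this engine.
    for charset in ["utf-8"] + ENGINE_CHARSETS.get(engine_host, []):
        try:
            return data.decode(charset)
        except UnicodeDecodeError:
            continue
    # Last resort: keep what is decodable and mark the rest with U+FFFD.
    return data.decode("utf-8", errors="replace")


print(keyword_to_utf8("%EF%F0%E8%EC%E5%F0", "yandex.ru"))       # пример
print(keyword_to_utf8("%D0%BA%D0%B0%D0%BA+juno", "yandex.ru"))  # как juno
```

Trying strict utf-8 first means engines that already send utf-8 need no table entry; the table is only consulted when the bytes are not valid utf-8.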

If you encounter this bug, please report an example URL of a search from a search engine that doesn’t work well with Piwik. We need more examples to solve this bug. Thanks!

Keywords: 0.2.23

Anonymous Piwik user

The first row is shown as question marks. The next two rows are displayed correctly.

```
INSERT INTO `piwik_log_visit` VALUES (190, 1, '22:06:13', 'cefdd83aa209ddb0629b69f93bd21833', 1, '2008-11-25 20:09:37', '2008-11-25 20:53:31', '2008-11-25', 844, 1, 13, 2634, 2, 'Yandex', 'http://yandex.ru/yandsearch?rpt=rad&text=%F1%EF%EE%F0%F2%E7%E4%F0%E0%E2', '%f1%ef%ee%f0%f2%e7%e4%f0%e0%e2', '2a4878b5dc06902b8d797475aca2cf88', 'WXP', 'FF', '2.0', '1280x1024', 1, 1, 1, 0, 0, 0, 1, 1, 1334750774, 'ru,en-us;q=0.7,en;q=', 'ru', 'asi', 'rus-com.net');
INSERT INTO `piwik_log_visit` VALUES (421, 2, '11:22:44', 'de7439d63b66c5cc3ba4be30d202d5e1', 0, '2008-11-26 09:24:52', '2008-11-26 09:25:22', '2008-11-26', 1695, 1695, 3, 30, 2, 'Yandex', 'http://yandex.ru/yandsearch?text=%D0%BA%D0%B0%D0%BA+%D0%B7%D0%B0%D0%BA%D0%B0%D0%B7%D1%8B%D0%B2%D0%B0%D1%82%D1%8C+%D0%BD%D0%B0+juno&stpar2=%2Fh1%2Ftm24%2Fs4&stpar4=%2Fs4&stpar1=%2Fu0', '%d0%ba%d0%b0%d0%ba+%d0%b7%d0%b0%d0%ba%d0%b0%d0%b7%d1%8b%d0%b2%d0%b0%d1%82%d1%8c+%d0%bd%d0%b0+juno', '9a3f5eed8c75404aa5ee552dbc0eb69a', 'WXP', 'FF', '3.0', '1680x1050', 0, 1, 1, 0, 1, 1, 1, 1, 1048951584, 'ru,en-us;q=0.7,en;q=', 'ru', 'asi', 'ufacom.ru');
INSERT INTO `piwik_log_visit` VALUES (623, 2, '19:02:07', 'f195e9f77e961b4471b2e35682b7d00d', 0, '2008-11-26 19:01:10', '2008-11-26 19:01:10', '2008-11-26', 2157, 2157, 1, 10, 2, 'Yandex', 'http://yandex.ru/yandsearch?text=%D1%87%D0%B0%D1%81%D1%82%D0%BE%D1%82%D0%B0+%D1%80%D0%B0%D1%81%D0%BF%D0%B0%D0%B4%D0%B0+%D1%81%D1%82%D0%B5%D0%BA%D0%BB%D0%B0&stpar2=%2Fh1%2Ftm11%2Fs1&stpar4=%2Fs1&stpar1=%2Fu0', '%d1%87%d0%b0%d1%81%d1%82%d0%be%d1%82%d0%b0+%d1%80%d0%b0%d1%81%d0%bf%d0%b0%d0%b4%d0%b0+%d1%81%d1%82%d0%b5%d0%ba%d0%bb%d0%b0', '6e3337843122cd43da5e0f7e35806cdc', 'WXP', 'IE', '7.0', '1600x900', 0, 1, 1, 0, 0, 1, 1, 1, 1439330483, 'ru', 'ru', 'asi', 'lianet.ru');
```
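To see how the first row differs from the other two, the stored keyword values can be percent-decoded and checked for utf-8 validity. The snippet below is a small Python sketch for diagnosis only; the `is_valid_utf8` helper is a name introduced here, not part of Piwik.

```python
from urllib.parse import unquote_to_bytes

# Keyword column values copied from the three rows above.
stored_keywords = [
    "%f1%ef%ee%f0%f2%e7%e4%f0%e0%e2",
    "%d0%ba%d0%b0%d0%ba+%d0%b7%d0%b0%d0%ba%d0%b0%d0%b7%d1%8b%d0%b2%d0%b0%d1%82%d1%8c+%d0%bd%d0%b0+juno",
    "%d1%87%d0%b0%d1%81%d1%82%d0%be%d1%82%d0%b0+%d1%80%d0%b0%d1%81%d0%bf%d0%b0%d0%b4%d0%b0+%d1%81%d1%82%d0%b5%d0%ba%d0%bb%d0%b0",
]


def is_valid_utf8(percent_encoded: str) -> bool:
    """Return True if the percent-decoded bytes form valid utf-8."""
    data = unquote_to_bytes(percent_encoded.replace("+", " "))
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


results = [is_valid_utf8(k) for k in stored_keywords]
print(results)  # [False, True, True]
```

Only the first keyword fails the utf-8 check, which matches the row that shows up as question marks. Its bytes do decode as windows-1251 (to "спортздрав"), which suggests, without proving, that this particular search was submitted in that charset.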

Anonymous Piwik user

I don’t think the problem is tied to the referring search engine: the same thing happens for visits from google.com when the keywords are Georgian. The most popular Georgian search engines are google.ge and holmes.ge.

Matthieu Aubry
Owner

(In 1014)

  • cleaning up the search engine parsing code, adding tests, recording UTF8 keywords in the DB rather than encoded (as tables are now utf8, refs #5730)
  • adding tests in url.test.php and fixed double encoding in some edge cases
  • fixed #589 Piwik fails to properly decode and store some chinese keywords (eg. from baidu.com)
  • fixed #435 Exotic encoded keywords should be stored as utf-8 in the DB
  • refs #575 hopefully fixed, will give it a few days of tests on piwik.org

Anonymous Piwik user anonymous-piwik-user added this to the RobotRock milestone
This issue was closed.