Piwik fails to properly decode and store some chinese keywords (eg. from baidu.com) #589

Closed
mattab opened this Issue · 3 comments

2 participants

@mattab
Owner

Baidu is the biggest search engine in China, and Piwik currently fails to detect keywords from Baidu.

Example queries:

```
http://www.baidu.com/s?lm=0&si=&rn=10&ie=gb2312&ct=0&wd=%BF%DA%D3%EF+%CD%F2%C4%DC&pn=10&ver=0&cl=3&uim=0&usm=0
http://www.baidu.com/s?kw=&sc=web&cl=3&tn=sitehao123&ct=0&rn=&lm=&ie=gb2312&rs2=&myselectvalue=&f=&pv=&z=&from=&word=%B7%E8%BF%F1%CB%B5%D3%A2%D3%EF+%D4%DA%CF%DF%B9%DB%BF%B4
http://www.baidu.com/s?wd=%C1%F7%D0%D0%C3%C0%D3%EF%CF%C2%D4%D8
http://www.baidu.com/s?wd=%C1%F7%D0%D0%C3%C0%D3%EF+%CF%C2%D4%D8&lm=0&si=&rn=10&ie=gb2312&ct=0&cl=3&f=1&rsp=3&oq=VOA%C1%F7%D0%D0%C3%C0%D3%EF
http://web.gougou.com/search?search=%e6%b5%81%e8%a1%8c%e7%be%8e%e8%af%ad%20%e4%b8%8b%e8%bd%bd
```

Resolving this issue involves writing unit tests to cover these bits of code. We should also check whether the code path around line 715 in core/Tracker/Visit.php is useful; if not, fix it or delete it.

@robocoder

The problems with Baidu might be more complex than they appear at first glance:

  • the second url uses the variable name “word” instead of “wd”
  • gb2312 is an encoding; are the keywords not utf-8?
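
To illustrate the two points above, here is a minimal Python sketch (not Piwik's actual PHP code) of one way to recover such keywords. The function name `extract_keyword`, the `KEYWORD_PARAMS` tuple, and the fallback order are assumptions made for this example: it honours the `ie` parameter when present, otherwise tries strict UTF-8 and then falls back to GB18030, a superset of GB2312.

```python
from urllib.parse import urlsplit, unquote_to_bytes

# Hypothetical list of query parameters that may carry the keyword.
KEYWORD_PARAMS = ("wd", "word", "search")


def extract_keyword(referrer_url):
    """Return the search keyword as a Unicode string, or None if not found."""
    # Keep the raw percent-encoded values: decoding must happen on bytes,
    # because the charset is not known until the whole query has been read.
    raw = {}
    for pair in urlsplit(referrer_url).query.split("&"):
        name, _, value = pair.partition("=")
        if name:
            raw.setdefault(name, value)

    # Candidate charsets, most trusted first (assumed order for this sketch).
    candidates = [raw["ie"]] if raw.get("ie") else []
    candidates += ["utf-8", "gb18030"]

    for name in KEYWORD_PARAMS:
        value = raw.get(name)
        if not value:
            continue
        data = unquote_to_bytes(value.replace("+", " "))
        for charset in candidates:
            try:
                return data.decode(charset)
            except (UnicodeDecodeError, LookupError):
                continue  # wrong or unknown charset: try the next one
    return None


if __name__ == "__main__":
    print(extract_keyword(
        "http://www.baidu.com/s?wd=%C1%F7%D0%D0%C3%C0%D3%EF%CF%C2%D4%D8"))
```

Trying strict UTF-8 before GB18030 is only a heuristic: some GB2312 byte sequences may also be valid UTF-8, so a declared `ie` value should take precedence when it is available.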
@mattab
Owner

Also see #435, which is closely related.

@mattab
Owner

(In 1014) Cleaning up the search engine parsing code, adding tests, recording UTF8 keywords in the DB rather than encoded (as tables are now utf8, refs #5730):

  • adding tests in url.test.php and fixed double encoding in some edge cases
  • fixed #589 Piwik fails to properly decode and store some chinese keywords (eg. from baidu.com)
  • fixed #435 Exotic encoded keywords should be stored as utf-8 in the DB
  • refs #575 hopefully fixed, will give it a few days of tests on piwik.org

@mattab mattab added this to the RobotRock milestone
This issue was closed.