search engine results encoding has changed #2761

anonymous-piwik-user opened this Issue Nov 5, 2011 · 8 comments



File: SearchEngines.php

Original (keywords are displayed with the wrong encoding):
'' => array('Mailru', 'q', 'search?q={k}', 'windows-1251'),

I changed to:
'' => array('Mailru', 'q', 'search?rch=e&q={k}'),

And now it seems to work correctly.


(In [5413]) fixes #2761 - confirmed that search results are now UTF-8


For now, Mail.ru uses UTF-8 in most cases, but occasionally it still uses windows-1251.
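The referrer-parsing step shows why the same keyword can arrive in two encodings. This sketch is illustrative only: the host `search.example.com` and the helper `extractKeywordBytes()` are hypothetical, and the only assumption taken from the thread is that the keyword sits in the `q` parameter, as the 'Mailru' entry states. PHP's `parse_str()` percent-decodes the value into raw bytes, so whatever encoding the search engine used is passed through unchanged.

```php
<?php
// Hypothetical helper: return the raw (percent-decoded) bytes of the
// keyword parameter, or null when the URL has no such parameter.
function extractKeywordBytes(string $referrerUrl, string $param = 'q'): ?string
{
    $query = parse_url($referrerUrl, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null; // no query string (or a malformed URL)
    }
    parse_str($query, $params); // decodes %XX escapes to raw bytes
    return (isset($params[$param]) && is_string($params[$param])) ? $params[$param] : null;
}

// The same word, "привет", percent-encoded two different ways.
// The host is a hypothetical placeholder.
$utf8Ref   = 'http://search.example.com/search?q=%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82';
$cp1251Ref = 'http://search.example.com/search?q=%EF%F0%E8%E2%E5%F2';

var_dump(extractKeywordBytes($utf8Ref) === 'привет');                    // bool(true)
var_dump(extractKeywordBytes($cp1251Ref) === "\xEF\xF0\xE8\xE2\xE5\xF2"); // bool(true)
```

Nothing in the bytes themselves says which encoding was used, which is why the search engine entry has to carry a list of candidate charsets.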

I had to change the extractSearchEngineInformationFromUrl function in /core/Common.php:

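    if (function_exists('iconv') // assumption: the start of this condition was cut off in the paste
        && isset($searchEngines[$refererHost][3]))
    {
        // accepts string, array or comma separated list string in preferred order
        $charsets = $searchEngines[$refererHost][3];
        if (!is_array($charsets))
        {
            $charsets = explode(',', $charsets);
        }

        // pick the first candidate charset that matches the keyword bytes
        $charset = mb_detect_encoding($key, $charsets);
        if ($charset === false)
        {
            $charset = $charsets[0];
        }

        // convert to UTF-8, keeping the original key if conversion fails
        $newkey = @iconv($charset, 'UTF-8//IGNORE', $key);
        if (!empty($newkey))
        {
            $key = $newkey;
        }
    }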

It works with the following entry in /core/DataFiles/SearchEngines.php:

'' => array('Mailru', 'q', 'search?q={k}', array('UTF-8', 'windows-1251')),
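A self-contained sketch of the same fallback logic follows, assuming mbstring is loaded. It is not the committed Piwik code: the function name `convertKeywordToUtf8()` is hypothetical, it calls `mb_detect_encoding()` in strict mode so that invalid UTF-8 falls through to the next candidate, and it swaps `iconv()` for `mb_convert_encoding()` so that only one extension is involved.

```php
<?php
// Hypothetical helper sketching the charset fallback from the patch.
// Assumption: mbstring handles both detection and conversion here;
// the patch itself used iconv() for the conversion step.
function convertKeywordToUtf8(string $key, $charsets): string
{
    if (!function_exists('mb_detect_encoding')) {
        return $key; // mbstring not loaded: leave the keyword untouched
    }

    // Accept an array or a comma-separated string, in preferred order.
    if (!is_array($charsets)) {
        $charsets = array_map('trim', explode(',', $charsets));
    }

    // Strict mode rejects any candidate in which $key is invalid,
    // so bytes that are not valid UTF-8 fall through to windows-1251.
    $charset = mb_detect_encoding($key, $charsets, true);
    if ($charset === false) {
        $charset = $charsets[0]; // nothing matched: use the first preference
    }

    $newKey = mb_convert_encoding($key, 'UTF-8', $charset);
    return ($newKey !== false && $newKey !== '') ? $newKey : $key;
}

// Same preference order as the SearchEngines.php entry above.
$charsets = array('UTF-8', 'windows-1251');

$fromCp1251 = convertKeywordToUtf8("\xEF\xF0\xE8\xE2\xE5\xF2", $charsets);
$fromUtf8   = convertKeywordToUtf8('привет', $charsets);

var_dump($fromCp1251 === 'привет'); // bool(true)
var_dump($fromUtf8 === 'привет');   // bool(true)
```

Listing UTF-8 first matters: windows-1251 assigns a character to almost every byte, so it accepts nearly any input, while UTF-8 rejects most windows-1251 Cyrillic byte sequences.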


Thanks for the patch.

I don't think we need to support a comma-separated list. We do need to check that mbstring is available, and we need a unit test.


A comma-separated list is already supported natively by mb_detect_encoding.
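For reference, both forms of the candidate list are shown side by side below, assuming mbstring is loaded. The sample bytes are illustrative ("привет" encoded as windows-1251).

```php
<?php
// mb_detect_encoding() accepts its candidate encodings either as an
// array or as one comma-separated string; both are tried in order.
$key = "\xEF\xF0\xE8\xE2\xE5\xF2";

$fromArray  = mb_detect_encoding($key, array('UTF-8', 'windows-1251'), true);
$fromString = mb_detect_encoding($key, 'UTF-8,windows-1251', true);

var_dump($fromArray === $fromString); // bool(true)
```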

By the way, mb_strtolower is already used in Common.php (in the original Piwik code of the extractSearchEngineInformationFromUrl function) without any checks or tests.


Can you provide a sample referrer URL with windows-1251 encoding?

I've done some refactoring and added some more tests, but we can never have enough.


Awesome! Thanks!


(In [5682]) fixes #2761

anonymous-piwik-user added this to the Piwik 1.7 milestone on Jul 8, 2014
This issue was closed.