Possibly broken encoding & rule in tablet.yml #5355
That is a test fixture. Even if user agents are unlikely to contain such weird characters, it is still "valid". By the way, most of those test fixtures are copied from log files. Maybe you need to parse that file with another encoding.
Unfortunately, this seems to confuse all libyaml-based parsers. The characters in question are now in https://github.com/piwik/device-detector/blob/master/Tests/fixtures/tablet-1.yml#L4295 and appear to be control characters: http://www.fileformat.info/info/unicode/char/0090/index.htm

See here for the errors reported by parsers in different languages:
Python: https://yaml-online-parser.appspot.com/?url=https%3A%2F%2Fgithub.com%2Fpiwik%2Fdevice-detector%2Fraw%2Fmaster%2FTests%2Ffixtures%2Ftablet-1.yml

The PHP YAML parser you are using seems to be a bit more forgiving than the libyaml-based ones. It would be great if we could solve this problem together (I'm not a Unicode specialist, unfortunately :) ), because I think the database you have built here is quite an achievement and is really useful for other projects as well. For example, we use it in our Ruby implementation of device_detector: https://github.com/podigee/device_detector
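To locate every character that trips up libyaml-based parsers, one can scan the fixture for anything outside the printable set defined by the YAML spec (the same set PyYAML's reader checks). This is a minimal sketch, assuming the file is valid UTF-8; `find_non_printable` and `scan_file` are hypothetical helpers, not part of device-detector.

```python
import re

# Characters YAML allows to appear literally in a stream: TAB, LF, CR,
# printable ASCII, NEL (U+0085) and the rest of Unicode except surrogates
# and U+FFFE/U+FFFF. Anything else (e.g. the C1 control U+0090) is rejected
# by libyaml-based parsers unless it is written as an escape sequence.
_NON_PRINTABLE = re.compile(
    r"[^\t\n\r\x20-\x7E\x85\xA0-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_non_printable(text):
    """Return (line, column, 'U+XXXX') tuples, 1-based, for each offending char."""
    hits = []
    # split("\n") rather than splitlines(): splitlines() would also treat
    # some control characters (\x1c, \x85, ...) as line breaks and hide them.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for match in _NON_PRINTABLE.finditer(line):
            hits.append((lineno, match.start() + 1, "U+%04X" % ord(match.group())))
    return hits


def scan_file(path):
    """Read a UTF-8 YAML file and report its non-printable characters."""
    with open(path, encoding="utf-8") as handle:
        return find_non_printable(handle.read())
```

For example, `scan_file("Tests/fixtures/tablet-1.yml")` would list the line and column of each control character, which should make it easy to compare against the parser errors linked above.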
As mentioned, this user agent was taken from a log file, so a parser should be able to handle it.
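If the user agent is to stay in the fixture byte-for-byte, one option (an assumption here, not necessarily the fix that was applied) is to write it as a YAML double-quoted scalar and escape the control characters with YAML's `\x`/`\u` escapes, which every conforming parser decodes back to the original string. `yaml_double_quoted` is a hypothetical helper for illustration.

```python
import re

# Characters to escape inside a double-quoted scalar: C0 and C1 controls
# (including TAB/LF/CR, which would otherwise be folded), DEL, the Unicode
# line/paragraph separators, and the non-characters U+FFFE/U+FFFF.
_NEEDS_ESCAPE = re.compile(r"[\x00-\x1F\x7F-\x9F\u2028\u2029\uFFFE\uFFFF]")


def _escape(match):
    code_point = ord(match.group())
    if code_point <= 0xFF:
        return "\\x%02X" % code_point  # 8-bit escape, e.g. \x90
    return "\\u%04X" % code_point  # 16-bit escape, e.g. \u2028


def yaml_double_quoted(value):
    """Return value as a YAML double-quoted scalar that any YAML parser accepts."""
    # Escape backslashes and quotes first; the regex never matches them,
    # so the later substitution cannot double-escape anything.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + _NEEDS_ESCAPE.sub(_escape, escaped) + '"'
```

Escaping rather than stripping keeps the fixture faithful to the original log entry; the parsed value still contains U+0090, so detection code sees exactly what the server logged.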
Any progress on this issue? We are unable to parse the test fixture YAML in the Ruby port of DeviceDetector because of the encoding issue. Currently, the file is called
Sorry, I forgot about this. Will fix it now.
Hello piwik team,
we've been trying to pull the latest test fixtures and found that parsing the YAML from tablet.yml fails because of some invalid characters. The line that causes the trouble is this one: https://github.com/piwik/device-detector/blob/master/Tests/fixtures/tablet.yml#L10557

The A's with different diacritics seem a bit weird to me for a user agent:
Could you please take a look at it?
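A's with assorted diacritics (Ã, Â) next to C1 control characters are the classic signature of mojibake: UTF-8 bytes decoded as Latin-1 and re-encoded, sometimes more than once. Whether that is what happened to this log line is an assumption; the sketch below only demonstrates the mechanism and a best-effort reversal. `mis_decode` and `try_repair` are hypothetical names, and the repair assumes Latin-1 (not Windows-1252) was the wrong codec.

```python
def mis_decode(text, rounds=1):
    """Simulate mojibake: encode as UTF-8, then wrongly decode as Latin-1."""
    for _ in range(rounds):
        text = text.encode("utf-8").decode("latin-1")
    return text


def try_repair(text, max_rounds=3):
    """Reverse Latin-1/UTF-8 mojibake until the round trip stops succeeding.

    Heuristic only: text that was never garbled is returned unchanged,
    but a rare legitimate Latin-1 string could be "repaired" by mistake.
    """
    for _ in range(max_rounds):
        try:
            fixed = text.encode("latin-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break  # not reversible any further
        if fixed == text:
            break  # plain ASCII, nothing to undo
        text = fixed
    return text
```

For example, `mis_decode("é", 2)` produces `"Ã\x83Â©"`: two A's with diacritics plus the C1 control U+0083, the same kind of sequence as the U+0090 reported in tablet-1.yml. `try_repair` turns it back into `"é"`.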