Possibly broken encoding & rule in tablet.yml #5355

Closed
matisojka opened this issue Jun 21, 2015 · 5 comments

@matisojka

Hello piwik team,

we've been trying to pull the latest test fixtures and found that parsing the YAML from tablet.yml fails because of some wrong characters. The line that causes the trouble is this one: https://github.com/piwik/device-detector/blob/master/Tests/fixtures/tablet.yml#L10557

The A's with different diacritics seem a bit weird to me for a user agent:

(KHTML, �º�°�º Gecko)

Could you please take a look at it?

@matisojka matisojka changed the title Possibly broken encoding / rule in tablet.yml Possibly broken encoding & rule in tablet.yml Jun 21, 2015
@sgiehl
Member

sgiehl commented Jun 21, 2015

That is a test fixture. Even if it is unlikely for user agents to contain such weird characters, it is still "valid". By the way, most of those test fixtures are copied from log files.

Maybe you need to parse that file with another encoding.
Not sure where the problem is, as our tests are all passing...

@sgiehl sgiehl closed this as completed Jul 12, 2015
@benzimmer
Contributor

Unfortunately, this seems to confuse all libyaml-based parsers.

The characters in question are now in https://github.com/piwik/device-detector/blob/master/Tests/fixtures/tablet-1.yml#L4295 and seem to be control characters: http://www.fileformat.info/info/unicode/char/0090/index.htm

See here for some errors of parsers for different languages:

Python: https://yaml-online-parser.appspot.com/?url=https%3A%2F%2Fgithub.com%2Fpiwik%2Fdevice-detector%2Fraw%2Fmaster%2FTests%2Ffixtures%2Ftablet-1.yml
Ruby (sorry, had to use a screenshot):
[image: screen shot 2016-07-16 at 12 39 32]

It seems the PHP YAML parser you are using is a little more forgiving than the libyaml-based ones.
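
To locate the offending characters before handing a fixture to a libyaml-based parser, something like the following Python sketch may help. It assumes the rejection is caused by characters outside the printable set defined by the YAML specification (U+0090 falls outside that set). The function name `find_non_printable` and the sample string are made up for illustration, and the exact check inside libyaml may differ in details.

```python
import re

# Anything NOT in the YAML "c-printable" character set:
# tab, LF, CR, U+0020-U+007E, U+0085, U+00A0-U+D7FF,
# U+E000-U+FFFD and U+10000-U+10FFFF.
NON_PRINTABLE = re.compile(
    r"[^\x09\x0A\x0D\x20-\x7E\x85\xA0-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_non_printable(text):
    """Return (line, column, code point) for each character outside the set above."""
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in NON_PRINTABLE.finditer(line):
            code_point = f"U+{ord(match.group()):04X}"
            hits.append((line_no, match.start() + 1, code_point))
    return hits


if __name__ == "__main__":
    # Hypothetical sample, not the real fixture line.
    sample = "ua: Mozilla (KHTML, \x90 Gecko)\nok: 1\n"
    for line_no, column, code_point in find_non_printable(sample):
        print(f"line {line_no}, column {column}: {code_point}")
```

To check a real file, read it as UTF-8 first (for example `open(path, encoding="utf-8").read()`) and pass the resulting string to the function.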

It would be really cool if we could solve this problem together (I'm not a Unicode specialist, unfortunately :) ), because I think the database you have built here is quite an achievement and is really useful for other projects as well. For example, we are using it for our Ruby implementation of device_detector here: https://github.com/podigee/device_detector

@sgiehl
Member

sgiehl commented Jul 16, 2016

As mentioned, this user agent was taken from a log file, so a parser should be able to handle it.
But we could add a test containing non-Unicode characters directly within the PHP tests, and remove this one from the list here.
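
Another option, different from removing the entry, would be to keep the value in the fixture but write the control characters as YAML escape sequences inside a double-quoted scalar. The sketch below is only an illustration of that idea, not what the maintainers did (the thread does not say how the fix was made). The helper name `yaml_double_quoted` is hypothetical. It relies on the `\xNN` and `\uNNNN` escapes that YAML defines for double-quoted strings.

```python
def yaml_double_quoted(value):
    """Render value as a YAML double-quoted scalar with non-printables escaped."""
    out = []
    for ch in value:
        cp = ord(ch)
        if ch == "\\":
            out.append("\\\\")
        elif ch == '"':
            out.append('\\"')
        elif ch == "\n":
            out.append("\\n")
        elif ch == "\r":
            out.append("\\r")
        elif ch == "\t":
            out.append("\\t")
        elif (0x20 <= cp <= 0x7E or 0xA0 <= cp <= 0xD7FF
              or 0xE000 <= cp <= 0xFFFD or cp >= 0x10000):
            out.append(ch)  # printable, keep as-is
        elif cp <= 0xFF:
            out.append(f"\\x{cp:02X}")  # e.g. U+0090 becomes \x90
        else:
            out.append(f"\\u{cp:04X}")
    return '"' + "".join(out) + '"'


if __name__ == "__main__":
    # Hypothetical value, not the real fixture line.
    print(yaml_double_quoted("(KHTML, \x90 Gecko)"))
```

A parser that reads the escaped form should hand back the same string, including the control character, so the test would still exercise the unusual input.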

@sgiehl sgiehl reopened this Jul 16, 2016
@matisojka
Author

matisojka commented Sep 2, 2016

Any progress on this issue? We are unable to parse the test fixture YAML in the Ruby port of DeviceDetector because of the encoding issue.

Currently, the file is called tablet-1.yml

@sgiehl
Member

sgiehl commented Sep 3, 2016

Sorry, I forgot about this. Will fix it now.

@sgiehl sgiehl closed this as completed in 33bb26e Sep 3, 2016