Encoding on Swedish is broken #10
Hey Jan, thanks for the report. We just noticed this yesterday and I'm looking into it now. It appears to be an upstream encoding change, as we've pushed no code changes in the past week or so. Not sure if there was a Labs announcement about this, but it is pretty inconvenient either way, sorry about that. When we fix it I'll regenerate the old pages and let you know here.
(and definitely don't go look at Chinese, it's a disaster)
I can now confirm, the upstream service data encoding broke this week: Here is the raw data for the 26th. If you
I'm as surprised by this as you are. There was a deployment on Monday which allows you to pass URI-encoded titles to the per-article endpoint (not the one you use). The other thing we added is to specify utf8 in the content-type header. But that was released a while ago and should have only helped with this issue. So if this was fine on Wednesday but broken on Friday, then maybe the issue is in the front-end RESTBase instance that proxies requests to us. I will take a look but sadly am away from the laptop until tomorrow night. I don't have access to Phabricator, so if someone could add an unbreak-now task and tag it with Analytics, that'd be useful. Thanks and sorry for the annoyance.
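For anyone following along, here is a minimal Python sketch of the two things mentioned above: percent-encoding a title as UTF-8, and reading the charset out of a content-type header. The function names `encode_title` and `charset_of` and the sample header value are made up for illustration; this is not the pageview API's own code, just what a client could do under those assumptions.

```python
from email.message import Message
from urllib.parse import quote, unquote


def encode_title(title: str) -> str:
    """Percent-encode a title using its UTF-8 bytes.

    safe="" means "/" is encoded too, so the title stays a single path segment.
    (Hypothetical helper, not part of any Wikimedia client library.)
    """
    return quote(title, safe="")


def charset_of(content_type: str):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


if __name__ == "__main__":
    encoded = encode_title("Malmö")
    print(encoded)           # percent-encoded UTF-8 bytes of the title
    print(unquote(encoded))  # round-trips back to the original title
    # The header value below is an assumed example, not a captured response.
    print(charset_of("application/json; charset=utf-8"))
```

If a response carries no charset parameter, `charset_of` returns `None`, and the client has to guess the encoding, which is one place where garbling can creep in.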
The characters look garbled when requesting directly from the backend from within the cluster: However, characters are fine both internally and externally for older dates: https://wikimedia.org/api/rest_v1/metrics/pageviews/top/sv.wikipedia/all-access/2016/01/26 This suggests that something broke the top-title data stored to Cassandra recently.
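A rough way to check titles for this kind of garbling, assuming the damage is the common case of UTF-8 bytes being read as Latin-1 (the thread does not confirm the exact cause, so treat that as a guess). The helper names `looks_like_mojibake` and `repair_title` are hypothetical:

```python
def looks_like_mojibake(title: str) -> bool:
    """Heuristic: True if re-reading the title's Latin-1 bytes as UTF-8 changes it.

    Assumes the garbling is UTF-8 decoded as Latin-1; other kinds of damage
    are not detected.
    """
    try:
        repaired = title.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the title has characters outside Latin-1, or its bytes are not
        # valid UTF-8, so it is not this particular kind of mojibake.
        return False
    return repaired != title


def repair_title(title: str) -> str:
    """Undo the assumed UTF-8-as-Latin-1 garbling; leave other titles untouched."""
    if looks_like_mojibake(title):
        return title.encode("latin-1").decode("utf-8")
    return title


if __name__ == "__main__":
    # Simulate the assumed failure mode on a Swedish title.
    garbled = "Malmö".encode("utf-8").decode("latin-1")
    print(garbled, looks_like_mojibake(garbled), repair_title(garbled))
    print("Malmö", looks_like_mojibake("Malmö"))
```

This is only a heuristic for spotting affected dates. The real fix here is regenerating the data upstream, not patching titles on the client.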
Same for Latvian: http://top.hatnote.com/lv/
Created a separate issue for the RSS variety, but we're still waiting on the fix from the WMF Analytics team for the encoding.
The important comment is https://phabricator.wikimedia.org/T128295#2074948
On Tue, Mar 1, 2016 at 3:25 AM, Gabriel Wicke notifications@github.com
We've regenerated Feb. 23 - 29, now that the pageview API has properly encoded titles for that period.
The letters å, ä and ö are not displayed correctly, as can be seen in places 3, 9 and 29 (among others) here: http://top.hatnote.com/sv/wikipedia/2016/2/24.html