servo needs a user agent string #4331
Let’s try just sending the B2G user agent string: https://wiki.mozilla.org/B2G/User_Agent |
|
I think [...] Note, IE11 just added [...]. If people are dogfooding, I think it would be worthwhile to track site problems that may come up as a result of the UA string. I'm happy to help diagnose these kinds of issues. As I mentioned to @metajack and @dherman in Portland, you guys could use https://webcompat.com/ to report site issues for Servo (it's just a shiny front-end to https://github.com/webcompat/web-bugs/issues/), or we can stand up another instance pointing to this repo or something like servo/web-bugs. |
|
As UA String module owner, I want to say that this is a difficult problem, and one which will require much thought and testing. My initial back-of-the-envelope suggestion is that something like Mozilla/5.0 (X11; Linux i686) Servo/XX.XX (like Gecko) Firefox/36.0 might be the right sort of thing, with all the appropriate variants for platform, mobile, tablet, etc. But there are lots of questions to investigate:
Some of these will need compat testing using tools the webcompat team has. I think we will need to distinguish Servo from Gecko, because they will inevitably have different bugs. But, as other browsers appear to have found out, to get standards-compliant code we probably need a "like Gecko" token. Gerv |
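A minimal sketch of filling in the proposed template once per platform variant follows. The platform tokens, the `XX.XX` placeholder version, and the names `PLATFORM_TOKENS` and `build_servo_ua` are illustrative assumptions, not the strings Servo actually shipped.

```python
# Illustrative platform tokens only (assumption); real Firefox UAs have more variants.
PLATFORM_TOKENS = {
    "linux-desktop": "X11; Linux i686",
    "android-mobile": "Android; Mobile",
    "android-tablet": "Android; Tablet",
}


def build_servo_ua(platform, servo_version="XX.XX", firefox_version="36.0"):
    """Fill in the proposed template:

    Mozilla/5.0 (<platform token>) Servo/<version> (like Gecko) Firefox/<version>
    """
    try:
        token = PLATFORM_TOKENS[platform]
    except KeyError:
        raise ValueError(f"unknown platform: {platform!r}") from None
    return (f"Mozilla/5.0 ({token}) Servo/{servo_version} "
            f"(like Gecko) Firefox/{firefox_version}")


for name in PLATFORM_TOKENS:
    print(name, "->", build_servo_ua(name))
```

Keeping the platform tokens in one table makes it easy to generate every variant for a compat run and to change the shared parts of the template in a single place.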
|
Here are some quick-and-dirty numbers that may be helpful as you think about this. I dug up some old scripts and ran a quick test with the following UAs:
Using each UA string, the HTML content was downloaded from each site on the Alexa Global Top 1000 list. Then the HTML content received with each UA was compared to the content received with the others, using the quick_ratio() method of Python's difflib.SequenceMatcher. HTML that was completely different would get a score of 0.0, while identical content would receive a score of 1.0. By comparing scores between various browsers, you can get a pretty good idea of how sites treat different UAs. Do they send them mobile content? Desktop? WAP? All the same?
Here are the average difflib scores for each combination of UAs. For example, the first row says that, on average, the content sent to Desktop Firefox (UA 0) and Desktop MSIE (UA 1) had a difflib score of .972 across the 1000 sites that were tested. And so on:
{(0, 1): (0.972, 1000), [...]
The ones we are interested in are UAs 5, 6, and 7, and how they compare to other existing browser UAs:
(0, 5): (0.979, 1000), [...]
We can see that the Servo Desktop Proposal 1 (5) got content quite close to what was sent to Desktop Firefox (0), Desktop MSIE (1), Desktop Chrome (3) and Desktop Safari (4). As for the mobile versions, the Servo Mobile Proposal 1 (6) got better scores when compared to Mobile Safari (2) than the second option (7) did.
This type of analysis is useful for high-level sanity checks. I can clean up the code and post it somewhere if you'd like. Let me know. |
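The scoring step described above can be sketched roughly as follows. It assumes the HTML for each UA has already been fetched into a list with one entry per site, in the same site order for every UA. `similarity` and `pairwise_average` are hypothetical names; the original scripts were not posted.

```python
from difflib import SequenceMatcher
from itertools import combinations
from statistics import mean


def similarity(html_a, html_b):
    """0.0 = no characters in common, 1.0 = same multiset of characters.

    quick_ratio() compares character counts only (ignoring order), so it is
    cheap but only an upper bound on SequenceMatcher.ratio().
    """
    return SequenceMatcher(None, html_a, html_b).quick_ratio()


def pairwise_average(pages):
    """pages: {ua_index: [html for site 1, html for site 2, ...]}, with the
    same site order for every UA. Returns {(ua_i, ua_j): (mean score, n sites)}.
    """
    results = {}
    for i, j in combinations(sorted(pages), 2):
        scores = [similarity(a, b) for a, b in zip(pages[i], pages[j])]
        results[(i, j)] = (round(mean(scores), 3), len(scores))
    return results
```

The result has the same `(pair): (mean, count)` shape as the numbers quoted in the comment. Because `quick_ratio()` ignores character order, two very different pages built from similar markup can still score high, which is one reason this is a blunt tool.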
|
John: it seems that with the UAs you picked, you have more variables than you want. You are comparing Firefox Desktop on Mac with IE on Windows and Servo on Linux! There may therefore be reasons for the variations other than the rendering engine information. We should try to do this while controlling for all other variables (which probably means using Windows as our standard UA OS).
For everyone else: it's worth noting that the difflib scores would never be 1.0, even between two runs with an identical UA, because of per-load variable content like ads. While this analysis is useful, it's a blunt tool.
Having said all that, it seems that something like Servo Desktop Proposal 1 might be a winner on desktop. It seems to get desktop content pretty often. I'd be interested to see what happens if we drop the "like Gecko": does that get us noticeably further away from desktop content?
In order to work out what to do on mobile, I think we definitely need the Firefox for Android and current B2G UAs in the mix, as those are what the Servo UAs are based on, and they give us the best guess for the sort of content we want. (We don't want WebKit-specific stuff.) John: when you do another run, can you include both of those UAs? Fx for Android is: Mozilla/5.0 (Android; Mobile; rv:12.0) Gecko/12.0 Firefox/12.0 |
|
@jjnsn Instead of the average, could you post the scatter plots for each of them somewhere? I guess that would give a better understanding of the data quality. It's a cool technique to use as a first approximation. There's something I didn't completely get from your explanation. You said:
Do you mean just an [...]? For the [...] |
|
This discussion seems to have stalled... jjnsn: are you planning to run some more tests? Gerv |
|
Hi all, Apologies -- I had been playing with some mail filters and all the notifications from this issue had disappeared from my inbox! Some responses:
You'd be surprised, actually. There are more 1.0s than I initially expected. It's partly due to the crude nature of the technique.
OK, I can rerun this.
OK.
I think you mean frequency distributions/histograms. Yes, I can do that.
It's just an [...]
Yes. I've used that in the past with WebKit/Gecko comparisons. Unfortunately I'm not in a position to take screenshots with Servo. I'll be able to look at this stuff later this week. Apologies for the delay. John |
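A frequency distribution of the per-site scores, as discussed above, can be sketched with the standard library alone. The ten equal-width bins and the text output are assumptions; a plotting library would serve equally well.

```python
from collections import Counter


def histogram(scores, bins=10):
    """Count scores in equal-width bins over [0.0, 1.0]; 1.0 lands in the top bin."""
    counts = Counter(min(int(s * bins), bins - 1) for s in scores)
    return [counts[b] for b in range(bins)]


def print_histogram(scores, bins=10, width=40):
    """Crude text histogram: one row per bin, bar length scaled to the largest bin."""
    counts = histogram(scores, bins)
    top = max(counts) or 1
    for b, count in enumerate(counts):
        bar = "#" * round(count * width / top)
        print(f"{b / bins:.1f}-{(b + 1) / bins:.1f} | {bar} {count}")


print_histogram([0.99, 1.0, 1.0, 0.97, 0.42, 0.88])
```

Unlike a single average, the histogram shows whether a good mean comes from uniformly high scores or from a few sites that served completely different content.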
|
No problem. Thanks a lot for the answers.
|
I’m not sure what you mean by "behave the same", but if sites are doing feature detection (as they should!), they can definitely test for a feature that Firefox implements and Servo doesn’t. |
|
@SimonSapin Sorry for the poor wording, but your answer gave me enough hints. |
|
Hi there, Here's the latest run data. Sections below:
|
What was measured? :) |
|
:) OK, for clarity's sake:
|
|
@jjnsn Thanks! Really cool. Did you follow HTTP redirects (301, 302, etc.)? I just want to know whether we are measuring the diff of the first response or of the final destination of the HTTP requests. Checking the COV (coefficient of variation):
It seems logical, because most of the time people redirect based on the presence or absence of the Android keyword, and if done on Mobile for Firefox they [...] I'm still a bit worried about the JS part. We need to test that. |
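Assuming COV here means the coefficient of variation (standard deviation divided by mean), it can be computed per UA pair as sketched below. Whether the posted figures used the sample or the population standard deviation is not stated; this sketch assumes the sample version, and `mean_and_cov` is a hypothetical name.

```python
from statistics import mean, stdev


def mean_and_cov(scores):
    """Return (mean, coefficient of variation) for one UA pair's scores.

    COV = sample standard deviation / mean. A low COV means most sites scored
    alike; a high COV means a few sites diverged sharply from the average.
    Needs at least two scores (stdev raises StatisticsError otherwise).
    """
    m = mean(scores)
    cov = stdev(scores) / m if m else float("inf")
    return m, cov
```

Since all scores fall between 0.0 and 1.0, a high mean with a low COV is the pattern you would hope to see for a UA pair that sites treat the same way.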
Yup. It just uses the requests library, which follows redirects by default and keeps them in the response object's history. It also looks for meta refreshes:
```python
import re

from bs4 import BeautifulSoup  # third-party: beautifulsoup4


def get_meta_refresh(html):
    """Return the target URL of a <meta http-equiv="refresh"> tag, or False."""
    print('Checking to see if there is a refresh...')
    # BeautifulSoup is slow, so do a quick check to see if parsing is necessary
    if 'http-equiv' not in html.lower():
        return False
    print('\tthere is a http-equiv')
    soup = BeautifulSoup(html, 'html.parser')
    for m in soup.find_all('meta'):
        # http-equiv values are case-insensitive ("Refresh" == "refresh")
        if m.get('http-equiv', '').lower() != 'refresh' or not m.has_attr('content'):
            continue
        content = m['content']
        # content looks like "5; url=http://example.com/"
        if re.match(r'\d+\s*;', content):
            # Split on the first ';' only, so semicolons inside the URL survive
            murl = content.split(';', 1)[1].strip()
            if murl.lower().startswith('url='):
                murl = murl[4:].strip()
            print('\treturning', murl)
            return murl
    return False
```
I really should just clean up this code and submit it. I'll try to in the coming days.
Yup. IMHO this approach is OK for coarse-level decisions ("This one won't work at all"), but for small tweaks it would be worthwhile to do something more robust that looks at JavaScript, screenshots, etc. |
|
Hi jjnsn, thanks for doing all this work. There are some issues with the chosen strings which, if fixed, might help us get clearer comparisons. For example, I think it would help reduce noise in the data if you used the same OS as much as possible. At the moment, the base Firefox string (string 0) is Mac, but the Servo test strings (5 and 6) are X11. Strings 7 and 9: rv values normally match Firefox versions, so a better version of a string we'd use for Servo on mobile would have rv:36.0, not rv:1.0. Strings 5 and 6, which are the proposed desktop Servo strings, are missing the rv: token altogether. We should add it back in (as well as changing the OS to match the base desktop string, string 0). |
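The rule above (the rv: value should match the Firefox/ version, and rv: should not be missing) is easy to check mechanically before a test run. This is a minimal sketch; `rv_matches_firefox` is a hypothetical helper, and the rv:1.0 string in the usage note is a made-up example of the kind of mismatch described.

```python
import re


def rv_matches_firefox(ua):
    """True if the UA's rv: value equals its Firefox/ version.

    Also False when either token is missing, so it flags strings that dropped rv:.
    """
    rv = re.search(r"\brv:([\d.]+)", ua)
    fx = re.search(r"\bFirefox/([\d.]+)", ua)
    return bool(rv and fx and rv.group(1) == fx.group(1))
```

For example, the Fx for Android string `Mozilla/5.0 (Android; Mobile; rv:12.0) Gecko/12.0 Firefox/12.0` passes, while a string with `rv:1.0` and `Firefox/36.0`, or the proposed desktop string without any rv: token, is flagged.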
|
Still, doing some analysis is quite possible. We start by defining a goal, which I suggest should be: we want the same content as Gecko Firefox gets on the same platform. If that's the goal, then:
Overall, replacing the Gecko token with a Servo token and otherwise leaving the UA unchanged seems like a fine starting strategy for setting Servo's UA. We may also want to enhance the dynamic UA system we've built for B2G so that it can provide dynamic UAs for Servo-based browsers too. Gerv |
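The "replace the Gecko token, leave everything else unchanged" strategy can be sketched as a single substitution. The `Servo/1.0` version token and the helper name are assumptions for illustration.

```python
import re


def servo_ua_from_gecko(gecko_ua, servo_version="1.0"):
    """Swap the Gecko/<token> for Servo/<servo_version>, leaving the platform,
    rv: and Firefox/ parts of the string unchanged."""
    new_ua, n = re.subn(r"\bGecko/\S+", lambda _m: f"Servo/{servo_version}", gecko_ua)
    if n != 1:
        raise ValueError(f"expected exactly one Gecko/ token in {gecko_ua!r}")
    return new_ua


print(servo_ua_from_gecko(
    "Mozilla/5.0 (Android; Mobile; rv:12.0) Gecko/12.0 Firefox/12.0"))
```

Requiring exactly one `Gecko/` token means a string containing only "like Gecko" (no version) is rejected rather than silently passed through unchanged.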
No problem. The easiest thing to do is to just create an etherpad or something where each UA string is listed on a separate line. Figure out which ones you'd like to test, put them in the list, point me at it and I'll update the issue here, no problem. |
|
@gerv started a list on the etherpad. https://etherpad.mozilla.org/uaservostats |
|
karlcow: I've updated the list in the Etherpad. Do you have further tweaks? |
|
@gerv I guess that sounds right. +1 for the postfix. |
Actually, just a CRLF-separated list would have been easier. No biggie, though; I'll just use dict.items(). Otherwise, is this ready to go?
|
Fine by me. Thank you :-) Gerv |
|
Here you go.
|
|
That data seems to support the suggestion that the right UA for Servo is the same one used with Gecko, simply replacing the Gecko token with a Servo token and making no other changes.
0 vs 1: Android Gecko vs. Servo. Mean: 0.977 [...]
All of these figures seem high enough to me. |
|
This data does look great! We would accept a patch that gives Servo a real user agent string, as suggested by @gerv, with one caveat: it should also land fixes for, or at least file issues about, any new breakage caused by different content being served on sites we have tried to fix, like Wikipedia, GitHub, CNN, reddit, etc. |
|
As an (interesting?) data point, Presto-based Opera, after playing the UA string compatibility charade for years, eventually gave up on the convoluted UA strings and went with this:
Opera/9.50 (Macintosh; Intel Mac OS X; U; en)
Which later evolved into:
Opera/9.61 (Macintosh; Intel Mac OS X; U; en) Presto/2.1.1
Which later evolved into this final form, needed because some UA string sniffers would barf on a two-digit version number after "Opera/":
Opera/9.80 (Macintosh; Intel Mac OS X; U; en) Presto/2.6.30 Version/10.61
On the other hand, throughout this period Opera also maintained a list of domains for which UA string spoofing was needed. Having that gives you a bit more freedom to break things with your default UA string. Regardless of which UA string Servo ends up going with, I think such a list is important to have, as there will be misbehaving UA string sniffers no matter what you do, and it is important to be able to work around them.
(As a side note, Opera also maintained a stack of custom JS patches to fix misbehaving sites, which Servo, as a minority-market-share browser, might want to take inspiration from. This is unfortunately somewhat resource intensive, but you can do as little or as much as you want, and if the alternative is broken sites... https://dev.opera.com/blog/opera-s-site-patching/) |
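A per-domain spoofing list like the one Opera kept could be as simple as a lookup table consulted before each request. This is a minimal sketch, not Opera's or Servo's mechanism; the table contents, the `example.com` entry, and the function name are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical override table: domain -> UA to send instead of the default.
UA_OVERRIDES = {
    "example.com": "Mozilla/5.0 (X11; Linux x86_64; rv:36.0) Gecko/20100101 Firefox/36.0",
}


def ua_for_url(url, default_ua, overrides=UA_OVERRIDES):
    """Return an override UA if the URL's host or any parent domain is listed,
    checking the most specific domain first; otherwise return default_ua."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])  # www.example.com, example.com, com
        if candidate in overrides:
            return overrides[candidate]
    return default_ua
```

Matching on whole domain labels (rather than a substring test) keeps an entry for `example.com` from accidentally applying to an unrelated host such as `notexample.com`.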
Add the Servo User Agent strings
Fixes #4331. I've tested this out on desktop and Android on "the usual" sites (reddit, cnn, github, wikipedia, etc.). r? @mbrubeck @metajack
|
This probably isn't helpful, but perhaps including in the user agent string the commit hash that the Servo engine was built from would help with tracking down bugs during testing? It would probably break a bunch of stuff though, so this is more of a what-if than a serious suggestion. |
|
Unfortunately, this would make fingerprinting Servo users pretty easy. Browser.html does include a bug reporter that includes relevant info, but it's opt-in for the user. |
Currently we send no User-Agent header, unless a specific user agent string is provided on the command line.
What should the user agent string be now, while we are experimenting with minimal shells? Is it possible to escape the current practice of pretending to be everyone? What should the string be in a future where Servo can live inside Firefox?
Note that this is not a new issue, but I'm filing it to have a public place to focus discussion.