
servo needs a user agent string #4331

Closed
metajack opened this issue Dec 11, 2014 · 30 comments

@metajack (Contributor) commented Dec 11, 2014

Currently we send no User-Agent header, unless a specific user agent string is provided on the command line.

What should the user agent string be now, while we are experimenting with minimal shells? Is it possible to escape the current practice of pretending to be everyone? What should the string be in a future where Servo can live inside Firefox?

Note that this is not a new issue, but I'm filing it to have a public place to focus discussion.

@SimonSapin (Member) commented Dec 11, 2014

Is it possible to escape the current practice of pretending to be everyone?

Let’s try to just send Servo and find out. Though I’m not very hopeful, given the lengths other browsers go to in tweaking their UA strings just right. See for example:

https://wiki.mozilla.org/B2G/User_Agent
https://bugzilla.mozilla.org/show_bug.cgi?id=782453 "Add site-specific User Agent infrastructure"

@miketaylr commented Dec 11, 2014

I think Servo/N and Servo/N Mobile would be interesting experiments. I'm not sure either will succeed, but there's plenty of time to test I think.

Note, IE11 just added rv and like Gecko to their UA string: Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko . But for mobile it's a lot messier: Mozilla/5.0 (Windows Phone 8.1; ARM; Trident/7.0; Touch; rv:11; IEMobile/11.0) like Android 4.1.2; compatible) like iPhone OS 7_0_3 Mac OS X WebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.99 Mobile Safari /537.36) 😢

If people are dogfooding I think it would be worthwhile to track site problems that may come up as a result of the UA string. I'm happy to help diagnose these kinds of issues.

As I mentioned to @metajack and @dherman in Portland, you guys could use https://webcompat.com/ to report site issues for Servo (which is just a shiny front-end to https://github.com/webcompat/web-bugs/issues/), or we can stand up another instance pointing to this repo or something like servo/web-bugs.

@gerv commented Dec 11, 2014

As UA String module owner, I want to say that this is a difficult problem, and one which will require much thought and testing.

My initial back-of-the-envelope suggestion is something like:

Mozilla/5.0 (X11; Linux i686) Servo/XX.XX (like Gecko) Firefox/36.0

might be the right sort of thing - with all the appropriate variants for platform, mobile, tablet etc. But there are lots of questions to investigate:

  • Can Mozilla/5.0 be eliminated these days? Previous compat testing suggested it might not make much difference. Although it's odd for Mozilla to make the first modern browser which doesn't claim to be Mozilla!
  • Do we need an "rv:" string?
  • Does Servo need a version number separate from the Firefox number?
  • Does the above back-of-the-envelope sketch match various UA string format specs?
  • Does "like Gecko" work for us like I hope it does?

Some of these will need compat testing using tools the webcompat team have.

I think we will need to distinguish Servo from Gecko, because they will inevitably have different bugs. But, as other browsers appear to have found out, to get standards-compliant code we probably need a "like Gecko".

Gerv

@jjnsn commented Dec 19, 2014

Here are some quick-and-dirty numbers that may be helpful as you think about this. I dug up some old scripts and ran a quick test with the following UAs:

  • 0: Desktop Firefox
    • Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:37.0) Gecko/20100101 Firefox/37.0
  • 1: Desktop MSIE
    • Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko
  • 2: Mobile Safari
    • Mozilla/5.0 (Windows Phone 8.1; ARM; Trident/7.0; Touch; rv:11; IEMobile/11.0) like Android 4.1.2; compatible) like iPhone OS 7_0_3 Mac OS X WebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.99 Mobile Safari /537.36
  • 3: Desktop Chrome
    • Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36
  • 4: Desktop Safari
    • Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5
  • 5: Servo Desktop Proposal 1
    • Mozilla/5.0 (X11; Linux i686) Servo/1.0 (like Gecko) Firefox/36.0
  • 6: Servo Mobile Proposal 1
    • Mozilla/5.0 (Android; Mobile; rv:1.0) Servo/1.0 Firefox/36.0
  • 7: Servo Mobile Proposal 2
    • Mozilla/5.0 (Mobile; rv:1.0) Servo/1.0 Firefox/36.0

Using each UA string, the HTML content was downloaded from each site on the Alexa Global Top 1000 list. Then, the HTML content retrieved with each UA was compared to the content retrieved with the others using Python's difflib.quick_ratio() function. HTML that was completely different would get a score of 0.0, while identical content would receive a score of 1.0.

By comparing scores between various browsers, you can get a pretty good idea of how sites treat different UAs. Do they send them mobile content? Desktop? WAP? All the same?
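The comparison pipeline described above can be sketched roughly as follows. This is a minimal sketch, not the actual script: the function names are made up, and a real run would need error handling, throttling, and redirect/meta-refresh handling.

```python
import difflib
import urllib.request

def fetch_html(url, user_agent):
    # Fetch raw HTML for a URL, sending only the given User-Agent header.
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")

def similarity(html_a, html_b):
    # 0.0 = completely different content, 1.0 = identical content.
    return difflib.SequenceMatcher(None, html_a, html_b).quick_ratio()
```

Note that quick_ratio() is an upper bound on the slower ratio(); it compares character frequencies rather than sequences, which is why it works as a crude first approximation here.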

Here are the average difflib scores for each combination of UAs. So, for example, the first row says that, on average, the content sent to Desktop Firefox (UA 0) and Desktop MSIE (UA 1) had a difflib score of 0.972 across the 1000 sites that were tested. And so on:

{(0, 1): (0.972, 1000),
(0, 2): (0.73, 1000),
(0, 3): (0.971, 1000),
(0, 4): (0.969, 1000),
(0, 5): (0.979, 1000),
(0, 6): (0.75, 1000),
(0, 7): (0.82, 1000),
(1, 2): (0.732, 1000),
(1, 3): (0.973, 1000),
(1, 4): (0.976, 1000),
(1, 5): (0.973, 1000),
(1, 6): (0.747, 1000),
(1, 7): (0.816, 1000),
(2, 3): (0.731, 1000),
(2, 4): (0.73, 1000),
(2, 5): (0.729, 1000),
(2, 6): (0.933, 1000),
(2, 7): (0.852, 1000),
(3, 4): (0.973, 1000),
(3, 5): (0.972, 1000),
(3, 6): (0.745, 1000),
(3, 7): (0.818, 1000),
(4, 5): (0.972, 1000),
(4, 6): (0.748, 1000),
(4, 7): (0.816, 1000),
(5, 6): (0.749, 1000),
(5, 7): (0.824, 1000),
(6, 7): (0.88, 1000)}

The ones we are interested in are UAs 5, 6, and 7, and how they compare to other existing browser UAs:

(0, 5): (0.979, 1000),
(0, 6): (0.75, 1000),
(0, 7): (0.82, 1000),
(1, 5): (0.973, 1000),
(1, 6): (0.747, 1000),
(1, 7): (0.816, 1000),
(2, 5): (0.729, 1000),
(2, 6): (0.933, 1000),
(2, 7): (0.852, 1000),
(3, 5): (0.972, 1000),
(3, 6): (0.745, 1000),
(3, 7): (0.818, 1000),
(4, 5): (0.972, 1000),
(4, 6): (0.748, 1000),
(4, 7): (0.816, 1000),

We can see that the Servo Desktop Proposal 1 (5) got content quite close to what was sent to Desktop Firefox (0), Desktop MSIE (1), Desktop Chrome (3) and Desktop Safari (4).

As for the mobile versions, we see that the Servo Mobile Proposal 1 (6) got better scores when compared to Mobile Safari (2) than the second option did (7).

This type of analysis is useful for high-level sanity checks. I can clean up the code and post it somewhere if you'd like. Let me know.

@gerv commented Dec 22, 2014

John: it seems that with the UAs you picked, you have more variables than you want. You are comparing Firefox Desktop on Mac with IE on Windows and Servo on Linux! There may therefore be reasons for the variation other than the rendering engine information. We should try to do this while controlling for all other variables (which probably means using Windows as our standard UA OS).

For everyone else: it's worth noting that the difflib scores would never be 1.0, even between two runs with an identical UA, because of page-variable content like ads. While this analysis is useful, it's a blunt tool.

Having said all that, it seems like something like Servo Desktop Proposal 1 might be a winner on desktop. It seems to get desktop content pretty often. I'd be interested to see what happened if we dropped the "like Gecko" - does that get us noticeably further away from desktop content?

In order to work out what to do on Mobile, I think we definitely need the Firefox for Android and current B2G UAs in the mix, as that's what the Servo UAs are based on and that gives us the best guess for the sort of content we want. (We don't want Webkit-specific stuff.) John: when you do another run, can you include both of those UAs?

Fx for Android is: Mozilla/5.0 (Android; Mobile; rv:12.0) Gecko/12.0 Firefox/12.0
Firefox OS is: Mozilla/5.0 (Mobile; rv:12.0) Gecko/12.0 Firefox/12.0
(obviously not with "12.0" :-)

@karlcow commented Dec 22, 2014

@jjnsn Instead of the averages, could you post the scatter plots for each of them somewhere? I guess it will give a better understanding of the data quality. It's a cool technique you have used as a first approximation.

Something I didn't completely get from your explanation. You said:

Using each UA string, the HTML content was downloaded from each site on the Alexa Global Top 1000 list.

Do you mean just an HTTP GET with the UA string, or a full DOM once interpreted by the Servo rendering engine? I'm asking because there are multiple types of UA sniffing. Some of them are client side ;) aka JS. And there are also sites sending CSS vendor properties not rendered by other engines.

For difflib.quick_ratio() you can also use a rendered screenshot of the page instead of the content. It introduces other types of variability, such as ads or photos, like @gerv mentioned, specifically for news sites.

@gerv commented Mar 26, 2015

This discussion seems to have stalled... jjnsn: are you planning to run some more tests?

Gerv

@jjnsn commented Apr 7, 2015

Hi all,

Apologies -- I had been playing with some mail filters and all the notifications from this issue had disappeared from my inbox! Some responses:

it's worth noting that the difflib scores would never be 1.0, even between two runs with an identical UA, because of page-variable content like ads. While this analysis is useful, it's a blunt tool.

You'd be surprised, actually. There are more 1.0s than I initially expected. It's partly due to the crude nature of the technique.

I'd be interested to see what happened if we dropped the "like Gecko"

OK, I can rerun this.

John: when you do another run, can you include both of those UAs?

OK.

could you post somewhere the scatter plots for each of them

I think you mean frequency distributions/histograms. Yes, I can do that.

Do you mean just an HTTP GET with the UA string or a full DOM once interpreted by the rendering engine Servo?

It's just an HTTP GET. I don't have the time (or, probably, the ability) to instrument Servo to get the full DOM....

you can also use a rendered screenshot of the page instead of the content.

Yes. I've used that in the past with WebKit/Gecko comparisons. Unfortunately I'm not in the position to do screenshots with Servo.

I'll be able to look at this stuff later this week. Apologies for the delay.

John

@karlcow commented Apr 8, 2015

No issue. Thanks a lot for the answers.

Mozilla/5.0

I would love to see if Mozilla/5.0 still has any influence on most scripts. In most of the client-side UA sniffing scripts I see, I don't remember seeing Mozilla/5.0 (though I have the feeling that testing without it will reveal surprises.)

UA sniffing and redirection

Currently what we see with UA sniffing is one or a combination of these:

  1. Server side sniffing HTTP (User-Agent: header)
  2. HTML redirection <meta http-equiv="refresh" content="5;url=http://example.org/there" />
  3. Client side sniffing scripting (navigator.userAgent) with sometimes a lot of subtle variations on version numbers such as Android \d
  4. Client side sniffing based on screen sizes, orientations, and other insanities.
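As a toy illustration of the server-side variant (item 1), the dispatch logic often amounts to little more than this. This is an illustrative sketch only; the regex and function name are made up, and real sniffers are far messier.

```python
import re

def choose_variant(user_agent):
    # Crude server-side sniff: key on an "Android <digit>" token or a
    # "Mobile" token, the kinds of patterns mentioned in the list above.
    if re.search(r"Android \d", user_agent) or "Mobile" in user_agent:
        return "mobile"
    return "desktop"
```

Subtle version-number matches like `Android \d` are exactly why a proposed Servo UA has to be tested against existing sniffing libraries rather than reasoned about in the abstract.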

For whatever UA is chosen for Servo in the end, we need a big prep effort (i.e. before the release of any product), well in advance, to test all the already-existing libs and propose modifications if necessary.

That's why I was asking about tests on Servo + DOM and JS.
Question: If we run an experiment with a current Firefox but a pseudo-Servo UA, would the JS part behave the same way as Servo would? If yes, maybe we can do something about it for testing in Web Compat. Ping @miketaylr @hallvors FYI.

@SimonSapin (Member) commented Apr 8, 2015

Question: If we run an experiment with a current Firefox but a pseudo-Servo UA, would the JS part behave the same way as Servo would?

I’m not sure what you mean by behave the same, but if sites are doing feature detection (as they should!) they can definitely test for a feature that Firefox implements and Servo doesn’t.

@karlcow commented Apr 8, 2015

@SimonSapin sorry for the poor writing, but your answer gave me enough hints.
So I guess we can't really test on Firefox:Gecko + Servo UA, but we really need to test on Firefox:Servo to get a sense of the breakage. Thanks.

@jjnsn commented Apr 8, 2015

Hi there,

Here's the latest run data. Sections below:

  • List of all UAs with a numerical identifier for each
  • For each pair of UAs, statistics on the dissimilarity measures between each UA
    • Pair = Which UAs compared
    • N = total number of sites compared (usually less than 1000 due to network/other issues)
    • Min = Minimum difflib ratio
    • Max = Maximum difflib ratio
    • Median = Median difflib ratio
    • Mean = Mean difflib ratio
    • StDev = Standard deviation of difflib ratios
    • CoV = Coefficient of Variation for difflib ratios
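The summary statistics listed above can all be computed with Python's standard library; a minimal sketch (the function name is mine, not from the original scripts):

```python
import statistics

def summarize(ratios):
    # Summary statistics matching the columns reported here:
    # N, Min, Max, Median, Mean, StDev, and CoV (StDev divided by Mean).
    mean = statistics.mean(ratios)
    stdev = statistics.stdev(ratios)  # sample standard deviation
    return {
        "N": len(ratios),
        "Min": min(ratios),
        "Max": max(ratios),
        "Median": statistics.median(ratios),
        "Mean": mean,
        "StDev": stdev,
        "CoV": stdev / mean,
    }
```

The CoV (coefficient of variation) normalizes the spread by the mean, which makes pairs with different average similarity comparable at a glance.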
Num UA
0 Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:37.0) Gecko/20100101 Firefox/37.0
1 Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko
2 Mozilla/5.0 (Windows Phone 8.1; ARM; Trident/7.0; Touch; rv:11; IEMobile/11.0) like Android 4.1.2; compatible) like iPhone OS 7_0_3 Mac OS X WebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.99 Mobile Safari /537.36
3 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36
4 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5
5 Mozilla/5.0 (X11; Linux i686) Servo/1.0 (like Gecko) Firefox/36.0
6 Mozilla/5.0 (X11; Linux i686) Servo/1.0 Firefox/36.0
7 Mozilla/5.0 (Android; Mobile; rv:1.0) Servo/1.0 Firefox/36.0
8 Mozilla/5.0 (Android; Mobile; rv:26.0) Gecko/26.0 Firefox/26.0
9 Mozilla/5.0 (Mobile; rv:1.0) Servo/1.0 Firefox/36.0
10 Mozilla/5.0 (Mobile; rv:26.0) Gecko/26.0 Firefox/26.0
Pair N Min Max Median Mean StDev CoV
(0, 1) 929 0.000 1.000 1.000 0.988 0.071 0.072
(0, 2) 910 0.000 1.000 0.978 0.737 0.321 0.436
(0, 3) 927 0.000 1.000 1.000 0.988 0.062 0.062
(0, 4) 928 0.000 1.000 1.000 0.989 0.063 0.063
(0, 5) 929 0.524 1.000 1.000 0.994 0.030 0.030
(0, 6) 930 0.654 1.000 1.000 0.995 0.027 0.027
(0, 7) 911 0.001 1.000 0.994 0.758 0.317 0.418
(0, 8) 912 0.001 1.000 0.985 0.745 0.319 0.428
(0, 9) 922 0.001 1.000 0.999 0.833 0.294 0.353
(0, 10) 920 0.001 1.000 0.999 0.819 0.300 0.367
(1, 2) 910 0.004 1.000 0.968 0.737 0.320 0.435
(1, 3) 927 0.158 1.000 1.000 0.991 0.039 0.040
(1, 4) 929 0.158 1.000 1.000 0.992 0.040 0.041
(1, 5) 930 0.000 1.000 1.000 0.986 0.075 0.076
(1, 6) 930 0.000 1.000 1.000 0.986 0.074 0.075
(1, 7) 911 0.000 1.000 0.992 0.754 0.319 0.424
(1, 8) 911 0.000 1.000 0.983 0.741 0.321 0.433
(1, 9) 921 0.000 1.000 0.999 0.829 0.298 0.360
(1, 10) 920 0.000 1.000 0.999 0.816 0.303 0.372
(2, 3) 909 0.004 1.000 0.973 0.737 0.321 0.435
(2, 4) 910 0.004 1.000 0.976 0.737 0.321 0.436
(2, 5) 910 0.000 1.000 0.978 0.736 0.321 0.436
(2, 6) 910 0.000 1.000 0.971 0.736 0.321 0.437
(2, 7) 912 0.000 1.000 0.999 0.948 0.162 0.171
(2, 8) 911 0.000 1.000 0.999 0.957 0.150 0.156
(2, 9) 911 0.000 1.000 0.998 0.873 0.243 0.278
(2, 10) 910 0.000 1.000 0.998 0.886 0.234 0.265
(3, 4) 927 0.627 1.000 1.000 0.994 0.020 0.021
(3, 5) 927 0.000 1.000 1.000 0.986 0.068 0.068
(3, 6) 927 0.000 1.000 1.000 0.986 0.067 0.067
(3, 7) 910 0.000 1.000 0.994 0.755 0.321 0.425
(3, 8) 910 0.000 1.000 0.984 0.742 0.322 0.434
(3, 9) 920 0.000 1.000 0.999 0.831 0.298 0.359
(3, 10) 919 0.000 1.000 0.999 0.816 0.304 0.373
(4, 5) 929 0.000 1.000 1.000 0.986 0.069 0.069
(4, 6) 929 0.000 1.000 1.000 0.986 0.067 0.068
(4, 7) 912 0.000 1.000 0.994 0.755 0.321 0.425
(4, 8) 911 0.000 1.000 0.980 0.742 0.322 0.434
(4, 9) 922 0.000 1.000 0.999 0.831 0.298 0.359
(4, 10) 920 0.000 1.000 0.999 0.816 0.305 0.373
(5, 6) 930 0.524 1.000 1.000 0.996 0.020 0.020
(5, 7) 911 0.001 1.000 0.992 0.757 0.317 0.418
(5, 8) 911 0.001 1.000 0.987 0.745 0.319 0.428
(5, 9) 921 0.001 1.000 0.999 0.835 0.294 0.351
(5, 10) 920 0.001 1.000 0.999 0.818 0.300 0.367
(6, 7) 911 0.001 1.000 0.994 0.757 0.317 0.419
(6, 8) 912 0.001 1.000 0.984 0.744 0.319 0.429
(6, 9) 922 0.001 1.000 0.999 0.835 0.294 0.352
(6, 10) 920 0.001 1.000 0.999 0.819 0.301 0.367
(7, 8) 913 0.027 1.000 1.000 0.980 0.097 0.099
(7, 9) 913 0.005 1.000 0.999 0.903 0.217 0.240
(7, 10) 911 0.005 1.000 0.999 0.895 0.222 0.248
(8, 9) 913 0.005 1.000 0.999 0.890 0.229 0.258
(8, 10) 912 0.005 1.000 0.999 0.907 0.214 0.236
(9, 10) 923 0.020 1.000 1.000 0.977 0.103 0.105
@karlcow commented Apr 9, 2015

statistics on the dissimilarity measures between each UA

what was measured? :)

@jjnsn commented Apr 9, 2015

:)

OK, for clarity's sake:

Using each UA string, the HTML content was downloaded from each site on the Alexa Global Top 1000 list. Then, the HTML content retrieved with each UA was compared to the content retrieved with the others using Python's difflib.quick_ratio() function. HTML that was completely different would get a score of 0.0, while identical content would receive a score of 1.0.

By comparing scores between various browsers, you can get a pretty good idea of how sites treat different UAs.

@karlcow commented Apr 9, 2015

@jjnsn Thanks! Really cool.

Did you follow the HTTP redirection (301, 302, etc.)? Just to know if we are measuring the diff of the first request or the final destination of the HTTP requests.

Checking the CoV:

  • 0.236 Firefox OS vs Firefox Android (Gecko) (8, 10): this is the classical Web Compat issue we currently have, aka redirecting to mobile or not.
  • 0.240 Firefox OS vs Firefox Android (Servo) (7, 9): pretty similar.

It seems logical, because people who redirect mostly key on the Android keyword (or not), and when they redirect Firefox on mobile they match mobile.*firefox rather than mobile.*gecko. So I guess here it's a benefit (while it's painful for other products using Gecko, such as https://bugzilla.mozilla.org/show_bug.cgi?id=334967 )

I'm still a bit worried on the JS part. We need to test that.

@jjnsn commented Apr 9, 2015

Did you follow the HTTP redirection (301, 302, etc.)?

Yup. It just uses the requests library, which follows redirects by default and returns the final destination in its response object. It also looks for meta refreshes:

import re
from bs4 import BeautifulSoup

def get_meta_refresh(html):
    print('Checking to see if there is a refresh...')
    # BeautifulSoup is slow, so do a quick check to see if parsing is necessary
    if 'http-equiv' in html.lower():
        print('\tthere is an http-equiv')
        soup = BeautifulSoup(html, 'html.parser')
        for m in soup.find_all('meta'):
            if m.get('http-equiv') == 'refresh' and m.get('content'):
                content = m['content']
                # e.g. content="5;url=http://example.org/there"
                if re.match(r'\d+;.*', content):
                    murl = ''.join(content.split(';')[1:]).strip()
                    if murl.lower().startswith('url='):
                        murl = murl[4:]
                    print('\treturning', murl)
                    return murl
    return False

I really should just clean up this code and submit it. Will try to in the coming days.

I'm still a bit worried on the JS part. We need to test that.

Yup. IMHO this approach is OK for coarse-level decisions ("This one won't work at all"), but for small tweaks it would be worthwhile to do something more robust that looks at JavaScript, screenshots, etc.

@gerv commented Apr 9, 2015

Hi jjnsn,

Thanks for doing all this work. There are some issues with the strings chosen which, if we could fix, might help us get clearer comparisons.

For example, I think it would help reduce noise in the data if you used the same OS as much as possible. At the moment, the base Firefox string (string 0) is Mac, but the Servo test strings (5 and 6) are X11.

Strings 7 and 9: rv values normally match Firefox values, so a better version of a string we'd use for Servo on mobile would have rv:36.0, not rv:1.0.

Strings 5 and 6, which are proposed desktop Servo strings, are missing the rv: token altogether. We should add it back in (as well as changing the OS to match the base desktop string, string 0).

@gerv commented Apr 9, 2015

Still, doing some analysis is quite possible. We start by defining a goal, which I suggest should be: we want the same content as Gecko Firefox gets on the same platform.

If that's the goal, then:

  • Comparing strings 9 and 10 shows us that string 9 could be a reasonable UA for Servo on B2G as it gets much the same content as standard B2G (difflib 0.977)
  • Comparing strings 7 and 8 shows us that string 7 could be a reasonable UA for Servo on Android as it gets much the same content as Firefox Gecko on Android (difflib 0.980)
  • Comparing strings 5 and 6 shows us that "like Gecko" doesn't seem to make too much difference (difflib 0.996)
  • Comparing strings 0 and 6 (or 5) shows us that string 6 could be a reasonable UA for Servo on Desktop as it gets much the same content as Firefox Gecko Desktop (difflib 0.995)

Overall, replacing the Gecko token with a Servo token and otherwise leaving the UA unchanged seems like a fine starting strategy for setting Servo's UA. We may also want to enhance the dynamic UA system we've built for B2G so that it can provide dynamic UAs for Servo-based browsers too.
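Concretely, that "swap Gecko for Servo, change nothing else" strategy amounts to the following. This is a sketch, not a proposed implementation; the function name and default values are illustrative only.

```python
def servo_ua(platform, rv, firefox_version, servo_version="1.0"):
    # Follow the Gecko UA template but replace the Gecko/<build> token
    # with Servo/<version>, leaving the rest of the string unchanged.
    return "Mozilla/5.0 ({}; rv:{}) Servo/{} Firefox/{}".format(
        platform, rv, servo_version, firefox_version)
```

With the right platform token this reproduces the mobile test strings used in the runs above, e.g. `servo_ua("Mobile", "37.0", "37.0")`.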

Gerv

@jjnsn commented Apr 9, 2015

There are some issues with the strings chosen which, if we could fix, might help us get clearer comparisons.

No problem. The easiest thing to do is to just create an etherpad or something where each UA string is listed on a separate line. Figure out which ones you'd like to test, put them in the list, point me at it, and I'll update the issue here, no problem.

@karlcow commented Apr 10, 2015

@gerv started a list on the etherpad.
@jjnsn as you are using python, I made it a dictionary.

https://etherpad.mozilla.org/uaservostats
We can adjust the list.

@gerv commented Apr 10, 2015

karlcow: I've updated the list in the Etherpad. Do you have further tweaks?

@karlcow commented Apr 10, 2015

@gerv I guess that sounds right. +1 for the postfix.

@jjnsn commented Apr 13, 2015

as you are using python, I made it a dictionary.

Actually, just a CRLF-separated list would have been easier. No biggie, though, I'll just use dict.items()

Otherwise, is this ready to go?

@gerv commented Apr 14, 2015

Fine by me. Thank you :-)

Gerv

@jjnsn commented Apr 15, 2015

Here you go.

Num UA
0 Mozilla/5.0 (Android; Mobile; rv:37.0) Gecko/37.0 Firefox/37.0
1 Mozilla/5.0 (Android; Mobile; rv:37.0) Servo/1.0 Firefox/37.0
2 Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:37.0) Gecko/20100101 Firefox/37.0
3 Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:37.0) Servo/1.0 Firefox/37.0
4 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36
5 Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5
6 Mozilla/5.0 (Mobile; rv:37.0) Gecko/37.0 Firefox/37.0
7 Mozilla/5.0 (Mobile; rv:37.0) Servo/1.0 Firefox/37.0
8 Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko
9 Mozilla/5.0 (Windows Phone 8.1; ARM; Trident/7.0; Touch; rv:11; IEMobile/11.0) like Android 4.1.2; compatible) like iPhone OS 7_0_3 Mac OS X WebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.99 Mobile Safari/537.36
Pair N Min Max Median Mean StDev CoV
(0, 1) 894 0.026 1.000 0.999 0.977 0.108 0.110
(0, 2) 895 0.001 1.000 0.958 0.730 0.327 0.448
(0, 3) 896 0.001 1.000 0.958 0.728 0.327 0.450
(0, 4) 898 0.000 1.000 0.962 0.728 0.330 0.453
(0, 5) 898 0.000 1.000 0.963 0.728 0.330 0.453
(0, 6) 894 0.006 1.000 0.998 0.887 0.235 0.265
(0, 7) 895 0.001 1.000 0.998 0.879 0.244 0.277
(0, 8) 895 0.000 1.000 0.956 0.727 0.329 0.452
(0, 9) 897 0.000 1.000 0.999 0.954 0.153 0.160
(1, 2) 893 0.001 1.000 0.984 0.744 0.324 0.435
(1, 3) 892 0.001 1.000 0.984 0.743 0.324 0.435
(1, 4) 893 0.000 1.000 0.985 0.741 0.327 0.441
(1, 5) 895 0.000 1.000 0.985 0.742 0.326 0.439
(1, 6) 888 0.006 1.000 0.998 0.898 0.224 0.249
(1, 7) 893 0.001 1.000 0.998 0.894 0.229 0.256
(1, 8) 895 0.000 1.000 0.983 0.741 0.326 0.439
(1, 9) 891 0.000 1.000 0.999 0.946 0.164 0.174
(2, 3) 898 0.303 1.000 0.999 0.993 0.039 0.039
(2, 4) 896 0.000 1.000 0.999 0.985 0.078 0.079
(2, 5) 898 0.000 1.000 0.999 0.985 0.080 0.082
(2, 6) 892 0.001 1.000 0.998 0.820 0.303 0.369
(2, 7) 898 0.001 1.000 0.998 0.820 0.303 0.370
(2, 8) 898 0.000 1.000 0.999 0.985 0.087 0.088
(2, 9) 896 0.000 1.000 0.950 0.724 0.328 0.453
(3, 4) 898 0.000 1.000 0.999 0.984 0.080 0.081
(3, 5) 895 0.000 1.000 0.999 0.984 0.080 0.081
(3, 6) 894 0.001 1.000 0.998 0.819 0.303 0.370
(3, 7) 898 0.001 1.000 0.998 0.821 0.304 0.370
(3, 8) 897 0.000 1.000 0.999 0.984 0.086 0.087
(3, 9) 894 0.000 1.000 0.952 0.725 0.327 0.452
(4, 5) 898 0.020 1.000 0.999 0.990 0.061 0.062
(4, 6) 893 0.000 1.000 0.998 0.817 0.308 0.376
(4, 7) 899 0.000 1.000 0.998 0.821 0.304 0.370
(4, 8) 897 0.006 1.000 0.999 0.987 0.073 0.074
(4, 9) 897 0.003 1.000 0.949 0.726 0.328 0.451
(5, 6) 892 0.000 1.000 0.998 0.817 0.306 0.375
(5, 7) 896 0.000 1.000 0.998 0.819 0.304 0.371
(5, 8) 895 0.006 1.000 0.999 0.989 0.064 0.065
(5, 9) 895 0.003 1.000 0.953 0.728 0.325 0.447
(6, 7) 894 0.001 1.000 0.999 0.986 0.083 0.084
(6, 8) 893 0.000 1.000 0.998 0.817 0.306 0.375
(6, 9) 892 0.000 1.000 0.997 0.874 0.244 0.279
(7, 8) 898 0.000 1.000 0.998 0.817 0.305 0.374
(7, 9) 894 0.001 1.000 0.997 0.866 0.250 0.288
(8, 9) 896 0.003 1.000 0.941 0.726 0.326 0.449
@gerv commented Apr 21, 2015

That data seems to support the suggestion that the right UA for Servo is the same one as used with Gecko, but simply replacing the Gecko string with a Servo string, and making no other changes.

0 vs 1: Android Gecko vs. Servo. Mean: 0.977
2 vs 3: Desktop Gecko vs. Servo. Mean: 0.993
6 vs 7: B2G Gecko vs. Servo. Mean: 0.986

All of these figures seem high enough to me.

@larsbergstrom (Contributor) commented Apr 29, 2015

This data does look great! We would accept a patch that gives Servo a real user agent string, as suggested by @gerv, with the one caveat that it should also fix, or at least log issues for, any new breakage introduced in sites we have tried to fix (wikipedia, github, CNN, reddit, etc.) due to the presence of different content.

@frivoal commented Jun 2, 2015

As an (interesting?) data point, Presto-based Opera, after playing the UA string compatibility charade for years, eventually gave up on the convoluted UA strings and went with this:

Opera/9.50 (Macintosh; Intel Mac OS X; U; en)

Which later evolved into:

Opera/9.61 (Macintosh; Intel Mac OS X; U; en) Presto/2.1.1

Which later evolved into this final form, needed because some UA string sniffers would barf on a 2 digit version number after "Opera/":

Opera/9.80 (Macintosh; Intel Mac OS X; U; en) Presto/2.6.30 Version/10.61

On the other hand, through this period, Opera also maintained a list of domains for which UA string spoofing was needed. Having that gives you a bit more freedom to break things with your default UA string.

Regardless of which UA string servo ends up going with, I think such a list is important to have, as there will be misbehaving UA string sniffers no matter what you do, and it is important to be able to work around them.
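A per-domain override list of the kind described here might look something like the following. This is a hypothetical sketch: the table contents and names are made up, and a real browser would load the list from an updatable data file rather than hardcoding it.

```python
# Hypothetical per-domain UA override table, in the spirit of Opera's
# spoof list. The entry below is a made-up example.
UA_OVERRIDES = {
    "example.com": "Mozilla/5.0 (X11; Linux x86_64; rv:37.0) Gecko/20100101 Firefox/37.0",
}

def ua_for(host, default_ua):
    # Return the spoofed UA for a host (matching subdomains too,
    # e.g. www.example.com), or the default UA otherwise.
    for domain, ua in UA_OVERRIDES.items():
        if host == domain or host.endswith("." + domain):
            return ua
    return default_ua
```

Keeping the override lookup separate from the default UA means the default string can stay honest while known-broken sniffers get worked around case by case.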

(As a side note, Opera also maintained a stack of custom js patches to fix misbehaving sites, which Servo, as a minority market-share browser, might want to take inspiration from. This is unfortunately somewhat resource intensive, but you can do as little or as much as you want, and if the alternative is broken sites... https://dev.opera.com/blog/opera-s-site-patching/)

bors-servo pushed a commit that referenced this issue Aug 11, 2015
Add the Servo User Agent strings

Fixes #4331. I've tested this out on desktop and Android on "the usual" sites (reddit, cnn, github, wikipedia, etc.).

r? @mbrubeck @metajack

@sbrl commented Mar 15, 2017

This probably isn't helpful, but perhaps including the commit hash the Servo engine was built from in the user agent string would be helpful for tracking down bugs during testing?

It would probably break a bunch of stuff though, so this is more of a what-if than a serious suggestion.

@metajack (Contributor, Author) commented Mar 15, 2017

This would unfortunately make fingerprinting Servo users pretty easy. Browser.html does include a bug reporter that includes the relevant info, though it's opt-in for the user.
