Realistic Page Load Time Test #10452
Comments
|
Is this something you plan on doing yourself, or are you looking for someone to work on this? If so, I'm interested :) |
|
I'd be very interested in seeing page load time as the most important piece of information but also capturing the raw output of the profiling data. @rzambre and I are working on some patches to make that easier to grab for automated systems, which we hope to land very soon. The only other obvious piece of information that would be really helpful is the output of memory profiling, though I don't know how practical that is to collect for initial page load scenarios. @nnethercote can you comment on whether that would be a useful measure to track or if we need more steady-state browsing data? CC: @jgraham @jdm @metajack @Ms2ger @edunham whom I expect to have other feedback Thanks for writing this up - I'm very excited for this work! |
|
So, generally the idea of using something like tp5 and measuring the load times seems like a sensible first step. I would encourage you to build as little infrastructure as possible though; we already have solutions for monitoring performance data and I suspect that anything you invent now will be about the same amount of effort to get working as reporting to perfherder, but will be more of a maintenance burden in the future. @wlach is the expert here and will be able to provide hints. For harnesses, wptrunner already provides a mechanism to launch servo, but that's pretty much all you'll be using in this case. It will be possible to adapt to your usecase, but if your plan is literally just to launch servo for each url and read the timing data from stdout (which seems like the easiest implementation for now), then it's probably overkill. I think just using purely custom python code is quite defensible. I would avoid unittest. I agree that in the future using recorded loads instead of static copies of sites will be a much better simulation of the real world. |
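The "launch servo for each url and read the timing data from stdout" idea can be sketched in a few lines of plain Python. Everything here is illustrative: the `--headless` flag and the `[PERF] load_time:` output line are assumptions and would need to match whatever the real servo build actually prints.

```python
# Minimal sketch of a per-URL runner: launch servo, capture stdout,
# parse out a load-time line. Flag name and output format are assumptions.
import subprocess

def parse_load_time(stdout_text):
    """Extract the load time (ms) from servo's stdout, or None if absent."""
    for line in stdout_text.splitlines():
        if line.startswith("[PERF] load_time:"):
            return float(line.rsplit(":", 1)[1])
    return None

def run_one(servo_binary, url, timeout=60):
    """Launch servo on a single URL and return the parsed load time."""
    out = subprocess.run(
        [servo_binary, "--headless", url],  # hypothetical flag
        capture_output=True, text=True, timeout=timeout,
    ).stdout
    return parse_load_time(out)
```

A loop over the page set calling `run_one` and collecting the results would be the whole harness in this "purely custom python code" approach.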
|
Yeah, I'd really encourage you to consider using Perfherder, which not only solves the problem of storing and visualizing performance data but also that of acting on it. I've spent the last few quarters working on a performance sheriffing view, which we've been using for tracking regressions in Talos and other things: https://treeherder.mozilla.org/perf.html#/alerts?status=-1&framework=1 Perfherder automatically detects regressions and provides a simple method for filing bugs on them based on a template. We'd probably need to make some minor adaptations to support Servo, but nothing major. I'm in the middle of a similar effort to make Perfherder a good solution for sheriffing AreWeFastYet data, which I think should cover most of your use case: http://wlach.github.io/blog/2016/03/are-we-fast-yet-and-perfherder/ Submitting data to perfherder is not hard: all that is involved is creating a standard treeherder job and adding a "performance artifact" to it (there's plenty of sample code for this). We've used treeherder successfully with github projects before (bugzilla, gaia), so I don't see why Servo would be a problem. |
|
I'm +1 on using perfherder, since you'll almost certainly get better support and performance with a tool that people focus on full time than a one-off competing with many other projects for my, Jack's, and Lars's time. @wlach, does perfherder expose a public API of the data it collects, as well as the built-in metrics visualization? |
|
Memory usage on page load would be reasonably useful. Tracking that would be a lot better than tracking nothing. |
|
Let me summarize the above discussion:
The technique used in AreWeSlimYet seems daunting to me; I'd appreciate it if anyone could point me to any tool or document I can study. @autrilla: Any help would be most welcome :) I'll open a new repo for this project and try to merge it back when it's mature. |
|
I started some experiments in this repo: https://github.com/shinglyu/servo-perf |
|
@edunham: Perfherder has a bunch of endpoints for getting series data (the UI uses these): https://treeherder.mozilla.org/docs/#!/project/Performance_Datum_list And also one to get a list of "alerts" (detected changes) programmatically: https://treeherder.mozilla.org/docs/#!/performance/Performance_Alert_Summary_list Feel free to ask either me or jmaher on irc.mozilla.org #treeherder or #perfherder if you have more questions |
|
@wlach Thank you for the information, but I still don't understand how Perfherder works.
Thank you! |
|
Oh I found this: https://treeherder.readthedocs.org/submitting_data.html |
|
@shinglyu I think you figured this out for yourself, but yes, that's the guide to use for submitting data. https://github.com/mozilla/autophone/blob/master/autophonetreeherder.py Since this is the first time we'll be submitting Servo data to treeherder, we'll also need to send revision information. There's some guidance on doing that in the submitting data document that you linked to. Eventually you might want to consider using TaskCluster for scheduling jobs and submitting data, which I believe might take care of some of those details for you. To answer your earlier questions, Treeherder/Perfherder does actually aggregate performance data in an easy-to-digest form, which is how we provide all the frontend views at https://treeherder.mozilla.org/perf.html My recommendation would be as follows:
|
|
@wlach : Thanks a lot! I'll start step 1 and 2 and contact you when I'm ready for step 3. :) |
|
Update: the test runner is almost ready https://github.com/shinglyu/servo-perf |
|
@wlach : The |
|
@shinglyu Good catch! I updated the version on pypi to reflect what's in the tree (treeherder-client-2.1.0). Please use the new version. The docs should be up-to-date at this point. If they're not, please file a PR to fix them or let me know what's wrong so I can do so. |
|
@shinglyu Sometimes you may have better results with |
|
@wlach: Thank you |
|
@wlach I was able to submit a ResultSetCollection and JobCollection through the python API, but I can't figure out how to format a performance artifact. I found the following code: I thought the log should be a JSONified |
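For reference, a performance artifact boils down to a JSON blob with a framework name, suites, and subtests. Here is a hedged sketch of building such a blob; the field names follow the general shape used by Perfherder, but treat the exact schema as an assumption and check it against the treeherder docs and tests:

```python
# Sketch of a Perfherder-style performance blob: one suite whose value is
# the mean of its subtests (one subtest per page). Schema is an assumption.
import json
import statistics

def make_perf_artifact(suite_name, page_times):
    """page_times: {url: load_time_ms}. Suite value = mean of subtest values."""
    subtests = [{"name": url, "value": t} for url, t in sorted(page_times.items())]
    return {
        "framework": {"name": "talos"},  # later switched to a "servo" framework
        "suites": [{
            "name": suite_name,
            "value": statistics.mean(page_times.values()),
            "subtests": subtests,
        }],
    }

blob = make_perf_artifact("servo-tp5", {"example.com": 1200.0, "example.org": 800.0})
print(json.dumps(blob, indent=2))
```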
|
Answering myself: Just found this test: https://github.com/mozilla/treeherder/blob/4a357b297fde5d5ba3f93c27a53aea53292f53a9/tests/e2e/test_perf_ingestion.py |
|
Edit: I committed the wrong file... |
|
@larsbergstrom I'm not sure how to get the revision information when I submit data to treeherder. I am thinking about dumping the |
|
@larsbergstrom @wlach Also, I'm not sure how to present the data points. Talos' TP5 uses this kind of summarization:
That is, one (median) time for each website, and one mean time for the whole suite.
By measurement
By website
|
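The summarization described above can be sketched as: take the median over the repeated loads of each website, then the mean of those medians for the suite. This is illustrative code, not the actual Talos implementation:

```python
# tp5-style summarization sketch: per-site median, then suite-level mean.
import statistics

def summarize(runs):
    """runs: {url: [load_time_ms, ...]} -> (per_site_medians, suite_mean)"""
    per_site = {url: statistics.median(times) for url, times in runs.items()}
    suite = statistics.mean(per_site.values())
    return per_site, suite

per_site, suite = summarize({"a.com": [100.0, 120.0, 110.0],
                             "b.com": [200.0, 220.0, 210.0]})
# per_site == {"a.com": 110.0, "b.com": 210.0}; suite == 160.0
```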
|
@shinglyu For getting commit information, I wonder if it might not be easiest to use a library like GitPython (http://gitpython.readthedocs.org/). For the second question, I think definitely separating by measurement makes the most sense. However, I would question the utility of measuring anything but the time for the document being fully loaded and painted (which is what tp5o measures). There's a cost of complexity of recording additional information, so I'd personally probably just start with the same metric as tp5o, then add additional measurements if they're proven to be needed. |
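As an alternative to GitPython, shelling out to git is enough to grab the revision and avoids an extra dependency. A minimal sketch, assuming the runner knows the path to a git checkout:

```python
# Grab the HEAD commit of a checkout by shelling out to git. The helper
# below is a simple sanity check on the returned value.
import subprocess

def current_revision(repo_dir):
    """Return the full SHA-1 of HEAD in repo_dir."""
    return subprocess.check_output(
        ["git", "rev-parse", "HEAD"], cwd=repo_dir, text=True
    ).strip()

def looks_like_sha(s):
    """Sanity check: 40 lowercase hex characters."""
    return len(s) == 40 and all(c in "0123456789abcdef" for c in s)
```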
|
@wlach I want to separate my build step and test step, so I'll package the servo binary into a zip and copy it to my test runner's directory. That way the test runner doesn't need to access the Servo code base. I think I'll use
Your suggestion makes a lot of sense. I think I'll only submit the |
|
@shinglyu: BTW, soon treeherder will have the capability of ingesting github revision data (on a push level, no less) which I think will work much better than you submitting revision data by hand. So I'd just get something hacky working there for now (your solution sounds fine) and hopefully we can switch to something better later in this quarter. |
|
@wlach Good to know! I have automated the whole build > test > submit to local perfherder flow. I'll let it run for a few days to see if everything is stable enough for submitting to staging. |
|
@shinglyu Awesome! |
|
@shinglyu Submitting to treeherder stage should be no problem, just follow the procedure here to add credentials and ping me again when you've done so: |
|
@metajack: Thanks for the information |
|
Related bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1269629 |
|
Some initial data can be seen on the staging server now: https://treeherder.allizom.org/#/jobs?repo=servo&selectedJob=1 |
|
I documented how I submitted data to Perfherder: http://shinglyu.github.io/web/2016/05/07/visualizing_performance_data_on_perfherder.html cc: @wlach |
|
Hey @shinglyu, great post! I would like to get some of that integrated within the treeherder documentation. One thing: one should not submit servo data with the "talos" framework, as that's intended solely for the Gecko platform. I'd like to add a new performance framework for "servo"; see: https://bugzilla.mozilla.org/show_bug.cgi?id=1271472 |
|
@shinglyu I see jobs are getting sent to staging on a regular basis, including the performance artifacts. Is there a way to compare these results with firefox yet? @wlach If we have a new framework (seems like servo-perf is what was chosen) can we then compare performance results against things in Talos? We definitely want to be able to see how we're doing against Firefox's tp5 results. |
|
@shinglyu I'm seeing some issues with this:
@metajack I think it's going to be really hard to compare against Firefox unless you're running the exact same test, which I don't believe you are at this point. Maybe the easiest route is to somehow run servo-perf against firefox, perhaps on a nightly basis? |
|
How does our tp5 test differ from the one that Firefox runs? |
|
@metajack I'm not familiar with exactly what servo-perf is testing, if it's using the same pageset as talos tp5 that's a great start at measuring the same thing. But even if the pageset is the same, you would have to make sure that the harness is recording information in the same way. The numbers from talos vs. servo-perf seem pretty far off from one another: |
|
@wlach I'd expect Servo's numbers to be pretty far off - we have done nearly zero "complete page load" performance work yet, and there's a ton of known low-hanging fruit. So, that chart may be pretty close to reality :-) |
|
I may be missing something, but trying to compare performance numbers from different implementations of the "same" testsuite running on different hardware seems like it isn't going to produce good results? The infrastructure that produces results for Servo should also submit its own results for Firefox, running with the same harness on the same hardware, in order to get meaningful numbers. |
|
@metajack We can't compare our servo-perf test with the existing tp5-Firefox-talos test. The reason is that we have our custom test runner (open PR: #11107). It runs a subset of the tp5 tests, because some pages make servo run forever (see #11087). Also, it measures
@wlach I changed the framework, but I broke the test runner, so it failed to submit data for 2 days. The data you are looking at is probably 2 days old. The latest one should be correct: https://treeherder.allizom.org/#/jobs?repo=servo&selectedJob=16 About the "performance series signature", is that the job_guid? And yes, I haven't pushed the log to S3 or created a link in the artifact. It's on my backlog and I'll open a bug for that. |
|
@wlach: Now the data points are in the same graph again. https://treeherder.allizom.org/perf.html#/graphs?series=[servo,f6067f4bc04fef24aa4eec8ff55794727bfe5f7f,1]&selected=[servo,f6067f4bc04fef24aa4eec8ff55794727bfe5f7f,9,17] I assume the problem is because I'm transitioning from |
|
In case you are confused, the May 10 commit is still using |
|
@shinglyu No the performance signature is distinct from the job_guid. The signature is calculated based on the properties of PERFHERDER_DATA (suite name, test name, options) as well as various reference data from the job (machine platform, options, ...). And yes, if you change the performance framework you'll get a new series (though the signature should remain the same, since performance framework is not currently incorporated into the signature). |
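Conceptually, the signature is a stable hash of the series' identifying properties, so the same suite/test/platform always maps to the same series. A toy illustration of that idea (not the actual treeherder code, which mixes in more reference data):

```python
# Toy model of a performance series signature: sha1 over the identifying
# properties, so identical inputs always yield the identical 40-char id.
import hashlib

def series_signature(suite, test, machine_platform, options=()):
    parts = [suite, test, machine_platform] + sorted(options)
    return hashlib.sha1("".join(parts).encode("utf-8")).hexdigest()

sig = series_signature("servo-tp5", "example.com", "linux64")
# Same inputs -> same signature; renaming a subtest produces a new series.
```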
|
@wlach: Thanks. Do the last 4~5 tests have the same signature? The only changing part should be the timestamp and performance numbers. I might have changed the subtest name when I fixed some bugs a week ago, but the latest 4 tests should be stable. Can you point me to the performance signature code/docs so I can double check? |
|
@larsbergstrom @metajack I tried to prioritize the remaining work. Please let me know if we need to adjust anything. Priority 1 (Critical for June Preview)
Priority 2 (Must have before, say, end of Q3 2016)
Priority 3 (Nice to have)
Edit: moved automatic alerting to P3 |
|
@shinglyu: Code for signature calculation is here (warning: it has some rough edges): https://github.com/mozilla/treeherder/blob/master/treeherder/etl/perf.py |
|
@wlach: Thanks. So did you check the signature by manually querying it in the SQL DB, or is it shown on the UI? |
|
cc @aneeshusa |
|
cc @fitzgen |
|
Out of curiosity, what will it take to allow measuring non-master branches? Once the off-thread HTML parsing work is ready, it would be useful to be able to compare the before and after timing before the changes are actually merged. |
|
@jdm I don't think it would be very complicated at all. As far as I know, we're currently still manually copying the servo binary to the performance test runner's directory, so it'd simply be a matter of adding the branch as a command line parameter to the runner, so that it gets sent as another treeherder project (or whatever they're called), and copying the servo binary from the other branch. |
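The branch-as-a-parameter idea could look something like this in the runner; the flag names here are hypothetical, not the actual servo-perf interface:

```python
# Sketch: a --branch flag (hypothetical) that the runner would use both to
# pick which copied binary to test and which treeherder project to report to.
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="servo-perf runner")
    parser.add_argument("--branch", default="master",
                        help="branch whose servo binary should be tested")
    parser.add_argument("--binary-dir", default="servo/",
                        help="directory the servo binary was copied into")
    return parser.parse_args(argv)

args = parse_args(["--branch", "html-parsing"])
# args.branch == "html-parsing"
```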


I had some discussion with @larsbergstrom about this, and here is my proposal. Feedback welcome!
Goal
Measure Servo's page load performance on top websites (e.g. Alexa 500), and compare it to other browsers' performance.
Existing Solutions
Proposal
Measurements
window.performance API
Test Environment
Test Harness
A Python harness (unittest or py.test) that launches Servo for each URL, reads performance.timing, and collects the results
Test Cases
Visualization & Notification
Plan
Collect performance.timing data
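Since performance.timing exposes epoch-millisecond marks, the metrics to submit are differences against navigationStart. A minimal sketch, assuming the serialized timing object can be read out of Servo:

```python
# Derive relative page-load metrics from a performance.timing record.
# The input is assumed to be the timing object serialized as a dict.
def derive_metrics(timing):
    """timing: dict of performance.timing fields (ms since epoch)."""
    start = timing["navigationStart"]
    return {
        "domComplete": timing["domComplete"] - start,
        "loadEventEnd": timing["loadEventEnd"] - start,
    }

m = derive_metrics({"navigationStart": 1000, "domComplete": 1800,
                    "loadEventEnd": 1900})
# m == {"domComplete": 800, "loadEventEnd": 900}
```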