Use an already-installed Homebrew at /usr/local #24753

Merged
merged 10 commits into master from mainstream-brew Nov 17, 2019

Conversation

SimonSapin (Member) commented Nov 15, 2019

This requires servo/taskcluster-config#4 to be deployed.

Having Homebrew in the standard location helps `pkg-config` (CC #24688) and allows installing pre-compiled packages, which is much faster than compiling from source.
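
For a concrete sketch of what the standard prefix buys (the formula name here is hypothetical, not taken from this PR):

```
# In the default prefix (/usr/local on Intel macOS), formulae install from
# pre-built bottles and land on paths that tools typically already search:
brew install harfbuzz          # pours a bottle instead of compiling
pkg-config --libs harfbuzz     # /usr/local/lib/pkgconfig is usually on the default search path

# In a custom prefix, most formulae must be built from source, and
# pkg-config has to be pointed at the prefix by hand:
export PKG_CONFIG_PATH=/custom/prefix/lib/pkgconfig
```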

nox (Member) commented Nov 15, 2019

bors-servo (Contributor) commented Nov 15, 2019

📌 Commit d04a274 has been approved by nox

SimonSapin added 4 commits Nov 16, 2019
This adds lines such as

```
   Completed cssparser v0.27.1 custom-build in 2.4s
   Completed cssparser v0.27.1 custom-build (run) in 0.6s
   Completed cssparser v0.27.1 in 1.1s
```
```
(git init servo.git &&
    cd servo.git &&
    time git fetch https://github.com/servo/servo master $ARGS
); rm -rf servo.git

Full: 724.75 MiB
      57s home fiber in Paris
    1m25s AWS us-west-2 Oregon
    3m23s Macstadium DC1 Atlanta
    4m22s Macstadium DC2 Las Vegas

--depth 100: 129.00 MiB
    1m21s home
    1m18s AWS
    1m30s Macstadium 1
    1m24s Macstadium 2

--depth 50: 97.62 MiB
    30s home
    30s AWS
    41s Macstadium 1
    40s Macstadium 2

--depth 30: 92.47 MiB
    17s home
    18s AWS
    27s Macstadium 1
    26s Macstadium 2

--depth 10: 88.25 MiB
    11s home
    12s AWS
    26s Macstadium 1
    25s Macstadium 2

--depth 1: 87.53 MiB
    10s home
    10s AWS
    22s Macstadium 1
    28s Macstadium 2
```
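
A minimal sketch of the cached shallow fetch these measurements point toward (the cache path is hypothetical; the later commit messages note that this PR added a git cache):

```
# Keep a bare repository in a worker-local cache; each task then fetches
# only what the cache is missing, at a fixed shallow depth.
CACHE="$HOME/.cache/servo.git"    # hypothetical location
[ -d "$CACHE" ] || git init --bare "$CACHE"
git -C "$CACHE" fetch --depth 30 https://github.com/servo/servo master
```

The table shows diminishing returns below --depth 30: it is only about 5 MiB larger than --depth 1.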
@SimonSapin SimonSapin force-pushed the mainstream-brew branch from a19995d to c8ec8b3 Nov 16, 2019
SimonSapin (Member, Author) commented Nov 16, 2019

@bors-servo try=mac

bors-servo (Contributor) commented Nov 16, 2019

Trying commit c8ec8b3 with merge 4cab317...

bors-servo added a commit that referenced this pull request Nov 16, 2019
Use an already-installed Homebrew at /usr/local
bors-servo (Contributor) commented Nov 16, 2019

💔 Test failed - status-taskcluster

@SimonSapin SimonSapin force-pushed the mainstream-brew branch from c8ec8b3 to b839544 Nov 16, 2019
SimonSapin (Member, Author) commented Nov 16, 2019

@bors-servo try=mac

bors-servo (Contributor) commented Nov 16, 2019

Trying commit b839544 with merge 09f8c98...

bors-servo added a commit that referenced this pull request Nov 16, 2019
Use an already-installed Homebrew at /usr/local
bors-servo (Contributor) commented Nov 16, 2019

☀️ Test successful - status-taskcluster
State: approved= try=True

@SimonSapin SimonSapin force-pushed the mainstream-brew branch from b839544 to a16856d Nov 16, 2019
bors-servo (Contributor) commented Nov 17, 2019

💔 Test failed - linux-rel-css

jdm (Member) commented Nov 17, 2019

bors-servo (Contributor) commented Nov 17, 2019

Testing commit 0dad48f with merge 8c3be0e...

bors-servo added a commit that referenced this pull request Nov 17, 2019
Use an already-installed Homebrew at /usr/local
SimonSapin (Member, Author) commented Nov 17, 2019

bors-servo (Contributor) commented Nov 17, 2019

Testing commit 0dad48f with merge dfa7898...

bors-servo added a commit that referenced this pull request Nov 17, 2019
Use an already-installed Homebrew at /usr/local
SimonSapin (Member, Author) commented Nov 17, 2019

Oops, I still had a browser tab open that did not update after #24753 (comment)

bors-servo (Contributor) commented Nov 17, 2019

☀️ Test successful - linux-rel-css, linux-rel-wpt, status-taskcluster
Approved by: jdm
Pushing dfa7898 to master...

@bors-servo bors-servo merged commit 0dad48f into master Nov 17, 2019
2 checks passed
Community-TC (pull_request) TaskGroup: success
homu Test successful
@bors-servo bors-servo deleted the mainstream-brew branch Nov 17, 2019
SimonSapin added a commit that referenced this pull request Nov 18, 2019
Split WPT macOS testing into many more chunks
## Before this

Before this PR, we had roughly as many chunks as available workers.
Because the number of test files is a poor estimate of the time
needed to run them, we have significant variation in the completion time
between chunks when testing one given PR.

servo/taskcluster-config#9 adds a tool to collect
this data. Here are two full runs of `test_wpt` before this PR:

https://community-tc.services.mozilla.com/tasks/groups/DBt9ki9gTdWmwAk-VDorzw

```
count 1, total 0:00:32, max: 0:00:32	docker	0:00:32
count 1, total 0:59:14, max: 0:59:14	macos-disabled-mac1	0:59:14
count 6, total 4:12:16, max: 1:01:14	macos-disabled-mac1 WPT	0:40:29 0:18:55 0:46:50 0:44:38 1:01:14 0:40:10
count 1, total 0:55:19, max: 0:55:19	macos-disabled-mac9	0:55:19
count 6, total 4:25:09, max: 1:01:40	macos-disabled-mac9 WPT	0:37:58 0:37:24 0:27:18 1:01:40 0:46:17 0:54:31
```

Times for a given chunk vary between 19 minutes and 61 minutes.
Assuming no `try` testing, with Homu’s serial scheduling of `r+` testing
this means that a worker sits idle for 42 minutes
and our limited CPU resources are under-utilized.

When there *are* `try` PRs being tested however, they compete with
each other and any `r+` PR for the same workers. If we get unlucky,
a 61-minute task might only *start* after some other tasks have finished,
increasing the overall time-to-merge considerably.

## This

This PR changes the number of chunks to be significantly more
than the number of available workers. When one of them finishes,
that worker can pick up another one instead of sitting idle.

Now the ratio of number of tasks to number of workers doesn’t matter:
the differences in run time between tasks become somewhat of an advantage
and the distribution to workers evens out on average.

The number 30 is a bit arbitrary. A higher number reduces resource
under-utilization, but increases the effect of per-task overhead.
The git cache added in #24753
reduced that overhead, though.
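
As an illustration of the resulting task layout (the exact `mach` flags here are an assumption; wptrunner itself exposes `--total-chunks` and `--this-chunk` for manifest chunking):

```
# One task per chunk. Each of the 30 tasks runs one slice of the manifest,
# so a worker that finishes a short chunk immediately picks up another.
TOTAL_CHUNKS=30
./mach test-wpt --total-chunks "$TOTAL_CHUNKS" --this-chunk "$CHUNK"   # CHUNK in 1..30
```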

Another worry I had was whether this would worsen the similar problem
of unequal scheduling between processes within a task,
where some CPU cores sit idle while the remaining processes finish their
assigned work.

This turned out not to be enough of a problem to negatively affect
the total machine time:

https://community-tc.services.mozilla.com/tasks/groups/VnDac92HQU6QmrpzWPCR2w

```
count 1, total 0:00:48, max: 0:00:48	docker	0:00:48
count 1, total 0:39:04, max: 0:39:04	macos-disabled-mac9	0:39:04
count 31, total 4:03:29, max: 0:15:29	macos-disabled-mac9 WPT
	0:07:26 0:08:39 0:04:21 0:07:13 0:12:47 0:10:11 0:04:01 0:03:36
	0:10:43 0:12:57 0:04:47 0:04:06 0:10:09 0:12:00 0:12:42 0:04:40
	0:04:24 0:12:20 0:12:15 0:03:03 0:07:35 0:11:35 0:07:01 0:04:16
	0:09:40 0:05:08 0:05:01 0:06:29 0:15:29 0:02:28 0:06:27
```

(4h03min is even lower than above, but seems within normal variation.)

## After this

#23655 proposes automatically
restarting failed WPT tasks, in case the failure is intermittent.
With the test suite split into more chunks we have fewer tests per chunk,
and therefore lower probability that a given one fails.
Restarting one of them also causes less repeated work.
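
A back-of-the-envelope version of that argument (my addition; it assumes intermittent failures are independent, with a small per-test probability p):

```
P(\text{chunk of } k \text{ tests fails}) = 1 - (1 - p)^k \approx k\,p \qquad (k\,p \ll 1)
```

Going from 6 chunks to about 30 divides k by roughly 5, so each chunk is about five times less likely to hit an intermittent failure, and restarting one repeats roughly 1/30 of the suite instead of 1/6.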
bors-servo added a commit that referenced this pull request Nov 18, 2019
Split WPT macOS testing into many more chunks
SimonSapin added a commit that referenced this pull request Nov 18, 2019
Split WPT macOS testing into many more chunks
bors-servo added a commit that referenced this pull request Nov 18, 2019
Split WPT macOS testing into many more chunks
@SimonSapin SimonSapin mentioned this pull request Nov 18, 2019
4 of 4 tasks complete
bors-servo added a commit that referenced this pull request Nov 18, 2019
Split WPT macOS testing into many more chunks
bors-servo added a commit that referenced this pull request Nov 18, 2019
Split WPT macOS testing into many more chunks
jdm added a commit to jdm/servo that referenced this pull request Dec 14, 2019
Split WPT macOS testing into many more chunks
jdm added a commit to jdm/servo that referenced this pull request Dec 20, 2019
Split WPT macOS testing into many more chunks