ensure /tcktestdir created for FileConnection tests #1862

mykmelez · 2015-08-27T18:25:24Z

I think the TCK FileConnection tests fail sometimes because there's a race between two fs.init promise handlers: first, in fs-init.js, which mkdirs /tcktestdir after fs.init; and second, in fstests.js, which calls fs.clear after fs.init.

When fstests.js clears the database before fs-init.js creates /tcktestdir, then that directory exists, and the FileConnection tests pass. But when fstests.js clears the database after fs-init.js creates /tcktestdir, then that directory gets deleted after being created, so it doesn't exist when the FileConnection tests are run, and they fail.

This branch explicitly creates /tcktestdir if we're running the FileConnection tests. It also explicitly deletes the database before running those tests, so it should lead to more consistent success (or failure!) of that test suite.

mykmelez · 2015-08-27T20:15:30Z

The bad news is that the FileConnection tests are still failing on this branch. The good news is that they're failing consistently, at least on Travis (they pass locally). So it should be a bit easier to figure out why.

mykmelez · 2015-08-27T22:02:49Z

I figured out how to get them to fail locally (blow away the database before running the tests), and this is actually a regression from #1201, which made the VM start running before the filesystem is initialized.

When loading a midlet, that's ok, because the VM itself (and apparently the MIDletSuiteLoader) don't need the fs, and #1201 ensures that the fs will be inited in MIDletSuiteUtils.vmBeginStartUp before MIDletSuiteLoader starts the midlet.

But the FileConnection tests are run by com.ibm.tck.client.TestRunner, which is a main program (specified by the main config parameter, not the midletClassName parameter). And the VM doesn't wait for the fs to be inited before starting it. Which means it can race fs initialization (causing test failures).

The simplest (and most robust) fix is to revert the fs part of #1201 and make VM startup wait on fs initialization. Presumably that would regress startup perf, but I don't see any numbers in #1201, and when I run the startup benchmark for WA on desktop, it shows that doing so actually improves perf:

Test	Baseline Mean	Mean	+/-	%	P	Min	Max
startupTime	1,015ms	979ms	-36ms	-3.57	BETTER	944ms	1,042ms
vmStartupTime	200ms	208ms	9ms	4.28	WORSE	183ms	243ms
bgStartupTime	49ms	48ms	-1ms	-1.50	BETTER	47ms	50ms
fgStartupTime	683ms	631ms	-52ms	-7.65	BETTER	608ms	684ms
fgRefreshStartupTime	83ms	91ms	8ms	9.91	WORSE	66ms	111ms
fgRestartTime	n/a	NaNms	n/a	n/a	n/a	Infinityms	-Infinityms
totalSize	30,376kb	29,608kb	-767kb	-2.53	BETTER	28,413kb	30,292kb
domSize	167kb	167kb	0kb	0.00	SAME	167kb	167kb
styleSize	395kb	395kb	0kb	-0.03	SAME	393kb	396kb
jsObjectsSize	19,849kb	19,757kb	-93kb	-0.47	BETTER	19,713kb	19,770kb
jsStringsSize	1,182kb	655kb	-527kb	-44.59	BETTER	654kb	655kb
jsOtherSize	8,555kb	8,408kb	-148kb	-1.73	SAME	7,224kb	9,081kb
otherSize	227kb	227kb	0kb	0.09	SAME	225kb	230kb
USS	n/a	NaNkb	n/a	n/a	n/a	Infinitykb	-Infinitykb
peakRSS	363,775kb	354,459kb	-9,317kb	-2.56	BETTER	310,316kb	365,060kb

Ditto on a Flame (albeit insignificantly):

Test	Baseline Mean	Mean	+/-	%	P	Min	Max
startupTime	16,535ms	16,208ms	-327ms	-1.98	SAME	15,325ms	18,323ms
vmStartupTime	2,790ms	2,777ms	-13ms	-0.48	SAME	2,408ms	3,044ms
bgStartupTime	896ms	884ms	-12ms	-1.36	SAME	695ms	1,169ms
fgStartupTime	11,680ms	11,487ms	-193ms	-1.65	SAME	10,525ms	13,751ms
fgRefreshStartupTime	1,169ms	1,061ms	-108ms	-9.24	SAME	686ms	2,199ms
fgRestartTime	n/a	NaNms	n/a	n/a	n/a	Infinityms	-Infinityms
totalSize	22,959kb	22,840kb	-119kb	-0.52	SAME	21,970kb	23,690kb
domSize	74kb	74kb	0kb	-0.03	SAME	74kb	74kb
styleSize	310kb	310kb	-1kb	-0.19	SAME	302kb	319kb
jsObjectsSize	16,079kb	16,105kb	26kb	0.16	SAME	15,687kb	16,570kb
jsStringsSize	1,327kb	1,212kb	-115kb	-8.64	BETTER	1,075kb	1,236kb
jsOtherSize	4,898kb	4,872kb	-26kb	-0.53	SAME	4,156kb	5,531kb
otherSize	270kb	267kb	-4kb	-1.40	SAME	204kb	354kb
USS	52,633kb	53,414kb	782kb	1.49	WORSE	51,020kb	56,756kb
peakRSS	0kb	NaNkb	NaNkb	NaN	SAME	NaNkb	NaNkb

So I think the simple, robust fix is the best one here.

marco-c · 2015-08-27T22:38:22Z

The simplest (and most robust) fix is to revert the fs part of #1201 and make VM startup wait on fs initialization.

This probably depends on the order in which we perform initialization steps.
Forcing the FS to be loaded before starting the VM could not be a regression in our current situation, but could be a regression if we slightly modify the order of the initialization steps.
Another thing that comes to mind is that FS initialization is likely to be slower when there's more data in the IndexedDB store, so the results could be different if you use WA for a while (receive messages, send/receive media files, etc.).

when I run the startup benchmark for WA on desktop, it shows that doing so actually improves perf:

This is definitely weird, how can this change improve startup perf? It could leave it unchanged (if the other promises in loadingPromises are resolved after the FS is initialized), but I don't see how it could improve perf!
Another weird thing is that on desktop it's improving fgStartupTime, which is totally unaffected by this.

So, I'd keep the optimization from #1201 and make TestRunner a MIDlet.

mykmelez · 2015-08-27T22:51:14Z

Another thing that comes to mind is that FS initialization is likely to be slower when there's more data in the IndexedDB store, so the results could be different if you use WA for a while (receive messages, send/receive media files, etc.).

The number of files could affect fs initialization, but I'm not sure it's significant, given that we use IDBObjectStore.getAll to get their metadata, and it is presumably designed to be efficient at retrieving all records, such that the majority of its cost is unrelated to the number of records it retrieves (which is why it's so much faster than retrieving all records via a standard request).

This is definitely weird, how can this change improve startup perf? It could leave it unchanged (if the other promises in loadingPromises are resolved after the FS is initialized), but I don't see how it could improve perf!

One possibility is that there's a hidden cost to context switching between fs initialization and other work we do during VM startup (like Ion compilation). It's also possible that my results on desktop were faulty.

So, I'd keep the optimization from #1201 and make TestRunner a MIDlet.

That would solve the problem for TestRunner, but other Java programs would still be susceptible to the underlying bug. I'd prefer a more robust solution.

marco-c · 2015-08-27T23:53:09Z

The number of files could affect fs initialization, but I'm not sure it's significant, given that we use IDBObjectStore.getAll to get their metadata, and it is presumably designed to be efficient at retrieving all records, such that the majority of its cost is unrelated to the number of records it retrieves (which is why it's so much faster than retrieving all records via a standard request).

Right, I always forget that we're only loading the metadata.

One possibility is that there's a hidden cost to context switching between fs initialization and other work we do during VM startup (like Ion compilation). It's also possible that my results on desktop were faulty.

I'll run the startup benchmark again on my machine as well.

That would solve the problem for TestRunner, but other Java programs would still be susceptible to the underlying bug. I'd prefer a more robust solution.

OK. We could wait for the FS to be initialized somewhere else (at a point in common between normal Java programs and MIDlets), but let's do that only if the benchmark shows there's a regression.

marco-c · 2015-09-02T14:12:21Z

@mykmelez why did you close this PR?

mykmelez · 2015-09-02T15:34:50Z

@mykmelez why did you close this PR?

Hmm, I didn't intend to. I was doing some branch cleanup and accidentally deleted the branch for which I was requesting the pull, hence these messages about my deleting and restoring the branch:

mykmelez deleted the mykmelez:intermittent-file-connection-failure branch 2 days ago
@mykmelez mykmelez restored the mykmelez:intermittent-file-connection-failure branch 2 days ago

I guess @github autotragically closed the PR when I deleted the branch and didn't reopen it when I restored it (and didn't tell me what it was doing/not doing in either case).

marco-c · 2015-09-06T16:59:14Z

This is my result. Baseline is without the optimization (with your patch).
I'm testing startup to the contact list, my exported FS is 55 MB.

Config	Value
User Agent	Mozilla/5.0 (X11; Linux x86_64; rv:42.0) Gecko/20100101 Firefox/42.0
Rounds	30
Delay(ms)	5000
Delete FS	no
Delete JIT CACHE	no
Warm startup	no

Test	Baseline Mean	Mean	+/-	%	P	Min	Max
startupTime	2,066ms	2,015ms	-51ms	-2.47	BETTER	1,973ms	2,100ms
vmStartupTime	294ms	283ms	-12ms	-3.94	BETTER	272ms	319ms
bgStartupTime	69ms	69ms	0ms	-0.02	SAME	65ms	83ms
fgStartupTime	1,380ms	1,351ms	-29ms	-2.13	BETTER	1,313ms	1,415ms
fgRefreshStartupTime	323ms	313ms	-10ms	-3.14	SAME	292ms	337ms
fgRestartTime	n/a	NaNms	n/a	n/a	n/a	Infinityms	-Infinityms
totalSize	40,021kb	46,097kb	6,077kb	15.18	WORSE	40,893kb	50,693kb
domSize	136kb	135kb	0kb	-0.28	SAME	134kb	140kb
styleSize	425kb	421kb	-3kb	-0.71	SAME	412kb	461kb
jsObjectsSize	24,122kb	27,283kb	3,161kb	13.11	WORSE	25,672kb	28,937kb
jsStringsSize	1,447kb	969kb	-478kb	-33.02	BETTER	950kb	1,042kb
jsOtherSize	13,636kb	17,034kb	3,398kb	24.92	WORSE	13,426kb	20,004kb
otherSize	256kb	255kb	-1kb	-0.47	SAME	250kb	279kb
USS	117,660kb	131,675kb	14,015kb	11.91	WORSE	123,024kb	141,056kb
peakRSS	506,280kb	506,280kb	0kb	0.00	SAME	506,280kb	506,280kb

mykmelez · 2015-09-07T04:41:55Z

I reran the tests on desktop and saw no difference this time (baseline is master, benchmark is this branch):

Config	Value
User Agent	Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:43.0) Gecko/20100101 Firefox/43.0
Rounds	30
Delay(ms)	5000
Delete FS	no
Delete JIT CACHE	no
Warm startup	no

Test	Baseline Mean	Mean	+/-	%	P	Min	Max
startupTime	1,143ms	1,140ms	-3ms	-0.26	SAME	1,097ms	1,265ms
vmStartupTime	263ms	260ms	-3ms	-1.25	SAME	223ms	271ms
bgStartupTime	59ms	58ms	-2ms	-2.54	BETTER	57ms	60ms
fgStartupTime	743ms	742ms	0ms	-0.03	SAME	699ms	868ms
fgRefreshStartupTime	78ms	80ms	2ms	2.58	SAME	73ms	91ms
fgRestartTime	n/a	NaNms	n/a	n/a	n/a	Infinityms	-Infinityms
totalSize	29,207kb	29,214kb	7kb	0.02	SAME	28,072kb	29,792kb
domSize	171kb	171kb	0kb	-0.00	SAME	171kb	171kb
styleSize	386kb	385kb	0kb	-0.12	BETTER	385kb	387kb
jsObjectsSize	19,425kb	19,431kb	6kb	0.03	WORSE	19,413kb	19,438kb
jsStringsSize	653kb	653kb	0kb	0.04	SAME	652kb	656kb
jsOtherSize	8,353kb	8,353kb	0kb	0.00	SAME	7,229kb	8,928kb
otherSize	220kb	221kb	1kb	0.42	WORSE	219kb	222kb
USS	n/a	NaNkb	n/a	n/a	n/a	Infinitykb	-Infinitykb
peakRSS	343,555kb	359,052kb	15,497kb	4.51	WORSE	324,928kb	361,900kb

But I'm testing startup to the Terms screen, and my exported FS is only 4.9MB, so it's possible that a difference would appear with a larger FS or on startup to a different screen of the midlet.

Nevertheless, I still don't like the optimization, since it can break midlets and other Java programs in ways that are hard to diagnose. I'd prefer correctness in this case, even if it means taking a perf hit. But I'm open to alternatives, like initializing the FS at some common point.

@marco-c Any idea what that common point might be?

marco-c · 2015-09-07T12:13:08Z

Given that the regression is small and is only present in some cases, I think we can merge this without waiting for the alternative optimization (I've filed #1871 to investigate it).

…ailure ensure /tcktestdir created for FileConnection tests

ensure /tcktestdir created for FileConnection tests

32d1cd3

mykmelez added 3 commits August 27, 2015 13:23

enable fs debugging

5daabf4

re-disable fs debugging

b3b0ab9

require initFS before loading non-midlet program

1f10a97

mykmelez mentioned this pull request Aug 30, 2015

Re-enable preemption. #1849

Closed

mykmelez closed this Aug 31, 2015

mykmelez deleted the intermittent-file-connection-failure branch August 31, 2015 16:49

mykmelez restored the intermittent-file-connection-failure branch August 31, 2015 16:49

mykmelez reopened this Sep 2, 2015

Merge branch 'master' into intermittent-file-connection-failure

4630f8b

marco-c mentioned this pull request Sep 7, 2015

Start the VM before the FS is initialized #1871

Open

marco-c pushed a commit that referenced this pull request Sep 7, 2015

Merge pull request #1862 from mykmelez/intermittent-file-connection-f…

1697dd6

…ailure ensure /tcktestdir created for FileConnection tests

marco-c merged commit 1697dd6 into mozilla:master Sep 7, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ensure /tcktestdir created for FileConnection tests #1862

ensure /tcktestdir created for FileConnection tests #1862

mykmelez commented Aug 27, 2015

mykmelez commented Aug 27, 2015

mykmelez commented Aug 27, 2015

marco-c commented Aug 27, 2015

mykmelez commented Aug 27, 2015

marco-c commented Aug 27, 2015

marco-c commented Sep 2, 2015

mykmelez commented Sep 2, 2015

marco-c commented Sep 6, 2015

mykmelez commented Sep 7, 2015

marco-c commented Sep 7, 2015

ensure /tcktestdir created for FileConnection tests #1862

ensure /tcktestdir created for FileConnection tests #1862

Conversation

mykmelez commented Aug 27, 2015

mykmelez commented Aug 27, 2015

mykmelez commented Aug 27, 2015

marco-c commented Aug 27, 2015

mykmelez commented Aug 27, 2015

marco-c commented Aug 27, 2015

marco-c commented Sep 2, 2015

mykmelez commented Sep 2, 2015

marco-c commented Sep 6, 2015

mykmelez commented Sep 7, 2015

marco-c commented Sep 7, 2015