Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Leak: extfs uses unbounded cache for unicode normalization #4010

Closed
jrieken opened this issue Mar 11, 2016 · 21 comments
Closed

Leak: extfs uses unbounded cache for unicode normalization #4010

jrieken opened this issue Mar 11, 2016 · 21 comments
Assignees
Labels
bug Issue identified by VS Code Team member as probable bug verified Verification succeeded
Milestone

Comments

@jrieken
Copy link
Member

jrieken commented Mar 11, 2016

  • Set up a workspace as outlined here: tsserver crashes with large projects TypeScript#7444
  • find the tmp folder in the explorer and expand it (there is only 1 items)
  • take a heap snapshot
  • run ember serve, wait for it start, notice the tmp folder being populated, and stop the server.
  • repeat the previous step 2 or 3 times
  • take another heap snapshot
  • memory grew by ~1MB
@jrieken
Copy link
Member Author

jrieken commented Mar 11, 2016

before and after heap snapshots Archive.zip

@bpasero bpasero added the bug Issue identified by VS Code Team member as probable bug label Mar 11, 2016
@bpasero bpasero added this to the March 2016 milestone Mar 11, 2016
@bpasero
Copy link
Member

bpasero commented Mar 17, 2016

Similar as #4123 running heap allocations I am not seeing concrete objects on our end leaking:

image

The Object, WinJS and Array things are pending delayers from the git world because a file event triggers a git refresh.

@bpasero bpasero closed this as completed Mar 17, 2016
@bpasero bpasero assigned jrieken and unassigned bpasero Mar 17, 2016
@jrieken jrieken reopened this Mar 17, 2016
@jrieken
Copy link
Member Author

jrieken commented Mar 17, 2016

Disagree - I don't have the workspace under git or any other UI element open except for the explorer. Taking two snapshots and comparing them reveals 1.3MB growth and it does contain FileStat, URI, ViewItem and many html elements.

screen shot 2016-03-17 at 09 35 00

@jrieken jrieken assigned bpasero and unassigned jrieken Mar 17, 2016
@jrieken
Copy link
Member Author

jrieken commented Mar 17, 2016

For instance those +18 uri objects are held in some RowCache.legacy array and they carry outdated data (some path inside /tmp/babel... which doesn't exist anymore)

@jrieken
Copy link
Member Author

jrieken commented Mar 17, 2016

screen shot 2016-03-17 at 09 43 35

@jrieken
Copy link
Member Author

jrieken commented Mar 17, 2016

@bpasero @joaomoreno let's use this issue to test/compare the different profiling tools, cos' I don't get what the heap timeline shows. I don't get matching results with the heap snapshot comparison nor with the Timeline feature (see how heap and node usage grows in parallel).

screen shot 2016-03-17 at 10 52 42

@bpasero
Copy link
Member

bpasero commented Mar 17, 2016

@jrieken the difference to comparing snap shots is that it is easier to understand if some objects are still in memory because they leak or because they are not yet disposed from the previous run.

To give an example: imagine a component that only disposes its data the next time data is being displayed in that component. If you use a snapshot before showing data and after showing data, you would think the component leaks data. If you look at the timeline you will notice that the data is being disposed once the widget is used again. since you can repeat this process many times, it is easier to distinguish a real leak from something I would call a "late dispose of things".

@jrieken
Copy link
Member Author

jrieken commented Mar 17, 2016

I understand that and that's why I always run the action in question before I take the first snapshot, then repeat, and then take the second snapshot.

@jrieken
Copy link
Member Author

jrieken commented Mar 17, 2016

always run the action in question before I take the first snapshot,

which I realised I have not mentioned in the steps about..

@bpasero
Copy link
Member

bpasero commented Mar 17, 2016

For example in comment #4010 (comment) I suspect that a template from a previous tree refresh() was not disposed yet, but it is hard to know if this is true or not.

Looking at the timeline I find it easier to distinguish false positives from real leaks because you get a better feeling of what happens before and after you invoked an action. It also gives you a lot more freedom to select the range you want to compare for leak detection.

@jrieken
Copy link
Member Author

jrieken commented Mar 17, 2016

screen shot 2016-03-17 at 11 28 21

I understand all of that, but I don't understand how the heap grow in the picture above. Snapshot 1 was taken before any action, Snapshot 2 was taken after the first ember serve & stop cycle and the growth could be templates, caches etc. Now Snapshot 3 is taken after another cycle of serve&stop and in my model the heap should have not grown anymore.

@bpasero
Copy link
Member

bpasero commented Mar 17, 2016

To me looking at the total heap size is wrong because it is up to V8 when to do GCs and bring down the heap. That must not necessarily mean there is a leak between the snapshots.

I just tried again with a very simple example (just 1 file and 1 folder) by deleting the file/folder and undoing that and comparing the snapshots. I do see heap increase but I do not see objects lying around that come from our world.

@jrieken
Copy link
Member Author

jrieken commented Mar 17, 2016

To me looking at the total heap size is wrong because it is up to V8 when to do GCs and bring down the heap. That must not necessarily mean there is a leak between the snapshots.

Prior to a heap snapshot a GC is performed otherwise the whole feature would be useless.

In the end it's our code that causes this and drilling a little deeper I see how the (array) portion keeps growing, which contains referenced to deoptimized code (FileStat.find for instance). From the profiler you see that method being hit often but somehow V8 cannot keep it native. This might not be a leak in the typical case but it's part of growing heap pressure.

@jrieken
Copy link
Member Author

jrieken commented Mar 17, 2016

ticks_fs.txt

scan for LazyCompile: the tilde means it cannot be compiled to native, the asterisk means it can. My theory is that code path become hot (like the explorer trying find the right position for X-thousand file events), the VM failing to compile some functions to native and dumping the deopt result onto the heap.

@jrieken
Copy link
Member Author

jrieken commented Mar 17, 2016

@bpasero So apart from the crazy talk about VM internals and heap snapshot vs head timeline the problem in this concrete scenario is to not drill deep enough into the (array) etc items. I happen to stumble over an object with suspicious keys (file names) which is very large and voila I see a leak: https://github.com/Microsoft/vscode/blob/master/src/vs/base/node/extfs.ts#L30

There is one brave object (normalizedCache) holding onto every path that was ever read from a directory.

@joaomoreno
Copy link
Member

Now that is interesting.

@joaomoreno
Copy link
Member

Comparing two snapshots is equivalent to running a Heap Timeline and not touching the time interval. What you get out of it is all objects that can still be reached in memory. This includes every object that was reachable in the instant that the timeline started as well as every object allocated during the time span between the two snapshots. Given this, you will see many things in there which are irrelevant to the test in question.

I think we can all agree that a proper leak test is a repetition of a task, while minimising external influences. Additionally, the test should end in the same state as it begins (start with a full explorer, refresh it n times, end with a full explorer with the same number of files and folders as initially).

Thus, in my opinion, the test initially indicated in the first post of this issue is going to draw some inconclusive results. Here's why:

  • The first snapshot should've been taken not before the first call to ember serve, but after. This ensures that once we repeat the ember serve calls and we take a final snapshot, we are in the same state as the initial snapshot. Otherwise, many allocations will appear in the diff, from that initial ember serve. These are possible leaks but not certain.
  • The last snapshot should always be taken with a grain of salt. In this case especially, since the state in which we take the snapshot keeps a hold of the explorer items and renders them in the screen. Running a diff of the two snapshots will obviously show them as allocations that were never released; they will likely be released only in another eventual repetition. These are possible leaks but not certain.

One way I've found to detect certain leaks is to narrow the test time window disregarding the final repetition. And this can only be obtained when running a Heap Timeline. Here are the initial tests steps, revised by me:

  • find the tmp folder in the explorer and expand it (there is only 1 items)
  • A: run ember serve, wait for it start, notice the tmp folder being populated, and stop the server.
  • start the Heap Timeline
  • repeat A 10 times
  • stop the Heap Timeline

After this, narrow the time window of the Heap Timeline to disregard the last repetition. Logic states that if (and only if) there is a leak directly related to task _A, it will appear as allocated objects throughout those 9 repetitions_.

If objects do appear in that time window, it is a definite leak. Otherwise, there is no leak. There is no possibility in this test, there is only certainty.

@joaomoreno
Copy link
Member

On code deopt:

Holding references to code deoptimization is not a leak. It is a problem, sure, just not a leak. It just means that we need more memory to keep that function alive, when we could use less. We should definitely fix it.

@jrieken
Copy link
Member Author

jrieken commented Mar 17, 2016

Yes - code depot is a performance problem, not a leak.

Yes - the leak test should have been performed after first ember serve even though it is questionable why the tree caching holds on to old model objects.

Yes/No - it doesn't matter if you use the heap timeline or manual heap snapshots. The heap timeline from above (#4010 (comment)) shows leaks, the manual snapshots shows heap growth. Whatever you favour just don't stop at the complicating looking objects on the heap.

@bpasero bpasero added this to the Backlog milestone Mar 17, 2016
@bpasero bpasero removed this from the March 2016 milestone Mar 17, 2016
@bpasero bpasero changed the title Leakage with explorer file events Leak: extfs uses unbounded cache for unicode normalization Mar 17, 2016
@bpasero
Copy link
Member

bpasero commented Mar 17, 2016

Removing this cache will make many file system related operations slow, including search. I verified that this cache is not being used as long as tmp is not expanded in the explorer. I would not think that having this explorer expanded is typical because there are so many empty folders added here that scrolling becomes a nightmare. I suggest to leave this as is for GA.

@jrieken
Copy link
Member Author

jrieken commented Mar 18, 2016

disagree - just make the cache have an upper limit say 10000.

@jrieken jrieken modified the milestones: March 2016, Backlog Mar 18, 2016
@bpasero bpasero assigned jrieken and unassigned bpasero Mar 18, 2016
@jrieken jrieken added the verified Verification succeeded label Mar 21, 2016
@vscodebot vscodebot bot locked and limited conversation to collaborators Nov 18, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Issue identified by VS Code Team member as probable bug verified Verification succeeded
Projects
None yet
Development

No branches or pull requests

3 participants