Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The initial execution of a Python file is slow again #30

Closed
dbaeumer opened this issue Nov 28, 2022 · 23 comments
Closed

The initial execution of a Python file is slow again #30

dbaeumer opened this issue Nov 28, 2022 · 23 comments
Assignees
Milestone

Comments

@dbaeumer
Copy link
Member

Need to investigate why this is happening again.

@dbaeumer dbaeumer self-assigned this Nov 28, 2022
@dbaeumer dbaeumer added this to the November 2022 milestone Nov 28, 2022
@dbaeumer
Copy link
Member Author

@joyceerhl has something changed that can explain this. I saw this especially when debugging and @alexr00 saw it as well during testing. See microsoft/vscode#167399 (comment)

@dbaeumer
Copy link
Member Author

Here is a network trace when starting a debugging session.

capture

Interesting is that every file fetch takes again between 100-250 ms although I preload the workspace content using the loadWorkspaceContents

@dbaeumer
Copy link
Member Author

A second run in the same session is more or less instant.

@dbaeumer
Copy link
Member Author

dbaeumer commented Nov 29, 2022

I tried the same under Google Chrome after clearing all browser data and here debugging was instant with the following network trace:
capture

@joyceerhl
Copy link
Contributor

I was able to repro exactly what Alex saw in Edge, i.e. running a file is fast and debugging right after is very slow on the first try, but subsequently debugging is fast. I even ran Developer: Reset User Data which is a VS Code command that wipes all extension data and IndexedDB contents, but debugging continued to be fast afterwards, which is a bit mysterious.

@joyceerhl
Copy link
Contributor

Not able to repro the slowness at all in Chrome with https://insiders.vscode.dev/github/dbaeumer/python-sample?vscode-coi=.

@dbaeumer
Copy link
Member Author

@joyceerhl the difference might be that when debugging a lot more Python library files need to be pulled from GitHub than compared to running a Python file. However, I have no explanation why pulling the files needed for debugging should be slow since I assume they should be cached as well (they come from the same GitHub repository).

@dbaeumer
Copy link
Member Author

I tried various things today to reproduce this but wasn't able to do so. I will keep logging enabled for my testing in case I see this again so that we know more why this could have happened.

@dbaeumer
Copy link
Member Author

For some reason after refreshing the page I got a slow run and debug.

Trace from the Python WASM extension

[Info - 4:03:10 PM] Workspace content loaded successful.
[Info - 4:03:10 PM] python.wasm cached successful.

And here the trace of the remote repository channel

trace-30.txt

Ignore the timestamp difference. I think remote repositories uses UTC and I use local time which is UTC+1.

@dbaeumer
Copy link
Member Author

What is interesting about the trace is that all stat calls are fast but all readFile calls to vscode-vfs://github/dbaeumer/python-3.11.0rc are slow (e.g. they take >= 300ms)

@joyceerhl Could it be that some throttling happens. If this can happen even in the load workspace content case could we fin that out in the response header and log it as well.

@joyceerhl
Copy link
Contributor

Do you happen to have the GitHub Repositories log as well? That log above is for Remote Repositories, and I'm interested in whether we are fulfilling the reads from the tar or from network.

@dbaeumer
Copy link
Member Author

dbaeumer commented Dec 1, 2022

Actually no. Will capture it the next time

@dbaeumer
Copy link
Member Author

dbaeumer commented Dec 5, 2022

@joyceerhl here are the two outputs from a slow debug session in the web

remote-repository.txt
github-repositories.txt

The vscode-vfs://github/microsoft/vscode-python-web-wasm repository actually contains the Python wasm file and library code. The vscode-vfs://github/dbaeumer/python-sample contain the sample code.

The content of the vscode-vfs://github/microsoft/vscode-python-web-wasm changes quite frequently since it contains the extension code and the Python WASM and libs. I saw in the trace some sort of merge conflict stuff. Could this cause the slowdown?

@joyceerhl
Copy link
Contributor

Yeah, the github-repositories.txt log suggests that this is what happens:

  1. We load the sample code, this is not loaded from a tar
  2. We load the vscode-python-web-wasm repository via the loadWorkspaceContents API, i.e. download tar and extract it
    [2022-12-05 10:27:40:426] [  190] TarPit.getTar(github/microsoft/vscode-python-web-wasm/087c5ef9e56b42245ad47a8b38df764c4d7798ee): Not found • Getting .tar from disk took 46 ms • Getting .tar from network...
    [2022-12-05 10:27:40:448] [  191] GitHub.ensureAuthenticated
    [2022-12-05 10:27:40:448] [  191] GitHub.ensureAuthenticated took 0 ms
    [2022-12-05 10:27:40:462] [  186] [GITHUB] GET /repos/{owner}/{repo}/git/blobs/{file_sha} took 291 ms
    [2022-12-05 10:27:40:462] [  185] getFileContentsRetrieved sorter.py from web took 292 ms
    [2022-12-05 10:27:40:477] [  183] [GITHUB] GET /repos/{owner}/{repo}/git/blobs/{file_sha} took 310 ms
    [2022-12-05 10:27:40:477] [  182] getFileContentsRetrieved hello.py from web took 311 ms
    [2022-12-05 10:27:40:482] [  175] [GITHUB] GET /repos/{owner}/{repo}/git/blobs/{file_sha} took 325 ms
    [2022-12-05 10:27:40:483] [  174] getFileContentsRetrieved debug.py from web took 326 ms
    [2022-12-05 10:27:40:488] [  179] [GITHUB] GET /repos/{owner}/{repo}/git/blobs/{file_sha} took 328 ms
    [2022-12-05 10:27:40:494] [  178] getFileContentsRetrieved endless.py from web took 334 ms
    [2022-12-05 10:27:41:551] [  190] TarPit.getTar(github/microsoft/vscode-python-web-wasm/087c5ef9e56b42245ad47a8b38df764c4d7798ee) Getting .tar from network took 1124 ms • Unzipping .tar...
    [2022-12-05 10:27:42:033] [  190] TarPit.getTar(github/microsoft/vscode-python-web-wasm/087c5ef9e56b42245ad47a8b38df764c4d7798ee)Finished getting tar took 1653 ms (Unzipping .tar took 482 ms)
    [2022-12-05 10:27:42:033] Cleaning up search cache...
    [2022-12-05 10:27:42:033] [  190] TarPit.getTar(github/microsoft/vscode-python-web-wasm/087c5ef9e56b42245ad47a8b38df764c4d7798ee) took 0 ms • Got .tar from network
    [2022-12-05 10:27:42:035] Sucessfully wrote tarball to disk in 1655ms
    
  3. We then realize that we're actually 1 commit behind and perform a fast forward
    [2022-12-05 10:28:05:466] [  208] GitHub.getAheadBehindState(vscode-vfs://github/microsoft/vscode-python-web-wasm, base=087c5ef9e56b42245ad47a8b38df764c4d7798ee, head=f973023d19ba8201f0ed452c3c6556c82624d005) took 631 ms • 1 commits behind
    [2022-12-05 10:28:05:466] [  206] GitHub.fetch(vscode-vfs://github/microsoft/vscode-python-web-wasm, force=true, to=undefined) took 941 ms (Fetching upstream changes took 632 ms) • Changes detected; 1 commits behind
    [2022-12-05 10:28:05:467] Fast forward to f973023d19ba8201f0ed452c3c6556c82624d005; rootUri=vscode-vfs://github/microsoft/vscode-python-web-wasm
    
  4. This invalidates the cached tar and all subsequent requests again go to network and do not benefit from caching

@dbaeumer
Copy link
Member Author

dbaeumer commented Dec 8, 2022

@joyceerhl thanks a lot for the investigation. Is there anything I can do here?

@joyceerhl
Copy link
Contributor

  • Do you anticipate the vscode-python-web-wasm repo to change frequently? If it will remain mostly untouched, users are unlikely to run into this scenario (in the short run).
  • Nevertheless, this exposes a problem that if a repository has been fast forwarded, not only do users potentially get unlucky with poor performance, but consumers of loadWorkspaceContents may also continue to read files greedily under the assumption that the reads are coming out of the tar, which could cause us to exhaust GH API rate limits or potentially offer up stale files for anycode language server references depending on how anycode handles file change events today (cc @jrieken). I can think of at least three different ways to address this, the first one would not require adoption from anycode or vscode-python-web-wasm and the other two would:
    1. Change the behavior of loadWorkspaceContents such that we automatically re-run loadWorkspaceContents for any repository uri that it has been invoked on before if the repository gets synced
      • This lets us avoid exposing more information on the extension API surface but could be unexpected/unintuitive
    2. Expose an extension API event onDidSyncRepository which fires when the repository is synced and the tar is invalidated. vscode-python-web-wasm and anycode would then subscribe to this event and re-invoke loadWorkspaceContents when the event fires
      • This requires exposing and adopting a new extension API which hardens the requirement to run all these extensions in the same extension host
    3. Consumers of loadWorkspaceContents to pass in the specific SHA they are interested in (i.e. make it explicit which version of the tar we're fetching)
      • This doesn't seem like a great option for consumers who must now know what the latest SHA is

@dbaeumer
Copy link
Member Author

dbaeumer commented Dec 9, 2022

Actually, vscode-python-web-wasm will change frequently since it host currently both the default python installation and the extension code. We could move the default python installation into another repository but that would look strange from a repository point of view.

I am actually for option (ii). I do for example load the wasm file into shared memory (and cache it there) so that I don't need to copy it into the worker on every run. Since a sync can happen at any time I would need to drop the WASM file again and reload it into shared memory. For that I would need the event.

@joyceerhl
Copy link
Contributor

joyceerhl commented Dec 9, 2022

@dbaeumer is copying the wasm file into shared memory still necessary/beneficial if we have shared array buffers and cross origin isolation enabled (I assume so but just want to confirm)? If it is then I'd propose to add

readonly onDidSyncRepository: Event<{ repositoryUri: vscode.Uri }>;

@joyceerhl
Copy link
Contributor

Actually, RemoteHub implements watch and onDidChangeFile, so I think a fourth option is for your extension to call vscode.workspace.createFileSystemWatcher on the relevant repository URI and subscribe to FileSystemWatcher.onDidChange.

@dbaeumer
Copy link
Member Author

@joyceerhl I think it depends on how we handle these buffers now in memory. To my understanding VS Code's FS API still uses no shared buffers (@jrieken need to clarify). As a result the extension still profits from copying the WASM into shared memory since it avoid copying it into the worker over and over again.

@jrieken
Copy link
Member

jrieken commented Dec 12, 2022

We have no GA for shared memory yet

@jrieken
Copy link
Member

jrieken commented Dec 12, 2022

fourth option is for your extension to call vscode.workspace.createFileSystemWatcher on the relevant repository URI and subscribe to FileSystemWatcher.onDidChange.

I think that's the best, most API'ish, option and it doesn't exclude any future optimisations

@dbaeumer
Copy link
Member Author

Listening now to the event in version 0.9.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants