
Add vscode.workspace.fs.createWriteStream(). #84515

Open

ericsnowcurrently opened this issue Nov 11, 2019 · 25 comments

Labels: api, api-proposal, feature-request (Request for new features or functionality), file-io (File I/O)

@ericsnowcurrently (Member)

(see #84175)

For the Python extension, we currently use node's fs.createWriteStream() for a variety of purposes:

  1. downloading multi-MB files (and tracking progress)
    • calling into a third-party library that expects an fs.WriteStream
  2. writing generated data out to a file a chunk at a time
  3. writing log data out to a file, one entry at a time

For example:
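(a minimal sketch of the download case, using Node's built-in https module; url, dest, and onProgress are placeholders rather than the extension's actual code)

import * as fs from 'fs';
import * as https from 'https';

function download(url: string, dest: string, onProgress: (bytes: number) => void): Promise<void> {
    return new Promise((resolve, reject) => {
        const out = fs.createWriteStream(dest);
        https.get(url, (res) => {
            let received = 0;
            // Count bytes as they arrive so the caller can report progress.
            res.on('data', (chunk: Buffer) => {
                received += chunk.length;
                onProgress(received);
            });
            res.pipe(out);
            out.on('finish', () => resolve());
        }).on('error', reject);
    });
}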


@ericsnowcurrently (Member, Author)

From #84175 (comment):

we have some proposed API but nothing is final yet. @ericsnowcurrently What is it used for? Real big files or just because?

See above.

@jrieken (Member) commented Nov 27, 2019

Let's use this issue to track the open, close, read, write API. That's quite low-level, but it's the building block for streams.

export interface FileSystemProvider {
    open?(resource: Uri, options: { create: boolean }): number | Thenable<number>;
    close?(fd: number): void | Thenable<void>;
    read?(fd: number, pos: number, data: Uint8Array, offset: number, length: number): number | Thenable<number>;
    write?(fd: number, pos: number, data: Uint8Array, offset: number, length: number): number | Thenable<number>;
}
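For illustration, a consumer built on these primitives might read a file chunk-by-chunk like so (a sketch against the proposal above; nothing here is finalized API, and the optional methods are assumed to be implemented):

async function readInChunks(provider: FileSystemProvider, resource: Uri, onChunk: (data: Uint8Array) => void): Promise<void> {
    const fd = await provider.open!(resource, { create: false });
    try {
        const buffer = new Uint8Array(64 * 1024);
        let pos = 0;
        for (;;) {
            const bytesRead = await provider.read!(fd, pos, buffer, 0, buffer.length);
            if (bytesRead === 0) {
                break; // end of file
            }
            onChunk(buffer.subarray(0, bytesRead));
            pos += bytesRead;
        }
    } finally {
        await provider.close!(fd);
    }
}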

@bpasero (Member) commented Dec 5, 2019

From my recent experience implementing faster read/write I/O for remote connections, I believe these primitives are not a good choice; I would rather replace them with a stream-based solution that offers similar controls (e.g. start offset + length).

@jrieken (Member) commented Jul 9, 2020

Adding to the July milestone for discussion

@jrieken jrieken modified the milestones: July 2020, Backlog Jul 17, 2020
@jrieken (Member) commented Jul 17, 2020

This won't happen so soon. Discussion topics are mostly around streams vs chunks. The latter is the building block of the former, and I believe that providers will have an easier job with chunks while consumers will have an easier time with streams. Streams might become more interesting when we decide to investigate supporting a fetch-compatible API, which is also built around streams...

@jrieken (Member) commented Jul 17, 2020

Good read on stream APIs targeted for browsers: https://developer.mozilla.org/en-US/docs/Web/API/Streams_API

@bpasero bpasero removed their assignment Jun 7, 2021
@jrieken (Member) commented Jun 25, 2021

JS supports async generator functions and async iterators, and they offer a neat way to implement this. Much simpler than streams, yet equally powerful. Though, the primitives that we have today are usually at the bottom of all stream/async-iterator solutions.

export interface FileSystemProvider {
    readFile(uri: Uri, token: CancellationToken): AsyncIterable<Uint8Array>;
}
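A consumer would then drain it with for await; a sketch, with the byte counting purely illustrative:

async function countBytes(provider: FileSystemProvider, uri: Uri, token: CancellationToken): Promise<number> {
    let total = 0;
    // Each iteration yields the next chunk of the file as a Uint8Array.
    for await (const chunk of provider.readFile(uri, token)) {
        total += chunk.byteLength;
    }
    return total;
}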

@connor4312 (Member)

We'd like to take advantage of the consumer side of this API in the hex editor, and we've already added code that uses the Node.js native filesystem API when possible, since the extension host API doesn't provide this yet. Getting a read(2)/write(2) analog would be quite useful 🙂

For the hex editor, a stream-only interface would not be sufficient; at least not the one proposed in #84515 (comment), which lacks a starting offset. We want to load data incrementally, and if the user scrolls from byte 0 to byte 2GB, we don't want to read and discard everything in between.

Likewise for writing: although we do need to rewrite if the file length changes, there's no need to rewrite a whole 2GB file when only a single byte is edited.

Going with a more primitive approach, I would not expose a bare fd: number; instead I'd have an API along the lines of what we do for bulk operations and edits:

export interface FileSystemProvider {
    open(uri: Uri, options: { create: boolean; writable: boolean }, callback: (handle: FileHandle) => Thenable<void> | void): Thenable<void>;
}

export interface FileHandle {
    read(pos: number, /* ... */)
    // ...
}

The native open(2) as well as fs.promises.open() lock the file on a filesystem level while changes are being made to provide consistency. It would make sense for this API to have the same guarantees. Wrapping the file handle in a bulk-style callback like this ensures disposal and discourages long-lived use of the FileHandle in a way that would cause locking problems for users. We could also provide a warning if a lock is held too long. Its nicely encapsulated lifecycle means that appropriate progress indicators can be shown on the file while writes happen if options.writable is set.
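Usage could then look like this (hypothetical: the read() signature mirrors the chunked read proposal earlier in the thread and is an assumption on top of the sketch above):

async function peekAtOffset(provider: FileSystemProvider, uri: Uri, offset: number): Promise<Uint8Array> {
    const buffer = new Uint8Array(64 * 1024);
    // The handle exists only inside the callback, so close/unlock is
    // guaranteed even if the body throws.
    await provider.open(uri, { create: false, writable: false }, async (handle) => {
        // Assumed signature: read(pos, data, dataOffset, length).
        await handle.read(offset, buffer, 0, buffer.length);
    });
    return buffer;
}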

@bpasero (Member) commented Dec 16, 2021

The native open(2) as well as fs.promises.open() lock the file on a filesystem level while changes are being made to provide consistency.

Are you sure? On all platforms?

It is unsafe to use fs.write() multiple times on the same file without waiting for the callback. For this scenario, fs.createWriteStream() is recommended.

https://nodejs.org/docs/v14.16.0/api/fs.html#fs_fs_write_fd_buffer_offset_length_position_callback

I just recently added write locking for the node.js based file system provider via Barrier: 9df559f#diff-43c296d8b387331e4b0756c4f496d4f6102b1808554f598c748cbc4d8b33096d

@connor4312 (Member) commented Dec 16, 2021

Are you sure? On all platforms?

At least when I tried it on Windows in O_RDWR | O_CREAT mode. It was for that reason I added code to release the file descriptor when not in use.

It is unsafe to use fs.write() multiple times on the same file without waiting for the callback

I don't think that's contradictory; it's saying that if your application calls open() and gets the FileHandle back, multiple write operations should not happen on that file at the same time. The file is still locked for other processes.
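In other words, the constraint from the Node docs just means writes on one handle must be serialized; a sketch with plain Node fs/promises (not VS Code API):

import { open } from 'fs/promises';

async function appendEntries(path: string, entries: Uint8Array[]): Promise<void> {
    const handle = await open(path, 'a'); // may lock out other processes while held
    try {
        for (const entry of entries) {
            // Await each write before issuing the next; overlapping writes
            // on the same handle are what the docs warn against.
            await handle.write(entry);
        }
    } finally {
        await handle.close(); // always release the handle (and any lock)
    }
}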

@bpasero (Member) commented Dec 16, 2021

👍 didn't know that.

I like the idea of providing an API that reduces the chances of an extension forgetting to close the file handle, because that would only ask for trouble. Now that the disk provider locks from the open call, we do not allow any more writes until close is called, and there is no warning printed either.

@jrieken (Member) commented Dec 16, 2021

For the hex editor, a stream-only interface would not be sufficient, or at least the one proposed in #84515 (comment) which lacks a starting offset.

Yeah, that's the shortened variant of the existing read function, which does take an offset, like write. I do like enforcing the "transaction" character via the callback, but I also like async-iterables...

@connor4312 (Member)

Why not both?

@dbaeumer (Member) commented Jun 3, 2022

Stupid question: how would I, as an extension, access the chunk API?

The currently proposed API lets a file system provider implementor provide these methods, but I didn't find any API for calling them from the consumer side. I can't get to an individual FileSystemProvider, and the FileSystem interface cannot offer these methods unless we add a default implementation or throw when they are not available on the underlying provider.

@jrieken (Member) commented Jun 7, 2022

Yeah, the proposal doesn't expose them on the "consumer" side yet

@AlbertoPdRF

Hey!

I think this is the right place, so I'm going to drop the question here: as of now, is it possible to read a large file in chunks using vscode.workspace.fs? I believe not, but I may have missed something in the conversation above. If it isn't possible yet, will it be, and when?

Thanks!

@AlbertoPdRF

By the way, I also found the related #41985 in the backlog, which can maybe be closed to avoid duplication as per #84515 (comment).

@bpasero (Member) commented Aug 10, 2022

I think they are different proposals. One for POSIX primitives and one for a method to write via a stream.

@AlbertoPdRF

Is there any chance that finalization of the proposed fsChunks API gets included in September's iteration plan?

@AlbertoPdRF

Hey! Is there any news on this? Could the proposed fsChunks API finalization be included in one of the coming monthly iteration plans? If any help is needed, I would be happy to contribute!

@AlbertoPdRF

Hi @bpasero @jrieken, is there any chance that the finalization of the fsChunks API is included in any of the coming monthly iteration plans? Is there anything I can do to help move this forward? Or any chance to use the API even if it's not finalized?

I'd like to at least know the status on this, as I've had a PR (AlbertoPdRF/root-file-viewer#20) solving two issues blocked by this for almost a year now.

@AlbertoPdRF

@mjbvz, maybe you can reply?

@rbuckton (Member) commented Feb 2, 2024

@jrieken: I'm running into an issue with DeoptExplorer (https://github.com/microsoft/deoptexplorer-vscode) when attempting to parse very large log files, due to workspace.fs.readFile having a 2GiB memory limit. I'd originally used Node.js's fs to open a readable stream and read a single line at a time, but I lost that capability when I switched to workspace.fs. Unfortunately, V8 isolate logs can get fairly large and there's no easy way to trim them down.

A streaming API would be a useful building block, but a workspace.fs.readLines async generator method that can yield a line at a time without needing to load the whole file into memory would also serve my needs.
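For what it's worth, a readLines-style helper can be layered on top of any chunked read, along these lines (a sketch; the AsyncIterable chunk source stands in for whatever chunked API gets finalized):

async function* readLines(chunks: AsyncIterable<Uint8Array>): AsyncIterable<string> {
    const decoder = new TextDecoder();
    let carry = '';
    for await (const chunk of chunks) {
        carry += decoder.decode(chunk, { stream: true });
        const lines = carry.split('\n');
        carry = lines.pop()!; // hold back the trailing partial line
        yield* lines;
    }
    carry += decoder.decode(); // flush any buffered multi-byte sequence
    if (carry.length > 0) {
        yield carry;
    }
}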

@Lramseyer

Since this seems to be the go-to discussion thread for the fsChunks API proposal, I'd also like to +1 this. I'm building a digital waveform viewer extension. Those files can be on the order of gigabytes, so being able to read individual chunks of a file would greatly improve performance and make it useful for a lot more projects.

Here's my extension:
https://github.com/Lramseyer/vaporview
