Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add File API proposal design document #3101

Merged
merged 3 commits into from
Jun 26, 2023
Merged

Add File API proposal design document #3101

merged 3 commits into from
Jun 26, 2023

Conversation

oleiade
Copy link
Member

@oleiade oleiade commented May 30, 2023

This Pull Request contains a design document for a FIle API proposal that intends to tack #2977 as part of the #2974 epic.

There is a POC branch available containing a very very rough, but functional implementation of it.

Looking forward to your feedback and suggestions 🙇🏻

Copy link
Collaborator

@mstoykov mstoykov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in general!

tl;dr:

  • I am for only async functions even at the cost of some ergonomity shortterm
  • Smaller API will be easier to stabilize and rewrite as we need and expand on the current code. As such I am for dropping the majority of the "porcelain".
  • While this code improves stuff it is not very useful until we get streams and support for streams across k6.

docs/design/019-file-api.md Outdated Show resolved Hide resolved
docs/design/019-file-api.md Show resolved Hide resolved
The initial API will mostly be asynchronous, except for the `open` functionality which will be synchronous due to the current lack of support for `await` operations within the init context.

The API will have the following characteristics:
- Load file content exactly once into memory.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well .. the underlying implementation already loads it once and if it over 100kb it likely loads a couple of more times. This is blocked on removing afero.

But hopefully this will be a lot better

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻

Copy link
Collaborator

@codebien codebien Jun 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So removing afero is part of the implementation? Should we have a rude but working PoC with it? We should also link the issue here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't argue removing afero is part of the implementation.

afero adds +1 on the times we load it in memory and there is +1-2x if it is fairly big file.

This second memory though is temporarily and will be reused. You can see a more detailed explanation here

So we will have at least 2 copies of the file in memory, which is far from ideal, but definitely a lot better than having a copy per VU.

The whole proposal currently while having some values does not make it a lot more useful to open big files, as the moment you have to use them you still need to start copy. So dropping the 2 copies of this part is likely negligible to getting streams and not copying on the "other side" where this data will be used by something.

I am not against removing afero entirely and would like to prioritise it again, but I don't think this is our biggest problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also keep removing afero out of scope for this design.

Thanks for the heads-up on afero indeed, I didn't have the full context in mind regarding how it handles memory itself 🙇🏻 I'll try to rephrase to be more accurate.

docs/design/019-file-api.md Outdated Show resolved Hide resolved
docs/design/019-file-api.md Outdated Show resolved Hide resolved
docs/design/019-file-api.md Outdated Show resolved Hide resolved
docs/design/019-file-api.md Show resolved Hide resolved
* Resolves to a `FileInfo` describing the file.
*/
stat(): Promise<FileInfo>
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No close ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether we need one or not depends a lot on the underlying implementation, so I was on the verge. As a safety net, and for the API to fill intuitive indeed, it's probably better to have one indeed. Adding it 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think File.close() would be confusing with the current implementation, considering open() returns a slice with the file data.

So what would close() actually do? It certainly wouldn't close the file, as the comment here states. If we have the concept of a FileHandle, then close() could make it unusable from that point on? Not sure what the purpose of that would be, since FileHandles shouldn't have any expensive resources to free.

I get that a counterpart to open() would be intuitive, but since we're not dealing with traditional file handles here, a close() doesn't make sense to me.

Maybe instead of a top-level open() function, a File constructor would be more intuitive? E.g.:

let f = new File('/path');
f.read() ...

This would make it clear that File handles are cheap and disposable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding close I had left it out of the proposal and POC implementation as they were not necessary with the way I approached solving the problem on the implementation side. But this discussion is a good opportunity to mention that, yet, I didn't want to assume what the precise underlying implementation would be. However, I had thought of one potential use of close: the file registry I used in the POC to keep files content in memory, and provide a pointer to the data to VUs, could keep a reference count for each file->vu number, and once a file is not referenced anymore (because all VUs closed it), it could explicitly drop it. But I'm not entirely convinced that's really useful in practice.

Regarding using a constructor as opposed to open, I'm not opposed to it, but I would personally prefer to stick to an API that ressembles what you find in OSes and other languages. On another note that maybe proves interesting somehow: In my initial implementation of the POC, the File struct was actually called FileView to reflect that it's a view of the file content, cannot be modified, and is a "cheap and disposable" handle.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think File.close() would be confusing with the current implementation

implementation is the key word here.

And this is an API proposal. If after making this better we need to break the API because now close will need to be called in order for it to work that will be bad.

I would expect that any file you open - you will be able to close. Whether that has a significant impact is a good question.

If there was a different way to tell us that "I no longer need this file" I will be fine. But at this point this is not possible AFAIK. There are some resource management tc39 proposals that I am not going to discuss as they are just proposal and will require that we have a close like method to tell us that a file is no longer needed. So ... not really a solution.

I am okay with this not having close for now - I do expect we will likely need to iterate on all the changes around the epic "support big files in k6" quite a lot. But wanted to lay my reasoning on wanting a close.

Maybe instead of a top-level open() function, a File constructor would be more intuitive? E.g.:

While I am not against it in principal, I don't see why that will be better..

If anything the current API is closer to what is in deno and in other places and consequently might be reusable across platforms. Or more likely easier to support when someone wants to reuse a library. So ... 👎 on my side.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that open() and close() only make sense if the file itself is opened and closed. But since we'll only open the file and read all its data once, and each subsequent call to open() will return a view of the data, calling close() on this view would be confusing if people are expecting the file to be closed.

This behavior will not change in a different implementation, since it's the core of what we want to accomplish. So the naming doesn't quite align with the behavior, and repurposing the name used in other frameworks to forcefully align it wouldn't make it more intuitive--quite the contrary. As a user, I'd rather have the API clearly reflect the behavior, than have to remap my existing assumptions of what open() and close() do.

I suggested using a constructor instead of open() since it doesn't imply that a close() would be needed. But I'm not sold on it either, and we could consider alternatives like FileView.

Anyway, this is not a blocker from my side. So if you strongly feel we should keep it as is, I'm fine with it.

docs/design/019-file-api.md Outdated Show resolved Hide resolved
docs/design/019-file-api.md Outdated Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Jun 1, 2023

Codecov Report

Merging #3101 (2347573) into master (ecca789) will increase coverage by 0.06%.
The diff coverage is 78.51%.

❗ Current head 2347573 differs from pull request most recent head 15be818. Consider uploading reports for the commit 15be818 to get more accurate results

@@            Coverage Diff             @@
##           master    #3101      +/-   ##
==========================================
+ Coverage   73.70%   73.76%   +0.06%     
==========================================
  Files         239      241       +2     
  Lines       18258    18457     +199     
==========================================
+ Hits        13457    13615     +158     
- Misses       3933     3970      +37     
- Partials      868      872       +4     
Flag Coverage Δ
ubuntu 73.76% <78.51%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
output/cloud/expv2/metrics_client.go 58.00% <ø> (ø)
output/cloud/expv2/mapping.go 45.07% <45.07%> (ø)
output/cloud/expv2/flush.go 83.87% <83.87%> (-16.13%) ⬇️
output/cloud/expv2/output.go 84.14% <84.21%> (+2.43%) ⬆️
output/cloud/expv2/sink.go 92.30% <92.30%> (+42.30%) ⬆️
output/cloud/expv2/hdr.go 95.83% <95.83%> (ø)
cmd/run.go 72.76% <100.00%> (ø)
js/common/util.go 79.54% <100.00%> (+1.49%) ⬆️
js/jsmodules.go 100.00% <100.00%> (ø)
js/modules/k6/data/data.go 88.00% <100.00%> (+0.76%) ⬆️
... and 3 more

... and 2 files with indirect coverage changes

Copy link
Contributor

@imiric imiric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @oleiade! 👏

I glanced at the PoC, and it looks reasonable. 👍

docs/design/019-file-api.md Show resolved Hide resolved
docs/design/019-file-api.md Outdated Show resolved Hide resolved
docs/design/019-file-api.md Outdated Show resolved Hide resolved
docs/design/019-file-api.md Outdated Show resolved Hide resolved
@oleiade oleiade merged commit 2041cc7 into master Jun 26, 2023
21 checks passed
@oleiade oleiade deleted the docs/design/file-api branch June 26, 2023 07:46
oleiade added a commit that referenced this pull request Jun 26, 2023
* Add File API proposal design document

* File API design document revision 1

* File API design document revision 2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants