
Implement copy_to and move_to for S3_File #9054

Merged · 28 commits into develop · Feb 16, 2024
Conversation

radeusgd
Member

Pull Request Description

Important Notes

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • Screenshots/screencasts have been attached, if there are any visual changes. For interactive or animated visual changes, a screencast is preferred.
  • All code follows the
    Scala,
    Java,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • All code has been tested:
    • Unit tests have been written where possible.
    • If GUI codebase was changed, the GUI was tested when built using ./run ide build.

@radeusgd
Member Author

[screenshot: test run timings]

I mentioned that the tests are a bit slow (running for almost 3 minutes).
After further analysis of the timings of individual methods, I don't see any obvious outliers - it's just that every S3 operation takes about 0.7s, and we do about 200 of those to cover all the various edge cases.

We could try re-using existing objects more to make the suite more efficient, but that could complicate the structure of the tests enough that I'm not sure it's worth it. After all, 3 minutes is not that bad - although not ideal if it means 3 minutes added to every CI run...
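As a quick sanity check, the numbers quoted above (assumed values from this comment: ~200 operations at ~0.7s each) are roughly consistent with the observed total run time:

```python
# Back-of-the-envelope check using the figures from the comment above
# (~200 S3 operations at ~0.7 s per round-trip). Any gap between this
# estimate and the observed ~3 minutes would be per-test overhead.
ops = 200
seconds_per_op = 0.7
total = ops * seconds_per_op
print(f"{total:.0f} s ≈ {total / 60:.1f} min")  # 140 s ≈ 2.3 min
```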

@radeusgd
Member Author

I mentioned that the tests are a bit slow...

As a sanity check, I tried using the AWS CLI to compare the performance:
[screenshot: AWS CLI timings]

There, an upload took about 1.5s and a download about 2.2s. Of course, that also includes the whole warmup of the CLI (which, IIRC, is written in Python, so startup adds some overhead). Overall, it seems that S3 transfers are simply not terribly fast, and I do not think the ~0.7s per operation that we get is in any way anomalous or suggests an issue with our AWS SDK integration / approach.

@radeusgd
Member Author

To better understand the timing of the operations, I temporarily added measurement facilities (a1577dc) and checked various operations in the REPL:

> Duration.time_execution <| "abc".write (S3_File.new "s3://enso-ci-s3-test-stage/foo.txt")
read_bucket     01.3754308s     foo.txt
copy_object     00.7509514s     foo.txt -> foo.txt.bak
put_object      00.882127s      foo.txt
>>> (Pair.Value PT3.5453924S (S3_File.Value 'enso-ci-s3-test-stage' 'foo.txt' Nothing))
> Duration.time_execution <| (S3_File.new "s3://enso-ci-s3-test-stage/foo.txt").exists
read_bucket     00.7141655s     foo.txt
>>> (Pair.Value PT0.7232116S True)
> Duration.time_execution <| "abc".write (S3_File.new "s3://enso-ci-s3-test-stage/foo.txt") on_existing_file=Existing_File_Behavior
.Overwrite
put_object      00.8693344s     foo.txt
>>> (Pair.Value PT0.8857875S (S3_File.Value 'enso-ci-s3-test-stage' 'foo.txt' Nothing))
> Duration.time_execution <| "abc".write (S3_File.new "s3://enso-ci-s3-test-stage/foo-new.txt")
read_bucket     00.7033565s     foo-new.txt
put_object      00.8489154s     foo-new.txt
>>> (Pair.Value PT1.5753775S (S3_File.Value 'enso-ci-s3-test-stage' 'foo-new.txt' Nothing))
> Duration.time_execution <| "abc".write (S3_File.new "s3://enso-ci-s3-test-stage/foo-new.txt")
read_bucket     00.7075899s     foo-new.txt
copy_object     00.7284499s     foo-new.txt -> foo-new.txt.bak
put_object      00.8774334s     foo-new.txt
>>> (Pair.Value PT2.3323846S (S3_File.Value 'enso-ci-s3-test-stage' 'foo-new.txt' Nothing))
> Duration.time_execution <| (S3_File.new "s3://enso-ci-s3-test-stage/foo.txt").size
raw_head        00.7136096s     foo.txt
>>> (Pair.Value PT0.7184278S 3)
> Duration.time_execution <| (S3_File.new "s3://enso-ci-s3-test-stage/foo.txt").last_modified_time
raw_head        00.6900763s     foo.txt
>>> (Pair.Value PT0.6991006S 2024-02-15T13:10:30+01:00[Europe/Warsaw])
> Duration.time_execution <| "abc".write (S3_File.new "s3://enso-ci-s3-test-stage/foo.txt") on_existing_file=Existing_File_Behavior.Error
read_bucket     00.7050382s     foo.txt
>>> (Pair.Value PT0.7303273S (Error: (Already_Exists (File_Like.Value (S3_File.Value 'enso-ci-s3-test-stage' 'foo.txt' Nothing)))))
>

We can see that a single primitive operation usually takes around 0.7s. The Backup (default) strategy is the worst case because it requires 3 primitive operations - exists + copy + upload - if the file already exists, and 2 (exists + upload) if it doesn't. Overwrite is much faster as it performs just 1 primitive operation: an upload without any checks.
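The cost model described above can be sketched as a small table of primitive-operation counts. This is a hypothetical model for illustration, not the Enso implementation; the strategy names mirror `Existing_File_Behavior`, and the ~0.7s round-trip figure is taken from the measurements above:

```python
# Hypothetical cost model (not the actual Enso implementation): number of
# primitive S3 calls per write, per overwrite strategy, assuming ~0.7 s
# per S3 round-trip as measured in the REPL session above.
ROUND_TRIP_S = 0.7

def primitive_ops(strategy: str, file_exists: bool) -> int:
    if strategy == "Overwrite":
        return 1                        # put_object only, no checks
    if strategy == "Backup":
        # exists check + (copy to .bak only if present) + upload
        return 3 if file_exists else 2
    if strategy == "Error":
        return 1 if file_exists else 2  # exists check; upload only if absent
    raise ValueError(f"unknown strategy: {strategy}")

for strategy, exists in [("Backup", True), ("Backup", False), ("Overwrite", True)]:
    n = primitive_ops(strategy, exists)
    # e.g. Backup on an existing file: 3 ops ≈ 2.1 s, close to the
    # measured 2.3 s for the foo-new.txt backup-write above.
    print(f"{strategy:9} exists={exists}: {n} ops ≈ {n * ROUND_TRIP_S:.1f} s")
```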

@radeusgd
Member Author

I tried using VisualVM for profiling (well, polyglot sampling to be exact), but tbh I cannot quite interpret the results.

[screenshot: polyglot sampler profile]

The profile above is generated with operation:

> Duration.time_execution <| "abc".write (S3_File.new "s3://enso-ci-s3-test-stage/foo.txt") on_existing_file=Existing_File_Behavior.Overwrite
put_object      01.4250147s     foo.txt
>>> (Pair.Value PT1.9046111S (S3_File.Value 'enso-ci-s3-test-stage' 'foo.txt' Nothing))
  1. I do not clearly see the polyglot Java calls there - in the screenshot the tree ends with Request_Body::from_local_file and File_Error.type::handle_java_exceptions - but these call into Java interop and I cannot see those calls in the profile. These are the most important calls, as I'd like to see the timings of the requests made by the AWS SDK.
  2. The timings seem completely off. The wall-clock time measured in Enso for the whole operation was 1.9s, with 1.42s spent in the put_object primitive S3 op that is part of it. But the polyglot sampler only shows 613ms of runtime, so only ~30% seems to be counted. I imagine this may be because ~90% of that 1.42s of upload time is spent by the main thread waiting for the response from AWS, so it is not counted as CPU time. But this is also time that I'm interested in.

@JaroslavTulach do you see any obvious mistakes I may have made here in the approach to sampling the application? Can I do something to also reasonably measure the wait times and Java-polyglot calls?
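The wall-clock vs CPU-time discrepancy described in point 2 is easy to reproduce in isolation. A minimal Python sketch (using `sleep` as a stand-in for a blocking network round-trip):

```python
# A CPU-time-based sampler misses time a thread spends blocked (e.g.
# waiting on an S3 response). Here sleep() stands in for the network wait:
# wall-clock time advances, CPU time barely moves.
import time

start_wall = time.perf_counter()
start_cpu = time.process_time()
time.sleep(0.2)  # stand-in for waiting on a network round-trip
wall = time.perf_counter() - start_wall
cpu = time.process_time() - start_cpu
# cpu will be near zero: the blocked wait is invisible to CPU profiling,
# just as ~70% of the put_object time was invisible to the sampler.
print(f"wall={wall:.2f}s cpu={cpu:.2f}s")
```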

@radeusgd
Member Author

As far as I could measure using the available tools, I think that the ~0.7s-per-request timings are just due to S3 being slow and requiring long round-trips, not due to some anomaly in our implementation.

As for the tests taking a lot of time - that is quite expected for tests that need to call external services. I think we should implement the strategy we were considering for additional DB backends: run these tests on nightly builds, plus on-demand on CI only when related logic is modified (e.g. triggered by adding a label to the PR). This way these tests will not clog our main CI process too much.
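The label-triggered strategy could look roughly like the following GitHub Actions fragment. This is purely an illustrative sketch - the workflow name, label name, and test command are hypothetical, not this repository's actual configuration:

```yaml
# Hypothetical sketch: run S3 integration tests nightly, and on PRs only
# when a label (here "CI: Cloud tests" - an assumed name) is present.
name: s3-integration-tests
on:
  schedule:
    - cron: "0 2 * * *"   # nightly run
  pull_request:
    types: [labeled, synchronize]
jobs:
  s3-tests:
    if: >
      github.event_name == 'schedule' ||
      contains(github.event.pull_request.labels.*.name, 'CI: Cloud tests')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./run backend test s3   # placeholder test command
```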

I don't think we can make the tests much less resource-intensive - we need good coverage to avoid bugs, and being more 'frugal' about re-using S3 resources would risk making the tests harder to understand and maintain.

@JaroslavTulach
Member

1. I do not clearly see the polyglot-Java calls there - 

I believe that's the expected behavior. The thing you call polyglot-Java isn't polyglot from the Polyglot Sampler perspective.


@jdunkerley jdunkerley left a comment


Looks good.

distribution/lib/Standard/AWS/0.0.0-dev/src/S3/S3.enso
destination file already exists. Defaults to `False`.
move_to : Writable_File -> Boolean -> Nothing ! File_Error
move_to self (destination : Writable_File) (replace_existing : Boolean = False) =
if Context.Output.is_enabled.not then Error.throw (Forbidden_Operation.Error "File moving is forbidden as the Output context is disabled.") else
Member


Not for this one, but I feel like we need a better construct for this. Something like

Suggested change
if Context.Output.is_enabled.not then Error.throw (Forbidden_Operation.Error "File moving is forbidden as the Output context is disabled.") else
Context.Output.when_enabled disabled_message="File moving is forbidden as the Output context is disabled." <|

Member Author


Good idea, I created a ticket so that we can pick it up whenever we can: #9080

@radeusgd radeusgd added the CI: Ready to merge This PR is eligible for automatic merge label Feb 16, 2024
@mergify mergify bot merged commit 642d5a6 into develop Feb 16, 2024
26 of 27 checks passed
@mergify mergify bot deleted the wip/radeusgd/8833-s3-copying branch February 16, 2024 10:42
Successfully merging this pull request may close these issues.

File.copy_to should allow to upload/download files across 'remote' backends