Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[INT-674] Spark compatibility in the S3 gateway #8115

Merged
merged 26 commits into from Aug 30, 2022
Merged

Conversation

lukemarsden
Copy link
Contributor

@lukemarsden lukemarsden commented Aug 24, 2022

These are the changes needed to the S3 Gateway to make it work with Spark, and an example of how to use it.

  • Handle paths with a trailing slash as if they didn't have one.
  • Don't error on trying to read the timestamp on new files in open commits in the S3 Gateway. Spark needs to be able to read them.
  • Don't fail GetObject for paths that are simultaneously an imaginary directory and a real file. Instead, read the file. This requires dancing around the fact that InspectFile doesn't work correctly in this case, but ListFile[0] works just fine.

This is important for a customer, so it would be great if we could get it into 2.3.2 please :-)

Master PR - #8129

@codecov
Copy link

codecov bot commented Aug 24, 2022

Codecov Report

Merging #8115 (7488d93) into 2.3.x (77f598a) will decrease coverage by 0.00%.
The diff coverage is 72.22%.

@@            Coverage Diff             @@
##            2.3.x    #8115      +/-   ##
==========================================
- Coverage   22.24%   22.23%   -0.01%     
==========================================
  Files         415      415              
  Lines      112184   112189       +5     
==========================================
- Hits        24955    24949       -6     
- Misses      82940    82942       +2     
- Partials     4289     4298       +9     
Impacted Files Coverage Δ
src/server/pfs/s3/bucket.go 45.73% <66.66%> (ø)
src/server/pfs/s3/object.go 43.08% <73.33%> (+4.95%) ⬆️
src/server/pfs/s3/error.go 21.42% <0.00%> (-14.29%) ⬇️
src/server/pfs/pfs.go 44.36% <0.00%> (-3.53%) ⬇️
src/internal/collection/postgres_listener.go 76.75% <0.00%> (-2.64%) ⬇️
src/server/pfs/server/compaction.go 69.06% <0.00%> (-1.66%) ⬇️
src/internal/task/etcd_service.go 81.49% <0.00%> (-1.58%) ⬇️
src/internal/storage/fileset/postgres_store.go 82.65% <0.00%> (-1.03%) ⬇️
src/internal/collection/postgres_collection.go 76.92% <0.00%> (-0.84%) ⬇️
... and 16 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@lukemarsden lukemarsden marked this pull request as ready for review August 26, 2022 06:00
Copy link
Contributor

@albscui albscui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all LGTM! Just a couple comments about not losing err values in debug message. My comments about the ListFile vs InspectFile is mainly for my own understanding.

src/server/pfs/s3/bucket.go Outdated Show resolved Hide resolved
src/server/pfs/s3/object.go Outdated Show resolved Hide resolved
src/server/pfs/s3/object.go Show resolved Hide resolved
src/server/pfs/s3/object.go Show resolved Hide resolved
src/server/pfs/s3/object.go Outdated Show resolved Hide resolved
@albscui
Copy link
Contributor

albscui commented Aug 29, 2022

Also, is the plan to merge this into master later?

@lukemarsden
Copy link
Contributor Author

Also, is the plan to merge this into master later?

I'll make a PR into master now.

@lukemarsden
Copy link
Contributor Author

Here's the master PR: #8129

@lukemarsden lukemarsden merged commit 3e9b4c2 into 2.3.x Aug 30, 2022
@lukemarsden lukemarsden deleted the luke-spark-s3g-fresh branch August 30, 2022 14:08
@msteffen msteffen changed the title Spark compatibility in the S3 gateway [INT-674] Spark compatibility in the S3 gateway Sep 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants