Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove access to public s3 buckets #6830

Closed
leshikus opened this issue Dec 19, 2023 · 7 comments · Fixed by #6829 or #6863
Closed

Remove access to public s3 buckets #6830

leshikus opened this issue Dec 19, 2023 · 7 comments · Fixed by #6829 or #6863
Labels
new feature/request 💬 Requests and pull requests for new features

Comments

@leshikus
Copy link
Contributor

Is your feature request related to a problem? Please describe.
This is a compliance issue which comes from a principle of least privilege.

@leshikus leshikus added new feature/request 💬 Requests and pull requests for new features Triage 🩹 Issues that need triage labels Dec 19, 2023
leshikus added a commit to leshikus/modin that referenced this issue Dec 19, 2023
@leshikus
Copy link
Contributor Author

Thinking of PR for test_io.py I've come to the following question.

Is it important for test_read_parquet_s3 that we use 1) s3, 2) generic file system? Or can we just use https URL as read_csv argument? @YarShev @anmyachev

leshikus added a commit to leshikus/modin that referenced this issue Dec 20, 2023
Signed-off-by: Alexei Fedotov <alexei.fedotov@gmail.com>
anmyachev pushed a commit that referenced this issue Dec 20, 2023
Signed-off-by: Alexei Fedotov <alexei.fedotov@gmail.com>
@leshikus
Copy link
Contributor Author

This feature requires an additional PR for test_io.py and test_io_exp.py - reopening

@YarShev YarShev reopened this Dec 20, 2023
@anmyachev anmyachev removed the Triage 🩹 Issues that need triage label Dec 20, 2023
@anmyachev
Copy link
Collaborator

Thinking of PR for test_io.py I've come to the following question.

Is it important for test_read_parquet_s3 that we use 1) s3, 2) generic file system? Or can we just use https URL as read_csv argument? @YarShev @anmyachev

I think we should keep using s3.
However, let's try to transfer the files to the Modin repository, if they are small, then put them on the local S3 server and read from it.

For example: test_to_parquet_s3, which uses s3_resource and s3_storage_options fixtures.

@leshikus
Copy link
Contributor Author

My take on this.

  1. While having tests for multiple protocols is potentially a good thing.

  2. I think this is out of scope of this particular feature request.

This test does not test how modin implementation handles s3 protocol. It copies file using s3fs (which is not a part of modin). Modin implementation then works with a file object.

            fs = s3fs.S3FileSystem(anon=True)
            with fs.open(dataset_url, "rb") as file_obj:
                eval_io("read_parquet", path=file_obj, engine=engine)

I cannot say if other tests do not test s3 protocol, needs further investigation.

  1. Adding an s3 server will impact CI complexity and stability. I don't think the decision like this should be taken lightly.

  2. Moving some test csv files to modin repo is a good thing due to the same reason. Less servers involved in testing means better CI stability.

@anmyachev
Copy link
Collaborator

Adding an s3 server will impact CI complexity and stability. I don't think the decision like this should be taken lightly.

We are already setting up the s3 server in Modin's tests:

def s3_base(worker_id, monkeysession):

@anmyachev
Copy link
Collaborator

I think this is out of scope of this particular feature request.

Agree

@anmyachev
Copy link
Collaborator

This test does not test how modin implementation handles s3 protocol. It copies file using s3fs (which is not a part of modin). Modin implementation then works with a file object.

This is actually an implementation of how Modin handles the s3 protocol - using s3fs library. We need to be sure that s3fs is used correctly.

anmyachev added a commit to anmyachev/modin that referenced this issue Jan 17, 2024
…kets

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
anmyachev added a commit to anmyachev/modin that referenced this issue Jan 17, 2024
…kets

Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
YarShev pushed a commit that referenced this issue Jan 18, 2024
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
anmyachev added a commit to anmyachev/modin that referenced this issue Jan 19, 2024
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
YarShev pushed a commit that referenced this issue Jan 19, 2024
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
anmyachev pushed a commit that referenced this issue Jan 19, 2024
Signed-off-by: Alexei Fedotov <alexei.fedotov@gmail.com>
anmyachev added a commit that referenced this issue Jan 19, 2024
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
anmyachev added a commit that referenced this issue Jan 19, 2024
Signed-off-by: Anatoly Myachev <anatoly.myachev@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new feature/request 💬 Requests and pull requests for new features
Projects
None yet
3 participants