Make S3 support optional #1252

Merged
merged 4 commits into from Apr 27, 2021
5 changes: 5 additions & 0 deletions docs/install.rst
@@ -109,6 +109,11 @@ To find the JDK, Rally expects the environment variable ``JAVA_HOME`` to be set

 If you have Rally download, install and benchmark a local copy of Elasticsearch (i.e., the `default Rally behavior <http://esrally.readthedocs.io/en/stable/quickstart.html#run-your-first-race>`_) be sure to configure the Operating System (OS) of your Rally server with the `recommended kernel settings <https://www.elastic.co/guide/en/elasticsearch/reference/master/system-config.html>`_

+Optional dependencies
+---------------------
+
+S3 support is optional and can be installed using the ``s3`` extra. If you need S3 support, install ``esrally[s3]`` instead of just ``esrally``, but other than that follow the instructions below.
+
 Installing Rally
 ----------------

7 changes: 6 additions & 1 deletion docs/track.rst
@@ -298,7 +298,12 @@ The ``corpora`` section contains all document corpora that are used by this trac

 Each entry in the ``documents`` list consists of the following properties:

-* ``base-url`` (optional): A http(s), S3 or Google Storage URL that points to the root path where Rally can obtain the corresponding source file. Rally can also download data from private S3 or Google Storage buckets if access is properly configured:
+* ``base-url`` (optional): A http(s), S3 or Google Storage URL that points to the root path where Rally can obtain the corresponding source file.
+
+  * S3 support is optional and can be installed with ``python -m pip install esrally[s3]``.
+  * http(s) and Google Storage are supported by default.
+
+  Rally can also download data from private S3 or Google Storage buckets if access is properly configured:

   * S3 according to `docs <https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html#configuration>`_.
   * Google Storage: Either using `client library authentication <https://cloud.google.com/storage/docs/reference/libraries#setting_up_authentication>`_ or by presenting an `oauth2 token <https://cloud.google.com/storage/docs/authentication>`_ via the ``GOOGLE_AUTH_TOKEN`` environment variable, typically done using: ``export GOOGLE_AUTH_TOKEN=$(gcloud auth print-access-token)``.
16 changes: 14 additions & 2 deletions esrally/utils/net.py
@@ -70,10 +70,22 @@ def finish(self):
         self.p.finish()


+def _fake_import_boto3():
+    # This function only exists to be mocked in tests to raise an ImportError, in
+    # order to simulate the absence of boto3

Comment from the PR author (Member):

This is quite ugly, but the alternative is to use `del sys.modules["boto3"]`, which is an ugly and global hack, and likely to break things. While we use this hack successfully in urllib3 to test support for Python versions without ssl, this:

So I would suggest avoiding touching `sys.modules`.

+    pass
+
+
 def _download_from_s3_bucket(bucket_name, bucket_path, local_path, expected_size_in_bytes=None, progress_indicator=None):
     # pylint: disable=import-outside-toplevel
-    # lazily initialize S3 support - we might not need it
-    import boto3.s3.transfer
+    # lazily initialize S3 support - it might not be available
+    try:
+        _fake_import_boto3()
+        import boto3.s3.transfer
+    except ImportError:
+        console.error("S3 support is optional. Install it with `python -m pip install esrally[s3]`")
+        raise
+

     class S3ProgressAdapter:
         def __init__(self, size, progress):
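
Note on the hunk above: the lazy-import pattern it introduces can be shown in isolation. The following is only a minimal sketch under stated assumptions: `import_optional` and its parameters are hypothetical names that do not exist in Rally, and the hint is printed to stderr where the PR calls `console.error`. As in the PR, the original `ImportError` is re-raised unchanged.

```python
import importlib
import sys


def import_optional(module_name, install_hint):
    """Import a module lazily; print an install hint if it is missing.

    Hypothetical helper for illustration only, not part of Rally.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Tell the user how to get the optional feature, then let the
        # original exception propagate so callers can still handle it.
        print(install_hint, file=sys.stderr)
        raise


if __name__ == "__main__":
    hint = "S3 support is optional. Install it with `python -m pip install esrally[s3]`"
    try:
        transfer = import_optional("boto3.s3.transfer", hint)
        print("boto3 is available:", transfer.__name__)
    except ImportError:
        print("boto3 is not installed")
```

Because the import happens inside the function, users who never touch S3 pay no import cost and never see the hint.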
18 changes: 11 additions & 7 deletions setup.py
@@ -71,12 +71,6 @@ def str_from_file(name):
     # License: MPL 2.0
     "certifi",
-    # License: Apache 2.0
-    # transitive dependencies:
-    # botocore: Apache 2.0
-    # jmespath: MIT
-    # s3transfer: Apache 2.0
-    "boto3==1.10.32",
     # License: Apache 2.0
     "yappi==1.2.3",
     # License: BSD
     "ijson==2.6.1",
@@ -88,6 +82,15 @@ def str_from_file(name):
     "google-auth==1.22.1"
 ]

+s3_require = [
+    # License: Apache 2.0
+    # transitive dependencies:
+    # botocore: Apache 2.0
+    # jmespath: MIT
+    # s3transfer: Apache 2.0
+    "boto3==1.10.32",
+]
+
 tests_require = [
     "ujson",
     "pytest==5.4.0",
@@ -147,7 +150,8 @@ def str_from_file(name):
     test_suite="tests",
     tests_require=tests_require,
     extras_require={
-        "develop": tests_require + develop_require
+        "develop": tests_require + develop_require + s3_require,
+        "s3": s3_require
     },
     entry_points={
         "console_scripts": [
10 changes: 10 additions & 0 deletions tests/utils/net_test.py
@@ -36,6 +36,16 @@ def test_download_from_s3_bucket(self, download, seed):
         download.assert_called_once_with("mybucket.elasticsearch.org", "data/documents.json.bz2",
                                          "/tmp/documents.json.bz2", expected_size, progress_indicator)

+    @mock.patch("esrally.utils.console.error")
+    @mock.patch("esrally.utils.net._fake_import_boto3")
+    def test_missing_boto3(self, import_boto3, console_error):
+        import_boto3.side_effect = ImportError("no module named 'boto3'")
+        with pytest.raises(ImportError, match="no module named 'boto3'"):
+            net.download_from_bucket("s3", "s3://mybucket/data", "/tmp/data", None, None)
+        console_error.assert_called_once_with(
+            "S3 support is optional. Install it with `python -m pip install esrally[s3]`"
+        )
+
     @pytest.mark.parametrize("seed", range(1))
     @mock.patch("esrally.utils.net._download_from_gcs_bucket")
     def test_download_from_gs_bucket(self, download, seed):
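
Note on `test_missing_boto3`: the hook-patching technique it relies on can be reproduced outside Rally. The sketch below is an illustration under assumptions, not the project's code: `_import_hook`, `load_feature` and `simulate_missing_dependency` are hypothetical names, and the standard-library `json` module stands in for the optional dependency. The hook is replaced only for the duration of the `with` block, and `sys.modules` is never touched.

```python
from unittest import mock


def _import_hook():
    # Does nothing in production. Tests replace it with a callable that
    # raises ImportError to simulate a missing optional dependency.
    pass


def load_feature():
    """Report whether the optional dependency could be imported."""
    try:
        _import_hook()
        import json  # stand-in for the optional dependency (assumption)
    except ImportError:
        return "unavailable"
    return "available"


def simulate_missing_dependency():
    """Call load_feature() while the hook is patched to raise ImportError."""
    failing_hook = mock.Mock(side_effect=ImportError("no module named 'optional_dep'"))
    # patch.dict swaps the module-level name and restores it on exit.
    with mock.patch.dict(globals(), {"_import_hook": failing_hook}):
        return load_feature()


if __name__ == "__main__":
    print(load_feature())
    print(simulate_missing_dependency())
    print(load_feature())
```

The trade-off is the one named in the review comment: production code carries a no-op function that exists only for tests, in exchange for leaving global interpreter state alone.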