Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to easily mock Athena result CSV #7702

Open
dmarra opened this issue May 17, 2024 · 3 comments
Open

Unable to easily mock Athena result CSV #7702

dmarra opened this issue May 17, 2024 · 3 comments

Comments

@dmarra
Copy link

dmarra commented May 17, 2024

We are able to mock Athena query results via:

   resp = requests.post(
        "http://motoapi.amazonaws.com/moto-api/static/athena/query-results",
        json=query_results,
    )
    assert resp.status_code == 201

But there is no way to mock the output of get_execution_results. this poses a few problems:

  1. unable to have any control over request status (in cases where you want to test that retries are happening for instance)
  2. unable to have any control over the output location

I am more interested in the latter. This would be a lot more useful if we could somehow assign results via a CSV, and have the query-results mocked from that. But in the simplest case, being able to control that datapoint to point to a file we have put to a mock bucket
would go a long way in the case where the code under test needs to access the file directly.

Here is an example of how I had to hack this into my tests:

from moto.athena import athena_backends
from moto.athena.models import Execution
from moto.core import DEFAULT_ACCOUNT_ID

@pytest.fixture
@mock_aws
def mock_athena_results():
    # place the mocked athena results in "s3"
    s3 = boto3.client("s3", region_name=os.getenv("AWS_DEFAULT_REGION"))
    bucket_name = 'test-bucket'
    key = f"query_results/output.csv"
  
    with open('some/csv/file.csv', "r") as f:
        contents = f.read()
        s3.put_object(Bucket=bucket_name, Key=key, Body=contents)

    # Hack moto to mock the execution results to have the correct path to our csv results
    exec = Execution(
        query="SELECT * FROM notused",
        config={
            "OutputLocation": f"s3://test-bucket/query_results/output.csv"
        },
        context={
            'Database': 'default'
        },
        workgroup={},
        execution_parameters={}
    )
    # moto tries to mock the file location to _look_ correct, but we need it to be an actual result
    exec.config['OutputLocation'] = f"s3://test-bucket/query_results/output.csv"

    # This is a hack to get moto to return the correct execution object by having a factory that
    # pretends to be a list of executions
    class MockExecutionFactory:
        def __get__(self, instance, owner):
            return [
                exec
            ]

        def __set__(self, instance, value):
            pass

        def __setitem__(self, key, value):
            pass

        def __getitem__(self, key):
            return exec

    athena_backends[DEFAULT_ACCOUNT_ID]["us-east-1"].executions = MockExecutionFactory()
    return

The above works, but its pretty janky. I suppose this is more of a feature request than anything. Or even better yet, having a way to control execution id (not with seeding, thats still too unpredictable and fragile) such that it can be explicitly set.

@bblommers
Copy link
Collaborator

Hi @dmarra, thanks for raising this.

My assumption is that the application takes whatever OutputLocation is given, downloads the result from S3 and does whatever it needs to do with the result.

Would it work if Moto would actually write the configured query results to S3? Then the actual OutputLocation doesn't matter, and the only test setup is to configure the expected results via the static/athena/query-results endpoint.

@dmarra
Copy link
Author

dmarra commented May 20, 2024

That could work! The only issue with that is that the Athena output JSON is.... perhaps a bit overengineered. It's a pain to mock many rows of data in that format. But that being said, if this is the easiest way to get the enhancement in, then I think its way better than nothing. Makes sense too, since it works with the current design.

@dmarra
Copy link
Author

dmarra commented May 20, 2024

To clarify:

  1. It would have to respect the bucket given as output (but the file name could be the auto-generated execution ID)
  2. The output format is CSV. I don't think Athena has a way to change that, so it should be fine to just always have the output in CSV format in the bucket

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants