Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

S3: Implement restore object with select job type #7504

Conversation

tsugumi-sys
Copy link
Contributor

  • Implement select job for restore object request.
  • Adding validation for request parameters.
  • Adding unittest for the parameter validation and select job.

boto3 documantation: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/restore_object.html

I'm not sure we should use put_object or deep copy when create an object to restore. What do you think @?

@tsugumi-sys tsugumi-sys changed the title impl restore object for select operation S3: Implement restore object with select job type Mar 21, 2024
Copy link

codecov bot commented Mar 22, 2024

Codecov Report

Attention: Patch coverage is 96.42857% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 94.06%. Comparing base (9aef694) to head (ad15bca).
Report is 133 commits behind head on master.

Files Patch % Lines
moto/s3/models.py 95.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7504      +/-   ##
==========================================
- Coverage   95.88%   94.06%   -1.82%     
==========================================
  Files         843     1070     +227     
  Lines       82578    91829    +9251     
==========================================
+ Hits        79178    86381    +7203     
- Misses       3400     5448    +2048     
Flag Coverage Δ
servertests 29.57% <21.42%> (-2.95%) ⬇️
unittests 94.03% <96.42%> (-1.82%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@bblommers bblommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tsugumi-sys I think using put_object makes most sense here, as opposed to deepcopy.

Do you know what the default filename is that AWS uses? If I read the docs correctly, it sounds like they create a new file with some sort of standardized name, maybe something like {optional_prefix_}output_of_restore_request_{uuid}. But it would be nice to validate that.

I'd be happy to test this in AWS as well at some point and try it out, just to validate the theory - but that may take me some time.

restored_key.set_storage_class(_storage)

# set value
contents = self.select_object_content(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The select_object_content should be done first, or at least after the validation. This operation can fail because of an invalid query, so we should only try to create new S3 objects after we know that succeeds

_bucket_name = output_location_s3.get("BucketName")
if _bucket_name is not None:
# check if bucket is exists here.
bucket = self.get_bucket(_bucket_name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check should be done as early as possible as well

@tsugumi-sys
Copy link
Contributor Author

tsugumi-sys commented Mar 23, 2024

@bblommers

I'd be happy to test this in AWS as well at some point and try it out, just to validate the theory - but that may take me some time.

Hummm, I cannot execute a request for restore object with select job type :(

First, I putted the following json file into "Glacier Deep Archive" storage class.
(s3://mybucket/sample.json)

{  
    "employee": {  
        "name":       "sonoo",   
        "salary":      56000,   
        "married":    true  
    }  
}  

Then, I execute the following script;

import boto3

client = boto3.client("s3")
bucket_name = "mybacket"
filekey = "sample.json"

resp = client.restore_object(
    Bucket=bucket_name,
    Key=filekey,
    RestoreRequest={
        "Type": "SELECT",
        "SelectParameters": {
            "ExpressionType": "SQL",
            "Expression": "SELECT * FROM S3Object",
            "InputSerialization": {
                "JSON": {"Type": "Document"},
            },
            "OutputSerialization": {"JSON": {}},
        },
        "OutputLocation": {
            "S3": {
                "BucketName": bucket_name,
                "Prefix": "restored.json",
                "StorageClass": "STANDARD",
            }
        },
    },
)
print(resp)

Now I got MalformedXML error;

botocore.exceptions.ClientError: An error occurred (MalformedXML) when calling the RestoreObject operation: The XML you provided was not well-formed or did not validate against our published schema

I couldn't figure out why this causes error....

memo:
The select_object_content request works (You need to set the json file to STANDARD storage class).

resp = client.select_object_content(
    Bucket=bucket_name,
    Key=filekey,
    ExpressionType="SQL",
    Expression="SELECT * FROM S3Object",
    InputSerialization={"JSON": {"Type": "Document"}},
    OutputSerialization={"JSON": {}},
)
print(resp)

@bblommers
Copy link
Collaborator

@tsugumi-sys I've contacted AWS support to figure out why it's throwing the MalformedXML-exception.

TBC

@bblommers
Copy link
Collaborator

@tsugumi-sys AWS confirmed this is currently broken, but do not have a timeline to fix this. Reading between the lines, it sounds like they don't intend to fix this at all. So I don't think we should add this to Moto either.

It's unfortunate, because it is a really cool feature!

@bblommers bblommers closed this Apr 20, 2024
@tsugumi-sys
Copy link
Contributor Author

tsugumi-sys commented Apr 21, 2024

@bblommers
Ah, I see. Thank you for letting me know :)

It's unfortunate, because it is a really cool feature!

Thats' true

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants