
Improve download reliability with files bigger than 1MB #10

Merged (5 commits into fpietka:master on Feb 21, 2018)

Conversation

askainet (Contributor) commented Feb 17, 2018

Motivation

This is an attempt to add some client-side robustness around the buggy AWS API call DownloadDBLogFilePortion, so that RDS log files of any size and content can be downloaded successfully.

I've been struggling to download several RDS PostgreSQL log files ranging from a few MB to a few GB until I applied the following fixes. I hope this helps other people out there struggling with these issues!

Issues found and proposed solutions

Only 2 portions downloaded for files bigger than 1MB

The maximum size of the downloaded portion for every API call is 1MB. According to both the Boto[1] and AWS[2] documentation, the Marker parameter must be set to "0" in the first request in order to properly download all portions of files bigger than 1MB.

When this parameter was not set, the second API call was missing AdditionalDataPending in its response, which stopped the download loop after writing only 2MB, no matter how big the log file was.

Last line truncated for portions bigger than 1MB

The default value of the NumberOfLines parameter is 10000, which is also the maximum number of lines allowed per portion. This limit applies in addition to the limit of 1MB of data per portion.

When downloading a portion, if the maximum size is reached before fetching 10000 lines, the data in the API response is truncated exactly at byte 1024² (most probably in the middle of a line) and has the string \n [Your log message was truncated]\n appended.

We should search for [Your log message was truncated] at the end of the response to know if data for the current portion is truncated and therefore the last line is broken.

If truncated, we should retry downloading this portion, starting at the same Marker, but this time setting NumberOfLines equal to or less than the number of lines fetched in the truncated response. I found that requesting 10 fewer lines helps avoid excessive subsequent truncated responses for the same log file, but that number is pretty arbitrary. The important thing is to retry the same truncated portion without exceeding the number of lines already fetched.
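The truncation check can be sketched as a small helper. This is illustrative code, not the PR's exact implementation; only the appended string comes from the API behaviour described above:

```python
# The string the API appends to a truncated portion, per the PR
# description. The helper name and slice width are illustrative.
TRUNCATED_STRING = " [Your log message was truncated]"

def looks_truncated(log_file_data):
    """Return True if the portion ends with the truncation notice."""
    # The notice arrives as "\n [Your log message was truncated]\n",
    # so it sits within the last len(TRUNCATED_STRING) + 2 characters.
    return TRUNCATED_STRING in log_file_data[-(len(TRUNCATED_STRING) + 2):]
```

Checking only the tail of the string avoids false positives if a log line itself happens to contain that text somewhere in the middle.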

API sometimes lies about AdditionalDataPending

With some log files, I experienced downloads stopping before all the data was fetched, because the API response no longer included AdditionalDataPending. Those same files were successfully downloaded using the web console (well, sometimes even the console downloads fail randomly!).

Then I found that you can keep sending portion download requests even if the API does not explicitly report AdditionalDataPending. Just check whether the LogFileData in the response is empty (beware: an otherwise empty response still contains \n) to determine whether the download is complete.
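A sketch of that completion check (hypothetical helper name), stripping the trailing newline that an otherwise empty response still contains:

```python
def portion_is_empty(log_file_data):
    """True once a portion carries no log data, i.e. the download is done."""
    # An "empty" LogFileData is not "" but "\n", so strip newlines first.
    return log_file_data.strip("\n") == ""
```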

[1] http://boto3.readthedocs.io/en/latest/reference/services/rds.html#RDS.Client.download_db_log_file_portion
[2] https://docs.aws.amazon.com/AmazonRDS/latest/APIReference/API_DownloadDBLogFilePortion.html

@@ -126,15 +131,26 @@ def write_log(client, dbinstance_id, filename, logfilename):
        raise
    with open(filename, "a") as logfile:
        if 'LogFileData' in response:
            logfile.write(response["LogFileData"])
    downloaded_lines = response["LogFileData"].count("\n")
    if response["AdditionalDataPending"] and downloaded_lines < max_number_of_lines:
Contributor:
E501 line too long (96 > 79 characters)

    downloaded_lines = response["LogFileData"].count("\n")
    if response["AdditionalDataPending"] and downloaded_lines < max_number_of_lines:
        if downloaded_lines == 0:
            raise Exception("Not a single line was downloaded in last portion!")
Contributor:
E501 line too long (92 > 79 characters)

            raise Exception("Not a single line was downloaded in last portion!")
        max_number_of_lines = max(downloaded_lines - 10, 1)
        logger.warning(
            "Log was truncated, retrying previous portion with NumberOfLines = {0}".format(
Contributor:
E501 line too long (103 > 79 characters)

@@ -114,7 +114,8 @@ def get_all_logs(dbinstance_id, output,
 def write_log(client, dbinstance_id, filename, logfilename):
     response = client.download_db_log_file_portion(
         DBInstanceIdentifier=dbinstance_id,
-        LogFileName=logfilename
+        LogFileName=logfilename,
+        Marker="0"
Owner:
I have trouble seeing how not setting it could have a bad side effect, but I can't see the harm in being explicit 👍

Contributor Author:
It turns out it is required to make downloads bigger than 1MB work, see the PR description.

    if response["AdditionalDataPending"] and downloaded_lines < max_number_of_lines:
        if downloaded_lines == 0:
            raise Exception("Not a single line was downloaded in last portion!")
        max_number_of_lines = max(downloaded_lines - 10, 1)
Owner:
I don't understand the -10 here, why 10 specifically?

Contributor Author:
Pretty arbitrary, see PR description.

    downloaded_lines = response["LogFileData"].count("\n")
    if (response["AdditionalDataPending"] and
            downloaded_lines < max_number_of_lines):
        if truncated_string in response["LogFileData"][-slice_length:]:
Owner:
Any particular reason not testing with == here?

askainet (Contributor Author) commented Feb 20, 2018:

It won't work because the data ends with \n (I can't say if this is always true). One could use the slice expression [-slice_length:-1] or add .strip("\n") to make the == comparison work, but I thought it looked simpler this way.

fpietka (Owner) commented Feb 20, 2018

First of all, thank you for your work.

So if I understand correctly, log files above 1MB reproduce issue #5 ?

askainet (Contributor Author):

Hey @fpietka, sorry, I kept adding stuff to this PR as I found more and more issues with the buggy API call; let me update the description and explain the details of my findings.

@askainet askainet changed the title Add Mark="0" to first download request to work properly with files bigger than 1MB Improve download reliability with files bigger than 1MB Feb 20, 2018
askainet (Contributor Author) commented Feb 20, 2018

> First of all, thank you for your work.

@fpietka you're welcome! Please take a look at my explanations, I hope they make sense.

> So if I understand correctly, log files above 1MB reproduce issue #5?

Hopefully yes! So far, using these fixes I've been able to download several files of different sizes (up to a few GB) that were miserably failing before. Despite the improvements, this API is very unstable and still fails from time to time with random errors... Trying again a few minutes later usually works.

fpietka (Owner) left a review comment:

Very thorough I must say!

Besides a minor comment it looks good to be released. I'll then amend the README to link this PR.

Thanks

        if downloaded_lines == 0:
            raise Exception(
                "No line was downloaded in last portion!")
        max_number_of_lines = max(downloaded_lines - 10, 1)
Owner:
I would just create a variable for this arbitrary number at the same level as marker and max_number_of_lines.

Contributor Author:
Makes perfect sense 👍

@fpietka fpietka merged commit 5a5dbe5 into fpietka:master Feb 21, 2018
fpietka (Owner) commented Feb 21, 2018

Thanks again for the great work here!
I'll prepare the next release ASAP

cbandy commented Feb 23, 2018

I've found that it's most reliable to use DownloadCompleteDBLogFile to fetch the entire file at once. The drawback is that boto3 is very awkward about signing requests.

import botocore.awsrequest
request = botocore.awsrequest.create_request_object({
    'method': 'GET', 'url': client.meta.endpoint_url + '/v13/downloadCompleteLogFile/{DBInstanceIdentifier}/{LogFileName}', 'body': None,
    'headers': {'User-Agent': client.meta.config.user_agent},
    'context': {'client_config': client.meta.config},
})
client.meta.events.emit('request-created.rds', request=request)
response = client._endpoint.http_session.send(request.prepare())

len(response.content)  # 1544237
response.content[:100] # b'2018-02-17 00:03:35 UTC::@:[17944]:LOG:  checkpoint starting: time\n2018-02-17 00:05:24 UTC::@:[17944'

fpietka (Owner) commented Feb 23, 2018

I think I may have to build a test bench for all those different solutions to see if one is a better fit.

fpietka (Owner) commented Feb 23, 2018

@cbandy I can't see the awkwardness in signing the request in your example. Can you explain a bit further?

cbandy commented Feb 25, 2018

Some event handler signs the request, but it's not obvious from the emit call. Reaching into _endpoint to send the request might not be a stable API.
