
Improve download reliability with files bigger than 1MB #10

Merged (5 commits into fpietka:master on Feb 21, 2018)

Conversation

askainet (Contributor) commented Feb 17, 2018

Motivation

This is an attempt to add some client-side robustness around the buggy AWS API call DownloadDBLogFilePortion, so that RDS log files of any size and content can be downloaded successfully.

I've been struggling to download several RDS PostgreSQL log files ranging from a few MB to a few GB until I applied the following fixes. I hope this helps other people out there struggling with these issues!

Issues found and proposed solutions

Only 2 portions downloaded for files bigger than 1MB

The maximum size of the downloaded portion for every API call is 1MB. According to both the Boto[1] and AWS[2] documentation, the Marker parameter must be set to "0" in the first request in order to properly download all portions of files bigger than 1MB.

When this parameter was not set, the second API call was missing AdditionalDataPending in its response, which stopped the download loop after writing only 2MB, no matter how big the log file was.

Last line truncated for portions bigger than 1MB

The default value of the NumberOfLines parameter is 10000, which is also the maximum number of lines allowed per portion. This limit applies in addition to the limit of 1MB of data per portion.

When downloading a portion, if the maximum size is reached before fetching 10000 lines, the data in the API response is truncated exactly at byte 1024² (most probably in the middle of a line) and has the string \n [Your log message was truncated]\n appended.

We should search for [Your log message was truncated] at the end of the response to know if data for the current portion is truncated and therefore the last line is broken.

If truncated, we should retry downloading this portion, starting at the same Marker, but this time setting NumberOfLines equal to or less than the number of lines fetched in the truncated response. I found that requesting 10 fewer lines helps avoid excessive subsequent truncated responses for the same log file, but that number is pretty arbitrary. The important thing is to retry the same truncated portion without exceeding the number of lines already fetched.
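The truncation check can be sketched as a small helper. This is illustrative code, not the PR's exact implementation; only the appended string comes from the API behaviour described above:

```python
# The string the API appends to a truncated portion, per the PR
# description. The helper name and slice width are illustrative.
TRUNCATED_STRING = " [Your log message was truncated]"

def looks_truncated(log_file_data):
    """Return True if the portion ends with the truncation notice."""
    # The notice arrives as "\n [Your log message was truncated]\n",
    # so it sits within the last len(TRUNCATED_STRING) + 2 characters.
    return TRUNCATED_STRING in log_file_data[-(len(TRUNCATED_STRING) + 2):]
```

Checking only the tail of the string avoids false positives if a log line itself happens to contain that text somewhere in the middle.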

API sometimes lies about AdditionalDataPending

With some log files, I experienced downloads stopping before all the data was fetched, because the API response no longer included AdditionalDataPending. Those same files were successfully downloaded using the web console (well, sometimes even the console downloads fail randomly!).

Then I found that you can keep sending portion download requests even if the API does not explicitly report AdditionalDataPending. Just check whether the LogFileData in the response is empty (beware: an otherwise empty response still contains \n) to determine whether the download is complete.
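A sketch of that completion check (hypothetical helper name), stripping the trailing newline that an otherwise empty response still contains:

```python
def portion_is_empty(log_file_data):
    """True once a portion carries no log data, i.e. the download is done."""
    # An "empty" LogFileData is not "" but "\n", so strip newlines first.
    return log_file_data.strip("\n") == ""
```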

[1] http://boto3.readthedocs.io/en/latest/reference/services/rds.html#RDS.Client.download_db_log_file_portion
[2] https://docs.aws.amazon.com/AmazonRDS/latest/APIReference/API_DownloadDBLogFilePortion.html

@@ -126,15 +131,26 @@ def write_log(client, dbinstance_id, filename, logfilename):
        raise
    with open(filename, "a") as logfile:
        if 'LogFileData' in response:
            logfile.write(response["LogFileData"])
    downloaded_lines = response["LogFileData"].count("\n")
    if response["AdditionalDataPending"] and downloaded_lines < max_number_of_lines:
Contributor:
E501 line too long (96 > 79 characters)

    downloaded_lines = response["LogFileData"].count("\n")
    if response["AdditionalDataPending"] and downloaded_lines < max_number_of_lines:
        if downloaded_lines == 0:
            raise Exception("Not a single line was downloaded in last portion!")
Contributor:
E501 line too long (92 > 79 characters)

            raise Exception("Not a single line was downloaded in last portion!")
        max_number_of_lines = max(downloaded_lines - 10, 1)
        logger.warning(
            "Log was truncated, retrying previous portion with NumberOfLines = {0}".format(
Contributor:
E501 line too long (103 > 79 characters)

@@ -114,7 +114,8 @@ def get_all_logs(dbinstance_id, output,
 def write_log(client, dbinstance_id, filename, logfilename):
     response = client.download_db_log_file_portion(
         DBInstanceIdentifier=dbinstance_id,
-        LogFileName=logfilename
+        LogFileName=logfilename,
+        Marker="0"
Owner:
I have trouble seeing how not setting it could have a bad side effect, but I can't see the harm in being explicit 👍

Contributor Author:
It turns out it is required to make downloads bigger than 1MB work, see the PR description.

    if response["AdditionalDataPending"] and downloaded_lines < max_number_of_lines:
        if downloaded_lines == 0:
            raise Exception("Not a single line was downloaded in last portion!")
        max_number_of_lines = max(downloaded_lines - 10, 1)
Owner:
I don't understand the -10 here, why 10 specifically?

Contributor Author:
Pretty arbitrary, see PR description.

    downloaded_lines = response["LogFileData"].count("\n")
    if (response["AdditionalDataPending"] and
            downloaded_lines < max_number_of_lines):
        if truncated_string in response["LogFileData"][-slice_length:]:
Owner:
Any particular reason not testing with == here?

askainet (Contributor Author) commented Feb 20, 2018:

It won't work because the data ends with \n (I can't say if this is always true). One could use the slice expression [-slice_length:-1] or add .strip("\n") to make the == comparison work, but I thought it looked simpler this way.

fpietka (Owner) commented Feb 20, 2018

First of all, thank you for your work.

So if I understand correctly, log files above 1MB reproduce issue #5 ?

askainet (Contributor Author):

Hey @fpietka, sorry, I kept adding stuff to this PR as I found more and more issues with the buggy API call; let me update the description and explain the details of my findings.

@askainet askainet changed the title Add Mark="0" to first download request to work properly with files bigger than 1MB Improve download reliability with files bigger than 1MB Feb 20, 2018
askainet (Contributor Author) commented Feb 20, 2018

> First of all, thank you for your work.

@fpietka you're welcome! Please take a look at my explanations, I hope they make sense.

> So if I understand correctly, log files above 1MB reproduce issue #5?

Hopefully yes! So far, using these fixes I've been able to download several files of different sizes (up to a few GB) that were miserably failing before. Despite the improvements, this API is very unstable and still fails from time to time with random errors... Trying again a few minutes later usually works.

fpietka (Owner) left a review comment:

Very thorough I must say!

Besides a minor comment it looks good to be released. I'll then amend the README to link this PR.

Thanks

        if downloaded_lines == 0:
            raise Exception(
                "No line was downloaded in last portion!")
        max_number_of_lines = max(downloaded_lines - 10, 1)
Owner:
I would just create a variable for this arbitrary number at the same level as marker and max_number_of_lines.

Contributor Author:
Makes perfect sense 👍

@fpietka fpietka merged commit 5a5dbe5 into fpietka:master Feb 21, 2018
fpietka (Owner) commented Feb 21, 2018

Thanks again for the great work here!
I'll prepare the next release ASAP

cbandy commented Feb 23, 2018

I've found that it's most reliable to use DownloadCompleteDBLogFile to fetch the entire file at once. The drawback is that boto3 is very awkward about signing requests.

import botocore.awsrequest
request = botocore.awsrequest.create_request_object({
    'method': 'GET', 'url': client.meta.endpoint_url + '/v13/downloadCompleteLogFile/{DBInstanceIdentifier}/{LogFileName}', 'body': None,
    'headers': {'User-Agent': client.meta.config.user_agent},
    'context': {'client_config': client.meta.config},
})
client.meta.events.emit('request-created.rds', request=request)
response = client._endpoint.http_session.send(request.prepare())

len(response.content)  # 1544237
response.content[:100] # b'2018-02-17 00:03:35 UTC::@:[17944]:LOG:  checkpoint starting: time\n2018-02-17 00:05:24 UTC::@:[17944'

fpietka (Owner) commented Feb 23, 2018

I think I may have to build a test bench for all those different solutions to see if one is a better fit.

fpietka (Owner) commented Feb 23, 2018

@cbandy I can't see the awkwardness in signing the request in your example. Can you explain a bit further?

cbandy commented Feb 25, 2018

Some event handler signs the request, but it's not obvious from the emit call. Reaching into _endpoint to send the request might not be a stable API.
