Unterminated string error during rethinkdb restore #3859
Comments
@brandon-beacher, have you checked the version of your RethinkDB Python driver with `pip freeze`? The import/export and dump/restore scripts rely on the installed Python driver. If you're running an older version of the driver (< 1.16.0-2), you should upgrade the driver:
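The exact upgrade command didn't survive the copy above; a typical check-and-upgrade sequence with pip looks like this (assuming `pip` points at the same Python installation the scripts use):

```shell
# List the installed rethinkdb driver, if any:
pip freeze | grep -i rethinkdb || echo "rethinkdb driver not found"

# If the reported version is older than 1.16.0-2, upgrade:
# pip install --upgrade rethinkdb
```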
@Tryneus, is there anything else that could be causing this error?
Upgrading the driver - was at …
I think I see the problem. This didn't actually occur on line '2' - we just iteratively parse JSON rows from the file (to keep track of progress), and it happened on the second line of a parse. The problem itself appears to be that a single row is larger than 16 MB, and the import script fails in that case. This could be solved by increasing the maximum size of the buffer in the script, but that will have some performance implications. I'll look into solving this without killing performance on very large rows.
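For illustration (this is not the actual import script's code, and `iter_rows` is a hypothetical name), iteratively pulling rows out of a file that holds one big JSON array can be sketched like this:

```python
import json

def iter_rows(path, chunk_size=64 * 1024):
    """Yield documents one at a time from a file holding a JSON array."""
    decoder = json.JSONDecoder()
    buf = ""
    with open(path, encoding="utf-8") as f:
        while True:
            # Drop the array punctuation between documents.
            buf = buf.lstrip().lstrip("[,").lstrip()
            if buf.startswith("]"):
                return  # end of the array
            try:
                doc, end = decoder.raw_decode(buf)
            except ValueError:
                chunk = f.read(chunk_size)
                if not chunk:
                    return  # EOF; leftover text (if any) is malformed
                buf += chunk
                continue
            yield doc
            buf = buf[end:]
```

Progress can be tracked by counting yields. Note that a failed parse reports positions relative to the current buffer, not the file, which is why the error above pointed at "line 2" even though the file's second line is short.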
Ah, I just remembered one of the reasons we limited the maximum buffer size - some users were getting OOM-killed due to the import script using too much memory. We can't keep arbitrarily reading more data into memory until the parse works - the system will run out of memory (and if the import is running on the same machine as the server, there's a good chance the OOM killer will target the server). So we need some upper limit on the buffer. Otherwise, on bad JSON input, we would keep buffering the file until we reach the EOF.
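A sketch of that upper limit (the names and the 16 MB figure mirror this discussion, not the real script): only buffer more input while under the cap, so malformed JSON fails fast instead of buffering to EOF.

```python
import json

MAX_BUFFER = 16 * 1024 * 1024  # illustrative cap, matching the 16 MB limit discussed

def next_doc(f, buf, chunk_size=64 * 1024, max_buffer=MAX_BUFFER):
    """Buffer input until one JSON document parses, up to max_buffer characters."""
    decoder = json.JSONDecoder()
    while True:
        try:
            doc, end = decoder.raw_decode(buf)
            return doc, buf[end:]
        except ValueError:
            if len(buf) >= max_buffer:
                # Refuse to buffer further: either a single row is larger than
                # the cap, or the JSON is malformed. Without this check we
                # would keep reading until EOF and risk being OOM-killed.
                raise ValueError("no document parsed within %d bytes" % max_buffer)
            chunk = f.read(chunk_size)
            if not chunk:
                raise ValueError("unexpected EOF in JSON input")
            buf += chunk
```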
This makes sense - since the documents are JSON representations of emails, the attachments are represented as base64 strings. I bet the document it's failing on just has a large attachment.
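The arithmetic supports that: base64 maps every 3 raw bytes to 4 output characters, so an attachment only needs to be about 12 MB on disk for its encoded form to hit a 16 MB row limit.

```python
import base64
import os

attachment = os.urandom(12 * 1024 * 1024)  # a hypothetical 12 MB attachment
encoded = base64.b64encode(attachment)

# 3 raw bytes -> 4 base64 characters, a 4/3 expansion:
# 12 MB of attachment becomes exactly 16 MB of string.
print(len(encoded))  # 16777216
```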
Ok, a fix is up in review 2859. This bumps up the maximum row size to 128 MB and implements a scaling buffer size so it is much faster for larger rows. It may be useful to add a command-line argument for setting this value in the future, but with any luck this should do the job until then.
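A sketch of the scaling-buffer idea (hypothetical names; the actual change lives in review 2859): double the read size after every failed parse attempt, so a huge row costs a logarithmic number of re-parse attempts rather than a linear one, with the cap raised to 128 MB.

```python
import json

MAX_BUFFER = 128 * 1024 * 1024  # the raised ceiling

def next_doc_scaling(f, buf, initial=64 * 1024, max_buffer=MAX_BUFFER):
    """Like a fixed-chunk reader, but doubles the read size on each retry."""
    decoder = json.JSONDecoder()
    read_size = initial
    while True:
        try:
            doc, end = decoder.raw_decode(buf)
            return doc, buf[end:]
        except ValueError:
            if len(buf) >= max_buffer:
                raise ValueError("row larger than %d bytes" % max_buffer)
            chunk = f.read(read_size)
            if not chunk:
                raise ValueError("unexpected EOF in JSON input")
            buf += chunk
            read_size = min(read_size * 2, max_buffer)  # geometric growth
```

Small rows still parse on the first small read, so the common case stays fast; only oversized rows pay for the extra reads.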
Nice! I will test this against our dump here and report back. |
@brandon-beacher Note that the fix isn't released yet. Maybe @Tryneus can give you the branch he implemented it in, so you could build the Python driver (which contains the …)
Thanks Daniel - I thought 2859 was a pull request, but realized it would have auto-linked via GitHub if it were. I've got a workaround for now but will be happy to test if needed.
Ah, sorry about the confusion @brandon-beacher. The fix has been approved and merged to …
@brandon-beacher Since you have a work-around, would it be enough if we released this together with the next server version? ETA is about two weeks from now. |
Sounds great @danielmewes - also just wanted to point out that you all are awesome! |
The fix was released in version 1.16.0-3 of the Python driver. Please re-open if there is something left to do in this issue. |
Our app - http://gatherhere.com/platform - uses RethinkDB as its primary data store. `rethinkdb restore` is currently failing when we attempt to restore a dump of our database. The failure occurs for a table containing JSON from Postmark's Inbound email webhook.
I suspect the wide range of characters which occur in email may have landed on something which `rethinkdb restore` does not yet handle? Here is the full error message:
I tried to take a look at the character referenced in the error - but if I open the file in an editor - line 2 does not have that many characters.
Anything I can do to help diagnose this one?