
Unterminated string error during rethinkdb restore #3859

Closed
brandon-beacher opened this issue Mar 2, 2015 · 13 comments
@brandon-beacher

Our app (http://gatherhere.com/platform) uses RethinkDB as its primary data store.

rethinkdb restore is currently failing if we attempt to restore a dump of our database.

The failure occurs for a table containing JSON from Postmark's Inbound email webhook.

I suspect the wide range of characters that occur in email may have hit something rethinkdb restore does not yet handle?

Here is the full error message:

rethinkdb restore ~/Downloads/rethinkdb_dump_2015-03-02T11:00:01.tar.gz --force
Unzipping archive file...
  Done (8 seconds)
Importing from directory...
[                                        ]   2% 
412586 rows imported in 26 tables
Unterminated string starting at: line 2 column 15988747 (char 15988748)
In file: /var/folders/n0/fqys4_ns6kl2wl990nhd4nf40000gn/T/tmpImjL5r/gather/inbound_emails.json
Errors occurred during import
Error: rethinkdb-import failed

I tried to take a look at the character referenced in the error, but when I open the file in an editor, line 2 does not have that many characters.

Anything I can do to help diagnose this one?

@mglukhovsky
Member

@brandon-beacher, have you checked the version of your RethinkDB Python driver with pip freeze?

The import/export and dump/restore scripts rely on the installed Python driver. If you're running an older version of the driver (< 1.16.0-2), you should upgrade it:

sudo pip install -U rethinkdb

@Tryneus, is there anything else that could be causing this error?

@brandon-beacher
Author

Upgrading the driver (it was at 1.15.0-0); will report back!

@Tryneus
Member

Tryneus commented Mar 2, 2015

I think I see the problem. This didn't actually occur on line '2' - we just iteratively parse JSON rows from the file (to keep track of progress), and it happened on the second line of a parse. The problem itself appears to be that a single row is larger than 16 MB, and the import script fails in that case. This could be solved by increasing the maximum size of the buffer in the script, but that will have some performance implications.

I'll look into solving this without killing performance on very large rows.

@Tryneus
Member

Tryneus commented Mar 2, 2015

Ah, I just remembered one of the reasons we limited the maximum buffer size - some users were getting OOM-killed due to the import script using too much memory. We can't keep arbitrarily reading more data into memory until the parse works - the system will run out of memory (and if the import is running on the same machine as the server, there's a good chance the OOM killer will target the server). So we need some upper limit on the buffer. Otherwise, on bad JSON input, we would keep buffering the file until we reach the EOF.

@brandon-beacher
Author

This makes sense: since they're JSON representations of emails, the attachments are represented as base64 strings. I bet the document it's failing on just has a large attachment.
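As a quick sanity check on that theory: base64 inflates binary data by a factor of 4/3, so an attachment of roughly 12 MiB already produces about 16 MiB of JSON text, enough to blow the buffer limit on its own. The attachment bytes below are invented for illustration.

```python
import base64

# A hypothetical 12 MiB attachment (all zero bytes for illustration).
attachment = b"\x00" * (12 * 1024 * 1024)
encoded = base64.b64encode(attachment)

print(len(encoded))                    # 16777216, i.e. exactly 16 MiB
print(len(encoded) / len(attachment))  # the 4/3 base64 overhead
```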

@Tryneus
Member

Tryneus commented Mar 2, 2015

Ok, a fix is up in review 2859. This bumps up the maximum row size to 128 MB and implements a scaling buffer size so it is much faster for larger rows. It may be useful to add a command-line argument for setting this value in the future, but with any luck this should do the job until then.
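One plausible shape for such a scaling buffer (the real change lives in review 2859 and may differ; the starting size and doubling rule here are assumptions) is to start small and double on demand up to the new cap, so typical rows stay cheap while oversized rows still fit:

```python
MAX_BUFFER = 128 * 1024 * 1024  # the new cap mentioned in the fix

def grow_buffer(current_size, max_buffer=MAX_BUFFER):
    # Hypothetical scaling rule: double the buffer each time a parse fails
    # for lack of data, never exceeding the hard cap.
    return min(current_size * 2, max_buffer)

size = 512 * 1024  # assumed starting size of 512 KiB
steps = []
while size < MAX_BUFFER:
    size = grow_buffer(size)
    steps.append(size)
# Reaching the 128 MiB cap takes only 8 doublings from 512 KiB, so small
# rows never pay for a huge allocation up front.
```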

@brandon-beacher
Author

Nice! I will test this against our dump here and report back.

@danielmewes
Member

@brandon-beacher Note that the fix isn't released yet. Maybe @Tryneus can give you the branch he has implemented it in, so you could build the python driver (which contains the rethinkdb-restore script) from source?

@danielmewes danielmewes added this to the 1.16.x milestone Mar 2, 2015
@brandon-beacher
Author

Thanks Daniel. I thought 2859 was a pull request, but realized it would have auto-linked via GitHub if it were. I've got a workaround for now but will be happy to test if needed.

@Tryneus
Member

Tryneus commented Mar 2, 2015

Ah, sorry about the confusion @brandon-beacher. The fix has been approved and merged to next in commit 0b08f09, and cherry-picked into v1.16.x in commit adf4e7e. It will be in the next Python driver release, which I am terribly unsuccessful at predicting the version numbers of.

@danielmewes
Member

@brandon-beacher Since you have a work-around, would it be enough if we released this together with the next server version? ETA is about two weeks from now.
Feel free to let us know if you need the fix earlier. In that case we can push out a new version of the Python driver earlier.

@brandon-beacher
Author

Sounds great @danielmewes - also just wanted to point out that you all are awesome!

@AtnNn AtnNn modified the milestones: 2.0, 1.16.x, 1.16.3 Mar 26, 2015
@AtnNn
Member

AtnNn commented Mar 26, 2015

The fix was released in version 1.16.0-3 of the Python driver. Please re-open if there is something left to do in this issue.

@AtnNn AtnNn closed this as completed Mar 26, 2015