Inserting non JSON serializable object corrupts database #89

Closed
danlamanna opened this Issue Jan 9, 2016 · 3 comments

danlamanna commented Jan 9, 2016

I would consider this a pretty severe issue, as it corrupts the database on unexpected user input. I would instead expect TinyDB to act as defensively as possible.

Example code:

from tinydb import TinyDB

db = TinyDB('some-db.json')

db.insert({'foo': 'bar'})
db.insert({'bar': set([1])})

Results in the following in some-db.json:
{"_default": {"1": {"foo": "bar"}, "2": {"bar":

This is the traceback:

Traceback (most recent call last):
  File "test-tinydb.py", line 6, in <module>
    db.insert({'bar': set([1])})
  File "/usr/lib/python2.7/site-packages/tinydb/database.py", line 299, in insert
    self._write(data)
  File "/usr/lib/python2.7/site-packages/tinydb/database.py", line 266, in _write
    self._storage.write(values)
  File "/usr/lib/python2.7/site-packages/tinydb/database.py", line 45, in write
    self._storage.write(data)
  File "/usr/lib/python2.7/site-packages/tinydb/storages.py", line 105, in write
    json.dump(data, self._handle, **self.kwargs)
  File "/usr/lib64/python2.7/json/__init__.py", line 189, in dump
    for chunk in iterable:
  File "/usr/lib64/python2.7/json/encoder.py", line 434, in _iterencode
    for chunk in _iterencode_dict(o, _current_indent_level):
  File "/usr/lib64/python2.7/json/encoder.py", line 408, in _iterencode_dict
    for chunk in chunks:
  File "/usr/lib64/python2.7/json/encoder.py", line 408, in _iterencode_dict
    for chunk in chunks:
  File "/usr/lib64/python2.7/json/encoder.py", line 408, in _iterencode_dict
    for chunk in chunks:
  File "/usr/lib64/python2.7/json/encoder.py", line 442, in _iterencode
    o = _default(o)
  File "/usr/lib64/python2.7/json/encoder.py", line 184, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: set([1]) is not JSON serializable

This was on a Linux box running Python 2.7.10 and TinyDB 3.1.0.
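Until this is fixed, a caller can guard against it by serializing the document before handing it to TinyDB, so the TypeError is raised before any file I/O happens. This is only a client-side workaround sketch; safe_insert is a hypothetical helper, not part of the TinyDB API:

import json

from tinydb import TinyDB

def safe_insert(db, document):
    # json.dumps raises TypeError for non-serializable values (e.g. set([1]))
    # before TinyDB ever touches the file on disk.
    json.dumps(document)
    return db.insert(document)

db = TinyDB('some-db.json')
safe_insert(db, {'foo': 'bar'})        # inserted as usual

try:
    safe_insert(db, {'bar': set([1])})
except TypeError:
    pass                               # rejected without corrupting some-db.json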

msiemens closed this in 7ad6793 on Jan 9, 2016


Owner

msiemens commented Jan 9, 2016

Thanks for reporting this problem.

As far as I can see, the main problem was that serialization and file access were interleaved (due to using json.dump(obj, fh)). If serialization failed with an exception, the file was left in a broken state. Now the JSON serialization happens separately from the file system access. This comes at the cost of slightly higher memory consumption, since the serialized string has to be kept in memory instead of being written to the file handle on the fly.
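For reference, a minimal sketch of that approach (not the exact patch from 7ad6793): serialize first, and only write to the file once serialization has succeeded.

import json

def write_json(handle, data):
    # Serialize before touching the file: if this raises (e.g. for a set),
    # the database file is left exactly as it was.
    serialized = json.dumps(data)
    handle.seek(0)
    handle.write(serialized)
    handle.flush()
    handle.truncate()  # drop leftover bytes if the new document is shorter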


Contributor

eugene-eeo commented Jan 10, 2016

@msiemens Another solution is atomic file writes: write to a temporary file, then rename that file over the original one. This is atomic because the rename operation on Windows and *nix is guaranteed to be atomic. Example:

import os
import json
from tempfile import mkstemp

data = {'foo': 'bar'}                    # placeholder: the data to persist
real_database_filename = 'some-db.json'  # placeholder: the database file

# Serialize into a temporary file first; only move it into place once the
# write has completed, so a failed dump never touches the real database.
fd, path = mkstemp()
with os.fdopen(fd, 'w') as fp:
    json.dump(data, fp)
os.rename(path, real_database_filename)
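As a small refinement of the sketch above (my own addition, not something TinyDB does): creating the temporary file next to the target keeps the rename on a single filesystem, and an fsync before the rename ensures the bytes are on disk when the name switches. Note that on Windows, os.rename refuses to overwrite an existing file, so Python 3.3+'s os.replace would be needed there.

import os
import json
from tempfile import mkstemp

def atomic_write_json(path, data):
    # Put the temp file in the same directory as the target so os.rename
    # never has to cross a filesystem boundary.
    fd, tmp_path = mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, 'w') as fp:
            json.dump(data, fp)
            fp.flush()
            os.fsync(fp.fileno())
        os.rename(tmp_path, path)  # os.replace(tmp_path, path) on Python 3.3+
    except Exception:
        os.unlink(tmp_path)
        raise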



Owner

msiemens commented Jan 11, 2016

@eugene-eeo That's certainly a possible solution, but I think it complicates matters more than necessary (here's an insightful article on atomic writes from MSDN: http://blogs.msdn.com/b/adioltean/archive/2005/12/28/507866.aspx). I'll keep this solution in mind in case I need it somewhere else.
