
Inserting non JSON serializable object corrupts database #89

Closed
danlamanna opened this issue Jan 9, 2016 · 3 comments

Comments

@danlamanna

I would consider this a pretty severe issue, since it corrupts the database on unexpected user input. I would instead expect TinyDB to behave as defensively as possible.

Example code:

from tinydb import TinyDB

db = TinyDB('some-db.json')

db.insert({'foo': 'bar'})
db.insert({'bar': set([1])})

Results in the following in some-db.json:
{"_default": {"1": {"foo": "bar"}, "2": {"bar":

This is the traceback:

Traceback (most recent call last):
  File "test-tinydb.py", line 6, in <module>
    db.insert({'bar': set([1])})
  File "/usr/lib/python2.7/site-packages/tinydb/database.py", line 299, in insert
    self._write(data)
  File "/usr/lib/python2.7/site-packages/tinydb/database.py", line 266, in _write
    self._storage.write(values)
  File "/usr/lib/python2.7/site-packages/tinydb/database.py", line 45, in write
    self._storage.write(data)
  File "/usr/lib/python2.7/site-packages/tinydb/storages.py", line 105, in write
    json.dump(data, self._handle, **self.kwargs)
  File "/usr/lib64/python2.7/json/__init__.py", line 189, in dump
    for chunk in iterable:
  File "/usr/lib64/python2.7/json/encoder.py", line 434, in _iterencode
    for chunk in _iterencode_dict(o, _current_indent_level):
  File "/usr/lib64/python2.7/json/encoder.py", line 408, in _iterencode_dict
    for chunk in chunks:
  File "/usr/lib64/python2.7/json/encoder.py", line 408, in _iterencode_dict
    for chunk in chunks:
  File "/usr/lib64/python2.7/json/encoder.py", line 408, in _iterencode_dict
    for chunk in chunks:
  File "/usr/lib64/python2.7/json/encoder.py", line 442, in _iterencode
    o = _default(o)
  File "/usr/lib64/python2.7/json/encoder.py", line 184, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: set([1]) is not JSON serializable

This was on a linux box running python 2.7.10 and tinydb 3.1.0.

@msiemens
Owner

msiemens commented Jan 9, 2016

Thanks for reporting this problem.

As far as I can see, the main problem was that serialization and file access were interleaved (due to using json.dump(obj, fh)). If serialization failed with an exception, the file was left in a broken state. Now the JSON serialization is separate from the file system access. This imposes slightly larger memory consumption, as the serialized string has to be held in memory instead of being written to the file handle on the go.
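The serialize-before-write pattern described above can be sketched roughly like this (safe_write is a hypothetical helper for illustration, not TinyDB's actual method):

```python
import json

def safe_write(path, data):
    # Serialize first: if data is not JSON serializable, json.dumps()
    # raises TypeError *before* the file is opened, so the existing
    # database contents are never touched.
    serialized = json.dumps(data)
    with open(path, 'w') as handle:
        handle.write(serialized)
```

The trade-off mentioned above is visible here: the entire serialized string lives in memory before any byte reaches disk.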

@eugene-eeo
Contributor

@msiemens another solution is atomic file writes: write to a temporary file, then rename that file over the original. This is atomic because the rename operation on Windows and *nix is guaranteed to be atomic. Example:

import os
import json
from tempfile import mkstemp

# 'data' and 'real_database_filename' are assumed to be defined
fd, path = mkstemp()
with os.fdopen(fd, 'w') as fp:
    json.dump(data, fp)
os.rename(path, real_database_filename)
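One caveat with the snippet above: os.rename raises on Windows when the destination already exists, and a rename is only atomic within one filesystem. A variant that accounts for both (atomic_dump is a hypothetical name; os.replace requires Python 3.3+) might look like:

```python
import json
import os
from tempfile import mkstemp

def atomic_dump(data, target_path):
    # Create the temp file in the target's directory, so the final
    # rename never crosses a filesystem boundary.
    dir_name = os.path.dirname(os.path.abspath(target_path))
    fd, tmp_path = mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, 'w') as fp:
            json.dump(data, fp)
        # os.replace overwrites atomically on both Windows and *nix.
        os.replace(tmp_path, target_path)
    except BaseException:
        # Serialization or replace failed: discard the temp file and
        # leave the original database untouched.
        os.remove(tmp_path)
        raise
```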

@msiemens
Owner

@eugene-eeo That's certainly a possible solution, but I think it complicates the matter more than needed (here's an insightful article on atomic writes from MSDN: http://blogs.msdn.com/b/adioltean/archive/2005/12/28/507866.aspx). I'll keep this solution in mind in case I need it somewhere else.
