Inserting non JSON serializable object corrupts database #89

Closed
danlamanna opened this Issue Jan 9, 2016 · 3 comments

danlamanna commented Jan 9, 2016

I would consider this a pretty severe issue, as it corrupts the database on unexpected user input. I would instead expect TinyDB to act as defensively as possible.

Example code:

from tinydb import TinyDB

db = TinyDB('some-db.json')

db.insert({'foo': 'bar'})
db.insert({'bar': set([1])})

Results in the following in some-db.json:
{"_default": {"1": {"foo": "bar"}, "2": {"bar":

This is the traceback:

Traceback (most recent call last):
  File "test-tinydb.py", line 6, in <module>
    db.insert({'bar': set([1])})
  File "/usr/lib/python2.7/site-packages/tinydb/database.py", line 299, in insert
    self._write(data)
  File "/usr/lib/python2.7/site-packages/tinydb/database.py", line 266, in _write
    self._storage.write(values)
  File "/usr/lib/python2.7/site-packages/tinydb/database.py", line 45, in write
    self._storage.write(data)
  File "/usr/lib/python2.7/site-packages/tinydb/storages.py", line 105, in write
    json.dump(data, self._handle, **self.kwargs)
  File "/usr/lib64/python2.7/json/__init__.py", line 189, in dump
    for chunk in iterable:
  File "/usr/lib64/python2.7/json/encoder.py", line 434, in _iterencode
    for chunk in _iterencode_dict(o, _current_indent_level):
  File "/usr/lib64/python2.7/json/encoder.py", line 408, in _iterencode_dict
    for chunk in chunks:
  File "/usr/lib64/python2.7/json/encoder.py", line 408, in _iterencode_dict
    for chunk in chunks:
  File "/usr/lib64/python2.7/json/encoder.py", line 408, in _iterencode_dict
    for chunk in chunks:
  File "/usr/lib64/python2.7/json/encoder.py", line 442, in _iterencode
    o = _default(o)
  File "/usr/lib64/python2.7/json/encoder.py", line 184, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: set([1]) is not JSON serializable

This was on a Linux box running Python 2.7.10 and TinyDB 3.1.0.
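Until this is fixed, a caller can guard against it by serializing the document before handing it to TinyDB, so the TypeError is raised before any file I/O happens. This is only a client-side workaround sketch; safe_insert is a hypothetical helper, not part of the TinyDB API:

import json

from tinydb import TinyDB

def safe_insert(db, document):
    # json.dumps raises TypeError for non-serializable values (e.g. set([1]))
    # before TinyDB ever touches the file on disk.
    json.dumps(document)
    return db.insert(document)

db = TinyDB('some-db.json')
safe_insert(db, {'foo': 'bar'})        # inserted as usual

try:
    safe_insert(db, {'bar': set([1])})
except TypeError:
    pass                               # rejected without corrupting some-db.json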

msiemens closed this in 7ad6793 on Jan 9, 2016


Owner

msiemens commented Jan 9, 2016

Thanks for reporting this problem.

As far as I can see, the main problem was that serialization and file access were interleaved (due to using json.dump(obj, fh)). If serialization failed with an exception, the file was left in a broken state. Now the JSON serialization happens separately from the file system access. This comes at the cost of slightly higher memory consumption, since the serialized string has to be kept in memory instead of being written to the file handle on the fly.
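For reference, a minimal sketch of that approach (not the exact patch from 7ad6793): serialize first, and only write to the file once serialization has succeeded.

import json

def write_json(handle, data):
    # Serialize before touching the file: if this raises (e.g. for a set),
    # the database file is left exactly as it was.
    serialized = json.dumps(data)
    handle.seek(0)
    handle.write(serialized)
    handle.flush()
    handle.truncate()  # drop leftover bytes if the new document is shorter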


Contributor

eugene-eeo commented Jan 10, 2016

@msiemens Another solution is atomic file writes: write to a temporary file, then rename that file over the original one. This is atomic because the rename operation on Windows and *nix is guaranteed to be atomic. Example:

import os
import json
from tempfile import mkstemp

data = {'foo': 'bar'}                    # placeholder: the data to persist
real_database_filename = 'some-db.json'  # placeholder: the database file

# Serialize into a temporary file first; only move it into place once the
# write has completed, so a failed dump never touches the real database.
fd, path = mkstemp()
with os.fdopen(fd, 'w') as fp:
    json.dump(data, fp)
os.rename(path, real_database_filename)
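As a small refinement of the sketch above (my own addition, not something TinyDB does): creating the temporary file next to the target keeps the rename on a single filesystem, and an fsync before the rename ensures the bytes are on disk when the name switches. Note that on Windows, os.rename refuses to overwrite an existing file, so Python 3.3+'s os.replace would be needed there.

import os
import json
from tempfile import mkstemp

def atomic_write_json(path, data):
    # Put the temp file in the same directory as the target so os.rename
    # never has to cross a filesystem boundary.
    fd, tmp_path = mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, 'w') as fp:
            json.dump(data, fp)
            fp.flush()
            os.fsync(fp.fileno())
        os.rename(tmp_path, path)  # os.replace(tmp_path, path) on Python 3.3+
    except Exception:
        os.unlink(tmp_path)
        raise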



Owner

msiemens commented Jan 11, 2016

@eugene-eeo That's certainly a possible solution, but I think it complicates matters more than necessary (here's an insightful article on atomic writes from MSDN: http://blogs.msdn.com/b/adioltean/archive/2005/12/28/507866.aspx). I'll keep this solution in mind in case I need it somewhere else.
