Graceful fail when cache index cannot be loaded #1097

jgosmann · 2016-06-16T16:28:12Z

Nengo raises an exception when the cache index cannot be read, but should handle this case graceful.

Here is a backtrace from a particular case where an incomplete index was written due to interrupting the process:

---------------------------------------------------------------------------
EOFError                                  Traceback (most recent call last)
<ipython-input-3-0f9dc08c14fc> in <module>()
----> 1 with nengo.Simulator(model) as sim:
      2     pass

/home/jgosmann/Documents/projects/nengo/nengo/simulator.pyc in __init__(self, network, dt, seed, model, shrink_cache)
    130             cache = model.decoder_cache
    131 
--> 132         with cache:
    133             if model is None:
    134                 dt = float(dt)  # make sure it's a float (for division

/home/jgosmann/Documents/projects/nengo/nengo/cache.pyc in __enter__(self)
    199             self._remove_legacy_files()
    200             self._index = CacheIndex(os.path.join(self.cache_dir, self._INDEX))
--> 201             self._index.__enter__()
    202         except TimeoutError:
    203             warnings.warn(

/home/jgosmann/Documents/projects/nengo/nengo/cache.pyc in __enter__(self)
    108     def __enter__(self):
    109         with self._lock:
--> 110             self._index = self._load_index()
    111         return self
    112 

/home/jgosmann/Documents/projects/nengo/nengo/cache.pyc in _load_index(self)
    118         try:
    119             with open(self.filename, 'rb') as f:
--> 120                 return pickle.load(f)
    121         except IOError as err:
    122             if err.errno == errno.ENOENT:

EOFError:

Besides handling an error on read graceful, it might also be good to do the writing in a more atomic way (e.g. write to a different file first and than rename the file; keep in mind that the files have to be on the same file system for this to be atomic and work without copying data).

The text was updated successfully, but these errors were encountered:

tbekolay · 2016-06-23T13:45:21Z

This is currently in the 2.1.1 milestone. How close are you to a fix @jgosmann? Do you want me to attempt to fix this or should we punt to 2.2.0?

jgosmann · 2016-06-23T14:28:58Z

I haven't looked at this at all yet. It would probably be nice to at least fail gracefully and recreate the index if it can't be loaded in 2.1.1 (or have a 2.1.2 release). Writing the index in a safer way could probably put of to 2.2.0, but might actually be not that complicated to implement.

jgosmann · 2016-06-23T14:29:56Z

If you want to take a stab at it, go ahead.

tbekolay · 2016-06-23T14:35:03Z

OK, I'll try a fix for failing gracefully, but yeah, do the atomic write in 2.2.0. Still waiting on a last review for #1088 so probably ok for 2.1.1.

If the process is killed while writing the pickle, we can't load it anymore and thus lose all index information. Thus, we write it to another file first, so we still have the old index in case the process gets killed. Once we're done writing we rename the file (which should be atomic on most file systems and operating systems, but will at least be a much faster operation reducing the likelihood of ending up with an invalid index). Addresses #1097.

jgosmann · 2016-06-23T17:23:38Z

Actually got a pretty simple PR for the atomic part: #1107 if we want to include it in 2.1.1.
I'll be working on #1088 when I have time during my debugging of my TCM model. When I'm done with this I can take a look at the graceful fail if you haven't come up with something by then.

jgosmann · 2016-06-24T15:41:46Z

I'll take a stab at this now

Addresses #1097.

Fix #1097.

jgosmann · 2016-06-24T16:20:02Z

See #1110

If the process is killed while writing the pickle, we can't load it anymore and thus lose all index information. Thus, we write it to another file first, so we still have the old index in case the process gets killed. Once we're done writing we rename the file (which should be atomic on most file systems and operating systems, but will at least be a much faster operation reducing the likelihood of ending up with an invalid index). Addresses #1097.

If the process is killed while writing the pickle, we can't load it anymore and lose all index information. Thus, we write it to another file first, so we still have the old index in case the process gets killed. Once we're done writing we rename the file (which should be atomic on most file systems and operating systems, but will at least be a much faster operation reducing the likelihood of ending up with an invalid index). Addresses #1097.

jgosmann added the bug label Jun 16, 2016

jgosmann added this to the 2.1.1 release milestone Jun 16, 2016

jgosmann self-assigned this Jun 16, 2016

jgosmann mentioned this issue Jun 23, 2016

Make writing of cache index atomic. #1107

Merged

5 tasks

jgosmann added a commit that referenced this issue Jun 24, 2016

Test for loading of corrupted cache index.

485a20b

Addresses #1097.

jgosmann added a commit that referenced this issue Jun 24, 2016

squash! Test for loading of corrupted cache index.

1088b2d

Fix #1097.

jgosmann mentioned this issue Jun 24, 2016

Graceful handing of corrupted cache index #1110

Merged

5 tasks

jgosmann removed their assignment Jun 24, 2016

tbekolay closed this as completed in da87e14 Jun 24, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Graceful fail when cache index cannot be loaded #1097

Graceful fail when cache index cannot be loaded #1097

jgosmann commented Jun 16, 2016

tbekolay commented Jun 23, 2016

jgosmann commented Jun 23, 2016

jgosmann commented Jun 23, 2016

tbekolay commented Jun 23, 2016

jgosmann commented Jun 23, 2016

jgosmann commented Jun 24, 2016

jgosmann commented Jun 24, 2016

Graceful fail when cache index cannot be loaded #1097

Graceful fail when cache index cannot be loaded #1097

Comments

jgosmann commented Jun 16, 2016

tbekolay commented Jun 23, 2016

jgosmann commented Jun 23, 2016

jgosmann commented Jun 23, 2016

tbekolay commented Jun 23, 2016

jgosmann commented Jun 23, 2016

jgosmann commented Jun 24, 2016

jgosmann commented Jun 24, 2016