Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Graceful fail when cache index cannot be loaded #1097

Closed
jgosmann opened this issue Jun 16, 2016 · 7 comments
Closed

Graceful fail when cache index cannot be loaded #1097

jgosmann opened this issue Jun 16, 2016 · 7 comments
Labels
Milestone

Comments

@jgosmann
Copy link
Collaborator

Nengo raises an exception when the cache index cannot be read, but should handle this case graceful.

Here is a backtrace from a particular case where an incomplete index was written due to interrupting the process:

---------------------------------------------------------------------------
EOFError                                  Traceback (most recent call last)
<ipython-input-3-0f9dc08c14fc> in <module>()
----> 1 with nengo.Simulator(model) as sim:
      2     pass

/home/jgosmann/Documents/projects/nengo/nengo/simulator.pyc in __init__(self, network, dt, seed, model, shrink_cache)
    130             cache = model.decoder_cache
    131 
--> 132         with cache:
    133             if model is None:
    134                 dt = float(dt)  # make sure it's a float (for division

/home/jgosmann/Documents/projects/nengo/nengo/cache.pyc in __enter__(self)
    199             self._remove_legacy_files()
    200             self._index = CacheIndex(os.path.join(self.cache_dir, self._INDEX))
--> 201             self._index.__enter__()
    202         except TimeoutError:
    203             warnings.warn(

/home/jgosmann/Documents/projects/nengo/nengo/cache.pyc in __enter__(self)
    108     def __enter__(self):
    109         with self._lock:
--> 110             self._index = self._load_index()
    111         return self
    112 

/home/jgosmann/Documents/projects/nengo/nengo/cache.pyc in _load_index(self)
    118         try:
    119             with open(self.filename, 'rb') as f:
--> 120                 return pickle.load(f)
    121         except IOError as err:
    122             if err.errno == errno.ENOENT:

EOFError:

Besides handling an error on read graceful, it might also be good to do the writing in a more atomic way (e.g. write to a different file first and than rename the file; keep in mind that the files have to be on the same file system for this to be atomic and work without copying data).

@jgosmann jgosmann added the bug label Jun 16, 2016
@jgosmann jgosmann added this to the 2.1.1 release milestone Jun 16, 2016
@jgosmann jgosmann self-assigned this Jun 16, 2016
@tbekolay
Copy link
Member

This is currently in the 2.1.1 milestone. How close are you to a fix @jgosmann? Do you want me to attempt to fix this or should we punt to 2.2.0?

@jgosmann
Copy link
Collaborator Author

I haven't looked at this at all yet. It would probably be nice to at least fail gracefully and recreate the index if it can't be loaded in 2.1.1 (or have a 2.1.2 release). Writing the index in a safer way could probably put of to 2.2.0, but might actually be not that complicated to implement.

@jgosmann
Copy link
Collaborator Author

If you want to take a stab at it, go ahead.

@tbekolay
Copy link
Member

OK, I'll try a fix for failing gracefully, but yeah, do the atomic write in 2.2.0. Still waiting on a last review for #1088 so probably ok for 2.1.1.

jgosmann added a commit that referenced this issue Jun 23, 2016
If the process is killed while writing the pickle, we can't load it
anymore and thus lose all index information. Thus, we write it to
another file first, so we still have the old index in case the process
gets killed. Once we're done writing we rename the file (which should
be atomic on most file systems and operating systems, but will at least
be a much faster operation reducing the likelihood of ending up with
an invalid index).

Addresses #1097.
@jgosmann
Copy link
Collaborator Author

Actually got a pretty simple PR for the atomic part: #1107 if we want to include it in 2.1.1.
I'll be working on #1088 when I have time during my debugging of my TCM model. When I'm done with this I can take a look at the graceful fail if you haven't come up with something by then.

@jgosmann
Copy link
Collaborator Author

I'll take a stab at this now

jgosmann added a commit that referenced this issue Jun 24, 2016
@jgosmann jgosmann removed their assignment Jun 24, 2016
@jgosmann
Copy link
Collaborator Author

See #1110

tbekolay pushed a commit that referenced this issue Jul 27, 2016
If the process is killed while writing the pickle, we can't load it
anymore and thus lose all index information. Thus, we write it to
another file first, so we still have the old index in case the process
gets killed. Once we're done writing we rename the file (which should
be atomic on most file systems and operating systems, but will at least
be a much faster operation reducing the likelihood of ending up with
an invalid index).

Addresses #1097.
tbekolay pushed a commit that referenced this issue Jul 29, 2016
If the process is killed while writing the pickle, we can't load it
anymore and lose all index information. Thus, we write it to another
file first, so we still have the old index in case the process gets
killed. Once we're done writing we rename the file (which should be
atomic on most file systems and operating systems, but will at least
be a much faster operation reducing the likelihood of ending up with
an invalid index).

Addresses #1097.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

2 participants