Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move cache context management to the builder #1112

Merged
merged 3 commits into from
Jun 27, 2016
Merged

Conversation

tbekolay
Copy link
Member

Description:
This makes the cache more robust in a few ways. First, if the cache is used when not in the cache's context, it no longer crashes, just warns and does nothing. Second, I moved the cache context management (i.e., the with model.decoder_cache block) from Simulator.__init__ to build_network.

Motivation and context:
@PeterSuma noticed that his model would fail with nengo_ocl 1.0 and nengo 2.1.1. This is because nengo_ocl.Simulator.__init__ hasn't updated to include the with model.decoder_cache block. However, there isn't really any reason why backends should have to update on account of our cache changes; this should be localized to the builder. So I moved it to the builder.

In doing so, I also noticed that we don't do anything defensive to ensure that the cache isn't used outside of a with block, so I added those things in too. I don't believe that invalidate or the other public methods require early exits, but let me know if I'm wrong about that.

How has this been tested?
I ran all of the cache tests with

py.test nengo/tests/test_cache.py --slow --plots --analytics --logs

in both Python 2 and 3. The benchmarking test uses the actual cache so this should be a sufficient test.

Where should a reviewer start?
I would read through this commit-by-commit; each commit is pretty small and understandable. The first commit just switches the order or some things according to our style guide; no other changes were made, so that commit can be mostly ignored.

How long should this take to review?

  • Average (neither quick nor lengthy)

Note that, if people are okay with these changes, I'd like to do a quick 2.1.2 release after merging to fix compatibility with nengo_ocl 1.0. So prioritize this if you can!

Types of changes:

  • Bug fix (non-breaking change which fixes an issue)

Checklist:

  • I have read the CONTRIBUTING.rst document.
  • I have updated the documentation accordingly (not necessary).
  • I have included a changelog entry.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

Still to do:

Not a todo per se, but a discussion point:

In my initial version of this PR, I did this with finer granularity:

  1. Do with cache in builder/connection.py right before wrap_solver.
  2. Do with cache in builder/network.py at the end (if it's the toplevel network) to shrink the cache.

In the end I switched to the current solution, which should act exactly the same as what is in master now, only without requiring changes to backends that use the builder. However, would it be better to do with this finer granularity, as just described? I figured that continually reacquiring the index lock would end up causing more conflicts when two builds are happening simultaneously, and take more time when building one model. Does that make sense or are my assumptions wrong?

@tbekolay
Copy link
Member Author

Just to emphasize since the original message is pretty long:

If this looks OK to people I'd like to get it in into master then immediately do a 2.1.2 release, as this will fix compatibility with nengo_ocl 1.0.

@@ -321,6 +321,11 @@ def shrink(self, limit=None):
Maximum size of the cache in bytes.
"""
if self.readonly:
warnings.warn("Cannot shrink a readonly cache.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might give more warnings than we want. There are basically two situations where the cache might be readonly:

  1. The lock cannot be acquired. In that we already give a warning.
  2. The user explicitly configures the cache to be readonly. In that case the simulator (or builder with this PR I assume) will still call shrink and the user cannot deactivate this call (and doesn't need to), but will get an irrelevant warning.

However, it might be a useful warning for developers when debugging?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, what do you think about changing it to a logger.info in that case? The benefit of the warning is that it'll only go off once, but you're right that it's not something that the user necessarily needs to see.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me. Not sure if I would go with logger.info or logger.debug, so either of those is fine with me. (Cache misses/hits are debug.)

@jgosmann
Copy link
Collaborator

Added some discussion points inline.

Another thing to consider: For spaopt I'll (probably) need¹ some way to deactivate the shrink for individual Simulator or builder instances. My preliminary solution was to add an argument to the Simulator (which might not be the best solution considering that other backends have to do the same). Any thoughts on how this could be done with this PR?

Regarding the granularity: I think one with block per model (build) is best. That way all the decoders get written to the same file instead of creating a lot of smaller files (which has been an issue before). Also, with multiple builds in parallel performance should be better as you said. It also requires less loading and syncing of the index.

Note: I only looked at the code so far. I have yet to test it (running other stuff at the moment).


¹: Not really need, but it's pretty inefficient otherwise.

@tbekolay
Copy link
Member Author

some way to deactivate the shrink for individual Simulator or builder instances

The two ways I could see it would be to store something on a Model instance that you pass in (as you would do to turn off the decoder cache programmatically), or in network.config. The builder is set up so that you have access to it, but we haven't used network.config for anything yet. Of course, that's not really Simulator local since it modifies the network, but you could modify the network before constructing the Simulator?

@jgosmann
Copy link
Collaborator

Because I'm constructing an entirely new network that seems like a good approach. :)

@tbekolay
Copy link
Member Author

Made the discussed changes in 41bb52a

@jgosmann
Copy link
Collaborator

LGTM 🍰

@jgosmann jgosmann removed their assignment Jun 27, 2016
It is possible that the cache can be used improperly; i.e., not
in a `with cache` context. If that happened, certain methods --
namely `shrink` and `wrap_solver` -- would fail.

This commit makes it so that we don't crash in those cases.
Instead, we warn the user and do nothing.
Previously, the cache was entered in Simulator.__init__. This was
causing the current nengo_ocl's Simulator to fail as it calls the
builder without entering the cache. There's no reason why the cache
couldn't be entered in build_network instead, so I moved it there
so that nengo_ocl doesn't have to modify its Simulator.
@tbekolay
Copy link
Member Author

@drasmuss can you do the second review and merge if it looks good? I'll do a 2.1.2 release once it's merged (but no pressure / rush).

@drasmuss drasmuss assigned drasmuss and unassigned drasmuss Jun 27, 2016
@drasmuss drasmuss merged commit e369625 into master Jun 27, 2016
@drasmuss drasmuss deleted the cache-in-builder branch June 27, 2016 18:46
@jgosmann jgosmann mentioned this pull request Jun 29, 2016
@jgosmann jgosmann mentioned this pull request Aug 30, 2016
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging this pull request may close these issues.

None yet

3 participants