
Adaptive search #192

Merged: 32 commits into soft-matter:master, Feb 12, 2015
Conversation

@nkeim (Contributor) commented Jan 5, 2015

This PR implements a new feature I call "adaptive search," which is explained and demonstrated in two new tutorials. See soft-matter/trackpy-examples#17.

Adaptive search is my attempt to address an age-old problem when tracking dense packings with Crocker-Grier: how to select a search_range that does not exclude valid links, without creating many large subnets that make linking impractical or impossible. Conventionally, this is done by guessing, followed by trial, error, frustration, and resignation. However, when I was tracking 30k particles over 12k frames, it made no sense to carefully choose a single value for all particles in all frames — a search_range that worked perfectly well for the first 5500 frames or so would cause a SubnetOversizeException in frame 5501, because a corner of the image was momentarily displaced.

With adaptive search enabled, one instead specifies a maximum search_range for the entire movie. If an oversize subnet is encountered, linking becomes re-entrant: the particles of the offending subnet are re-linked with progressively smaller values of search_range. In this way, solvable (or trivial) subnets are broken off and solved one by one, until there are no more particles left to link. When it works, this is a huge simplification for the user: guess a reasonable maximum search_range based on the largest particle displacements, and leave the rest to trackpy.

Implementing this feature required a few big internal changes to the linker. The ones worth mentioning are:

  • Refactoring the linker into a class, making it relatively easy to recurse without passing around lots of parameters and state.
  • Unifying MAX_SUB_NET_SIZE and its associated logic, and providing an alternate value more suited to the adaptive search algorithm.
  • Properly testing the handling of oversize subnets, as well as adaptive search.

This is a big one — it's taken roughly the past year to dream this up, tell @danielballan about it, implement it, use it in my research, rebase, rebase, rebase, and write the documentation. (Although most of the heavy lifting was done by @danielballan and @tacaswell when they wrote the linking tests.) Some serious scrutiny is needed, though if there are no major problems it would be good to target this for the v0.3 milestone. Thanks!

@@ -474,6 +476,12 @@ def link_df(features, search_range, memory=0,
        the next frame.

        For examples of how this works, see the "predict" module.
    adaptive_step : float (optional)
@tacaswell (Member) commented on the diff:

This is picky, but I have been parsing numpydoc formatted strings recently and the convention is

name : type, optional

@tacaswell (Member) commented:

Left some trivial-ish comments from reading the code. My knee-jerk reaction is to be skeptical, but I am having trouble coming up with a good reason why this is worse than hand tuning.

@tacaswell added this to the 0.3 milestone on Jan 5, 2015
@nkeim (Contributor, Author) commented Jan 9, 2015

Thanks, @tacaswell. I've just force-pushed a rebased branch that addresses your line comments. I'm glad you think that the example notebook gives users a sense of when adaptive search is and is not a good idea. But I'm not quite satisfied with that yet because there's presently no diagnostic info related to subnets or adaptive search — we're warning users that they can shoot themselves in the foot, but it's as though they can't even see where the darn thing is pointed. So as I mention in my comment to #184, some debug-level logging would go a long way here, and should be added before we merge.

@nkeim (Contributor, Author) commented Jan 11, 2015

Inserting @tacaswell's #184 (comment) into this discussion:

  • Add an option (or at least a documented trick) to drop subnets entirely. I like this. It could just be a separate, null subnet solver that the user chooses with link_strategy; a sketch follows this list.
  • Let the linker return data on how adaptive search was used for each particle. This would be useful, but it also seems like it would be overly stretching the present API. I'll give it some thought.

@danielballan (Member) commented:

+1 for link_strategy='drop'

@nkeim (Contributor, Author) commented Feb 2, 2015

Lots of new stuff in this rebase/push. Highlights:

  • New drop link_strategy, as requested. I realized this can also be used for a dry run through a difficult linking job, to scan for oversize subnets before attempting the actual computation.
  • With the diag keyword argument enabled, linking code can attach arbitrary bits of diagnostic information to each particle, returned as extra columns in the result DataFrame. A tutorial for this feature is in the works.
    • The only sane way to add the diagnostic info in link_df was to make a copy of the input DataFrame. This could break some poorly-written user code. So I added the copy_features option to turn on copying without diagnostics.
  • Some minor performance enhancements, validated by asv benchmarking. When adaptive search and link diagnostics are turned off, the performance penalty due to this PR is zero, within error.
  • Particle ID numbering in the first level no longer depends on ordering of a set. The problem will still affect the rest of the movie, but in the first level it was causing at least one test to fail intermittently.

Obviously we have greatly expanded the scope of the original PR. But the diagnostics were motivated by adaptive search, and they mess with some of the same pieces of code, so I decided to take the lazy approach and then get feedback.

        neighbor_strategy=neighbor_strategy, link_strategy=link_strategy,
        hash_size=hash_size, box_size=box_size)

    if diagnostics:
        features = strip_diagnostics(features)
@danielballan (Member) commented on the diff:

Could you add a comment noting that strip_diagnostics does not modify the original? I was worried for a moment that it did.

@nkeim (Contributor, Author) commented Feb 3, 2015

Thanks, @danielballan. Good catches. I'll fix those issues; let me know whether you want the new stuff in its own PR.

Also on my to-do list before a (possible) merge:

  • Replace adaptive_limit (number of recursion steps) with adaptive_stop (smallest acceptable search_range). It makes more sense to think in terms of length scales; compare the sketch after this list.
  • Draft a tutorial on diagnostics, so you don't have to squint at the source code so much.

@@ -58,6 +81,13 @@ def test_one_trivial_stepper(self):
        assert_frame_equal(actual, expected)
        actual_iter = self.link_df_iter(f, 5, hash_size=(10, 2))
        assert_frame_equal(actual_iter, expected)
        if self.do_diagnostics:
            assert 'diag_search_range' in self.diag.columns
            print(self.diag.diag_search_range)
@nkeim (Contributor, Author) commented on the diff:

Oops! Will remove this print and check for others.

@danielballan (Member) commented:

Sounds good to me. Good idea with adaptive_stop. Best to stay in the physical world (as opposed to Algorithm World) wherever possible.

@nkeim (Contributor, Author) commented Feb 9, 2015

The promised changes are now up. Also, I wrote a new tutorial on linking diagnostics, and added diagnostics to the tutorial on adaptive search. Both tutorials are in soft-matter/trackpy-examples#17.

@danielballan (Member) commented:

The linking diagnostics tutorial is just so very cool. Wow.

I think it's time to merge this and the examples PR. Any final revisions?

@tacaswell (Member) commented:

I gave these a read-over a while ago and had nothing to say but 'cool!' I can give a more thorough code review if you want, but the time scale for that is iffy.


@danielballan (Member) commented:

OK, I'm going for it. I know first-hand how busy you are. :-D

@danielballan merged commit 41c1f1e into soft-matter:master on Feb 12, 2015. Follow-up commits referencing this pull request were added by @danielballan (Feb 12) and @tacaswell (Feb 13).